Smallest NFA accepting concatenations of two words of the length $k$ which are different at all positions

Question

Let $k\in \mathbb N$

I'm looking for a small NFA build for the language of concatenation of two words of the length $k$ which are index-wise different, i.e. $$L_k=\{u\cdot v \in \Sigma^* : |u|=|v|=k\wedge \forall i, u_i\neq v_i\}$$

$%$

The trivial DFA for the language contains $k$$|\Sigma| \choose k$$+1$ states and just "remembers" which letters it have seen during the first $k$ letters, however if $k=o(|\Sigma|)$, we can create a significantly smaller NFA.

$%$

A "simple" NFA for it would be of size $O^*(2^{2k})$ (more precisely, $O(k^2 \log |\Sigma| 2^{2k+O(\log^2 k)})$ ):

$%$

Take a $(|\Sigma|,2k)$-universal set (i.e. a set of vectors $\mathcal V\subseteq \{0,1\}^{|\Sigma|}$ such that for every vector $u\in \{0,1\}^2k$ and a $2k$ indices, there exists a vector $v\in \mathcal V$ such that $v_{|I}=u$, i.e. if we look only at these $k$ indices in $v$ we find $u$). Such families of size $O^*(2^{2k})$ are known.

We think of each vector $v$ as a function $\Sigma\to \{0,1\}$, where $v(\sigma_i)=1⟺v_i=1$.

$%$

Create an NFA as follows:
1. From the starting state $q_0$ create an $\epsilon$-transition to a new state for any $v\in \mathcal V$. Denote this state by $q_v$.
2. From each $q_v$ build a path of states which accepts all words whose first $k$ letters are mapped by $v$ to 0, and the later $k$ letters are mapped by $v$ to 1.

$%$

Basically, the idea is that the universal set allowes us to partition the letters which are allowed to appear in the first $k$ symbols from the rest, and we get every word in the language since there has to be a corresponding vector $v\in \mathcal V$ which partitions it right.

So the questions are:

What is the size of the smallest NFA for $L$?

Is this construction optimal?

What lower bound we can prove for such automaton size?

Thanks @vzn. Indeed, had $DFA\to NFA$ minimization been poly-time solvable, this question becomes useless. Unfortunately, as it's PSPACE-complete, we have to work hard to build small NFAs for interesting languages. — R B
– R B, Commented Jul 16, 2014 at 15:39
why do you say it is "useless" if it is in PTime? to me that would only mean there is an efficient algorithm for it. anyway this seems rather abstract/ theoretical (verging on research/ Theoretical Computer Science), does it have more bkg/ motivation/ application? — vzn
– vzn, Commented Jul 16, 2014 at 17:34
@vzn - this language is a special case of a language I use for developing parameterized algorithms for a family of packing problems. It uses an automaton for the language, in addition to constraint from the input and checks if it's language is empty. I can't expand too much on the usage (as it's still a work in progress), but the letter difference basically ensures that every items isn't packed "too many times". I've simplified the language as much as possible, but I believe that any technique used to build NFA for $L$ would help me improve my construction for the actual language I'm using. — R B
– R B, Commented Jul 16, 2014 at 21:37

D.W. · Accepted Answer · 2014-07-16 22:58:01Z

Update: this doesn't answer the question that the original poster was looking for, and doesn't help with the general case where $|\Sigma|>2$, so the question remains open.

Here is a simpler construction, showing that in the case where $\Sigma=\{0,1\}$, $O(k \cdot 2^{2k})$ states suffice, and in fact you can recognize the language with a DFA with that many states.

Let's start by looking at the complement language:

$$\overline{L_k} = \{u \cdot v \in \Sigma^* : |u| = |v| = k, \exists i . u_i = v_i\}.$$

Notice that this can be recognized by a NFA with $2k^2 |\Sigma|$ states. The NFA first guesses $i$ and a symbol $x$, then accepts all strings $u \cdot v$ where $u_i = v_i = x$.

Convert this to a DFA using the standard subset construction. We get a DFA with $\le 2^{|\Sigma| k + 1} k$ states. Now we can compute its complement, which is another DFA with the same number of states that recognizes $L_k$. Since it is a DFA, it is in turn automatically a NFA, too.

This beats your construction for the case where $|\Sigma|=2$, but otherwise leads a NFA that's larger than your scheme. However, I thought it might be of interest, both because it is so simple, and because it in fact provides a DFA (not just a NFA) to recognize your language $L_k$.

Thanks @D.W. for the answer. It is a nice construction, but in my application $k<<|\Sigma|$. — R B
– R B, Commented Jul 16, 2014 at 16:49
@RB, OK, no problem! Your construction is nicer. You might want to edit your question to mention that in your application $k \ll |\Sigma|$. — D.W.
– D.W. ♦, Commented Jul 16, 2014 at 17:11

Stack Exchange Network

Smallest NFA accepting concatenations of two words of the length $k$ which are different at all positions

1 Answer 1

Hot Network Questions

Smallest NFA accepting concatenations of two words of the length $k$ which are different at all positions

1 Answer 1

Related

Hot Network Questions