Can I find all the common subsequences between 2 sequences by using dynamic programming?

Question

I need to know if there's a dynamic programming algorithm that returns all common subsequences between 2 sequences not just the longest one.

Thank you.

I think there are $O(2^n)$ possible common subsequences.Consider the case where we have identical strings $s=t=abcdefg...xyz$, there are $2^{26}-1$ common subsequences. — Throckmorton
– Throckmorton, Commented Apr 5, 2020 at 4:16
However you can probably count the number of common subsequences using dynamic programming. — Throckmorton
– Throckmorton, Commented Apr 5, 2020 at 5:07
@YuvalFilmus Could you please help me find it ? I found this algorithm that returns the number of common subsequences, but what I need is to return the actual strings... geeksforgeeks.org/count-common-subsequence-in-two-strings — user118909
– user118909, Commented Apr 5, 2020 at 7:20
There could be many common subsequences. However, it shouldn't be difficult to modify the algorithm. It's a good exercise for you. — Yuval Filmus
– Yuval Filmus, Commented Apr 5, 2020 at 9:55

Yuval Filmus · Accepted Answer · 2020-04-06 08:15:52Z

Denote the two strings by $s = s_1,\ldots, s_n$ and $t = t_1,\ldots, t_m$. Let $\mathcal{U}(i,j)$ denote the multiset of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$ which contain $s_i$ and $t_j$. Let $\mathcal{U}(\leq i,j)$ be the union of $\mathcal{U}(I,j)$ over all $i \leq I$, and define $\mathcal{U}(i,\leq j),\mathcal{U}(\leq i, \leq j)$ similarly. These multisets obey the recurrences: $$ \mathcal{U}(i,j) = \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ w s_i : w \in \mathcal{U}(\leq i-1,\leq j-1) \} & \text{if } s_i = t_j. \end{cases} \\ \mathcal{U}(\leq i,j) = \mathcal{U}(\leq i-1,j) \cup \mathcal{U}(i,j) \\ \mathcal{U}(i,\leq j) = \mathcal{U}(i,\leq j-1) \cup \mathcal{U}(i,j) \\ \mathcal{U}(\leq i,\leq j) = \mathcal{U}(\leq i-1,\leq j-1) \cup \mathcal{U}(i,\leq j-1) \cup \mathcal{U}(\leq i-1,j) \cup \mathcal{U}(i,j) $$ with the base cases $\mathcal{U}(0,0) = \{\epsilon\}$ and $\mathcal{U}(k,0) = \mathcal{U}(0,k)$ for $k > 0$.

If you want you can even keep track of the indices from which the subsequences are taken. The recurrences above ensure that each pair of matching subsequences is counted exactly once.

If you want instead to count the number of common subsequences, all you need to do is to keep track of the size of the multisets involved. Similarly, if you want instead to determine the longest common subsequence, all you need to do is keep track of that. In the latter case, double counting of subsequences is allowed (since we are only interested in the maximum length anyhow), and this simplifies the recurrence.

Indeed, if you are interested just in the set of common subsequences, then you can use a simpler recurrence. Let $\mathcal{V}(i,j)$ denote the set of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$. The sets $\mathcal{V}(i,j)$ obey the recurrence: $$ \mathcal{V}(i,j) = \mathcal{V}(i,j-1) \cup \mathcal{V}(i-1,j) \cup \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ws_i : w \in \mathcal{V}(i-1,j-1)\} & \text{if } s_i = t_j, \end{cases} $$ with base cases $\mathcal{V}(i,0) = \mathcal{V}(0,j) = \{\epsilon\}$. This is the form of the standard recurrence for longest common subsequence.

If you're interested in multisets but prefer a simpler recurrence, you can use the geekstogeeks idea, which requires multiset subtraction to avoid double counting. Let $\mathcal{W}(i,j)$ denote the multiset of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$, which obeys the following recursion: $$ \mathcal{W}(i,j) = \mathcal{W}(i,j-1) \cup (\mathcal{W}(i-1,j) \setminus \mathcal{W}(i-1,j-1)) \cup \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ws_i : w \in \mathcal{W}(i-1,j-1)\} & \text{if } s_i = t_j, \end{cases} $$ with base cases $\mathcal{W}(i,0) = \mathcal{W}(0,j) = \{\epsilon\}$. Depending on the application, the operation of multiset subtraction could be unavailable, and in this case you can use the idea outlined at the beginning of the answer.

The notations are so neat. Unfortunately, I cannot vote twice. — 喜欢算法和数学
– 喜欢算法和数学, Commented Apr 9, 2020 at 0:07

Stack Exchange Network

Can I find all the common subsequences between 2 sequences by using dynamic programming?

1 Answer 1

Hot Network Questions

Can I find all the common subsequences between 2 sequences by using dynamic programming?

1 Answer 1

Related

Hot Network Questions