I need to know if there's a dynamic programming algorithm that returns all common subsequences between 2 sequences not just the longest one.
Thank you.
I need to know if there's a dynamic programming algorithm that returns all common subsequences between 2 sequences not just the longest one.
Thank you.
Denote the two strings by $s = s_1,\ldots, s_n$ and $t = t_1,\ldots, t_m$. Let $\mathcal{U}(i,j)$ denote the multiset of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$ which contain $s_i$ and $t_j$. Let $\mathcal{U}(\leq i,j)$ be the union of $\mathcal{U}(I,j)$ over all $i \leq I$, and define $\mathcal{U}(i,\leq j),\mathcal{U}(\leq i, \leq j)$ similarly. These multisets obey the recurrences: $$ \mathcal{U}(i,j) = \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ w s_i : w \in \mathcal{U}(\leq i-1,\leq j-1) \} & \text{if } s_i = t_j. \end{cases} \\ \mathcal{U}(\leq i,j) = \mathcal{U}(\leq i-1,j) \cup \mathcal{U}(i,j) \\ \mathcal{U}(i,\leq j) = \mathcal{U}(i,\leq j-1) \cup \mathcal{U}(i,j) \\ \mathcal{U}(\leq i,\leq j) = \mathcal{U}(\leq i-1,\leq j-1) \cup \mathcal{U}(i,\leq j-1) \cup \mathcal{U}(\leq i-1,j) \cup \mathcal{U}(i,j) $$ with the base cases $\mathcal{U}(0,0) = \{\epsilon\}$ and $\mathcal{U}(k,0) = \mathcal{U}(0,k)$ for $k > 0$.
If you want you can even keep track of the indices from which the subsequences are taken. The recurrences above ensure that each pair of matching subsequences is counted exactly once.
If you want instead to count the number of common subsequences, all you need to do is to keep track of the size of the multisets involved. Similarly, if you want instead to determine the longest common subsequence, all you need to do is keep track of that. In the latter case, double counting of subsequences is allowed (since we are only interested in the maximum length anyhow), and this simplifies the recurrence.
Indeed, if you are interested just in the set of common subsequences, then you can use a simpler recurrence. Let $\mathcal{V}(i,j)$ denote the set of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$. The sets $\mathcal{V}(i,j)$ obey the recurrence: $$ \mathcal{V}(i,j) = \mathcal{V}(i,j-1) \cup \mathcal{V}(i-1,j) \cup \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ws_i : w \in \mathcal{V}(i-1,j-1)\} & \text{if } s_i = t_j, \end{cases} $$ with base cases $\mathcal{V}(i,0) = \mathcal{V}(0,j) = \{\epsilon\}$. This is the form of the standard recurrence for longest common subsequence.
If you're interested in multisets but prefer a simpler recurrence, you can use the geekstogeeks idea, which requires multiset subtraction to avoid double counting. Let $\mathcal{W}(i,j)$ denote the multiset of common subsequences of $s_1,\ldots,s_i$ and $t_1,\ldots,t_j$, which obeys the following recursion: $$ \mathcal{W}(i,j) = \mathcal{W}(i,j-1) \cup (\mathcal{W}(i-1,j) \setminus \mathcal{W}(i-1,j-1)) \cup \begin{cases} \emptyset & \text{if } s_i \neq t_j, \\ \{ws_i : w \in \mathcal{W}(i-1,j-1)\} & \text{if } s_i = t_j, \end{cases} $$ with base cases $\mathcal{W}(i,0) = \mathcal{W}(0,j) = \{\epsilon\}$. Depending on the application, the operation of multiset subtraction could be unavailable, and in this case you can use the idea outlined at the beginning of the answer.