Context:
Consider $M$ finite-state systems with evolution given by: $$ x^i_{k+1} = f(x_k^i,u_k) $$ where $x_k^i\in\{1,\dots,X\}$ is the state of system $i\in\{1,\dots,M\}$ at time $k\geq 0$, and $u_k\in\{1,\dots, U\}$ is an input we can choose at time $k$ to control the evolution of all systems (notice that $u_k$ is shared across the $M$ systems). The map $f:\{1,\dots,X\}\times\{1,\dots,U\}\to\{1,\dots,X\}$ is also the same for all systems.
The goal is to choose the sequence $u_0,\dots,u_K$ that minimizes $$ J = \sum_{i=1}^M J_i, \ \ \ \ J_i = \sum_{k=0}^K g(x^i_k, u_k) $$ where $K>0$ is the horizon (maximum time to be considered) and the initial states $x_0^1,\dots,x_0^M$ are given.
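For concreteness, here is a minimal sketch in Python of evaluating $J$ for a fixed input sequence. The dynamics `f`, stage cost `g`, and the problem sizes below are placeholder assumptions of mine, not part of the problem statement:

```python
# Minimal sketch: evaluate the total cost J for a fixed input sequence.
# The dynamics f, stage cost g, and the sizes below are placeholder
# assumptions chosen only for illustration.

X, U, M, K = 4, 2, 3, 5  # states, inputs, number of systems, horizon

def f(x, u):
    # toy shared dynamics on states {1, ..., X}
    return (x + u - 1) % X + 1

def g(x, u):
    # toy stage cost
    return abs(x - 2) + (u - 1)

def total_cost(x0, inputs):
    """J = sum_i sum_{k=0}^{K} g(x_k^i, u_k), with x0 the M initial states."""
    states = list(x0)
    J = 0
    for u in inputs:                      # inputs = (u_0, ..., u_K), length K + 1
        J += sum(g(x, u) for x in states)
        states = [f(x, u) for x in states]
    return J

print(total_cost([1, 2, 3], [1, 2, 1, 1, 2, 1]))
```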
If $M=1$, I have observed that this is essentially a deterministic finite-state optimal control problem, which can be solved using dynamic programming. The solution is to construct a transition graph describing how $x_k$ evolves in time according to the decisions $u_k$, with edges weighted by $g(x_k,u_k)$, and then find the shortest path in this graph starting from $x_0$. This can be done in time polynomial in $K, X, U$. (I am following "Dynamic Programming and Optimal Control", Ch. 2, by D. Bertsekas for this part.)
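A minimal sketch of that backward recursion for $M=1$, reusing the placeholder `f`, `g`, `X`, `U`, `K` from above (again, these are my assumptions, not the actual problem data):

```python
# Sketch of the M = 1 case: backward dynamic programming over the (k, x)
# transition graph, i.e. a shortest-path computation with O(K * X * U) work.

def solve_single(x0):
    INF = float("inf")
    # V[k][x] = optimal cost-to-go from state x at stage k (states are 1-based)
    V = [[INF] * (X + 1) for _ in range(K + 2)]
    best_u = [[None] * (X + 1) for _ in range(K + 1)]
    V[K + 1] = [0] * (X + 1)              # nothing is paid after stage K
    for k in range(K, -1, -1):
        for x in range(1, X + 1):
            for u in range(1, U + 1):
                cost = g(x, u) + V[k + 1][f(x, u)]
                if cost < V[k][x]:
                    V[k][x] = cost
                    best_u[k][x] = u
    # recover an optimal input sequence by simulating forward
    x, inputs = x0, []
    for k in range(K + 1):
        u = best_u[k][x]
        inputs.append(u)
        x = f(x, u)
    return V[0][x0], inputs

print(solve_single(1))   # optimal J_1 and one optimal sequence u_0, ..., u_K
```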
Issue:
For general $M$, one can stack all states into a vector $x_k = [x_k^1,\dots,x_k^M]^T\in \{1,\dots,X\}^M$, which again yields a deterministic finite-state system. However, the number of states is now $X^M$. Thus, if we apply the previous dynamic programming approach, the best we can achieve is time polynomial in $X^M$, which is exponential in $M$.
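To make the blow-up explicit, here is a sketch of that stacked approach, again reusing the placeholder `f`, `g`, `X`, `U`, `M`, `K` from above; the DP table has $X^M$ entries per stage:

```python
# Sketch of the stacked approach for general M: the joint state is the tuple
# (x^1, ..., x^M), so each stage of the DP enumerates X**M joint states.
from itertools import product

def solve_stacked(x0):
    INF = float("inf")
    joint_states = list(product(range(1, X + 1), repeat=M))   # X**M tuples
    V_next = {s: 0 for s in joint_states}                      # cost after stage K
    for k in range(K, -1, -1):
        V_cur = {}
        for s in joint_states:
            best = INF
            for u in range(1, U + 1):
                nxt = tuple(f(x, u) for x in s)
                cost = sum(g(x, u) for x in s) + V_next[nxt]
                best = min(best, cost)
            V_cur[s] = best
        V_next = V_cur
    return V_next[tuple(x0)]

print(solve_stacked([1, 2, 3]))   # optimal J, but at O(K * X**M * U * M) cost
```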
Question:
Is there a way to solve this problem in time polynomial in $M$?
I'm not sure this is possible. However, the fact that the systems do not interact with each other (other than sharing the input $u_k$) makes me think there is something smarter we can do than simply stacking the states of all systems together.