I am studying HMMs from "Fundamentals of Speech Recognition" by Rabiner. For the problem of how to adjust the parameters of an HMM, the proposed method is the Baum-Welch method (Expectation-Maximization).
The text says to maximize Baum's auxiliary function $Q(\lambda',\lambda)$ over $\lambda$, where $\lambda'$ is the current model and $\lambda$ is the re-estimated model obtained from the maximization. $O$ is the given observation sequence, and $q$ ranges over state sequences.
$$Q(\lambda',\lambda)=\sum_{q} P(O,q|\lambda')\log P(O,q|\lambda)$$
What is the intuition behind formulating such a formula?
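To make the sum concrete, here is a brute-force sketch for a toy 2-state, 2-symbol HMM (all parameter values are made up for illustration). It literally enumerates every state sequence $q$ and accumulates $P(O,q|\lambda')\,\log P(O,q|\lambda)$, which is feasible only for tiny models but matches the formula term for term:

```python
import itertools
import math

# Hypothetical toy models. Each lambda = (pi, A, B):
# initial state probs, transition matrix, emission matrix.
lam_prime = ([0.6, 0.4],
             [[0.7, 0.3], [0.4, 0.6]],
             [[0.9, 0.1], [0.2, 0.8]])
lam       = ([0.5, 0.5],
             [[0.6, 0.4], [0.5, 0.5]],
             [[0.8, 0.2], [0.3, 0.7]])

O = [0, 1, 1, 0]  # a short observation sequence

def joint(model, O, q):
    """P(O, q | lambda): probability of observations O along state path q."""
    pi, A, B = model
    p = pi[q[0]] * B[q[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1]][q[t]] * B[q[t]][O[t]]
    return p

def Q(lam_prime, lam, O, n_states=2):
    """Baum's auxiliary function: sum over all state sequences q of
    P(O, q | lambda') * log P(O, q | lambda)."""
    total = 0.0
    for q in itertools.product(range(n_states), repeat=len(O)):
        total += joint(lam_prime, O, q) * math.log(joint(lam, O, q))
    return total

print(Q(lam_prime, lam, O))
```

Each term weights the log-likelihood of a complete path $(O,q)$ under the new model $\lambda$ by how probable that path is under the old model $\lambda'$, which is why maximizing $Q$ over $\lambda$ can be shown to never decrease $P(O|\lambda)$.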