
I am studying HMMs from "Fundamentals of Speech Recognition" by Rabiner. For the problem of how to adjust the parameters of an HMM, the proposed method is the Baum-Welch method (Expectation-Maximization).

The text says to maximize Baum's auxiliary function $Q(\lambda',\lambda)$ over $\lambda$, where $\lambda'$ is the current model and $\lambda$ is the re-estimated model being optimized. $O$ is the given observation sequence and $q$ ranges over state sequences.

$$Q(\lambda',\lambda)=\sum_{q} P(O,q|\lambda')\,\log P(O,q|\lambda)$$
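
For context, here is a sketch (as I understand it) of why increasing $Q$ increases the likelihood, using Jensen's inequality and assuming $P(O,q|\lambda')>0$ for every state sequence $q$:

$$\begin{aligned}
\log P(O|\lambda)-\log P(O|\lambda')
&=\log\sum_{q}\frac{P(O,q|\lambda')}{P(O|\lambda')}\cdot\frac{P(O,q|\lambda)}{P(O,q|\lambda')}\\
&\ge\sum_{q}P(q|O,\lambda')\,\log\frac{P(O,q|\lambda)}{P(O,q|\lambda')}\qquad\text{(Jensen, since $\log$ is concave)}\\
&=\frac{Q(\lambda',\lambda)-Q(\lambda',\lambda')}{P(O|\lambda')}.
\end{aligned}$$

So any $\lambda$ with $Q(\lambda',\lambda)>Q(\lambda',\lambda')$ also satisfies $P(O|\lambda)>P(O|\lambda')$, which is why maximizing $Q$ at each iteration improves the model.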

What is the intuition behind formulating the auxiliary function this way?

  • I am not familiar with HMMs or the Baum-Welch method, but at first glance it looks like entropy. Commented Feb 1, 2024 at 7:58
  • Thanks a lot! Adian Liusie has a YouTube video that explains the intuition behind cross-entropy. With that help, I see that cross-entropy can be derived from KL divergence, and KL divergence measures the "distance" between two distributions. Baum's auxiliary function also involves two distributions: one moves toward the "true" probability as each iteration adjusts the HMM parameters, while the other is the "predicted" probability that is not yet optimal. Instead of minimizing (as with cross-entropy loss), it now makes total sense why we maximize the given function (made concrete in the sketch after this list). Commented Feb 1, 2024 at 15:14
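
To make the cross-entropy connection in the last comment concrete, here is a sketch; the notation $H(p,r)=-\sum_{q}p(q)\log r(q)$ is my own label, and "cross-entropy" is loose here because $P(O,\cdot|\lambda)$ does not sum to one over $q$:

$$Q(\lambda',\lambda)=\sum_{q}P(O,q|\lambda')\log P(O,q|\lambda)=P(O|\lambda')\sum_{q}P(q|O,\lambda')\log P(O,q|\lambda)=-P(O|\lambda')\,H\big(P(\cdot|O,\lambda'),\,P(O,\cdot|\lambda)\big)$$

Since $P(O|\lambda')$ is a positive constant with respect to $\lambda$, maximizing $Q(\lambda',\lambda)$ over $\lambda$ is the same as minimizing this (unnormalized) cross-entropy between the state-sequence posterior under the current model $\lambda'$ and the joint probability under the candidate model $\lambda$.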
