$\begingroup$

From the CS229 lecture notes, page 6:

Intuitively, the EM algorithm alternatively updates Q and θ by a) setting Q(z) = p(z|x; θ) following Equation (8) so that ELBO(x; Q, θ) = log p(x; θ) for x and the current θ, and b) maximizing ELBO(x; Q, θ) w.r.t θ while fixing the choice of Q.

That is, the ELBO is $$\sum_{Z}Q(Z)\log\frac{p(X,Z;\theta)}{Q(Z)}$$

where $Q(Z)$ is set equal to the posterior of $Z$ given $X$ in the expectation step. This choice of $Q(Z)$ makes the ELBO equal to the log evidence $\log p(X;\theta)$: $$Q(Z) = p(Z\mid X;\theta)$$
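Written out, substituting this choice into the ELBO and using $p(X,Z;\theta) = p(Z\mid X;\theta)\,p(X;\theta)$ gives

$$\sum_Z p(Z\mid X;\theta)\log\frac{p(X,Z;\theta)}{p(Z\mid X;\theta)} = \sum_Z p(Z\mid X;\theta)\log p(X;\theta) = \log p(X;\theta),$$

since $\sum_Z p(Z\mid X;\theta) = 1$, so the bound is tight.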

From Wikipedia, the M-step objective is the same expression but without the $Q(Z)$ in the denominator: $$\sum_Z Q(Z)\log p(X,Z;\theta)$$

Which one is correct? They differ by the $Q(Z)$ in the denominator. If they are equivalent, how?

$\endgroup$

1 Answer

$\begingroup$

Both are correct. The $Q(Z)$ in the denominator of the first expression does not depend on $\theta$, so it contributes only an additive constant to the objective and can be dropped from the optimization problem ($\arg\max_{\theta}$), which yields the same expression as in the Wikipedia article.
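To make the additive-constant claim explicit, the ELBO splits as

$$\sum_Z Q(Z)\log\frac{p(X,Z\mid\theta)}{Q(Z)} = \sum_Z Q(Z)\log p(X,Z\mid\theta) \;-\; \sum_Z Q(Z)\log Q(Z),$$

and the second term (the entropy of $Q$) contains no $\theta$, so the $\arg\max_{\theta}$ of the left-hand side equals the $\arg\max_{\theta}$ of the first term, which is exactly the Wikipedia expression.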

EDIT: More formally, $Q(Z)$ does not depend on the value of $\theta$ over which we optimize in the M-step. The function we define in the E-step is the following:

$$ \sum_Z Q(Z) \log \frac{p(X,Z|\theta)}{Q(Z)} $$

with $Q(Z) = p(Z|X,\theta^t)$.

$\theta^t$ is the estimate of $\theta$ computed at the previous iteration; it is held constant during the current M-step, in which we optimize over $\theta$ (not $\theta^t$), and $\theta$ appears only in the numerator of the expression above.
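As a numerical sanity check, here is a minimal sketch on a hypothetical toy model (not taken from either source): a mixture of two unit-variance Gaussians with known means and an unknown mixing weight $\theta$. Maximizing the ELBO and maximizing the expression without the $Q(Z)$ denominator over a grid of candidate $\theta$ values pick out the same maximizer.

```python
import numpy as np

# Hypothetical toy model: a mixture of two unit-variance Gaussians with
# known means -2 and +2 and an unknown mixing weight theta = p(Z = 1).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 70), rng.normal(2.0, 1.0, 30)])

def log_norm(x, mu):
    # log density of N(mu, 1)
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - mu) ** 2

theta_t = 0.5  # previous estimate, held fixed during the E-step
# E-step: Q(Z) = p(Z | X, theta_t), computed per data point
lj1 = np.log(theta_t) + log_norm(X, 2.0)         # log p(x, Z=1; theta_t)
lj0 = np.log(1.0 - theta_t) + log_norm(X, -2.0)  # log p(x, Z=0; theta_t)
m = np.maximum(lj0, lj1)
log_evidence = m + np.log(np.exp(lj0 - m) + np.exp(lj1 - m))
q1 = np.clip(np.exp(lj1 - log_evidence), 1e-12, 1.0 - 1e-12)
q0 = 1.0 - q1

# M-step objectives evaluated on a grid of candidate theta values
thetas = np.linspace(0.01, 0.99, 981)

def expected_complete_ll(th):
    # sum_Z Q(Z) log p(X, Z | th)  -- the expression without Q(Z) below
    return np.sum(q1 * (np.log(th) + log_norm(X, 2.0))
                  + q0 * (np.log(1.0 - th) + log_norm(X, -2.0)))

entropy = -np.sum(q1 * np.log(q1) + q0 * np.log(q0))  # no theta in here
ecll = np.array([expected_complete_ll(th) for th in thetas])
elbo = ecll + entropy  # ELBO = expected complete log-likelihood + H(Q)

# Adding a constant cannot move the argmax: both objectives agree
assert thetas[np.argmax(elbo)] == thetas[np.argmax(ecll)]
```

The entropy term is computed once from $Q$ (which uses only $\theta^t$) and shifts every grid point by the same amount, so the two curves differ by a vertical offset and share their maximizer.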

$\endgroup$
  • $\begingroup$ I think it does: we set it equal to the posterior in the expectation step, which depends on $\theta$. If the posterior is intractable, we get into the whole business of variational inference. $\endgroup$ Commented Jul 9, 2024 at 4:24
  • $\begingroup$ I clarified the difference between $\theta$ and $\theta^t$ in my answer. $\endgroup$ Commented Jul 10, 2024 at 14:01
