$\begingroup$

From the CS229 lecture notes, page 6:

Intuitively, the EM algorithm alternatively updates Q and θ by a) setting Q(z) = p(z|x; θ) following Equation (8) so that ELBO(x; Q, θ) = log p(x; θ) for x and the current θ, and b) maximizing ELBO(x; Q, θ) w.r.t θ while fixing the choice of Q.

That is, the ELBO is $$\sum_{Z}Q(Z)\log\frac{p(X,Z;\theta)}{Q(Z)}$$

where $Q(Z)$ is set equal to the posterior of $Z$ given $X$ in the expectation step. This choice of $Q(Z)$ makes the ELBO equal to the log evidence $\log p(X;\theta)$: $$Q(Z) = p(Z\mid X;\theta)$$
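Written out, substituting this choice into the ELBO and using $p(X,Z;\theta) = p(Z\mid X;\theta)\,p(X;\theta)$ gives

$$\sum_Z p(Z\mid X;\theta)\log\frac{p(X,Z;\theta)}{p(Z\mid X;\theta)} = \sum_Z p(Z\mid X;\theta)\log p(X;\theta) = \log p(X;\theta),$$

since $\sum_Z p(Z\mid X;\theta) = 1$, so the bound is tight.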

From Wikipedia, the M-step objective is the same expression but without the $Q(Z)$ in the denominator: $$\sum_Z Q(Z)\log p(X,Z;\theta)$$

Which one is correct? They differ by the $Q(Z)$ in the denominator. If they are equivalent, how?

$\endgroup$

1 Answer

$\begingroup$

Both are correct. The $Q(Z)$ in the denominator of the first expression does not depend on $\theta$, so it contributes only an additive constant to the objective and can be dropped from the optimization problem ($\arg\max_{\theta}$), which yields the same expression as in the Wikipedia article.
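To make the additive-constant claim explicit, the ELBO splits as

$$\sum_Z Q(Z)\log\frac{p(X,Z\mid\theta)}{Q(Z)} = \sum_Z Q(Z)\log p(X,Z\mid\theta) \;-\; \sum_Z Q(Z)\log Q(Z),$$

and the second term (the entropy of $Q$) contains no $\theta$, so the $\arg\max_{\theta}$ of the left-hand side equals the $\arg\max_{\theta}$ of the first term, which is exactly the Wikipedia expression.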

EDIT: More formally, $Q(Z)$ does not depend on the value of $\theta$ over which we optimize in the M-step. The function we define in the E-step is the following:

$$ \sum_Z Q(Z) \log \frac{p(X,Z|\theta)}{Q(Z)} $$

with $Q(Z) = p(Z|X,\theta^t)$.

$\theta^t$ is the estimate of $\theta$ computed at the previous iteration; it is held constant during the current M-step, in which we optimize over $\theta$ (not $\theta^t$), and $\theta$ appears only in the numerator of the expression above.
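As a numerical sanity check, here is a minimal sketch on a hypothetical toy model (not taken from either source): a mixture of two unit-variance Gaussians with known means and an unknown mixing weight $\theta$. Maximizing the ELBO and maximizing the expression without the $Q(Z)$ denominator over a grid of candidate $\theta$ values pick out the same maximizer.

```python
import numpy as np

# Hypothetical toy model: a mixture of two unit-variance Gaussians with
# known means -2 and +2 and an unknown mixing weight theta = p(Z = 1).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 70), rng.normal(2.0, 1.0, 30)])

def log_norm(x, mu):
    # log density of N(mu, 1)
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - mu) ** 2

theta_t = 0.5  # previous estimate, held fixed during the E-step
# E-step: Q(Z) = p(Z | X, theta_t), computed per data point
lj1 = np.log(theta_t) + log_norm(X, 2.0)         # log p(x, Z=1; theta_t)
lj0 = np.log(1.0 - theta_t) + log_norm(X, -2.0)  # log p(x, Z=0; theta_t)
m = np.maximum(lj0, lj1)
log_evidence = m + np.log(np.exp(lj0 - m) + np.exp(lj1 - m))
q1 = np.clip(np.exp(lj1 - log_evidence), 1e-12, 1.0 - 1e-12)
q0 = 1.0 - q1

# M-step objectives evaluated on a grid of candidate theta values
thetas = np.linspace(0.01, 0.99, 981)

def expected_complete_ll(th):
    # sum_Z Q(Z) log p(X, Z | th)  -- the expression without Q(Z) below
    return np.sum(q1 * (np.log(th) + log_norm(X, 2.0))
                  + q0 * (np.log(1.0 - th) + log_norm(X, -2.0)))

entropy = -np.sum(q1 * np.log(q1) + q0 * np.log(q0))  # no theta in here
ecll = np.array([expected_complete_ll(th) for th in thetas])
elbo = ecll + entropy  # ELBO = expected complete log-likelihood + H(Q)

# Adding a constant cannot move the argmax: both objectives agree
assert thetas[np.argmax(elbo)] == thetas[np.argmax(ecll)]
```

The entropy term is computed once from $Q$ (which uses only $\theta^t$) and shifts every grid point by the same amount, so the two curves differ by a vertical offset and share their maximizer.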

$\endgroup$
  • $\begingroup$ I think it does: we set it equal to the posterior in the expectation step, which depends on $\theta$. If the posterior is intractable, we get into the whole business of variational inference. $\endgroup$ Commented Jul 9, 2024 at 4:24
  • $\begingroup$ I clarified the difference between $\theta$ and $\theta^t$ in my answer. $\endgroup$ Commented Jul 10, 2024 at 14:01
