
I am going through the derivation of the Expectation-Maximization (EM) algorithm for Gaussian mixture models. I understand it except for one small detail.

The general idea of EM is to maximize the expectation of the complete log-likelihood of the data, $\log L(\theta;\mathbf{x},\mathbf{Z})$, where $\mathbf{x}$ is the observed data, $\mathbf{Z}$ are the latent variables, and $\theta$ are the model parameters. The expectation is taken over the posterior distribution of the latent variables: $\operatorname{E}_{\mathbf{Z}\mid\mathbf{X};\mathbf{\theta}^{(t)}} [\log L(\theta;\mathbf{x},\mathbf{Z})]$, where ${\theta}^{(t)}$ denotes the parameters from the previous iteration of the algorithm.

The derivation then proceeds as follows (copied from Wikipedia): \begin{align}Q(\theta\mid\theta^{(t)}) &= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\log L(\theta;\mathbf{x},\mathbf{Z}) ] \\ &= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\log \prod_{i=1}^{n}L(\theta;\mathbf{x}_i,Z_i) ] \\ &= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\sum_{i=1}^n \log L(\theta;\mathbf{x}_i,Z_i) ] \\ &= \sum_{i=1}^n\operatorname{E}_{Z_i\mid X_i=x_i;\mathbf{\theta}^{(t)}} [\log L(\theta;\mathbf{x}_i,Z_i) ] \\ &= \sum_{i=1}^n \sum_{j=1}^2 P(Z_i =j \mid X_i = \mathbf{x}_i; \theta^{(t)}) \log L(\theta_j;\mathbf{x}_i,j) \\ &= \sum_{i=1}^n \sum_{j=1}^2 T_{j,i}^{(t)} \big[ \log \tau_j -\tfrac{1}{2} \log |\Sigma_j| -\tfrac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu}_j)^\top\Sigma_j^{-1} (\mathbf{x}_i-\boldsymbol{\mu}_j) -\tfrac{d}{2} \log(2\pi) \big]. \end{align}
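For concreteness, here is a minimal NumPy sketch of what the last two lines of the derivation compute for a two-component GMM: the responsibilities $T_{j,i}^{(t)} = P(Z_i = j \mid X_i = \mathbf{x}_i; \theta^{(t)})$ via Bayes' rule applied sample by sample, and the resulting value of $Q$. The toy data and parameter values are hypothetical, purely for illustration:

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    """Log N(x_i; mu, Sigma) for each row x_i of x -- matches the bracketed
    term in the last line of the derivation, minus the log tau_j part."""
    d = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
n, d = 6, 2
x = rng.normal(size=(n, d))                 # hypothetical toy observations

# Hypothetical current parameters theta^(t) for K = 2 components
tau = np.array([0.4, 0.6])                  # mixing weights tau_j
mu = np.array([[0.0, 0.0], [1.0, 1.0]])     # means mu_j
Sigma = np.stack([np.eye(d), np.eye(d)])    # covariances Sigma_j

# log tau_j + log N(x_i; mu_j, Sigma_j), shape (2, n)
logp = np.stack([np.log(tau[j]) + log_gauss(x, mu[j], Sigma[j])
                 for j in range(2)])

# E-step responsibilities T[j, i] = P(Z_i = j | X_i = x_i; theta^(t)):
# Bayes' rule, normalized over j separately for every sample i
T = np.exp(logp - np.logaddexp(logp[0], logp[1]))

# Q(theta | theta^(t)) evaluated at theta = theta^(t): the double sum
# over samples i and components j from the last line of the derivation
Q = float((T * logp).sum())
```

Note that `T` is computed one column (one sample) at a time; nothing couples $Z_i$ to any $x_{i'}$ with $i' \ne i$, which is the property the question below is about.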

I don't understand how we get the fourth line from the third. Namely, why does the expectation over the joint posterior distribution of all latent variables, $\operatorname{E}_{\mathbf{Z}\mid\mathbf{X};\mathbf{\theta}^{(t)}}$, decompose so neatly into expectations over the individual samples, $\operatorname{E}_{Z_i\mid X_i = x_i;{\theta}^{(t)}}$?

  • I had a similar question a while back: stats.stackexchange.com/questions/473399/… See if it can help you :) (Commented Dec 8, 2023 at 15:57)
  • @Stochastic, thanks a lot for the link! However, I think you have a similar step that I don't understand. Namely, you have the Bayes formula for the posterior $P(Z_i=j|X,\theta) = \frac{P(X=x_i|\theta, Z_i =j) P(Z_i=j|\theta)}{\sum_{k=1}^{K}P(X=x_i|\theta, Z_i=k)P(Z_i=k|\theta)}$. But there you went from the joint probability over $X$ to probabilities over a particular observation $x_i$, so I guess ideally you would have to write the product of $P\left(X=x_i|\theta, Z_i =j \right)$ over all $i$. (Commented Dec 9, 2023 at 15:32)
