Summary
So to summarize my question: how can I take \begin{align} \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with respect to $\mu_{j}$ and $\Sigma_{j}$? I am having trouble with the calculus. Below I provide a long derivation of the E step and how I arrived at this expression; you do not need to read it in order to answer my question.
EM algorithm background
The expectation maximisation algorithm is an alternating (iterative) algorithm: we start with an initial value for $\theta$, as we would in a gradient descent approach. In gradient descent we would repeatedly move in the direction of the gradient in order to maximise the function. In this case, however, we cannot do gradient descent, because $l(\theta|x,z)$ depends on the unobserved $z$, so we instead alternate between an expectation step and a maximisation step:
- Set $\theta_0$
- Alternate between:
\begin{align*} & E :\text{Find an expression for} &\\ & E_Z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{\text{all } z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}
\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}
We want to maximise the log-likelihood:
$l(\theta|x)$
Problem: Difficult to maximise it directly.
\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} \log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}
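For a numerical sanity check of this log-likelihood, here is a minimal sketch assuming NumPy and SciPy are available (the function name and argument layout are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """l(theta|x): sum over points of the log of the weighted component densities."""
    K = len(pis)
    # dens[i, k] = pi_k * N(x_i | mu_k, Sigma_k)
    dens = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(K)
    ])
    # Sum over components k inside the log, then sum the logs over points i
    return np.sum(np.log(dens.sum(axis=1)))
```

With a single component ($K=1$, $\pi_1=1$) this reduces to the ordinary Gaussian log-likelihood, which gives an easy correctness check.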
It is difficult to maximise $l(\theta|x)$ directly because each of the $n$ terms has a sum over the $K$ components inside the log, so we perform an EM procedure, which leaves us with the sums outside the logs.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,
\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}
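For illustration, this complete-data log-likelihood can be evaluated numerically once $z$ is known; a minimal sketch assuming NumPy/SciPy (the function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_log_likelihood(X, z, pis, mus, Sigmas):
    """l(theta|X,Z): x_i contributes log(pi_{z_i} * N(x_i | mu_{z_i}, Sigma_{z_i}))."""
    return sum(
        np.log(pis[zi]) + multivariate_normal.logpdf(x, mean=mus[zi], cov=Sigmas[zi])
        for x, zi in zip(X, z)
    )
```

Because the component label $z_i$ picks out a single term, the log applies directly to a product rather than a sum, which is what makes this form tractable.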
\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} &\\ & = W_{ij} &\\ \end{align*}
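These posterior membership weights $W_{ij}$ can be computed in a few lines; a sketch assuming NumPy/SciPy (names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """W[i, j] = P(Z_i = j | x_i, theta), via Bayes' rule as derived above."""
    K = len(pis)
    # Numerator: pi_j * N(x_i | mu_j, Sigma_j) for each component j
    num = np.column_stack([
        pis[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigmas[j])
        for j in range(K)
    ])
    # Denominator: sum over components k, so each row sums to 1
    return num / num.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is a probability distribution over the $K$ components, so the rows must sum to 1.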
\begin{align*} & E: E_Z \left[l(\theta | X, Z) |X,\theta \right] &\\ & = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)\Bigg|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log (\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) &\\ & \text{Setting $\pi_1=1-\sum_{j=2}^{K}\pi_j$,} &\\ & = \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) + &\\ & \quad \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) & \end{align*}
for $j=2,3,\dots,K$.
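This expected complete-data log-likelihood can also be checked numerically; a minimal NumPy sketch (the function name `q_function` and argument layout are my own, and `W` is the matrix of weights $W_{ij}$ from the E step):

```python
import numpy as np

def q_function(X, W, pis, mus, Sigmas):
    """E_Z[l(theta|X,Z)|X,theta] = sum_i sum_j W_ij * log(pi_j N(x_i|mu_j,Sigma_j))."""
    n, d = X.shape
    K = len(pis)
    total = 0.0
    for j in range(K):
        diff = X - mus[j]                                   # (n, d)
        Sinv = np.linalg.inv(Sigmas[j])
        quad = np.einsum('ni,ij,nj->n', diff, Sinv, diff)   # Mahalanobis terms
        _, logdet = np.linalg.slogdet(Sigmas[j])
        log_comp = (np.log(pis[j]) - 0.5 * logdet
                    - 0.5 * d * np.log(2 * np.pi) - 0.5 * quad)
        total += np.sum(W[:, j] * log_comp)
    return total
```

Note the weights $W_{ij}$ are held fixed at the current $\theta$ while the M step maximises this expression over the new parameters.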
My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.
I have found a similar post, but it only covered differentiating with respect to $\Sigma_k$.