## Summary ##
So to summarize my question, how can I take
\begin{align}
= \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+
\sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right)
\end{align}
and maximize it with respect to $\mu_{j}$ and $\Sigma_{j}$. I am having trouble with the calculus. Below I give a long derivation of the E step and of how I arrived at this expression; you do not need to read it in order to answer my question.
## EM algorithm background ##
The [expectation maximisation algorithm][1] can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$, as we would in a gradient-based approach. In gradient ascent we would repeatedly move in the direction of the gradient to maximise the function. However, in this case we cannot do that, because $l(\theta|x,z)$ involves the unobserved $z$, and therefore we have to alternate between an expectation step and a maximisation step:
1. Set $\theta_0$ <br/>
2. Alternate between:
\begin{align*}
& E :\text{To find an expression for} &\\
& E_z\left[l(\theta|X,Z)|X,\theta\right] &\\
& = \sum_{\text{all } z} l(\theta|x,z) P(Z=z|x,\theta)
\end{align*}
\begin{align*}
& M :\text{Maximise over $\theta$} &\\
& E_z \left[l (\theta|X,Z)| X,\theta \right] &\\
\end{align*}
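To make the alternation concrete, here is a toy numerical sketch for a two-component 1-D Gaussian mixture (entirely my own illustration, using the standard textbook 1-D update formulas; all names are mine, not from the derivation below):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two well-separated 1-D Gaussian clusters.
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial guesses for theta = (pi, mu, sigma^2).
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma2 = np.array([1.0, 1.0])

for _ in range(50):
    # E step: responsibilities w_ij = P(Z_i = j | x_i, theta).
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    w = dens / dens.sum(axis=1, keepdims=True)
    # M step: weighted-average updates of the parameters.
    nk = w.sum(axis=0)
    pi = nk / len(x)
    mu = (w * x[:, None]).sum(axis=0) / nk
    sigma2 = (w * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

After a few iterations the component means settle near the true cluster centres.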
We want to maximise the log-likelihood: <br/>
$l(\theta|x)$
Problem: Difficult to maximise it directly.
\begin{align*}
\theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\
l(\theta|x) & = \sum_{i=1}^{n} \log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\
\end{align*}
It is difficult to maximise $l(\theta|x)$ because each of the $n$ log terms contains a sum over components; the EM procedure rewrites the objective so that the sums end up outside the logs. <br/>
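For concreteness, the incomplete-data log-likelihood $l(\theta|x)$ above can be evaluated numerically with a log-sum-exp over components; this is my own sketch (names are mine), not part of the derivation:

```python
import numpy as np

def mixture_log_likelihood(X, pis, mus, Sigmas):
    """sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k), computed stably."""
    n, d = X.shape
    K = len(pis)
    log_dens = np.empty((n, K))  # log(pi_k * N(x_i | mu_k, Sigma_k))
    for k in range(K):
        diff = X - mus[k]                                   # (n, d)
        inv = np.linalg.inv(Sigmas[k])
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)    # Mahalanobis terms
        _, logdet = np.linalg.slogdet(Sigmas[k])
        log_dens[:, k] = (np.log(pis[k]) - 0.5 * logdet
                          - 0.5 * d * np.log(2 * np.pi) - 0.5 * quad)
    # log-sum-exp over components, then sum over observations
    m = log_dens.max(axis=1, keepdims=True)
    return float(np.sum(m.squeeze(1) + np.log(np.exp(log_dens - m).sum(axis=1))))
```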
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,
\begin{align*}
l(\theta|X,Z) & = \sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right)
\end{align*}
\begin{align*}
P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\
& = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} &\\
& = w_{ij} &\\
\end{align*}
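The Bayes'-rule computation of $w_{ij}$ above translates directly into code; a minimal NumPy sketch (names mine, working in log space for numerical stability):

```python
import numpy as np

def responsibilities(X, pis, mus, Sigmas):
    """w_ij = P(Z_i = j | x_i, theta), one row per observation."""
    n, d = X.shape
    K = len(pis)
    log_w = np.empty((n, K))  # log of the (unnormalised) numerators
    for j in range(K):
        diff = X - mus[j]
        inv = np.linalg.inv(Sigmas[j])
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        _, logdet = np.linalg.slogdet(Sigmas[j])
        log_w[:, j] = (np.log(pis[j]) - 0.5 * logdet
                       - 0.5 * d * np.log(2 * np.pi) - 0.5 * quad)
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilise before exponentiating
    W = np.exp(log_w)
    return W / W.sum(axis=1, keepdims=True)     # normalise; rows sum to 1
```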
\begin{align*}
& E: E_Z \left[l(\theta | X, Z) |X,\theta \right] &\\
& = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\
& = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) &\\
& = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log (\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) &\\
& \text{Setting $\pi_1=1-\sum_{j=2}^{K}\pi_j$, this becomes} &\\
& = \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) + &\\
& \quad \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) &
\end{align*}
for $j=2,3,\dots,K$.
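As a numerical sanity check, the expectation derived above can be evaluated directly once the responsibilities $W$ are in hand; a sketch (function name mine), using the unconstrained form with all $K$ components:

```python
import numpy as np

def q_function(X, W, pis, mus, Sigmas):
    """E_Z[l(theta|X,Z) | X, theta] = sum_i sum_j W_ij * (log pi_j + log N(x_i | mu_j, Sigma_j))."""
    n, d = X.shape
    K = len(pis)
    total = 0.0
    for j in range(K):
        diff = X - mus[j]
        inv = np.linalg.inv(Sigmas[j])
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        _, logdet = np.linalg.slogdet(Sigmas[j])
        total += np.sum(W[:, j] * (np.log(pis[j]) - 0.5 * logdet
                                   - 0.5 * d * np.log(2 * np.pi) - 0.5 * quad))
    return float(total)
```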
My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.
\begin{align*}
& M :\text{Maximise over $\theta$} &\\
& E_z \left[l (\theta|X,Z)| X,\theta \right] &\\
\end{align*}
## Summary ##
To restate my question: how can I maximize the expression derived above with respect to $\mu_{j}$ and $\Sigma_{j}$?
I have found a [similar post][2], but it only covers differentiating with respect to $\Sigma_k$.
[1]: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
[2]: https://stats.stackexchange.com/questions/243128/derivation-of-m-step-in-em-algorithm-for-mixture-of-gaussians