Bounty ended with doubled's answer chosen by Stochastic

Stochastic
Summary

To summarize my question: how can I take \begin{align} \sum_{i=1}^{n}W_{i1} \left(\log \Big(1-\sum_{j=2}^{K}\pi_j\Big) -\frac{1}{2} \log|\Sigma_1| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log \pi_j -\frac{1}{2} \log |\Sigma_j| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with respect to $\mu_{j}$ and $\Sigma_{j}$? I am having trouble with the calculus. Below I give a long derivation of the E step and how I arrived at this expression; it is not necessary to read it in order to answer my question.

EM algorithm background

The expectation-maximisation (EM) algorithm is an alternating (iterative) algorithm: we start with an initial value of $\theta$, just as we would in a gradient descent approach. In gradient descent we would repeatedly move in the direction of the gradient in order to maximise the function. Here, however, we cannot apply gradient descent directly, since $l(\theta|x,z)$ involves the unobserved $z$, and we therefore alternate between an expectation step and a maximisation step:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{Find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{\text{all } z} l(\theta|x,z)\, P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}
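To make the alternation concrete, here is a minimal NumPy/SciPy sketch (illustrative only; the initialisation is a crude guess, and the M-step lines use the standard weighted-average closed forms, which are exactly what the derivation below is working towards):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50):
    """Alternate E and M steps for a K-component Gaussian mixture."""
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    # Crude deterministic initialisation: spread the means along the first coordinate.
    order = np.argsort(X[:, 0])
    mu = X[order[np.linspace(0, n - 1, K).astype(int)]].astype(float)
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for _ in range(n_iter):
        # E step: W[i, j] = P(Z_i = j | x_i, theta), via Bayes' rule.
        dens = np.stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                         for j in range(K)], axis=1)
        W = dens / dens.sum(axis=1, keepdims=True)
        # M step: weighted maximum-likelihood updates for pi_j, mu_j, Sigma_j.
        Nj = W.sum(axis=0)
        pi = Nj / n
        mu = (W.T @ X) / Nj[:, None]
        Sigma = np.stack([((X - mu[j]).T * W[:, j]) @ (X - mu[j]) / Nj[j]
                          + 1e-6 * np.eye(d) for j in range(K)])
    return pi, mu, Sigma, W
```

The small ridge `1e-6 * np.eye(d)` only guards against singular covariance estimates; it is not part of the derivation.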

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} \log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

It is difficult to maximise $l(\theta|x)$ directly because each of the $n$ logarithms contains a sum over the components. We therefore use an EM procedure, which leaves us with sums outside the logarithm instead.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} &\\ & = W_{ij} &\\ \end{align*}
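As a numerical sanity check on this Bayes-rule expression for $W_{ij}$, it can be evaluated directly for one observation (all parameter values here are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up parameters for a K = 2 component mixture in d = 2 dimensions.
pi = np.array([0.3, 0.7])
mu = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
Sigma = [np.eye(2), np.array([[2.0, 0.3], [0.3, 1.0]])]
x = np.array([1.0, 0.5])  # one observation x_i

# Numerator for component j: pi_j * N(x_i; mu_j, Sigma_j);
# the denominator sums the same quantity over all components k.
num = np.array([pi[j] * multivariate_normal.pdf(x, mu[j], Sigma[j])
                for j in range(2)])
W_i = num / num.sum()  # W_{ij} for j = 1, 2; nonnegative and sums to 1 over j
```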

\begin{align*} & E:\ E_Z \left[l(\theta | X, Z) |X,\theta \right] &\\ & = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)\Big|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log \pi_j -\frac{1}{2} \log |\Sigma_j| -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) &\\ & \text{Setting $\pi_1=1-\sum_{j=2}^{K}\pi_j$,} &\\ & = \sum_{i=1}^{n}W_{i1} \left(\log \Big(1-\sum_{j=2}^{K}\pi_j\Big) -\frac{1}{2} \log|\Sigma_1| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) + &\\ & \quad \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log \pi_j -\frac{1}{2} \log |\Sigma_j| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) & \end{align*}

for $j=2,3,\dots,K$.

My question is: how do I maximize the last expression above with respect to $\mu_{j}$ and $\Sigma_{j}$?
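Concretely, the matrix derivatives I believe are needed (collected, for example, in the Matrix Cookbook; here $\Sigma_j$ is treated as unconstrained, ignoring symmetry correction terms) cover every term that depends on $\mu_j$ or $\Sigma_j$:

\begin{align*} \frac{\partial}{\partial \mu_j}\left[-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right] & = \Sigma_j^{-1}(x_i-\mu_j) &\\ \frac{\partial}{\partial \Sigma_j}\left[-\frac{1}{2}\log|\Sigma_j|\right] & = -\frac{1}{2}\Sigma_j^{-1} &\\ \frac{\partial}{\partial \Sigma_j}\left[-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right] & = \frac{1}{2}\Sigma_j^{-1}(x_i-\mu_j)(x_i-\mu_j)^{T}\Sigma_j^{-1} &\\ \end{align*}

Setting the $W_{ij}$-weighted sums of these derivatives to zero should then give the maximisers.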

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

To summarize my question: how can I take \begin{align} \sum_{i=1}^{n}W_{i1} \left(\log \Big(1-\sum_{j=2}^{K}\pi_j\Big) -\frac{1}{2} \log|\Sigma_1| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log \pi_j -\frac{1}{2} \log |\Sigma_j| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with respect to $\mu_{j}$ and $\Sigma_{j}$?

I have found a similar post, but it only covers differentiating with respect to $\Sigma_k$.

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background#

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_j)\right)} &\\ & = w_{ij} &\\ \end{align*}

\begin{align*} & E: E_Z \left[l(\theta | X_i, Z) |X,\theta \right] &\\ & E_Z \left[\sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)|X,\theta\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(log (\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log (2\pi) \left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)\right) &\\ & \text{set $\pi_1=1-\sum_{j=2}^{K}\pi_j$} &\\ & = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) \right) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) + &\\ & \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} (log(\pi_j)) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) & \end{align*}

for $j=2,3,\dots,K$.

My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu$ and $\Sigma$

I have found a similar post, but it was only with regards to differentiating $\Sigma_k$ .

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_j)\right)} &\\ & = w_{ij} &\\ \end{align*}

\begin{align*} & E: E_Z \left[l(\theta | X_i, Z) |X,\theta \right] &\\ & E_Z \left[\sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)|X,\theta\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(log (\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log (2\pi) \left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)\right) &\\ & \text{set $\pi_1=1-\sum_{j=2}^{K}\pi_j$} &\\ & = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) \right) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) + &\\ & \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} (log(\pi_j)) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) & \end{align*}

for $j=2,3,\dots,K$.

My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu$ and $\Sigma$

I have found a similar post, but it was only with regards to differentiating $\Sigma_k$ .

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align}\begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background#

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_j)\right)} &\\ & = w_{ij} &\\ \end{align*}

\begin{align*} & E: E_Z \left[l(\theta | X_i, Z) |X,\theta \right] &\\ & E_Z \left[\sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)|X,\theta\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(log (\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log (2\pi) \left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)\right) &\\ & \text{set $\pi_1=1-\sum_{j=2}^{K}\pi_j$} &\\ & = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) \right) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) + &\\ & \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} (log(\pi_j)) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) & \end{align*}

for $j=2,3,\dots,K$.

My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align}\begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu$ and $\Sigma$

I have found a similar post, but it was only with regards to differentiating $\Sigma_k$ .

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background#

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_j)\right)} &\\ & = w_{ij} &\\ \end{align*}

\begin{align*} & E: E_Z \left[l(\theta | X_i, Z) |X,\theta \right] &\\ & E_Z \left[\sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)|X,\theta\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(log (\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log (2\pi) \left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)\right) &\\ & \text{set $\pi_1=1-\sum_{j=2}^{K}\pi_j$} &\\ & = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) \right) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) + &\\ & \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} (log(\pi_j)) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) & \end{align*}

for $j=2,3,\dots,K$.

My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu$ and $\Sigma$

I have found a similar post, but it was only with regards to differentiating $\Sigma_k$ .

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background#

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i =j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} &\\ & = \frac{\frac{1}{|\Sigma_j|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_{j}^{-1}(x_i-\mu_j)\right)\pi_j}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \operatorname{exp} \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_j)\right)} &\\ & = w_{ij} &\\ \end{align*}

\begin{align*} & E: E_Z \left[l(\theta | X_i, Z) |X,\theta \right] &\\ & E_Z \left[\sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)|X,\theta\right] &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)|X,\theta\right) &\\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(log (\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log (2\pi) \left(-\frac{1}{2}(x_i-\mu_i)^{T}\Sigma_j^{-1}(x_i-\mu_i)\right)\right) &\\ & \text{set $\pi_1=1-\sum_{j=2}^{K}\pi_j$} &\\ & = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) \right) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) + &\\ & \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} (log(\pi_j)) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) & \end{align*}

for $j=2,3,\dots,K$.

My question is how do I maximize the last part above with respect to $\mu_{j}$ and $\Sigma_{j}$.

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_1|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu$ and $\Sigma$

I have found a similar post, but it was only with regards to differentiating $\Sigma_k$ .

Summary

So to summarize my question, how can I take \begin{align} = \sum_{i=1}^{n}W_{i1} \left(log (1-\sum_{j=2}^{K}\pi_j) -\frac{1}{2} log(|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j) \right)+ \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( log(\pi_j) -\frac{1}{2} log (|\Sigma_j|) -\frac{d}{2} log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with regards to $\mu_{j}$ and $\Sigma_{j}$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background#

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $\theta$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do a gradient descent since $l(\theta|x,z)$ and therefore have to do an alternating expectation maximisation:

  1. set $\theta_0$
  2. Alternate between:

\begin{align*} & E :\text{To find an expression for} &\\ & E_z\left[l(\theta|X,Z)|X,\theta\right] &\\ & = \sum_{all Z} l(\theta|x,z) P(Z=z|x,\theta) \end{align*}

\begin{align*} & M :\text{Maximise over $\theta$} &\\ & E_z \left[l (\theta|X,Z)| X,\theta \right] &\\ \end{align*}

We want to maximise the log-likelihood:
$l(\theta|x)$

Problem: Difficult to maximise it directly.

\begin{align*} \theta & = \left\{\pi_1,\dots,\pi_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k \right\} & \\ l(\theta|x) & = \sum_{i=1}^{n} log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_i)\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) &\\ \end{align*}

Difficult to maximise $l(\theta|x)$ because we have $n$ sum inside a log so we are trying to perform an EM procedure, so we end up with $n$ sum outside a log.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align*} l(\theta|X,Z) & = \sum_{i=1}^{n} log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_{Z_i})\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align*}

\begin{align*} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i=j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} \\ & = \frac{\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_{j}^{-1}(x_i-\mu_j)\right)}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} \\ & = w_{ij} \end{align*}
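This Bayes-rule step is the E-step computation in practice: evaluate each component's weighted density at $x_i$ and normalise over components. A direct transcription (the function name `responsibilities` is my own):

```python
import numpy as np

def responsibilities(X, pis, mus, Sigmas):
    """E-step: W[i, j] = w_ij = P(Z_i = j | x_i, theta)."""
    n, d = X.shape
    K = len(pis)
    W = np.empty((n, K))
    for i in range(n):
        for j in range(K):
            diff = X[i] - mus[j]
            inv = np.linalg.inv(Sigmas[j])
            det = np.linalg.det(Sigmas[j])
            # numerator: pi_j * N(x_i; mu_j, Sigma_j)
            W[i, j] = pis[j] * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(det * (2 * np.pi) ** d)
        W[i] /= W[i].sum()  # denominator: normalise so each row sums to 1
    return W
```

By construction each row of $W$ sums to 1, and a point equidistant from two identical components gets $w_{ij} = 1/2$ for each.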

\begin{align*} E:\ & E_Z \left[l(\theta | X, Z) \,\Big|\, X,\theta \right] \\ & = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right) \,\Big|\, X,\theta\right] \\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) \\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log (\pi_j) -\frac{1}{2} \log|\Sigma_j| -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) \\ & \text{setting } \pi_1=1-\sum_{j=2}^{K}\pi_j, \\ & = \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log|\Sigma_1| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) \\ & \quad + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log(\pi_j) -\frac{1}{2} \log|\Sigma_j| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align*}

for $j=2,3,\dots,K$.

My question is: how do I maximize the last expression above with respect to $\mu_{j}$ and $\Sigma_{j}$?

\begin{align*} & M: \text{maximise over } \theta \\ & E_z \left[l(\theta|X,Z)\mid X,\theta\right] \end{align*}

Summary

So to summarize my question: how can I take \begin{align} \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log|\Sigma_1| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right) + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log|\Sigma_j| -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align} and maximize it with respect to $\mu_{j}$ and $\Sigma_{j}$?

I have found a similar post, but it only addresses differentiating with respect to $\Sigma_k$.
