I am implementing an HMM (Hidden Markov Model) for time series data with continuous observations, in order to infer the hidden state of the model at each time period. To maximize the likelihood of the model (i.e., to estimate, given the observed data, the probability of the hidden state at each data point) I use an expectation-maximization algorithm, which for HMMs is the Baum-Welch algorithm.
The problem is that in the multi-dimensional case (the observation at each time is a vector), the likelihood does not increase after each iteration. In the one-dimensional case, where each observation is a scalar and the mean and sigma are also scalars, I used the CDF of the Gaussian distribution over an $\epsilon$-vicinity of the data point, where $\epsilon$ is a very small positive number. There the algorithm works, and the likelihood increases at each step.
In my model I assume that, conditional on the state, each observation is sampled either from an autoregressive process of order 1 or from a Gaussian white noise distribution, depending on which state the model is in.
The steps of the algorithm are as follows:
Initialization of the emission parameters, the transition matrix and the initial state vector $\pi$,
Calculation of the forward and backward probabilities:
- Initialization $\alpha_i(1) = \pi_i \cdot p_i(\text{o}_1)$
- Induction $\alpha_i(t+1) = \big[ \sum_{j=1}^N \alpha_j(t)a_{ji} \big]p_i(\text{o}_{t+1})$
- Initialization $\beta_i(T)=1$
- Induction $\beta_i(t) = \sum_{j=1}^Na_{ij}p_j(\text{o}_{t+1})\beta_j(t+1) $
Using the forward and backward probabilities, the state probabilities are calculated as:
$$\gamma_i(t)= {{\alpha_i(t) \beta_i(t)} \over { \sum_{j=1}^N \alpha_j(t) \beta_j(t)}}$$
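In code, this E-step looks roughly like the following numpy sketch (the array names `pi`, `A`, `B` and the per-step scaling are my own choices, not part of the formulas above; `B` holds the precomputed emission probabilities):

```python
import numpy as np

def forward_backward(pi, A, B):
    """Forward-backward pass with per-step scaling to avoid underflow.

    pi : (N,)   initial state probabilities
    A  : (N, N) transition matrix, A[i, j] = a_ij
    B  : (T, N) emission probabilities, B[t, i] = p_i(o_t)
    """
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)

    # Forward initialization and induction
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]   # sum_j alpha_j(t-1) a_ji, times p_i(o_t)
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward initialization and induction (reusing the same scale factors)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    # State posteriors gamma_i(t), normalized per time step
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Log-likelihood of the observations under the current parameters
    loglik = np.log(scale).sum()
    return alpha, beta, gamma, loglik
```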
- Re-estimation of the model parameters, starting from the probability $\xi_t(i,j)$ of being in state $i$ at time $t$ and state $j$ at time $t+1$ (a code sketch of this M-step follows the formulas below):
$$\xi_t(i,j)= {{\alpha_i(t)\, a_{ij}\, p_j (o_{t+1})\, \beta_j(t+1)} \over {\sum_{k=1}^N \sum_{l=1}^N \alpha_k(t)\, a_{kl}\, p_l (o_{t+1})\, \beta_l (t+1)}}$$
- Updating the transition probabilities by:
$$\hat{a}_{ij} = {{\sum_{t=1}^{T-1} \xi_t(i,j)} \over { \sum_{t=1}^{T-1} \sum_{l=1}^N \xi_t(i,l)}}$$
- Updating the mean vectors and covariance matrices of the Gaussian emissions by:
$$\hat{\mu}_{i} = {{\sum_{t=1}^T \gamma_{i}(t) \cdot o_t} \over { \sum_{t=1}^T \gamma_{i}(t)}}$$
$$\hat{\Sigma}_{i} = {{\sum_{t=1}^T \gamma_{i}(t) \cdot (o_t - \hat{\mu}_{i})(o_t - \hat{\mu}_{i})^T } \over { \sum_{t=1}^T \gamma_{i}(t)}}$$
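A sketch of this M-step, with my own array names matching the E-step snippet above (the small jitter added to the covariance is a safeguard I use, not part of the formulas):

```python
import numpy as np

def m_step(alpha, beta, gamma, A, B, obs):
    """Re-estimate A and the Gaussian parameters from the E-step output.

    alpha, beta, gamma : (T, N); A : (N, N); B : (T, N); obs : (T, D)
    """
    T, N = B.shape
    D = obs.shape[1]

    # Transition update: accumulate xi_t(i, j) over t = 1..T-1
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        # numerator of xi_t(i, j): alpha_i(t) a_ij p_j(o_{t+1}) beta_j(t+1)
        num = alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :]
        xi_sum += num / num.sum()           # denominator normalizes over all (i, j)
    A_new = xi_sum / xi_sum.sum(axis=1, keepdims=True)

    # Gaussian updates: gamma-weighted means and covariances
    weights = gamma.sum(axis=0)                       # (N,)
    means = (gamma.T @ obs) / weights[:, None]        # (N, D)
    covs = np.zeros((N, D, D))
    for i in range(N):
        diff = obs - means[i]                         # (T, D)
        covs[i] = (gamma[:, i, None] * diff).T @ diff / weights[i]
        covs[i] += 1e-6 * np.eye(D)   # jitter to keep Sigma positive definite
    return A_new, means, covs
```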
I then re-calculated the state probabilities using the updated model parameters, re-estimated the parameters with the new probabilities, and repeated these steps until the likelihood converged (i.e., until the improvement of the likelihood from one iteration to the next was negligible).
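Putting the pieces together, my outer loop is essentially the following (`emission_matrix` is a hypothetical helper shown at the end of the post; `tol` and `max_iter` are my own stopping parameters):

```python
import numpy as np

def baum_welch(obs, pi, A, means, covs, tol=1e-6, max_iter=200):
    """Iterate E-step / M-step until the log-likelihood stops improving."""
    prev_loglik = -np.inf
    for _ in range(max_iter):
        B = emission_matrix(obs, means, covs)             # (T, N)
        alpha, beta, gamma, loglik = forward_backward(pi, A, B)
        pi = gamma[0]                                     # update initial distribution
        A, means, covs = m_step(alpha, beta, gamma, A, B, obs)
        if loglik - prev_loglik < tol:                    # EM should never decrease this
            break
        prev_loglik = loglik
    return pi, A, means, covs, loglik
```

This is where I observe the problem: in the multi-dimensional case, `loglik` does not increase monotonically.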
For the one-dimensional case, I used scipy.stats.norm to compute the emission probabilities.
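Concretely, something like this per state and time point (`mean`, `sigma`, `obs` and `epsilon` are all scalars here):

```python
from scipy.stats import norm

# probability mass in an epsilon-vicinity of the scalar observation
prob_t = norm.cdf(obs + epsilon, loc=mean, scale=sigma) \
         - norm.cdf(obs - epsilon, loc=mean, scale=sigma)
```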
For multi-dimensional data, I used scipy.stats.multivariate_normal with a mean vector and a covariance matrix.
Here is the calculation of the emission probabilities:

```python
from scipy.stats import multivariate_normal as mvn

distr = mvn(mean=mean, cov=covariance)
prob_t = abs(distr.cdf(obs + epsilon) - distr.cdf(obs - epsilon))
```

where epsilon is an arbitrary very small number, obs is my data point (a vector) at time t, and prob_t is the conditional probability of that observation given a particular state.
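For completeness, the hypothetical `emission_matrix` helper referenced in the loop sketch above just applies this calculation for every state and time step:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def emission_matrix(obs, means, covs, epsilon=1e-5):
    """Build B[t, i] = p_i(o_t) via the cdf-difference above (my own helper).

    obs : (T, D); means : (N, D); covs : (N, D, D)
    """
    T, N = obs.shape[0], len(means)
    B = np.zeros((T, N))
    for i in range(N):
        distr = mvn(mean=means[i], cov=covs[i])
        for t in range(T):
            B[t, i] = abs(distr.cdf(obs[t] + epsilon) - distr.cdf(obs[t] - epsilon))
    return B
```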
Any idea why my model does not work for the multi-dimensional case?