Questions tagged [expectation-maximization]
An optimization algorithm often used for maximum-likelihood estimation in the presence of missing data.
604 questions
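As context for the questions below, the E-step/M-step loop named in the tag description can be sketched for a two-component 1-D Gaussian mixture. This is a minimal illustration; the synthetic data, starting values, and iteration count are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: mixture of two 1-D Gaussians (illustrative choice)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

w = np.array([0.5, 0.5])      # mixing weights
mu = np.array([-1.0, 1.0])    # component means (rough initial guesses)
var = np.array([1.0, 1.0])    # component variances

for _ in range(100):
    # E-step: responsibility of each component for each point
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from responsibility-weighted data
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Each iteration is guaranteed not to decrease the observed-data log-likelihood, which is why the loop converges to a (possibly local) maximum.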
1 vote
1 answer
52 views
Extending mixture distribution (EM algorithm) for analysis
I have data from a survey. This data includes a key metric of interest (Y), which has an interesting distribution: a clear peak at 0, and a generally right-skewed distribution. I have modeled the ...
2 votes
1 answer
72 views
Substituting per-trial missing data for all missing data in the two-coin expectation-maximization example
A common example that I have found for explaining expectation-maximization is the example of two biased coins. The problem statement is: You have two biased coins, which you select with equal ...
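For readers new to this setup: the two-coin example (popularized in Do and Batzoglou's EM primer) estimates both biases without knowing which coin produced each set of tosses. A minimal sketch, using the primer's illustrative toss counts and starting values rather than data from this question:

```python
import numpy as np

heads = np.array([5, 9, 8, 4, 7])  # heads in each of 5 sets of 10 tosses
n = 10
tA, tB = 0.6, 0.5                  # initial guesses for the two biases

for _ in range(50):
    # E-step: posterior probability each set came from coin A
    # (equal prior on the coins; binomial coefficients cancel in the ratio)
    lA = tA ** heads * (1 - tA) ** (n - heads)
    lB = tB ** heads * (1 - tB) ** (n - heads)
    pA = lA / (lA + lB)
    # M-step: re-estimate each bias from its expected heads and expected tosses
    tA = (pA * heads).sum() / (pA * n).sum()
    tB = ((1 - pA) * heads).sum() / ((1 - pA) * n).sum()
```

With these numbers the estimates settle near 0.80 and 0.52, matching the primer's reported result.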
2 votes
0 answers
96 views
Parameter estimation using a GMM with negative weights
I am quite new to ML methods such as GMMs but I have a problem at hand which requires me to estimate the covariance matrices of Gaussians such that the datapoints are drawn from a weighted sum of ...
0 votes
1 answer
52 views
Why do overfitted models in finite mixture regression sometimes have the smallest BIC despite the true number of components being selected frequently?
I'm learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
1 vote
0 answers
108 views
Concrete example of degenerate expectation maximization - why does it differ from my intuition?
Inlined at the bottom is the code of a MATLAB simulation I wrote. This code very simply runs expectation maximization for three Gaussians and, as set down, is supposed to illustrate the degeneracy ...
0 votes
0 answers
44 views
Inferring the parameters of an Ornstein-Uhlenbeck process when realisations of it cannot be observed directly
I have a dataset of (noisy) test results. You can think of this as being accuracy that a chess player achieves in various games or number of points a basketballer scores in a game. I think a good ...
0 votes
0 answers
50 views
Understanding this penalised complete-data log-likelihood
I'm reading a paper on applying a SLOPE model to a Bayesian spike-and-slab framework [link]. The SLOPE penalty is given by $$ \text{pen}(\lambda) = \sigma \sum_{j=1}^{p} \lambda_{r(\beta,j)} |\beta_j|,...
0 votes
1 answer
65 views
Consistency of classical estimation of hyper-parameters from multiple samples
Let $\sigma>0$. Suppose we observe $N$ samples of sizes $T_1,\dots,T_N$ that are each generated by the following data generating process: $\theta_n$ is drawn from the distribution $\mathrm{N}(0,\...
1 vote
0 answers
98 views
Estimating state space model with exogenous states
I am trying to use the dlm package in R to estimate a state space model, where the observation and state equations are as follows: $y_t=\beta_1a_t + B_t\beta_2(\frac{u_t-v_t}{u_t}) + C_t\beta_3(\frac{u_t-...
1 vote
1 answer
91 views
How do we guarantee the result is the global minimizer when dealing with a function related to the MLE of factor analysis?
I am reading the paper Some contributions to maximum likelihood factor analysis. Consider the factor analysis model $$y=\Lambda x+z, $$ where $y$ is a vector containing $p$ features and $x$ is the ...
4 votes
1 answer
128 views
On self-consistent estimators of survival functions. How do they work?
I'm reading about how the estimation of the survival function is "self-consistent" because as Efron showed, we can estimate the survival function in the presence of right censoring as $$ \...
0 votes
0 answers
73 views
On a non-standard application of Kalman filter
These questions arose when I was reading Online Appendix D for the paper Missing Events in Event Studies: Identifying the Effects of Partially-Measured News Surprises by R.S. Gurkaynak, B. Kisacikoglu ...
1 vote
0 answers
40 views
How to perform user assisted image segmentation using Gaussian Mixture Models?
I have a general idea of Gaussian Mixture Models. My understanding: a GMM is a way of clustering data points which, unlike K-means clustering, soft-assigns them to different distributions by ...
3 votes
1 answer
125 views
In Expectation-Maximization, in the maximization step, do we maximize the expectation of the log-likelihood (Wikipedia) or the evidence lower bound (CS 229)?
From CS 229, page 6: Intuitively, the EM algorithm alternately updates Q and θ by a) setting Q(z) = p(z|x; θ) following Equation (8) so that ELBO(x; Q, θ) = log p(x; θ) for x and the current θ, and ...
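The two formulations lead to the same update. Once the E-step sets $Q(z) = p(z \mid x; \theta_t)$, the ELBO and the expected complete-data log-likelihood differ only by the entropy of $Q$, which does not depend on $\theta$:

$$
\mathrm{ELBO}(x; Q, \theta) = \mathbb{E}_{z \sim Q}\big[\log p(x, z; \theta)\big] - \mathbb{E}_{z \sim Q}\big[\log Q(z)\big],
$$

so maximizing either quantity over $\theta$ yields the same M-step.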
2 votes
2 answers
191 views
Why does Variational Inference work?
The ELBO is a lower bound, and only matches the true likelihood when the q-distribution/encoder we choose equals the true posterior distribution. Are there any guarantees that maximizing the ELBO indeed ...
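One standard way to see why: for any $q$, the log-likelihood decomposes as

$$
\log p(x; \theta) = \mathrm{ELBO}(q, \theta) + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x; \theta)\big) \ \ge\ \mathrm{ELBO}(q, \theta),
$$

so maximizing the ELBO over $q$ tightens the bound (driving the KL term toward zero), while maximizing it over $\theta$ pushes up a quantity that never exceeds the true log-likelihood.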