Background
I have been trying to understand Stanford CS 229’s lecture about Factor Analysis and the accompanying lecture notes. The lecturer introduced Factor Analysis as a way to mitigate the singular covariance matrix problem that arises when trying to model certain datasets using a Gaussian distribution.
As I understood it, the issue arises with datasets that exhibit perfect multicollinearity. The specific example the lecturer gave was having the number of training examples no larger than the number of dimensions ($m \le n$): fitting a Gaussian to such a dataset by Maximum Likelihood Estimation yields a sample covariance matrix of rank at most $m$, which is therefore singular.
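To make that concrete, here is a minimal numerical sketch (my own illustration with made-up data, not from the lecture notes) of the degenerate case $m \le n$:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 5, 10                          # fewer training examples than dimensions (m <= n)
X = rng.normal(size=(m, n))           # hypothetical dataset, one example per row

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m     # MLE covariance: (1/m) * sum_i (x_i - mu)(x_i - mu)^T

print(np.linalg.matrix_rank(Sigma))   # at most m - 1 = 4, far below n = 10
print(np.linalg.det(Sigma))           # ~0: Sigma is singular, so N(mu, Sigma) is degenerate
```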
Question
Factor Analysis does the following:
- Introduce a vector $z$ of $k$ independent latent variables, where $k < n$.
- Model the observed vector $x$ as a linear transformation of $z$, shifted by a mean $\mu$, plus axis-aligned Gaussian noise $\epsilon$ (i.e. $\Psi$ is diagonal).
$$z \sim \mathcal{N}(0, I)$$ $$\epsilon \sim \mathcal{N}(0, \Psi)$$ $$x = \mu + \Lambda z + \epsilon$$
The marginal distribution of $x$ ends up being the following: $$x \sim \mathcal{N}(\mu, \Lambda \Lambda^\mathsf{T} + \Psi)$$
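For what it's worth, here is a small numerical sketch (my own, with made-up $\Lambda$ and $\Psi$ rather than EM output) suggesting that as long as every diagonal entry of $\Psi$ is strictly positive, $\Lambda \Lambda^\mathsf{T} + \Psi$ is positive definite and hence invertible; my question is essentially whether EM guarantees that this condition holds.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 10, 3                              # observed and latent dimensions, k < n
Lambda = rng.normal(size=(n, k))          # made-up factor loading matrix
Psi = np.diag(rng.uniform(0.1, 1.0, n))   # diagonal noise covariance, entries > 0
mu = np.zeros(n)

# Marginal covariance of x under the factor analysis model
Sigma = Lambda @ Lambda.T + Psi

# Lambda @ Lambda.T is only positive semi-definite (rank <= k), but adding a
# diagonal Psi with strictly positive entries makes Sigma positive definite.
print(np.linalg.eigvalsh(Sigma).min())    # > 0, so Sigma is invertible
np.linalg.cholesky(Sigma)                 # raises LinAlgError unless Sigma is PD

# One draw from the generative model: z ~ N(0, I), eps ~ N(0, Psi)
z = rng.standard_normal(k)
eps = rng.multivariate_normal(np.zeros(n), Psi)
x = mu + Lambda @ z + eps
```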
After optimizing the parameters $\mu$, $\Lambda$, and $\Psi$ using the Expectation Maximization algorithm, is it guaranteed that the covariance matrix $\Lambda \Lambda^\mathsf{T} + \Psi$ is invertible? If yes, is it unconditionally true or only true under certain conditions (e.g. $m > k$)? If possible, please explain both the conceptual reason and the mathematical reason.
