
Background

I have been trying to understand Stanford CS 229’s lecture about Factor Analysis and the accompanying lecture notes. The lecturer introduced Factor Analysis as a way to mitigate the singular covariance matrix problem that arises when trying to model certain datasets using a Gaussian distribution.

As I understood it, the issue arises with datasets that exhibit Perfect Multicollinearity. The specific example the lecturer gave was when the number of training examples is less than or equal to the number of dimensions ($m \le n$): fitting a Gaussian to such a dataset via Maximum Likelihood Estimation yields a singular covariance matrix.
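
To make this concrete, here is a minimal numpy sketch (my own, not from the lecture notes; the variable names are arbitrary) showing that the MLE covariance estimate has rank at most $m - 1$ and is therefore singular whenever $m \le n$:

```python
# Minimal sketch (not from CS229): the MLE covariance is singular when m <= n.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 10                          # fewer training examples than dimensions
X = rng.normal(size=(m, n))           # m examples, each n-dimensional

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m     # MLE covariance estimate, shape (n, n)

# The centered rows span at most m - 1 directions, so rank(Sigma) <= m - 1 < n.
print(np.linalg.matrix_rank(Sigma))   # 4
print(np.linalg.det(Sigma))           # effectively 0 -> Sigma is not invertible
```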

Question

Factor Analysis does the following:

  1. Introduce $k$ latent independent variables, denoted as the vector $z$, where $k < n$.
  2. Model the observed variables $x$ as a linear combination of $z$ plus axis-aligned Gaussian noise $\epsilon$.

$$z \sim \mathcal{N}(0, I)$$ $$\epsilon \sim \mathcal{N}(0, \Psi)$$ $$x = \mu + \Lambda z + \epsilon$$

The marginal distribution of $x$ ends up being the following: $$x \sim \mathcal{N}(\mu, \Lambda \Lambda^\mathsf{T} + \Psi)$$
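
For concreteness, here is a small sketch of this generative model (my own code and variable names, not from the course materials); it samples $x = \mu + \Lambda z + \epsilon$ and forms the implied covariance $\Lambda \Lambda^\mathsf{T} + \Psi$:

```python
# Sketch of the factor analysis generative model (my own notation, not CS229 code).
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3                                   # observed and latent dimensions

mu = rng.normal(size=n)                        # mean of x
Lambda = rng.normal(size=(n, k))               # factor loading matrix (n x k)
Psi = np.diag(rng.uniform(0.1, 1.0, size=n))   # diagonal (axis-aligned) noise covariance

def sample_x(m):
    """Draw m samples of x = mu + Lambda z + eps."""
    z = rng.normal(size=(m, k))                               # z ~ N(0, I)
    eps = rng.multivariate_normal(np.zeros(n), Psi, size=m)   # eps ~ N(0, Psi)
    return mu + z @ Lambda.T + eps

# Marginal covariance of x: Lambda Lambda^T + Psi.
Sigma_x = Lambda @ Lambda.T + Psi
print(np.linalg.cond(Sigma_x))                 # condition number of the marginal covariance
```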

After optimizing the parameters $\mu$, $\Lambda$, and $\Psi$ using the Expectation Maximization algorithm, is it guaranteed that the covariance matrix $\Lambda \Lambda^\mathsf{T} + \Psi$ is invertible? If yes, is it unconditionally true or only true under certain conditions (e.g. $m > k$)? If possible, please explain both the conceptual reason and the mathematical reason.

  • In your last paragraph you have $m \gt k$ but I think you mean $n$. Commented Apr 9, 2024 at 9:17
  • @PeterFlom I actually did mean $m \gt k$. The original issue happened when $m \le n$. Since Factor Analysis is a dimensionality reduction technique, I was wondering if the original issue would still happen when $m \le k$, where $k$ is the new number of dimensions. Commented Apr 9, 2024 at 20:56
  • OK, but if so, could you define $m$ in your question please? Commented Apr 10, 2024 at 11:16
  • $m$ is the number of training examples. Commented Apr 11, 2024 at 1:52

2 Answers


I can't give you the math, but

  1. That's an odd way to introduce factor analysis (FA)! And I might prefer principal component analysis (PCA), depending on the circumstances. FA is a way of uncovering latent variables. PCA is a way of reducing the dimensionality while losing as little information as possible.

  2. Factor analysis can involve orthogonal or oblique rotations. The former are (I think) more commonly used and, well, orthogonal is pretty much the opposite of collinear.

  • 1. The subsequent lecture made it a little clearer: youtube.com/watch?v=I_c6w1SJSJs&t=3198s. For probability density estimation, one would use something like Mixture of Gaussians or Factor Analysis. When you don’t care about estimating the probability, one could use something like K-Means Clustering or Principal Component Analysis. Commented Apr 13, 2024 at 5:48
  • 2. Not sure I understood this comment. Is this something that is done in machine learning? (That’s my focus here; sorry, I should have made that clear.) Commented Apr 13, 2024 at 5:51
  • Your #1 doesn't really make sense to me. Clustering is not really an alternative to factor analysis. #2: It is something in factor analysis. Whether you use FA in machine learning or anything else, you have to know about rotations. (This is another reason why this is a strange way to introduce factor analysis.) Commented Apr 13, 2024 at 7:32

Factor Analysis does not completely mitigate the singular covariance matrix problem!

I created a Jupyter notebook to gather some empirical evidence. The notebook uses Factor Analysis to generate a model and then finds the minimum training set size needed to avoid the singular covariance matrix problem. Here are the results:

[Figure: relationship between the number of factors and the minimum training set size needed to avoid singularity]

As you can see, the training set size still needs to be greater than the number of factors to avoid the singular covariance matrix problem.

I don’t know the mathematical reason, but I think it makes sense conceptually: Factor Analysis reduces the dimensionality to $k$ factors, so the training set still needs more than $k$ examples to avoid Perfect Multicollinearity. That said, Factor Analysis is still useful for mitigating the singular covariance matrix problem on small training sets, because one can choose $k$ to be smaller than the training set size.
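
For reference, here is a rough sketch of this kind of check using scikit-learn's `FactorAnalysis` (a simplified reconstruction for illustration, not the actual notebook; the data is just random Gaussian noise and the choices $n = 20$, $k = 5$ are arbitrary): fit a model with $k$ factors on $m$ examples and inspect how well conditioned the fitted covariance $\Lambda \Lambda^\mathsf{T} + \Psi$ is as $m$ grows.

```python
# Rough sketch of this kind of check using scikit-learn's FactorAnalysis
# (a simplified reconstruction for illustration, not the original notebook).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 20                                       # observed dimension
k = 5                                        # number of factors

for m in (3, 5, 6, 10, 50):                  # training set sizes to try
    X = rng.normal(size=(m, n))              # placeholder data: random Gaussian noise
    fa = FactorAnalysis(n_components=k).fit(X)
    Sigma = fa.get_covariance()              # Lambda Lambda^T + Psi from the fit
    print(f"m = {m:2d}, cond(Sigma) = {np.linalg.cond(Sigma):.3e}")
```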
