
Background

I have been trying to understand Stanford CS 229’s lecture about Factor Analysis and the accompanying lecture notes. The lecturer introduced Factor Analysis as a way to mitigate the singular covariance matrix problem that arises when trying to model certain datasets using a Gaussian distribution.

As I understood it, the issue arises with datasets that exhibit Perfect Multicollinearity. The specific example the lecturer gave was when the number of training examples is less than or equal to the number of dimensions ($m \le n$): fitting a Gaussian to such a dataset via Maximum Likelihood Estimation yields a singular covariance matrix.
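
To make this concrete, here is a minimal numpy sketch (my own, not from the lecture notes; the variable names are arbitrary) showing that the MLE covariance estimate has rank at most $m - 1$ and is therefore singular whenever $m \le n$:

```python
# Minimal sketch (not from CS229): the MLE covariance is singular when m <= n.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 10                          # fewer training examples than dimensions
X = rng.normal(size=(m, n))           # m examples, each n-dimensional

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m     # MLE covariance estimate, shape (n, n)

# The centered rows span at most m - 1 directions, so rank(Sigma) <= m - 1 < n.
print(np.linalg.matrix_rank(Sigma))   # 4
print(np.linalg.det(Sigma))           # effectively 0 -> Sigma is not invertible
```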

Question

Factor Analysis does the following:

  1. Introduce $k$ latent independent variables, denoted as the vector $z$, where $k < n$.
  2. Model the observed variables $x$ as a linear combination of $z$ plus axis-aligned Gaussian noise $\epsilon$.

$$z \sim \mathcal{N}(0, I)$$ $$\epsilon \sim \mathcal{N}(0, \Psi)$$ $$x = \mu + \Lambda z + \epsilon$$

The marginal distribution of $x$ ends up being the following: $$x \sim \mathcal{N}(\mu, \Lambda \Lambda^\mathsf{T} + \Psi)$$
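
For concreteness, here is a small sketch of this generative model (my own code and variable names, not from the course materials); it samples $x = \mu + \Lambda z + \epsilon$ and forms the implied covariance $\Lambda \Lambda^\mathsf{T} + \Psi$:

```python
# Sketch of the factor analysis generative model (my own notation, not CS229 code).
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3                                   # observed and latent dimensions

mu = rng.normal(size=n)                        # mean of x
Lambda = rng.normal(size=(n, k))               # factor loading matrix (n x k)
Psi = np.diag(rng.uniform(0.1, 1.0, size=n))   # diagonal (axis-aligned) noise covariance

def sample_x(m):
    """Draw m samples of x = mu + Lambda z + eps."""
    z = rng.normal(size=(m, k))                               # z ~ N(0, I)
    eps = rng.multivariate_normal(np.zeros(n), Psi, size=m)   # eps ~ N(0, Psi)
    return mu + z @ Lambda.T + eps

# Marginal covariance of x: Lambda Lambda^T + Psi.
Sigma_x = Lambda @ Lambda.T + Psi
print(np.linalg.cond(Sigma_x))                 # condition number of the marginal covariance
```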

After optimizing the parameters $\mu$, $\Lambda$, and $\Psi$ using the Expectation Maximization algorithm, is it guaranteed that the covariance matrix $\Lambda \Lambda^\mathsf{T} + \Psi$ is invertible? If yes, is it unconditionally true or only true under certain conditions (e.g. $m > k$)? If possible, please explain both the conceptual reason and the mathematical reason.

  • In your last paragraph you have $m \gt k$ but I think you mean $n$. Commented Apr 9, 2024 at 9:17
  • @PeterFlom I actually did mean $m \gt k$. The original issue happened when $m \le n$. Since Factor Analysis is a dimensionality reduction technique, I was wondering if the original issue would still happen when $m \le k$, where $k$ is the new number of dimensions. Commented Apr 9, 2024 at 20:56
  • OK, but if so, could you define $m$ in your question please? Commented Apr 10, 2024 at 11:16
  • $m$ is the number of training examples. Commented Apr 11, 2024 at 1:52

2 Answers


I can't give you the math, but

  1. That's an odd way to introduce factor analysis (FA)! And I might prefer principal component analysis (PCA), depending on the circumstances. FA is a way of uncovering latent variables. PCA is a way of reducing the dimensionality while losing as little information as possible.

  2. Factor analysis can involve orthogonal or oblique rotations. The former are (I think) more commonly used and, well, orthogonal is pretty much the opposite of collinear.

  • 1. The subsequent lecture made it a little clearer: youtube.com/watch?v=I_c6w1SJSJs&t=3198s. For probability density estimation, one would use something like Mixture of Gaussians or Factor Analysis. When you don’t care about estimating the probability, one could use something like K-Means Clustering or Principal Component Analysis. Commented Apr 13, 2024 at 5:48
  • 2. Not sure I understood this comment. Is this something that is done in machine learning? (That’s my focus here; sorry, I should have made that clear.) Commented Apr 13, 2024 at 5:51
  • Your #1 doesn't really make sense to me. Clustering is not really an alternative to factor analysis. #2: It is something in factor analysis. Whether you use FA in machine learning or anything else, you have to know about rotations. (This is another reason why this is a strange way to introduce factor analysis.) Commented Apr 13, 2024 at 7:32

Factor Analysis does not completely mitigate the singular covariance matrix problem!

I created a Jupyter notebook to gather some empirical evidence. The notebook uses Factor Analysis to generate a model and then finds the minimum training set size needed to avoid the singular covariance matrix problem. Here are the results:

[Figure: relationship between the number of factors and the minimum training set size needed to avoid singularity]

As you can see, the training set size still needs to be greater than the number of factors to avoid the singular covariance matrix problem.

I don’t know the mathematical reason, but I think it makes sense conceptually: Factor Analysis reduces the dimensionality to $k$ factors, so the training set still needs more than $k$ examples to avoid Perfect Multicollinearity. That said, Factor Analysis is still useful for mitigating the singular covariance matrix problem on small training sets, because one can choose $k$ to be smaller than the training set size.
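
For reference, here is a rough sketch of this kind of check using scikit-learn's `FactorAnalysis` (a simplified reconstruction for illustration, not the actual notebook; the data is just random Gaussian noise and the choices $n = 20$, $k = 5$ are arbitrary): fit a model with $k$ factors on $m$ examples and inspect how well conditioned the fitted covariance $\Lambda \Lambda^\mathsf{T} + \Psi$ is as $m$ grows.

```python
# Rough sketch of this kind of check using scikit-learn's FactorAnalysis
# (a simplified reconstruction for illustration, not the original notebook).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 20                                       # observed dimension
k = 5                                        # number of factors

for m in (3, 5, 6, 10, 50):                  # training set sizes to try
    X = rng.normal(size=(m, n))              # placeholder data: random Gaussian noise
    fa = FactorAnalysis(n_components=k).fit(X)
    Sigma = fa.get_covariance()              # Lambda Lambda^T + Psi from the fit
    print(f"m = {m:2d}, cond(Sigma) = {np.linalg.cond(Sigma):.3e}")
```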
