
I'm new to PCA and I'm trying to apply it to a dataset with 15 features. I normalized the dataset before applying PCA and used the PCA class from sklearn.decomposition. I was hoping that a few principal components (probably fewer than 10) would explain 90% of the variance of the data matrix. But the cumulative explained variance I got is

 [0.21 0.323 0.413 0.486 0.555 0.619 0.681 0.74 0.794 0.844 0.89 0.922 0.953 0.981 0.9998] 

which means that I need at least 12 PCs to explain 90% of the variance and all 15 PCs to explain 100%. So it seems that PCA does not reduce the dimensionality of my dataset. Does this mean that my 15 features are not redundant, i.e. there is no redundancy in the dataset and it's better not to eliminate any of the 15 features I currently have?
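For reference, here is a minimal sketch of the workflow I described, with placeholder data standing in for my real matrix (I'm assuming standardization via StandardScaler here, but any normalization step would play the same role):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Placeholder for the real (n_samples, 15) feature matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))

# Normalize the features (standardization assumed here) before PCA
X_scaled = StandardScaler().fit_transform(X)

# Keep all 15 components and inspect the cumulative explained variance
pca = PCA(n_components=15).fit(X_scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)
print(cum_var)

# Smallest number of components reaching 90% of the variance
print(np.argmax(cum_var >= 0.90) + 1)
```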

Below is my feature correlation heatmap:

[feature correlation heatmap image]
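A heatmap like this can be generated along these lines (a sketch with stand-in data, not my exact code):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in DataFrame with 15 numeric features (replace with the real data)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 15)),
                  columns=[f"feature_{i}" for i in range(15)])

# Pairwise Pearson correlations between the features
corr = df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", annot=True, fmt=".2f")
plt.title("Feature correlation heatmap")
plt.tight_layout()
plt.show()
```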

  • Hard to say without looking at all the factor loadings. Perhaps two or three variables are highly correlated. You could check a correlation matrix of your variables to see if that is the case: towardsdatascience.com/… Commented Nov 20, 2020 at 1:35
  • Thanks for the input! I added the correlation heatmap. Commented Nov 20, 2020 at 12:50
  • Most of those correlations are quite small in magnitude, so your PCA results are not surprising. Commented Nov 20, 2020 at 13:05
  • Take it to the extreme and consider what PCA does when your observed variables are uncorrelated (so a diagonal empirical covariance matrix); see the sketch after these comments. Commented Nov 20, 2020 at 13:09
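To make that last comment concrete, here is a small hypothetical sketch: with mutually uncorrelated features, each principal component explains roughly an equal share of the variance, so the cumulative curve is nearly linear and PCA offers no compression.

```python
import numpy as np
from sklearn.decomposition import PCA

# 15 independent (hence uncorrelated) standardized features
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 15))

pca = PCA().fit(X)
print(np.cumsum(pca.explained_variance_ratio_))
# Each component explains roughly 1/15 of the variance, so about 14
# components are needed to reach 90%: PCA gives no real compression here.
```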

1 Answer


Yes. Because you already have a fairly small number of features (15), it makes sense that you weren't able to reduce the dimensionality much further without losing explained variance. PCA is often applied to datasets with hundreds or thousands of features in order to compress them. That said, if you did have highly redundant features, it would be entirely possible to reduce your dataset down to far fewer components.

Yes, you are right that there is not much redundancy in your dataset, and you shouldn't drop any of the features.
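As a quick illustration of the redundant case (a hypothetical sketch): when several features are nearly linear copies of each other, the leading components absorb most of the variance and far fewer components suffice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 5))              # 5 underlying signals
noise = 0.05 * rng.normal(size=(500, 15))     # small independent noise
X = np.hstack([base, base, base]) + noise     # 15 features, highly redundant

X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)
print(np.cumsum(pca.explained_variance_ratio_))
# The first ~5 components already explain well over 90% of the variance.
```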

