I'm working with a large tabular dataset (~1.2 million rows) that includes 7 qualitative features and 3 quantitative ones. For dimensionality reduction, I'm using FAMD (Factor Analysis for Mixed Data) in R, using FactoMineR and factoextra.
I've tried several encoding strategies and grouped categories to reduce sparsity, but the best I can get is 4.5% variance explained by the first component, and 2.5% by the second.
My main goal is to use the 2D representation for distance-based analysis (e.g., clustering, similarity), though it would be great if it could also support some modeling.
Has anyone here used FAMD in a similar context? Is it normal to get such low explained variance with mixed data? Would you still proceed with it, or consider other approaches?
Thanks!