Questions tagged [pca]
Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables that preserve as much information (measured as variance) as possible. These variables, called principal components, are linear combinations of the input variables.
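As a rough illustration of the description above, here is a minimal sketch of PCA computed through the singular value decomposition of the centered data matrix. The function name `pca_svd` and the toy data are assumptions made for this example only. It uses NumPy directly rather than any particular PCA library, and it omits scaling of the variables, which is often needed in practice.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    # Center each column; PCA describes variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: the rows of Vt are the principal axes (unit-length loadings).
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Scores are linear combinations of the (centered) input variables.
    scores = X_centered @ components.T
    # The variance along each axis is s**2 / (n_samples - 1).
    explained_variance = s**2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, components, ratio

# Hypothetical toy data: three variables driven by one latent factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, -1.0]]) + 0.1 * rng.normal(size=(200, 3))

scores, components, ratio = pca_svd(X, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio)          # share of total variance captured by each component
```

In this toy setup almost all of the variance falls on the first component, because the three variables share a single underlying factor.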
3,455 questions
1 vote
1 answer
50 views
Vector direction of individual clusters after PCA
Suppose I have two multi-dimensional population samples - $A$ and $B$. I hypothesise that $\mathbb{E}[A]$ and $\mathbb{E}[B]$ are orthogonal in this high-dimensional space. To test this hypothesis, I ...
0 votes
0 answers
111 views
What is causing my feature importance weights to be so polarized?
I'm new to machine learning and don't post here much, but my lab and I are a bit stumped here. I have trained an elastic net classifier on some cortical thickness (CT) data by region of interest (...
2 votes
0 answers
287 views
Interpreting angles between variables in a biplot
Upon reading the abstract of a recently published paper in ecology, I came across the claim: Our results suggest that the chromatic contrasts of colours are non-redundant with the intensity of ...
0 votes
1 answer
65 views
FAMD on large mixed dataset: low explained variance, still worth using?
I'm working with a large tabular dataset (~1.2 million rows) that includes 7 qualitative features and 3 quantitative ones. For dimensionality reduction, I'm using FAMD (Factor Analysis for Mixed Data) ...
4 votes
1 answer
69 views
What do the singular vectors in an SVD represent when the original data matrix contains repeated measurements?
I'm wondering if this is correct reasoning: SVD constructs new orthogonal vectors as linear combinations of the rows and columns in the data. In effect, correlations among the original variables are ...
1 vote
0 answers
91 views
Applying Principal Component Analysis (PCA) to reduce dimensionality in multiple datasets for a classification task
I’m working with two malware datasets (dataset‑1 and dataset‑2) each with 256 features, but different ratios of malicious vs. benign samples. I’ve merged them into a third set (dataset‑3). The sample ...
0 votes
0 answers
66 views
Selecting number of PCs (principal components) to include in PCR (principal component regression)
How do you decide the number of principal components (PCs) to include in principal component regression (PCR)? I have seen these methods: choosing the lowest RMSEP with the pls() package; choosing PCs ...
0 votes
0 answers
63 views
How to test for homogeneity of z-scores across members of a clinical population?
For N participants I have M measures for which a normative model is available. Let's assume these measures are finger lengths of one hand (so M=5), z=0 means the length of that finger is the mean in the ...