$\begingroup$

I'd like to apply PCA to my data as a preprocessing step before fitting a classification model. I have split my data into a training set and a test set.

I ran `princomp` in R on the training data and obtained the loadings (eigenvectors).

Is it correct to multiply the test data by the loadings from the training data to get the PCA scores for the test data?

$\endgroup$

2 Answers

$\begingroup$

Yes, that is correct: whatever preprocessing you fit on the training set must be applied in the same way to the test set, using the parameters (centering, scaling, loadings) estimated from the training data.
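As a minimal sketch of that idea, assuming a hypothetical 40/10 split of the built-in `USArrests` data (the split and the names `train`, `test`, and `pc_train` are illustrative, not from the question), the PCA can be fitted on the training rows only and then applied to the test rows with `predict`:

```r
# Hypothetical train/test split of the built-in USArrests data (50 rows)
set.seed(1)
idx   <- sample(nrow(USArrests), 40)
train <- USArrests[idx, ]
test  <- USArrests[-idx, ]

# Fit the PCA on the training rows only
pc_train <- princomp(train, cor = TRUE)

# Project the test rows using the training center, scale, and loadings
test_scores <- predict(pc_train, newdata = test)
dim(test_scores)
```

The test rows never influence the fitted center, scale, or loadings, so the scores fed to the classifier are computed the same way for both sets.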

$\endgroup$
$\begingroup$

> Is it correct if I multiplied datatest with the loadings from the datatraining and get the scores of pca datatesting?

It depends on whether you set `cor = TRUE` in your call to `princomp`. In any case, just use `predict`, which takes care of any centering and scaling before rotating the data. The code below shows both the "manual" way to get the result and the result from `predict.princomp`.

```r
pc.cr <- princomp(USArrests, cor = TRUE)
loadings(pc.cr)
#R
#R Loadings:
#R          Comp.1 Comp.2 Comp.3 Comp.4
#R Murder    0.536  0.418  0.341  0.649
#R Assault   0.583  0.188  0.268 -0.743
#R UrbanPop  0.278 -0.873  0.378  0.134
#R Rape      0.543 -0.167 -0.818
#R
#R                Comp.1 Comp.2 Comp.3 Comp.4
#R SS loadings      1.00   1.00   1.00   1.00
#R Proportion Var   0.25   0.25   0.25   0.25
#R Cumulative Var   0.25   0.50   0.75   1.00

# get rotation without `princomp`
loads <- local({
  # use same scales as `princomp`
  X <- scale(as.matrix(USArrests), scale = sapply(USArrests, function(x)
    sqrt(mean((x - mean(x))^2))))
  C <- cov(X)
  out <- eigen(C)$vectors
  dimnames(out) <- list(colnames(USArrests), paste0("PC", seq_len(ncol(X))))

  # we save the output from scale to later
  attributes(out) <- c(attributes(out), attributes(X)[c(
    "scaled:scale", "scaled:center")])
  out
})
loads # loadings are unique up to sign
#R             PC1    PC2    PC3    PC4
#R Murder   -0.536  0.418 -0.341  0.649
#R Assault  -0.583  0.188 -0.268 -0.743
#R UrbanPop -0.278 -0.873 -0.378  0.134
#R Rape     -0.543 -0.167  0.818  0.089
#R attr(,"scaled:scale")
#R   Murder  Assault UrbanPop     Rape
#R     4.31    82.50    14.33     9.27
#R attr(,"scaled:center")
#R   Murder  Assault UrbanPop     Rape
#R     7.79   170.76    65.54    21.23

# there is a predict function for class
class(pc.cr)
#R [1] "princomp"

# predict for first two rows
predict(pc.cr, newdata = USArrests[1:2, ])
#R         Comp.1 Comp.2 Comp.3 Comp.4
#R Alabama  0.986   1.13  0.444  0.156
#R Alaska   1.950   1.07 -2.040 -0.439

# reproduce the above
tmp <- scale(
  USArrests[1:2, ], scale = attr(loads, "scaled:scale"),
  center = attr(loads, "scaled:center"))
tmp %*% loads # they are unique up to sign
#R            PC1  PC2    PC3    PC4
#R Alabama -0.986 1.13 -0.444  0.156
#R Alaska  -1.950 1.07  2.040 -0.439
```
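For the other case, `cor = FALSE`, here is a small sketch of why multiplying the raw rows by the loadings is still not quite enough: `princomp` centers the data, so the new rows need the training means subtracted before the rotation. The names `pc_cov`, `new_rows`, `raw_scores`, and `manual_scores` are made up for illustration.

```r
# Covariance-based PCA: the data are centered but not rescaled
pc_cov   <- princomp(USArrests, cor = FALSE)
new_rows <- as.matrix(USArrests[1:2, ])
L        <- unclass(loadings(pc_cov))  # plain numeric matrix of loadings

# Multiplying the raw rows by the loadings skips the centering step
raw_scores <- new_rows %*% L

# Subtract the means stored in the fit first, then rotate
centered      <- sweep(new_rows, 2, pc_cov$center)
manual_scores <- centered %*% L

# TRUE if the manual route agrees with predict()
all.equal(manual_scores,
          predict(pc_cov, newdata = USArrests[1:2, ]),
          check.attributes = FALSE)
```

Here `raw_scores` differs from `manual_scores` by a constant offset per component, which is one more reason to let `predict` handle the bookkeeping.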
$\endgroup$
