$\begingroup$

Please forgive me if this is a duplicate, but I couldn't find a similar question (at least not one that applies to my case).

I have a database of 30,000 images of digits (0-9). Every image is 28×28 pixels, so each image is stored as a row of 785 columns: the first column is the label (the digit, 0 through 9) and columns 2 to 785 hold the pixel intensities (0-255).

I have computed the PCA using pca <- prcomp(df[, -1], center=TRUE), which gives 784 principal components. I also have the mean of all the digits from meanDig <- apply(df[, -1], 2, mean) (I don't know whether this mean is useful or not).

Now, I am being asked to recreate the 100th image from the database using the first 15 PCs.

I understand from this question and other related questions how to recreate one single image from the PCA of that single image.

But if I have a PCA computed over the whole collection of 30,000 images, is it possible to recreate one single image from it?

I tried:

    recreation <- pca$x[, 1:15] %*% t(pca$rotation[, 1:15])
    # which gives me a matrix of 23520000 elements.
    # I am not sure how I can recreate a 28*28 image?
    # Then I thought maybe I can do this for the "100th" row:
    recreation <- pca$x[100, 1:15] %*% t(pca$rotation[100, 1:15])
    # but I am not sure what this even means.

Any suggestions?

Edit #1

I am adding more information after taking @chechy_levas's suggestions into consideration.

    # Read the data:
    df <- read.csv("classDigits.csv")
    head(df[, 1:5])
      label pixel0 pixel1 pixel2 pixel3
    1     2      0      0      0      0
    2     4      0      0      0      0
    3     7      0      0      0      0
    4     2      0      0      0      0
    5     8      0      0      0      0
    6     9      0      0      0      0

    # Calculate PCA:
    pca <- prcomp(df[, 2:785], center = TRUE)
    head(pca$rotation[, 1:3])
                     PC1           PC2           PC3
    pixel0  2.219274e-20 -5.732181e-19  6.287447e-20
    pixel1  2.081668e-17  1.110223e-16  2.081668e-17
    pixel2 -1.942890e-16  0.000000e+00  4.857226e-17
    pixel3 -1.387779e-16  1.110223e-16  4.336809e-17
    pixel4  5.551115e-17  0.000000e+00 -1.387779e-17
    pixel5  1.110223e-16  1.387779e-16  2.081668e-17

    # calculate the mean digit
    meanImage <- apply(df[, 2:785], 2, mean)
    # mean image looks like this:

Mean image

# The 15th image for reference: 

Image on 15th row

    # recreate the image at row 15 with 15 PC.
    img15 <- pca$x[15, 1:15] %*% t(pca$rotation[, 1:15])
    img15 <- img15 + meanImage
    # Image with 15 pc:

Image 15 with 15 PC

    # recreate the image at row 15 with 100 PC.
    img15 <- pca$x[15, 1:100] %*% t(pca$rotation[, 1:100])
    img15 <- img15 + meanImage
    # Image with 100 pc:

Image 15 with 100 PC

# Image with 200 PC 

Image 15 with 200 PC

$\endgroup$
1 Answer
$\begingroup$

Note that PCA finds a set of orthogonal vectors that point in the direction of most variation. These vectors represent a rotation of the data. In R, these vectors are in the 'rotation' matrix.

If you multiply your (centered) input data by the rotation matrix, you get a rotated version of your data. prcomp stores this in the 'x' matrix. You can think of the columns of x as uncorrelated factors in decreasing order of importance, that is, of variance.
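The relationship between the data, 'rotation' and 'x' can be checked directly. The following is a minimal sketch on made-up toy data (the matrix `X` and all variable names are illustrative assumptions, not part of the question's data set). It recomputes the scores by hand and then undoes the rotation using all components.

```r
set.seed(1)

# Toy data (assumption): 200 observations of 4 correlated variables
X <- matrix(rnorm(200 * 4), ncol = 4) %*% matrix(runif(16), 4, 4)
p <- prcomp(X, center = TRUE)

# Scores are the centered data multiplied by the rotation matrix
Xc <- sweep(X, 2, p$center, "-")
scores <- Xc %*% p$rotation
scores_match <- isTRUE(all.equal(unname(scores), unname(p$x)))

# The rotation is orthogonal, so with ALL components the transpose undoes it;
# adding the column means back recovers the original data
X_back <- sweep(p$x %*% t(p$rotation), 2, p$center, "+")
round_trip_ok <- isTRUE(all.equal(unname(X_back), unname(X)))

# The columns of x are uncorrelated: off-diagonal correlations are ~0
C <- cor(p$x)
max_offdiag <- max(abs(C[upper.tri(C)]))
```

Keeping only the first k columns of `p$x` and `p$rotation` in the round trip turns the exact recovery into an approximation, which is what reconstruction from a few PCs amounts to.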

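Usually, you can retain most of the variation in the data by discarding most of the least important factors. It looks like you are trying to keep the 15 most important factors, and there is nothing wrong with that. To get back an approximation of the original data, multiply the retained columns of x by the transpose of the matching columns of the rotation matrix, then add back the mean that was subtracted when centering.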

    library(MASS)

    # simulate 10,000 correlated observations of 5 variables
    sig = matrix(0.5, ncol = 5, nrow = 5)
    means = c(5,4,3,2,1)
    diag(sig) = c(5,4,3,2,1)
    Y = mvrnorm(n = 10000, mu = means, Sigma = sig)

    p = prcomp(Y, center = TRUE)
    factors = p$x[,1:2]                              # scores on the first 2 PCs
    unrotated = factors %*% t(p$rotation[, 1:2])     # rotate back to the original axes
    # add back the column means that prcomp subtracted (stored in p$center)
    recreated = t(apply(unrotated, 1, function(row){row + p$center}))
    plot(x = 1:5, y = Y[1,1:5], type = "l", lwd = 3)
    lines(x = 1:5, y = recreated[1, 1:5], col = "red")

    #recreating just the first row of Y
    U1 = factors[1,] %*% t(p$rotation[, 1:2])
    R1 = U1 + p$center
    all.equal(R1[1,], recreated[1,])
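The claim that the leading factors carry most of the variation can be checked from the `sdev` component that prcomp returns. The sketch below reuses the same simulated setup (the seed and the names `cum_var` and `recon_error` are assumptions added for illustration). It computes the cumulative share of variance and the mean squared reconstruction error for each number of retained components.

```r
library(MASS)
set.seed(42)

# Same simulated setup as the example above
sig <- matrix(0.5, ncol = 5, nrow = 5)
diag(sig) <- c(5, 4, 3, 2, 1)
Y <- mvrnorm(n = 10000, mu = c(5, 4, 3, 2, 1), Sigma = sig)
p <- prcomp(Y, center = TRUE)

# Cumulative share of the total variance captured by the first k components
cum_var <- cumsum(p$sdev^2) / sum(p$sdev^2)

# Mean squared error of the reconstruction from the first k components
recon_error <- sapply(1:5, function(k) {
  approx <- p$x[, 1:k, drop = FALSE] %*% t(p$rotation[, 1:k, drop = FALSE])
  approx <- sweep(approx, 2, p$center, "+")  # add the column means back
  mean((Y - approx)^2)
})
```

Printing `cum_var` next to `recon_error` shows the trade-off: each extra component raises the captured variance and lowers the error, and with all five components the reconstruction is exact up to floating-point rounding.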

In your case, each row represents a different image.
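Applied to the digits, that means taking one row of `pca$x`, not the whole matrix, and all 784 rows of `pca$rotation`. Here is a hedged sketch: the random `df` below is only a stand-in for classDigits.csv, `reconstruct_image` is a hypothetical helper name, and the row-by-row pixel ordering is an assumption about how the CSV was flattened.

```r
set.seed(123)

# Stand-in for classDigits.csv (assumption): 300 fake images,
# a label column followed by 784 pixel columns with values 0-255
n <- 300
pixels <- matrix(sample(0:255, n * 784, replace = TRUE), nrow = n)
df <- data.frame(label = sample(0:9, n, replace = TRUE), pixels)

pca <- prcomp(df[, -1], center = TRUE)

# Hypothetical helper: rebuild image i from its first k principal components
reconstruct_image <- function(pca, i, k) {
  scores   <- pca$x[i, 1:k, drop = FALSE]         # 1 x k scores of image i
  loadings <- pca$rotation[, 1:k, drop = FALSE]   # 784 x k loadings
  as.numeric(scores %*% t(loadings)) + pca$center # add the mean image back
}

recon <- reconstruct_image(pca, i = 100, k = 15)  # vector of 784 pixel values

# Reshape into 28 x 28, assuming pixels were stored row by row
img <- matrix(recon, nrow = 28, ncol = 28, byrow = TRUE)

# Display only in an interactive session; image() draws from the bottom-left
# corner, so the rows are flipped and the matrix transposed first
if (interactive()) {
  image(t(img[28:1, ]), col = gray(seq(1, 0, length.out = 256)), axes = FALSE)
}
```

Note that `pca$center` is the same vector as the mean image computed with `apply(df[, -1], 2, mean)`, so either can be added back. Reconstructed values are not clipped and may fall slightly outside 0-255.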

$\endgroup$
  • $\begingroup$ Thanks Chechy. With your suggestions, I was able to understand how to recreate an image from a single row. $\endgroup$ Commented Sep 27, 2018 at 3:59
