Suppose I use least squares to estimate the coefficients in the standard linear model after standardizing the columns of the design matrix $X$, so the model is
$$ E[y] = X^*\beta^* $$
where $X^*$ is $X$ with its columns centered and scaled. Assume $X$ has full column rank. To recover the regression coefficients $\beta$ in the model with the unstandardized $X$,
$$ E[y] = X\beta, $$
I should be able to use
$$
\begin{aligned}
X\beta &= X^*\beta^* \\
\implies \quad X^T X\beta &= X^T X^*\beta^* \\
\implies \quad \beta &= (X^T X)^{-1} X^T X^*\beta^*.
\end{aligned}
$$
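For concreteness, by "centered and scaled" I mean the default behaviour of R's scale(): each column has its mean subtracted and is then divided by its sample standard deviation, i.e. (in my notation)
$$ X^* = \left(X - \mathbf{1}\bar{x}^T\right) D^{-1}, $$
where $\bar{x}$ is the vector of column means of $X$ and $D$ is the diagonal matrix of column standard deviations.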
To test this I ran some simple R code:

    set.seed(100)
    # set number of samples
    N <- 10
    # set number of regressors
    p <- 3
    y <- runif(N)
    X <- matrix(runif(N * p), N)
    # least squares coefficients with unstandardized X
    (b1 <- solve(crossprod(X), crossprod(X, y)))
    # now standardize X and get new coefficients
    Xs <- scale(X)
    (b2 <- solve(crossprod(Xs), crossprod(Xs, y)))
    # b.orig should be exactly the same as b1, but it's not!
    (b.orig <- solve(crossprod(X), crossprod(X, Xs %*% b2)))

Here is the output of b1:
                [,1]
    [1,]  0.17109189
    [2,]  0.52204169
    [3,] -0.02115178

and the output of b.orig, which should equal b1 but does not:
                [,1]
    [1,] -0.15935376
    [2,]  0.16915433
    [3,] -0.04696604

What is going wrong here? I have already looked at http://bit.ly/1HxHS6U, which seems to use a similar idea (equating the expected or fitted values of y in both equations).
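Since the first line of my derivation assumes the two parameterizations give the same fitted values, one sanity check I can think of is to compare them directly. A minimal sketch continuing the snippet above (fitted_raw and fitted_std are just illustrative names, with X, Xs, b1, b2 as defined there):

    # the back-transformation assumes both fits produce the same fitted values;
    # compare them directly to check that premise
    fitted_raw <- X %*% b1    # fitted values from the unstandardized fit
    fitted_std <- Xs %*% b2   # fitted values from the standardized fit
    max(abs(fitted_raw - fitted_std))  # should be ~0 if X*beta = X*s*beta*s holds

If that difference is not numerically zero, then the very first equation $X\beta = X^*\beta^*$ already fails, but I don't see why it should.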