I am trying to fit a multiple linear regression model with approximately 60 predictor variables and 30 observations, so I am using the glmnet package for regularized regression because p > n.
I have been going through the documentation and other questions, but I still can't interpret the results. Here is a sample (with 20 predictors and 10 observations, to simplify):
I create a matrix x with one row per observation and one column per predictor, and a vector y for the response variable:

    > x = matrix(rnorm(10*20), 10, 20)
    > y = rnorm(10)

I fit a glmnet model leaving alpha at its default (alpha = 1, the lasso penalty):

    > fit1 = glmnet(x, y)
    > print(fit1)

I understand that I get a different fit for each of the decreasing values of lambda (i.e. the penalty):
    Call:  glmnet(x = x, y = y)

               Df    %Dev   Lambda
      [1,]      0 0.00000 0.890700
      [2,]      1 0.06159 0.850200
      [3,]      1 0.11770 0.811500
      [4,]      1 0.16880 0.774600
      .
      .
      .
     [96,]     10 0.99740 0.010730
     [97,]     10 0.99760 0.010240
     [98,]     10 0.99780 0.009775
     [99,]     10 0.99800 0.009331
    [100,]     10 0.99820 0.008907

Now I extract my beta coefficients, choosing for example the smallest lambda value returned by glmnet:
    > predict(fit1, type="coef", s = 0.008907)
    21 x 1 sparse Matrix of class "dgCMatrix"
                          1
    (Intercept) -0.08872364
    V1           0.23734885
    V2          -0.35472137
    V3          -0.08088463
    V4           .
    V5           .
    V6           .
    V7           0.31127123
    V8           .
    V9           .
    V10          .
    V11          0.10636867
    V12          .
    V13         -0.20328200
    V14         -0.77717745
    V15          .
    V16         -0.25924281
    V17          .
    V18          .
    V19         -0.57989929
    V20         -0.22522859

If instead I choose lambda with cross-validation,
    > cv = cv.glmnet(x, y)
    > model = glmnet(x, y, lambda = cv$lambda.min)

all of the variables come out as (.).
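For completeness, this is the full script I am running, as one reproducible block (the seed and the choice of nfolds = 5 are mine, added so the toy example with only 10 observations runs stably; they are not part of any recommended workflow):

```r
# Reproducible version of the toy example above.
library(glmnet)

set.seed(1)                            # assumption: fixed seed for reproducibility
x <- matrix(rnorm(10 * 20), 10, 20)    # 10 observations, 20 predictors
colnames(x) <- paste0("V", 1:20)
y <- rnorm(10)

fit1 <- glmnet(x, y)                   # alpha = 1 (lasso) by default
print(fit1)                            # Df / %Dev / Lambda path as shown above

# Cross-validate over the same lambda path; nfolds = 5 avoids
# one-observation folds with only n = 10 (my assumption for the toy case).
cv <- cv.glmnet(x, y, nfolds = 5)

coef(cv, s = "lambda.min")             # coefficients at the CV-optimal lambda
coef(cv, s = "lambda.1se")             # sparser model within 1 SE of the minimum
```

As I understand it, coef(cv, s = "lambda.min") reads the coefficients off the already-fitted path, so refitting glmnet with a single lambda value should not be necessary, but please correct me if that is wrong.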
Doubts and questions:
- I am not sure how to choose lambda.
- Should I use the non-(.) variables to fit another model? In my case I would like to keep as many variables as possible.
- How do I get p-values, i.e. find which variables significantly predict the response?
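To clarify the second and third questions, this is the kind of follow-up refit I had in mind (the seed, nfolds = 5, and variable handling are my own additions to the toy example; I am unsure whether the lm() p-values are even valid after lasso selection):

```r
# Naive two-step idea: lasso for variable selection, then OLS for p-values.
library(glmnet)

set.seed(1)                            # assumption: fixed seed for reproducibility
x <- matrix(rnorm(10 * 20), 10, 20)
colnames(x) <- paste0("V", 1:20)
y <- rnorm(10)

cv <- cv.glmnet(x, y, nfolds = 5)      # nfolds = 5: my choice for tiny n
b  <- coef(cv, s = "lambda.min")       # 21 x 1 sparse coefficient vector

# Names of predictors with nonzero lasso coefficients (drop the intercept).
keep <- rownames(b)[as.vector(b) != 0]
keep <- setdiff(keep, "(Intercept)")

if (length(keep) > 0) {
  refit <- lm(y ~ ., data = data.frame(y, x[, keep, drop = FALSE]))
  summary(refit)                       # naive p-values; post-selection inference caveat?
}
```

Is this refit-on-selected-variables approach statistically sound, or do the p-values need some correction because the same data chose the variables?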
I apologize for my poor statistical knowledge! And thank you for any help.