
I am trying to fit a multivariate linear regression model with approximately 60 predictor variables and 30 observations, so I am using the glmnet package for regularized regression because p>n.

I have been going through the documentation and other questions, but I still can't interpret the results. Here's some sample code (with 20 predictors and 10 observations, to simplify):

I create a matrix x with rows = observations and columns = predictors, and a vector y representing the response variable:

> x = matrix(rnorm(10*20), 10, 20)
> y = rnorm(10)

I fit a glmnet model, leaving alpha at its default (= 1, the lasso penalty):

> fit1 = glmnet(x, y)
> print(fit1)

I understand that I get different fits for decreasing values of lambda (i.e., the penalty):

Call:  glmnet(x = x, y = y)

        Df    %Dev   Lambda
  [1,]   0 0.00000 0.890700
  [2,]   1 0.06159 0.850200
  [3,]   1 0.11770 0.811500
  [4,]   1 0.16880 0.774600
  .
  .
  .
 [96,]  10 0.99740 0.010730
 [97,]  10 0.99760 0.010240
 [98,]  10 0.99780 0.009775
 [99,]  10 0.99800 0.009331
[100,]  10 0.99820 0.008907

Now I extract my beta values, choosing, for example, the smallest lambda value returned by glmnet:

> predict(fit1, type = "coef", s = 0.008907)
21 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept) -0.08872364
V1           0.23734885
V2          -0.35472137
V3          -0.08088463
V4           .
V5           .
V6           .
V7           0.31127123
V8           .
V9           .
V10          .
V11          0.10636867
V12          .
V13         -0.20328200
V14         -0.77717745
V15          .
V16         -0.25924281
V17          .
V18          .
V19         -0.57989929
V20         -0.22522859

If instead I choose lambda with

cv <- cv.glmnet(x, y)
model <- glmnet(x, y, lambda = cv$lambda.min)

All of the variables would be (.).

Doubts and questions:

  1. I am not sure about how to choose lambda.
  2. Should I use the non (.) variables to fit another model? In my case I would like to keep as many variables as possible.
  3. How do I know the p-value, i.e. which variables significantly predict the response?

I apologize for my poor statistical knowledge! And thank you for any help.

  • Maybe have a look at CRAN package hdi, that one provides inference for high-dimensional models... Commented Apr 6, 2017 at 15:12
  • For the full explanation of the methods used I refer you to this paper: projecteuclid.org/euclid.ss/1449670857 Commented Apr 6, 2017 at 15:30

2 Answers


Here's an unintuitive fact: you're not actually supposed to give glmnet a single value of lambda. From the documentation here:

Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than to compute a single fit.

cv.glmnet will help you choose lambda, as you alluded to in your examples. The authors of the glmnet package suggest cv$lambda.1se instead of cv$lambda.min, but in practice I've had success with the latter.

After running cv.glmnet, you don't have to rerun glmnet! Every lambda in the grid (cv$lambda) has already been run. This technique is called "Warm Start" and you can read more about it here. Paraphrasing from the introduction, the Warm Start technique reduces running time of iterative methods by using the solution of a different optimization problem (e.g., glmnet with a larger lambda) as the starting value for a later optimization problem (e.g., glmnet with a smaller lambda).

To extract the desired run from cv.glmnet.fit, try this:

small.lambda.index <- which(cv$lambda == cv$lambda.min)
small.lambda.betas <- cv$glmnet.fit$beta[, small.lambda.index]

Revision (1/28/2017)

No need to hack into the glmnet object like I did above; take @alex23lemm's advice below and pass s = "lambda.min", s = "lambda.1se", or some other number (e.g., s = .007) to both coef and predict. Note that your coefficients and predictions depend on this value, which is set by cross-validation. Use a seed for reproducibility! And don't forget that if you don't supply an "s" in coef and predict, you'll be using the default of s = "lambda.1se". I have warmed up to that default after seeing it work better in small-data situations. s = "lambda.1se" also tends to provide more regularization, so if you're working with alpha > 0, it will also tend towards a more parsimonious model. You can also choose a numerical value of s with the help of plot.glmnet to get somewhere in between (just don't forget to exponentiate the values from the x axis!).
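Putting this together, a minimal end-to-end sketch on made-up random data (so the chosen lambda values will vary with the seed):

```r
library(glmnet)

set.seed(123)                         # seed so the CV folds are reproducible
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

cv <- cv.glmnet(x, y)                 # fits the whole lambda path internally

coef(cv, s = "lambda.min")            # coefficients at the CV-error-minimizing lambda
coef(cv)                              # default is s = "lambda.1se" (more regularization)
predict(cv, newx = x[1:5, ], s = "lambda.1se")   # predictions at lambda.1se
```

Note that coef and predict called on the cv.glmnet object reuse the already-fitted path; no second call to glmnet is needed.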

  • Thank you! This helps... do you maybe have an answer for questions 2 and 3? Commented Nov 24, 2013 at 23:01
  • Ha no worries. The (.)s represent zeros. Since you went with Lasso, you've specified that you want a "sparse" solution (i.e., lots of zeros). If you want them all to have values, set alpha = 0. Now you've gone from Lasso to Ridge regression. p-values for glmnet are conceptually tricky. If you google search "p-values for lasso", for instance, you'll see a lot of recent research and debate. I even read one account (source amnesia) where the author argued that p-values do not make sense for biased regressions such as lasso and ridge regression. Commented Nov 24, 2013 at 23:29
  • An alternative way to extract the coefficients associated with the value of lambda that gives the minimum cvm is the following: small.lambda.betas <- coef(cv, s = "lambda.min") Commented Nov 24, 2015 at 10:55
  • @BenOgorek, excellent update! Another useful reference is Friedman J, Hastie T, Hoefling H, Tibshirani R. Pathwise coordinate optimization. Annals of Applied Statistics. 2007;2(1):302–332. (arxiv.org/pdf/0708.1485.pdf) Commented Jun 11, 2016 at 20:00
  • @erosennin, check out the lambda argument of cv.glmnet: "Optional user-supplied lambda sequence; default is NULL, and glmnet chooses its own sequence." You'll want to use the warm start principle and begin the sequence with some larger values of lambda before decreasing to the range you're interested in. Commented Jan 28, 2019 at 16:38

Q1) I am not sure about how to choose lambda.
Q2) Should I use the non (.) variables to fit another model? In my case I would like to keep as many variables as possible.

As per @BenOgorek's great answer, you typically fit over an entire lambda sequence, and then use the lambda.1se value when extracting the optimal coefficients (unlike what you did).

As long as you follow the three caveats below, do not fight the regularization or tweak the model: if a variable was omitted, it is because it gave a lower overall penalized error. The caveats are:

  1. For the regularized coefficients to be meaningful, make sure you explicitly normalize each variable's mean and standard deviation beforehand with scale(); don't rely on glmnet(standardize = TRUE). For justification see Is standardisation before Lasso really necessary?; basically a variable with large values might get unfairly punished in regularization.

  2. To be reproducible, run with set.seed() using several random seeds and check the regularized coefficients for stability.

  3. If you want less harsh regularization, i.e. more variables included, use alpha < 1 (i.e. proper elastic net) rather than pure lasso. I suggest you sweep alpha from 0 to 1. If you're going to do that, then to avoid overfitting both the hyperparameter alpha and the regression error, you must use cross-validation, i.e. use cv.glmnet() rather than plain glmnet():


for (alpha in c(0, .1, .3, .5, .7, .9, 1)) {
  fit <- cv.glmnet(..., alpha = alpha, nfolds = ...)
  # Look at the CVE at lambda.1se to find the minimum for this alpha value...
}

If you want to automate such a grid search with CV, you can either code it yourself or use the caret package on top of glmnet; caret does this well. For the cv.glmnet nfolds parameter, pick 3 (minimum) if your dataset is small, or 5 or 10 if it's large.
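As one hedged sketch of that automated grid search, using caret's train() with method = "glmnet" (the grid values here are illustrative, not recommendations):

```r
library(caret)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

ctrl <- trainControl(method = "cv", number = 10)     # 10-fold CV
grid <- expand.grid(alpha  = c(0, .1, .3, .5, .7, .9, 1),
                    lambda = 10^seq(-3, 0, length.out = 20))

fit <- train(x, y, method = "glmnet",
             trControl = ctrl, tuneGrid = grid,
             preProcess = c("center", "scale"))      # standardize, per caveat 1 above

fit$bestTune                                         # best (alpha, lambda) pair found
```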

Q3) How do I know the p-value, i.e. which variables significantly predict the response?

You don't; they're not meaningful here, as explained in detail in Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

Just let cv.glmnet() do the variable selection automatically, subject to the caveats above. And of course the distribution of the response variable should be normal (assuming you're using family = 'gaussian').
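If you truly do need p-value-like inference despite the above, the hdi package mentioned in the comments under the question provides inference for high-dimensional models; a hedged sketch, assuming hdi's lasso.proj() (the de-sparsified / projected lasso):

```r
library(hdi)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- 2 * x[, 1] + rnorm(100)    # toy data where only V1 truly matters

fit.proj <- lasso.proj(x, y)    # de-sparsified (projected) lasso inference
fit.proj$pval.corr              # multiplicity-corrected p-values per variable
```

Treat these p-values with the care the linked discussion and the comments' projecteuclid paper describe; they rest on their own assumptions.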

  • Thanks for the very helpful comment! I also experienced that standardizing the variables myself seems to work better than using glmnet(standardize = TRUE). Commented Sep 1, 2017 at 8:44
  • I have a question @smci about the beta values returned by cv.glmnet, though. I understand that they are beta values at each grid point of attempted lambda values. However, are the beta values returned for each lambda value (1) the average coefficient values from the 10 folds (assuming I used 10-fold CV), (2) the beta values from the fold that gave best accuracy, or (3) the coefficients from re-running the model on the whole dataset? Commented Sep 1, 2017 at 8:51
