$\begingroup$

I am trying to understand the mechanism behind the lasso, and in particular I want to build some intuition for what happens if we don't standardize our data. I found many posts, but none were directly about the lasso path. I generated data with 3 features, then multiplied the first feature by 40 to see what would happen to its coefficient, since OLS would simply return $\frac{\beta_1}{40}$.

library(glmnet)

N <- 500
p <- 3
X <- matrix(rnorm(N * p), ncol = p)
b <- c(3, -.5, 1)
y <- X %*% b + rnorm(N, sd = .5)

fit <- glmnet(x = X, y = y, standardize = FALSE, intercept = FALSE)
plot(fit, xvar = "lambda", label = TRUE)

X[, 1] <- X[, 1] * 40
fit2 <- glmnet(x = X, y = y, standardize = FALSE, intercept = FALSE)
plot(fit2, xvar = "lambda", label = TRUE)

coef(cv.glmnet(x = X, y = y, standardize = FALSE, intercept = FALSE))

For the first lasso solution:

[Plot: lasso coefficient paths vs. $\log\lambda$ for the original data]

and for the 2nd (with the first feature multiplied by 40):

[Plot: lasso coefficient paths vs. $\log\lambda$ after the rescaling]

My intuition was that the lasso would shrink this coefficient to zero, but it doesn't. What is the exact intuition for how the lasso path and solution change when we multiply or divide one feature by a constant?

$\endgroup$
  • $\begingroup$ I would call this an answer, but this is really an R question and not a stats question. glmnet automatically standardizes the data for you; check the documentation. Of course, with things like this, it's "trust, but verify", so doing it yourself is always a good idea. In fact, the documentation claims that for logistic regression you probably don't want to do this. $\endgroup$ Commented Jun 20, 2018 at 16:01
  • $\begingroup$ It isn't really an R question, since I'm not asking about the standardization itself. I just want to know what happens if we don't standardize with glmnet and do some multiplication. $\endgroup$ Commented Jun 20, 2018 at 16:03
  • $\begingroup$ I actually withdraw my comment and I'm working on an answer. $\endgroup$ Commented Jun 20, 2018 at 16:17
  • $\begingroup$ But off the bat I would point out that the $\lambda$s used are very different for the two fits, reflecting the scaling of the variable. $\endgroup$ Commented Jun 20, 2018 at 16:21

1 Answer

$\begingroup$

I don't think this is a complete answer because more details could be provided, but it is an answer. The variables you have are essentially orthogonal. For orthogonal variables, the lasso with regularization $\lambda$ takes the OLS coefficient $\hat\beta$ and shrinks it towards zero by $\lambda$, with the caveat that once it reaches zero it stays at zero (soft-thresholding). See "The Elements of Statistical Learning", around p. 71, for more information. That is what is happening here. If you examine the actual sequence of $\lambda$ values, the ones in the second fit are about 40 times the $\lambda$s in the first fit.
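To make the scaling argument concrete, here is a sketch of the soft-thresholding view, assuming (for simplicity) orthonormal columns of $X$:

$$\hat\beta_j^{\text{lasso}}(\lambda) \;=\; \operatorname{sign}\!\left(\hat\beta_j^{\text{ols}}\right)\left(\bigl|\hat\beta_j^{\text{ols}}\bigr| - \lambda\right)_+ .$$

If you rescale $x_1 \to 40\,x_1$, the OLS coefficient becomes $\hat\beta_1/40$, but the $\ell_1$ penalty is applied to the raw coefficient, not to the feature's scale. In particular, the smallest $\lambda$ that zeroes out every coefficient is $\lambda_{\max} = \max_j |x_j^\top y|/N$, and multiplying $x_1$ by 40 multiplies $|x_1^\top y|$ (and hence the grid glmnet builds from $\lambda_{\max}$) by roughly 40. So the whole path just shifts along the $\log\lambda$ axis rather than the rescaled coefficient being shrunk to zero earlier.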

$\endgroup$
  • $\begingroup$ A bit late, but you are totally right. Could you maybe explain intuitively why the lambdas are 40 times higher? $\endgroup$ Commented Aug 13, 2018 at 14:17
  • $\begingroup$ @Leo96- from the original question "Then I multiplied the first feature by 40 to see what would happen to this coefficient" so for one set the corresponding $\lambda $ should shrink by 40 to put the two sets of coefficients/$\lambda$ into a sensible correspondence. $\endgroup$ Commented Aug 13, 2018 at 15:14
  • $\begingroup$ What do you mean by "to put the two sets of coefficients/$\lambda$ into a sensible correspondence"? $\endgroup$ Commented Aug 13, 2018 at 15:17
