Timeline for LASSO: Deriving the smallest lambda at which all coefficients are zero
Current License: CC BY-SA 4.0
18 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Sep 13, 2024 at 4:18 | answer added | ritkid | | timeline score: 0 |
| Apr 3, 2024 at 3:27 | history edited | User1865345 | CC BY-SA 4.0 | added 12 characters in body; edited tags |
| Sep 4, 2018 at 16:16 | history suggested | meTchaikovsky | CC BY-SA 4.0 | change two $\lambda_{max}$ into $\lambda_{min}$ |
| Sep 4, 2018 at 6:49 | review | | | Suggested edits (completed Sep 4, 2018 at 16:16) |
| Jun 1, 2018 at 16:53 | answer added | Sextus Empiricus | | timeline score: 3 |
| Jun 1, 2018 at 16:05 | comment added | Sextus Empiricus | | I do not know the KKT conditions for non-differentiable functions, but equation (9) that you quote does not seem right; it is probably a typo or something like that. The left-hand side is the gradient of $\frac{1}{2}\|y-X\beta\|_2^2$, but it should be the directional derivative ($\nabla_{\vec s}$) instead. |
| Jun 1, 2018 at 14:54 | comment added | rook1996 | | Look at my edit. Those are notes from Tibshirani's lecture slides at Stanford. Also look at the other forum post; they did the same. |
| Jun 1, 2018 at 14:53 | history edited | rook1996 | CC BY-SA 4.0 | added 127 characters in body |
| Jun 1, 2018 at 14:30 | comment added | Sextus Empiricus | | @rook1996 can you give a reference where you obtained the condition using subgradients? I believe there must be some mistake there. There is not a single 'the subgradient', as any vector is a subgradient for $\Vert \beta \Vert_1$ when $\beta = 0$, so you will have to maximize over all the different possible subgradients. Also, you will probably have to take something like the directional derivative (along the direction of the 'chosen' subgradient) for the first term instead of the gradient. If you apply this you should be able to continue working it out. |
| Jun 1, 2018 at 13:49 | comment added | rook1996 | | Already read that, but I couldn't continue my mathematical derivation. |
| Jun 1, 2018 at 13:32 | comment added | Sextus Empiricus | | In this question you may find a lot of intuition: stats.stackexchange.com/questions/289075/… |
| Jun 1, 2018 at 13:31 | comment added | Sextus Empiricus | | The KKT conditions are tricky when $\Vert \beta \Vert_1$ has no well-defined gradient everywhere. I don't know your expression so well, but are you sure you should not have something like the subgradient on both sides of the equation? |
| Jun 1, 2018 at 9:02 | comment added | rook1996 | | Why can I apply the infinity norm on both sides there? This is the first time I've read about this norm (my math skills are very poor). The step I don't get is "$\frac{1}{n} \|X^T y \|_\infty = \lambda \|\hat{z}_{\lambda}\|_\infty$". Is the infinity norm just like an operator that I can apply on both sides? |
| Jun 1, 2018 at 8:49 | comment added | deasmhumnha | | You're not too far from the derivation you linked to. Just remember that $s$ is a vector, so you can't just divide $X^Ty$ by $s$. Look over the other answer again, noting that $\hat{z}_{\lambda,j}$ is the subgradient and the infinity norm is the maximum absolute value of the elements of a vector. |
| Jun 1, 2018 at 8:20 | history edited | rook1996 | CC BY-SA 4.0 | added 234 characters in body |
| Jun 1, 2018 at 8:16 | comment added | rook1996 | | This should be the solution where all coefficients are zero. My last equation is in matrix/vector form, so $\beta$ is a vector. The result stated at the end is from the textbook. I updated my post regarding the exercise. |
| Jun 1, 2018 at 8:13 | comment added | Ruben van Bergen | | It looks to me like you've derived the solution for when there is a single variable in the model (with a single coefficient $\beta$). How would this generalize to a case with multiple variables, where all the coefficients need to be 0? What $\lambda$ would you need then? |
| Jun 1, 2018 at 7:58 | history asked | rook1996 | CC BY-SA 4.0 | |
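
For readers following the thread: the Jun 1, 2018 comments (subgradients, KKT conditions, the infinity-norm step) all point at the standard derivation of the smallest such $\lambda$. Below is a minimal sketch, assuming the $\tfrac{1}{2n}$ scaling implied by the step quoted in the 9:02 comment; with the unscaled objective $\tfrac12\|y-X\beta\|_2^2+\lambda\|\beta\|_1$ the $\tfrac1n$ factor simply drops out.

```latex
% Sketch: subgradient (KKT) condition at \hat\beta = 0 for
%   f(\beta) = \tfrac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1 .
\begin{align*}
0 &\in -\tfrac{1}{n} X^\top (y - X\hat\beta) + \lambda\,\partial\|\hat\beta\|_1
      && \text{(optimality)}\\
\hat\beta = 0:\qquad
\tfrac{1}{n} X^\top y &= \lambda \hat z,
      \qquad \hat z \in \partial\|0\|_1 = \{z : \|z\|_\infty \le 1\}\\
\text{such a } \hat z \text{ exists}
  &\iff \tfrac{1}{n}\,\bigl|x_j^\top y\bigr| \le \lambda \ \text{ for all } j
  \iff \lambda \ \ge\ \lambda_{\max} := \tfrac{1}{n}\,\bigl\|X^\top y\bigr\|_\infty .
\end{align*}
```

Taking the infinity norm of both sides of $\tfrac1n X^\top y = \lambda\hat z$ and using $\|\hat z\|_\infty \le 1$ (with equality attainable at the boundary) is exactly the step the 9:02 comment asks about.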
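
As a quick numerical sanity check of the same formula (not part of the original thread; it assumes scikit-learn's Lasso parameterization $\tfrac{1}{2n}\|y-X\beta\|_2^2+\alpha\|\beta\|_1$ with `fit_intercept=False` so no centering alters $X^\top y$, and uses synthetic data):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))                               # synthetic design matrix
y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n) # synthetic response

# Smallest penalty at which every coefficient is zero: (1/n) * ||X^T y||_inf
lam_max = np.abs(X.T @ y).max() / n

# At lam_max the LASSO solution is the zero vector ...
assert np.allclose(Lasso(alpha=lam_max, fit_intercept=False).fit(X, y).coef_, 0.0)
# ... and just below it, at least one coefficient becomes active.
assert np.any(Lasso(alpha=0.99 * lam_max, fit_intercept=False).fit(X, y).coef_ != 0.0)
print("lambda_max =", lam_max)
```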