18 events
when | what | by | license | comment
Sep 13, 2024 at 4:18 answer added ritkid score: 0
Apr 3, 2024 at 3:27 history edited User1865345 CC BY-SA 4.0
added 12 characters in body; edited tags
S Sep 4, 2018 at 16:16 history suggested meTchaikovsky CC BY-SA 4.0
changed two instances of $\lambda_{max}$ to $\lambda_{min}$
Sep 4, 2018 at 6:49 review Suggested edits: S Sep 4, 2018 at 16:16
Jun 1, 2018 at 16:53 answer added Sextus Empiricus score: 3
Jun 1, 2018 at 16:05 comment added Sextus Empiricus I do not know the KKT conditions for non-differentiable functions, but equation (9) that you quote does not seem right; it is probably a typo or something like that. The left-hand side is the gradient of $\frac{1}{2}||y-X\beta||_2^2$, but it should be the directional derivative ($\nabla_{\vec{s}}$) instead.
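For reference, the condition being discussed here can be written without any differentiability assumption by using the subdifferential of the $\ell_1$ penalty. This is only a sketch, assuming the $\frac{1}{n}$ scaling that appears in the $\frac{1}{n}\|X^T y\|_\infty$ expression quoted further down:

$$0 \in -\frac{1}{n} X^T (y - X\hat{\beta}) + \lambda\, \partial \Vert \hat{\beta} \Vert_1, \qquad \partial \Vert \beta \Vert_1 = \left\{ z \in \mathbb{R}^p : z_j = \operatorname{sign}(\beta_j) \text{ if } \beta_j \neq 0,\ z_j \in [-1,1] \text{ if } \beta_j = 0 \right\}.$$

Writing the optimality condition with the set-valued subdifferential of the penalty sidesteps the gradient-versus-directional-derivative issue raised in this comment.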
Jun 1, 2018 at 14:54 comment added rook1996 Look at my edit. Those are notes from a lecture slide by Tibshirani at Stanford. Also look at the other forum post; they did the same.
Jun 1, 2018 at 14:53 history edited rook1996 CC BY-SA 4.0
added 127 characters in body
Jun 1, 2018 at 14:30 comment added Sextus Empiricus @rook1996 can you give a reference for where you obtained the condition using subgradients? I believe there must be some mistake there. There is not a single 'the subgradient', since any vector with components in $[-1,1]$ is a subgradient of $\Vert \beta \Vert_1$ when $\beta = 0$. So you will have to maximize over all the different possible subgradients. Also, you will probably have to take something like the directional derivative (along the direction of the 'chosen' subgradient) for the first term instead of the gradient. If you apply this, you should be able to continue working it out.
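To make the "maximize over all possible subgradients" step concrete: specializing the stationarity condition sketched above to $\hat{\beta} = 0$ (again assuming the $\frac{1}{n}$ scaling) gives

$$\frac{1}{n} X^T y = \lambda z \text{ for some } z \in [-1,1]^p \iff \frac{1}{n} \left| X_j^T y \right| \le \lambda \text{ for all } j \iff \lambda \ge \frac{1}{n} \Vert X^T y \Vert_\infty,$$

so the smallest penalty for which $\hat{\beta} = 0$ is optimal is $\frac{1}{n} \Vert X^T y \Vert_\infty$ (often denoted $\lambda_{\max}$), which is exactly the expression that comes up in the comments below.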
Jun 1, 2018 at 13:49 comment added rook1996 I already read that, but I couldn't continue with my mathematical derivation.
Jun 1, 2018 at 13:32 comment added Sextus Empiricus You may find a lot of intuition in this question: stats.stackexchange.com/questions/289075/…
Jun 1, 2018 at 13:31 comment added Sextus Empiricus The KKT conditions are tricky when $\Vert \beta \Vert_1$ does not have a well-defined gradient everywhere. I don't know your expression so well, but are you sure you should not have something like the subgradient on both sides of the equation?
Jun 1, 2018 at 9:02 comment added rook1996 Why can I apply the infinity norm to both sides there? This is the first time I have read about this norm (my math skills are very poor). This is the step I don't get: $\frac{1}{n} \|X^T y \|_\infty = \lambda \|\hat{z}_{\lambda}\|_\infty$. Is the infinity norm just an operator that I can apply to both sides?
Jun 1, 2018 at 8:49 comment added deasmhumnha You're not too far from the derivation you linked to. Just remember that $s$ is a vector, so you can't simply divide $X^Ty$ by $s$. Look over the other answer again, noting that $\hat{z}_{\lambda,j}$ is the subgradient and that the infinity norm is the maximum absolute value of the elements of a vector.
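The threshold can also be checked numerically. Below is a minimal sketch, not taken from the thread, using scikit-learn's Lasso, whose objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1$ matches the $\frac{1}{n}$ scaling quoted above; the data are made up purely for illustration.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.standard_normal(n)

# The infinity norm of (1/n) X^T y is the maximum absolute entry of that vector.
alpha_max = np.max(np.abs(X.T @ y)) / n

# At alpha = alpha_max the fitted coefficients are all (numerically) zero ...
print(Lasso(alpha=alpha_max, fit_intercept=False).fit(X, y).coef_)
# ... while just below the threshold at least one coefficient becomes nonzero.
print(Lasso(alpha=0.99 * alpha_max, fit_intercept=False).fit(X, y).coef_)

With fit_intercept=False the check matches the formula directly; with an intercept, $y$ would typically need to be centered first.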
Jun 1, 2018 at 8:20 history edited rook1996 CC BY-SA 4.0
added 234 characters in body
Jun 1, 2018 at 8:16 comment added rook1996 This should be the solution where all coefficients are zero. My last equation is in matrix/vector form, so $\beta$ is a vector. The result stated at the end is from the textbook. I updated my post regarding the exercise.
Jun 1, 2018 at 8:13 comment added Ruben van Bergen It looks to me like you've derived the solution for when there is a single variable in the model (with a single coefficient $\beta$). How would this generalize to a case with multiple variables, where all the coefficients need to be 0? What $\lambda$ would you need then?
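For comparison with the single-variable case mentioned in this comment, the zero-solution condition and its vector generalization can be sketched as follows (assuming the same $\frac{1}{n}$ scaling used in the comments above):

$$\text{single predictor: } \hat{\beta} = 0 \iff \frac{1}{n} |x^T y| \le \lambda, \qquad \text{multiple predictors: } \hat{\beta} = \mathbf{0} \iff \frac{1}{n} \Vert X^T y \Vert_\infty \le \lambda,$$

i.e. the absolute value is replaced by the maximum absolute entry of $X^T y$, which is where the infinity norm enters.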
Jun 1, 2018 at 7:58 history asked rook1996 CC BY-SA 4.0