
According to Zou and Hastie's paper, the elastic net has two equivalent formulations:

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i-\sum_{j=1}^p x_{ij} \beta_j\right)^2 + \lambda_1 \sum_{j=1}^p |\beta_j|+ \lambda_2 \sum_{j=1}^p \beta_j^2 \right\}$$

and

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i - \sum_{j=1}^p x_{ij} \beta_j\right)^2\right\} \;\text{ s.t. } \;(1-\alpha)\sum_{j=1}^p |\beta_j| + \alpha\sum_{j=1}^p \beta_j^2 \leq t$$

where $\alpha = \frac{\lambda_2}{\lambda_1 + \lambda_2}$.
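The correspondence is easy to check numerically before trying to prove it. Here is a minimal sketch (assuming `cvxpy` is available; the data and the values of $\lambda_1, \lambda_2$ are arbitrary illustration choices): solve the penalized form, set $t$ to the penalty value attained by that solution, re-solve the constrained form, and compare.

```python
# Numerical check of the penalized/constrained correspondence (sketch;
# assumes cvxpy is installed; data and penalty weights are arbitrary).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 10
X = rng.standard_normal((N, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(N)

lam1, lam2 = 1.0, 2.0  # lambda_1, lambda_2 in the penalized form
beta = cp.Variable(p)
loss = cp.sum_squares(X @ beta - y)

# Penalized (Lagrangian) form.
cp.Problem(cp.Minimize(loss + lam1 * cp.norm1(beta)
                       + lam2 * cp.sum_squares(beta))).solve()
beta_pen = beta.value

# Constrained form with alpha = lam2 / (lam1 + lam2) and t set to the
# penalty value attained by the penalized solution.
alpha = lam2 / (lam1 + lam2)
t = (1 - alpha) * np.abs(beta_pen).sum() + alpha * (beta_pen**2).sum()
cp.Problem(cp.Minimize(loss),
           [(1 - alpha) * cp.norm1(beta)
            + alpha * cp.sum_squares(beta) <= t]).solve()

print(np.max(np.abs(beta.value - beta_pen)))  # ~0 up to solver tolerance
```

With $\lambda_2 > 0$ the penalized objective is strictly convex, so both problems have a unique solution and the printed gap is only solver tolerance.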

My question is how to prove this equivalence formally. Ridge regression and the lasso also admit these two formulations, but I could not find any reference where the equivalence is proven. A similar question I found on Cross Validated is this one:

Lagrangian relaxation in the context of ridge regression

but I'm unable to understand Tristan's explanation there. I have some understanding of Lagrangian optimization theory, and I suspect the answer lies along those lines, but since all the papers treat the equivalence as obvious, I would like a proper reference where it is demonstrated explicitly.

• This question on just Ridge Regression is highly related. — Commented Nov 10, 2018 at 7:06

1 Answer


Starting from $$\hat{\beta} = \arg \min_\beta \|X\beta - y\|_2^2 \;\text{ s.t. }\; (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 \leq t,$$ we can write the Lagrangian of this constrained optimization problem, with multiplier $\lambda \geq 0$, as $$ \begin{array}{rcl} L(\beta,\lambda) & = & \|X\beta - y\|_2^2 + \lambda \left( (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 - t\right) \\ & = & \|X\beta - y\|_2^2 + \lambda (1-\alpha)\|\beta\|_1 + \lambda\alpha\|\beta\|_2^2 - \lambda t, \end{array} $$ and, since the constant $-\lambda t$ does not affect the minimizer in $\beta$, this is indeed the first problem you wrote, with parameters $\lambda_1 = \lambda(1-\alpha)$ and $\lambda_2 = \lambda\alpha$, which leads to the expression for the elastic-net mixing parameter: $$\alpha = \frac{\lambda_2}{\lambda_1+\lambda_2}.$$ That being said, to go from this point to Zou and Hastie's assertion that both problems are equivalent, I admit that I am missing a step or two; see the sketch below.
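A sketch of how that gap is usually closed, via standard convex duality, writing $P(\beta) = (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2$ for the elastic net penalty:

Constrained $\Rightarrow$ penalized: for $t > 0$, the point $\beta = 0$ satisfies $P(0) = 0 < t$, so Slater's condition holds for this convex problem. Strong duality then yields a multiplier $\lambda^\star \geq 0$ such that any solution $\hat{\beta}$ of the constrained problem also minimizes $\|X\beta - y\|_2^2 + \lambda^\star P(\beta)$, i.e., solves the penalized problem with $\lambda_1 = \lambda^\star(1-\alpha)$ and $\lambda_2 = \lambda^\star\alpha$.

Penalized $\Rightarrow$ constrained: let $\hat{\beta}$ minimize $\|X\beta - y\|_2^2 + \lambda P(\beta)$ and set $t = P(\hat{\beta})$. For any feasible $\beta$, i.e., any $\beta$ with $P(\beta) \leq t$, optimality of $\hat{\beta}$ in the penalized problem gives $$\|X\beta - y\|_2^2 \;\geq\; \|X\hat{\beta} - y\|_2^2 + \lambda\left(P(\hat{\beta}) - P(\beta)\right) \;\geq\; \|X\hat{\beta} - y\|_2^2,$$ so $\hat{\beta}$ also solves the constrained problem with that choice of $t$. The correspondence between $\lambda$ and $t$ is generally not explicit, which is presumably why the papers state the equivalence without proof.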

• I know how to prove the equivalence between the ridge regression formulations (since the penalty is differentiable), but I'm wondering whether it would be any easier to consider only the $\ell_1$ norm instead of the full elastic net penalty? — Commented Sep 3, 2013 at 15:00
