
According to Zou and Hastie's paper, the elastic net has two equivalent formulations:

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i-\sum_{j=1}^p x_{ij} \beta_j\right)^2 + \lambda_1 \sum_{j=1}^p |\beta_j|+ \lambda_2 \sum_{j=1}^p \beta_j^2 \right\}$$

and

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i - \sum_{j=1}^p x_{ij} \beta_j\right)^2\right\} \;\text{ s.t. } \;(1-\alpha)\sum_{j=1}^p |\beta_j| + \alpha\sum_{j=1}^p \beta_j^2 \leq t$$

where $\alpha = \frac{\lambda_2}{\lambda_1 + \lambda_2}$.
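The correspondence is easy to check numerically before trying to prove it. Here is a minimal sketch (assuming `cvxpy` is available; the data and the values of $\lambda_1, \lambda_2$ are arbitrary illustration choices): solve the penalized form, set $t$ to the penalty value attained by that solution, re-solve the constrained form, and compare.

```python
# Numerical check of the penalized/constrained correspondence (sketch;
# assumes cvxpy is installed; data and penalty weights are arbitrary).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 10
X = rng.standard_normal((N, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(N)

lam1, lam2 = 1.0, 2.0  # lambda_1, lambda_2 in the penalized form
beta = cp.Variable(p)
loss = cp.sum_squares(X @ beta - y)

# Penalized (Lagrangian) form.
cp.Problem(cp.Minimize(loss + lam1 * cp.norm1(beta)
                       + lam2 * cp.sum_squares(beta))).solve()
beta_pen = beta.value

# Constrained form with alpha = lam2 / (lam1 + lam2) and t set to the
# penalty value attained by the penalized solution.
alpha = lam2 / (lam1 + lam2)
t = (1 - alpha) * np.abs(beta_pen).sum() + alpha * (beta_pen**2).sum()
cp.Problem(cp.Minimize(loss),
           [(1 - alpha) * cp.norm1(beta)
            + alpha * cp.sum_squares(beta) <= t]).solve()

print(np.max(np.abs(beta.value - beta_pen)))  # ~0 up to solver tolerance
```

With $\lambda_2 > 0$ the penalized objective is strictly convex, so both problems have a unique solution and the printed gap is only solver tolerance.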

My question is how to prove this equivalence formally. Ridge regression and the lasso also admit these two formulations, but I could not find any reference where the equivalence is proven. A similar question I found on Cross Validated is this one:

Lagrangian relaxation in the context of ridge regression

but I'm unable to understand Tristan's explanation there. I have some understanding of Lagrangian optimization theory, and I suspect the answer lies along those lines, but since all the papers treat the equivalence as obvious, I would like a proper reference where it is demonstrated explicitly.

• This question on just Ridge Regression is highly related. — Commented Nov 10, 2018 at 7:06

1 Answer


Starting from $$\hat{\beta} = \arg \min_\beta \|X\beta - y\|_2^2 \;\text{ s.t. }\; (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 \leq t,$$ we can write the Lagrangian of this constrained optimization problem, with multiplier $\lambda \geq 0$, as $$ \begin{array}{rcl} L(\beta,\lambda) & = & \|X\beta - y\|_2^2 + \lambda \left( (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 - t\right) \\ & = & \|X\beta - y\|_2^2 + \lambda (1-\alpha)\|\beta\|_1 + \lambda\alpha\|\beta\|_2^2 - \lambda t, \end{array} $$ and, since the constant $-\lambda t$ does not affect the minimizer in $\beta$, this is indeed the first problem you wrote, with parameters $\lambda_1 = \lambda(1-\alpha)$ and $\lambda_2 = \lambda\alpha$, which leads to the expression for the elastic-net mixing parameter: $$\alpha = \frac{\lambda_2}{\lambda_1+\lambda_2}.$$ That being said, to go from this point to Zou and Hastie's assertion that both problems are equivalent, I admit that I am missing a step or two; see the sketch below.
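A sketch of how that gap is usually closed, via standard convex duality, writing $P(\beta) = (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2$ for the elastic net penalty:

Constrained $\Rightarrow$ penalized: for $t > 0$, the point $\beta = 0$ satisfies $P(0) = 0 < t$, so Slater's condition holds for this convex problem. Strong duality then yields a multiplier $\lambda^\star \geq 0$ such that any solution $\hat{\beta}$ of the constrained problem also minimizes $\|X\beta - y\|_2^2 + \lambda^\star P(\beta)$, i.e., solves the penalized problem with $\lambda_1 = \lambda^\star(1-\alpha)$ and $\lambda_2 = \lambda^\star\alpha$.

Penalized $\Rightarrow$ constrained: let $\hat{\beta}$ minimize $\|X\beta - y\|_2^2 + \lambda P(\beta)$ and set $t = P(\hat{\beta})$. For any feasible $\beta$, i.e., any $\beta$ with $P(\beta) \leq t$, optimality of $\hat{\beta}$ in the penalized problem gives $$\|X\beta - y\|_2^2 \;\geq\; \|X\hat{\beta} - y\|_2^2 + \lambda\left(P(\hat{\beta}) - P(\beta)\right) \;\geq\; \|X\hat{\beta} - y\|_2^2,$$ so $\hat{\beta}$ also solves the constrained problem with that choice of $t$. The correspondence between $\lambda$ and $t$ is generally not explicit, which is presumably why the papers state the equivalence without proof.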

• I know how to prove the equivalence between the ridge regression formulations (since the penalty is differentiable), but I'm wondering whether it would be any easier to consider only the $\ell_1$ norm instead of the full elastic net penalty? — Commented Sep 3, 2013 at 15:00
