How does the Lasso estimator behave when it is applied to a dataset with more predictors (p) than observations (n), where all predictors are mutually uncorrelated but highly relevant to y, each with exactly the same correlation with y? Which predictors does the Lasso shrink to zero, and which does it retain?
A consistent estimator would not shrink any of the p coefficients to zero. However, as I understand it, the Lasso selects at most n predictors when p > n. My question is: under these conditions, which predictors does the Lasso select, and why?
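
To make the setup concrete, here is a minimal simulation sketch of the situation I have in mind. Several things in it are my own assumptions and not part of any standard recipe: the predictors are independent Gaussian draws (so they are uncorrelated only in the population, since p > n columns cannot all be exactly uncorrelated in a sample of size n), every true coefficient is set to 1 as a stand-in for "the same correlation with y", the penalty is fixed at half of the smallest value that zeroes out every coefficient, and the helper names `soft_threshold`, `lasso_cd` and `simulate` are hypothetical. The solver is a plain cyclic coordinate descent written for illustration, not a library implementation.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=1000):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent (illustrative, not optimised)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Partial correlation of column j with the residual,
            # with column j's own contribution added back in.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def simulate(n=15, p=40, seed=0):
    """Return the indices of predictors with nonzero Lasso coefficients."""
    rng = random.Random(seed)
    # Assumption: independent standard-normal predictors.
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Assumption: every true coefficient equals 1, plus small noise.
    y = [sum(row) + rng.gauss(0, 0.1) for row in X]
    # Smallest penalty at which all coefficients are zero.
    lam_max = max(
        abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p)
    )
    beta = lasso_cd(X, y, 0.5 * lam_max)
    return [j for j, b in enumerate(beta) if b != 0.0]


selected = simulate()
print(len(selected), "of 40 predictors retained:", selected)
```

Changing `seed` changes which predictors are retained, even though all of them are constructed to be equally relevant, and this is the behaviour I would like to understand.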