I am fitting an L1-regularized linear regression to a very large dataset (n >> p). The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.
I can obviously re-fit the entire model from scratch after each new set of observations. However, this would be quite inefficient given how much data there is: the amount of new data arriving at each step is very small, and the fit is unlikely to change much between steps.
Is there anything I can do to reduce the overall computational burden?
I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.
Notes:
- I am mainly looking for an algorithm, but pointers to existing software packages that can do this may also prove insightful.
- In addition to the current lasso trajectories, the algorithm is of course welcome to keep other state.
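For concreteness, here is a rough sketch of the kind of warm-started update I have in mind (not a proposed solution, just an illustration). Since n >> p, the sufficient statistics X'X and X'y can be maintained incrementally, and a coordinate-descent lasso solve can be restarted from the previous coefficients after each chunk. The `lasso_cd` helper, the chunk sizes, and the 0.01 penalty are all purely illustrative choices of mine:

```python
import random

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(G, c, n, lam, beta, n_iter=50):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1,
    written in terms of the sufficient statistics G = X'X and c = X'y.
    `beta` is the starting point, so passing the previous fit warm-starts
    the solve; each sweep costs O(p^2), independent of n."""
    p = len(c)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual correlation for coordinate j
            rho = c[j] - sum(G[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(rho / n, lam) / (G[j][j] / n)
    return beta

# Simulated stream: chunks of observations from a sparse linear model.
random.seed(0)
p = 3
true_beta = [2.0, 0.0, -1.0]
G = [[0.0] * p for _ in range(p)]   # running X'X
c = [0.0] * p                       # running X'y
n = 0
beta = [0.0] * p                    # state carried across chunks

for chunk in range(20):
    for _ in range(10):  # a small chunk of new observations
        x = [random.gauss(0, 1) for _ in range(p)]
        y = sum(xj * bj for xj, bj in zip(x, true_beta)) + random.gauss(0, 0.1)
        for j in range(p):
            c[j] += x[j] * y
            for k in range(p):
                G[j][k] += x[j] * x[k]
        n += 1
    # refresh the fit, warm-starting from the previous coefficients
    beta = lasso_cd(G, c, n, lam=0.01, beta=beta)

print(beta)  # should be close to true_beta, with beta[1] shrunk toward 0
```

Even so, this still re-solves over all accumulated (summarized) data at every step, which is why I am asking whether something smarter is possible, e.g. along a LARS-style path.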
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004). "Least Angle Regression" (with discussion). Annals of Statistics 32(2), 407–499.