I am fitting an L1-regularized linear regression to a very large dataset (with n >> p). The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

Notes:

  1. I am mainly looking for an algorithm, but pointers to existing software packages that can do this may also prove useful.
  2. In addition to the current lasso trajectories, the algorithm is of course welcome to keep other state.

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression, Annals of Statistics (with discussion) (2004) 32(2), 407--499.
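One way to sketch the warm-start idea for n >> p: because the lasso objective depends on the data only through the p×p Gram matrix X'X and the vector X'y, each new chunk is a cheap rank update of those sufficient statistics, and a coordinate-descent solver can then be restarted from the previous coefficients rather than from scratch. The class below is a minimal illustration of that scheme (the name `IncrementalLasso`, the penalty convention lam·||b||_1 on the summed squared loss, and the sweep/tolerance parameters are my own choices, not from any particular package):

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used in coordinate descent for the lasso.
    return np.sign(z) * max(abs(z) - t, 0.0)

class IncrementalLasso:
    """Hypothetical warm-started lasso for streaming observations (n >> p).

    Maintains the sufficient statistics G = X'X (p x p) and c = X'y (p,),
    which fully determine the lasso objective. Each new chunk is absorbed
    as a rank update; coordinate descent then restarts from the previous
    coefficients, so only a few sweeps are typically needed per chunk.
    """

    def __init__(self, p, lam):
        self.G = np.zeros((p, p))   # running X'X
        self.c = np.zeros(p)        # running X'y
        self.beta = np.zeros(p)     # current coefficients (the warm start)
        self.lam = lam              # L1 penalty weight on the summed loss

    def update(self, X_chunk, y_chunk, n_sweeps=50, tol=1e-8):
        # Absorb the new observations into the sufficient statistics.
        self.G += X_chunk.T @ X_chunk
        self.c += X_chunk.T @ y_chunk
        # Coordinate descent on 0.5*b'Gb - c'b + lam*||b||_1,
        # warm-started from the previous solution.
        for _ in range(n_sweeps):
            max_delta = 0.0
            for j in range(len(self.beta)):
                gjj = self.G[j, j]
                if gjj == 0.0:
                    continue
                # Partial residual correlation for coordinate j.
                rho = self.c[j] - self.G[j] @ self.beta + gjj * self.beta[j]
                new_bj = soft_threshold(rho, self.lam) / gjj
                max_delta = max(max_delta, abs(new_bj - self.beta[j]))
                self.beta[j] = new_bj
            if max_delta < tol:
                break
        return self.beta
```

Per chunk this costs O(m·p²) for the Gram update (m = chunk size) plus a few O(p²) coordinate sweeps, independent of the total n seen so far; with n >> p that is far cheaper than re-fitting on all accumulated rows.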

I am fitting an L1-regularized linear regression to a very large dataset (n>>p.) The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression, Annals of Statistics (with discussion) (2004) 32(2), 407--499.

I am fitting an L1-regularized linear regression to a very large dataset (with n>>p.) The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

Notes:

  1. I am mainly looking for an algorithm, but pointers to existing software packages that can do this may also prove insightful.
  2. In addition to the current lasso trajectories, the algorithm is of course welcome to keep other state.

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression, Annals of Statistics (with discussion) (2004) 32(2), 407--499.

add link to LARS article
Source Link
chl
  • 55.4k
  • 23
  • 235
  • 411

I am fitting an L1-regularized linear regression to a very large dataset (n>>p.) The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of [1]Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

[1] Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression Annals of Statistics (with discussion) (2004) 32(2), 407--499.

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression, Annals of Statistics (with discussion) (2004) 32(2), 407--499.

I am fitting an L1-regularized linear regression to a very large dataset (n>>p.) The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of [1], but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

[1] Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression Annals of Statistics (with discussion) (2004) 32(2), 407--499.

I am fitting an L1-regularized linear regression to a very large dataset (n>>p.) The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.

I can obviously re-fit the entire model after seeing each new set of observations. This, however, would be pretty inefficient given that there is a lot of data. The amount of new data that arrives at each step is very small, and the fit is unlikely to change much between steps.

Is there anything I can do to reduce the overall computational burden?

I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani, Least Angle Regression, Annals of Statistics (with discussion) (2004) 32(2), 407--499.

edited tags
Link
user88
user88
added 7 characters in body
Source Link
NPE
  • 5.7k
  • 6
  • 39
  • 45
Loading
Source Link
NPE
  • 5.7k
  • 6
  • 39
  • 45
Loading