I am fitting an L1-regularized linear regression to a very large dataset (n >> p). The variables are known in advance, but the observations arrive in small chunks. I would like to maintain the lasso fit after each chunk.
I can obviously re-fit the entire model from scratch after each new set of observations. However, this would be quite inefficient given how much data there is: the amount of new data arriving at each step is very small, and the fit is unlikely to change much between steps.
Is there anything I can do to reduce the overall computational burden?
I was looking at the LARS algorithm of Efron et al., but would be happy to consider any other fitting method if it can be made to "warm-start" in the way described above.
Notes:
- I am mainly looking for an algorithm, but pointers to existing software packages that can do this may also prove insightful.
- In addition to the current lasso trajectories, the algorithm is of course welcome to keep other state.
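For concreteness, here is a rough sketch of the kind of warm-started update I have in mind (not a proposed solution, just an illustration). Since n >> p, the sufficient statistics X'X and X'y can be maintained incrementally, and a coordinate-descent lasso solve can be restarted from the previous coefficients after each chunk. The `lasso_cd` helper, the chunk sizes, and the 0.01 penalty are all purely illustrative choices of mine:

```python
import random

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(G, c, n, lam, beta, n_iter=50):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1,
    written in terms of the sufficient statistics G = X'X and c = X'y.
    `beta` is the starting point, so passing the previous fit warm-starts
    the solve; each sweep costs O(p^2), independent of n."""
    p = len(c)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual correlation for coordinate j
            rho = c[j] - sum(G[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(rho / n, lam) / (G[j][j] / n)
    return beta

# Simulated stream: chunks of observations from a sparse linear model.
random.seed(0)
p = 3
true_beta = [2.0, 0.0, -1.0]
G = [[0.0] * p for _ in range(p)]   # running X'X
c = [0.0] * p                       # running X'y
n = 0
beta = [0.0] * p                    # state carried across chunks

for chunk in range(20):
    for _ in range(10):  # a small chunk of new observations
        x = [random.gauss(0, 1) for _ in range(p)]
        y = sum(xj * bj for xj, bj in zip(x, true_beta)) + random.gauss(0, 0.1)
        for j in range(p):
            c[j] += x[j] * y
            for k in range(p):
                G[j][k] += x[j] * x[k]
        n += 1
    # refresh the fit, warm-starting from the previous coefficients
    beta = lasso_cd(G, c, n, lam=0.01, beta=beta)

print(beta)  # should be close to true_beta, with beta[1] shrunk toward 0
```

Even so, this still re-solves over all accumulated (summarized) data at every step, which is why I am asking whether something smarter is possible, e.g. along a LARS-style path.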
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004). "Least Angle Regression" (with discussion). Annals of Statistics 32(2), 407–499.