Glmnet - Introduction

Here is a brief introduction to the package. The best way to get started is to work through the Jupyter example file created to demonstrate its features; it should serve as the primary guide to using the package.

Additional detailed documentation and usage information are provided within each Python function.

Description

Suppose X is the input matrix, with N observations (rows x_i) and p predictors, and Y is the response vector (entries y_i). For the Gaussian family, glmnet solves the penalized residual sum of squares problem,

 $$\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} \frac{1}{2N} \sum_{i=1}^N (y_i - \beta_0 - x_i^T \beta)^2 + \lambda \big[ (1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1 \big],$$

where $\lambda \geq 0$ is a complexity parameter and $0 \leq \alpha \leq 1$ controls the compromise between the ridge and lasso penalties. The problem reduces to the lasso when $\alpha = 1$ and to ridge regression when $\alpha = 0$.
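To make the objective concrete, here is a minimal NumPy sketch that evaluates it for given coefficients. The function name elastic_net_objective and the toy data are illustrative assumptions and are not part of the package.

```python
import numpy as np

def elastic_net_objective(X, y, beta0, beta, lam, alpha):
    """Gaussian-family objective: RSS / (2N) plus the elastic-net penalty.

    Hypothetical helper for illustration only; not a glmnet function.
    """
    n = X.shape[0]
    residual = y - beta0 - X @ beta
    loss = residual @ residual / (2.0 * n)
    # (1 - alpha) weights the ridge term, alpha weights the lasso term.
    penalty = lam * ((1.0 - alpha) * 0.5 * (beta @ beta)
                     + alpha * np.abs(beta).sum())
    return loss + penalty

# Toy data chosen so that the residuals are exactly zero.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

lasso_value = elastic_net_objective(X, y, 0.0, beta, lam=0.5, alpha=1.0)
ridge_value = elastic_net_objective(X, y, 0.0, beta, lam=0.5, alpha=0.0)
print(lasso_value, ridge_value)
```

With zero residuals only the penalty remains, so the two values show how alpha switches between the L1 norm (lasso) and half the squared L2 norm (ridge).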

For other families, glmnet maximizes the appropriate penalized log-likelihood (partial likelihood for the Cox model) or, equivalently, minimizes the penalized negative log-likelihood. For the binomial model, for example, it solves

 $$\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} -\frac{1}{N} \sum_{i=1}^N \Big[ y_i \cdot (\beta_0 + x_i^T \beta) - \log \big(1+e^{\beta_0+x_i^T \beta}\big) \Big] + \lambda \big[ (1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1 \big].$$
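The binomial objective can be evaluated in the same way. The sketch below assumes y is coded as 0/1; the name binomial_objective and the toy data are hypothetical and are not taken from the package.

```python
import numpy as np

def binomial_objective(X, y, beta0, beta, lam, alpha):
    """Penalized negative binomial log-likelihood, averaged over N.

    Hypothetical helper for illustration; assumes y contains 0/1 labels.
    """
    n = X.shape[0]
    eta = beta0 + X @ beta
    # np.logaddexp(0, eta) computes log(1 + e^eta) without overflow.
    log_lik = (y * eta - np.logaddexp(0.0, eta)).sum() / n
    penalty = lam * ((1.0 - alpha) * 0.5 * (beta @ beta)
                     + alpha * np.abs(beta).sum())
    return -log_lik + penalty

X = np.array([[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8], [0.2, 0.1]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# With all coefficients at zero, every fitted probability is 1/2,
# so the objective equals log(2) regardless of lam and alpha.
null_value = binomial_objective(X, y, 0.0, np.zeros(2), lam=0.1, alpha=0.5)
print(null_value)
```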

The algorithm uses cyclical coordinate descent in a pathwise fashion. In addition to basic settings, many more options are available: observation weights, choice of lambda sequence, grouping, etc. For more information, see the reference papers, help file or the documentation (in progress).
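As a rough illustration of pathwise cyclical coordinate descent, the sketch below solves the Gaussian problem over a decreasing sequence of lambda values, warm-starting each fit from the previous solution. It is a simplified NumPy sketch and not the package's implementation: the names soft_threshold, coordinate_descent and fit_path, the log-spaced lambda grid and the stopping rule are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def coordinate_descent(X, y, lam, alpha, beta0, beta, n_sweeps=200, tol=1e-10):
    """Cycle over coordinates until no coefficient moves by more than tol."""
    n, p = X.shape
    beta = beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n          # (1/N) * x_j^T x_j
    for _ in range(n_sweeps):
        beta0 = np.mean(y - X @ beta)          # unpenalized intercept update
        resid = y - beta0 - X @ beta
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of x_j with the partial residual (x_j's own fit added back).
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta0, beta

def fit_path(X, y, alpha=1.0, n_lambda=20, ratio=1e-3):
    """Solve along a decreasing lambda grid with warm starts (requires alpha > 0)."""
    n, p = X.shape
    # Smallest lambda at which all coefficients are zero.
    lam_max = np.max(np.abs(X.T @ (y - y.mean()))) / (n * alpha)
    lambdas = lam_max * np.logspace(0.0, np.log10(ratio), n_lambda)
    beta0, beta = y.mean(), np.zeros(p)
    path = []
    for lam in lambdas:
        beta0, beta = coordinate_descent(X, y, lam, alpha, beta0, beta)
        path.append((lam, beta0, beta.copy()))
    return path

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)
path = fit_path(X, y)
for lam, b0, b in path[::5]:
    print(round(lam, 4), np.round(b, 3))
```

Starting at the largest lambda, where all coefficients are zero, and reusing each solution as the starting point for the next makes every individual fit cheap, which is the motivation for computing the whole path rather than a single fit.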

Two central functions of the package are:

  • glmnet.py - primary fitting function that returns a dictionary containing all essential information for further use (e.g. printing, plotting and prediction).

  • cvglmnet.py - a more commonly used function that returns a dictionary after selecting the tuning parameter by cross-validation.
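The idea of selecting the tuning parameter by cross-validation can be sketched as follows. This is not how cvglmnet.py is implemented: to stay self-contained, the example swaps in closed-form ridge regression (alpha = 0) as the fitter, and the names ridge_fit and cross_validate, the lambda grid and the fold assignment are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Stand-in fitter (alpha = 0): closed-form ridge with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, p = X.shape
    beta = np.linalg.solve(Xc.T @ Xc / n + lam * np.eye(p), Xc.T @ yc / n)
    return y_mean - x_mean @ beta, beta

def cross_validate(X, y, lambdas, n_folds=5, seed=0):
    """Return the lambda with the lowest mean held-out squared error."""
    n = X.shape[0]
    # Random, balanced assignment of each observation to one fold.
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    errors = np.zeros((n_folds, len(lambdas)))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for i, lam in enumerate(lambdas):
            beta0, beta = ridge_fit(X[train], y[train], lam)
            errors[k, i] = np.mean((y[test] - beta0 - X[test] @ beta) ** 2)
    mean_error = errors.mean(axis=0)
    return lambdas[int(np.argmin(mean_error))], mean_error

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=60)
lambdas = np.logspace(-3, 1, 10)
best_lam, cv_error = cross_validate(X, y, lambdas)
print(best_lam, np.round(cv_error, 4))
```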

Example

We give a simple example here just to point the way. Further exploration is possible through the help files or the illustrative documentation. Note that the functions must be imported before use, and all data are required to be scipy objects.

Suppose x is the input matrix and y the response vector. Then,

  • fit = glmnet(x, y) – fits the model under all default settings; the returned dictionary fit stores all necessary information.

  • glmnetPrint(fit) – prints relevant information about the fitted object (here, fit).

  • glmnetPlot(fit) – plots the coefficients from the fitted object.

  • cvfit = cvglmnet(x, y) – fits the model by cross-validation under all default settings, with results saved in cvfit.

  • cvglmnetPlot(cvfit) – plots the cross-validation curve.

List of Major Functions

  • cvglmnet.py

    • cross-validation for glmnet

  • cvglmnetCoef.py

    • extract the coefficients from a 'cvglmnet' returned dictionary

  • cvglmnetPlot.py

    • plot the cross-validation curve produced by cvglmnet.py

  • cvglmnetPredict.py

    • make predictions from a 'cvglmnet' dict

  • glmnet.py

    • fit a GLM with lasso or elastic-net regularization

  • glmnetCoef.py

    • extract the coefficients from a 'glmnet' dict

  • glmnetControl.py

    • internal glmnet parameters

  • glmnetPlot.py

    • plot coefficients from a 'glmnet' dict

  • glmnetPredict.py

    • make predictions from a 'glmnet' dict

  • glmnetPrint.py

    • print a 'glmnet' dict

  • glmnetSet.py

    • create or alter an options structure for glmnet.py