Ordinary Least Squares (OLS) using statsmodels

Ordinary Least Squares (OLS) is a method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals. The statsmodels library in Python fits linear models with OLS and reports detailed statistics about the fit.
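To make the idea concrete before turning to statsmodels, here is a minimal sketch of a least-squares fit using only numpy. The data, the true intercept (1.0) and the true slope (2.0) are made up for illustration; np.linalg.lstsq finds the coefficients that minimize the squared error between y and the fitted line.

```python
import numpy as np

# Hypothetical data: y = 1.0 + 2.0 * x plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

# Design matrix: a column of ones (intercept) next to the predictor
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the beta that minimizes ||y - A @ beta||^2
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta
print(intercept, slope)
```

The estimates should land close to the values used to generate the data. statsmodels solves the same problem and adds standard errors, p-values, and other diagnostics on top.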

Here's a step-by-step guide on how to perform OLS regression using statsmodels:

  1. Setup: First, ensure you have numpy, pandas, and statsmodels installed:

    pip install numpy pandas statsmodels 
  2. Perform OLS Regression:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Sample data
    np.random.seed(42)  # for reproducibility
    X = 2.5 * np.random.randn(100) + 1.5  # Array of 100 values with mean = 1.5, stddev = 2.5
    res = 0.5 * np.random.randn(100)  # Generate 100 noise (error) terms
    y = 2 + 0.3 * X + res  # Actual values of y

    # Convert X to a DataFrame so the summary labels the coefficient by column name
    df = pd.DataFrame({'X': X})

    # Add a constant column to the model (i.e., bias or intercept)
    X_sm = sm.add_constant(df)

    # Model: y ~ X + c
    model = sm.OLS(y, X_sm)
    results = model.fit()

    # Print out the statistics
    print(results.summary())

The summary() method returns a full report of the regression output, including the intercept, the coefficient for each predictor, their standard errors and p-values, the R-squared value, and several diagnostic statistics.

  3. Making Predictions:

    After fitting the OLS model, you can make predictions on new data using the predict() method:

    new_X = [[1, 3], [1, 4], [1, 5]]  # 1 is for constant term
    predictions = results.predict(new_X)
    print(predictions)

  4. Interpreting the Output:

    • coef: Estimated coefficient for the predictor(s).
    • std err: Standard error of the estimate of the coefficient.
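    • t: t-statistic, i.e. the coefficient divided by its standard error. Larger absolute values are stronger evidence that the coefficient differs from zero.
    • P>|t|: Two-sided p-value of the t-statistic, for the null hypothesis that the coefficient is zero.
    • [0.025 0.975]: Lower and upper bounds of the 95% confidence interval of the coefficient.
    • R-squared: Proportion of the variance in the response that is explained by the model (between 0 and 1).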

When working with real-world datasets, it's crucial to perform necessary preprocessing steps like handling missing values, encoding categorical variables, scaling features, etc., before applying OLS regression.

