
I'm getting to know Python regression tools, with the intention of benchmarking them against the ML tools available on a couple of cloud-based services. I'm using the Boston dataset distributed with scikit-learn, and am testing both statsmodels OLS and scikit-learn LinearRegression. The two models are identical: no intercept (it's not clear to me why this fits better, but that's what I'm seeing), the same two IVs plus an interaction term, and the same DV. The two fits give the same beta coefficients on the independent variables. The only difference I'm noticing is in the R^2 values.
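(For reproducibility: the `RMxLSTAT` interaction column is, I assume, just the elementwise product of the two predictors. A minimal sketch with a hypothetical stand-in DataFrame, since the original `bos` comes from scikit-learn's Boston loader:)

```python
import pandas as pd

# hypothetical stand-in for the Boston data; the post builds `bos` from scikit-learn
bos = pd.DataFrame({'RM': [6.5, 5.9, 7.1], 'LSTAT': [4.9, 9.1, 4.0]})

# interaction term as the elementwise product of the two predictors
bos['RMxLSTAT'] = bos['RM'] * bos['LSTAT']
```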

#statsmodels
X = bos[['RM', 'LSTAT', 'RMxLSTAT']]
y = target['MEDV']
model = sm.OLS(y, X).fit()
predictions = model.predict(X)
model.summary()

Gives

  • RM 5.3906
  • LSTAT 0.9631
  • RMxLSTAT -0.306
  • R-squared 0.957
#scikit-learn
X = bos[['RM', 'LSTAT', 'RMxLSTAT']]
y = bos[['MEDV']]
bos_linreg = LinearRegression(fit_intercept=False)
bos_linreg.fit(X, y)
print(f'Coefficients: {bos_linreg.coef_}')
print(f'Intercept: {bos_linreg.intercept_}')
print(f'R^2 score: {bos_linreg.score(X, y)}')

Gives

  • Coefficients: [[ 5.39059216 0.96309233 -0.30632371]]
  • Intercept: 0.0
  • R^2 score: 0.7009604508111584

I've seen similar questions posted, but haven't seen an answer that applies. What am I missing?

Thanks, community.

P.S. Random: Why won't code formatting work for the second code block?

  • Is the intercept fit with the statsmodels code? Commented Jun 17, 2019 at 17:18
  • @Noah - No intercept on either version of the model. Commented Jun 17, 2019 at 18:02

2 Answers


A reproducible example is always appreciated.

There are several definitions of R^2 in use, and the standard ones give identical results when the model has an intercept. When there is no intercept, as the output below shows, the definitions diverge:

import numpy as np
from sklearn.linear_model import LinearRegression
import pandas as pd
import statsmodels.api as sm
from numpy.testing import assert_allclose

n = 100
for intercept in False, True:
    X = np.random.normal(size=(n, 3))
    if intercept:
        print("\nWith intercept:")
        X[:, 0] = 1
    else:
        print("Without intercept:")
    y = 2 + X[:, 0] - X[:, 1] + np.random.normal(size=n)
    sm_model = sm.OLS(y, X).fit()
    sm_predictions = sm_model.predict(X)
    print("Statsmodels:      ", sm_model.rsquared)
    print("Corr(y, y_hat)^2: ", np.corrcoef(y, sm_predictions)[0, 1]**2)
    centered_resid = sm_model.resid - sm_model.resid.mean()
    centered_y = y - y.mean()
    print("1 - ssr_u/tss_c=  ", 1 - np.sum(sm_model.resid**2) / np.sum(centered_y**2))
    print("1 - ssr_c/tss_c=  ", 1 - np.sum(centered_resid**2) / np.sum(centered_y**2))
    print("1 - ssr_u/tss_u=  ", 1 - np.sum(sm_model.resid**2) / np.sum(y**2))
    sk_model = LinearRegression(fit_intercept=False)
    sk_model.fit(X, y)
    print("sklearn:          ", sk_model.score(X, y))
    assert_allclose(sm_model.params, sk_model.coef_)

Below is the result:

Without intercept:
Statsmodels:       0.14261994999123073
Corr(y, y_hat)^2:  0.43150433543477057
1 - ssr_u/tss_c=   -1.266391530932561
1 - ssr_c/tss_c=   0.4296360038850652
1 - ssr_u/tss_u=   0.14261994999123062
sklearn:           -1.266391530932561

With intercept:
Statsmodels:       0.5664929089923038
Corr(y, y_hat)^2:  0.5664929089923038
1 - ssr_u/tss_c=   0.5664929089923038
1 - ssr_c/tss_c=   0.5664929089923039
1 - ssr_u/tss_u=   0.9268815196720518
sklearn:           0.5664929089923038
  • I'm sorry it wasn't clear how to reproduce. You can load the Boston dataset distributed with scikit-learn with `from sklearn import datasets; boston = datasets.load_boston()` to run these models. Commented Jun 23, 2019 at 22:02

I've learned that R^2 just isn't a useful measure of fit for a model with no intercept. cardinal does a great job of explaining why here:

Removal of statistically significant intercept term increases $R^2$ in linear model

Thanks, @cardinal!

