When is R squared negative? [duplicate]

Question

My understanding is that $R^2$ cannot be negative, as it is the square of R. However, I ran a simple linear regression in SPSS with a single independent variable and a dependent variable, and my SPSS output gives me a negative value for $R^2$. If I were to calculate this by hand from R, then $R^2$ would be positive. What has SPSS done to calculate this as negative?

R = -.395, R squared = -.156, B (unstandardized) = -1261.611

Code I've used:

```
DATASET ACTIVATE DataSet1.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT valueP
  /METHOD=ENTER ageP.
```

I get a negative value. Can anyone explain what this means?

[Screenshot: "Negative RSquared" (SPSS output)]

Does this answer your question? stats.stackexchange.com/questions/6181/… If not, then please provide more information: this is the "SPSS output" of what procedure?
– whuber ♦, Commented Jul 11, 2011 at 17:14
@Anne I suggest you disregard the time series reply, because your data are not time series and you're not using a time series procedure. Are you really sure the R squared is given as a negative value? Its magnitude is correct: $(-0.395)^2=0.156$. I have looked through SPSS help to see whether perhaps as a convention the R-squared value for negative R's is negated, but I don't see any evidence that this is the case. Perhaps you could post a screenshot of the output where you are reading the R-squared?
– whuber ♦, Commented Jul 11, 2011 at 20:26
@Anne There's nothing the matter with large standard errors: they merely reflect the units in which the dependent variable is measured. However, it is possible the strange results arise from numerical instabilities. Sometimes it helps to re-express the data in a way that reduces the potential effects of floating point error. In this case, the stats suggest you should compute y = (valueP - 100000)/1000 and try again to regress y against ageP. Do you still get a negative R square?
– whuber ♦, Commented Jul 18, 2011 at 12:41
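In exact arithmetic, the re-expression suggested above cannot change $R^2$ for a regression that includes an intercept: shifting and rescaling $y$ shifts and rescales the fitted values in the same way, so $SS_\text{res}/SS_\text{tot}$ is unchanged. A different result after re-running would therefore point to numerical trouble rather than to the model. The sketch below checks this with NumPy on simulated data; the sample size, coefficients and noise level are made up (they are not Anne's data), and `r_squared` and `ols_with_intercept` are hypothetical helper names.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def ols_with_intercept(x, y):
    """Fit y = a + b*x by least squares and return the fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

rng = np.random.default_rng(0)
ageP = rng.uniform(20, 80, size=200)  # hypothetical ages
valueP = 150_000 - 1_200 * ageP + rng.normal(0, 30_000, size=200)  # hypothetical values

# R^2 on the raw scale
r2_raw = r_squared(valueP, ols_with_intercept(ageP, valueP))

# R^2 after the suggested re-expression y = (valueP - 100000) / 1000
y = (valueP - 100_000) / 1_000
r2_rescaled = r_squared(y, ols_with_intercept(ageP, y))

print(r2_raw, r2_rescaled)  # equal up to floating-point rounding
```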

Community · Accepted Answer · 2022-04-15 05:07:03Z

$R^2$ compares the fit of the chosen model with that of a horizontal straight line (the null hypothesis). If the chosen model fits worse than a horizontal line, then $R^2$ is negative. Note that $R^2$ is not always the square of anything, so it can have a negative value without violating any rules of math. $R^2$ is negative only when the chosen model does not follow the trend of the data, and so fits worse than a horizontal line.

Example: fit data to a linear regression model constrained so that the $Y$ intercept must equal $1500$.


The model makes no sense at all given these data. It is clearly the wrong model, perhaps chosen by accident.

The fit of the model (a straight line constrained to go through the point (0,1500)) is worse than the fit of a horizontal line. Thus the sum-of-squares from the model $(SS_\text{res})$ is larger than the sum-of-squares from the horizontal line $(SS_\text{tot})$.

$R^2$ is computed as $1 - \frac{SS_\text{res}}{SS_\text{tot}}$, where $SS_\text{res}$ is the residual sum of squares of the model and $SS_\text{tot}$ is the total sum of squares about the mean (the horizontal line).
When $SS_\text{res}$ is greater than $SS_\text{tot}$, the ratio $\frac{SS_\text{res}}{SS_\text{tot}}$ is greater than 1, so $R^2$ is negative.
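A small NumPy sketch of the constrained fit described above. The figure is not reproduced here, so the data are invented; the intercept is fixed at 1500 and only the slope is estimated by least squares, which for $y = 1500 + bx$ gives $b = \sum x_i(y_i - 1500) / \sum x_i^2$. For comparison, the unconstrained line is fitted to the same data with `np.polyfit`. The helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Invented data with a clear upward trend, far below 1500
x = np.arange(1.0, 11.0)
y = np.array([12., 19., 31., 42., 48., 61., 69., 82., 88., 101.])

# Constrained model y = 1500 + b*x: only the slope b is estimated
fixed_intercept = 1500.0
b = np.sum(x * (y - fixed_intercept)) / np.sum(x ** 2)
y_hat_constrained = fixed_intercept + b * x

# Unconstrained model y = a + b*x: polyfit returns [slope, intercept]
b_free, a_free = np.polyfit(x, y, 1)
y_hat_free = a_free + b_free * x

r2_constrained = r_squared(y, y_hat_constrained)
r2_free = r_squared(y, y_hat_free)
print(f"constrained R^2 = {r2_constrained:.1f}")  # far below zero
print(f"free R^2        = {r2_free:.3f}")         # between 0 and 1
```

Because the constrained line starts far above every data point, $SS_\text{res}$ dwarfs $SS_\text{tot}$ and $R^2$ ends up well below zero, while the unconstrained fit on the same data stays between 0 and 1.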

With unconstrained linear regression, $R^2$ must be positive (or zero) and equals the square of the correlation coefficient, $r$. A negative $R^2$ is only possible with linear regression when either the intercept or the slope is constrained so that the "best-fit" line (given the constraint) fits worse than a horizontal line. With nonlinear regression, $R^2$ can be negative whenever the best-fit model (given the chosen equation and its constraints, if any) fits the data worse than a horizontal line.
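To illustrate the claim that unconstrained simple linear regression gives $R^2 = r^2$, here is a sketch on simulated data (the true slope, noise level and seed are arbitrary assumptions). The correlation $r$ carries the sign of the slope, so a negative correlation such as the question's $-.395$ still gives a non-negative $R^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 - 0.8 * x + rng.normal(size=50)  # negative true slope (assumption)

# Ordinary least squares with both slope and intercept free
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]

# With a negative true slope r is typically negative, yet r**2 and R^2 agree and are >= 0
print(r, r ** 2, r2)
```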

Bottom line: a negative $R^2$ is not a mathematical impossibility or the sign of a computer bug. It simply means that the chosen model (with its constraints) fits the data really poorly.

@JMS That's the opposite of what my Googling indicates: "/ORIGIN" fixes the intercept at 0; "/NOORIGIN" "tells SPSS not to suppress the constant" (An Introductory Guide to SPSS for Windows)
– whuber ♦, Commented Jul 13, 2011 at 18:13
@whuber Correct. @harvey-motulsky A negative $R^2$ value is a mathematical impossibility (and suggests a computer bug) for regular OLS regression (with an intercept). This is what the 'REGRESSION' command does and what the original poster is asking about. Also, for OLS regression, $R^2$ is the squared correlation between the predicted and the observed values. Hence, it must be non-negative. For simple OLS regression with one predictor, this is equivalent to the squared correlation between the predictor and the dependent variable; again, this must be non-negative.
– Wolfgang, Commented Jul 14, 2011 at 7:17
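The statement that, for OLS with an intercept, $R^2$ equals the squared correlation between observed and predicted values also holds with several predictors. A sketch with two simulated predictors (the coefficients and seed are assumptions), fitted with `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# Design matrix: a column of ones (the intercept) plus two simulated predictors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
corr_sq = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2, corr_sq)  # equal up to rounding, so R^2 cannot be negative here
```

The identity relies on the column of ones in `X`; without it the residuals need not sum to zero, the two quantities generally differ, and $1 - SS_\text{res}/SS_\text{tot}$ can drop below zero.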
@whuber Indeed. My bad; obviously I don't use SPSS - or read, apparently :)
– JMS, Commented Jul 14, 2011 at 16:56
@whuber I added a paragraph pointing out that with linear regression, $R^2$ can be negative only when the intercept (or perhaps the slope) is constrained. With no constraints, $R^2$ must be positive and equals the square of r, the correlation coefficient.
– Harvey Motulsky, Commented Jul 16, 2011 at 15:55
@Nate Yes, the null hypothesis of linear regression (with no constraints, and equal weighting of all points) is a straight line at Y = Mean
– Harvey Motulsky, Commented Dec 16, 2021 at 17:19

jefflovejapan · Accepted Answer · 2011-07-12 07:04:06Z

Have you forgotten to include an intercept in your regression? I'm not familiar with SPSS code, but page 21 of Hayashi's Econometrics says:

If the regressors do not include a constant but (as some regression software packages do) you nevertheless calculate $R^2$ by the formula

$R^2=1-\frac{\sum_{i=1}^{n}e_i^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$

then the $R^2$ can be negative. This is because, without the benefit of an intercept, the regression could do worse than the sample mean in terms of tracking the dependent variable (i.e., the numerator could be greater than the denominator).
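To make the quoted point concrete, the sketch below applies the same formula to a regression through the origin and to an ordinary regression with a constant. The five data points are made up so the numbers can be checked by hand: without a constant, the least-squares slope is $\sum x_i y_i / \sum x_i^2 = 110/55 = 2$, the residuals are $8, 5, 2, -1, -4$, and so $R^2 = 1 - 110/10 = -10$. The helper name `centered_r2` is hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 9.0, 8.0, 7.0, 6.0])

def centered_r2(y, e):
    """The quoted formula: 1 - sum(e_i^2) / sum((y_i - ybar)^2)."""
    return 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)

# No constant: y = b*x, least-squares slope through the origin
b0 = np.sum(x * y) / np.sum(x ** 2)
e_no_const = y - b0 * x

# With a constant: y = a + b*x (polyfit returns [slope, intercept])
b1, a1 = np.polyfit(x, y, 1)
e_const = y - (a1 + b1 * x)

print(centered_r2(y, e_no_const))  # -10.0
print(centered_r2(y, e_const))     # about 1.0: these points lie exactly on y = 11 - x
```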

I'd check and make sure that SPSS is including an intercept in your regression.

The NOORIGIN subcommand in her code means that an intercept was included in the model
– ttnphns, Commented Jul 12, 2011 at 10:12
That's weird. I would have guessed that NOORIGIN would mean that the intercept was not included in the model, just going off the name.
– tumultous_rooster, Commented Nov 8, 2015 at 4:29

IrishStat · Accepted Answer · 2011-07-13 17:57:33Z

This can happen if you have a time series that is N.i.i.d. and you construct an inappropriate ARIMA(0,1,0) model, that is, a first-difference random-walk model with no drift. The variance (sum of squares, SSE) of the residuals will then be larger than the variance (sum of squares, SSO) of the original series, so 1 - SSE/SSO yields a negative number because SSE exceeds SSO. We have seen this when users simply fit an assumed model or use inadequate procedures to identify and form an appropriate ARIMA structure. The larger message is that a model can distort your vision, much like a pair of bad glasses. Without access to your data, I would otherwise have trouble explaining your faulty results. Have you brought this to the attention of IBM?
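A quick simulation of the situation described above, assuming the no-drift random walk's one-step prediction is simply the previous observation, so its residuals are the first differences. This is a NumPy sketch, not output from any ARIMA software, and the series length, mean and seed are arbitrary. For i.i.d. data, $\operatorname{Var}(z_t - z_{t-1}) = 2\operatorname{Var}(z_t)$, so SSE is roughly twice SSO and $1 - SSE/SSO$ lands near $-1$.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(loc=5.0, scale=1.0, size=10_000)  # i.i.d. normal series (hypothetical)

# ARIMA(0,1,0) without drift predicts each value by the previous one,
# so its residuals are the first differences z_t - z_{t-1}
resid = np.diff(z)

sse = np.sum(resid ** 2)                   # residual sum of squares
sso = np.sum((z[1:] - z[1:].mean()) ** 2)  # sum of squares of the series about its mean
r2 = 1.0 - sse / sso
print(r2)  # close to -1
```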

The idea of an assumed model being counter-productive has been echoed by Harvey Motulsky. Great post, Harvey!

@IrishStat Thanks. No, I have not spoken to IBM. The data are not a time series; they are point-in-time data.
– Anne, Commented Jul 11, 2011 at 19:55
@Anne and others: Since your data are not time series and you're not using a time series procedure, please disregard my answer. Others who have observed negative R Squares when working with time series might find my post interesting and tangentially informative. Others unfortunately may not.
– IrishStat, Commented Jul 11, 2011 at 21:36
@IrishStat: Could you please add a link to the Harvey Motulsky post?
– kjetil b halvorsen ♦, Commented Aug 27, 2018 at 8:33

Fernando Wittmann · Accepted Answer · 2025-03-28 15:38:27Z

For those from the Machine Learning field:
A negative R squared ($R^2$) means that the model is predicting worse than a dummy model that simply uses the mean of the target values ($\bar{y}$) as the prediction for all instances.

Mathematically:

$R^2 = 1 - \frac{MSE(y, \hat{y})}{MSE(y, \bar{y})}$

Where:

- $y$ (`y`): the true target values
- $\hat{y}$ (`y_pred`): the predicted values from the model
- $\bar{y}$ (`y_mean`): the mean of the true target values
- $MSE(y, \hat{y})$: the mean squared error of the model
- $MSE(y, \bar{y})$: the mean squared error of a dummy model that always predicts the mean

If the model is bad enough that MSE(y, y_pred) is greater than MSE(y, y_mean), the R² score becomes negative.

Here's an example in Python:

```python
from sklearn.metrics import r2_score
import numpy as np

# True target values
y = np.array([3, 5, 7, 9, 11])
# Poor model predictions
y_pred = np.array([10, 10, 10, 10, 10])
# Dummy model predictions (mean of y); dtype=float avoids truncating a non-integer mean
y_mean = np.full_like(y, y.mean(), dtype=float)

# R squared calculation
r2 = r2_score(y, y_pred)
r2_using_mean = r2_score(y, y_mean)

# Output results
print("True values (y):", y)
print("Model predictions (y_pred):", y_pred)
print("Mean of y (ȳ):", y_mean[0])
print("R² score:", r2)
print("R² score using mean as pred:", r2_using_mean)
```

Which prints:

```
True values (y): [ 3  5  7  9 11]
Model predictions (y_pred): [10 10 10 10 10]
Mean of y (ȳ): 7.0
R² score: -1.125
R² score using mean as pred: 0.0
```

Also here's a plot:

```python
import matplotlib.pyplot as plt

# Uses y, y_pred, y_mean, r2 and r2_using_mean from the example above
plt.plot(y, label="True values (y)", marker='o')
plt.plot(y_pred, label=f"Model predictions (y_pred) - R2: {r2}", linestyle='--', marker='x')
plt.plot(y_mean, label=f"Dummy predictions (y_mean) - R2: {r2_using_mean}", linestyle=':', marker='s')
plt.title("True vs Model vs Mean Predictions")
plt.xlabel("Sample Index")
plt.ylabel("Value")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
```
