For those from the Machine Learning field:
A negative R squared ($R^2$) means that the model is predicting worse than a dummy model that simply uses the mean of the target values ($\bar{y}$) as the prediction for all instances.
Mathematically:
$R^2 = 1 - \frac{MSE(y, \hat{y})}{MSE(y, \bar{y})}$
Where:
- $y$ = the true target values
- $\hat{y}$ (`y_pred`) = the predicted values from the model
- $\bar{y}$ (`y_mean`) = the mean of the true target values
- $MSE(y, \hat{y})$ = mean squared error of the model
- $MSE(y, \bar{y})$ = mean squared error of a dummy model that always predicts the mean

If the model is bad enough that MSE(y, y_pred) is greater than MSE(y, y_mean), the R² score becomes negative.
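For readers who know $R^2$ in its sum-of-squares form, the two definitions agree: dividing numerator and denominator by the number of samples $n$ turns each sum into a mean. Here $SS_{res}$ and $SS_{tot}$ are the usual residual and total sums of squares, notation that is not used elsewhere in this answer:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2}{\frac{1}{n}\sum_{i}(y_i - \bar{y})^2} = 1 - \frac{MSE(y, \hat{y})}{MSE(y, \bar{y})}$$

Nothing forces the numerator to be smaller than the denominator, so the ratio can exceed 1 and push $R^2$ below zero.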
Here's an example in Python:

    from sklearn.metrics import r2_score, mean_squared_error
    import numpy as np

    # True target values
    y = np.array([3, 5, 7, 9, 11])

    # Poor model predictions
    y_pred = np.array([10, 10, 10, 10, 10])

    # Dummy model predictions (mean of y)
    y_mean = np.full_like(y, y.mean())

    # R squared calculation
    r2 = r2_score(y, y_pred)
    r2_using_mean = r2_score(y, y_mean)

    # Output results
    print("True values (y):", y)
    print("Model predictions (y_pred):", y_pred)
    print("Mean of y (ȳ):", y_mean[0])
    print("R² score:", r2)
    print("R² score using mean as pred:", r2_using_mean)

Which prints:

    True values (y): [ 3  5  7  9 11]
    Model predictions (y_pred): [10 10 10 10 10]
    Mean of y (ȳ): 7
    R² score: -1.125
    R² score using mean as pred: 0.0

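To check the formula by hand, here is a minimal sketch that computes the two mean squared errors directly with NumPy and plugs them into $1 - MSE(y, \hat{y}) / MSE(y, \bar{y})$. The helper `mse` is a name made up for this illustration, not a library function; the data is the same as in the example above.

```python
import numpy as np

def mse(y_true, y_hat):
    """Mean squared error (hypothetical helper, written out for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y_true - y_hat) ** 2)

# Same data as in the example above
y = np.array([3, 5, 7, 9, 11])
y_pred = np.array([10, 10, 10, 10, 10])

# Dummy baseline: always predict the mean of y (kept as a float array)
y_mean = np.full(y.shape, y.mean())

mse_model = mse(y, y_pred)   # (49 + 25 + 9 + 1 + 1) / 5 = 17.0
mse_dummy = mse(y, y_mean)   # (16 + 4 + 0 + 4 + 16) / 5 = 8.0

r2_manual = 1 - mse_model / mse_dummy
print("MSE of model:", mse_model)
print("MSE of dummy:", mse_dummy)
print("R² from the formula:", r2_manual)  # 1 - 17/8 = -1.125
```

The model's error (17.0) is more than twice the dummy's (8.0), so the ratio is above 1 and the score lands at -1.125, the same value `r2_score` reported.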
Also here's a plot:

    import matplotlib.pyplot as plt

    plt.plot(y, label="True values (y)", marker='o')
    plt.plot(y_pred, label=f"Model predictions (y_pred) - R2: {r2}", linestyle='--', marker='x')
    plt.plot(y_mean, label=f"Dummy predictions (y_mean) - R2: {r2_using_mean}", linestyle=':', marker='s')
    plt.title("True vs Model vs Mean Predictions")
    plt.xlabel("Sample Index")
    plt.ylabel("Value")
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()
