$\begingroup$

I have been reading about the Coefficient of Determination and am wondering why it is necessarily less than or equal to 1.

$$R^2 = 1 - \frac{RSS}{TSS}$$

I understand that RSS is the sum of the squared differences between each observed dependent variable and its prediction.

So it makes sense that RSS will be zero if the independent variables perfectly predict the dependent ones.

I understand that TSS is the sum of the squared differences between each dependent variable and the mean.

But why is RSS/TSS necessarily less than 1?

$\endgroup$
  • $\begingroup$ Squares are not negative. Therefore, $RSS/TSS$ is not negative. Done. $\endgroup$ Commented Jul 12, 2022 at 15:09
  • $\begingroup$ @whuber Sorry I don't see how $RSS/TSS$ being negative would cause $R^2$ to be negative. $\endgroup$ Commented Jul 12, 2022 at 21:03
  • $\begingroup$ I wrote not negative. When you subtract a non-negative number from something, you don't get a larger number: that is, $R^2$ cannot exceed $1.$ Moreover, your question is not about $R^2$ being negative: it's about it being "necessarily less than 1." Did you perhaps mean to ask a different question? $\endgroup$ Commented Jul 12, 2022 at 22:05
  • $\begingroup$ I could re-phrase my question as "What stops RSS/TSS from being large and positive?" Thank you for clarifying why it can't be negative. I understand Spur's answer on this. $\endgroup$ Commented Jul 13, 2022 at 4:24

1 Answer

$\begingroup$

A simple thought experiment will help answer your question.

TSS is $\sum_i (Y_i - \bar{Y})^2$.

Let us assume that we have a regression line equal to the mean. If the mean is 5, it is simply a horizontal line at the value 5 throughout, and the squared deviations from this line are exactly $(Y_i - \bar{Y})^2$, i.e. TSS. Now, we have 2 scenarios here:

  • The regression line that we estimate is the same as this mean line. In that case, TSS = RSS, because the squared deviations from the mean, $(Y_i - \bar{Y})^2$, and the squared residuals, $(Y_i - \hat{Y}_i)^2$, are exactly the same when $\bar{Y}$ and $\hat{Y}_i$ coincide.
  • The regression line that we estimate is different from the mean line. In that case, RSS < TSS, because least squares minimizes the sum of squared residuals by definition, and the mean line (intercept $\bar{Y}$, slope 0) is one of the candidate lines it considers. If the fitted line differs from the mean line, it can only be because its sum of squared residuals is smaller.

So, we only have 2 possible scenarios: either RSS = TSS or RSS < TSS. This implies that, for a model with an intercept evaluated on the data it was fit to, $R^2$ will always be between 0 and 1.
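The two scenarios can be checked numerically. A minimal sketch (my own illustration, using numpy, not code from the answer): fit ordinary least squares by hand and verify that the minimized RSS never exceeds TSS, so $R^2 = 1 - RSS/TSS$ lands in $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

# OLS with an intercept: the flat line yhat = mean(y) (slope 0) is one of
# the candidate fits, so the minimized RSS can never exceed TSS.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

rss = np.sum((y - yhat) ** 2)  # sum of squared residuals
tss = np.sum((y - y.mean()) ** 2)  # sum of squared deviations from the mean
r2 = 1 - rss / tss

assert rss <= tss
assert 0.0 <= r2 <= 1.0
```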

$\endgroup$
  • $\begingroup$ "the model minimizes squared residuals by definition." That was what I was missing. $\endgroup$ Commented Jul 12, 2022 at 8:07
  • $\begingroup$ No, when computing $R^2$ "out of sample" the sum of squared errors can exceed the total SS. When predictions are worse than random $R^{2} < 0$. $\endgroup$ Commented Jul 12, 2022 at 11:08
  • $\begingroup$ True. But then we don't call it R square, we specifically call it "Out of sample R-square". It is different from the usual R-square we use to assess models. $\endgroup$ Commented Jul 12, 2022 at 12:05
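The out-of-sample point in the comments is easy to demonstrate. A sketch (my own illustration, using numpy, not code from the thread): out of sample, $R^2 = 1 - RSS/TSS$ is computed against the test set's own mean, and nothing forces RSS $\le$ TSS there, so $R^2$ can go negative when predictions are worse than the test mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Train on data where y rises with x ...
x_train = rng.normal(size=100)
y_train = x_train + 0.1 * rng.normal(size=100)
X_train = np.column_stack([np.ones_like(x_train), x_train])
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# ... then predict on test data where the relationship is reversed,
# so the fitted line points the wrong way.
x_test = rng.normal(size=100)
y_test = -x_test + 0.1 * rng.normal(size=100)
yhat = np.column_stack([np.ones_like(x_test), x_test]) @ beta

rss = np.sum((y_test - yhat) ** 2)
tss = np.sum((y_test - y_test.mean()) ** 2)
r2_oos = 1 - rss / tss  # negative: worse than predicting the test mean

assert r2_oos < 0
```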
