Actually, there are two different measures that are called correlations. Let us call them little $r$, which is the Pearson correlation coefficient, and big $R$, which is what you have: a correlation (usually reported as $R^2$) adjusted for a generalized residual. Now $|r|=|R|$ only when we restrict ourselves to ordinary least squares linear regression in $Y$. If, for example, we restrict our linear regression to slope only and force the intercept to zero, we would then use $R$, not $r$. Little $r$ is unchanged; it just no longer describes the correlation between the new regression line and the data.
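To make the distinction concrete, here is a minimal sketch in Python with numpy (the data values are made up for illustration): a slope-only fit through the origin leaves $r$ untouched but yields an $R^2$ different from $r^2$.

```python
import numpy as np

# Hypothetical data, chosen only for the demonstration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.3])

# Little r: Pearson correlation, independent of any fitted model
r = np.corrcoef(x, y)[0, 1]

# Slope-only least squares with the intercept forced to zero
b = np.sum(x * y) / np.sum(x * x)
fitted = b * x

# Big R^2 from the general definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
R2 = 1.0 - ss_res / ss_tot

print(f"r^2 = {r**2:.4f}")  # ~0.98
print(f"R^2 = {R2:.4f}")    # ~0.91, not equal to r^2 for this fit
```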

Little $r$ is a normalized covariance, i.e., $r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$. Finally, $r^2$ is called the coefficient of determination only in the linear case.
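As a quick sanity check, the definition can be computed directly and compared against a library routine (numpy again, same made-up data as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.3])

# Pearson r from the definition: covariance normalized by both standard deviations
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# The same quantity from numpy's built-in correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # identical up to floating-point rounding
```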

Big $R$ is usually explained using ANOVA intermediary quantities:

  • The total sum of squares, proportional to the variance of the data: $SS_\text{tot}=\sum_i (y_i-\bar{y})^2,$

  • The regression sum of squares, also called the explained sum of squares, where the $f_i$ are the fitted values: $SS_\text{reg}=\sum_i (f_i -\bar{y})^2,$

  • The sum of squares of residuals, also called the residual sum of squares: $SS_\text{res}=\sum_i (y_i - f_i)^2=\sum_i e_i^2.$

The most general definition of the coefficient of determination is

$R^2 \equiv 1 - \frac{SS_\text{res}}{SS_\text{tot}}.$
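Under ordinary least squares with an intercept these three quantities tie together neatly; a short sketch (numpy, same made-up data as above, using np.polyfit for the line) verifies the decomposition and the equality $R^2 = r^2$ in that special case:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.3])

# Ordinary least squares WITH an intercept (degree-1 polynomial fit)
slope, intercept = np.polyfit(x, y, 1)
f = slope * x + intercept

ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
ss_reg = np.sum((f - y.mean()) ** 2)  # explained sum of squares
ss_res = np.sum((y - f) ** 2)         # residual sum of squares

R2 = 1.0 - ss_res / ss_tot
r = np.corrcoef(x, y)[0, 1]

# Both hold for OLS with an intercept, and neither holds in general:
print(np.isclose(ss_tot, ss_reg + ss_res))  # True: SS_tot = SS_reg + SS_res
print(np.isclose(R2, r ** 2))               # True: R^2 equals r^2
```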

Now, what is the meaning of this $r^2$ or, more generally, $R^2$? $R^2$ is the explained fraction of the total variance, and $1-R^2$ is the unexplained fraction. For ordinary least squares with an intercept we have $SS_\text{tot}=SS_\text{reg}+SS_\text{res}$, so $R^2 = SS_\text{reg}/SS_\text{tot}$; for other fits that decomposition fails, which is why the general definition is written in terms of $SS_\text{res}$.
