This is a common way to define $R^2$ in a regression problem.
$$ R^2 = 1 - \frac{ \sum_{i=1}^{N}\left( y_i-\hat y_i \right)^2 }{ \sum_{i=1}^{N}\left( y_i-\bar y \right)^2 } $$
In OLS linear regression with an intercept, this turns out to be equivalent to two other calculations:

- The squared Pearson correlation between the observed and predicted values: $\left[ \text{corr}\left( y, \hat y \right) \right]^2$.
- In a simple linear regression with just one predictor $x$, the squared Pearson correlation between the outcome $y$ and that predictor: $\left[ \text{corr}\left( y, x \right) \right]^2$.
Consequently, there is a straightforward interpretation of $\sqrt{1-\frac{ \sum_{i=1}^{N}\left( y_i-\hat y_i \right)^2 }{ \sum_{i=1}^{N}\left( y_i-\bar y \right)^2 }}$ as a correlation coefficient, at least in the case of OLS linear regression (with an intercept).
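To sanity-check these equivalences, here is a minimal R sketch on made-up data (the seed, sample size, and data-generating model are my own choices, not from the question):

```r
set.seed(2024)                 # arbitrary seed, purely for reproducibility
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)     # made-up simple linear relationship

fit  <- lm(y ~ x)              # OLS with an intercept
yhat <- fitted(fit)

r2_formula <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

r2_formula                     # R^2 from the definition above
summary(fit)$r.squared         # R^2 reported by lm()
cor(y, yhat)^2                 # squared correlation between y and the predictions
cor(y, x)^2                    # squared correlation between y and the lone predictor
sqrt(r2_formula)               # the positive square root, equal to |cor(y, x)|
```

All four $R^2$ quantities agree, and the positive square root matches the absolute correlation.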
This leads me to two related questions.
1. In OLS linear regression with an intercept, $R^2 \ge 0$, and typically $R^2 > 0$. Thus, we (typically) get two square roots, one positive and one negative. Only one of those values equals a correlation from above. How can we interpret the other root, particularly in multiple regression with several $x$ variables?
2. In general, $R^2 \in \mathbb R$, and we are not guaranteed to have real roots. When $R^2 < 0$, that is, when the fit is so poor that a "predict $\bar y$ every time" model gives a lower squared loss, how do we interpret the complex roots of such a poor model?
```r
# https://stackoverflow.com/questions/14966814/multiple-roots-in-the-complex-plane-with-r
#
nRoot <- function(x, root) {
  polyroot(c(-x, rep(0, root - 1), 1))
}

y <- c(1, 2, 3)        # Observed outcomes
yhat <- c(-2, -3, -4)  # Predictions from some (bad) model

r2 <- 1 - (sum((y - yhat)^2))/(sum((y - mean(y))^2))

# Take the complex square roots
#
zs <- nRoot(r2, 2)
real_parts <- imaginary_parts <- rep(NA, 2)
for (i in 1:2){
  real_parts[i] <- Re(zs[i])
  imaginary_parts[i] <- Im(zs[i])
}

# prepare "circle data"
# https://stackoverflow.com/a/22266105/11751799
#
radius = sqrt(abs(r2))
center_x = 0
center_y = 0
theta = seq(0, 2 * pi, length = 200) # angles for drawing points around the circle
#
# Draw a circle
#
plot(
  x = radius * cos(theta) + center_x,
  y = radius * sin(theta) + center_y,
  type = "l",
  xlab = "Real",
  ylab = "Imaginary",
  main = paste("Complex Roots of \nR^2 =", round(r2, 3))
)
#
# Plot the complex roots
#
points(0, 0)
points(c(real_parts[1]), c(imaginary_parts[1]), col = 'red')
lines(c(0, real_parts[1]), c(0, imaginary_parts[1]), col = 'red')
points(c(real_parts[2]), c(imaginary_parts[2]), col = 'blue')
lines(c(0, real_parts[2]), c(0, imaginary_parts[2]), col = 'blue')
```

IDEA
Using matrix calculus, we can solve for the OLS estimate in closed form and get the usual $\hat\beta = (X^TX)^{-1}X^Ty$. However, OLS really means evaluating the predictions from every possible parameter vector and then picking the parameter vector whose predictions give the lowest sum of squared residuals.
$$\hat\beta_{\text{OLS}} \in \underset{\left( \tilde\beta_0, \tilde\beta_1, \dots,\tilde\beta_p \right)\in\mathbb{R}^{p+1}}{\arg\min}\left\{ \sum_{i = 1}^{N}\left( y_i - \hat y_i \right)^2 \,\bigg\vert\, \hat y_i = \tilde\beta_0 + \tilde\beta_1 x_{i1} +\dots + \tilde\beta_p x_{ip} \right\}$$
While the familiar $\hat\beta = (X^TX)^{-1}X^Ty$ is such an $\arg\min$, every vector of real numbers is a candidate estimate: it yields some set of predictions, hence some sum of squared residuals, some $R^2$ value, and associated roots in $\mathbb C$.
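To make the search-over-all-candidates view concrete, here is a rough sketch (the coarse grid is my own invention; the toy data reuse the `x` and `y` from the example further down) that scans candidate $(\tilde\beta_0, \tilde\beta_1)$ pairs and confirms that the closed-form estimate attains the smallest sum of squared residuals among them:

```r
x <- c(0, 1, 2)   # same toy data as the example below
y <- c(2, 7, 6)
X <- cbind(1, x)

# Closed-form OLS estimate
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat          # intercept 3, slope 2, matching lm(y ~ x)

# Brute-force search over a coarse grid of candidate (beta0, beta1) pairs
grid <- expand.grid(b0 = seq(-5, 10, by = 0.5),
                    b1 = seq(-5, 10, by = 0.5))
grid$ssr <- apply(grid, 1, function(b) sum((y - X %*% b)^2))

# The best grid point is (3, 2), i.e., the closed-form solution
grid[which.min(grid$ssr), ]
```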
In the image below, I created some synthetic data and made three guesses at the simple linear regression parameters that might minimize the sum of squared residuals. Sure, there is more to regression than OLS linear regression, but might the green and red roots have some interpretation?
```r
library(data.table)
library(ggplot2)

x <- c(0, 1, 2)
y <- c(2, 7, 6)

params1 <- c(2, 7)
yhat1 <- cbind(1, x) %*% params1
r2_1 <- 1 - (sum((y - yhat1)^2))/(sum((y - mean(y))^2))

params2 <- c(-1, -4)
yhat2 <- cbind(1, x) %*% params2
r2_2 <- 1 - (sum((y - yhat2)^2))/(sum((y - mean(y))^2))

params3 <- c(3, 2) # This happens to be the OLS solution, confirmed by lm(y ~ x)
yhat3 <- cbind(1, x) %*% params3
r2_3 <- 1 - (sum((y - yhat3)^2))/(sum((y - mean(y))^2))

params <- rbind(params1, params2, params3)

# https://stackoverflow.com/questions/14966814/multiple-roots-in-the-complex-plane-with-r
#
nRoot <- function(x, root = 2) {
  polyroot(c(-x, rep(0, root - 1), 1))
}

r2_1 # -6.428571
r2_2 # -26
r2_3 # 0.5714286

r2 <- c(r2_1, r2_2, r2_3)
L <- list()
for (i in 1:length(r2)){
  zs <- nRoot(r2[i])
  real_parts <- imaginary_parts <- rep(NA, 2)
  for (j in 1:2){
    real_parts[j] <- Re(zs[j])
    imaginary_parts[j] <- Im(zs[j])
  }
  L[[i]] <- data.frame(
    Real = c(0, 0, real_parts),
    Imaginary = c(0, 0, imaginary_parts),
    Estimate = paste(params[i, 1], ", ", params[i, 2], sep = "")
  )
}
d <- data.table::rbindlist(L)
ggplot(d, aes(x = Real, y = Imaginary, col = Estimate)) +
  geom_line() +
  geom_point()
```
