Consider the simple linear regression model

$Y_i = \beta_0 + \beta_1x_i + \epsilon_i$

where $\epsilon_i \overset{\text{indep.}}{\sim} N(0, \sigma^2)$ for $i = 1,...,n$. Let $\hat{\beta_{0}}$ and $\hat{\beta_{1}}$ be the usual maximum likelihood estimators of $\beta_0$ and $\beta_1$, respectively. The $i$th residual is defined as $\hat{\epsilon_{i}} = Y_i - \hat{Y_{i}}$, where $\hat{Y_i} = \hat{\beta_{0}} + \hat{\beta_{1}}x_i$ is the $i$th fitted value.

Derive $Cov(\hat{\epsilon_{i}},\hat{Y_i})$

This is what I've got so far; I keep getting stuck, though, even after trying this a few different ways.

$$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= E[\hat{\epsilon_{i}}\hat{Y_i}] - E[\hat{\epsilon_{i}}]E[\hat{Y_i}]\\ &= E[\hat{\epsilon_{i}}\hat{Y_i}] \text{ (as }E[\hat{\epsilon_{i}}]=0)\\ &= E[\hat{\epsilon_{i}}(Y_i - \hat{\epsilon_{i}})]\\ &= Y_iE[\hat{\epsilon_{i}}] - E[\hat{\epsilon_{i}}^2]\\ &= - E[\hat{\epsilon_{i}}^2] \end{align} $$

But I don't know how to proceed further with this, as the residuals are not independent. Proceeding similarly from the second line but rearranging differently, I also got

$$ Cov(\hat{\epsilon_{i}},\hat{Y_i}) = Y_i^2 - E[\hat{Y_{i}}^2] = - E[\hat{\epsilon_{i}}^2] $$

I'm having the same problem with this. I feel like the answer is supposed to be zero, but I'm just missing some piece of information needed to prove it.


Edit: I've been retrying this question and I think this may be a better method, although I'm stuck at a point with this too: $$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= Cov(Y_i - \hat{Y_i}, \hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Cov(\hat{Y_i},\hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Var(\hat{Y_i}) \end{align} $$ So, I know what $Var(\hat{Y_i})$ is, but I do not know what $Cov(Y_i ,\hat{Y_i})$ is, although I suspect it is $Var(\hat{Y_i})$. If someone could help me with a derivation for this, that would be amazing.
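As a numerical sanity check on the conjecture $Cov(Y_i, \hat Y_i) = Var(\hat Y_i)$, here is a quick Monte Carlo sketch (all of $n$, $x$, $\beta_0$, $\beta_1$, $\sigma$ below are made-up values; across replications the sample covariance of $Y_i$ and $\hat Y_i$ should match the sample variance of $\hat Y_i$):

```python
import numpy as np

# Monte Carlo check of Cov(Y_i, Yhat_i) = Var(Yhat_i); all parameters made up.
rng = np.random.default_rng(0)
n, reps = 10, 200_000
x = np.linspace(0.0, 1.0, n)
beta0, beta1, sigma = 1.0, 2.0, 0.5

# Simulate 'reps' independent datasets with the same fixed x.
Y = beta0 + beta1 * x + sigma * rng.standard_normal((reps, n))

# ML (= least-squares) estimates for each replication, vectorized.
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()
b1 = (Y @ (x - xbar)) / Sxx            # slope estimates, shape (reps,)
b0 = Y.mean(axis=1) - b1 * xbar        # intercept estimates
Yhat = b0[:, None] + b1[:, None] * x   # fitted values, shape (reps, n)

i = 3  # any index works
cov_Y_Yhat = np.cov(Y[:, i], Yhat[:, i])[0, 1]
var_Yhat = Yhat[:, i].var(ddof=1)
# Both should be close to sigma^2 * (1/n + (x_i - xbar)^2 / Sxx).
print(cov_Y_Yhat, var_Yhat)
```

Up to Monte Carlo error, the two printed numbers agree, supporting $Cov(Y_i, \hat Y_i) = Var(\hat Y_i)$.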

  • $E[Y_i\epsilon_i]=E[(\beta_0 + \beta_1x_i + \epsilon_i)\epsilon_i]=E[\beta_0]E[\epsilon_i]+E[\beta_1x_i]E[\epsilon_i]+E[\epsilon_i^2]$ since $\epsilon_i$ is independent of $\beta_0, \beta_1, x_i$ Commented Mar 17, 2023 at 16:25
  • @GFrazao that still leaves me with the issue of calculating $E[\hat{\epsilon_i}^2]$? Commented Mar 17, 2023 at 16:32
  • It cancels with the $-E[\epsilon_i^2]$ which you already have. Take care with hats and no hats. I just gave you the piece that was not correct, which was $E[Y_i\hat \epsilon_i]\neq Y_iE[\hat\epsilon_i]$. Commented Mar 17, 2023 at 17:24
  • @GFrazao so in your original comment you have not used hats? That does not really help me then, since using a similar method with the variables I'm actually interested in would give $E[\hat{Y_i} \hat{\epsilon_i}] = E[\hat{\epsilon_i}(\hat{\beta_0} + \hat{\beta_1}x_i)] = E[\hat{\epsilon_i}\hat{\beta_0}] + E[\hat{\epsilon_i}\hat{\beta_1}x_i]$, and only if $\hat{\epsilon_i}$ were independent of $\hat{\beta_0},\hat{\beta_1},x_i$ could I show it is equal to 0 Commented Mar 18, 2023 at 11:55

2 Answers


\begin{align} cov(Y_i, \hat Y_i) &= cov(Y_i, \hat \beta_0 + \hat \beta_1 X_i )\\ & = cov(Y_i, \bar Y - \hat \beta_1 \bar X + \hat \beta_1 X_i )\\ & = cov(Y_i, \bar Y + \hat \beta_1 (X_i - \bar X) )\\ & = \frac{1}{n}Var(Y_i) + cov\left(Y_i , \frac{(X_i - \bar X) \sum_j (X_j - \bar X ) Y_j }{\sum_j ( X_j - \bar X ) ^ 2}\right)\\ & = \frac{1}{n}\sigma ^ 2 + \frac{(X_i - \bar X) ^ 2 Var(Y_i) }{\sum_j ( X_j - \bar X ) ^ 2}\\ & = \frac{1}{n}\sigma ^ 2 + \frac{\sigma ^ 2 (X_i - \bar X) ^ 2 }{\sum_j ( X_j - \bar X ) ^ 2} \end{align}

(In the fourth line, $cov(Y_i, \bar Y) = \frac{1}{n}Var(Y_i)$, and only the $j = i$ term of the sum is correlated with $Y_i$, because the $Y_j$ are independent.)

which is the same as $Var(\hat Y_i)$. Hence $Cov(\hat \epsilon_i, \hat Y_i) = Cov(Y_i, \hat Y_i) - Var(\hat Y_i) = 0$.
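A quick Monte Carlo sketch supports this (the $x$ values and parameters below are made up): across replications, the sample covariance of each residual with its fitted value comes out near zero.

```python
import numpy as np

# Monte Carlo sketch: the covariance (across replications) of the i-th
# residual with the i-th fitted value should be ~0 for every i.
# All of x, beta, sigma are made-up values.
rng = np.random.default_rng(1)
n, reps = 8, 100_000
x = np.sort(rng.uniform(size=n))
Y = 0.5 + 1.5 * x + 0.3 * rng.standard_normal((reps, n))

xbar = x.mean()
b1 = (Y @ (x - xbar)) / ((x - xbar) ** 2).sum()  # slope per replication
b0 = Y.mean(axis=1) - b1 * xbar                  # intercept per replication
Yhat = b0[:, None] + b1[:, None] * x
resid = Y - Yhat

covs = [np.cov(resid[:, i], Yhat[:, i])[0, 1] for i in range(n)]
print(max(abs(c) for c in covs))  # small, consistent with Cov = 0
```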


A much more straightforward and cleaner approach is to consider a matrix approach. Here the model is $$ y = \beta_0 \mathbf{1} + \beta_1 x + e = X \beta + e $$ where $X = [\mathbf{1}, x]$ and $\beta^\intercal = [\beta_0, \beta_1].$ Then $\hat y = X (X^\intercal X)^{-1} X^\intercal y = P_Xy,$ where $P_X$ is the orthogonal projector onto the linear subspace $V_X = \langle X \rangle,$ and $\hat e = y - \hat y = (I - P_X)y = P_X^\perp y.$ Note that $\mathbf{E}(\hat e) = P_X^\perp X\beta = 0,$ since the columns of $X$ lie in $V_X.$ Finally, $$ \mathbf{Cov}(\hat e, \hat y) = P_X^\perp \mathbf{Cov}(y, y) P_X = P_X^\perp (\sigma^2 I) P_X = 0, $$ since $P_X^\perp P_X = 0$ (projecting onto $V_X$ and then onto $V_X^\perp$ gives zero). Entry $(i,i)$ of $\mathbf{Cov}(\hat e, \hat y)$ is then zero, i.e. $\mathbf{Cov}(\hat e_i, \hat y_i) = 0.$ (Note that we never used normality, just that $\mathbf{Cov}(y, y) = \sigma^2 I,$ meaning uncorrelated observations with a common variance.)
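The projection argument can be checked numerically. A minimal sketch with a made-up design (the point being only that $P_X^\perp P_X = 0$, hence the covariance matrix vanishes):

```python
import numpy as np

# Sketch of the projection argument with a made-up x. Build P_X explicitly
# and check that P_X_perp @ P_X vanishes, which is all the argument needs.
rng = np.random.default_rng(2)
n = 6
x = rng.uniform(size=n)
X = np.column_stack([np.ones(n), x])      # design matrix [1, x]

P = X @ np.linalg.inv(X.T @ X) @ X.T      # orthogonal projector onto col(X)
P_perp = np.eye(n) - P

# Cov(e_hat, y_hat) = P_perp @ (sigma^2 I) @ P when Cov(y) = sigma^2 I
sigma2 = 1.7                              # arbitrary common variance
C = P_perp @ (sigma2 * np.eye(n)) @ P
print(np.abs(P_perp @ P).max(), np.abs(C).max())  # both ~0 (float error)
```

Both printed values are zero up to floating-point round-off, for any design matrix $X$ of full column rank.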

  • I've not covered any matrices or vectors in my statistics courses, so this is not very helpful. Could you explain in the usual notation? Commented Mar 18, 2023 at 12:10
  • @spooleey that is quite sad: linear regression can be seen as a projection problem. Given a vector $y,$ find its projection onto $V_X,$ the space spanned by the columns of $X.$ This view contains the usual normality assumptions as a particular case, and it makes the calculation straightforward. Commented Mar 20, 2023 at 1:00
  • @abhishek both quantities $\hat e^\intercal \hat y$ and $\hat e \hat y^\intercal$ are defined: the former is the dot product and the latter the outer product. Commented Aug 30, 2023 at 14:46
  • I'm sorry, I misread the answer Commented Aug 30, 2023 at 15:55
  • But I still do not see how $E[yy^T]=\sigma^2 I$ Commented Aug 30, 2023 at 17:08
