Consider the simple linear regression model

$Y_i = \beta_0 + \beta_1x_i + \epsilon_i$

where $\epsilon_i \overset{\text{indep.}}{\sim} N(0, \sigma^2)$ for $i = 1,...,n$. Let $\hat{\beta_{0}}$ and $\hat{\beta_{1}}$ be the usual maximum likelihood estimators of $\beta_0$ and $\beta_1$, respectively. The $i$th residual is defined as $\hat{\epsilon_{i}} = Y_i - \hat{Y_{i}}$, where $\hat{Y_i} = \hat{\beta_{0}} + \hat{\beta_{1}}x_i$ is the $i$th fitted value.

Derive $Cov(\hat{\epsilon_{i}},\hat{Y_i})$

This is what I've got so far; I keep getting stuck, though, even after trying this a few different ways.

$$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= E[\hat{\epsilon_{i}}\hat{Y_i}] - E[\hat{\epsilon_{i}}]E[\hat{Y_i}]\\ &= E[\hat{\epsilon_{i}}\hat{Y_i}] \text{ (as }E[\hat{\epsilon_{i}}]=0)\\ &= E[\hat{\epsilon_{i}}(Y_i - \hat{\epsilon_{i}})]\\ &= Y_iE[\hat{\epsilon_{i}}] - E[\hat{\epsilon_{i}}^2]\\ &= - E[\hat{\epsilon_{i}}^2] \end{align} $$

But I don't know how to proceed further with this, as the residuals are not independent. Proceeding similarly from the second line but rearranging differently, I also got

$$ Cov(\hat{\epsilon_{i}},\hat{Y_i}) = Y_i^2 - E[\hat{Y_{i}}^2] = - E[\hat{\epsilon_{i}}^2] $$

I'm having the same problem with this. I feel like the answer is supposed to be zero, but I'm just missing some piece of information needed to prove it.


Edit: I've been retrying this question and I think this may be a better method, although I'm stuck at a point with this too: $$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= Cov(Y_i - \hat{Y_i}, \hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Cov(\hat{Y_i},\hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Var(\hat{Y_i}) \end{align} $$ So, I know what $Var(\hat{Y_i})$ is, but I do not know what $Cov(Y_i ,\hat{Y_i})$ is, although I suspect it is $Var(\hat{Y_i})$. If someone could help me with a derivation for this, that would be amazing.
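As a numerical sanity check on the conjecture $Cov(Y_i, \hat Y_i) = Var(\hat Y_i)$, here is a quick Monte Carlo sketch (all of $n$, $x$, $\beta_0$, $\beta_1$, $\sigma$ below are made-up values; across replications the sample covariance of $Y_i$ and $\hat Y_i$ should match the sample variance of $\hat Y_i$):

```python
import numpy as np

# Monte Carlo check of Cov(Y_i, Yhat_i) = Var(Yhat_i); all parameters made up.
rng = np.random.default_rng(0)
n, reps = 10, 200_000
x = np.linspace(0.0, 1.0, n)
beta0, beta1, sigma = 1.0, 2.0, 0.5

# Simulate 'reps' independent datasets with the same fixed x.
Y = beta0 + beta1 * x + sigma * rng.standard_normal((reps, n))

# ML (= least-squares) estimates for each replication, vectorized.
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()
b1 = (Y @ (x - xbar)) / Sxx            # slope estimates, shape (reps,)
b0 = Y.mean(axis=1) - b1 * xbar        # intercept estimates
Yhat = b0[:, None] + b1[:, None] * x   # fitted values, shape (reps, n)

i = 3  # any index works
cov_Y_Yhat = np.cov(Y[:, i], Yhat[:, i])[0, 1]
var_Yhat = Yhat[:, i].var(ddof=1)
# Both should be close to sigma^2 * (1/n + (x_i - xbar)^2 / Sxx).
print(cov_Y_Yhat, var_Yhat)
```

Up to Monte Carlo error, the two printed numbers agree, supporting $Cov(Y_i, \hat Y_i) = Var(\hat Y_i)$.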

  • $E[Y_i\epsilon_i]=E[(\beta_0 + \beta_1x_i + \epsilon_i)\epsilon_i]=E[\beta_0]E[\epsilon_i]+E[\beta_1x_i]E[\epsilon_i]+E[\epsilon_i^2]$ since $\epsilon_i$ is independent of $\beta_0, \beta_1, x_i$ Commented Mar 17, 2023 at 16:25
  • @GFrazao that still leaves me with the issue of calculating $E[\hat{\epsilon_i}^2]$? Commented Mar 17, 2023 at 16:32
  • It cancels with the $-E[\epsilon_i^2]$ which you already have. Take care with hats and no hats. I just gave you the piece that was not correct, which was $E[Y_i\hat \epsilon_i]\neq Y_iE[\hat\epsilon_i]$. Commented Mar 17, 2023 at 17:24
  • @GFrazao so in your original comment you have not used hats? That does not really help me then, since using a similar method with the variables I'm actually interested in would give $E[\hat{Y_i} \hat{\epsilon_i}] = E[\hat{\epsilon_i}(\hat{\beta_0} + \hat{\beta_1}x_i)] = E[\hat{\epsilon_i}\hat{\beta_0}] + E[\hat{\epsilon_i}\hat{\beta_1}x_i]$, and only if $\hat{\epsilon_i}$ were independent of $\hat{\beta_0},\hat{\beta_1},x_i$ could I show it is equal to 0 Commented Mar 18, 2023 at 11:55

2 Answers


\begin{align} cov(Y_i, \hat Y_i) &= cov(Y_i, \hat \beta_0 + \hat \beta_1 X_i )\\ & = cov(Y_i, \bar Y - \hat \beta_1 \bar X + \hat \beta_1 X_i )\\ & = cov(Y_i, \bar Y + \hat \beta_1 (X_i - \bar X) )\\ & = \frac{1}{n}Var(Y_i) + cov\left(Y_i , \frac{(X_i - \bar X) \sum_j (X_j - \bar X ) Y_j }{\sum_j ( X_j - \bar X ) ^ 2}\right)\\ & = \frac{1}{n}\sigma ^ 2 + \frac{(X_i - \bar X) ^ 2 Var(Y_i) }{\sum_j ( X_j - \bar X ) ^ 2}\\ & = \frac{1}{n}\sigma ^ 2 + \frac{\sigma ^ 2 (X_i - \bar X) ^ 2 }{\sum_j ( X_j - \bar X ) ^ 2} \end{align}

(In the fourth line, $cov(Y_i, \bar Y) = \frac{1}{n}Var(Y_i)$, and only the $j = i$ term of the sum is correlated with $Y_i$, because the $Y_j$ are independent.)

which is the same as $Var(\hat Y_i)$. Hence $Cov(\hat \epsilon_i, \hat Y_i) = Cov(Y_i, \hat Y_i) - Var(\hat Y_i) = 0$.
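A quick Monte Carlo sketch supports this (the $x$ values and parameters below are made up): across replications, the sample covariance of each residual with its fitted value comes out near zero.

```python
import numpy as np

# Monte Carlo sketch: the covariance (across replications) of the i-th
# residual with the i-th fitted value should be ~0 for every i.
# All of x, beta, sigma are made-up values.
rng = np.random.default_rng(1)
n, reps = 8, 100_000
x = np.sort(rng.uniform(size=n))
Y = 0.5 + 1.5 * x + 0.3 * rng.standard_normal((reps, n))

xbar = x.mean()
b1 = (Y @ (x - xbar)) / ((x - xbar) ** 2).sum()  # slope per replication
b0 = Y.mean(axis=1) - b1 * xbar                  # intercept per replication
Yhat = b0[:, None] + b1[:, None] * x
resid = Y - Yhat

covs = [np.cov(resid[:, i], Yhat[:, i])[0, 1] for i in range(n)]
print(max(abs(c) for c in covs))  # small, consistent with Cov = 0
```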


A much more straightforward and cleaner approach is to consider a matrix approach. Here the model is $$ y = \beta_0 \mathbf{1} + \beta_1 x + e = X \beta + e $$ where $X = [\mathbf{1}, x]$ and $\beta^\intercal = [\beta_0, \beta_1].$ Then $\hat y = X (X^\intercal X)^{-1} X^\intercal y = P_Xy,$ where $P_X$ is the orthogonal projector onto the linear subspace $V_X = \langle X \rangle,$ and $\hat e = y - \hat y = (I - P_X)y = P_X^\perp y.$ Note that $\mathbf{E}(\hat e) = P_X^\perp X\beta = 0,$ since the columns of $X$ lie in $V_X.$ Finally, $$ \mathbf{Cov}(\hat e, \hat y) = P_X^\perp \mathbf{Cov}(y, y) P_X = P_X^\perp (\sigma^2 I) P_X = 0, $$ since $P_X^\perp P_X = 0$ (projecting onto $V_X$ and then onto $V_X^\perp$ gives zero). Entry $(i,i)$ of $\mathbf{Cov}(\hat e, \hat y)$ is then zero, i.e. $\mathbf{Cov}(\hat e_i, \hat y_i) = 0.$ (Note that we never used normality, just that $\mathbf{Cov}(y, y) = \sigma^2 I,$ meaning uncorrelated observations with a common variance.)
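The projection argument can be checked numerically. A minimal sketch with a made-up design (the point being only that $P_X^\perp P_X = 0$, hence the covariance matrix vanishes):

```python
import numpy as np

# Sketch of the projection argument with a made-up x. Build P_X explicitly
# and check that P_X_perp @ P_X vanishes, which is all the argument needs.
rng = np.random.default_rng(2)
n = 6
x = rng.uniform(size=n)
X = np.column_stack([np.ones(n), x])      # design matrix [1, x]

P = X @ np.linalg.inv(X.T @ X) @ X.T      # orthogonal projector onto col(X)
P_perp = np.eye(n) - P

# Cov(e_hat, y_hat) = P_perp @ (sigma^2 I) @ P when Cov(y) = sigma^2 I
sigma2 = 1.7                              # arbitrary common variance
C = P_perp @ (sigma2 * np.eye(n)) @ P
print(np.abs(P_perp @ P).max(), np.abs(C).max())  # both ~0 (float error)
```

Both printed values are zero up to floating-point round-off, for any design matrix $X$ of full column rank.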

  • I've not covered any matrices or vectors in my statistics courses, so this is not very helpful. Could you explain in the usual notation? Commented Mar 18, 2023 at 12:10
  • @spooleey that is quite sad: linear regression can be seen as a projection problem. Given a vector $y,$ find its projection onto $V_X,$ the space spanned by the columns of $X.$ This view contains the usual normality assumptions as a particular case, and it makes the calculation straightforward. Commented Mar 20, 2023 at 1:00
  • @abhishek both quantities $\hat e^\intercal \hat y$ and $\hat e \hat y^\intercal$ are defined: the former is the dot product and the latter the outer product. Commented Aug 30, 2023 at 14:46
  • I'm sorry, I misread the answer Commented Aug 30, 2023 at 15:55
  • But I still do not see how $E[yy^T]=\sigma^2 I$ Commented Aug 30, 2023 at 17:08
