
$\newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}}$I've found this assignment, given to undergraduate students at a university in Cyprus in 2022, where a simple linear model is given, i.e. $y_i=\beta_0+\beta_1 x_i+\varepsilon_{i}$, with the assumptions about the errors that $E\left[\varepsilon_{i}\right]=0$, $\Var(\varepsilon_i)=\sigma^2$, and $\Cov(\varepsilon_{i},\varepsilon_{j})=\rho$ if $\left|i-j\right|=1$, and $0$ otherwise.

It is asked to prove that $$\Cov(\hat{\beta}_0,\hat{\beta}_1)=-\sigma^2\frac{\bar{x}}{s_{xx}}-\rho\left\{\frac{1}{n}\frac{x_1+x_n-2\bar{x}}{s_{xx}}+2\frac{\bar{x}}{s_{xx}^{2}}\sum_{i=1}^{n-1}(x_i-\bar{x})(x_{i+1}-\bar{x})\right\}.$$

I've already tried to expand the expressions of $\hat{\beta}_0$ and $\hat{\beta}_1$, giving $$\Cov(\hat{\beta}_0,\hat{\beta}_1)=\Cov(\bar{y}-\hat{\beta}_1\bar{x},\hat{\beta}_1)=\Cov(\bar{y},\hat{\beta}_1)-\bar{x} \Var(\hat{\beta}_1),$$ where $$\Cov(\bar{y},\hat{\beta}_1)=\Cov\left(\frac{1}{n}\sum y_i,\frac{1}{s_{xx}}\sum(x_i-\bar{x})(y_i-\bar{y})\right)$$ and substituting the expression for $y_i$, I got $$=\Cov\left(\frac{1}{n}\sum(\beta_0+\beta_1 x_i+\varepsilon_i),\frac{1}{s_{xx}}\sum(x_i-\bar{x})(\beta_0+\beta_1 x_i+\varepsilon_i)\right)=\dots$$ I've spent about two days doing tedious computations, but however close I get to the result, I cannot recover the expression for the covariance between the errors.
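I have at least verified numerically that the claimed identity is exact (a sketch in Python, assuming $s_{xx}=\sum_i(x_i-\bar{x})^2$ and that $\rho$ denotes a covariance, not a correlation), by comparing it with the matrix formula $\Cov(\hat{\boldsymbol\beta})=(X^\top X)^{-1}X^\top\Sigma X(X^\top X)^{-1}$:

```python
import numpy as np

# Arbitrary data: any x with distinct values will do.
rng = np.random.default_rng(0)
n, sigma2, rho = 8, 1.3, 0.4
x = rng.normal(size=n)

# Exact covariance of the OLS estimator: Cov(b) = (X'X)^{-1} X' S X (X'X)^{-1},
# where S is the tridiagonal error covariance from the assumptions.
X = np.column_stack([np.ones(n), x])
S = sigma2 * np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))
A = np.linalg.inv(X.T @ X) @ X.T
cov_beta = A @ S @ A.T

# The purported closed form, with s_xx = sum (x_i - xbar)^2.
xbar = x.mean()
sxx = ((x - xbar) ** 2).sum()
cross = ((x[:-1] - xbar) * (x[1:] - xbar)).sum()
claimed = -sigma2 * xbar / sxx - rho * ((x[0] + x[-1] - 2 * xbar) / (n * sxx)
                                        + 2 * xbar * cross / sxx**2)

print(cov_beta[0, 1], claimed)  # the two values agree
```

So the target formula checks out numerically; what I am missing is the derivation.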

Any help would be deeply appreciated, as I am about to give up on this exercise. Thank you in advance!

Comments:

  • (1) What estimator are you using to find $\hat\beta_0$ and $\hat\beta_1$? That should be your point of departure. (2) Your assumptions allow you to assume, without loss of generality, that $\sigma^2=1,$ $\bar x = 0,$ and $s_{xx}=1$ (but you will have to work out how this standardization affects the covariance). (3) When calculations are tedious, consider resorting to matrix notation or first consider the smallest possible values of $n$ and write everything out explicitly to see the patterns. Commented Mar 16 at 13:29
  • What is your definition of $s_{xx}$? Please write it down explicitly. Commented Mar 20 at 17:41

3 Answers


The reference to undergraduates suggests the most elementary possible solution is sought. I will therefore rely solely on (a) high-school level algebra and (b) basic properties of variances and covariances: namely, their bilinearity.

The point of this presentation is to demonstrate both the computational and conceptual value of choosing an appropriate way to express the variables: namely, to standardize them to unit variance and (in the case of the explanatory variables) to zero mean. This enables you to see what the solution must be before you do any algebraic work at all. It also leads to effective algorithms in case you need to code a solution.


Begin with a basic operation: standardizing the variables. Changing the units of measure, express the variables as $y_i = \sigma\eta_i$ (thereby rescaling the error terms in the model to have unit variance) and $x_i = m + s \xi_i$ (rescaled and centered) where $$0 = n\bar \xi = \sum_i \xi_i$$ and $$\sum_i \xi_i^2 = n.$$ As usual, we find

$$m=\bar x;\ s^2 = \frac{1}{n}\sum_i (x_i-\bar x)^2;\ \text{and }\xi_i = \frac{x_i - \bar x}{s}.\tag{*}$$

In these units, and writing $\tau = 1/\sigma$ for convenience, the model is

$$\eta_i/\tau = y_i = \beta_0+\beta_1 x_i + \varepsilon_i = \beta_0 + \beta_1(m + s\xi_i) + \varepsilon_i.$$

Equivalently, multiplying both sides by $\tau$ and expanding,

$$\eta_i = \tau(\beta_0 + m\beta_1) + (\tau s \beta_1)\xi_i + \delta_i = \alpha_0 + \alpha_1 \xi_i + \delta_i,$$

where $\delta_i = \tau\varepsilon_i,$ so that $\operatorname{Var}(\delta_i) = \tau^2 \sigma^2 = 1.$

If we employ the usual least squares solutions (not the weighted least squares solutions!) then their linearity implies

$$\tau(\hat\beta_0 + m\hat\beta_1) = \hat\alpha_0 = \bar\eta\ \text{ and }\ \tau s \hat\beta_1 = \hat\alpha_1 = \frac{\sum_i \eta_i \xi_i}{\sum_i \xi_i^2} = \frac{1}{n}\sum_i \eta_i \xi_i.$$

(The first identity holds because $\bar\xi = 0$ gives $\hat\alpha_0 = \bar\eta - \hat\alpha_1\bar\xi = \bar\eta,$ while $\bar\eta = \tau\bar y = \tau(\hat\beta_0 + \hat\beta_1\bar x).$) Use these to compute the variances and covariances of the $\hat\alpha_*:$

  1. $\operatorname{Var}(\bar\eta) = \operatorname{Var}(\hat\alpha_0) = \tau^2\color{red}{\operatorname{Var}(\hat\beta_0)} + 2\tau(\tau m) \color{red}{\operatorname{Cov}(\hat\beta_0,\hat\beta_1)} + (\tau m)^2 \color{red}{\operatorname{Var}(\hat\beta_1)}.$

  2. $\operatorname{Cov}(\bar\eta,\hat\alpha_1) = \operatorname{Cov}(\hat\alpha_0,\hat\alpha_1) = \tau(\tau s)\color{red}{\operatorname{Cov}(\hat\beta_0,\hat\beta_1)} + (\tau m)(\tau s)\color{red}{\operatorname{Var}(\hat\beta_1)}.$

  3. $\operatorname{Var}(n\hat\alpha_1) = (n \tau s)^2\color{red}{\operatorname{Var}(\hat\beta_1)} = \operatorname{Var}\left(\sum_i \eta_i\xi_i\right) = \color{blue}{\sum_i \sum_j \xi_i\xi_j \operatorname{Cov}(\delta_i,\delta_j)}.$

These are three simultaneous linear equations for the variances and covariances of $(\hat\beta_0,\hat\beta_1)$ (shown in red type). Moreover, they are triangular, making them straightforward to solve:

  • From (3) we find $$\operatorname{Var}(\hat\beta_1) = \frac{1}{(n\tau s)^2}\color{blue}{\sum_i \sum_j \xi_i\xi_j \operatorname{Cov}(\delta_i,\delta_j)}.$$

  • With that solution in hand, from (2) we find $$\operatorname{Cov}(\hat\beta_0,\hat\beta_1) = \frac{1}{\tau^2 s}\operatorname{Cov}(\bar\eta,\hat\alpha_1) - m\operatorname{Var}(\hat\beta_1).$$

  • At this point we have all we need, but it's worth noticing that $\operatorname{Var}(\hat\beta_0)$ can be found by plugging the two preceding solutions into (1).

The quantities requiring calculation are determined by the covariance assumptions in the question: because $\delta_i = \tau\varepsilon_i,$ we have $\operatorname{Cov}(\delta_i,\delta_j) = 1$ when $i = j,$ $\rho/\sigma^2$ when $|i-j| = 1,$ and $0$ otherwise. Thus

$$\color{blue}{\sum_i \sum_j \xi_i\xi_j \operatorname{Cov}(\delta_i,\delta_j)} = \sum_{i=1}^n \xi_i^2 + 2\left(\frac{\rho}{\sigma^2}\right) \sum_{i=1}^{n-1} \xi_i \xi_{i+1} = n + 2\left(\frac{\rho}{\sigma^2}\right) \sum_{i=1}^{n-1} \xi_i \xi_{i+1},$$

while, because $\sum_j \xi_j = 0,$

$$\operatorname{Cov}(\bar\eta,\hat\alpha_1) = \frac{1}{n^2}\sum_i \sum_j \xi_j \operatorname{Cov}(\delta_i,\delta_j) = \frac{1}{n^2}\left(\sum_j \xi_j + \frac{\rho}{\sigma^2}\Bigl(2\sum_j \xi_j - \xi_1 - \xi_n\Bigr)\right) = -\frac{\rho}{\sigma^2}\,\frac{\xi_1 + \xi_n}{n^2}.$$

Only the final terms in these two expressions are new to this problem: everything else is the algebra of ordinary least squares regression. Clearly, they arise from the correlations between successive observations.

You can see the purported solution (presented in the question) taking shape here. The rest is merely a matter of plugging everything in from the solutions to $(1)$ - $(3)$ along with $(*)$ and doing the algebraic simplification, which can be left to the undergraduates to perform, because they have already seen it in the context of ordinary least squares ;-).
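As an independent check (a numeric sketch, not part of the argument, assuming $s_{xx} = n s^2 = \sum_i (x_i - \bar x)^2$), one can carry out the standardization and the triangular solve in the standardized coordinates and compare with the exact matrix formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, rho = 7, 2.0, 0.3
x = rng.normal(size=n)

# Standardize: xi has mean 0 and mean square 1.
m, s = x.mean(), x.std()          # s**2 is the mean of (x - xbar)^2
xi = (x - m) / s
tau = 1 / np.sqrt(sigma2)

# The blue double sum, and Cov(eta_bar, alpha1_hat), using
# Cov(delta_i, delta_j) = 1, rho/sigma^2, or 0.
blue = (xi**2).sum() + 2 * (rho / sigma2) * (xi[:-1] * xi[1:]).sum()
cov_eta_alpha1 = -(rho / sigma2) * (xi[0] + xi[-1]) / n**2

# Triangular solve: Var(beta1_hat) first, then the covariance.
var_beta1 = blue / (n * tau * s) ** 2
cov_b0_b1 = cov_eta_alpha1 / (tau**2 * s) - m * var_beta1

# Compare with the exact formula Cov(b) = (X'X)^{-1} X' S X (X'X)^{-1}.
X = np.column_stack([np.ones(n), x])
S = sigma2 * np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))
A = np.linalg.inv(X.T @ X) @ X.T
print(cov_b0_b1, (A @ S @ A.T)[0, 1])  # these agree
```

The agreement confirms that the standardized bookkeeping loses nothing.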

Comments:

  • Can you finish off your computation to check if the expression in the OP is indeed correct? Commented Mar 20 at 18:29

Here is a proof heavily utilizing matrix operations, which begins with defining the following notations (as in conventional linear model calculations): \begin{align*} & \boldsymbol{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n, \; \boldsymbol{e} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^n, \; \boldsymbol{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n. \\ & \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix} \in \mathbb{R}^n, \; X = \begin{bmatrix} \boldsymbol{e} & \boldsymbol{x} \end{bmatrix}. \end{align*} By condition, the covariance matrix of $\boldsymbol{\varepsilon}$ is \begin{align*} \operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2I_{(n)} + \begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix} + \begin{bmatrix}\boldsymbol{0}_{n - 1}^\top & 0 \\ \rho I_{(n - 1)} & \boldsymbol{0}_{n - 1}\end{bmatrix}. \tag{1}\label{1} \end{align*} The OLS estimate of $\boldsymbol{\beta}$ is \begin{align*} \hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta_0} \\ \hat{\beta_1} \end{bmatrix} = (X^\top X)^{-1}X^\top\boldsymbol{y} = \boldsymbol{\beta} + (X^\top X)^{-1}X^\top\boldsymbol{\varepsilon}. \tag{2}\label{2} \end{align*} It thus follows by $\eqref{1}$ and $\eqref{2}$ that \begin{align*} & \operatorname{Cov}(\hat{\boldsymbol{\beta}}) = (X^\top X)^{-1}X^\top \operatorname{Cov}(\boldsymbol{\varepsilon}) X(X^\top X)^{-1} \\ =& \sigma^2(X^\top X)^{-1} + (X^\top X)^{-1}X^\top\begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix}X(X^\top X)^{-1} \\ & + (X^\top X)^{-1}X^\top\begin{bmatrix}\boldsymbol{0}_{n - 1}^\top & 0 \\ \rho I_{(n - 1)} & \boldsymbol{0}_{n - 1}\end{bmatrix}X(X^\top X)^{-1}. \tag{3}\label{3} \end{align*}

It is then clear that the core calculation lies in evaluating $X^\top\begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix}X$ (because the third term is just a transpose of it, which does not require additional computation), which is spelled out as follows (where $\boldsymbol{x}_{[-1]} = \begin{bmatrix} x_2 \\ \vdots \\ x_{n}\end{bmatrix}, \boldsymbol{x}_{[-n]} = \begin{bmatrix} x_1 \\ \vdots \\ x_{n - 1}\end{bmatrix}$, $\boldsymbol{e}_{n - 1}$ is a $(n - 1)$-long column vector of all ones): \begin{align*} & X^\top\begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix}X \\ =& \begin{bmatrix} \boldsymbol{e}^\top \\ \boldsymbol{x}^\top \end{bmatrix} \begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix} \begin{bmatrix} \boldsymbol{e} & \boldsymbol{x} \end{bmatrix} \\ =& \begin{bmatrix} \boldsymbol{e}_{n - 1}^\top & 1 \\ \boldsymbol{x}_{[-n]}^\top & x_n \end{bmatrix} \begin{bmatrix}\boldsymbol{0}_{n - 1} & \rho I_{(n - 1)} \\ 0 & \boldsymbol{0}_{n - 1}^\top\end{bmatrix} \begin{bmatrix} 1 & x_1 \\ \boldsymbol{e}_{n - 1} & \boldsymbol{x}_{[-1]} \end{bmatrix} \\ =& \begin{bmatrix} 0 & \rho\boldsymbol{e}_{n - 1}^\top \\ 0 & \rho\boldsymbol{x}_{[-n]}^\top \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ \boldsymbol{e}_{n - 1} & \boldsymbol{x}_{[-1]} \end{bmatrix} \\ =& \rho\begin{bmatrix} n - 1 & \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-1]} \\ \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-n]} & \boldsymbol{x}_{[-1]}^\top\boldsymbol{x}_{[-n]} \end{bmatrix} \tag{4}\label{4} \end{align*} On the other hand, it is easy to see that (where $s_{xx} = \sum_{i = 1}^n(x_i - \bar{x})^2 = \sum_{i = 1}^n x_i^2 - n\bar{x}^2$) \begin{align*} (X^\top X)^{-1} = \begin{bmatrix} n & \sum_{i = 1}^n x_i \\ \sum_{i = 1}^n x_i & \sum_{i = 1}^n x_i^2 \end{bmatrix}^{-1} = \frac{1}{ns_{xx}}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & 
-\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix}. \tag{5}\label{5} \end{align*} Substituting $\eqref{4}$ and $\eqref{5}$ into $\eqref{3}$ then yields \begin{align*} & \operatorname{Cov}(\hat{\boldsymbol{\beta}}) \\ =& \sigma^2\frac{1}{ns_{xx}}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix} \\ & + \frac{\rho}{n^2s_{xx}^2}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix} \begin{bmatrix} n - 1 & \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-1]} \\ \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-n]} & \boldsymbol{x}_{[-1]}^\top\boldsymbol{x}_{[-n]} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix} \\ & + \frac{\rho}{n^2s_{xx}^2}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix} \begin{bmatrix} n - 1 & \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-n]} \\ \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-1]} & \boldsymbol{x}_{[-1]}^\top\boldsymbol{x}_{[-n]} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \\ -\boldsymbol{e}^\top\boldsymbol{x} & n \end{bmatrix} \end{align*} Therefore, \begin{align*} & \operatorname{Cov}(\hat{\beta_0}, \hat{\beta_1}) = \begin{bmatrix} 1 & 0 \end{bmatrix} \operatorname{Cov}(\hat{\boldsymbol{\beta}})\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\ =& -\sigma^2\frac{\bar{x}}{s_{xx}} \\ &+ \frac{\rho}{n^2s_{xx}^2}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \end{bmatrix} \begin{bmatrix} n - 1 & \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-1]} \\ \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-n]} & \boldsymbol{x}_{[-1]}^\top\boldsymbol{x}_{[-n]} 
\end{bmatrix} \begin{bmatrix} -\boldsymbol{e}^\top\boldsymbol{x} \\ n \end{bmatrix} \\ & + \frac{\rho}{n^2s_{xx}^2}\begin{bmatrix} \boldsymbol{x}^\top\boldsymbol{x} & -\boldsymbol{e}^\top\boldsymbol{x} \end{bmatrix} \begin{bmatrix} n - 1 & \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-n]} \\ \boldsymbol{e}_{n - 1}^\top\boldsymbol{x}_{[-1]} & \boldsymbol{x}_{[-1]}^\top\boldsymbol{x}_{[-n]} \end{bmatrix} \begin{bmatrix} -\boldsymbol{e}^\top\boldsymbol{x} \\ n \end{bmatrix} \\ =& -\sigma^2\frac{\bar{x}}{s_{xx}} \\ & + \rho\left\{\frac{\bar{x} - x_1}{ns_{xx}} - \frac{\bar{x}}{s_{xx}^2}\sum_{i = 1}^{n - 1}(x_i - \bar{x})(x_{i + 1} - \bar{x}) \right\} \\ & + \rho\left\{\frac{\bar{x} - x_n}{ns_{xx}} - \frac{\bar{x}}{s_{xx}^2}\sum_{i = 1}^{n - 1}(x_i - \bar{x})(x_{i + 1} - \bar{x}) \right\} \\ =& -\sigma^2\frac{\bar{x}}{s_{xx}} - \rho\left\{\frac{x_1 + x_n - 2\bar{x}}{ns_{xx}} + 2\frac{\bar{x}}{s_{xx}^2}\sum_{i = 1}^{n - 1}(x_i - \bar{x})(x_{i + 1} - \bar{x}) \right\}. \end{align*} This completes the proof.

The simulation study below also empirically supports the answer:

```r
Nsim <- 1000
N <- 1000
rho <- 0.5
sigma_sq <- 1 + rho^2
beta <- matrix(c(1, 1), 2, 1)
beta_mat <- matrix(0, nrow = Nsim, ncol = 2)
op_ans <- numeric(Nsim)

for (n in 1:Nsim) {
  w <- rnorm(N)
  eps <- w + rho * c(rnorm(1), w[1:(N - 1)])  # Var(eps) = 1 + rho^2, lag-1 covariance = rho
  x <- rnorm(N, 1, 1)
  y <- beta[1] + beta[2] * x + eps
  beta_mat[n, ] <- coef(lm(y ~ x))
  xbar <- mean(x)
  s_xx <- sum(x^2) - N * xbar^2
  # OP answer
  op_ans[n] <- -sigma_sq * xbar / s_xx -
    rho * ((x[1] + x[N] - 2 * xbar) / (N * s_xx) +
           2 * xbar * sum((x[1:(N - 1)] - xbar) * (x[2:N] - xbar)) / s_xx^2)
}

cov(beta_mat)
#              [,1]        [,2]
# [1,]  0.003642634 -0.00125121
# [2,] -0.001251210  0.00121486
mean(op_ans)
# [1] -0.001253316
```
Comments:

  • (+1) When I wrote my answer I suspected $\rho$ was intended to be a correlation rather than a covariance, so I left it to the OP to discover (or settle) that issue. Commented Mar 20 at 18:59
  • Might be one of my brain fade moments, but could you tell why $\operatorname{Cov}(\hat{\beta_0}, \hat{\beta_1}) = \begin{bmatrix} 0 & 1 \end{bmatrix} \operatorname{Cov}(\hat{\boldsymbol{\beta}})\begin{bmatrix} 0 \\ 1 \end{bmatrix}$? I thought the first vector would be $\begin{bmatrix}1 & 0\end{bmatrix}.$ Commented Mar 21 at 2:30
  • @User1865345 You are absolutely right! This must be the reason I did not verify the OP's answer. Will redo the last step of the calculation. Commented Mar 21 at 12:57

An alternative way to work it out:

Ordinary linear regression can always be written as a linear estimator (a weighted sum of the $y_i$):

$$\hat\beta_0 = \sum_{i=1}^n a_i y_i, \qquad \hat\beta_1 = \sum_{i=1}^n b_i y_i.$$

For simple linear regression the weights are

$$\begin{array}{rcl} b_i &=& \dfrac{x_i-\bar{x}}{S_{xx}} \\ a_i &=& \dfrac{1}{n} - \bar{x}\, b_i \end{array}$$

where $S_{xx} = \sum_{i=1}^n (x_i-\bar{x})^2$ (the $\bar{y}$ terms drop out of $b_i$ because $\sum_i (x_i - \bar{x})\bar{y} = 0$).

Then the covariance is

$$\operatorname{Cov}(\hat\beta_0,\hat\beta_1) = \sum_{i=1}^n\sum_{j=1}^n a_i b_j \operatorname{Cov}(\epsilon_i,\epsilon_j)$$

with

$$\operatorname{Cov}(\epsilon_i,\epsilon_j) = \begin{cases} \sigma^2 & \quad\text{if $i = j$} \\ \rho & \quad\text{if $|i - j| = 1$} \\ 0 & \quad \text{otherwise} \end{cases}$$

The rest is algebraic simplification.
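As a sketch of this double-sum approach (assuming the centered definition $S_{xx}=\sum_i(x_i-\bar{x})^2$ and hypothetical data), one can confirm numerically that it reproduces the exact matrix result:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, rho = 6, 1.5, 0.4
x = rng.normal(size=n)
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()

# Linear-estimator weights for simple linear regression.
b = (x - xbar) / Sxx        # slope weights
a = 1.0 / n - xbar * b      # intercept weights

# Tridiagonal error covariance from the question's assumptions.
Sigma = sigma2 * np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))

# Cov(b0_hat, b1_hat) = sum_ij a_i b_j Cov(eps_i, eps_j).
cov_ab = a @ Sigma @ b

# Exact sandwich formula for comparison.
X = np.column_stack([np.ones(n), x])
A = np.linalg.inv(X.T @ X) @ X.T
print(cov_ab, (A @ Sigma @ A.T)[0, 1])  # the two agree
```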

About the formula from the exercise

The computations can be tedious. If direct manipulation of the formulas doesn't work out, then simulations or explicit numerical computations can help to find the culprit more easily.

Also, if $\rho = 0$ then the equations are easier and it should be simple to make a quick verification of the formula. I believe that in that case the covariance should be $$-\sigma^2 \frac{\bar{x}}{s_{xx}-n\bar{x}^2}$$ where $s_{xx} = \sum_{i=1}^n x_i^2$. So the equation from your question seems not right, unless it defines $s_{xx} = \sum_{i=1}^n (x_i-\bar{x})^2$.

Below is some R code that estimates the covariance; it verifies that for $\rho = 0$ the correct formula has $s_{xx}-n\bar{x}^2$ in the denominator.

```r
### some settings
set.seed(1)
n <- 10
x <- 1:n

### direct computation of covariance
sxx <- sum(x^2)
mx <- mean(x)
-mx / (sxx - mx^2 * n)
# result = -0.06666667

### using simulations (sigma = 1)
beta <- replicate(10^4, expr = {
  y <- 1 + x + rnorm(n)
  mod <- lm(y ~ x)
  mod$coefficients
})
cov(t(beta))
# result = -0.06687391
```
Comments:

  • Your specification of $\operatorname{Cov}(\epsilon_i, \epsilon_j)$ is off. Commented Mar 20 at 18:21
  • @Zhanxiong I see now, the off-diagonal terms should be $\rho$ instead of $\rho\sigma^2$. Commented Mar 20 at 18:25
  • Also not all off-diagonals are $\rho$; it is just a tri-diagonal matrix. Commented Mar 20 at 18:26
  • Ah, I missed the condition $|i-j| = 1$. Commented Mar 20 at 19:03
