
I have the following problem:

The regression model $Y = \beta x + e$ (with $e \sim N(0, \sigma^2)$) is a model that passes through the origin, meaning that $E(Y \mid X = 0) = 0$. You have $n$ independent observations $(X_i, Y_i),\ i = 1,\dots,n$ from this model. 1) What is the least-squares estimator $\hat{\beta}$ of $\beta$? 2) What is the sampling distribution of $\hat{\beta}$?

1) For this question the solution goes: $$ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta x_i)^2 = L(\beta)$$ $$L'(\beta) = -2 \sum_{i=1}^{n} x_i(y_i -\beta x_i) = 0$$ $$\sum_{i=1}^{n} x_iy_i - \beta \sum_{i=1}^{n} x_i^2 = 0$$ $$\hat{\beta} = \dfrac{1}{\sum x_i^2}\sum x_iy_i$$
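This closed-form $\hat{\beta}$ can be sanity-checked numerically; here is a quick sketch with simulated data (the values of $\beta$, $\sigma$ and the design are my own, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 2.5, 1.0
x = rng.uniform(1, 5, size=50)
y = beta * x + rng.normal(0, sigma, size=50)

# Closed-form least-squares estimate for the no-intercept model
beta_hat = np.sum(x * y) / np.sum(x ** 2)

# Cross-check against NumPy's general least-squares solver
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
print(beta_hat, beta_lstsq)  # the two values agree
```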

Now we have our estimator of $\beta$.

When it comes to question 2) :

  • First we want the expectation and variance of $y_i$ because they will be needed to find the distribution of $\hat{\beta}$. I'll spare the details of how we got them because I want to focus on $\hat{\beta}$:

$$E(y_i) = \beta x_i$$ $$var(y_i) = \sigma^2$$

  • Secondly we can compute the expectation and variance of $\hat{\beta}$ (I keep the hat to distinguish the estimator from the true parameter $\beta$).

$$ E(\hat{\beta}) = \dfrac{1}{\sum x_i^2}\sum x_iE(y_i)$$ $$ E(\hat{\beta}) = \dfrac{1}{\sum x_i^2}\sum x_i \beta x_i$$ $$ E(\hat{\beta}) = \beta \dfrac{\sum x_i^2}{\sum x_i^2}$$ $$ E(\hat{\beta}) = \beta$$
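This unbiasedness can be illustrated with a small Monte Carlo check (my own sketch with simulated data; the true $\beta = 2.5$ and the fixed design are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma = 2.5, 1.0
x = rng.uniform(1, 5, size=30)  # fixed design, held constant across replications

# Re-simulate y many times and average the resulting estimates
estimates = []
for _ in range(20_000):
    y = beta * x + rng.normal(0, sigma, size=x.size)
    estimates.append(np.sum(x * y) / np.sum(x ** 2))

print(np.mean(estimates))  # close to the true beta = 2.5
```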

I simply do not understand how $var(\hat{\beta})$ was found. I tried applying the definition $E[(\hat{\beta} - E[\hat{\beta}])^2]$, but I never arrive at the solution; I always end up with extra constants attached to the $\sigma^2$. Below is the correct answer. It would be helpful if someone could show me step by step how to arrive at it. $$var(\hat{\beta}) = \dfrac{\sum x_i^{2}}{\left(\sum x_i^{2}\right)^{2}} \sigma^2 = \dfrac{\sigma^2}{\sum x_i^{2}}$$

Finally we can state that $\hat{\beta} \sim N\!\left(\beta, \dfrac{\sigma^2}{\sum x_i^{2}}\right)$.
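As the comment below also notes, the variance works out to $\sigma^2/\sum x_i^2$. A vectorized simulation sketch (again with made-up $\beta$, $\sigma$ and design) confirms this value empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma = 2.5, 1.0
x = rng.uniform(1, 5, size=30)          # fixed design across replications

# 50,000 replicated samples of y, one per row
y = beta * x + rng.normal(0, sigma, size=(50_000, x.size))
estimates = y @ x / np.sum(x ** 2)      # beta_hat for every replication

theory = sigma ** 2 / np.sum(x ** 2)
print(estimates.var(), theory)  # empirical variance matches sigma^2 / sum x_i^2
```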

  • Shouldn't the $\sum x_i^2$ in both the numerator and the denominator cancel out? Commented Dec 16, 2019 at 12:27
  • Assuming the $x_i$'s are fixed, the variance of $\hat\beta$ is $\sigma^2/\sum x_i^2$. Commented Jan 29, 2020 at 19:34

1 Answer


In general, the least-squares estimator is $\hat{\beta}=(X^{T}X)^{-1}X^{T}Y$ where $Y=(y_1,...,y_n)^{T}$ and, in this case, $X=(x_1,...,x_n)^{T}$. Since $Y\sim N_n(X\beta,\sigma^2I_n)$, it follows that $\hat{\beta}\sim N_1(\beta,\sigma^2(X^{T}X)^{-1})$, where in this case $(X^{T}X)^{-1}=\left(\sum_{i=1}^nx_i^2\right)^{-1}$.

You don't calculate $E[\hat{\beta}]=\int\beta f(\beta)d\beta$ directly. Instead, you use the hypothesis $Y\sim N_n(X\beta,\sigma^2I_n)$ and the linearity of the expected value, namely, $E[\hat{\beta}]=E[(X^{T}X)^{-1}X^{T}Y]=(X^{T}X)^{-1}X^{T}E[Y]=(X^{T}X)^{-1}X^{T}X\beta=\beta$. Note that $\hat{\beta}$ is Gaussian because it is a linear transformation of $Y$, which is Gaussian. For the variance, use the property $Var(AY)=AVar(Y)A^{T}$: with $A=(X^{T}X)^{-1}X^{T}$ this gives $Var(\hat\beta)=\sigma^2(X^{T}X)^{-1}X^{T}X(X^{T}X)^{-1}=\sigma^2(X^{T}X)^{-1}$.
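To make the variance step concrete, here is a numeric sketch (simulated design, $\sigma$ chosen arbitrarily) applying $Var(AY)=A\,Var(Y)\,A^{T}$ with $A=(X^{T}X)^{-1}X^{T}$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
X = rng.uniform(1, 5, size=(30, 1))      # n x 1 design matrix, no intercept

A = np.linalg.inv(X.T @ X) @ X.T         # beta_hat = A @ Y
var_Y = sigma ** 2 * np.eye(X.shape[0])  # Var(Y) = sigma^2 I_n

var_beta = A @ var_Y @ A.T               # Var(AY) = A Var(Y) A^T
print(var_beta[0, 0], sigma ** 2 / np.sum(X ** 2))  # both equal sigma^2 / sum x_i^2
```

The middle $X^{T}X$ cancels one factor of $(X^{T}X)^{-1}$, which is exactly why the sum appears only once, in the denominator.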

