bill_e

Why is it necessary to place the distributional assumption on the errors, i.e.

$y_i = X\beta + \epsilon_{i}$, with $\epsilon_{i} \sim \mathcal{N}(0,\sigma^{2})$.

Why not write

$y_i = X\beta + \epsilon_{i}$, with $y_i \sim \mathcal{N}(X\hat{\beta},\sigma^{2})$,

where in either case $\epsilon_i = y_i - \hat{y}$.
I've seen it stressed that the distributional assumptions are placed on the errors, not the data, but without explanation.

I don't really understand the difference between these two formulations. In some places I see distributional assumptions placed on the data (mostly in the Bayesian literature, it seems), but most of the time the assumptions are placed on the errors.

When modelling, why would or should one begin with assumptions on one rather than the other?
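To make the comparison concrete, here is a minimal NumPy sketch (with made-up $X$, $\beta$, and $\sigma$, and using the true $\beta$ rather than an estimate $\hat{\beta}$) showing that the two formulations describe the same sampling distribution: drawing $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$ and setting $y_i = X\beta + \epsilon_i$ produces exactly the same draws as sampling $y_i \sim \mathcal{N}(X\beta,\sigma^2)$ directly, when both start from the same random state.

```python
import numpy as np

n, sigma = 5, 2.0
X = np.column_stack([np.ones(n), np.arange(n)])  # design matrix with intercept
beta = np.array([1.0, 0.5])                      # "true" coefficients (assumed)

# Formulation 1: distributional assumption on the errors
rng1 = np.random.default_rng(42)
eps = rng1.normal(0.0, sigma, size=n)            # eps_i ~ N(0, sigma^2)
y1 = X @ beta + eps

# Formulation 2: distributional assumption on the response
rng2 = np.random.default_rng(42)
y2 = rng2.normal(X @ beta, sigma, size=n)        # y_i ~ N(x_i' beta, sigma^2)

print(np.allclose(y1, y2))                       # identical draws from either view
```

So for generating (or fitting) data under a fixed design, the two statements are interchangeable; the distinction the question asks about is one of emphasis and convention, not of the implied distribution of $y$ given $X$.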
