bill_e

Why is it necessary to place the distributional assumption on the errors, i.e.

$y_i = X\beta + \epsilon_{i}$, with $\epsilon_{i} \sim \mathcal{N}(0,\sigma^{2})$.

Why not write

$y_i = X\beta + \epsilon_{i}$, with $y_i \sim \mathcal{N}(X\hat{\beta},\sigma^{2})$,

where in either case $\epsilon_i = y_i - \hat{y}$.
I've seen it stressed that the distributional assumptions are placed on the errors, not the data, but without explanation.

I don't really understand the difference between these two formulations. In some places I see distributional assumptions placed on the data (mostly in the Bayesian literature, it seems), but most of the time the assumptions are placed on the errors.

When modelling, why would or should one begin with assumptions on one rather than the other?
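To make the comparison concrete, here is a minimal NumPy sketch (with made-up $X$, $\beta$, and $\sigma$, and using the true $\beta$ rather than an estimate $\hat{\beta}$) showing that the two formulations describe the same sampling distribution: drawing $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$ and setting $y_i = X\beta + \epsilon_i$ produces exactly the same draws as sampling $y_i \sim \mathcal{N}(X\beta,\sigma^2)$ directly, when both start from the same random state.

```python
import numpy as np

n, sigma = 5, 2.0
X = np.column_stack([np.ones(n), np.arange(n)])  # design matrix with intercept
beta = np.array([1.0, 0.5])                      # "true" coefficients (assumed)

# Formulation 1: distributional assumption on the errors
rng1 = np.random.default_rng(42)
eps = rng1.normal(0.0, sigma, size=n)            # eps_i ~ N(0, sigma^2)
y1 = X @ beta + eps

# Formulation 2: distributional assumption on the response
rng2 = np.random.default_rng(42)
y2 = rng2.normal(X @ beta, sigma, size=n)        # y_i ~ N(x_i' beta, sigma^2)

print(np.allclose(y1, y2))                       # identical draws from either view
```

So for generating (or fitting) data under a fixed design, the two statements are interchangeable; the distinction the question asks about is one of emphasis and convention, not of the implied distribution of $y$ given $X$.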
