Comparing residuals between OLS and non-OLS regressions

Suppose you want to estimate a linear model ($n$ observations of the response and $p+1$ regression coefficients, including the intercept): $$\mathbb{E}(y_i) = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}$$

One way to do this is through the ordinary least squares (OLS) solution, i.e. choose the coefficients so that the sum of squared errors is minimized:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 $$
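
For concreteness, here is a minimal sketch in Python of the OLS fit (the toy data, the sizes $n=100$ and $p=3$, and the names `X1`, `beta_ols` are my own illustration, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                                  # hypothetical sample size and number of predictors
X = rng.normal(size=(n, p))                    # toy predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)  # toy response

# Design matrix with an intercept column; lstsq minimizes ||y - X1 @ beta||^2.
X1 = np.column_stack([np.ones(n), X])
beta_ols, *_ = np.linalg.lstsq(X1, y, rcond=None)

sse = np.sum((y - X1 @ beta_ols) ** 2)         # minimized sum of squared errors
print(beta_ols, sse)
```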

Alternatively, you could use another loss function, like the sum of the absolute deviations, so that:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left| y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right| $$
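
Continuing the same toy data, a rough sketch of the least absolute deviations fit; a generic derivative-free optimizer is used here purely for illustration (in practice one might use quantile regression at the median or a linear-programming formulation):

```python
from scipy.optimize import minimize

def sad(beta):
    """Sum of absolute deviations for a candidate coefficient vector."""
    return np.sum(np.abs(y - X1 @ beta))

# The L1 objective is not differentiable everywhere, so use Nelder-Mead,
# starting from the OLS solution found above.
res = minimize(sad, x0=beta_ols, method="Nelder-Mead")
beta_lad = res.x
print(beta_lad, res.fun)   # res.fun is the minimized sum of absolute deviations
```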

Suppose you have found the parameters for the two models and want to choose the model with the smaller value of the loss function. How can you compare the minimum values attained by the loss functions in general, i.e. not just in this specific case (we could also try other $L_p$-based loss functions)? There seems to be a difference in the scale of the functions: one deals with squared deviations while the other does not.
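
To make the scale mismatch concrete on the toy data from the sketches above (this only restates the difficulty numerically, it does not resolve it):

```python
sse_ols = np.sum((y - X1 @ beta_ols) ** 2)    # minimized L2 loss, in units of y squared
sad_lad = np.sum(np.abs(y - X1 @ beta_lad))   # minimized L1 loss, in units of y
print(sse_ols, sad_lad)                       # not directly comparable numbers
```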
