
I am going through the book The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

In Chapter 3 on linear regression (page 44), the authors mention that the least squares criterion is valid from a statistical point of view if:

  1. The training observations $(x_i, y_i)$ represent independent random draws.

Or

  2. The $y_i$'s are conditionally independent given the inputs $x_i$.

I don't understand this requirement. The criterion seems valid to me no matter what: all it does is measure how well a linear function fits the data.

Can anyone explain requirements 1 and 2 to me?
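For reference, here is the criterion in question written out. It is the residual sum of squares over the $N$ training pairs with $p$ inputs. Assuming $\mathbf{X}$ (the $N \times (p+1)$ input matrix with a leading column of ones) has full column rank, its minimizer and the resulting "hat matrix" $\mathbf{H}$ are:

$$
\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2,
\qquad
\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}.
$$

None of these formulas refers to how the data were sampled, so they can be computed for any data set.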

  • Think about what validity of OLS means in the context of the chapter. That should help. Commented Sep 11, 2016 at 14:59
  • Given any data set $(x_i,y_i)$, one can construct the line that best represents the data; that is, one can construct the hat matrix. However, if one is interested in understanding how well the resulting model fits the data, then one has to make assumptions. As a silly example, imagine trying to estimate the relationship between age and income by sampling only children in school. That would be a violation of #1. Now imagine sampling 21-year-olds where one sample consists of college graduates with engineering degrees and the other of high-school dropouts. That would be a problem for #2. Commented Sep 11, 2016 at 15:34
  • @aginensky: You should convert your comment into an answer! Commented Sep 11, 2016 at 16:05
  • @kjetil: Sure, I will do so later. For me to be happy calling it an answer, I'd like to add a few more details :) Commented Sep 11, 2016 at 17:58

1 Answer


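Given any data set $(x_i, y_i)$, one can construct the line that best represents the data; that is, one can construct the hat matrix. However, if one is interested in understanding how well the resulting model fits the data, then one has to make assumptions. As a silly example, imagine trying to estimate the relationship between age and income by sampling only children in school. That would be a violation of #1, because the observations are not random draws from the population whose age-income relationship we care about. Now imagine sampling 21-year-olds where one sample consists of college graduates with engineering degrees and the other of high-school dropouts. That would be a problem for #2: at the same input (age 21), incomes within each sample move together because of a shared factor (education) that the model does not include, so the $y_i$'s are not conditionally independent given the $x_i$'s. The OP suggested I make this an answer, so I did.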
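To make the #2 point concrete, here is a small simulation. It is a sketch of my own, not taken from the book. The setup is hypothetical: 20 groups of 25 people, where everyone in a group shares one input value and one random group-level shift in the response (a stand-in for the shared education effect above). The coefficients are made up. The least squares fit is computed the usual way in every case. What breaks when the $y_i$'s are not conditionally independent is the classical standard error $\hat\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$, which assumes they are.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares fit; returns coefficients and classical (naive) standard errors.

    The naive SEs use sigma^2 (X^T X)^{-1}, which assumes the y_i are
    conditionally independent given X, with constant variance.
    """
    n, p = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical covariance of beta-hat
    return beta, np.sqrt(np.diag(cov))


def simulate(cluster_sd, n_groups=20, group_size=25, n_reps=500, seed=0):
    """Compare the actual spread of the slope estimate with its mean naive SE.

    Hypothetical design: each group shares one x value and one random shift
    u_g ~ N(0, cluster_sd^2). With cluster_sd > 0, the y_i within a group are
    correlated given x, which violates condition #2.
    """
    rng = np.random.default_rng(seed)
    x_groups = np.linspace(20.0, 60.0, n_groups)   # hypothetical "ages"
    x = np.repeat(x_groups, group_size)
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope
    slopes, naive_ses = [], []
    for _ in range(n_reps):
        u = np.repeat(rng.normal(0.0, cluster_sd, n_groups), group_size)
        eps = rng.normal(0.0, 1.0, x.size)
        y = 1.0 + 0.5 * x + u + eps                # true slope is 0.5
        beta, se = ols_fit(X, y)
        slopes.append(beta[1])
        naive_ses.append(se[1])
    return np.std(slopes, ddof=1), np.mean(naive_ses)


for sd in (0.0, 1.0):
    actual, naive = simulate(cluster_sd=sd)
    print(f"cluster_sd={sd}: actual SD of slope={actual:.4f}, mean naive SE={naive:.4f}")
```

With `cluster_sd=0.0` the two numbers should roughly agree. With `cluster_sd=1.0` the naive SE is much smaller than the actual spread of the slope estimates. The slope estimate itself still centers on 0.5, because the group shifts are unrelated to $x$.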

  • If they are independent, are they not conditionally independent as well? Commented Dec 20, 2024 at 12:05
  • If the pairs $(x_i, y_i)$ are independent, then the $y_i$'s are also conditionally independent given the $x_i$'s, but I don't understand the purpose of your question. Commented Dec 20, 2024 at 22:40
  • Could you take a look at stats.stackexchange.com/questions/659012/… Commented Dec 25, 2024 at 17:53
