The r^2 score is undefined when applied to a single sample (e.g. leave-one-out CV).

More generally, r^2 is a poor metric for small test sets: evaluated on a sufficiently small test set, the score can be far into the negatives despite good predictions.
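Both behaviours fall straight out of the definition sklearn uses, r^2 = 1 - SS_res / SS_tot: with one sample SS_tot is exactly zero, and with a handful of clustered samples it is tiny. Here is a minimal hand-rolled version to make the arithmetic visible (the name `r2_manual` is just for illustration):

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Illustrative re-implementation: r^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    # With one sample, ss_tot == 0 and the division is undefined;
    # with a few clustered samples, ss_tot is tiny and r^2 blows up.
    return 1 - ss_res / ss_tot
```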
Given a single sample, even a prediction that is good for the domain can appear terrible:
```python
from sklearn.metrics import r2_score

true = [1]
predicted = [1.01]  # prediction of a single value, off by 1%
print(r2_score(true, predicted))  # 0.0
```

Increase the size of the test set (keeping the accuracy of the predictions the same), and suddenly the r^2 score appears near-perfect:
```python
true = [1, 2, 3]
predicted = [1.01, 2.02, 3.03]
print(r2_score(true, predicted))  # 0.9993
```

Taken to the other extreme, if the test set holds just 2 samples that happen to lie close to each other by chance, the r^2 score is badly distorted even when the predictions are quite good:
```python
true = [20.2, 20.1]  # actual target values from the Boston Housing dataset
predicted = [19, 21]
print(r2_score(true, predicted))  # -449.0
```
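Working the formula by hand shows where the huge negative comes from: the two targets differ by only 0.1, so SS_tot is tiny and the modest residuals dwarf it:

```python
import numpy as np

true = np.array([20.2, 20.1])
predicted = np.array([19.0, 21.0])

ss_res = np.sum((true - predicted) ** 2)    # (1.2)^2 + (-0.9)^2 = 2.25
ss_tot = np.sum((true - true.mean()) ** 2)  # (0.05)^2 + (-0.05)^2 = 0.005
print(1 - ss_res / ss_tot)                  # 2.25 / 0.005 = 450, so r^2 = -449.0
```

A near-constant set of targets drives SS_tot toward zero, so any residual at all pushes r^2 toward minus infinity.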