  • 6
    $\begingroup$ +1 (long time ago), but re-reading your answer now, I am confused by the following bit. You say that 2-fold CV "often also has large variance because the training sets are only half the size". I understand that a training set half the size is a problem, but why does it give "large variance"? Shouldn't it be "large bias" instead? Then the whole issue of choosing the number of folds becomes a bias-variance trade-off, which is how it is often presented. $\endgroup$ Commented May 8, 2016 at 0:40
  • 4
    $\begingroup$ Was just looking into some literature. Interestingly, in Introduction to Statistical Learning, James, Witten, Hastie & Tibshirani say LOOCV "is highly variable, since it is based upon a single observation $(x_1, y_1)$", and in Elements of Statistical Learning, Hastie, Tibshirani & Friedman say that LOOCV "can have high variance because the N training sets are so similar to one another." $\endgroup$ Commented Sep 5, 2016 at 5:44
  • 5
    $\begingroup$ This is incorrect. The variance should be $\mathrm{Var}\!\left[\sum_i x_i / n\right] = \sum_i \sum_j \mathrm{Cov}(x_i, x_j) / n^2$. You are right that the numerator is larger, but the denominator gets larger as well. $\endgroup$ Commented Jan 26, 2018 at 3:07
  • 3
    $\begingroup$ No, that's not really the "whole point". People use k-fold CV to get a single global estimate all the time. You can certainly try to use the multiple fold estimates in other ways, but putting them together is one of the most common ways to estimate holdout performance of a modeling technique. And that is precisely what Eq 7.48 of ESL is doing. $\endgroup$ Commented Jul 23, 2018 at 19:03
  • 3
    $\begingroup$ @amoeba Let me write out the math here. Maybe I am missing something, but so far I am not convinced of the conclusion. Say we have $K$ folds. On each fold we have an estimate of the MSE, so we have $K$ estimates of the MSE, and our final estimator is $\hat{MSE} = \sum_{i=1}^{K} MSE_i / K$. The question is whether the variance of this estimator, $Var[\hat{MSE}]$, goes up as $K$ goes up. The variance is $$ \begin{split} Var[\hat{MSE}] &= Var\left[\sum_{i=1}^{K} MSE_i / K\right] \\ &= \frac{1}{K^2}\left( \sum_{i=1}^{K} Var[MSE_i] + 2\sum_{i<j} Cov[MSE_i, MSE_j] \right) \end{split}$$ $\endgroup$ Commented Aug 7, 2018 at 15:50
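A quick numerical sketch of the variance identity being debated above. The covariance structure here (equal per-fold variance, constant positive correlation between folds) is entirely made up for illustration; the point is only to verify that the variance of the averaged $K$-fold estimate matches $\frac{1}{K^2}\big(\sum_i Var[MSE_i] + 2\sum_{i<j} Cov[MSE_i, MSE_j]\big)$ empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5

# Hypothetical covariance of the K per-fold MSE estimates:
# variance sigma2 on the diagonal, correlation rho off-diagonal.
rho, sigma2 = 0.6, 2.0
cov = sigma2 * (rho * np.ones((K, K)) + (1 - rho) * np.eye(K))

# Simulate many vectors of K correlated per-fold estimates
# and average each vector, mimicking the K-fold CV estimator.
draws = rng.multivariate_normal(np.zeros(K), cov, size=200_000)
mse_hat = draws.mean(axis=1)

# Empirical variance of the averaged estimator vs. the formula:
# Var[mean] = (sum of variances + 2 * sum of covariances, i<j) / K^2
empirical = mse_hat.var()
formula = (cov.trace() + 2 * np.triu(cov, k=1).sum()) / K**2

print(empirical, formula)  # the two numbers should agree closely
```

With positive correlation between folds (as when the training sets overlap heavily), the covariance term keeps the variance of the average from shrinking like $1/K$, which is the crux of the disagreement in this thread.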