  • 6
    $\begingroup$ +1 (long time ago), but re-reading your answer now, I am confused by the following bit. You say that 2-fold CV "often also has large variance because the training sets are only half the size". I understand that a training set half the size is a problem, but why does it give "large variance"? Shouldn't it be "large bias" instead? Then the whole issue of choosing the number of folds becomes a bias-variance trade-off, which is how it is often presented. $\endgroup$ Commented May 8, 2016 at 0:40
  • 4
    $\begingroup$ Was just looking into some literature. Interestingly, in Introduction to Statistical Learning, James, Witten, Hastie & Tibshirani say LOOCV "is highly variable, since it is based upon a single observation $(x_1, y_1)$", and in Elements of Statistical Learning, Hastie, Tibshirani & Friedman say that LOOCV "can have high variance because the N training sets are so similar to one another." $\endgroup$ Commented Sep 5, 2016 at 5:44
  • 5
    $\begingroup$ This is incorrect. The variance should be $\mathrm{Var}\!\left[\sum_i x_i / n\right] = \sum_i \sum_j \mathrm{Cov}(x_i, x_j) / n^2$. You are right that the numerator is larger, but the denominator gets larger as well. $\endgroup$ Commented Jan 26, 2018 at 3:07
  • 3
    $\begingroup$ No, that's not really the "whole point". People use k-fold CV to get a single global estimate all the time. You can certainly try to use the multiple fold estimates in other ways, but putting them together is one of the most common ways to estimate holdout performance of a modeling technique. And that is precisely what Eq 7.48 of ESL is doing. $\endgroup$ Commented Jul 23, 2018 at 19:03
  • 3
    $\begingroup$ @amoeba Let me write out the math here. Maybe I am missing something, but so far I am not convinced of the conclusion. Say we have $K$ folds. On each fold we have an estimate of the MSE, so we have $K$ estimates of the MSE, and our final estimator is $\hat{MSE} = \sum_{i=1}^{K} MSE_i / K$. The question is whether the variance of this estimator, $Var[\hat{MSE}]$, goes up as $K$ goes up. The variance is $$ \begin{split} Var[\hat{MSE}] &= Var\left[\sum_{i=1}^{K} MSE_i / K\right] \\ &= \frac{1}{K^2}\left( \sum_{i=1}^{K} Var[MSE_i] + 2\sum_{i<j} Cov[MSE_i, MSE_j] \right) \end{split}$$ $\endgroup$ Commented Aug 7, 2018 at 15:50
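A quick numerical sketch of the variance identity being debated above. The covariance structure here (equal per-fold variance, constant positive correlation between folds) is entirely made up for illustration; the point is only to verify that the variance of the averaged $K$-fold estimate matches $\frac{1}{K^2}\big(\sum_i Var[MSE_i] + 2\sum_{i<j} Cov[MSE_i, MSE_j]\big)$ empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5

# Hypothetical covariance of the K per-fold MSE estimates:
# variance sigma2 on the diagonal, correlation rho off-diagonal.
rho, sigma2 = 0.6, 2.0
cov = sigma2 * (rho * np.ones((K, K)) + (1 - rho) * np.eye(K))

# Simulate many vectors of K correlated per-fold estimates
# and average each vector, mimicking the K-fold CV estimator.
draws = rng.multivariate_normal(np.zeros(K), cov, size=200_000)
mse_hat = draws.mean(axis=1)

# Empirical variance of the averaged estimator vs. the formula:
# Var[mean] = (sum of variances + 2 * sum of covariances, i<j) / K^2
empirical = mse_hat.var()
formula = (cov.trace() + 2 * np.triu(cov, k=1).sum()) / K**2

print(empirical, formula)  # the two numbers should agree closely
```

With positive correlation between folds (as when the training sets overlap heavily), the covariance term keeps the variance of the average from shrinking like $1/K$, which is the crux of the disagreement in this thread.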