David Ernst
  • 3.3k
  • 12
  • 16

I think there is a more straightforward answer. As you increase k, each test fold gets smaller. Because the folds are randomly sampled, a small test fold is more likely than a large one to be unrepresentative of the data set as a whole: one fold could happen to contain all the difficult-to-predict records and another all the easy ones. The score therefore varies a lot from fold to fold when each fold's test set is very small.
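
To make this concrete, here is a minimal simulation sketch. It is deliberately simplified: it skips model training and assumes each record has a fixed outcome (1 = predicted correctly, 0 = mispredicted), so it only shows how the per-fold score spreads out as the test folds shrink. The data and the helper name `fold_score_spread` are made up for illustration.

```python
import random
import statistics


def fold_score_spread(outcomes, k, seed=0):
    """Standard deviation of per-fold accuracy over k randomly sampled folds."""
    rng = random.Random(seed)
    shuffled = outcomes[:]          # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    fold_size = len(shuffled) // k  # assumes len(outcomes) is divisible by k
    scores = []
    for i in range(k):
        fold = shuffled[i * fold_size:(i + 1) * fold_size]
        scores.append(sum(fold) / len(fold))  # accuracy on this test fold
    return statistics.pstdev(scores)


# Hypothetical data: 700 "easy" records (1) and 300 "difficult" records (0).
outcomes = [1] * 700 + [0] * 300

spread_k5 = fold_score_spread(outcomes, k=5)      # 200 records per test fold
spread_k100 = fold_score_spread(outcomes, k=100)  # 10 records per test fold
print(f"k=5:   std of fold accuracy = {spread_k5:.3f}")
print(f"k=100: std of fold accuracy = {spread_k100:.3f}")
```

With only 10 records per fold, a few difficult records landing together is enough to move that fold's accuracy a lot, so the spread for k=100 should come out clearly larger than for k=5.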
