I think there is a more straightforward answer. If you increase k, the test sets get smaller and smaller. Since the folds are randomly sampled, it can happen with small test sets, that one contains all the difficult to predict records and another all the easy ones. Therefore, variance is high when you predict very small test sets per fold.
David Ernst
- 3.3k
- 12
- 16