What behaviour would you expect? The difference arises simply because the number of samples cannot be distributed evenly over the number of folds you provided. You have 47 samples in your dataset and want to split them into 6 folds for cross-validation. $47 / 6 = 7 \frac{5}{6}$, which would mean that the test set in each fold contains $7 \frac{5}{6}$ samples. That is impossible, since only whole samples can be included. As a result, 5 of the 6 folds have a test set of 8 samples and the remaining fold has a test set of 7 samples, which gives an average of $7 \frac{5}{6}$ samples per test set: $\frac{5}{6} \cdot 8 + \frac{1}{6} \cdot 7 = 7 \frac{5}{6}$.

If you increase the number of samples in your dataset to a number divisible by 6 (e.g. 48), the test set has the same size in every fold, since $48 / 6 = 8$ is a whole number rather than a fraction:

    from sklearn.model_selection import KFold
    import numpy as np

    data = np.arange(0, 48, 1)
    kfold = KFold(6)

    for train, test in kfold.split(data):
        print("train size:", len(train), "test size:", len(test))

    # train size: 40 test size: 8
    # train size: 40 test size: 8
    # train size: 40 test size: 8
    # train size: 40 test size: 8
    # train size: 40 test size: 8
    # train size: 40 test size: 8
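
For comparison, here is a minimal sketch of the uneven case from the question (47 samples, 6 folds). It relies on the rule given in the `KFold` documentation: the first `n_samples % n_splits` folds get `n_samples // n_splits + 1` test samples and the remaining folds get `n_samples // n_splits`. The helper name `fold_test_sizes` is made up for this illustration and is not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold


def fold_test_sizes(n_samples, n_splits):
    """Return the test-set size of each fold (hypothetical helper, for illustration only)."""
    data = np.arange(n_samples)
    return [len(test) for _, test in KFold(n_splits).split(data)]


n_samples, n_splits = 47, 6

# Sizes predicted by the documented rule: `extra` folds get one more sample.
base, extra = divmod(n_samples, n_splits)  # base = 7, extra = 5
expected = [base + 1] * extra + [base] * (n_splits - extra)

sizes_47 = fold_test_sizes(n_samples, n_splits)
print(sizes_47)              # [8, 8, 8, 8, 8, 7]
print(sizes_47 == expected)  # True
print(sum(sizes_47))         # 47, every sample is tested exactly once
```

The sizes differ by at most one, and they always sum to the number of samples, so no sample is dropped or duplicated across the test sets.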