Error while implementing cross-validation

Question

I am trying to evaluate a model (MNIST) using cross-validation:

    from sklearn.model_selection import StratifiedKFold
    from sklearn.base import clone

    skfolds = StratifiedKFold(n_splits=5, random_state=42)

When I run the 3rd line, I get this warning:

    C:\Users\nextg\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py:293: FutureWarning: Setting a random_state has no effect since shuffle is False. This will raise an error in 0.24. You should leave random_state to its default (None), or set shuffle=True.
      warnings.warn(

Ignoring the warning, I wrote this code:

    for train_index, test_index in skfolds.split(X_train, y_test_5):
        clone_clf = clone(sgd_clf)
        X_train_folds = X_train[train_index]
        y_train_folds = y_train[train_index]
        X_test_fold = X_test[test_index]
        y_test_fold = y_test_5[test_index]
        clone_clf.fit(X_train_folds, y_train_folds)
        y_pred = clone_clf.predict(X_test_fold)
        n_correct = sum(y_pred == y_test_fold)
        print(n_correct / len(y_pred))

After running this code, I get the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-66-7e786591c439> in <module>
    ----> 1 for train_index, test_index in skfolds.split(X_train, y_test_5):
          2     clone_clf = clone(sgd_clf)
          3     X_train_folds = X_train[train_index]
          4     y_train_folds = y_train[train_index]
          5     X_test_fold = X_test[test_index]

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
        326             The testing set indices for that split.
        327         """
    --> 328         X, y, groups = indexable(X, y, groups)
        329         n_samples = _num_samples(X)
        330         if self.n_splits > n_samples:

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
        291     """
        292     result = [_make_indexable(X) for X in iterables]
    --> 293     check_consistent_length(*result)
        294     return result
        295

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        254     uniques = np.unique(lengths)
        255     if len(uniques) > 1:
    --> 256         raise ValueError("Found input variables with inconsistent numbers of"
        257                          " samples: %r" % [int(l) for l in lengths])
        258

    ValueError: Found input variables with inconsistent numbers of samples: [60000, 10000]

Can somebody help me solve this error?

Where exactly does the error pop up - in fit or in predict? Please update your question with the full trace.
– desertnaut, Commented Sep 8, 2020 at 13:20
Thank you for answering. The problem is in the 3rd line of code, before fit or predict. The model is already working; with this code I am trying to evaluate it. While evaluating, I got a FutureWarning.
– Rohit Kumar Singh, Commented Sep 8, 2020 at 13:29
Please specify exactly in the question. I am talking about the error, not the warning (which is self-explanatory).
– desertnaut, Commented Sep 8, 2020 at 13:30
I have updated the question with the full error. Should I post the whole MNIST model to make the error easier to understand?
– Rohit Kumar Singh, Commented Sep 8, 2020 at 13:54

ali ardakani · Accepted Answer · 2021-03-17 20:03:27Z

This works:

    from sklearn.model_selection import StratifiedKFold
    from sklearn.base import clone

    skfolds = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)

    for train_index, test_index in skfolds.split(X_train, y_train_5):
        clone_clf = clone(sgd_clf)
        X_train_folds = X_train.values[train_index]
        y_train_folds = y_train_5[train_index]
        X_test_fold = X_train.values[test_index]
        y_test_fold = y_train_5[test_index]
        clone_clf.fit(X_train_folds, y_train_folds)
        y_pred = clone_clf.predict(X_test_fold)
        n_correct = sum(y_pred == y_test_fold)
        print(n_correct / len(y_pred))
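As a rough cross-check, the manual loop can be compared with scikit-learn's `cross_val_score` helper when both receive the same splitter. The sketch below is only an illustration under stated assumptions: the data comes from `make_classification` as a synthetic stand-in for the MNIST arrays, and plain NumPy arrays are indexed directly (the `.values` access above is only needed if `X_train` is a pandas DataFrame). Setting `shuffle=True` is also what makes `random_state` meaningful and avoids the FutureWarning from the question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for (X_train, y_train_5); NOT the MNIST data from the question.
X_train, y_train_5 = make_classification(n_samples=300, n_features=10, random_state=42)

sgd_clf = SGDClassifier(random_state=42)
skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Manual loop: both the training folds and the held-out fold are slices of the training set.
manual_scores = []
for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)
    clone_clf.fit(X_train[train_index], y_train_5[train_index])
    y_pred = clone_clf.predict(X_train[test_index])
    manual_scores.append(np.mean(y_pred == y_train_5[test_index]))

# Same evaluation through the helper, reusing the same splitter object.
helper_scores = cross_val_score(sgd_clf, X_train, y_train_5, cv=skfolds, scoring="accuracy")

print(manual_scores)
print(helper_scores)
```

Because the splitter and the classifier both use a fixed integer `random_state`, the two score lists should agree fold by fold.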

qmeeus · Accepted Answer · 2020-09-08 14:12:52Z

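This expression does not make sense: skfolds.split(X_train, y_test_5). It pairs the training features with the test labels, which have different lengths (60000 vs. 10000 in your error message).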

It should be skfolds.split(X, y), with X.shape[0] == y.shape[0].

From the docs:

    for train_index, test_index in skf.split(X, y):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
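To see the length requirement in action, here is a minimal sketch that reproduces the error on small arrays and then splits correctly. The arrays are hypothetical stand-ins, scaled down from 60000 and 10000 rows to 60 and 10; the variable names mirror the question but the data is made up.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-ins for the MNIST arrays (sizes scaled down for illustration).
X_train = np.zeros((60, 4))
y_train_5 = np.arange(60) % 2 == 0   # binary target aligned with X_train
y_test_5 = np.arange(10) % 2 == 0    # binary target belonging to a separate test set

skfolds = StratifiedKFold(n_splits=3)

# Mismatched lengths: split() rejects X and y before producing any fold.
try:
    next(skfolds.split(X_train, y_test_5))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Matching lengths: every fold yields index arrays into the same training set.
fold_sizes = [(len(train), len(test)) for train, test in skfolds.split(X_train, y_train_5)]
print(fold_sizes)
```

The indices returned by `split` are only valid for the arrays that were passed in, which is why every fold inside the loop has to be sliced from those same arrays.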

desertnaut · Accepted Answer · 2020-09-08 17:11:30Z

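It should be skfolds.split(X_train, y_train_5), not skfolds.split(X_train, y_test_5). Inside the for loop, every fold must likewise be taken from the training data: y_train_folds = y_train_5[train_index] (not y_train[train_index]), X_test_fold = X_train[test_index] (not X_test[test_index]), and y_test_fold = y_train_5[test_index] (not y_test_5[test_index]).

The whole problem began because of the Tab key.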

alvaromiguelo · Accepted Answer · 2022-09-01 15:59:48Z

Looking at your code, and assuming that you split the data yourself, you are looping over X_train and y_test_5.

As the error message shows, you have 60000 samples in the training set and 10000 samples in the test set. That is why you get the error: the two arrays have different lengths.

Note: Never fit the classifier on the test data. The model would effectively overfit to it (the ML algorithm would already know the images), which makes the evaluation useless.
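One common way to keep the test set out of training is sketched below, under stated assumptions: the data is synthetic (from `make_classification`), the 75/25 split is arbitrary, and `SGDClassifier` stands in for the question's `sgd_clf`. Cross-validation runs on the training portion only, and the test portion is scored once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data; the question's MNIST arrays are not reproduced here.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

sgd_clf = SGDClassifier(random_state=0)

# Evaluation during development touches the training set only.
cv_scores = cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")

# The test set is used once, after the model has been fitted on the training data.
sgd_clf.fit(X_train, y_train)
test_accuracy = sgd_clf.score(X_test, y_test)

print(cv_scores.mean(), test_accuracy)
```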

Hope it helps!!!
