0

I am trying to evaluate a model (MNIST) using cross-validation:

    from sklearn.model_selection import StratifiedKFold
    from sklearn.base import clone
    skfolds = StratifiedKFold(n_splits=5, random_state=42)

While running the 3rd line, I get this warning:

    C:\Users\nextg\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py:293: FutureWarning: Setting a random_state has no effect since shuffle is False. This will raise an error in 0.24. You should leave random_state to its default (None), or set shuffle=True.
      warnings.warn(

Ignoring the warning, I wrote this code:

    for train_index, test_index in skfolds.split(X_train, y_test_5):
        clone_clf = clone(sgd_clf)
        X_train_folds = X_train[train_index]
        y_train_folds = y_train[train_index]
        X_test_fold = X_test[test_index]
        y_test_fold = y_test_5[test_index]
        clone_clf.fit(X_train_folds, y_train_folds)
        y_pred = clone_clf.predict(X_test_fold)
        n_correct = sum(y_pred == y_test_fold)
        print(n_correct / len(y_pred))

After running this code, I get this error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-66-7e786591c439> in <module>
    ----> 1 for train_index, test_index in skfolds.split(X_train, y_test_5):
          2     clone_clf = clone(sgd_clf)
          3     X_train_folds = X_train[train_index]
          4     y_train_folds = y_train[train_index]
          5     X_test_fold = X_test[test_index]

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
        326             The testing set indices for that split.
        327         """
    --> 328         X, y, groups = indexable(X, y, groups)
        329         n_samples = _num_samples(X)
        330         if self.n_splits > n_samples:

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
        291     """
        292     result = [_make_indexable(X) for X in iterables]
    --> 293     check_consistent_length(*result)
        294     return result
        295

    ~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        254     uniques = np.unique(lengths)
        255     if len(uniques) > 1:
    --> 256         raise ValueError("Found input variables with inconsistent numbers of"
        257                          " samples: %r" % [int(l) for l in lengths])
        258

    ValueError: Found input variables with inconsistent numbers of samples: [60000, 10000]

Can somebody help me solve this error?

4
  • Where exactly does the error pop up - in fit or in predict? Please update your question with the full trace. Commented Sep 8, 2020 at 13:20
  • Thank you for answering. The problem is in the 3rd line of code, before fit or predict. The model is already working. With this code I am trying to evaluate my model. While evaluating, I got a FutureWarning. Commented Sep 8, 2020 at 13:29
  • Please specify exactly in the question. I am talking about the error, not the warning (which is self-explanatory). Commented Sep 8, 2020 at 13:30
  • I have updated the question with the full error. Should I post the whole MNIST model to make the error easier to understand? Commented Sep 8, 2020 at 13:54

4 Answers

1

This works:

    from sklearn.model_selection import StratifiedKFold
    from sklearn.base import clone

    skfolds = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)

    for train_index, test_index in skfolds.split(X_train, y_train_5):
        clone_clf = clone(sgd_clf)
        X_train_folds = X_train.values[train_index]
        y_train_folds = y_train_5[train_index]
        X_test_fold = X_train.values[test_index]
        y_test_fold = y_train_5[test_index]
        clone_clf.fit(X_train_folds, y_train_folds)
        y_pred = clone_clf.predict(X_test_fold)
        n_correct = sum(y_pred == y_test_fold)
        print(n_correct / len(y_pred))

0

This expression does not make sense: skfolds.split(X_train, y_test_5).

It should be skfolds.split(X, y) with X.shape[0] == y.shape[0]

From the doc:

    for train_index, test_index in skf.split(X, y):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
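A quick sanity check along these lines, assuming small made-up arrays `X` and `y` rather than the asker's data: confirm that the lengths match before splitting, then verify that each fold's train and test indices do not overlap.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up data: 12 samples, 2 features, balanced binary labels.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

# split() needs exactly one label per sample.
assert X.shape[0] == y.shape[0]

skf = StratifiedKFold(n_splits=3)
fold_sizes = []
for train_index, test_index in skf.split(X, y):
    # Both index arrays refer to rows of the same X and y passed to split().
    assert set(train_index).isdisjoint(test_index)
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)
```

Because both index arrays point into the same `X` and `y`, indexing a different array (such as a separate test set) with them is what goes wrong in the question.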


0

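It should be skfolds.split(X_train, y_train_5), not skfolds.split(X_train, y_test_5). Inside the for loop, every fold must also come from the training set: y_train_folds = y_train_5[train_index] (not y_train[train_index]), X_test_fold = X_train[test_index] (not X_test[test_index]), and y_test_fold = y_train_5[test_index] (not y_test_5[test_index]).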

The whole problem began because of the Tab key.
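Here is a sketch of a loop that indexes only the training set and runs end to end. It assumes synthetic data from `make_classification` in place of MNIST and a fresh `SGDClassifier` in place of the asker's `sgd_clf`; only the indexing pattern is the point.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-ins for X_train and y_train_5 (assumption: not the MNIST arrays).
X_train, y_train_5 = make_classification(n_samples=300, n_features=10, random_state=42)
sgd_clf = SGDClassifier(random_state=42)  # stand-in for the asker's classifier

skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = []
for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)
    # Every fold is carved out of the training set; the held-out test set is never touched.
    X_train_folds, y_train_folds = X_train[train_index], y_train_5[train_index]
    X_test_fold, y_test_fold = X_train[test_index], y_train_5[test_index]
    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    scores.append((y_pred == y_test_fold).mean())  # accuracy on this fold

print(scores)
```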


0

Looking at your code, and assuming that you split the data yourself, you are looping over X_train and y_test_5.

As the error message shows, you have 60000 samples in the training set and 10000 samples in the test set. That is why you get the error: the two arrays have different lengths.
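The mismatch is easy to reproduce on a small scale. In this sketch the arrays are tiny made-up stand-ins (6 "training" rows against 4 "test" labels), not the MNIST data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Tiny stand-ins: 6 "training" samples but only 4 "test" labels.
X_train_demo = np.zeros((6, 2))
y_test_demo = np.array([0, 1, 0, 1])

skf = StratifiedKFold(n_splits=2)
try:
    # list() forces the split to run, which is where the lengths are compared.
    list(skf.split(X_train_demo, y_test_demo))
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

Passing labels of the same length as the feature array makes the error go away.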

Note: Never fit the classifier on the test data. The algorithm will already have seen those images, so its score on them will be overly optimistic and the evaluation becomes useless.

Hope it helps!!!
