
I have been running a few ML models on the same set of data for a binary classification problem with a class proportion of 33:67.

I used the same algorithms and the same set of hyperparameters for both yesterday's and today's runs.

Please note that I also have the random_state parameter in each estimator function, as shown below:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    np.random.seed(42)
    svm = SVC()  # I replace the estimator here for different algorithms
    svm_cv = GridSearchCV(svm, op_param_grid, cv=10, scoring='f1')
    svm_cv.fit(X_train_std, y_train)

Q1) Why do the results change even when I have random_state configured?

Q2) Is there anything else I should do to reproduce the same results on every run?

Please find below the results that differ; here auc-Y denotes yesterday's run.

[Screenshot of the results table: AUC values from yesterday's run (auc-Y) vs. today's run]

1 Answer

Not every seed is the same.

Here is a definitive function that sets ALL of your seeds; with it you can expect complete reproducibility:

    import os
    import random

    import numpy as np
    import torch

    def seed_everything(seed=42):
        """Seed everything."""
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True

You have to import torch, numpy, etc. (the imports are included above).
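For example (a sketch, assuming the function above is defined in your script), call it once at the very top, before any data splitting or model fitting:

    seed_everything(42)   # set every seed once, at the start of the script
    # ... any subsequent data splits / model fits should now be repeatable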

UPDATE: How to set a global random seed for sklearn models:

Given that sklearn does not have its own global random seed but uses the numpy random seed, we can set it globally as above:

np.random.seed(seed) 
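Applied to the setup in the question, a minimal sketch might look as follows (this is an illustration, not the asker's exact code; op_param_grid, X_train_std and y_train are assumed to be defined as in the question):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    np.random.seed(42)                 # global numpy seed picked up by sklearn internals

    # seed the estimator and the CV splitter explicitly as well
    svm = SVC(random_state=42)         # random_state of SVC is only used when probability=True
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    svm_cv = GridSearchCV(svm, op_param_grid, cv=cv, scoring='f1')
    svm_cv.fit(X_train_std, y_train)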

Here is a little experiment with the scipy library; the behaviour is analogous in sklearn (which also draws random numbers, usually for weights or splits):

    import numpy as np
    from scipy.stats import norm

    print('Without seed')
    print(norm.rvs(100, size=5))
    print(norm.rvs(100, size=5))

    print('With the same seed')
    np.random.seed(42)
    print(norm.rvs(100, size=5))
    np.random.seed(42)  # reset the random seed back to 42
    print(norm.rvs(100, size=5))

    print('Without seed')
    np.random.seed(None)
    print(norm.rvs(100, size=5))
    print(norm.rvs(100, size=5))

outputting and confirming:

    Without seed
    [100.27042599 100.9258397  100.20903163  99.88255017  99.29165699]
    [100.53127275 100.17750482  98.38604284 100.74109598 101.54287085]
    With the same seed
    [101.36242188 101.13410818 102.36307449  99.74043318  98.83044407]
    [101.36242188 101.13410818 102.36307449  99.74043318  98.83044407]
    Without seed
    [101.2933838  100.52176902 101.38602156 100.72865231  99.02271004]
    [100.19080241  99.11010957  99.51578106 101.56403284 100.37350788]
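A similar check can be run with sklearn itself. This sketch is not from the original answer; it uses a toy dataset with roughly the question's 33:67 class balance to show that, with the seeds fixed, repeated cross-validation runs give identical scores:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score, StratifiedKFold
    from sklearn.svm import SVC

    # toy data with roughly the 33:67 class balance from the question
    X, y = make_classification(n_samples=300, weights=[0.67], random_state=0)

    def run(seed):
        np.random.seed(seed)                 # global numpy seed
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        return cross_val_score(SVC(random_state=seed), X, y, cv=cv, scoring='f1')

    print(np.allclose(run(42), run(42)))     # True: identical scores on repeated runs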
  • Hi, thanks for the response. Upvoted. So I shouldn't be using random_state? I only use scikit-learn and classic machine learning models like Linear and Logistic Regression, SVM, RF, Boosting... No deep learning. Commented Jan 12, 2020 at 11:56
  • So I don't have to worry about torch etc., right? Commented Jan 12, 2020 at 11:57
  • This is a general answer, but yes, for your specific case sklearn suffices. The question is whether you have any dependencies; I don't know your whole code. Commented Jan 12, 2020 at 11:57
  • Updated my code. Can you advise now? Commented Jan 12, 2020 at 11:59
  • This solves the problem but leaves the question of why the random_state argument of scikit-learn models doesn't ensure repeatability unanswered. Commented Jun 22, 2021 at 19:46
