redirect GridSearchCV (or any other Sklearn object) output to file

Question

I want to save the GridSearchCV output to a file while it is running.

GridSearchCV(XGBClassifier(), tuned_parameters, cv=cv, n_jobs=-1, verbose=10)

This is an example of the output:

Fitting 1 folds for each of 200 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend with 4 concurrent workers.
[CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7
[CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7 score=0.645, total= 6.3min
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.3min

I managed to save the first line and the "Parallel" lines, but no matter what I tried, I couldn't save the lines that start with [CV]. I want to save those lines so that if the program fails, I can at least see part of the results.

I tried the solutions from here

sys.stdout = open('file', 'w')

and:

from contextlib import redirect_stdout

with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        print('it now prints to `help.text`')

This solution (which in turn refers to another solution) also didn't work:

class Tee(object):
    def __init__(self, *files):
        self.files = files

    def write(self, obj):
        for f in self.files:
            f.write(obj)
            f.flush()  # If you want the output to be visible immediately

    def flush(self):
        for f in self.files:
            f.flush()

I also tried this monkey-patch, as its author called it, but it too saved only the "Parallel" lines.

(To be clear, the code above is only an excerpt of the proposed solutions; when I tried them, I used all of the relevant code.)

Is there a way to save ALL output?

Kota Mori · Accepted Answer · 2020-10-22 13:24:16Z

I don't know whether this can be done from inside Python with the sys library or something similar. Instead, I suggest redirecting both stdout and stderr when you launch the script.

Suppose you have a script like this:

test.py

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
params = {"C": [0.001, 0.01, 0.1, 1, 2, 3]}
grid = GridSearchCV(model, params, n_jobs=-1, verbose=10)

X = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)
grid.fit(X, y)

Then run it with:

python test.py > logfile.txt 2>&1

Then you will have both "Parallel" and "CV" lines in logfile.txt:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 1.6s
[Parallel(n_jobs=-1)]: Done 11 out of 30 | elapsed: 1.7s remaining: 2.9s
[Parallel(n_jobs=-1)]: Done 15 out of 30 | elapsed: 1.7s remaining: 1.7s
[Parallel(n_jobs=-1)]: Done 19 out of 30 | elapsed: 1.7s remaining: 1.0s
[Parallel(n_jobs=-1)]: Done 23 out of 30 | elapsed: 1.7s remaining: 0.5s
[Parallel(n_jobs=-1)]: Done 27 out of 30 | elapsed: 1.7s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.7s finished
[CV] C=0.001 .........................................................
[CV] ............................. C=0.001, score=0.500, total= 0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.450, total= 0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.550, total= 0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.550, total= 0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.500, total= 0.0s
[CV] C=2 .............................................................
...
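The same redirection can also be set up from a small Python launcher instead of the shell, which may be handy where shell syntax differs. This is only a sketch: run_and_log and demo are hypothetical helper names, and the inline child script is a stand-in for test.py that simply writes one line to each stream.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(args, log_path):
    """Run a command with stdout and stderr both written to log_path."""
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT plays the role of "2>&1" in the shell.
        result = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def demo():
    """Stand-in for running test.py: a child that writes to both streams."""
    child_code = (
        "import sys\n"
        "print('written to stdout')\n"
        "sys.stderr.write('written to stderr\\n')\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "logfile.txt")
        code = run_and_log([sys.executable, "-c", child_code], log_path)
        with open(log_path) as f:
            return code, f.read()
```

For the script above, the call would presumably be run_and_log([sys.executable, "test.py"], "logfile.txt").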

Details

The "[CV]" lines are produced by a print statement (Source), so they are written to stdout.

The "Parallel" lines are produced by loggers (Source), so they are written to stderr.
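To see that the two streams really are separate, here is a minimal sketch that captures them independently. The child script and its messages are made up for illustration and merely mimic a print call and a write to stderr; capture_streams is a hypothetical helper name.

```python
import subprocess
import sys

# Illustrative child script: one line per stream.
CHILD_CODE = (
    "import sys\n"
    "print('[stdout] printed line')\n"
    "sys.stderr.write('[stderr] logged line\\n')\n"
)


def capture_streams():
    """Run the child and return its stdout and stderr as separate strings."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr
```

Each message shows up only in its own stream, which is why redirecting stdout alone would leave the stderr lines out of the file.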

"> logfile.txt 2>&1" redirects both stdout and stderr to the same file (Related question). As a result, both kinds of messages are written to logfile.txt.
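If the redirection has to happen inside the script, one option is to redirect the OS-level file descriptors rather than sys.stdout. Reassigning sys.stdout only changes a Python object in the current process, so output printed by worker processes would likely bypass it, which may be why the attempts in the question missed the "[CV]" lines. The sketch below assumes that worker processes inherit file descriptors 1 and 2 from the parent and that they are started after the redirect is in place. redirect_fds_to_file and demo are hypothetical names, and this has not been checked against every joblib backend.

```python
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def redirect_fds_to_file(path):
    """Point fd 1 (stdout) and fd 2 (stderr) at path, then restore them."""
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out, saved_err = os.dup(1), os.dup(2)  # keep the original targets
    log_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(log_fd, 1)
        os.dup2(log_fd, 2)
        yield
    finally:
        # Push any buffered Python-level output into the log before restoring.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        for fd in (saved_out, saved_err, log_fd):
            os.close(fd)


def demo():
    """Write from the parent and from a child process; return the log text."""
    child = [sys.executable, "-c", "print('child line')"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logfile.txt")
        with redirect_fds_to_file(path):
            os.write(1, b"parent line\n")
            subprocess.run(child, check=True)  # child inherits fd 1 and fd 2
        with open(path) as f:
            return f.read()
```

In the script above, this would mean wrapping grid.fit(X, y) in a with redirect_fds_to_file("logfile.txt"): block.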

"python test.py > logfile.txt 2>&1", which is what I would have expected to work, unfortunately does not, at least on macOS (that is how I ended up here). I must say it is pretty bad behaviour for a library to break output redirection in this way.
