I want to save the GridSearchCV output to a file while it is running.

GridSearchCV(XGBClassifier(), tuned_parameters, cv=cv, n_jobs=-1, verbose=10) 

This is an example for an output:

Fitting 1 folds for each of 200 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend with 4 concurrent workers.
[CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7
[CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7 score=0.645, total= 6.3min
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.3min

I managed to save the first line and the Parallel lines, but no matter what I tried, I couldn't save the lines that start with [CV]. I want to save those lines so that if the program fails, I can at least see part of the results.

I tried the solutions from here

sys.stdout = open('file', 'w') 

and:

from contextlib import redirect_stdout

with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        print('it now prints to `help.text`')

This solution (which also refers to this solution) didn't work either:

class Tee(object):
    def __init__(self, *files):
        self.files = files

    def write(self, obj):
        for f in self.files:
            f.write(obj)
            f.flush()  # If you want the output to be visible immediately

    def flush(self):
        for f in self.files:
            f.flush()

I also tried this monkey-patch, as the author called it, but it also saved only the "Parallel" lines.

(Just to emphasize: the code above is only a glimpse of the proposed solutions. When I tried them, I used all of the relevant code.)

Is there a way to save ALL output?

1 Answer

I don't know whether this can be done with the sys library or similar tools. Instead, I suggest the following approach, which redirects both stdout and stderr at the shell level.

Suppose you have a script like this:

test.py

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
params = {"C": [0.001, 0.01, 0.1, 1, 2, 3]}
grid = GridSearchCV(model, params, n_jobs=-1, verbose=10)

X = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)
grid.fit(X, y)

Then run it with:

python test.py > logfile.txt 2>&1 

Then you will have both "Parallel" and "CV" lines in logfile.txt:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 1.6s
[Parallel(n_jobs=-1)]: Done 11 out of 30 | elapsed: 1.7s remaining: 2.9s
[Parallel(n_jobs=-1)]: Done 15 out of 30 | elapsed: 1.7s remaining: 1.7s
[Parallel(n_jobs=-1)]: Done 19 out of 30 | elapsed: 1.7s remaining: 1.0s
[Parallel(n_jobs=-1)]: Done 23 out of 30 | elapsed: 1.7s remaining: 0.5s
[Parallel(n_jobs=-1)]: Done 27 out of 30 | elapsed: 1.7s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.7s finished
[CV] C=0.001 .........................................................
[CV] ............................. C=0.001, score=0.500, total= 0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.450, total= 0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.550, total= 0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.550, total= 0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.500, total= 0.0s
[CV] C=2 .............................................................
...
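Since the goal in the question is to recover partial results if the run fails, here is a rough sketch of how the finished "[CV]" lines could be pulled back out of logfile.txt. The regular expression is only an assumption based on the sample output above (the line format differs between scikit-learn versions), and parse_cv_results is a hypothetical helper name, not part of scikit-learn.

```python
import re

# Matches finished-fit lines such as
#   "[CV] ............ C=0.001, score=0.500, total= 0.0s"
# Assumption: the format is the one shown in the sample output above;
# other scikit-learn versions print these lines differently.
CV_RESULT = re.compile(
    r"^\[CV\]\s*\.*\s*(?P<params>.+?),?\s*score=(?P<score>[-+]?\d*\.?\d+)"
)


def parse_cv_results(lines):
    """Return (params, score) pairs for every finished fit found in the lines."""
    results = []
    for line in lines:
        match = CV_RESULT.match(line.strip())
        if match:
            results.append((match.group("params"), float(match.group("score"))))
    return results


# Possible usage once a log exists:
#   with open("logfile.txt") as f:
#       for params, score in parse_cv_results(f):
#           print(params, score)
```

Lines that only announce the start of a fit, and the "Parallel" progress lines, contain no score and are skipped.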

Details

The "[CV]" lines are produced by a print statement (Source), so they are written to stdout.

The "Parallel" lines are produced by loggers (Source), which write to stderr.

> logfile.txt 2>&1 redirects both stdout and stderr to the same file (Related question). As a result, both kinds of messages end up in the same file.
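If the redirection has to happen inside the script rather than on the command line, one possible approach is to redirect the underlying file descriptors instead of reassigning sys.stdout. Reassigning sys.stdout only changes a Python object in the current process, while descriptors 1 and 2 are normally inherited by child processes. The sketch below is a hedged illustration, not a confirmed fix: redirect_fds is a hypothetical helper, and it is an assumption that the GridSearchCV worker processes are started inside the block and inherit the descriptors.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fds(path):
    """Point file descriptors 1 (stdout) and 2 (stderr) at `path` inside the block.

    Hypothetical helper: roughly what `> path 2>&1` does in the shell,
    except that the file is opened in append mode.
    """
    sys.stdout.flush()
    sys.stderr.flush()
    # Keep copies of the original descriptors so they can be restored.
    saved_out, saved_err = os.dup(1), os.dup(2)
    with open(path, "ab", buffering=0) as log:
        os.dup2(log.fileno(), 1)
        os.dup2(log.fileno(), 2)
        try:
            yield
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved_out, 1)
            os.dup2(saved_err, 2)
            os.close(saved_out)
            os.close(saved_err)


# Possible usage (assumes `grid`, `X`, `y` from test.py):
#   with redirect_fds("logfile.txt"):
#       grid.fit(X, y)
```

Worker processes that were already running before the block was entered would keep their old descriptors, so the redirect should wrap the first call that starts them.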

1 Comment

"python test.py > logfile.txt 2>&1" which is what I would have expected to work, unfortunately does not, at least on Mac OS (it's how I ended up here). I must say it's pretty bad behaviour for a library to break output redirection in this way.
