
The following code is my first attempt at failing fast when an exception is thrown inside the hyperparameter function.

Unfortunately, all of the data is processed before the caller receives the exception.

How can I terminate the whole process immediately when an error occurs in the called function, so that I can fix my coding bug sooner and do not have to wait until all parameter combinations have been processed / optimized?

The code:

from sklearn.model_selection import ParameterGrid
from multiprocessing import Pool
from enum import Enum
var1 = 'var1'
var2 = 'var2'
abc = [1, 2]
xyz = list(range(1_00_000))
pg = [{'variant': [var1],
       'abc': abc,
       'xyz': xyz, },
      {'variant': [var2],
       'abc': abc, }]
parameterGrid = ParameterGrid(pg)
myTemp = list(parameterGrid)
print('len(parameterGrid):', len(parameterGrid))
def myFunc(myParam):
    if myParam['abc'] == 1:
        raise ValueError('error thrown')
    print(myParam)
pool = Pool(1)
myList = pool.map(myFunc, parameterGrid)

Which results in:

len(parameterGrid): 200002
{'abc': 2, 'variant': 'var1', 'xyz': 2}
{'abc': 2, 'variant': 'var1', 'xyz': 3}
{'abc': 2, 'variant': 'var1', 'xyz': 4}
{'abc': 2, 'variant': 'var1', 'xyz': 5}
{'abc': 2, 'variant': 'var1', 'xyz': 6}
. . .
{'abc': 2, 'variant': 'var1', 'xyz': 99992}
{'abc': 2, 'variant': 'var1', 'xyz': 99993}
{'abc': 2, 'variant': 'var1', 'xyz': 99994}
{'abc': 2, 'variant': 'var1', 'xyz': 99995}
{'abc': 2, 'variant': 'var1', 'xyz': 99996}
{'abc': 2, 'variant': 'var1', 'xyz': 99997}
{'abc': 2, 'variant': 'var1', 'xyz': 99998}
{'abc': 2, 'variant': 'var1', 'xyz': 99999}
ValueError: error thrown

2 Answers


As far as I can see, not all of the data is processed: only the cases with 'abc' = 2 get through. As soon as myFunc receives parameters with 'abc' = 1, it throws an exception. That looks right, doesn't it? You can check your whole parameterGrid before running map and keep only the values that are valid/suitable for you:

myTemp_2 = filter(lambda x: x['abc'] != 1, myTemp) 



To terminate the whole Pool of processes immediately (I assume you need this only for testing purposes):

...
def myFunc(myParam):
    if myParam['abc'] == 1:
        print('error occurred')
        pool.terminate()  # accessed globally
    print(myParam)
if __name__ == '__main__':
    pool = Pool(1)
    myList = pool.map(myFunc, parameterGrid)

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.terminate

5 Comments

I'm probably missing something, but # accessed globally: how? With start method spawn no pool is created in the worker process. With fork and forkserver there is an incompletely initialized pool that is not assigned to the name pool (or any other name) in the workers.
@shmee, read up on what happens with child processes under the if __name__ == '__main__': guard. On Unix, with the fork start method, a child process can use a shared resource created in the parent process through a global. But that applies mostly to Unix (I'm not considering Windows here).
@shmee, here's a similar topic stackoverflow.com/a/36962624/3185459
I'm not sure your example applies. The workers are forked during the initialization of the pool. The assignment of the pool object to the name pool happens after the pool's __init__ method completes, when the workers are already alive. The Event objects in the first code example of the answer you linked are available in the workers because they were created before the instantiation of the pool. If you move their instantiation to after that of the pool, using them in the function raises a NameError, just as using pool does here.
In the second code example of that answer, the function that calls terminate on the pool is passed as callback. That executes in the main process, hence the pool is fully initialized and assigned to the respective name in that case.
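Building on the point that terminate works when it is called from the main process, here is a minimal sketch of one way to fail fast. It assumes that replacing pool.map with pool.imap_unordered and chunksize=1 is acceptable: pool.map only returns (or raises) after all chunks have been processed, whereas the imap iterators re-raise a worker's exception in the parent as soon as that result is reached. The names run_fail_fast and my_func and the small list of dicts are hypothetical stand-ins for the question's myFunc and ParameterGrid.

```python
from multiprocessing import Pool


def my_func(my_param):
    """Hypothetical stand-in for the question's myFunc: fails when abc == 1."""
    if my_param["abc"] == 1:
        raise ValueError(f"error thrown for {my_param}")
    return my_param["xyz"] * 2


def run_fail_fast(func, params, processes=2):
    """Return all results, or re-raise the first worker exception early."""
    results = []
    # Leaving the `with` block calls pool.terminate(), also when an
    # exception propagates, so pending tasks are dropped.
    with Pool(processes) as pool:
        # chunksize=1 (an assumption of this sketch) means each task is
        # reported on its own instead of after a whole chunk has finished.
        for result in pool.imap_unordered(func, params, chunksize=1):
            results.append(result)
    return results


if __name__ == "__main__":
    # Plain list of dicts instead of sklearn's ParameterGrid, for brevity.
    grid = [{"abc": 2, "xyz": 0}, {"abc": 1, "xyz": 1}]
    grid += [{"abc": 2, "xyz": i} for i in range(2, 1000)]
    try:
        run_fail_fast(my_func, grid)
    except ValueError as exc:
        print("stopped early:", exc)
```

The trade-off is that chunksize=1 adds inter-process overhead per task, and imap_unordered does not preserve input order; pool.imap could be used instead if ordering matters.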
