2

I have a loop that is checking for the first instance of a value that evaluates to True:

    for i in l:
        if fun(i):
            return i
    return None

I'd like to speed this up using a multiprocessing pool, but my understanding is that pooling would just run fun over every element in l. Is there a way to do a sort of "short-circuit" when pooling?

2
  • Will stuff break if you run fun on elements after the first true output? If later evaluations have to wait for earlier ones to finish like that, it's going to put a pretty big damper on how much you can benefit from multiprocessing. Commented Apr 3, 2018 at 0:03
  • @user2357112 no, nothing will break, but there's no point in evaluating everything since I'm only searching for the first instance that evaluates to True. Commented Apr 3, 2018 at 0:04

1 Answer

5

That's what the terminate method is for, but you have to be careful how you use it. It kills the worker processes, but surprisingly it does not release a caller that is already blocked waiting on the pool, so that call can block forever. That means you can only use it with apply_async or imap_unordered calls, where you get control back between results. Closing the pool from another thread typically causes your calls into the pool to hang.

In this example I set chunksize to 1, which is the preferred value when a single work item involves a significant amount of processing. You can set chunksize to something greater if the per-item cost is low and you don't mind processing more items before you are done. But don't make the chunks too large, or most items will be processed before anything makes it back to you.

    import multiprocessing

    def worker(item):
        print(item)
        return item

    if __name__ == "__main__":
        with multiprocessing.Pool(4) as pool:
            for i in pool.imap_unordered(worker, range(100), chunksize=1):
                if i == 10:
                    print('terminate')
                    pool.terminate()
                    break
        print('done')
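One caveat: imap_unordered hands back whichever result finishes first, so the item you stop on is not necessarily the earliest match in l. If you need the first match in list order, a minimal sketch along the same lines could use pool.imap instead, which yields results in input order. The names first_true and is_big are hypothetical, made up for illustration, and the helper assumes it is allowed to terminate the pool it is given.

```python
import multiprocessing


def is_big(x):
    # Hypothetical stand-in for the question's fun().
    return x > 10


def first_true(pool, fun, items):
    """Return the first item (in input order) for which fun(item) is truthy.

    Terminates the pool on a match, so the pool is unusable afterwards.
    """
    items = list(items)  # materialise so items can be paired with results
    # imap yields results in input order, so zip pairs each item with its result.
    for item, ok in zip(items, pool.imap(fun, items, chunksize=1)):
        if ok:
            pool.terminate()  # stop the workers; remaining items are abandoned
            return item
    return None


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        print(first_true(pool, is_big, range(100)))  # prints 11
```

Workers may still evaluate a few items past the match before terminate() takes effect. That is harmless here, since the question states nothing breaks if later items are evaluated.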

4 Comments

When would I use apply_async vs imap_unordered? It's not obvious to me from the documentation.
@Teknophilia - Pool.apply_async() gives you more control (you get the process itself, not just its potential return value), but in this case it's overkill, which is why I deleted my answer.
imap_unordered is a higher-level interface - think of it as a wrapper on top of apply_async (well... close enough). Suppose you have 1 million items. You have to call apply_async on each of them, keep each result object, and then wait on that object to get a result. imap_unordered takes a list or generator and keeps track of the individual calls down to the pool for you. And it's the only one of the map calls that passes results back immediately as they are available. The others wait.
@Teknophilia - I haven't tested with apply_async but I am concerned about what happens to a thread waiting on its AsyncResult object when the pool is terminated. Does the pool clean this object up? Seeing how dismally fragile it is in other cases (for instance, continuing to iterate on imap_unordered after termination just hangs), I wouldn't hold out hope!
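To make the bookkeeping described in the comments concrete, here is a rough sketch of the same short-circuit done with apply_async. It is only an illustration: first_true_async and is_big are hypothetical names, and it deliberately calls get() only before the pool is terminated, since what happens to a caller waiting on an AsyncResult after termination is exactly the open concern raised above.

```python
import multiprocessing


def is_big(x):
    # Hypothetical stand-in for the question's fun().
    return x > 10


def first_true_async(pool, fun, items):
    """Submit every item with apply_async, then wait on the results in order.

    Terminates the pool on the first truthy result, so the pool is
    unusable afterwards.
    """
    # One AsyncResult per item: the bookkeeping that imap_unordered hides.
    pending = [(item, pool.apply_async(fun, (item,))) for item in items]
    for item, result in pending:
        if result.get():      # blocks until this particular item is done
            pool.terminate()  # never call get() on the leftovers after this
            return item
    return None


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        print(first_true_async(pool, is_big, range(100)))  # prints 11
```

Note that this queues every item up front, which is the cost the comment about 1 million items is pointing at.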
