Parallelization using multiprocessing running processes sequentially instead of simultaneously

Question

I'm trying to parallelize a piece of code given below using the multiprocessing module. Everything I try leads to each child process being run one after the other even though they all have different PIDs. I have tried:

CentOS and MacOS
Context as spawn and as fork
Using Queues and using pools
Using Apply and Using map and their async versions
Adding/removing pool.join() and Process.join()

I can't figure out what I am doing wrong.

fs.py:

import numpy as np from time import sleep import os def f(r): res = np.arange(r[0], r[1]) print(f'I am {os.getpid()}') sleep(10) print(f'I am {os.getpid()} and I am finished') return {'nums': res, 'dubs': res * 2}

playground.py:

import multiprocessing as mp import numpy as np from fs import f if __name__ == '__main__': ctx = mp.get_context('spawn') p = ctx.Pool(4) with p: subsets = [[0, 3], [3, 6], [6, 7]] res = [p.apply(f, (subset, )) for subset in subsets] print(res) print('Done!')

Command: python playground.py

Output:

I am 29881 I am 29881 and I am finished I am 29882 I am 29882 and I am finished I am 29881 I am 29881 and I am finished [{'nums': array([0, 1, 2]), 'dubs': array([0, 2, 4])}, {'nums': array([3, 4, 5]), 'dubs': array([ 6, 8, 10])}, {'nums': array([6]), 'dubs': array([12])}] Done!

if you have simple function then process may finish it before other process start. — furas
– furas, Commented Oct 18, 2021 at 19:34
maybe you should use map() instead of apply() - apply() may wait for the end of process. — furas
– furas, Commented Oct 18, 2021 at 19:36
In the documentation (docs.python.org/3/library/…) it says apply blocks until result is ready. So you're spawning a new subprocess, let it print, then close it, then spawn the next one. If you have an iterable and a function, as suggested below 'Pool.map' is the way to go. If you want more control, 'Pool.apply_async' is the way to go. — AlexNe
– AlexNe, Commented Oct 18, 2021 at 19:48
It seems I had implemented map and apply_async wrong. Both seem to work now. — unkown
– unkown, Commented Oct 18, 2021 at 22:03

furas · Accepted Answer · 2021-10-18 19:54:36Z

When I use p.map() like this (on Linux Mint)

res = p.map(f, subsets)

then I get

I am 1337328 I am 1337325 I am 1337327 I am 1337328 and I am finished I am 1337325 and I am finished I am 1337327 and I am finished

Maybe you used map() in wrong way. res = [p.map(f, (subset, )) for subset in subsets]

if __name__ == '__main__': ctx = mp.get_context('spawn') with ctx.Pool(4) as p: subsets = [[0, 3], [3, 6], [6, 7]] res = p.map(f, subsets) print(res) print('Done!')

For apply_async you would need two for-loops

 items = [p.apply_async(f, (subset, )) for subset in subsets] res = [x.get() for x in items] print(res)

And both have to be inside with p:

if __name__ == '__main__': ctx = mp.get_context('spawn') with ctx.Pool(4) as p: subsets = [[0, 3], [3, 6], [6, 7]] items = [p.apply_async(f, (subset, )) for subset in subsets] print(items) res = [x.get() for x in items] print(res) print('Done!')

martineau · Accepted Answer · 2021-10-18 20:03:27Z

Each child process is being run one-after-the-other because Pool.apply() blocks until a result is ready — effectively preventing parallel processing from taking place.

Using Pool.map_async() instead would prevent that. Note I also made the delay in the f() function variable to make the processing times vary.

`playground.py`

import multiprocessing as mp import numpy as np from pprint import pprint from fs import f if __name__ == '__main__': ctx = mp.get_context('spawn') pool = ctx.Pool(4) with pool: subsets = [[0, 3], [3, 6], [6, 7]] res = pool.map_async(f, subsets).get(timeout=10) pprint(res, sort_dicts=False) print('Done!')

`fs.py`

import numpy as np import os import random from time import sleep def f(r): print(f'f({r}) called') res = np.arange(r[0], r[1]) print(f'I am {os.getpid()}') sleep(random.uniform(0, 2)) # Random time delay. print(f'I am {os.getpid()} and I am finished') return {'nums': res, 'dubs': res * 2}

Results:

f([0, 3]) called I am 2120 f([3, 6]) called I am 32208 f([6, 7]) called I am 13884 I am 2120 and I am finished I am 13884 and I am finished I am 32208 and I am finished [{'nums': array([0, 1, 2]), 'dubs': array([0, 2, 4])}, {'nums': array([3, 4, 5]), 'dubs': array([ 6, 8, 10])}, {'nums': array([6]), 'dubs': array([12])}]

Collectives™ on Stack Overflow

Parallelization using multiprocessing running processes sequentially instead of simultaneously

2 Answers 2

Comments

`playground.py`

`fs.py`

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

playground.py

fs.py

Comments

Related

`playground.py`

`fs.py`