
I attempted to speed up my Python program using the multiprocessing module, but I found it was quite slow. A toy example is as follows:

```python
import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        for i in self.i_list:
            self.x = i
            map(self.compute, self.A_list)  # map version
            self.A_list.append(A(i))

    def run_2(self):
        p = Pool()
        for i in self.i_list:
            self.x = i
            p.map(self.compute, self.A_list)  # multicore version
            self.A_list.append(A(i))

    def compute(self, some_A):
        return some_A.score(self.x)

if __name__ == "__main__":
    st = time.time()
    foo = B()
    foo.run_1()
    print("Map: ", time.time() - st)
    st = time.time()
    foo = B()
    foo.run_2()
    print("MultiCore: ", time.time() - st)
```

The outcome on my computer (Windows 10, Python 3.5) is:

Map: 0.0009996891021728516

MultiCore: 19.34994912147522

Similar results can be observed on a Linux machine (CentOS 7, Python 3.6).
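One detail worth noting about the near-zero Map timing: in Python 3, map returns a lazy iterator, so run_1 never actually calls compute at all. A small sketch (using a hypothetical slow_double stand-in for real work) shows the difference forcing evaluation makes:

```python
import time

def slow_double(v):
    time.sleep(0.001)  # stand-in for real work
    return v * 2

items = list(range(20))

st = time.time()
m = map(slow_double, items)             # lazy in Python 3: nothing runs yet
lazy_elapsed = time.time() - st

st = time.time()
result = list(map(slow_double, items))  # list() forces evaluation
forced_elapsed = time.time() - st

print("lazy:  ", lazy_elapsed)
print("forced:", forced_elapsed)
```

So the run_1 timing measures only the cost of building iterators and appending to a list, not the cost of the score calls.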

I guess it was caused by the pickling/unpickling of objects between processes? I tried to use the Manager class but failed to get it to work.

Any help will be appreciated.

  • It's not map that is slow; it seems to be the self.A_list.append(A(i)). Also, you seem to use map incorrectly: it returns a value, and you are not using it at all. Do you know what map is doing? Commented Jul 19, 2018 at 9:58
  • Thanks for commenting. Both map and append are fast, as the first timing result shows. By contrast, p.map (the multiprocessing version) is slow. I did not use the return value of map because this is just an example to show the poor performance of p.map, and I did not need the return value. Commented Jul 19, 2018 at 10:30

2 Answers


Wow that's impressive (and slow!).

Yes, this is because the objects must be shipped to the worker processes on every call, which is costly.

So I played with it a little and managed to gain a lot of performance by making the compute function static: that way, you no longer need to share the whole B instance with the workers. Still very slow, but better.

```python
import time
from multiprocessing import Pool

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

x = 0

def static_compute(some_A):
    res = some_A.score(x)
    return res

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        global x
        for i in self.i_list:
            x = i
            map(static_compute, self.A_list)  # map version
            self.A_list.append(A(i))

    def run_2(self):
        global x
        p = Pool(4)
        for i in self.i_list:
            x = i
            # note: each worker process has its own copy of the
            # module-level x, so updates here are not seen by workers
            p.map(static_compute, self.A_list)  # multicore version
            self.A_list.append(A(i))
```

The other reason it is slow, to me, is the fixed cost of using Pool. You're actually launching Pool.map 1000 times, and if there is a fixed cost associated with each of those calls, that alone makes the overall strategy slow. You could test that with a longer A_list (longer than the i_list, which requires a different algorithm).


1 Comment

Thanks. The static function is much faster. Though the gain is not compelling in this toy example, I think it will be helpful in my real application.

The reasoning behind this is:

  1. When foo.run_1() is called, the map call is performed by the main process itself: it does all the work directly, much like telling yourself what to do.

  2. When foo.run_2() is called, the main process distributes the work across a pool of worker processes, by default one per CPU. If your machine supports 6 processes, the main process has to coordinate 6 workers, much like organizing 6 people to report back to you, and that coordination has overhead.

Side Note: if you use:

```python
p.imap(self.compute, self.A_list)
```

the results will be yielded lazily, in the same order as A_list.

