
I have written a program that sums a list by splitting it into sub-lists and using multiprocessing in Python. My code is the following:

from concurrent.futures import ProcessPoolExecutor, as_completed
import random
import time

def dummyFun(l):
    s = 0
    for i in range(0, len(l)):
        s = s + l[i]
    return s

def sumaSec(v):
    start = time.time()
    sT = 0
    for k in range(0, len(v), 10):
        vc = v[k:k + 10]
        print("vector ", vc)
        for item in vc:
            sT = sT + item
        print("sequential sum result ", sT)
        sT = 0
    start1 = time.time()
    print("sequential version time ", start1 - start)

def main():
    workers = 5
    vector = random.sample(range(1, 101), 100)
    print(vector)
    sumaSec(vector)
    dim = 10
    sT = 0
    for k in range(0, len(vector), dim):
        vc = vector[k:k + dim]
        print(vc)
        for item in vc:
            sT = sT + item
        print("sub list result ", sT)
        sT = 0
    chunks = (vector[k:k + dim] for k in range(0, len(vector), 10))
    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(dummyFun, chunk) for chunk in chunks]
        for future in as_completed(futures):
            print(future.result())
    start1 = time.time()
    print(start1 - start)

if __name__ == "__main__":
    main()

The problem is that for the sequential version I got a time of:

0.0009753704071044922 

while for the concurrent version my time is:

0.10629010200500488 

And when I reduce the number of workers to 2 my time is:

0.08622884750366211 

Why is this happening?

2 Answers


The length of your vector is only 100. That is a very small amount of work, so the fixed cost of starting the process pool is the most significant part of the runtime. For this reason parallelism is most beneficial when there is a lot of work to do. Try a larger vector, for example a length of 1 million.
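
A minimal sketch of what such a larger input could look like (the size and the use of randint are assumptions for illustration; random.sample cannot draw a million values from range(1, 101)):

import random
import time

# Build a million-element input; randint is assumed here because random.sample
# cannot draw 1,000,000 values from a pool of only 100 numbers.
vector = [random.randint(1, 100) for _ in range(1_000_000)]

start = time.time()
print("sequential result:", sum(vector))
print("sequential time:", time.time() - start)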

The second problem is that you have each worker do a tiny amount of work: a chunk of size 10. Again, that means the cost of starting a task cannot be amortized over so little work. Use larger chunks. For example, instead of 10 use int(len(vector)/(workers*10)).
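
As a rough sketch of that chunking heuristic (the worker count and vector size are assumed values, not taken from the question):

import random

workers = 4                                               # assumed worker count
vector = [random.randint(1, 100) for _ in range(1_000_000)]

# Larger chunks, following the suggested heuristic above
chunk_size = int(len(vector) / (workers * 10))
chunks = [vector[k:k + chunk_size] for k in range(0, len(vector), chunk_size)]
print(len(chunks), "chunks of up to", chunk_size, "elements each")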

Also note that you're creating 5 processes. For a CPU-bound task like this one you ideally want as many processes as you have physical CPU cores. Either pass the number of cores your system has, or leave max_workers=None (the default) and ProcessPoolExecutor will pick that number for your system. With too few processes you leave performance on the table; with too many, the CPU has to switch between them and performance may suffer.
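
A minimal sketch of relying on the default pool size (the chunking mirrors the heuristic above; the __main__ guard is needed because worker processes may re-import the module):

import os
import random
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    vector = [random.randint(1, 100) for _ in range(1_000_000)]
    chunk_size = int(len(vector) / ((os.cpu_count() or 1) * 10))
    chunks = [vector[k:k + chunk_size] for k in range(0, len(vector), chunk_size)]

    # max_workers=None (the default) sizes the pool from the machine's CPU count
    with ProcessPoolExecutor() as executor:
        print("parallel result:", sum(executor.map(sum, chunks)))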


4 Comments

Thank you for the reply. I have tried with a vector of a million elements, but the sequential part still takes 0.79 while the concurrent part takes 83.2. Any advice?
@Little I just played a bit with your code, and your timing reports the sequential version as 0.06 s and the parallel one as 0.02 s. The reason yours takes a thousand times longer is the print statements. Only print what you actually need to see (the timings) and leave off the rest. Also see my edit about chunk size.
I doubt you wanted to use random.sample, so I instead created a vector like this: vector = [random.randint(1,101) for _ in range(1000000)].
Could you post your actual code? I have deleted the print statements and the gap is still huge.

Your chunking creates far too many tiny tasks. Submitting that many tasks still incurs per-task overhead, even when the worker processes have already been created.

Maybe this post can help you in your search: How to parallel sum a loop using multiprocessing in Python
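
For illustration, here is a hedged sketch that creates just one larger task per worker, reusing the question's dummyFun (the worker count and vector size are assumed values):

from concurrent.futures import ProcessPoolExecutor
import random

def dummyFun(l):                       # the question's worker function
    s = 0
    for i in range(0, len(l)):
        s = s + l[i]
    return s

if __name__ == "__main__":
    workers = 4                        # assumed core count
    vector = [random.randint(1, 100) for _ in range(1_000_000)]
    step = -(-len(vector) // workers)  # ceiling division: one chunk per worker
    chunks = [vector[k:k + step] for k in range(0, len(vector), step)]

    with ProcessPoolExecutor(max_workers=workers) as executor:
        partial_sums = executor.map(dummyFun, chunks)
        print("total:", sum(partial_sums))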

