Python multiprocessing speed

Question

I wrote this bit of code to test out Python's multiprocessing on my computer:

from multiprocessing import Pool var = range(5000000) def test_func(i): return i+1 if __name__ == '__main__': p = Pool() var = p.map(test_func, var)

I timed this using Unix's time command and the results were:

real 0m2.914s user 0m4.705s sys 0m1.406s

Then, using the same var and test_func() I timed:

var = map(test_func, var)

and the results were

real 0m1.785s user 0m1.548s sys 0m0.214s

Shouldn't the multiprocessing code be much faster than plain old map?

Jaber Haisan · Accepted Answer · 2023-05-04 07:18:54Z

Why it should.

In map function, you are just calling the function sequentially.

Multiprocessing pool creates a set of workers to which your task will be mapped. It is coordinating multiple worker processes to run these functions.

Try doing some significant work inside your function and then time them and see if multiprocessing helps you to compute faster.

You have to understand that there will be overheads in using multiprocessing. Only when the computing effort is significantly greater than these overheads that you will see it's benefits.

See the last example in excellent introduction by Hellmann: https://doughellmann.com/posts/pymotw-3-multiprocessing-manage-processes-like-threads/

pool_size = multiprocessing.cpu_count() * 2 pool = multiprocessing.Pool(processes=pool_size, initializer=start_process, maxtasksperchild=2, ) pool_outputs = pool.map(do_calculation, inputs)

You create pools depending on cores that you have.

unode · Accepted Answer · 2014-03-21 12:00:55Z

There is an overhead on using parallelization. There is only benefit if each work unit takes long enough to compensate the overhead.

Also if you only have one CPU (or CPU thread) on your machine, there's no point in using parallelization at all. You'll only see gains if you have at least a hyperthreaded machine or at least two CPU cores.

In your case a simple addition operation doesn't compensate that overhead.

Try something a bit more costly such as:

from multiprocessing import Pool import math def test_func(i): j = 0 for x in xrange(1000000): j += math.atan2(i, i) return j if __name__ == '__main__': var = range(500) p = Pool() var = p.map(test_func, var)

Collectives™ on Stack Overflow

Python multiprocessing speed

2 Answers 2

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Linked

Related