
Sort.py

```python
import random
import time


def time_analysis(func):
    def do_func(*args, **kwargs):
        print('[INFO] \'{}\' analysis started (N={}).'.format(func.__name__, len(args[0])))
        start_time = time.clock()
        result = func(*args, **kwargs)
        end_time = time.clock()
        total_time = end_time - start_time
        print('[INFO] \'{}\' took {} seconds (N={}).'.format(func.__name__, total_time, len(args[0])))
        return result
    return do_func


@time_analysis
def bubble_sort(num_list):
    num_len = len(num_list)
    for i in range(num_len - 1):
        for j in range(num_len - i - 1):
            if num_list[j] > num_list[j + 1]:
                num_list[j], num_list[j + 1] = num_list[j + 1], num_list[j]
    return num_list


if __name__ == '__main__':
    N = 30000

    random_list = list(range(N))
    random.shuffle(random_list)
    bubble_sort(random_list)

    random_list = list(range(N))
    random.shuffle(random_list)
    bubble_sort(random_list)
```

Parallel.py

```python
from multiprocessing import Pool, cpu_count

from Sort import *


def bubble_sort_parallel(*args, **kwargs):
    return bubble_sort(*args, **kwargs)


if __name__ == '__main__':
    N = 30000

    random_list = list(range(N))
    random.shuffle(random_list)
    pool.apply_async(bubble_sort_parallel, (random_list,))

    random_list = list(range(N))
    random.shuffle(random_list)
    pool.apply_async(bubble_sort_parallel, (random_list,))

    pool.close()
    pool.join()
```

N = 30,000: the single-threaded version took only 2 seconds, but the multiprocessing version took 8 seconds.

N = 300,000: the single-threaded version took 200 seconds, but the multiprocessing version took 1400 seconds.

Why is multiprocessing slower than a single thread here?

How can I improve the performance?

Platform: Linux, pypy2.7-5.10.0, 4 Cores on my computer

Multiprocessing: [Figure of multiprocessing](https://i.sstatic.net/QksXf.png)

Single thread: [Figure of single thread](https://i.sstatic.net/9HYw7.png)

  • When I run Sort.py on its own, I get TypeError: object of type 'NoneType' has no len(). Is that intended? How are you getting the timing of the single-threaded approach? Commented May 4, 2018 at 14:54
  • Have you also considered the fact that the arrays you pass to the multiprocessing version vs the normal version are different? That may impact performance. Also, how many cores do you have on your machine? Commented May 4, 2018 at 14:56
  • I notice that random.shuffle returns None, so that may be impacting your runtime: you're not actually passing a list to your sorting functions. Commented May 4, 2018 at 14:57
  • I assume that you're using BubbleSort purely to have something to run while experimenting with multiprocessing. For actual sorting purposes, BubbleSort is only useful to demonstrate its inferiority to almost every other (sane) sorting algorithm. ;) Commented May 4, 2018 at 15:08
  • Sorry, I had posted the wrong code. I have corrected it. Commented May 4, 2018 at 15:11

2 Answers


I hope this much was already clear to you: Pool.apply_async lets you dispatch work to other processes in the pool; it does not automatically parallelize a single function. In other words, both versions perform each sort in a single thread on a single core. The parallel version ought to dispatch the two sort calls to two cores, but you're measuring the runtime of each individual sort, not the whole program's execution, so you would not detect any savings from the two sort invocations overlapping. (Additionally, your code as posted never creates the pool object, so I am assuming you used processes=N for N >= 2; again, it wouldn't matter, since you're measuring the runtime of each sort rather than the overall runtime.)

(If not, see https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers and https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool)

This, however, does not explain why simply dispatching the work to another process results in a slower runtime. (It is worth noting that on my MacBook Pro there is no difference in execution time between the simple version and the "parallel" version.) The slowdown comes from communication between the processes: you're asking multiprocessing to transfer a big list through its IPC channel, and it is apparently not efficient at doing so. You can demonstrate this for yourself: move the list creation and shuffling into bubble_sort_parallel, and pass an empty list as the argument in pool.apply_async. On my computer, this makes the runtime identical to the basic version.




I've tried with N = 15000. On my computer both versions took almost the same time: the non-parallel version sorted one array in 26 s, and the parallel version sorted two arrays in 28 s. I set pool = Pool(2). Have you tried increasing N? Your results may become comparable for larger values of N in your environment.

P.S. You should also keep in mind that spawning processes itself requires resources, and some synchronisation machinery is involved.

