Python's multi-threaded code is slower than single-threaded

Question

I first observed this issue in production code, then made a prototype:

    import threading, Queue, time, sys

    def heavyfunc():
        ''' The idea is just to load CPU '''
        sm = 0
        for i in range(5000):
            for j in range(5000):
                if i + j % 2 == 0:
                    sm += i - j
        print "sm = %d" % sm

    def worker(queue):
        ''' worker thread '''
        while True:
            elem = queue.get()
            if elem == None:
                break
            heavyfunc()  # whatever the elem is

    starttime = time.time()

    q = Queue.Queue()  # queue with tasks

    number_of_threads = 1
    # create & start number_of_threads working threads
    threads = [threading.Thread(target=worker, args=[q]) for thread_idx in range(number_of_threads)]
    for t in threads: t.start()

    # add 2 working items: they are estimated to be computed in parallel
    for x in range(2):
        q.put(1)

    for t in threads: q.put(None)  # Add 2 'None' => each worker will exit when gets them
    for t in threads: t.join()  # Wait for every worker

    #heavyfunc()

    elapsed = time.time() - starttime

    print >> sys.stderr, elapsed

The idea of heavyfunc() is just to load the CPU, without any synchronization or dependencies.

When using 1 thread, it takes 4.14 sec on average.

When using 2 threads, it takes 6.40 sec on average.

When not using any threads, computing heavyfunc() takes 2.07 sec on average (measured many times; that's exactly 4.14 / 2, as in the case with 1 thread and 2 tasks).

I'm expecting 2 jobs with heavyfunc() to take 2.07 sec, provided there are 2 threads. (My CPU is i7 => there are enough cores).

Here are the CPU monitor's screenshots, which also suggest that there was no true multithreading:

[Image: CPU load graph]

Where is the error in my thinking? How do I create n threads that don't interfere?

As the pretty large note at the start of the documentation of the threading module says: only one thread in CPython, use processes.
– Voo, Commented Jun 28, 2012 at 15:55
This is well-known Python behaviour, see Understanding the Python GIL.
– Sven Marnach, Commented Jun 28, 2012 at 15:55
@Voo: your restatement of the restriction isn't quite right. You can have many threads, but only one can execute Python bytecode or manipulate Python objects at a time.
– Ned Batchelder, Commented Jun 28, 2012 at 15:57
@Ned Yes, quite the informal description. Still, I'm sure nobody would misunderstand it, especially since I refer to the actual documentation that mentions the GIL, yada yada, in detail. It's a comment after all, not a whole answer.
– Voo, Commented Jun 28, 2012 at 16:01

Ned Batchelder · Accepted Answer · 2012-06-28 15:56:10Z

CPython will not execute bytecode on more than one core at once, so multi-threading CPU-bound code is pointless. The Global Interpreter Lock (GIL) is there to protect all of the reference counts in the process, so only one thread can use Python objects at a time.

You are seeing worse performance because you still only have one thread at a time working, but now you are also changing thread contexts.
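One rough way to check this on your own machine is to time the same CPU-bound work run back to back and run in two threads. The sketch below is written for Python 3 (the question's code is Python 2) and assumes a standard GIL-enabled CPython build. The helper names `run_sequential`, `run_threaded` and `timed`, and the smaller loop size `n`, are made up for illustration. The parenthesised `(i + j) % 2` is my reading of the intent; the question's expression has no parentheses.

```python
import threading
import time


def heavyfunc(n=1000):
    """CPU-bound loop modelled on the question's heavyfunc (n is a hypothetical, smaller size)."""
    sm = 0
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 == 0:  # assumption: parentheses added, see lead-in
                sm += i - j
    return sm


def run_sequential(tasks):
    """Run the tasks one after another in the main thread."""
    return [heavyfunc() for _ in range(tasks)]


def run_threaded(tasks):
    """Run each task in its own thread and collect the results by index."""
    results = [None] * tasks

    def work(idx):
        results[idx] = heavyfunc()

    threads = [threading.Thread(target=work, args=(i,)) for i in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, tasks):
    """Return (result, elapsed seconds) for fn(tasks)."""
    start = time.perf_counter()
    out = fn(tasks)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(run_sequential, 2)
    _, t_thr = timed(run_threaded, 2)
    print("sequential: %.2fs, 2 threads: %.2fs" % (t_seq, t_thr))
```

If the explanation above holds, the threaded time should not come out meaningfully below the sequential time; the exact numbers depend on the machine and the Python version.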

In a case like this, it may be useful to switch from threading to multiprocessing. As I see it, Python makes good use of threading for tasks that are not CPU-intensive but are slow for other reasons (usually I/O, like waiting for a web request). For CPU-intensive tasks that push your machine to its limits, multiprocessing may open another door that lets you get more done at the same time.
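As a minimal sketch of that switch, the question's two tasks could be handed to a pool of two worker processes instead of two threads. This is Python 3, not the question's Python 2. The `task_id` parameter and the smaller loop size `n` are assumptions for illustration, and the parenthesised `(i + j) % 2` is my reading of the intent rather than the original expression.

```python
import multiprocessing
import time


def heavyfunc(task_id, n=1000):
    """CPU-bound loop; task_id is ignored, like elem in the question's worker."""
    sm = 0
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 == 0:  # assumption: parentheses added, see lead-in
                sm += i - j
    return sm


if __name__ == "__main__":
    # The guard is needed because worker processes may re-import this module.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=2) as pool:
        # Each task runs in a separate process with its own interpreter and GIL.
        results = pool.map(heavyfunc, range(2))
    elapsed = time.perf_counter() - start
    print("results: %s, elapsed: %.2fs" % (results, elapsed))
```

Each process has its own interpreter, so the two tasks can occupy two cores. Starting processes and passing arguments and results between them has a cost, so this tends to pay off only when each task does a substantial amount of work.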
