I first observed this issue in production code, then reproduced it with a prototype:

    import threading, Queue, time, sys

    def heavyfunc():
        ''' The idea is just to load CPU '''
        sm = 0
        for i in range(5000):
            for j in range(5000):
                if i + j % 2 == 0:
                    sm += i - j
        print "sm = %d" % sm

    def worker(queue):
        ''' worker thread '''
        while True:
            elem = queue.get()
            if elem is None:
                break
            heavyfunc() # whatever the elem is

    starttime = time.time()

    q = Queue.Queue() # queue with tasks

    number_of_threads = 1
    # create & start number_of_threads working threads
    threads = [threading.Thread(target=worker, args=[q]) for thread_idx in range(number_of_threads)]
    for t in threads: t.start()

    # add 2 working items: they are expected to be computed in parallel
    for x in range(2):
        q.put(1)

    for t in threads: q.put(None) # Add one 'None' per worker => each worker exits when it gets one
    for t in threads: t.join() # Wait for every worker

    #heavyfunc()

    elapsed = time.time() - starttime

    print >> sys.stderr, elapsed

heavyfunc() exists only to load the CPU; it involves no synchronization and no dependencies.
Timings, averaged over many runs:

- With 1 thread: 4.14 sec
- With 2 threads: 6.40 sec
- With no threads, a single heavyfunc() call: 2.07 sec (exactly 4.14 / 2, which matches the case of 1 thread and 2 tasks)
I expected the 2 heavyfunc() jobs to take 2.07 sec in total, provided there are 2 threads. (My CPU is an i7, so there are enough cores.)
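
To make the gap concrete, here is a small sketch of the arithmetic behind that expectation. It uses only the averages quoted above; the variable names are made up for illustration, and "expected parallel" is my assumption that two independent tasks on two cores should finish in the time of one.

```python
# Averages reported in this question, in seconds.
single_call = 2.07   # one heavyfunc() call, no threads
one_thread = 4.14    # 2 tasks, 1 worker thread
two_threads = 6.40   # 2 tasks, 2 worker threads

tasks = 2

# Assumption: with true parallelism, 2 tasks on 2 cores take as long as 1 task.
expected_parallel = single_call
# Strictly sequential execution would take the sum of both tasks.
expected_serial = tasks * single_call

# How far the 2-thread measurement is from each expectation.
slowdown_vs_parallel = two_threads / expected_parallel
slowdown_vs_serial = two_threads / expected_serial

print("expected parallel:    %.2f s" % expected_parallel)
print("expected serial:      %.2f s" % expected_serial)
print("observed (2 threads): %.2f s" % two_threads)
print("slowdown vs parallel: %.2fx" % slowdown_vs_parallel)
print("slowdown vs serial:   %.2fx" % slowdown_vs_serial)
```

So the 2-thread run is not just failing to reach the parallel estimate; it is slower than running both tasks one after another in a single thread.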
Here are the CPU monitor's screenshots, which also suggest that there was no true multithreading:

Where is the error in my thinking? How do I create n threads that don't interfere?