I have two simple functions (each loops over a range) that can run independently, with no dependency between them. I'm trying to run these two functions using both the Python multiprocessing module and the threading module.

When I compared the output, I saw that the multiprocessing version takes about 1 second longer than the multithreaded version.

I have read that multithreading is not very efficient in Python because of the Global Interpreter Lock.

Based on the above, my questions are:
1. Is it best to use multiprocessing when there is no dependency between the two tasks?
2. How do I calculate the number of processes/threads I can run on my machine for maximum efficiency?
3. Is there a way to calculate how much efficiency the program gains by using multithreading?

Multithreading version:

    import threading
    import time

    class Thread1(threading.Thread):
        def __init__(self, threadindicator):
            threading.Thread.__init__(self)
            self.threadind = threadindicator

        def run(self):
            # Run one of the two workloads depending on the indicator.
            starttime = time.time()
            if self.threadind == 'A':
                process1()
            else:
                process2()
            endtime = time.time()
            print 'Thread', self.threadind, 'complete : Time Taken = ', endtime - starttime

    def process1():
        # Long CPU-bound workload.
        for i in range(100000):
            for j in range(10000):
                pass

    def process2():
        # Short CPU-bound workload.
        for i in range(1000):
            for j in range(1000):
                pass

    def main():
        print 'Main Thread'
        starttime = time.time()
        thread1 = Thread1('A')
        thread2 = Thread1('B')
        thread1.start()
        thread2.start()
        for t in [thread1, thread2]:
            t.join()
        endtime = time.time()
        print 'Main Thread Complete , Total Time Taken = ', endtime - starttime

    if __name__ == '__main__':
        main()

Multiprocessing version:

    from multiprocessing import Process
    import time

    def process1():
        # Long CPU-bound workload.
        starttime = time.time()
        for i in range(100000):
            for j in range(10000):
                pass
        endtime = time.time()
        print 'Process 1 complete : Time Taken = ', endtime - starttime

    def process2():
        # Short CPU-bound workload.
        starttime = time.time()
        for i in range(1000):
            for j in range(1000):
                pass
        endtime = time.time()
        print 'Process 2 complete : Time Taken = ', endtime - starttime

    def main():
        print 'Main Process start'
        starttime = time.time()
        p1 = Process(target=process1)
        p2 = Process(target=process2)
        p1.start()
        p2.start()
        for p in [p1, p2]:
            p.join()
        endtime = time.time()
        print 'Main Process Complete - Total time taken = ', endtime - starttime

    if __name__ == '__main__':
        main()
  • As a side note: time.time() may have a precision as low as 1 second, and it can also be confused by clock changes, so it's not an ideal way to measure performance, especially for code that only takes about a second.

1 Answer

If you have two CPUs available on your machine, two tasks which don't have to communicate, and you want to use both CPUs to make your program faster, you should use the multiprocessing module rather than the threading module.

The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing Python bytecode at a time. Therefore, multithreading won't improve the overall runtime of your application unless you have calls that are blocking (e.g. waiting for IO) or that release the GIL (e.g. numpy will do this for some expensive calls) for extended periods of time. However, the multiprocessing library creates separate subprocesses, and therefore several copies of the interpreter, so it can make efficient use of multiple CPUs.
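
You can see this difference directly by timing a CPU-bound function against a blocking one in two threads. Here is a minimal sketch (the function names and loop counts are my own, chosen just for illustration):

    import threading
    import time

    def cpu_bound():
        # Pure Python bytecode: the GIL serializes this across threads.
        total = 0
        for i in range(10 ** 7):
            total += i

    def io_bound():
        # time.sleep() releases the GIL, so these threads overlap freely.
        time.sleep(1)

    def run_in_two_threads(target):
        start = time.time()
        threads = [threading.Thread(target=target) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    if __name__ == '__main__':
        # Expect cpu_bound to take roughly 2x its single-thread time,
        # but io_bound to take only about 1 second in total.
        print('cpu_bound: %.2fs' % run_in_two_threads(cpu_bound))
        print('io_bound:  %.2fs' % run_in_two_threads(io_bound))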

However, in the example you gave, you have one process that finishes very quickly (less than 0.1 seconds on my machine) and one that takes around 18 seconds to finish. The exact numbers may vary depending on your hardware. In that case, nearly all the work is happening in one process, so you're really only using one CPU regardless. In this case, the increased overhead of spawning processes vs threads is probably causing the process-based version to be slower.
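
You can get a rough feel for that startup overhead by timing workers that do no work at all. A minimal sketch (the helper names here are my own):

    import time
    from multiprocessing import Process
    from threading import Thread

    def noop():
        pass

    def time_startup(make_worker, n=20):
        # Average time to start and join a worker that does nothing,
        # so only creation/teardown cost is measured.
        start = time.time()
        for _ in range(n):
            w = make_worker(target=noop)
            w.start()
            w.join()
        return (time.time() - start) / n

    if __name__ == '__main__':
        print('process startup: %.4fs' % time_startup(Process))
        print('thread startup:  %.4fs' % time_startup(Thread))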

If you make both processes do the 18-second nested loops, you should see the multiprocessing code run much faster (assuming your machine actually has more than one CPU). On my machine, the multiprocessing code finished in around 18.5 seconds, and the multithreaded code in 71.5 seconds. I'm not sure why the multithreaded version took longer than the roughly 36 seconds you'd expect from running the two loops back to back, but my guess is that the GIL is causing some sort of contention that slows down both threads.

As for your second question, assuming there's no other load on the system, you should use a number of processes equal to the number of CPUs on your system. You can discover this by running lscpu on a Linux system, sysctl hw.ncpu on a Mac, or dxdiag from the Run dialog on Windows (there are probably other ways, but this is how I always do it).
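
As one of the comments below points out, you can also get the CPU count from inside Python itself, which avoids hard-coding a machine-specific number. A minimal sketch using the standard library:

    import multiprocessing

    # Number of CPUs visible to the OS; a reasonable default for the
    # number of worker processes for CPU-bound work.
    num_workers = multiprocessing.cpu_count()
    print('Using %d worker processes' % num_workers)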

For the third question, the simplest way to figure out how much efficiency you're getting from the extra processes is just to measure the total runtime of your program, using time.time() as you were, or the time utility in Linux (e.g. time python myprog.py). The ideal speedup should be equal to the number of processes you're using, so a 4 process program running on 4 CPUs should be at most 4x faster than the same program with 1 process, assuming you get maximum benefit from the extra processes. If the other processes aren't helping you that much, it will be less than 4x.
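
For example, one rough way to put a number on it is to time the same workload with 1 worker and with N workers and take the ratio. A minimal sketch, where busy_loop and timed_run are illustrative names of my own:

    import time
    from multiprocessing import Process

    def busy_loop():
        # Stand-in for the CPU-bound nested loops in the question.
        for i in range(5000):
            for j in range(10000):
                pass

    def timed_run(num_procs):
        # Run num_procs copies of busy_loop in parallel; return wall time.
        start = time.time()
        procs = [Process(target=busy_loop) for _ in range(num_procs)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.time() - start

    if __name__ == '__main__':
        n = 4  # try your machine's CPU count here
        t1 = timed_run(1)
        tn = timed_run(n)
        # n workers do n times the total work, so with perfect scaling
        # on n CPUs the wall time stays roughly the same (tn == t1).
        speedup = n * t1 / tn      # ideally close to n
        efficiency = speedup / n   # ideally close to 1.0
        print('speedup = %.2f, efficiency = %.2f' % (speedup, efficiency))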

3 Comments

"The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing at a time. " This is false. Python's threads can execute at the same time. What they cannot do is to execute bytecode at the same time. If one thread does an expensive numpy call then other python threads will be executed concurrently since numpy releases the GIL. Similarly many C extension release the GIL during expensive operations.
Thanks for the clarification - I don't think this was clear in my mind. I've edited my answer to be more accurate on this point.
You can get the CPU count from inside Python, which is better than editing the code manually for each machine.
