In the code below, I am computing the cube of the number 9999, once via a thread pool and once by calling the function directly, and timing the difference. The direct calls seem to be much faster. I am running this on an 8th-gen Intel i7 with 16 GB of RAM, in a Python 2.7 terminal.
I am baffled by this; maybe I am missing something. I hope this question is helpful for people in the future.
    import time
    from multiprocessing.pool import ThreadPool

    def cube():
        return 9999*9999*9999

    print "Start Execution Threading: "
    x = int(round(time.time() * 1000))
    pool = ThreadPool()
    for i in range(0, 100):
        result = pool.apply_async(cube, ())
        result = pool.apply_async(cube, ())
        result = pool.apply_async(cube, ())
        # print result.get()
    pool.close()
    pool.join()
    print "Stop Execution Threading: "
    y = int(round(time.time() * 1000))
    print y - x

    print "Start Execution Main: "
    x = int(round(time.time() * 1000))
    for i in range(0, 100):
        cube()
        cube()
        cube()
    print "Stop Execution Main: "
    y = int(round(time.time() * 1000))
    print y - x
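For anyone reproducing this on a current interpreter, here is a sketch of the same comparison in Python 3 syntax, using time.perf_counter instead of time.time for the measurement (the function names mirror the snippet above; absolute timings will vary by machine):

```python
import time
from multiprocessing.pool import ThreadPool

def cube():
    # same constant computation as in the question
    return 9999 * 9999 * 9999

# Time 300 submissions through the thread pool.
start = time.perf_counter()
pool = ThreadPool()
for _ in range(100):
    pool.apply_async(cube, ())
    pool.apply_async(cube, ())
    pool.apply_async(cube, ())
pool.close()
pool.join()
pool_ms = (time.perf_counter() - start) * 1000

# Time 300 direct calls.
start = time.perf_counter()
for _ in range(100):
    cube()
    cube()
    cube()
direct_ms = (time.perf_counter() - start) * 1000

print("pool:   %.3f ms" % pool_ms)
print("direct: %.3f ms" % direct_ms)
```

On any machine I would expect the pool timing to include pool start-up and per-task dispatch costs that the direct loop never pays.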
%timeit 9999*9999*9999 gives 19.3 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each). This is not a job for a pool of any kind: the overhead of creating the pool and dispatching each task dwarfs a computation that takes about 20 nanoseconds. Note also that ThreadPool uses threads, not processes, and CPython's GIL means pure-Python bytecode like this never runs on more than one core at a time anyway. Switching to multiprocessing.Pool would be even worse here: you would be spawning whole new Python processes and pickling arguments and results across to them, and there is nothing in this workload for the cores to work on collaboratively. I'm not convinced this calculation would benefit from extra cores even in lower-level languages.
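To see the kind of job where a ThreadPool does pay off, consider tasks that block rather than compute. A minimal sketch (Python 3, with time.sleep standing in for a blocking network or disk call; the 0.05 s delay and pool size of 8 are arbitrary choices for illustration):

```python
import time
from multiprocessing.pool import ThreadPool

def io_task(_):
    # Stand-in for a blocking I/O call (network request, disk read, ...).
    # The GIL is released while sleeping, just as it is during real I/O.
    time.sleep(0.05)

# Serial: eight blocking calls, one after another (~8 * 0.05 s).
start = time.perf_counter()
for i in range(8):
    io_task(i)
serial_s = time.perf_counter() - start

# Pooled: eight threads block concurrently, so the waits overlap.
start = time.perf_counter()
with ThreadPool(8) as pool:
    pool.map(io_task, range(8))
pooled_s = time.perf_counter() - start

print("serial: %.2f s  pooled: %.2f s" % (serial_s, pooled_s))
```

Because the threads spend their time waiting rather than executing bytecode, the GIL is not the bottleneck, and the pooled version finishes in roughly the time of a single task.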