Consider the following program:
import Queue from multiprocessing import Queue as Q from multiprocessing import Process, Value, Array from time import time from numpy import sum, zeros from numpy.random import normal def enche(queue, cnts, tims, t0, proc): while True: try: queue.get(False) x = sum(normal(size = 1000)) cnts[proc] = cnts[proc] + 1 except Queue.Empty: tims[proc] = time() - t0.value break def main(m, nth): queue = Q() tmp = [queue.put(0) for i in xrange(m)] cnts = Array('i', zeros(nth, dtype = "int")) tims = Array('d', zeros(nth)) t0 = Value('d', time()) procs = [ Process(target = enche , args = (queue,cnts, tims, t0,i)) for i in xrange(nth) ] for p in procs: p.start() for p in procs: p.join() for k in xrange(nth): print "%2.0f/100, %4.1f ms,"%(cnts[k]/float(m) * 100, tims[k]*1000), print "" if __name__ == "__main__": for k in xrange(100): main(100, 4) In essence, the function main(m,nth) should do a task - sum a thousand normal random numbers m times - and divide this task in nth processes.
I then count and print what percentage of the task was done by each process and how long each process lasted. This the typical results I get:
4/100, 1.6 ms, 3/100, 2.2 ms, 88/100, 10.9 ms, 5/100, 2.7 ms, 23/100, 3.6 ms, 1/100, 1.5 ms, 3/100, 2.0 ms, 73/100, 14.9 ms, 8/100, 3.9 ms, 1/100, 3.4 ms, 8/100, 5.3 ms, 83/100, 21.0 ms, 6/100, 4.1 ms, 0/100, 3.2 ms, 76/100, 19.0 ms, 18/100, 8.8 ms, 5/100, 1.7 ms, 90/100, 9.2 ms, 5/100, 2.4 ms, 0/100, 2.6 ms, 2/100, 1.5 ms, 8/100, 2.4 ms, 89/100, 8.9 ms, 1/100, 2.6 ms, 8/100, 2.3 ms, 11/100, 2.7 ms, 3/100, 2.4 ms, 78/100, 8.5 ms, 52/100, 5.8 ms, 11/100, 3.2 ms, 4/100, 2.6 ms, 33/100, 5.9 ms, 6/100, 2.3 ms, 8/100, 2.9 ms, 24/100, 4.3 ms, 62/100, 7.7 ms, 13/100, 3.0 ms, 11/100, 2.6 ms, 1/100, 2.5 ms, 75/100, 8.5 ms, 6/100, 2.1 ms, 3/100, 2.1 ms, 6/100, 2.7 ms, 85/100, 9.0 ms, 8/100, 2.3 ms, 3/100, 2.0 ms, 27/100, 4.4 ms, 62/100, 7.6 ms, 18/100, 3.2 ms, 0/100, 1.9 ms, 6/100, 3.1 ms, 76/100, 9.2 ms, 3/100, 1.9 ms, 19/100, 3.4 ms, 30/100, 6.0 ms, 48/100, 7.9 ms, 10/100, 2.4 ms, 0/100, 1.5 ms, 8/100, 2.8 ms, 82/100, 8.8 ms, 6/100, 1.8 ms, 36/100, 4.7 ms, 32/100, 4.7 ms, 26/100, 4.7 ms, 8/100, 2.0 ms, 46/100, 5.9 ms, 46/100, 5.9 ms, 0/100, 2.6 ms, 3/100, 1.6 ms, 54/100, 6.2 ms, 1/100, 2.4 ms, 42/100, 6.1 ms, 12/100, 2.6 ms, 8/100, 2.4 ms, 34/100, 5.3 ms, 46/100, 6.3 ms, 1/100, 1.4 ms, 9/100, 2.3 ms, 1/100, 2.3 ms, 89/100, 9.2 ms, 13/100, 2.6 ms, 5/100, 2.2 ms, 3/100, 2.4 ms, 79/100, 8.7 ms, 11/100, 2.3 ms, 7/100, 2.6 ms, 8/100, 2.9 ms, 74/100, 8.8 ms, 8/100, 1.8 ms, 9/100, 2.7 ms, 0/100, 1.8 ms, 83/100, 9.3 ms, 13/100, 2.4 ms, 4/100, 2.0 ms, 3/100, 2.5 ms, 80/100, 9.9 ms, 10/100, 2.3 ms, 72/100, 8.0 ms, 6/100, 2.6 ms, 12/100, 4.1 ms, 27/100, 4.3 ms, 0/100, 1.5 ms, 5/100, 2.5 ms, 68/100, 8.4 ms, 3/100, 1.5 ms, 8/100, 2.3 ms, 86/100, 8.3 ms, 3/100, 2.7 ms, 7/100, 1.9 ms, 10/100, 2.5 ms, 5/100, 2.5 ms, 78/100, 8.2 ms, 14/100, 2.5 ms, 3/100, 2.0 ms, 5/100, 2.5 ms, 78/100, 8.2 ms, 53/100, 7.4 ms, 3/100, 1.9 ms, 1/100, 2.4 ms, 43/100, 7.5 ms, 3/100, 1.7 ms, 9/100, 2.5 ms, 47/100, 6.0 ms, 41/100, 6.0 ms, 3/100, 1.5 ms, 37/100, 4.7 ms, 33/100, 4.7 ms, 27/100, 4.7 ms, 7/100, 1.8 ms, 36/100, 4.5 ms, 32/100, 4.5 ms, 25/100, 4.5 ms, 3/100, 1.4 ms, 51/100, 5.6 ms, 46/100, 5.6 ms, 0/100, 2.3 ms, 5/100, 1.8 ms, 9/100, 2.5 ms, 46/100, 5.8 ms, 40/100, 5.9 ms, 9/100, 2.0 ms, 3/100, 1.8 ms, 5/100, 2.3 ms, 83/100, 8.8 ms, 7/100, 2.1 ms, 3/100, 2.1 ms, 8/100, 2.9 ms, 82/100, 8.6 ms, 11/100, 2.3 ms, 4/100, 2.0 ms, 3/100, 2.5 ms, 82/100, 9.3 ms, 19/100, 2.9 ms, 9/100, 2.6 ms, 10/100, 3.0 ms, 62/100, 7.3 ms, 18/100, 2.8 ms, 16/100, 2.9 ms, 2/100, 2.5 ms, 64/100, 7.9 ms, 2/100, 1.5 ms, 51/100, 5.8 ms, 47/100, 5.9 ms, 0/100, 2.4 ms, 26/100, 4.0 ms, 8/100, 2.3 ms, 11/100, 2.9 ms, 55/100, 7.6 ms, 8/100, 2.1 ms, 3/100, 1.9 ms, 79/100, 9.2 ms, 10/100, 3.8 ms, 14/100, 2.5 ms, 4/100, 2.1 ms, 45/100, 5.6 ms, 37/100, 5.6 ms, 11/100, 2.2 ms, 9/100, 2.8 ms, 4/100, 2.4 ms, 76/100, 8.9 ms, 3/100, 1.4 ms, 4/100, 2.1 ms, 5/100, 2.4 ms, 88/100, 9.5 ms, 3/100, 1.4 ms, 7/100, 2.4 ms, 49/100, 5.9 ms, 41/100, 5.9 ms, 3/100, 1.4 ms, 3/100, 1.8 ms, 92/100, 9.3 ms, 2/100, 2.6 ms, 9/100, 2.3 ms, 16/100, 3.1 ms, 41/100, 5.5 ms, 34/100, 5.5 ms, 5/100, 1.7 ms, 8/100, 2.3 ms, 47/100, 5.8 ms, 40/100, 5.7 ms, 6/100, 1.7 ms, 5/100, 2.2 ms, 5/100, 2.4 ms, 84/100, 8.6 ms, 7/100, 2.0 ms, 6/100, 2.2 ms, 47/100, 5.9 ms, 40/100, 5.8 ms, 5/100, 1.9 ms, 32/100, 5.1 ms, 6/100, 2.7 ms, 57/100, 7.7 ms, 11/100, 2.4 ms, 14/100, 2.7 ms, 42/100, 5.3 ms, 33/100, 5.3 ms, 50/100, 6.6 ms, 12/100, 2.8 ms, 0/100, 1.9 ms, 38/100, 6.6 ms, 55/100, 7.0 ms, 1/100, 1.7 ms, 2/100, 2.2 ms, 42/100, 6.9 ms, 8/100, 1.9 ms, 48/100, 6.7 ms, 5/100, 2.4 ms, 39/100, 6.7 ms, 12/100, 2.3 ms, 18/100, 3.0 ms, 30/100, 5.6 ms, 40/100, 6.8 ms, 5/100, 1.7 ms, 3/100, 1.9 ms, 5/100, 2.5 ms, 87/100, 8.8 ms, 50/100, 6.7 ms, 13/100, 2.7 ms, 0/100, 2.0 ms, 37/100, 6.7 ms, Why do I get such a poor division of labor? Why sometimes a process last for short times and do nothing and others do almost 90% of the job?
What should I do to improve this?
edit: extra info - I run this in a i7 CPU.