No gain from multiple threads when using ThreadPoolExecutor

Question

I'm trying to simulate some processes in order to collect statistics. I decided to write the simulation program using multiple threads, since each test run is independent.

It means that if I need to perform e.g. 1000 test runs then it should be possible to have 4 threads (each doing 250 test runs).

While doing this I found that adding more threads does not decrease the simulation time.

I have a Windows 10 laptop with 4 physical cores.

I wrote a simple program that shows the behaviour I'm talking about.

from concurrent.futures import ThreadPoolExecutor
import time
import psutil
import random


def runScenario():
    d = dict()
    for i in range(0, 10000):
        rval = random.random()
        if rval in d:
            d[rval] += 1
        else:
            d[rval] = 1
    return len(d)


def runScenarioMultipleTimesSingleThread(taskId, numOfCycles):
    print('starting thread {}, numOfCycles is {}'.format(taskId, numOfCycles))
    sum = 0
    for i in range(numOfCycles):
        sum += runScenario()
    print('thread {} finished'.format(taskId))
    return sum


def modelAvg(numOfCycles, numThreads):
    pool = ThreadPoolExecutor(max_workers=numThreads)
    cyclesPerThread = int(numOfCycles / numThreads)
    numOfCycles = cyclesPerThread * numThreads
    futures = list()
    for i in range(numThreads):
        future = pool.submit(runScenarioMultipleTimesSingleThread, i, cyclesPerThread)
        futures.append(future)
    sum = 0
    for future in futures:
        sum += future.result()
    return sum / numOfCycles


def main():
    p = psutil.Process()
    print('cpus:{}, affinity{}'.format(psutil.cpu_count(), p.cpu_affinity()))
    start = time.time()
    modelAvg(numOfCycles=10000, numThreads=4)
    end = time.time()
    print('simulation took {}'.format(end - start))


if __name__ == '__main__':
    main()

These are the results:

One thread:

cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 10000
thread 0 finished
simulation took 23.542529582977295

Four threads:

cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 2500
starting thread 1, numOfCycles is 2500
starting thread 2, numOfCycles is 2500
starting thread 3, numOfCycles is 2500
thread 1 finished
thread 2 finished
thread 0 finished
thread 3 finished
simulation took 23.508538484573364

I expect that with 4 threads the simulation time should ideally be 4 times smaller, and of course it should not stay the same.

Related: When are Python threads fast?, How to get a faster speed when using multi-threading in python
– wwii, Commented Sep 3, 2019 at 21:31

jjmontes · Accepted Answer · 2019-09-03 21:27:02Z

When you are using CPython, you won't get significant speedups by distributing a computational load across threads. This is because CPython's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so CPU-bound Python code running in several threads is effectively serialized. I have experienced this when processing text, for example.

In this case, if you monitor your CPUs, you would likely see that your process is not fully utilizing 4 of them, but only about 25% of each, which adds up to roughly one core's worth of work.

You can use the multiprocessing module (or ProcessPoolExecutor from concurrent.futures) to really spread your load across CPUs.
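As a rough sketch of that suggestion, the question's program could be adapted along the following lines. The snake_case names, the `executor_cls` and `iterations` parameters, and the reduced cycle count are assumptions made for illustration and are not part of the original code; passing the executor class in just makes it easy to compare threads and processes on the same workload.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_scenario(iterations=10000):
    """CPU-bound work, as in the question: count distinct random values."""
    counts = {}
    for _ in range(iterations):
        value = random.random()
        counts[value] = counts.get(value, 0) + 1
    return len(counts)


def run_batch(num_cycles, iterations=10000):
    """Run the scenario num_cycles times.

    Kept at module level so worker processes can import (unpickle) it.
    """
    return sum(run_scenario(iterations) for _ in range(num_cycles))


def model_avg(num_cycles, num_workers, executor_cls=ProcessPoolExecutor,
              iterations=10000):
    """Split the cycles evenly across workers and return the average result."""
    cycles_per_worker = num_cycles // num_workers
    if cycles_per_worker == 0:
        raise ValueError("num_cycles must be at least num_workers")
    total_cycles = cycles_per_worker * num_workers
    with executor_cls(max_workers=num_workers) as pool:
        futures = [pool.submit(run_batch, cycles_per_worker, iterations)
                   for _ in range(num_workers)]
        total = sum(future.result() for future in futures)
    return total / total_cycles


if __name__ == "__main__":
    # Hypothetical comparison run; the cycle count is reduced to keep it short.
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        model_avg(num_cycles=400, num_workers=4, executor_cls=executor_cls)
        elapsed = time.perf_counter() - start
        print("{} took {:.2f} s".format(executor_cls.__name__, elapsed))
```

On Windows the `if __name__ == "__main__":` guard matters, because each worker process re-imports the main module when it starts. Actual timings depend on the machine, and starting processes has its own overhead, so the gain is usually visible only when each worker gets a reasonably large chunk of work.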

Threads can still provide performance improvements in Python when they are IO-bound (as opposed to CPU-bound).
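To illustrate the IO-bound case, here is a small sketch in which `time.sleep` stands in for a blocking I/O call such as a network request. The function names `fake_io_task` and `run_io_tasks` are hypothetical. Because a sleeping (or otherwise blocked) thread releases the GIL, the waits can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.25):
    """Stand-in for blocking I/O; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return task_id


def run_io_tasks(num_tasks, num_threads, delay=0.25):
    """Run the tasks on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the order the tasks were submitted.
        results = list(pool.map(lambda i: fake_io_task(i, delay),
                                range(num_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 4):
        _, elapsed = run_io_tasks(num_tasks=4, num_threads=threads)
        print("{} thread(s): {:.2f} s".format(threads, elapsed))
```

With one thread the four waits run back to back, while with four threads they overlap, so the second run should finish in roughly the time of a single wait.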

Thank you very much for the explanation. ProcessPoolExecutor really does what I expect.
