
I'm trying to simulate some processes in order to collect statistics. I decided to write the simulation program using multiple threads, as each test run is independent.

It means that if I need to perform e.g. 1000 test runs then it should be possible to have 4 threads (each doing 250 test runs).

While doing this I found that adding more threads does not decrease the simulation time.

I have a Windows 10 laptop with 4 physical cores.

I wrote a simple program that shows the behaviour I'm talking about.

    from concurrent.futures import ThreadPoolExecutor
    import time
    import psutil
    import random

    def runScenario():
        d = dict()
        for i in range(0, 10000):
            rval = random.random()
            if rval in d:
                d[rval] += 1
            else:
                d[rval] = 1
        return len(d)

    def runScenarioMultipleTimesSingleThread(taskId, numOfCycles):
        print('starting thread {}, numOfCycles is {}'.format(taskId, numOfCycles))
        sum = 0
        for i in range(numOfCycles):
            sum += runScenario()
        print('thread {} finished'.format(taskId))
        return sum

    def modelAvg(numOfCycles, numThreads):
        pool = ThreadPoolExecutor(max_workers=numThreads)
        cyclesPerThread = int(numOfCycles / numThreads)
        numOfCycles = cyclesPerThread * numThreads
        futures = list()
        for i in range(numThreads):
            future = pool.submit(runScenarioMultipleTimesSingleThread, i, cyclesPerThread)
            futures.append(future)
        sum = 0
        for future in futures:
            sum += future.result()
        return sum / numOfCycles

    def main():
        p = psutil.Process()
        print('cpus:{}, affinity{}'.format(psutil.cpu_count(), p.cpu_affinity()))
        start = time.time()
        modelAvg(numOfCycles=10000, numThreads=4)
        end = time.time()
        print('simulation took {}'.format(end - start))

    if __name__ == '__main__':
        main()

These are the results:

One thread:

    cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
    starting thread 0, numOfCycles is 10000
    thread 0 finished
    simulation took 23.542529582977295

Four threads:

    cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
    starting thread 0, numOfCycles is 2500
    starting thread 1, numOfCycles is 2500
    starting thread 2, numOfCycles is 2500
    starting thread 3, numOfCycles is 2500
    thread 1 finished
    thread 2 finished
    thread 0 finished
    thread 3 finished
    simulation took 23.508538484573364

I expected that with 4 threads the simulation time would ideally be 4 times smaller, and of course it should not stay the same.


1 Answer


When you are using CPython, you won't get a significant speedup by distributing a CPU-bound load across threads. This is because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so the threads effectively take turns instead of running in parallel. I have experienced this when processing text, for example.

In this case, if you monitor your CPU, you will likely see that the process is not fully utilizing four cores: the total load stays at roughly one core's worth, spread across them.

You can use the multiprocessing module (or ProcessPoolExecutor from concurrent.futures) to really spread your load across CPUs.
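As a rough sketch of that approach, the simulation from the question could be moved onto a process pool along these lines. The function names (run_scenario, run_scenarios, model_avg) and the reduced cycle count are my own choices for illustration, not something taken from the question, and the psutil reporting is left out.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor


def run_scenario(iterations=10000):
    """One test run: count distinct random values, as in the question."""
    counts = {}
    for _ in range(iterations):
        value = random.random()
        counts[value] = counts.get(value, 0) + 1
    return len(counts)


def run_scenarios(num_cycles):
    """Run several independent test runs inside one worker process."""
    return sum(run_scenario() for _ in range(num_cycles))


def model_avg(num_cycles, num_workers):
    """Split the cycles evenly across worker processes and average the results."""
    cycles_per_worker = num_cycles // num_workers
    total_cycles = cycles_per_worker * num_workers
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # Each worker process has its own interpreter and its own GIL.
        totals = pool.map(run_scenarios, [cycles_per_worker] * num_workers)
        return sum(totals) / total_cycles


# The guard matters on Windows, where worker processes re-import this module.
if __name__ == "__main__":
    start = time.perf_counter()
    average = model_avg(num_cycles=400, num_workers=4)  # small count, just for a quick try
    print("average distinct values: {}".format(average))
    print("simulation took {:.2f} s".format(time.perf_counter() - start))
```

Arguments and return values are pickled when they cross the process boundary, so this works best when each task does a lot of computation and passes little data, which is the case here.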

Threads can still provide performance improvements in Python when your threads are IO-bound (as opposed to CPU-bound).
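To illustrate the IO-bound case, here is a minimal sketch in which time.sleep stands in for a network or disk wait. The task function and its delay are hypothetical; the point is only that a blocked thread releases the GIL, so the waits can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.2):
    """Hypothetical IO-bound task; the sleep stands in for waiting on IO."""
    time.sleep(delay)  # the GIL is released while the thread sleeps
    return task_id


def run_with_threads(num_tasks, num_threads):
    """Run the tasks on a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(fake_io_task, range(num_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 4):
        _, elapsed = run_with_threads(num_tasks=4, num_threads=threads)
        print("{} thread(s): {:.2f} s".format(threads, elapsed))
```

With four threads the four waits should overlap, so the elapsed time should be noticeably shorter than with a single thread, unlike the CPU-bound simulation in the question.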


1 Comment

Thank you very much for the explanation. ProcessPoolExecutor really does what I expect.
