I'm trying to simulate some processes in order to collect some statistics. I decided to write a simulation program that uses multiple threads, since each test run is independent.
This means that if I need to perform, e.g., 1000 test runs, it should be possible to use 4 threads, each doing 250 test runs.
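A minimal sketch of that split, assuming the runs should be divided as evenly as possible between workers. The helper name `split_runs` is hypothetical and not part of my program. Unlike `int(numOfCycles / numThreads)` in the program, it gives any remainder to the first workers instead of dropping it.

```python
def split_runs(total_runs, num_workers):
    """Return how many runs each worker should do, spreading any remainder."""
    base, remainder = divmod(total_runs, num_workers)
    # The first `remainder` workers each take one extra run.
    return [base + 1 if i < remainder else base for i in range(num_workers)]


print(split_runs(1000, 4))  # [250, 250, 250, 250]
print(split_runs(1002, 4))  # [251, 251, 250, 250]
```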
However, I found that adding more threads does not decrease the simulation time.
I have a Windows 10 laptop with 4 physical cores.
I wrote a simple program that shows the behaviour I'm talking about:
```python
from concurrent.futures import ThreadPoolExecutor
import time
import psutil
import random


def runScenario():
    # Count occurrences of 10000 random floats; return the number of distinct values.
    d = dict()
    for i in range(0, 10000):
        rval = random.random()
        if rval in d:
            d[rval] += 1
        else:
            d[rval] = 1
    return len(d)


def runScenarioMultipleTimesSingleThread(taskId, numOfCycles):
    # Executed by one worker thread: run the scenario numOfCycles times.
    print('starting thread {}, numOfCycles is {}'.format(taskId, numOfCycles))
    total = 0
    for i in range(numOfCycles):
        total += runScenario()
    print('thread {} finished'.format(taskId))
    return total


def modelAvg(numOfCycles, numThreads):
    pool = ThreadPoolExecutor(max_workers=numThreads)
    # Split the cycles evenly between threads (any remainder is dropped).
    cyclesPerThread = int(numOfCycles / numThreads)
    numOfCycles = cyclesPerThread * numThreads
    futures = list()
    for i in range(numThreads):
        future = pool.submit(runScenarioMultipleTimesSingleThread, i, cyclesPerThread)
        futures.append(future)
    # Wait for every thread and combine the partial sums.
    total = 0
    for future in futures:
        total += future.result()
    return total / numOfCycles


def main():
    p = psutil.Process()
    print('cpus:{}, affinity{}'.format(psutil.cpu_count(), p.cpu_affinity()))
    start = time.time()
    modelAvg(numOfCycles=10000, numThreads=4)
    end = time.time()
    print('simulation took {}'.format(end - start))


if __name__ == '__main__':
    main()
```

These are the results:
One thread:
```
cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 10000
thread 0 finished
simulation took 23.542529582977295
```

Four threads:
```
cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 2500
starting thread 1, numOfCycles is 2500
starting thread 2, numOfCycles is 2500
starting thread 3, numOfCycles is 2500
thread 1 finished
thread 2 finished
thread 0 finished
thread 3 finished
simulation took 23.508538484573364
```

I expected that with 4 threads the simulation time would ideally be 4 times shorter, and it certainly should not be the same.
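To put numbers on that expectation: the speedup with k workers is T(1) / T(k), ideally k, while the timings above give 23.54 / 23.51 ≈ 1.00. Below is a hypothetical timing harness (the names `cpu_bound_task` and `time_pool` are mine, not from the program) that times a similar CPU-bound workload with different worker counts. It takes the executor class as a parameter, so `concurrent.futures.ProcessPoolExecutor` can be passed instead of `ThreadPoolExecutor` for comparison. With processes, the task function must be defined at module top level so it can be pickled, and on Windows the call must sit under `if __name__ == '__main__':`.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_task(num_cycles):
    """Pure-Python work similar to runScenario(): count distinct random values."""
    total = 0
    for _ in range(num_cycles):
        seen = set()
        for _ in range(10000):
            seen.add(random.random())
        total += len(seen)
    return total


def time_pool(executor_cls, num_workers, total_cycles):
    """Split total_cycles across num_workers and return the elapsed seconds."""
    per_worker = total_cycles // num_workers
    start = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        futures = [pool.submit(cpu_bound_task, per_worker) for _ in range(num_workers)]
        for f in futures:
            f.result()  # wait for completion and re-raise any worker exception
    return time.perf_counter() - start


if __name__ == '__main__':
    t1 = time_pool(ThreadPoolExecutor, 1, 100)
    t4 = time_pool(ThreadPoolExecutor, 4, 100)
    print('speedup with 4 workers: {:.2f}'.format(t1 / t4))
```

A speedup close to 4 would mean the workers really run in parallel; a value close to 1, as in my measurements, means they effectively run one after another.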