I'm trying to learn how to use the multiprocessing package in Python, and I've written the following code, which randomly generates a large 2D array and then works out how many numbers in each row are within a specified interval (in this case between 4 and 8):

    import time
    import multiprocessing as mp
    import numpy as np


    def how_many_within_range(row, minimum, maximum):
        count = 0
        for n in row:
            if minimum <= n <= maximum:
                count += 1
        return count


    if __name__ == '__main__':
        data = np.random.randint(0, 10, size=[10000000, 5])
        print(data[:5])

        start_time = time.perf_counter()

        # With parallelisation
        with mp.Pool(mp.cpu_count()) as pool:
            results = [ pool.apply(how_many_within_range, args=(row, 4, 8)) \
                        for row in data ]

        # Without parallelisation
        # results = [ how_many_within_range(row, 4, 8) for row in data ]

        print(f'Time elapsed: {time.perf_counter() - start_time}')
        print(results[:5])

Without multiprocessing, the code runs in about 40 seconds, but with it, the program is much slower and doesn't finish in a realistic time. I'm pretty sure I've correctly followed the tutorial I was using, so what am I doing wrong?
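
For reference, `Pool.apply` blocks until its result is ready, so calling it once per row runs the tasks one at a time and pays the inter-process overhead for every row. A possible alternative is to hand each worker a block of rows through `pool.map`. The following is only a minimal sketch under assumptions: `count_chunk` and `parallel_counts` are hypothetical helpers that are not part of the code above, and the process count, chunk count and array size are arbitrary small values chosen so the example finishes quickly.

```python
import multiprocessing as mp
import numpy as np


def how_many_within_range(row, minimum, maximum):
    # Same counting logic as in the question
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count += 1
    return count


def count_chunk(chunk, minimum=4, maximum=8):
    # Hypothetical helper: process a whole block of rows in one task,
    # so the inter-process overhead is paid once per chunk, not per row
    return [how_many_within_range(row, minimum, maximum) for row in chunk]


def parallel_counts(data, processes=2, n_chunks=8):
    # Hypothetical helper: split the rows into n_chunks blocks and
    # let pool.map distribute the blocks across the worker processes
    chunks = np.array_split(data, n_chunks)
    with mp.Pool(processes) as pool:
        per_chunk = pool.map(count_chunk, chunks)
    # Flatten the per-chunk lists back into one result per row
    return [c for chunk in per_chunk for c in chunk]


if __name__ == '__main__':
    # Much smaller array than in the question (assumption, for a quick run)
    data = np.random.randint(0, 10, size=[1000, 5])
    results = parallel_counts(data)
    print(results[:5])
```

Whether this actually beats the serial version depends on the chunk size and on how long it takes to pickle the data for each worker, so it is worth timing both. For this particular count, a vectorised expression such as `((data >= 4) & (data <= 8)).sum(axis=1)` avoids the Python-level loop altogether.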