I have a matrix multiplication test that does an [MxN] * [Nx1] multiplication. It uses NumPy (with MKL) on Windows:
    import timeit
    import numpy as np
    from numpy.random import random_sample

    NUMBER_OF_SAMPLES = 1000000
    NUMBER_OF_DIMENSIONS = 128

    # 1,000,000 x 128 float32 dataset and a 128 x 1 feature vector
    dataset = random_sample((NUMBER_OF_SAMPLES, NUMBER_OF_DIMENSIONS)).astype(np.float32)
    feature = random_sample((NUMBER_OF_DIMENSIONS, 1)).astype(np.float32)
    print("Finished Generating the Data...")

    # time 1000 calls of np.dot and report the average time per call
    numbers = 1000
    total_time = timeit.timeit('np.dot(dataset, feature)', globals=globals(), number=numbers)
    print("Average Time %.3f" % float(total_time / numbers))

I benchmarked it on a Core i7 7700 (4 cores / 8 threads) and again on a Core i7 7820X (8 cores / 16 threads), with Hyper-Threading both enabled and disabled (disabling Hyper-Threading didn't really change the results):
    Core i7 7700  (4 cores) || Data Count 1,000,000 || 128-dimensional || Time 0.021 s
    Core i7 7820X (8 cores) || Data Count 1,000,000 || 128-dimensional || Time 0.019 s

I expected that doubling the core count would roughly cut the time in half, but it barely made a difference.
Is there any way to improve this speed? Thanks.
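One way to probe why the extra cores barely help is to check which BLAS backend NumPy is actually using and to re-run the same benchmark with different thread counts. The following is a minimal sketch, not part of the original post; it assumes NumPy is linked against MKL and relies on the standard MKL_NUM_THREADS environment variable, which must be set before NumPy is imported.

    # Sketch (assumption: NumPy is built against MKL): pin the MKL thread
    # count via MKL_NUM_THREADS and re-run the benchmark to see how the
    # timing scales with threads. The variable must be set before NumPy
    # is imported, or MKL will ignore it.
    import os
    os.environ["MKL_NUM_THREADS"] = "1"   # try "1", "2", "4", "8", ...

    import timeit
    import numpy as np
    from numpy.random import random_sample

    np.show_config()  # shows which BLAS/LAPACK backend NumPy is linked against

    dataset = random_sample((1000000, 128)).astype(np.float32)
    feature = random_sample((128, 1)).astype(np.float32)

    numbers = 1000
    total_time = timeit.timeit('np.dot(dataset, feature)', globals=globals(), number=numbers)
    print("Threads=%s, Average Time %.6f s"
          % (os.environ["MKL_NUM_THREADS"], total_time / numbers))

If the average time is roughly the same with 1, 4, and 8 threads, the operation is not limited by core count in the first place, which matches the numbers above.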
Use timeit, because you're calling time() and also appending to a list in the loop; those calls aren't vectorized and could be taking up a significant portion of the loop's execution time. But I wouldn't expect the kind of correlation with speed that you're expecting.
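For reference, here is a minimal sketch (not from the original post) contrasting the manual time()-and-append loop the comment refers to with the timeit version; the point is that the bookkeeping in the manual loop runs as plain Python on every iteration, whereas timeit keeps it out of the measured statement.

    # Sketch: manual-loop timing vs. timeit for the same np.dot call.
    import time
    import timeit
    import numpy as np
    from numpy.random import random_sample

    dataset = random_sample((1000000, 128)).astype(np.float32)
    feature = random_sample((128, 1)).astype(np.float32)
    numbers = 1000

    # Manual loop: time() and list.append() execute in Python every iteration.
    durations = []
    for _ in range(numbers):
        start = time.time()
        np.dot(dataset, feature)
        durations.append(time.time() - start)
    print("Manual loop average: %.6f s" % (sum(durations) / numbers))

    # timeit: only the statement itself is inside the timed region.
    total_time = timeit.timeit('np.dot(dataset, feature)', globals=globals(), number=numbers)
    print("timeit average:      %.6f s" % (total_time / numbers))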