My Python program was too slow. So, I profiled it and found that most of the time was being spent in a function that computes the distance between two points (a point is a list of 3 Python floats):
def get_dist(pt0, pt1):
    val = 0
    for i in range(3):
        val += (pt0[i] - pt1[i]) ** 2
    val = math.sqrt(val)
    return val

To analyze why this function was so slow, I wrote two test programs, one in Python and one in C++, that do a similar computation: they compute the distance between 1 million pairs of points. (The test code in Python and C++ is below.)
The Python computation takes 2 seconds, while C++ takes 0.02 seconds. A 100x difference!
Why is Python code so much slower than C++ code for such simple math computations? How do I speed it up to match the C++ performance?
The Python code used for testing:
import math, random, time

num = 1000000

# Generate random points and random indices
pt_list = []
rand_list = []
for i in range(num):
    pt = []
    for j in range(3):
        pt.append(random.random())
    pt_list.append(pt)
    rand_list.append(random.randint(0, num - 1))

# Compute
beg_time = time.clock()
dist = 0
for i in range(num):
    pt0 = pt_list[i]
    ri = rand_list[i]
    pt1 = pt_list[ri]
    val = 0
    for j in range(3):
        val += (pt0[j] - pt1[j]) ** 2
    val = math.sqrt(val)
    dist += val
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist

The C++ code used for testing:
#include <cstdlib>
#include <iostream>
#include <ctime>
#include <cmath>

struct Point
{
    double v[3];
};

int num = 1000000;

int main()
{
    // Allocate memory
    Point** pt_list = new Point*[num];
    int* rand_list = new int[num];

    // Generate random points and numbers
    for ( int i = 0; i < num; ++i )
    {
        Point* pt = new Point;
        for ( int j = 0; j < 3; ++j )
        {
            const double r = (double) rand() / (double) RAND_MAX;
            pt->v[j] = r;
        }
        pt_list[i] = pt;
        rand_list[i] = rand() % num;
    }

    // Compute
    clock_t beg_time = clock();
    double dist = 0;
    for ( int i = 0; i < num; ++i )
    {
        const Point* pt0 = pt_list[i];
        int r = rand_list[i];
        const Point* pt1 = pt_list[r];
        double val = 0;
        for ( int j = 0; j < 3; ++j )
        {
            const double d = pt0->v[j] - pt1->v[j];
            val += ( d * d );
        }
        val = sqrt(val);
        dist += val;
    }
    clock_t end_time = clock();
    double sec_time = (end_time - beg_time) / (double) CLOCKS_PER_SEC;
    std::cout << sec_time << std::endl;
    std::cout << dist << std::endl;
    return 0;
}
Use numpy for computation across such large datasets. get_dist() implemented in Cython would be almost instantaneous compared with the for i in xrange(num) loop overhead (14 ms for num=1000000 on my machine). Cython interoperates with numpy arrays very well. You can use Cython if you can't express your computations as vectorized numpy operations.
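As a rough sketch of what the vectorized numpy version of the benchmark might look like (the array names here are my own; I'm assuming the same setup as the test program above, with points stored in one (num, 3) array instead of a list of lists):

```python
import numpy as np

num = 1000000

# All points in a single (num, 3) array; all random indices in one array
pt_arr = np.random.random((num, 3))
rand_idx = np.random.randint(0, num, size=num)

# Compute all 1 million distances in one vectorized pass:
# fancy indexing pairs each point with its random partner,
# then the squared differences are summed along the last axis.
diff = pt_arr - pt_arr[rand_idx]
dists = np.sqrt(np.sum(diff * diff, axis=1))
total = dists.sum()
```

The inner loops disappear entirely: the per-element work happens inside numpy's compiled C code rather than in the Python interpreter, which is where the 100x gap comes from.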