i have a question on how to calculate distances in numpy as fast as it can,
def getR1(VVm,VVs,HHm,HHs): t0=time.time() R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis] R*=R R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis] R1*=R1 R+=R1 del R1 print "R1\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) print numpy.max(R) #4176.26290975 # uses 17.5Gb ram return R def getR2(VVm,VVs,HHm,HHs): t0=time.time() precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten())) measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten())) deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :] #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2) R = numpy.einsum('ijk,ijk->ij', deltas, deltas) print "R2\t",time.time()-t0,R.shape, #14.5291359425 (108225, 10500) print numpy.max(R) #4176.26290975 # uses 26Gb ram return R def getR3(VVm,VVs,HHm,HHs): from numpy.core.umath_tests import inner1d t0=time.time() precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten())) measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten())) deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :] #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2) R = inner1d(deltas, deltas) print "R3\t",time.time()-t0, R.shape, #12.6972110271 (108225, 10500) print numpy.max(R) #4176.26290975 #Uses 26Gb return R def getR4(VVm,VVs,HHm,HHs): from scipy.spatial.distance import cdist t0=time.time() precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten())) measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten())) R=spdist.cdist(precomputed_flat,measured_flat, 'sqeuclidean') #.T print "R4\t",time.time()-t0, R.shape, #17.7022118568 (108225, 10500) print numpy.max(R) #4176.26290975 # uses 9 Gb ram return R def getR5(VVm,VVs,HHm,HHs): from scipy.spatial.distance import cdist t0=time.time() precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten())) measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten())) R=spdist.cdist(precomputed_flat,measured_flat, 'euclidean') #.T print "R5\t",time.time()-t0, R.shape, #15.6070930958 (108225, 10500) print numpy.max(R) #64.6240118667 # uses only 9 Gb ram return R def getR6(VVm,VVs,HHm,HHs): from scipy.weave import blitz t0=time.time() R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis] blitz("R=R*R") # R*=R R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis] blitz("R1=R1*R1") # R1*=R1 blitz("R=R+R1") # R+=R1 del R1 print "R6\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) print numpy.max(R) #4176.26290975 return R results in the following times:
R1 11.7737319469 (108225, 10500) 4909.66881791 R2 15.1279799938 (108225, 10500) 4909.66881791 R3 12.7408981323 (108225, 10500) 4909.66881791 R4 17.3336868286 (10500, 108225) 4909.66881791 R5 15.7530870438 (10500, 108225) 70.0690289494 R6 11.670968771 (108225, 10500) 4909.66881791 While the last one gives sqrt((VVm-VVs)^2+(HHm-HHs)^2), while the others give (VVm-VVs)^2+(HHm-HHs)^2, This is not really important, since otherwise further on in my code i take the minimum of R[i,:] for each i, and sqrt doesnt influence the minimum value anyways, (and if i am interested in the distance, i just take sqrt(value), instead of doing the sqrt over the entire array, so there is really no timing difference due to that.
The question remains: how come the first solution is the best, (the reason the second and third are slower is because deltas=... takes 5.8seconds, (which is also why those two methods take 26Gb)), And why is the sqeuclidean slower than the euclidean?
sqeuclidean should just do (VVm-VVs)^2+(HHm-HHs)^2, while i think it does something different. Anyone know how to find the sourcecode (C or whatever is at the bottom) of that method? I think it does sqrt((VVm-VVs)^2+(HHm-HHs)^2)^2 (the only reason i can think why it would be slower than (VVm-VVs)^2+(HHm-HHs)^2 - I know its a stupid reason, anyone got a more logical one?)
Since i know nothing of C, how would i inline this with scipy.weave? and is that code compilable normally like you do with python? or do i need special stuff installed for that?
Edit: ok, i tried it with scipy.weave.blitz, (R6 method), and that is slightly faster, but i assume someone who knows more C than me can still improve this speed? I just took the lines which are of the form a+=b or *=, and looked up how they would be in C, and put them in the blitz statement, but i guess if i put lines with the statements with flatten and newaxis in C as well, that it should go faster too, but i dont know how i can do that (someone who knows C maybe explain?). Right now, the difference between the stuff with blitz and my first method are not big enough to really be caused by C vs numpy i guess?
I guess the other methods like with deltas=... can go much faster too, when i would put it in C ?
R(i.e., just useR1 += R3).