I have a 1D array of numbers and want to calculate all pairwise Euclidean distances. I have a method (thanks to SO) of doing this with broadcasting, but it's inefficient because it calculates each distance twice, and it doesn't scale well.
Here's an example that gives me what I want with an array of 1000 numbers.
    import numpy as np
    import random

    r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
    dists = np.abs(r - r[:, None])

What's the fastest implementation in scipy/numpy/scikit-learn that I can use to do this, given that it has to scale to situations where the 1D array has >10k values?
Note: the matrix is symmetric, so I'm guessing it's possible to get at least a 2x speedup by exploiting that; I just don't know how.
scipy.spatial.distance.pdist. I dunno whether this is the fastest option, since it needs to have checks for multidimensional data, non-Euclidean norms, and other things, but it's built in. scipy is always compiled with BLAS; it's not optional as with numpy.
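For reference, here's a minimal sketch of how pdist could be applied to the 1D case from the question. It assumes the array has to be reshaped into a single column first, because pdist expects a 2D array of shape (n_points, n_dims). The seeded generator and the names `condensed`, `full`, and `upper` are my own choices for illustration, and I haven't benchmarked this against the broadcasting version, so treat the speed as something to measure rather than a given.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same kind of data as in the question, generated with a seeded RNG (an assumption).
rng = np.random.default_rng(0)
r = rng.integers(1, 1000, size=1000)

# pdist wants shape (n_points, n_dims), so turn the 1D array into one column.
# The result is the "condensed" form: each pair (i, j) with i < j appears once.
condensed = pdist(r.reshape(-1, 1).astype(float))

# Only if the full symmetric matrix is really needed: expand to n x n.
full = squareform(condensed)

# A pure-NumPy way to compute each pair once, in the same order as pdist.
i, j = np.triu_indices(len(r), k=1)
upper = np.abs(r[i] - r[j])
```

The condensed result holds n*(n-1)/2 values instead of n*n, which is where the savings from symmetry come from. Calling squareform gives back the same matrix as the broadcasting approach, but it also brings back the full memory cost, so for >10k values it's probably better to stay in condensed form if you can.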