First, thanks for reading and taking the time to respond.
Second, the question:
I have a PxN matrix X, where P is on the order of 10^6 and N is on the order of 10^3. So X is relatively large, and it is not sparse. Each row of X is an N-dimensional sample. I want to construct a PxP matrix of pairwise distances between these P samples, using the Hellinger distance.
So far I am relying on sparse dok matrices:
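
    import math

    import numpy as np
    import scipy as sp
    import scipy.sparse

    def hellinger_distance(X):
        P = X.shape[0]
        H1 = sp.sparse.dok_matrix((P, P))
        for i in range(P):
            if i % 100 == 0:
                print(i)  # progress indicator
            x1 = X[i]
            X2 = X[i:P]
            # Hellinger distance from row i to rows i..P-1
            h = np.sqrt(((np.sqrt(x1) - np.sqrt(X2))**2).sum(1)) / math.sqrt(2)
            H1[i, i:P] = h  # fill the upper triangle only
        # Mirror the upper triangle; the diagonal is zero, so nothing is doubled
        H = H1 + H1.T
        return H

This is very slow. Is there a more efficient way of doing this? Any help is much appreciated.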
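For reference, here is a rough sketch of one direction that might help. It is not tested at this scale. Up to the factor 1/sqrt(2), the Hellinger distance is the Euclidean distance between the element-wise square roots of the rows, so the square root can be taken once and each block of distances computed with a single matrix product. The function name `hellinger_blocks` and the `block_size` parameter are made up for illustration, and the sketch assumes X is a dense, non-negative NumPy array that fits in memory.

```python
import numpy as np

def hellinger_blocks(X, block_size=256):
    """Yield (start, stop, D), where D[a, b] is the Hellinger distance
    between row start + a and row b of X. (Hypothetical helper.)"""
    S = np.sqrt(np.asarray(X, dtype=np.float64))  # square root taken once
    sq_norms = np.einsum("ij,ij->i", S, S)        # squared norm of each row of S
    P = S.shape[0]
    for start in range(0, P, block_size):
        stop = min(start + block_size, P)
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for a whole block
        d2 = (sq_norms[start:stop, None] + sq_norms[None, :]
              - 2.0 * (S[start:stop] @ S.T))
        np.maximum(d2, 0.0, out=d2)               # clip small negative round-off
        yield start, stop, np.sqrt(d2 / 2.0)
```

Each yielded block holds block_size x P values, so it can be reduced or written to disk before the next one is computed. The distances are almost all nonzero, so a sparse format does not save space here: the full PxP result for P = 10^6 has 10^12 entries, about 8 TB in float64. Because of round-off in the expansion, the diagonal may come out as a tiny positive number instead of exactly zero.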