I'm given a 2-D NumPy array X of floating-point values and need to compute the Euclidean distances between all pairs of rows, then find the k pairs of row indices with the smallest distances and return them (where k > 0). I'm testing with a small array, and this is what I have so far:

    import numpy as np
    from sklearn.metrics.pairwise import euclidean_distances

    X_testing = np.asarray([[1,2,3.5],[4,1,2],[0,0,2],[3.4,1,5.6]])
    test = euclidean_distances(X_testing, X_testing)
    print(test)

The resulting printout is:

    [[ 0.          3.5         2.6925824   3.34215499]
     [ 3.5         0.          4.12310563  3.64965752]
     [ 2.6925824   4.12310563  0.          5.05173238]
     [ 3.34215499  3.64965752  5.05173238  0.        ]]

Next, I need to efficiently compute the k smallest distances between all pairs of rows, and return the corresponding k tuples of (row1, row2, distance_value) as a list, in ascending order of distance.
So in the above test case, if k = 2, then I would need to return the following:
[(0, 2, 2.6925824), (0, 3, 3.34215499)]
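
To make the expected behaviour concrete, here is a minimal sketch of one possible approach, assuming SciPy is available. The helper name `top_k_pairs` is made up for illustration. It uses `scipy.spatial.distance.pdist` to compute each pair only once (skipping the diagonal and the mirrored half of the matrix) and `np.argpartition` to pick the k smallest values without sorting everything. It still materializes all n*(n-1)/2 distances, so I doubt it scales to my real data.

```python
import numpy as np
from scipy.spatial.distance import pdist


def top_k_pairs(X, k):
    """Return the k closest row pairs as (row1, row2, distance), nearest first.

    Hypothetical helper for illustration only; not a library function.
    """
    # Condensed distances: one entry per pair (i, j) with i < j,
    # ordered (0,1), (0,2), ..., (0,n-1), (1,2), ...
    d = pdist(X, metric="euclidean")
    if d.size == 0:
        return []
    k = min(k, d.size)

    # Select the k smallest entries without a full sort,
    # then sort only those k by distance.
    idx = np.argpartition(d, k - 1)[:k]
    idx = idx[np.argsort(d[idx])]

    # Upper-triangle indices follow the same ordering as pdist's output.
    rows, cols = np.triu_indices(X.shape[0], k=1)
    return [(int(rows[i]), int(cols[i]), float(d[i])) for i in idx]


X_testing = np.asarray([[1, 2, 3.5], [4, 1, 2], [0, 0, 2], [3.4, 1, 5.6]])
print(top_k_pairs(X_testing, 2))
```

For the test array with k = 2, this picks out the pairs (0, 2) and (0, 3), matching the result above.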
Is there a built-in way (in SciPy, scikit-learn, NumPy, etc.) or any other way to compute this efficiently? Although the above test case is small, in reality the 2-D array is very large, so memory and time are both concerns. Thanks.