I have this data: Data file
I made an 8-NN graph as follows:
```mathematica
data20 = Import["~/Downloads/data_20.mat"];
ndata20 = Table[Flatten[data20[[i]]], {i, Length[data20]}];
tndata20 = Transpose[ndata20];
gp = NearestNeighborGraph[tndata20, 8, DistanceFunction -> EuclideanDistance, DirectedEdges -> False];
adj = AdjacencyMatrix[gp];
```

This gives a symmetric adjacency matrix:

```mathematica
adj == Transpose[adj]
(* True *)
```

This is what I did in Python:
```python
import os

import numpy as np
import scipy.io
from sklearn.neighbors import kneighbors_graph

# loadmat passes the path to open(), which does not expand "~"
mat = scipy.io.loadmat(os.path.expanduser('~/Downloads/data_20.mat'))
# stack the first 60000 entries of the 'foo' variable, 784 features each
mat2 = np.array([mat['foo'][i] for i in range(60000)])
mat22 = mat2.reshape(60000, 784)
mat22T = mat22.T

A = kneighbors_graph(mat22T, 8, mode='connectivity')
aa = A.toarray()
```

This gives a matrix that is not symmetric:

```python
(aa == aa.T).all()
# False
```

scikit-learn and Mathematica aren't using the same algorithm for k-NN graph construction. How can I get the same result in Python as in Mathematica, and vice versa?
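As far as I understand it, `kneighbors_graph` returns the adjacency matrix of a *directed* k-NN graph: row i has ones in the columns of the k nearest neighbours of point i, and nothing forces i to be among the k nearest neighbours of those points. Below is a minimal sketch of one way to get an undirected graph, assuming that Mathematica's `DirectedEdges -> False` corresponds to the union of the k-NN relations (an edge i-j whenever either point is among the other's k nearest). I have not checked how either library breaks ties between equidistant points. The random `X` is hypothetical stand-in data for `mat22T`:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Hypothetical stand-in data: 50 random points in 5 dimensions
# (in the question this would be mat22T).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

k = 8
# Row i of A marks the k nearest neighbours of point i (the point itself is
# excluded by default), i.e. the adjacency matrix of a *directed* k-NN graph.
A = kneighbors_graph(X, k, mode='connectivity', include_self=False)

# Assumed undirected convention: keep edge i-j if j is among i's k nearest
# OR i is among j's k nearest (elementwise maximum of A and its transpose).
A_sym = A.maximum(A.T)

aa = A.toarray()
bb = A_sym.toarray()
print((aa == aa.T).all())
print((bb == bb.T).all())  # True by construction
```

If Mathematica instead keeps only mutual neighbours, `A.minimum(A.T)` would be the corresponding intersection. Since `A_sym` has a zero diagonal, `A_sym.nnz // 2` is its number of undirected edges, which can be compared with `EdgeCount[gp]` to see which convention matches.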
EDITS: Following the comments by @Szabolcs, I made some edits.
Naively, this is how I think it should work:
- First, calculate the distances between all pairs of nodes. This requires choosing a distance metric; in my case that is the Euclidean metric. The result is a distance matrix.
- To make an 8-NN graph, I would use the distance matrix to find the 8 closest neighbors of each node. Doing this for every node gives a graph.
- If the graph is undirected, its adjacency matrix should be symmetric: if node i is connected to node j, then node j is also connected to node i. In other words, every connection is bidirectional.
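The three steps above can be sketched in plain NumPy as follows. The function names `knn_adjacency` and `undirected` are hypothetical, and `undirected` assumes the union convention (edge i-j if either point picked the other). The usage shows a gap in the reasoning: step 2 yields a directed relation, because j being among i's k nearest does not imply the converse, so the matrix after step 2 is not symmetric in general and has to be symmetrized explicitly.

```python
import numpy as np


def knn_adjacency(X, k):
    """Directed k-NN adjacency: row i has a 1 in the columns of i's k nearest points."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: full Euclidean distance matrix (n x n); O(n^2) memory, fine for small n.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    # Step 2: indices of the k smallest distances in each row (ties broken by index).
    nbrs = np.argsort(D, axis=1, kind="stable")[:, :k]
    A = np.zeros((n, n), dtype=int)
    A[np.arange(n)[:, None], nbrs] = 1
    return A


def undirected(A):
    # Step 3 (assumed union convention): keep edge i-j if i picked j OR j picked i.
    return np.maximum(A, A.T)


# Four points on a line, k = 1: the nearest point to 10 is 3,
# but the nearest point to 3 is 1, so the relation is not mutual.
X = [[0], [1], [3], [10]]
A_dir = knn_adjacency(X, 1)
A_und = undirected(A_dir)
print((A_dir == A_dir.T).all())  # False
print((A_und == A_und.T).all())  # True
```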
Here's an example:
Mathematica:
```mathematica
dat = {{-1, -1}, {-2, -1}, {-3, -2}, {1, 1}, {2, 1}, {3, 2}};
gp11 = NearestNeighborGraph[dat, 2, Method -> "KDtree", DistanceFunction -> EuclideanDistance, DirectedEdges -> False]
adj1 = AdjacencyMatrix[gp11];
adj1 == Transpose[adj1]
(* True *)
```

Python with scikit-learn:
```python
import numpy as np
from sklearn.neighbors import KDTree, kneighbors_graph

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kdt1 = KDTree(X, metric='euclidean')
A11 = kneighbors_graph(kdt1, 2, mode='connectivity')
a11 = A11.toarray()
(a11 == a11.T).all()
# True
```

However, the same approach does not work for my dataset. Thanks a lot for reading my question.
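In the toy example, every point's 2 nearest neighbours are the other two points of its own cluster of three, so every k-NN relation happens to be mutual and the directed matrix comes out symmetric by coincidence. Below is a minimal sketch showing that adding a single point (the origin, a hypothetical addition not in the original example) is enough to break this. It passes the array directly instead of a `KDTree`, which should give the same neighbours here because the default metric (Minkowski with p=2) is Euclidean:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Toy data from the question plus one extra (hypothetical) point at the origin.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2], [0, 0]])

a = kneighbors_graph(X, 2, mode='connectivity').toarray()
print((a == a.T).all())  # False
# (-3, -2) still picks (-1, -1), but (-1, -1) now prefers (-2, -1) and (0, 0).
print(a[2, 0], a[0, 2])  # 1.0 0.0

# Symmetrizing (union of the k-NN relations) restores an undirected graph.
a_sym = np.maximum(a, a.T)
print((a_sym == a_sym.T).all())  # True
```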