Knn graph differences: Mathematica and sklearn

Question

I have this data: Data file

I made 8-NN graph as follows:

data20 = Import["~/Downloads/data_20.mat"]; ndata20 = Table[Flatten[data20[[i]]], {i, Length[data20]}]; tndata20 = Transpose[ndata20]; gp = NearestNeighborGraph[tndata20, 8, DistanceFunction -> EuclideanDistance, DirectedEdges -> False]; adj = AdjacencyMatrix[gp];

This gives a symmetric adjacency matrix:

adj == Transpose[adj] True

This is what I did in python:

import scipy.io mat = scipy.io.loadmat('~/Downloads/data_20.mat') mat2 =[] for i in range(60000): mat2.append(mat['foo'][i]) mat2 = np.array(mat2) mat22 = mat2.reshape(60000,784) mat22T = mat22.T from sklearn.neighbors import kneighbors_graph A = kneighbors_graph(mat22T, 8, mode='connectivity') aa = A.toarray()

This gives an antisymmetric matrix:

(aa==aa.T).all() False

Sklearn and Mathematica aren't using the same algorithm for k-nn graph construction. How can I get the same result in python and vice-versa?

EDITS: Following the comments by @Szabolcs, I made some edits.

Naively, this is how I think it should work:

First calculate distances between all the nodes. To calculate the distance you should choose a distance metric. In my case that would be the Euclidean metric. This gives a Distance matrix.
If I want to make an 8-NN graph then I would find 8 closet neighbors to the node using the distance matrix. Using this idea, I can find neighbors for all the nodes. This gives a graph.
If the graph is undirected then the graph representation (adjacency matrix) would be symmetric because if node i is connected to node j then node j is also connected to node i. Basically, the connection is bi-directional.

Here's an example:

Mathematica:

dat = {{-1, -1}, {-2, -1}, {-3, -2}, {1, 1}, {2, 1}, {3, 2}}; gp11 = NearestNeighborGraph[dat, 2, Method -> "KDtree", DistanceFunction -> EuclideanDistance, DirectedEdges -> False ] adj1 = AdjacencyMatrix[gp11]; adj1 == Transpose[adj1] True

Python with sklearn:

from sklearn.neighbors import NearestNeighbors import numpy as np X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) kdt1 = KDTree(X, metric='euclidean') A11 = kneighbors_graph(kdt1, 2, mode='connectivity') a11 = A11.toarray() (a11==a11.T).all() True

However, the same approach does not work for my dataset. Thanks a lot for reading my question.

That's not Python, it's sklearn to be accurate. There are many different Python packages. Can you change the title and the description? Also, can you give a plain English description of what kneighbors_graph does in sklearn? Euclidean distance is obviously symmetric. — Szabolcs
– Szabolcs, Commented Nov 6, 2021 at 19:52

Szabolcs · Accepted Answer · 2021-11-07 08:49:22Z

I did not look at sklearn, but I wanted to note that by default, NearestNeighborGraph creates an undirected graph. This means that the adjacency matrix is symmetric. We can use DirectedEdges -> True to get a directed graph:

SeedRandom[1234]; pts = RandomPoint[Disk[], 10]; NearestNeighborGraph[pts, 2, DirectedEdges -> True]

Notice that not all connections are reciprocal. This is because if point B is A's closest neighbour, that does not mean that A is also B's closest neighbour.

While I did not try sklearn, perhaps this is the source of the difference. You can test it on a small example.

sra · Accepted Answer · 2021-11-08 18:32:16Z

sklearn makes a directed knn graph.

In the case of mathematica:

adj = NearestNeighborGraph[pts, 2, DirectedEdges -> True] mat = AdjacencyMatrix[adj] mat == Transpose[mat] False adj1 = NearestNeighborGraph[pts, 2, DirectedEdges -> False] mat1 = AdjacencyMatrix[adj1] mat1 == Transpose[mat1] True

In the case of sklearn, using the same data as @Szabolcs':

xt=np.array([[0.753217, 0.0439285], [-0.827553, -0.244174], [0.0875135, -0.0413367], [-0.509302, 0.519792], [-0.0819656, 0.769458], [0.167709, -0.472054], [0.83912, -0.15233], [0.974581, 0.175885], [0.392318, 0.503732], [-0.196902, 0.265483]]); adj = kneighbors_graph(xt, 2,metric='euclidean', mode='connectivity') adj11 = adj.toarray() (adj11==adj11.T).all() False

How can I force sklearn to produce an undirected graph? Just do this:

adj11 = adj11 + adj11.T adj11[adj11 > 1] = 1

This adj11 is equal to mat1

Stack Exchange Network

Knn graph differences: Mathematica and sklearn

2 Answers 2

Hot Network Questions

Knn graph differences: Mathematica and sklearn

2 Answers 2

Related

Hot Network Questions