4
$\begingroup$

I have this data: Data file

I made 8-NN graph as follows:

data20 = Import["~/Downloads/data_20.mat"]; ndata20 = Table[Flatten[data20[[i]]], {i, Length[data20]}]; tndata20 = Transpose[ndata20]; gp = NearestNeighborGraph[tndata20, 8, DistanceFunction -> EuclideanDistance, DirectedEdges -> False]; adj = AdjacencyMatrix[gp]; 

This gives a symmetric adjacency matrix:

adj == Transpose[adj] True 

This is what I did in python:

import scipy.io mat = scipy.io.loadmat('~/Downloads/data_20.mat') mat2 =[] for i in range(60000): mat2.append(mat['foo'][i]) mat2 = np.array(mat2) mat22 = mat2.reshape(60000,784) mat22T = mat22.T from sklearn.neighbors import kneighbors_graph A = kneighbors_graph(mat22T, 8, mode='connectivity') aa = A.toarray() 

This gives an antisymmetric matrix:

(aa==aa.T).all() False 

Sklearn and Mathematica aren't using the same algorithm for k-nn graph construction. How can I get the same result in python and vice-versa?

EDITS: Following the comments by @Szabolcs, I made some edits.

Naively, this is how I think it should work:

  1. First calculate distances between all the nodes. To calculate the distance you should choose a distance metric. In my case that would be the Euclidean metric. This gives a Distance matrix.
  2. If I want to make an 8-NN graph then I would find 8 closet neighbors to the node using the distance matrix. Using this idea, I can find neighbors for all the nodes. This gives a graph.
  3. If the graph is undirected then the graph representation (adjacency matrix) would be symmetric because if node i is connected to node j then node j is also connected to node i. Basically, the connection is bi-directional.

Here's an example:

Mathematica:

dat = {{-1, -1}, {-2, -1}, {-3, -2}, {1, 1}, {2, 1}, {3, 2}}; gp11 = NearestNeighborGraph[dat, 2, Method -> "KDtree", DistanceFunction -> EuclideanDistance, DirectedEdges -> False ] adj1 = AdjacencyMatrix[gp11]; adj1 == Transpose[adj1] True 

Python with sklearn:

from sklearn.neighbors import NearestNeighbors import numpy as np X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) kdt1 = KDTree(X, metric='euclidean') A11 = kneighbors_graph(kdt1, 2, mode='connectivity') a11 = A11.toarray() (a11==a11.T).all() True 

However, the same approach does not work for my dataset. Thanks a lot for reading my question.

$\endgroup$
1
  • 3
    $\begingroup$ That's not Python, it's sklearn to be accurate. There are many different Python packages. Can you change the title and the description? Also, can you give a plain English description of what kneighbors_graph does in sklearn? Euclidean distance is obviously symmetric. $\endgroup$ Commented Nov 6, 2021 at 19:52

2 Answers 2

4
$\begingroup$

I did not look at sklearn, but I wanted to note that by default, NearestNeighborGraph creates an undirected graph. This means that the adjacency matrix is symmetric. We can use DirectedEdges -> True to get a directed graph:

SeedRandom[1234]; pts = RandomPoint[Disk[], 10]; NearestNeighborGraph[pts, 2, DirectedEdges -> True] 

Notice that not all connections are reciprocal. This is because if point B is A's closest neighbour, that does not mean that A is also B's closest neighbour.

While I did not try sklearn, perhaps this is the source of the difference. You can test it on a small example.

$\endgroup$
2
$\begingroup$

sklearn makes a directed knn graph.

In the case of mathematica:

adj = NearestNeighborGraph[pts, 2, DirectedEdges -> True] mat = AdjacencyMatrix[adj] mat == Transpose[mat] False adj1 = NearestNeighborGraph[pts, 2, DirectedEdges -> False] mat1 = AdjacencyMatrix[adj1] mat1 == Transpose[mat1] True 

In the case of sklearn, using the same data as @Szabolcs':

xt=np.array([[0.753217, 0.0439285], [-0.827553, -0.244174], [0.0875135, -0.0413367], [-0.509302, 0.519792], [-0.0819656, 0.769458], [0.167709, -0.472054], [0.83912, -0.15233], [0.974581, 0.175885], [0.392318, 0.503732], [-0.196902, 0.265483]]); adj = kneighbors_graph(xt, 2,metric='euclidean', mode='connectivity') adj11 = adj.toarray() (adj11==adj11.T).all() False 

How can I force sklearn to produce an undirected graph? Just do this:

adj11 = adj11 + adj11.T adj11[adj11 > 1] = 1 

This adj11 is equal to mat1

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.