cosine distance between two matrices

Question

Take two matrices, arr1 and arr2, of size mxn and pxn respectively. I'm trying to find the cosine distance between their respective rows as an mxp matrix. Essentially I want to take the pairwise dot product of the rows, then divide by the outer product of the norms of the rows.

    import numpy as np

    def cosine_distance(arr1, arr2):
        numerator = np.dot(arr1, arr2.T)
        denominator = np.outer(
            np.sqrt(np.square(arr1).sum(1)),
            np.sqrt(np.square(arr2).sum(1)))
        return np.nan_to_num(np.divide(numerator, denominator))

I think this should return an mxp matrix with entries in [-1.0, 1.0], but for some reason I'm getting values outside that interval. I suspect that one of these numpy functions is doing something other than what I think it does.

If p is different from n, then the rows of arr1 and arr2 are not the same length. How do you compute their inner product in this case?
– P. Camilleri, Commented Oct 16, 2015 at 6:44
@M.Massias Sorry, the sizes were meant to be m by n and p by n. They should have the same number of columns.
– Kevin Johnson, Commented Oct 16, 2015 at 7:21

xnx · Accepted Answer · 2015-10-16 07:32:47Z

It sounds like you need to divide by the outer product of the L2 norms of your arrays of vectors:

    arr1.dot(arr2.T) / np.outer(np.linalg.norm(arr1, axis=1), np.linalg.norm(arr2, axis=1))

e.g.

    In [4]: arr1 = np.array([[1., -2., 3.], [0., 0.5, 2.], [-1., 1.5, 1.5], [2., -0.5, 0.]])

    In [5]: arr2 = np.array([[0., -3., 1.], [1.5, 0.25, 1.]])

    In [6]: arr1.dot(arr2.T)/np.outer(np.linalg.norm(arr1, axis=1), np.linalg.norm(arr2, axis=1))
    Out[6]:
    array([[ 0.76063883,  0.58737848],
           [ 0.0766965 ,  0.56635211],
           [-0.40451992,  0.08785611],
           [ 0.2300895 ,  0.7662411 ]])
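
If it helps, the one-liner above could be wrapped into a small function along these lines. This is only a sketch: the name `cosine_similarity_matrix`, the `eps` threshold, and the choice to return 0 for all-zero rows are my own assumptions, not part of the question or of NumPy. The final `np.clip` is there only to guard against floating-point rounding nudging a value slightly past 1 or -1.

```python
import numpy as np

def cosine_similarity_matrix(arr1, arr2, eps=1e-12):
    """Pairwise cosine similarity between the rows of arr1 (m x n) and arr2 (p x n).

    Returns an (m x p) array. Hypothetical helper; `eps` is an assumed threshold.
    """
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    if arr1.ndim != 2 or arr2.ndim != 2 or arr1.shape[1] != arr2.shape[1]:
        raise ValueError("arr1 and arr2 must be 2-D with the same number of columns")

    # Pairwise dot products of rows: shape (m, p).
    dots = arr1.dot(arr2.T)
    # Outer product of the row-wise L2 norms: shape (m, p).
    norms = np.outer(np.linalg.norm(arr1, axis=1), np.linalg.norm(arr2, axis=1))

    # Divide only where the norm product is non-negligible; pairs involving
    # an all-zero row keep the value 0 (an assumption, the ratio is undefined).
    sim = np.zeros_like(dots)
    np.divide(dots, norms, out=sim, where=norms > eps)

    # Guard against rounding error pushing values just outside [-1, 1].
    return np.clip(sim, -1.0, 1.0)
```

Calling `cosine_similarity_matrix(arr1, arr2)` with the arrays from the session above should give the same 4x2 result. Note that this is the cosine similarity; if a distance is actually wanted, a common convention is to take 1 minus this value.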
