Take two matrices, arr1 and arr2, of size mxn and pxn respectively. I'm trying to find the cosine distance between their respective rows as an mxp matrix. Essentially, I want to take the pairwise dot products of the rows, then divide by the outer product of the row norms.

import numpy as np

def cosine_distance(arr1, arr2):
    numerator = np.dot(arr1, arr2.T)
    denominator = np.outer(
        np.sqrt(np.square(arr1).sum(1)),
        np.sqrt(np.square(arr2).sum(1)))
    return np.nan_to_num(np.divide(numerator, denominator))

I think this should return an mxp matrix with entries in [-1.0, 1.0], but for some reason I'm getting values outside that interval. I suspect that one of these numpy functions is doing something other than what I think it does.

  • If p is different from n, then the rows of arr1 and arr2 are not the same length. How do you compute their inner product in this case? Commented Oct 16, 2015 at 6:44
  • @M.Massias Sorry, I meant m by n and p by n. They should have the same number of columns. Commented Oct 16, 2015 at 7:21

1 Answer

It sounds like you need to divide by the outer product of the L2 norms of your arrays of vectors:

arr1.dot(arr2.T) / np.outer(np.linalg.norm(arr1, axis=1), np.linalg.norm(arr2, axis=1)) 

e.g.

In [4]: arr1 = np.array([[1., -2., 3.], [0., 0.5, 2.], [-1., 1.5, 1.5], [2., -0.5, 0.]])

In [5]: arr2 = np.array([[0., -3., 1.], [1.5, 0.25, 1.]])

In [6]: arr1.dot(arr2.T)/np.outer(np.linalg.norm(arr1, axis=1), np.linalg.norm(arr2, axis=1))
Out[6]:
array([[ 0.76063883,  0.58737848],
       [ 0.0766965 ,  0.56635211],
       [-0.40451992,  0.08785611],
       [ 0.2300895 ,  0.7662411 ]])
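If it helps, the one-liner can be wrapped in a small function. This is only a sketch: the name `cosine_similarity_matrix` is made up for illustration, and the float cast, the zero-row handling, and the final clip are my own assumptions rather than anything the question confirms is needed. Note that the quantity computed here is the cosine similarity; a cosine distance would usually be 1 minus this value.

```python
import numpy as np

def cosine_similarity_matrix(arr1, arr2):
    """Pairwise cosine similarity between the rows of arr1 (m x n) and arr2 (p x n)."""
    # Assumption: cast to float so integer inputs cannot overflow when squared.
    a = np.asarray(arr1, dtype=float)
    b = np.asarray(arr2, dtype=float)

    numerator = a.dot(b.T)  # shape (m, p): all pairwise row dot products
    denominator = np.outer(np.linalg.norm(a, axis=1),
                           np.linalg.norm(b, axis=1))  # shape (m, p)

    # Assumption: a pair involving an all-zero row gets similarity 0
    # instead of NaN (the division is skipped where the denominator is 0).
    result = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=result, where=denominator != 0)

    # Clip away floating-point rounding that could push values just past +/-1.
    return np.clip(result, -1.0, 1.0)
```

Called with the arr1 and arr2 from the example above, this should give the same 4x2 array as Out[6].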