2

Given two numpy arrays of nx3 and mx3, what is an efficient way to determine the row indices (counter) wherein the rows are common in the two arrays. For instance I have the following solution, which is significantly slow for not even much larger arrays

def arrangment(arr1,arr2): hits = [] for i in range(arr2.shape[0]): current_row = np.repeat(arr2[i,:][None,:],arr1.shape[0],axis=0) x = current_row - arr1 for j in range(arr1.shape[0]): if np.isclose(x[j,0],0.0) and np.isclose(x[j,1],0.0) and np.isclose(x[j,2],0.0): hits.append(j) return hits 

It checks if rows of arr2 exist in arr1 and returns the row indices of arr1 where the rows match. I need this arrangement to be always sequentially ascending in terms of rows of arr2. For instance given

arr1 = np.array([[-1., -1., -1.], [ 1., -1., -1.], [ 1., 1., -1.], [-1., 1., -1.], [-1., -1., 1.], [ 1., -1., 1.], [ 1., 1., 1.], [-1., 1., 1.]]) arr2 = np.array([[-1., 1., -1.], [ 1., 1., -1.], [ 1., 1., 1.], [-1., 1., 1.]]) 

The function should return:

[3, 2, 6, 7] 

2 Answers 2

3

quick and dirty answer

(arr1[:, None] == arr2).all(-1).argmax(0) array([3, 2, 6, 7]) 

Better answer
Takes care of chance a row in arr2 doesn't match anything in arr1

t = (arr1[:, None] == arr2).all(-1) np.where(t.any(0), t.argmax(0), np.nan) array([ 3., 2., 6., 7.]) 

As pointed out by @Divakar np.isclose accounts for rounding error in comparing floats

t = np.isclose(arr1[:, None], arr2).all(-1) np.where(t.any(0), t.argmax(0), np.nan) 
Sign up to request clarification or add additional context in comments.

1 Comment

Use np.isclose(arr1[:, None],arr2) to account for floating pt numbers?
0

I had a similar problem in the past and I came up with a fairly optimised solution for it.

First you need a generalisation of numpy.unique for multidimensional arrays, which for the sake of completeness I would copy it here

def unique2d(arr,consider_sort=False,return_index=False,return_inverse=False): """Get unique values along an axis for 2D arrays. input: arr: 2D array consider_sort: Does permutation of the values within the axis matter? Two rows can contain the same values but with different arrangements. If consider_sort is True then those rows would be considered equal return_index: Similar to numpy unique return_inverse: Similar to numpy unique returns: 2D array of unique rows If return_index is True also returns indices If return_inverse is True also returns the inverse array """ if consider_sort is True: a = np.sort(arr,axis=1) else: a = arr b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1]))) if return_inverse is False: _, idx = np.unique(b, return_index=True) else: _, idx, inv = np.unique(b, return_index=True, return_inverse=True) if return_index == False and return_inverse == False: return arr[idx] elif return_index == True and return_inverse == False: return arr[idx], idx elif return_index == False and return_inverse == True: return arr[idx], inv else: return arr[idx], idx, inv 

Now all you need is to concatenate (np.vstack) your arrays and find the unique rows. The reverse mapping together with np.searchsorted will give you the indices you need. So lets write another function similar to numpy.in2d but for multidimensional (2D) arrays

def in2d_unsorted(arr1, arr2, axis=1, consider_sort=False): """Find the elements in arr1 which are also in arr2 and sort them as the appear in arr2""" assert arr1.dtype == arr2.dtype if axis == 0: arr1 = np.copy(arr1.T,order='C') arr2 = np.copy(arr2.T,order='C') if consider_sort is True: sorter_arr1 = np.argsort(arr1) arr1 = arr1[np.arange(arr1.shape[0])[:,None],sorter_arr1] sorter_arr2 = np.argsort(arr2) arr2 = arr2[np.arange(arr2.shape[0])[:,None],sorter_arr2] arr = np.vstack((arr1,arr2)) _, inv = unique2d(arr, return_inverse=True) size1 = arr1.shape[0] size2 = arr2.shape[0] arr3 = inv[:size1] arr4 = inv[-size2:] # Sort the indices as they appear in arr2 sorter = np.argsort(arr3) idx = sorter[arr3.searchsorted(arr4, sorter=sorter)] return idx 

Now all you need to do is call in2d_unsorted with your input parameters

>>> in2d_unsorted(arr1,arr2) array([ 3, 2, 6, 7]) 

While may not be fully optimised this approach is much faster. Let's benchmark it against @piRSquareds solutions

def indices_piR(arr1,arr2): t = np.isclose(arr1[:, None], arr2).all(-1) return np.where(t.any(0), t.argmax(0), np.nan) 

with the following arrays

n=150 arr1 = np.random.permutation(n).reshape(n//3, 3) idx = np.random.permutation(n//3) arr2 = arr1[idx] In [13]: np.allclose(in2d_unsorted(arr1,arr2),indices_piR(arr1,arr2)) True In [14]: %timeit indices_piR(arr1,arr2) 10000 loops, best of 3: 181 µs per loop In [15]: %timeit in2d_unsorted(arr1,arr2) 10000 loops, best of 3: 85.7 µs per loop 

Now, for n=1500

In [24]: %timeit indices_piR(arr1,arr2) 100 loops, best of 3: 10.3 ms per loop In [25]: %timeit in2d_unsorted(arr1,arr2) 1000 loops, best of 3: 403 µs per loop 

and for n=15000

In [28]: %timeit indices_piR(A,B) 1 loop, best of 3: 1.02 s per loop In [29]: %timeit in2d_unsorted(arr1,arr2) 100 loops, best of 3: 4.65 ms per loop 

So for largeish arrays this is over 200X faster compared to @piRSquared's vectorised solution.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.