test for membership in a 2d numpy array

Question

I have two 2D arrays of the same size

a = array([[1,2],[3,4],[5,6]])
b = array([[1,2],[3,4],[7,8]])

I want to know the rows of b that are in a.

So the output should be:

array([ True, True, False], dtype=bool)

without computing:

array([any(i == a) for i in b])

because a and b are huge.

There is a function that does this, but only for 1D arrays: in1d

What is the actual dtype of a and b?

– unutbu, Commented Apr 25, 2013 at 13:51
@unutbu float (could settle to int)

– amine23, Commented Apr 25, 2013 at 14:01

unutbu · Accepted Answer · 2019-01-09 14:40:14Z

What we'd really like to do is use np.in1d, except that np.in1d only works with 1-dimensional arrays and our arrays are multi-dimensional. However, we can view each array so that every row becomes a single item of raw bytes (dtype np.void):

arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

For example,

In [15]: arr = np.array([[1, 2], [2, 3], [1, 3]])
In [16]: arr = arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
In [30]: arr.dtype
Out[30]: dtype('V16')
In [31]: arr.shape
Out[31]: (3, 1)
In [37]: arr
Out[37]:
array([[b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00'],
       [b'\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'],
       [b'\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00']],
      dtype='|V16')

This makes each row of arr a single bytes item. Now it is just a matter of hooking this up to np.in1d:

import numpy as np

def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed on the entire row.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        """ Care needs to be taken here since
        np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
        Adding 0. converts -0. to 0.
        """
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def inNd(a, b, assume_unique=False):
    a = asvoid(a)
    b = asvoid(b)
    return np.in1d(a, b, assume_unique)

tests = [
    (np.array([[1, 2], [2, 3], [1, 3]]),
     np.array([[2, 2], [3, 3], [4, 4]]),
     np.array([False, False, False])),
    (np.array([[1, 2], [2, 2], [1, 3]]),
     np.array([[2, 2], [3, 3], [4, 4]]),
     np.array([True, False, False])),
    (np.array([[1, 2], [3, 4], [5, 6]]),
     np.array([[1, 2], [3, 4], [7, 8]]),
     np.array([True, True, False])),
    (np.array([[1, 2], [5, 6], [3, 4]]),
     np.array([[1, 2], [5, 6], [7, 8]]),
     np.array([True, True, False])),
    (np.array([[-0.5, 2.5, -2, 100, 2], [5, 6, 7, 8, 9], [3, 4, 5, 6, 7]]),
     np.array([[1.0, 2, 3, 4, 5], [5, 6, 7, 8, 9], [-0.5, 2.5, -2, 100, 2]]),
     np.array([False, True, True]))
]

for a, b, answer in tests:
    result = inNd(b, a)
    try:
        assert np.all(answer == result)
    except AssertionError:
        print('''\
a: {a}
b: {b}
answer: {answer}
result: {result}'''.format(**locals()))
        raise
    else:
        print('Success!')

yields

Success!
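The note about -0. inside asvoid can be checked in isolation. Below is a minimal sketch, assuming nothing beyond NumPy itself; the variable names are only for illustration. It shows that two rows which compare equal as numbers can still differ byte for byte, which is what a np.void view compares, and that adding 0. removes the difference.

```python
import numpy as np

neg = np.array([[-0.0, 1.0]])   # row containing negative zero
pos = np.array([[0.0, 1.0]])    # same values, but with positive zero

# Numerically the two rows are equal ...
values_equal = bool(np.array_equal(neg, pos))

# ... but their raw bytes differ in the sign bit of the first element,
# so a byte-wise comparison treats them as different rows.
bytes_equal_raw = neg.tobytes() == pos.tobytes()

# Adding 0.0 turns -0.0 into +0.0, after which the bytes agree.
bytes_equal_normalized = (neg + 0.0).tobytes() == pos.tobytes()

print(values_equal, bytes_equal_raw, bytes_equal_normalized)
# prints: True False True
```

Note that this sketch uses neg + 0.0, which builds a new array, whereas arr += 0. in asvoid works in place.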

View it as a record array. I think .view(dtype([('', a.dtype)]*a.shape[1])) is what you need, and then the same trick works for any type.
@Jaime: I tried a1d = a.view([('f0','int32'),('f1','int32')]), b1d = ..., np.in1d(a1d, b1d) but got a TypeError. If you see a way around this, I'd love to know.
Funny that mergesort doesn't work with generalized dtypes... While it still won't work with this, the simplest way I found to join many fields in a single dtype is dtype((np.void, a.dtype.itemsize*a.shape[1])).
@0vbb: In Python2, np.str was a dtype representing bytes. In Python3, np.str represents unicode strings. Here, we want to compare values as bytes not unicode strings. np.void serves this purpose on both Python2 and Python3.
@0vbb: I've updated the code above with Jaime's idea of using np.void dtype instead of np.str. This avoids the ValueError you were seeing too.

Jan · Accepted Answer · 2013-04-25 19:44:56Z

In [1]: import numpy as np
In [2]: a = np.array([[1,2],[3,4]])
In [3]: b = np.array([[3,4],[1,2]])
In [5]: a = a[a[:,1].argsort(kind='mergesort')]
In [6]: a = a[a[:,0].argsort(kind='mergesort')]
In [7]: b = b[b[:,1].argsort(kind='mergesort')]
In [8]: b = b[b[:,0].argsort(kind='mergesort')]
In [9]: bInA1 = b[:,0] == a[:,0]
In [10]: bInA2 = b[:,1] == a[:,1]
In [11]: bInA = bInA1*bInA2
In [12]: bInA
Out[12]: array([ True,  True], dtype=bool)

This should do it, though I am not sure whether it is still efficient. You need to use mergesort, because the other sorting methods are unstable.

Edit:

If you have more than 2 columns and if the rows are sorted already, you can do

In [24]: bInA = np.array([True,]*a.shape[0])
In [25]: bInA
Out[25]: array([ True,  True], dtype=bool)
In [26]: for k in range(a.shape[1]):
    bInAk = b[:,k] == a[:,k]
    bInA = bInAk*bInA
   ....:
In [27]: bInA
Out[27]: array([ True,  True], dtype=bool)

There is still room for a speed-up: inside the loop you don't have to check the entire column, only the entries where the current bInA is still True.
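The speed-up mentioned here could be sketched as follows. This is only an illustration: positionwise_match is a hypothetical helper name, and like the loop in this answer it compares row i of a with row i of b, so it assumes both arrays have the same shape and have already been sorted the same way. It is not a general membership test.

```python
import numpy as np

def positionwise_match(a, b):
    """Return True at index i where row i of a equals row i of b.

    Columns are checked one at a time, and each pass only looks at the
    rows that have matched in every column so far.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    match = np.ones(a.shape[0], dtype=bool)
    for k in range(a.shape[1]):
        idx = np.flatnonzero(match)      # rows still considered equal
        if idx.size == 0:                # nothing left to check
            break
        match[idx] = a[idx, k] == b[idx, k]
    return match

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[1, 2, 3], [4, 0, 6], [7, 8, 0]])
print(positionwise_match(a, b))   # [ True False False]
```

Whether this is actually faster than comparing whole columns depends on how early the rows diverge, since the fancy indexing has its own cost.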

What if a = array([[1,2],[2,3],[1,3]]) and b = array([[2,3],[3,3],[4,4]])?
Yes, I've just checked this and it fails. I'm trying to fix it.
Edit/Fix: Using in1d may fail, because it does not check the location of the occurrence. Changed it to ==.
By the way, as == outperforms in1d by a factor of 8, it now performs much better.
This would fail the test case I posted as a comment on Ryan Saxe's answer.

Oresto · Accepted Answer · 2016-07-20 11:12:51Z

If you have something like a=np.array([[1,2],[3,4],[5,6]]) and b=np.array([[5,6],[1,2],[7,6]]), you can convert them into complex 1-D arrays:

c=a[:,0]+a[:,1]*1j
d=b[:,0]+b[:,1]*1j

In my interpreter, the whole thing looks like this:

>>> c=a[:,0]+a[:,1]*1j
>>> c
array([ 1.+2.j, 3.+4.j, 5.+6.j])
>>> d=b[:,0]+b[:,1]*1j
>>> d
array([ 5.+6.j, 1.+2.j, 7.+6.j])

Now that you have plain 1-D arrays, you can simply call np.in1d(c,d), which gives:

>>> np.in1d(c,d)
array([ True, False,  True], dtype=bool)

This way you don't need any loops, at least with this data type.
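A minimal sketch of the same trick wrapped in a function, with two assumptions that are not in the answer itself: as_complex is a hypothetical helper name, and np.isin is used in place of np.in1d (it performs the same test and is the name recommended in newer NumPy releases). The sketch also shows both directions, since np.in1d(c,d) marks the rows of a found in b, while the question asks for the rows of b found in a.

```python
import numpy as np

def as_complex(arr):
    """Pack a two-column real array into one complex number per row."""
    arr = np.asarray(arr)
    if arr.ndim != 2 or arr.shape[1] != 2:
        # The encoding has one real and one imaginary part,
        # so it cannot represent more than two columns.
        raise ValueError("expected an array with exactly two columns")
    return arr[:, 0] + 1j * arr[:, 1]

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[5, 6], [1, 2], [7, 6]])

c = as_complex(a)
d = as_complex(b)

rows_of_a_in_b = np.isin(c, d)   # same direction as np.in1d(c, d) above
rows_of_b_in_a = np.isin(d, c)   # the direction asked for in the question

print(rows_of_a_in_b)   # [ True False  True]
print(rows_of_b_in_a)   # [ True  True False]
```

Because the values are stored as floating-point parts of a complex number, very large integers (beyond 2**53) would no longer be represented exactly, so this is best suited to floats or moderately sized ints.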

Ryan Saxe · Accepted Answer · 2013-04-25 14:16:36Z

NumPy can compare the two arrays element by element, returning True where the entries are equal and False where they are not:

import numpy as np
a = np.array(([1,2],[3,4],[5,6]))  # converting to a numpy array
b = np.array(([1,2],[3,4],[7,8]))  # converting to a numpy array
new_array = a == b  # creating a new boolean array from comparing a and b

Now new_array looks like this:

[[ True  True]
 [ True  True]
 [False False]]

But that is not what you want. So you can transpose the array (swap rows and columns) and then combine its two rows with the & operator. This creates a 1-D array that is True only if both columns in the row are True:

new_array = new_array.T  # transposing
result = new_array[0] & new_array[1]  # comparing rows

When you print result, you now get what you're looking for:

[ True True False]
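The comparison above matches row i of a against row i of b only, and the & step is written for exactly two columns. A hedged sketch of how the same elementwise == could be extended with broadcasting so that every row of b is compared with every row of a; rows_in is a hypothetical helper name, not something from NumPy. The intermediate boolean array has shape (len(b), len(a), ncols), so this is only practical for small inputs and does not solve the "huge arrays" part of the question.

```python
import numpy as np

def rows_in(b, a):
    """Boolean mask that is True where a row of b equals some row of a."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Compare every row of b with every row of a:
    # the result has shape (len(b), len(a), ncols).
    eq = b[:, None, :] == a[None, :, :]
    # A pair of rows matches if all columns agree; a row of b is "in a"
    # if it matches at least one row of a.
    return eq.all(axis=-1).any(axis=-1)

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[1, 2], [3, 4], [7, 8]])
print(rows_in(b, a))                      # [ True  True False]

# Same rows in a different order: position-wise comparison misses them,
# the all-pairs comparison does not.
a2 = np.array([[1, 2], [3, 4]])
b2 = np.array([[3, 4], [1, 2]])
print((a2 == b2).all(axis=1))             # [False False]
print(rows_in(b2, a2))                    # [ True  True]
```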

What if a = array([[1,2],[3,4]]) and b = array([[3,4],[1,2]]) ?
It was not really clear that you wanted to compare against all rows. Your example didn't show that clearly. And you want to be able to check whether a nested array in b is in a without using a for loop?
