classify np.arrays as duplicates

Question

My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:

    www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
    uniques, counts = np.unique(www, axis=0, return_counts=True)
    counts = [1 if x > 1 else 0 for x in counts]
    count_dict = dict(zip(uniques, counts))
    [count_dict[i] for i in www]

The desired output for this case would be :

[1, 1, 0]

because the first and second elements each have another copy within the original list. The problem seems to be that I cannot use an np.array as a dictionary key.

Suggestions?

Are all the arrays in www always the same size?

– javidcf, Commented Jul 11, 2019 at 13:59

Kasravnd · Accepted Answer · 2019-07-11 14:02:53Z

First convert www to a 2D NumPy array, then do the following:

    In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
    Out[18]: array([1, 1, 0])

Here we use broadcasting to compare every row of www against every row of uniques, then call all() on the last axis to find which rows of www are completely equal to which rows of uniques.

Here are the intermediate results:

    In [20]: (www[:,None] == uniques).all(2)
    Out[20]:
    array([[ True, False],
           [ True, False],
           [False,  True]])

    # Respective indices in `counts` array
    In [21]: np.where((www[:,None] == uniques).all(2))[1]
    Out[21]: array([0, 0, 1])

    In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
    Out[22]: array([ True,  True, False])

    In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
    Out[23]: array([1, 1, 0])
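For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same broadcasting idea. The function name `flag_duplicates` is made up for illustration, and it assumes all arrays in the list have the same length so they can be stacked into a 2D array.

```python
import numpy as np

def flag_duplicates(rows):
    """Return 1 for each row that occurs more than once, else 0.

    Hypothetical helper; assumes every row has the same length.
    """
    arr = np.asarray(rows)  # shape (n, k)
    uniques, counts = np.unique(arr, axis=0, return_counts=True)  # shapes (m, k), (m,)

    # arr[:, None] has shape (n, 1, k); comparing it with uniques (m, k)
    # broadcasts to (n, m, k). all(axis=2) collapses that to an (n, m)
    # matrix where entry [i, j] is True when row i equals unique row j.
    matches = (arr[:, None] == uniques).all(axis=2)

    # Each input row matches exactly one unique row, so the column
    # indices returned by np.where line up one-to-one with the rows.
    unique_index = np.where(matches)[1]

    return (counts[unique_index] > 1).astype(int)

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
result = flag_duplicates(www)
print(result)
```

Keep in mind that the comparison builds an intermediate boolean array of shape (n, m, k), so this may get memory-heavy when there are many rows and many distinct rows.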

zachdj · Accepted Answer · 2019-07-11 13:58:43Z

In Python, lists (and numpy arrays) cannot be hashed, so they can't be used as dictionary keys. But tuples can! So one option is to convert each array in your original list to a tuple, and each row of uniques to a tuple. The following works for me:

    import numpy as np

    www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
    www_tuples = [tuple(l) for l in www]  # list of tuples
    uniques, counts = np.unique(www, axis=0, return_counts=True)
    counts = [1 if x > 1 else 0 for x in counts]
    # convert uniques to tuples
    uniques_tuples = [tuple(l) for l in uniques]
    count_dict = dict(zip(uniques_tuples, counts))
    [count_dict[i] for i in www_tuples]

Just a heads-up: this keeps a tuple copy of every array alongside the originals, so it will roughly double your memory consumption and may not be the best solution if www is large. You can avoid the extra copy by ingesting your data as tuples instead of numpy arrays if possible.
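To make the hashing point concrete, here is a small sketch showing that an array is rejected by `hash()` while its tuple form is accepted. It also tallies the tuples with `collections.Counter` from the standard library, which is a variation on the dict approach above rather than the exact code from this answer. The variable names are illustrative only.

```python
from collections import Counter

import numpy as np

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]

# numpy arrays are mutable and define no hash, so hash() raises TypeError.
try:
    hash(www[0])
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

# Tuples of plain Python ints are hashable, so they work as dict keys.
keys = [tuple(a.tolist()) for a in www]

# Counter is a dict subclass that counts how often each key occurs.
tally = Counter(keys)
flags = [1 if tally[k] > 1 else 0 for k in keys]

print(array_is_hashable)
print(flags)
```

This still builds one tuple per array, so the memory caveat applies here as well.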
