
My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:

```python
import numpy as np

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis=0, return_counts=True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))  # raises TypeError: unhashable type: 'numpy.ndarray'
[count_dict[i] for i in www]
```

The desired output for this case would be:

[1, 1, 0]

because the first and second elements each have another copy within the original list. The problem seems to be that I cannot use an np.array as a dictionary key.
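A quick way to confirm that diagnosis: NumPy arrays define `==` elementwise and set `__hash__` to `None`, so they cannot be dictionary keys, while a tuple built from the same values can. A minimal check (the names `arr`, `hashable`, `message` and `lookup` are illustrative only):

```python
import numpy as np

arr = np.array([1, 1, 1])

try:
    d = {arr: 1}  # ndarray is unhashable, so this raises TypeError
    hashable = True
except TypeError as exc:
    hashable = False
    message = str(exc)  # e.g. "unhashable type: 'numpy.ndarray'"

# A tuple of the same values is hashable and compares equal to (1, 1, 1).
key = tuple(arr)
lookup = {key: 1}
print(hashable, lookup[(1, 1, 1)])  # False 1
```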

Suggestions?

Are all the arrays in `www` always the same size? – Commented Jul 11, 2019 at 13:59

2 Answers


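First convert `www` to a 2D NumPy array (for example `www = np.array(www)`). Then, using `uniques` and `counts` exactly as returned by `np.unique(www, axis=0, return_counts=True)` (that is, before the list comprehension turns `counts` into a plain list), do the following: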

```
In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])
```

Here we use broadcasting to compare every row of `www` with every row of `uniques`, and then call `all()` on the last axis to find which rows of `www` are completely equal to which rows of `uniques`.

Here are the intermediate results:

```
In [20]: (www[:,None] == uniques).all(2)
Out[20]:
array([[ True, False],
       [ True, False],
       [False,  True]])

# Respective indices in `counts` array
In [21]: np.where((www[:,None] == uniques).all(2))[1]
Out[21]: array([0, 0, 1])

In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
Out[22]: array([ True,  True, False])

In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[23]: array([1, 1, 0])
```
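For reference, the same broadcasting approach can be packaged as a self-contained function. This is a sketch rather than the answerer's exact code: the name `flag_duplicates_broadcast` is hypothetical, and it assumes every array in the list has the same length so the list can be stacked into a 2D array.

```python
import numpy as np

def flag_duplicates_broadcast(arrays):
    """Return 1 for each row that occurs more than once, else 0.

    Assumes all arrays have the same length (stackable into shape (n, d)).
    """
    www = np.asarray(arrays)  # shape (n, d)
    uniques, counts = np.unique(www, axis=0, return_counts=True)  # (k, d), (k,)
    # (n, 1, d) == (k, d) broadcasts to (n, k, d); all(axis=2) gives (n, k)
    matches = (www[:, None] == uniques).all(axis=2)
    # Each row matches exactly one unique row; np.where scans row by row,
    # so the column indices line up with the rows of www.
    idx = np.where(matches)[1]
    return (counts[idx] > 1).astype(int)

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
print(flag_duplicates_broadcast(www))  # [1 1 0]
```

Note that the intermediate boolean array has shape (n, k, d) for n input rows, k unique rows and d columns, so its memory use grows quickly for large inputs.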

In Python, lists (and NumPy arrays) are not hashable, so they can't be used as dictionary keys. But tuples of hashable values can! So one option is to convert each array in your original list to a tuple, and likewise each row of `uniques`. The following works for me:

```python
import numpy as np

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www]  # list of tuples
uniques, counts = np.unique(www, axis=0, return_counts=True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]  # [1, 1, 0]
```

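Just a heads-up: this keeps a tuple copy of every row alongside the original arrays, so memory use at least doubles (a tuple of NumPy scalars takes considerably more space than the same values stored in an array). It may not be the best solution if `www` is large. If possible, you can avoid the extra copy by ingesting your data as tuples instead of NumPy arrays.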
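Another option that avoids building both the dictionary and the Python tuples is to ask `np.unique` for its inverse indices, which map each input row to its position in `uniques`; indexing `counts` with them gives each row's count directly. A minimal sketch, again assuming equal-length arrays; `flag_duplicates_inverse` is a hypothetical name:

```python
import numpy as np

def flag_duplicates_inverse(arrays):
    """Return 1 for each row that occurs more than once, else 0.

    Assumes all arrays have the same length (stackable into a 2D array).
    """
    www = np.asarray(arrays)
    _, inverse, counts = np.unique(
        www, axis=0, return_inverse=True, return_counts=True
    )
    # inverse[i] is the index of www[i] among the unique rows; reshape(-1)
    # flattens it in case a NumPy version returns it with extra dimensions.
    return (counts[inverse.reshape(-1)] > 1).astype(int)

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
print(flag_duplicates_inverse(www))  # [1 1 0]
```

Compared with the broadcasting version, this never materializes an (n, k, d) comparison array; everything it needs comes from the single `np.unique` call.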
