
I have a pandas DataFrame in which rows are identified by `p1` and `p2`, but `p2` is sometimes NaN:

```
  p1   p2
0  a    1
1  a    2
2  a    3
3  b  NaN
4  c    4
5  d  NaN
6  d    5
```

The DataFrame above was produced from a larger one containing many duplicates by using

```python
df.drop_duplicates(subset=["p1","p2"], keep='last')
```

which mostly works. The only issue is that ("d", NaN) and ("d", 5) are not technically duplicates, so neither is dropped.
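A minimal reproduction of the problem. The larger input frame below is hypothetical (the question does not show it); it is only assumed to collapse to a frame like the one above. Note that `drop_duplicates` treats two NaN values as equal, which is why repeated ("b", NaN) rows do collapse while ("d", NaN) and ("d", 5) do not.

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame with repeated rows; p2 is float because it holds NaN.
raw = pd.DataFrame({
    "p1": ["a", "a", "a", "a", "b", "b", "c", "d", "d", "d"],
    "p2": [1, 1, 2, 3, np.nan, np.nan, 4, np.nan, 5, 5],
})

# The repeated ("b", NaN) rows collapse to one, but ("d", NaN) and ("d", 5.0)
# differ in p2, so both survive.
deduped = raw.drop_duplicates(subset=["p1", "p2"], keep="last")
print(deduped)
```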

How can I drop rows such as ("d", NaN) when another row has the same `p1` and a non-null `p2`, e.g. ("d", 5)? The important point is that ("b", NaN) must be kept, because there is no row with `p1` equal to "b" and a non-null `p2`.
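The rule as stated can also be written literally as a boolean mask: drop a row when its `p2` is NaN and its `p1` group contains at least one non-null `p2`. This is a sketch of that rule, not one of the posted answers; it relies on `groupby(...).transform("count")`, which counts the non-null values in each group and returns them aligned to the original index.

```python
import numpy as np
import pandas as pd

# The deduplicated frame from the question.
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# Number of non-null p2 values in each row's p1 group ("b" -> 0, "d" -> 1).
non_null_in_group = df.groupby("p1")["p2"].transform("count")

# Drop a NaN row only if its group also contains a real p2 value.
to_drop = df["p2"].isna() & (non_null_in_group > 0)
result = df[~to_drop]
print(result)
```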

2 Answers


We can group by `p1`, forward-fill and then back-fill `p2` within each group, and then call `drop_duplicates`:

```python
(df.assign(p2=df.groupby('p1')['p2'].transform(lambda x: x.ffill().bfill()))
   .drop_duplicates(subset=["p1", "p2"], keep='last'))
```

Output:

```
  p1   p2
0  a  1.0
1  a  2.0
2  a  3.0
3  b  NaN
4  c  4.0
6  d  5.0
```
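A self-contained version of this answer, built on the sample frame from the question. It uses `transform` for the per-group fill: `transform` always returns a Series aligned to the original index, whereas in pandas 2.x `groupby(...).apply` may prepend the group key to the result's index by default (see the `group_keys` argument), which would break the alignment inside `assign`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# Within each p1 group, fill NaN from the previous row, then from the next row.
# A group that is entirely NaN (like "b") stays NaN.
filled = df.groupby("p1")["p2"].transform(lambda s: s.ffill().bfill())

# ("d", NaN) has become ("d", 5.0), so it is now an exact duplicate and is dropped.
result = df.assign(p2=filled).drop_duplicates(subset=["p1", "p2"], keep="last")
print(result)
```

The fill direction depends on row order: a NaN that sits between two values in the same group becomes a copy of the value before it, and is then deduplicated against that row.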

The rows to drop are the intersection of rows whose `p2` is NaN and rows whose `p1` value is duplicated, unioned with rows that are exact duplicates across both columns:

```python
dupe_1 = df['p1'].duplicated(keep=False) & df['p2'].isnull()
dupe_2 = df.duplicated(subset=['p1', 'p2'])
total_dupes = dupe_1 | dupe_2
new_df = df[~total_dupes]
```
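A runnable version of this answer on the question's sample frame, with a comment on what each mask selects. `duplicated(keep=False)` marks every occurrence of a repeated `p1`, while `duplicated(subset=...)` with its default `keep='first'` marks only the later copies of a repeated (`p1`, `p2`) pair.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# Rows whose p2 is NaN and whose p1 appears more than once.
dupe_1 = df["p1"].duplicated(keep=False) & df["p2"].isnull()
# Rows that repeat an earlier (p1, p2) pair exactly.
dupe_2 = df.duplicated(subset=["p1", "p2"])
total_dupes = dupe_1 | dupe_2
new_df = df[~total_dupes]
print(new_df)
```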

Note that this will fail for a dataframe such as:

```
  p1   p2
0  a  NaN
1  a  NaN
```

Both of those rows would be removed. To avoid this, first run `df.drop_duplicates(subset=['p1','p2'], inplace=True, keep='last')`, which keeps only one of those rows; the mask then works as intended.
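A sketch of the failure case and its fix. The helper `drop_nan_when_value_exists` is a hypothetical wrapper around the mask-based answer, and the sketch uses the non-inplace form of `drop_duplicates` so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

def drop_nan_when_value_exists(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: the mask-based answer, wrapped for reuse.
    dupe_1 = df["p1"].duplicated(keep=False) & df["p2"].isnull()
    dupe_2 = df.duplicated(subset=["p1", "p2"])
    return df[~(dupe_1 | dupe_2)]

# The problem frame from the note: the same ("a", NaN) row twice.
df = pd.DataFrame({"p1": ["a", "a"], "p2": [np.nan, np.nan]})

# Without deduplicating first, both rows are dropped and nothing is left.
naive = drop_nan_when_value_exists(df)

# Deduplicating first leaves a single ("a", NaN) row, which the mask keeps.
fixed = drop_nan_when_value_exists(df.drop_duplicates(subset=["p1", "p2"], keep="last"))
print(len(naive), len(fixed))
```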

2 Comments

`df.drop_duplicates(subset=["p1","p2"], keep='last')` should remove all of those cases, so as long as I apply your answer afterward, it should work.
Ah, excellent point, that would definitely fix that problem. I'll edit my answer to include that.
