How to drop row in Dataframe if column is NaN and there is another row where the column is not NaN

Question

I have a pandas dataframe in python where the rows are identified by p1 & p2, but p2 is sometimes NaN:

      p1   p2
    0  a    1
    1  a    2
    2  a    3
    3  b  NaN
    4  c    4
    5  d  NaN
    6  d    5

The above dataframe was returned from a larger one with many duplicates by using

df.drop_duplicates(subset=["p1","p2"], keep='last')

which works for the most part, the only issue being that NaN and 5 are technically not duplicates and therefore not dropped.
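For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame shown above (the construction via `np.nan` is my assumption about how the data looks in memory) and confirms that both "d" rows survive the call:

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; p2 is float because it holds NaN.
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

deduped = df.drop_duplicates(subset=["p1", "p2"], keep="last")

# ("d", NaN) and ("d", 5) differ in p2, so neither counts as a duplicate.
print(deduped[deduped["p1"] == "d"])
```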

How can I drop the rows (such as "d", NaN) for which another row exists with the same p1 and a non-null p2 (e.g. "d", 5)? The important point is that "b", NaN must be kept, because there is no row with "b" and a non-null p2.

BENY · Accepted Answer · 2017-11-21 04:47:52Z

We can group by p1, forward-fill and back-fill p2 within each group (ffill and bfill), then call drop_duplicates:

    df.assign(p2=df.groupby('p1')['p2'].apply(lambda x : x.ffill().bfill())).\
        drop_duplicates(subset=["p1","p2"], keep='last')
    Out[645]:
      p1   p2
    0  a  1.0
    1  a  2.0
    2  a  3.0
    3  b  NaN
    4  c  4.0
    6  d  5.0
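A self-contained variant of the same idea, as a sketch rather than the answer's exact code: it swaps `apply` for `transform`, on the assumption that your pandas version may prepend the group key to the index returned by `groupby(...).apply`, which would then no longer align with `df`. `transform` returns a result indexed like the original frame. The sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# Within each p1 group, fill NaN from the previous row, then from the next.
# A group that is entirely NaN (here "b") stays NaN.
filled = df.groupby("p1")["p2"].transform(lambda s: s.ffill().bfill())

# After filling, ("d", NaN) has become ("d", 5.0) and is a true duplicate.
result = df.assign(p2=filled).drop_duplicates(subset=["p1", "p2"], keep="last")
print(result)
```

One thing to keep in mind: with keep='last', the surviving row can be the one that originally held NaN. That makes no difference for a two-column frame like this one, but it may matter if the frame carries other columns.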

Sebastian Mendez · Accepted Answer · 2017-11-21 05:16:17Z

The set of rows to drop is the intersection of the rows whose p2 is NaN and the rows whose p1 value is duplicated, unioned with the rows that are duplicates across both columns:

    dupe_1 = df['p1'].duplicated(keep=False) & df['p2'].isnull()
    dupe_2 = df.duplicated(subset=['p1','p2'])
    total_dupes = dupe_1 | dupe_2
    new_df = df[~total_dupes]
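Run against the question's sample data (rebuilt here so the snippet is self-contained), the masks work out as follows. This is only a sketch to show which rows each mask flags:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# Rows whose p1 occurs more than once AND whose p2 is missing.
dupe_1 = df["p1"].duplicated(keep=False) & df["p2"].isnull()
# Exact (p1, p2) repeats; the first occurrence is not flagged.
dupe_2 = df.duplicated(subset=["p1", "p2"])

total_dupes = dupe_1 | dupe_2
new_df = df[~total_dupes]
print(new_df)
```

Unlike the fill-based approach, this leaves the p2 values of the surviving rows untouched; only row 5 ("d", NaN) is flagged here.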

Note that this will fail for a dataframe such as:

      p1   p2
    0  a  NaN
    1  a  NaN

Both of those rows would be removed, since each is flagged by dupe_1. To avoid this, first run df.drop_duplicates(subset=['p1','p2'], inplace=True, keep='last'), which removes all but one of those rows; the approach above then works correctly.
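To make the edge case concrete, here is a small sketch that compares the mask on its own with the dedupe-first version. The helper name `drop_redundant_nan` is hypothetical, and it uses the non-inplace form of drop_duplicates so the input frame is left unmodified:

```python
import numpy as np
import pandas as pd

def drop_redundant_nan(df):
    """Hypothetical helper: dedupe exact (p1, p2) repeats first, then drop
    NaN rows whose p1 also appears in another remaining row."""
    deduped = df.drop_duplicates(subset=["p1", "p2"], keep="last")
    # After deduping, exact repeats are gone, so only the NaN mask is needed.
    shadowed = deduped["p1"].duplicated(keep=False) & deduped["p2"].isnull()
    return deduped[~shadowed]

# The problematic frame: two identical ("a", NaN) rows.
edge = pd.DataFrame({"p1": ["a", "a"], "p2": [np.nan, np.nan]})

# Mask only: both rows are flagged, so nothing is left.
flagged = edge["p1"].duplicated(keep=False) & edge["p2"].isnull()
mask_only = edge[~flagged]

# Dedupe first: one ("a", NaN) row remains.
fixed = drop_redundant_nan(edge)
print(len(mask_only), len(fixed))
```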

df.drop_duplicates(subset=["p1","p2"], keep='last') should already remove all of those cases, so as long as I apply your answer afterward, it would work.
Ah, excellent point, that would definitely fix that problem. I'll edit my answer to include that.
