I have a pandas DataFrame in Python where the rows are identified by p1 and p2, but p2 is sometimes NaN:

      p1   p2
    0  a    1
    1  a    2
    2  a    3
    3  b  NaN
    4  c    4
    5  d  NaN
    6  d    5

The above dataframe was returned from a larger one with many duplicates by using

    df.drop_duplicates(subset=["p1", "p2"], keep="last")

which works for the most part. The only issue is that ("d", NaN) and ("d", 5) are technically not duplicates of each other, so neither row is dropped.
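
For reference, here is a minimal sketch that reproduces the behaviour. It rebuilds only the seven rows shown above by hand (the larger original frame is not included), and the name `deduped` is just illustrative.

```python
import numpy as np
import pandas as pd

# The frame shown above, rebuilt by hand (p2 is float because of the NaNs).
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

deduped = df.drop_duplicates(subset=["p1", "p2"], keep="last")

# ("d", NaN) and ("d", 5) differ in p2, so both rows are still present.
print(deduped[deduped["p1"] == "d"])
```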
How can I drop the rows (such as "d", NaN) for which another row exists with the same p1 and a non-null p2 (e.g. "d", 5)? The important thing is that "b", NaN is kept, because there are no rows with "b" and a non-null p2.
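
To make the desired result concrete, here is a rough sketch of one way the rule could be expressed. It is not necessarily the best or only approach. It assumes the frame is the seven-row example above, and the names `has_value` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# For every row: does its p1 group contain at least one non-null p2?
has_value = df["p2"].notna().groupby(df["p1"]).transform("any")

# Keep a row if its own p2 is non-null, or if its whole p1 group is null.
result = df[df["p2"].notna() | ~has_value]
print(result)
```

With this rule, "d", NaN is dropped because "d", 5 exists, while "b", NaN survives because "b" has no non-null p2.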