0

Scenario: I have a dataframe with some nan scattered around. It has multiple columns, the ones of interest are "bid" and "ask"

What I want to do: I want to remove all rows where the bid column value is nan AND the ask column value is nan.

Question: What is the best way to do it?

What I already tried:

ab_df = ab_df[ab_df.bid != 'nan' and ab_df.ask != 'nan'] ab_df = ab_df[ab_df.bid.empty and ab_df.ask.empty] ab_df = ab_df[ab_df.bid.notnull and ab_df.ask.notnull] 

But none of them work.

0

2 Answers 2

3

You need vectorized logical operators & or | (and and or from python are to compare scalars not for pandas Series), to check nan values, you can use isnull and notnull:

To remove all rows where the bid column value is nan AND the ask column value is nan, keep the opposite:

ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()] 

Example:

df = pd.DataFrame({ "bid": [pd.np.nan, 1, 2, pd.np.nan], "ask": [pd.np.nan, pd.np.nan, 2, 1] }) df[df.bid.notnull() | df.ask.notnull()] # ask bid #1 NaN 1.0 #2 2.0 2.0 #3 1.0 NaN 

If you need both columns to be non missing:

df[df.bid.notnull() & df.ask.notnull()] # ask bid #2 2.0 2.0 

Another option using dropna by setting the thresh parameter:

df.dropna(subset=['ask', 'bid'], thresh=1) # ask bid #1 NaN 1.0 #2 2.0 2.0 #3 1.0 NaN df.dropna(subset=['ask', 'bid'], thresh=2) # ask bid #2 2.0 2.0 
Sign up to request clarification or add additional context in comments.

4 Comments

In this case, since I want both bid and ask to be nan, should I use &?
It depends on what you want to keep, if you need both bid and ask to be non missing values, then you'll need &.
Yes, that is the case, I only want to remove if both are nan. Thanks for the help.
@DGMS89 I think you want the answer as written - test it to be sure, and look at the middle section of my answer where I got it wrong at first! you want to REMOVE if both are null, which means KEEP if either isn't null.
2
ab_df = ab_df.loc[~ab_df.bid.isnull() | ~ab_df.ask.isnull()] 

all this time I've been usign that because i convinced myself that .notnull() didn't exist. TIL.

ab_df = ab_df.loc[ab_df.bid.notnull() | ab_df.ask.notnull()] 

The key is & rather than and and | rather than or

I made a mistake earlier using & - this is wrong because you want either bid isn't null OR ask isn't null, using and would give you only the rows where both are not null.

I think you can ab_df.dropna() as well, but i'll have to look it up

EDIT

oddly df.dropna() doesn't seem to support dropping based on NAs in a specific column. I would have thought it did.

based on the other answer I now see it does. It's friday afternoon, ok?

1 Comment

Yeah I know the friday feeling. Been working on this code for 2 weeks, and got almost no progress. I am about to shoryuken my pc to the stratosphere.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.