26

I want to delete rows when a few conditions are met:

An example dataframe is shown below:

 one two three four 0 -0.225730 -1.376075 0.187749 0.763307 1 0.031392 0.752496 -1.504769 -1.247581 2 -0.442992 -0.323782 -0.710859 -0.502574 3 -0.948055 -0.224910 -1.337001 3.328741 4 1.879985 -0.968238 1.229118 -1.044477 5 0.440025 -0.809856 -0.336522 0.787792 6 1.499040 0.195022 0.387194 0.952725 7 -0.923592 -1.394025 -0.623201 -0.738013 8 -1.775043 -1.279997 0.194206 -1.176260 9 -0.602815 1.183396 -2.712422 -0.377118 

I want to delete rows based on the conditions that:

Row with value of col 'one', 'two', or 'three' greater than 0; and value of col 'four' less than 0 should be deleted.

Then I tried to implement as follows:

df = df[df.one > 0 or df.two > 0 or df.three > 0 and df.four < 1] 

However, it results in a error message as follows:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() 

Could someone help me on how to delete based on multiple conditions?

0

2 Answers 2

54

For reasons that aren't 100% clear to me, pandas plays nice with the bitwise logical operators | and &, but not the boolean ones or and and.

Try this instead:

df = df[(df.one > 0) | (df.two > 0) | (df.three > 0) & (df.four < 1)] 
Sign up to request clarification or add additional context in comments.

4 Comments

You want df = df[((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 1)] as to why it's because it's ambiguous to compare arrays as there are potentially multiple matches see this: stackoverflow.com/questions/10062954/…
Oh, whoops, didn't see the and at the end. Edited.
@Brionius: it's basically because or and and can't have their behaviour customized by a class. They do what they do based on the result of bool(the_object), and that's it.
To delete, say, any row with a string that contains 1 of 20 possible subkeys, look here
2

drop could be used to drop rows

The most obvious way is to constructing a boolean mask given the condition, filter the index by it to get an array of indices to drop and drop these indices using drop(). If the condition is:

Row with value of col 'one', 'two', or 'three' greater than 0; and value of col 'four' less than 0 should be deleted.

then the following works.

msk = (df['one'].gt(0) | df['two'].gt(0) | df['three'].gt(0)) & df['four'].lt(0) idx_to_drop = df.index[msk] df1 = df.drop(idx_to_drop) 

The first part of the condition, i.e. col 'one', 'two', or 'three' greater than 0 can be written a little concisely with .any(axis=1):

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0) 

Keep the complement of the rows to drop

Deleting/removing/dropping rows is the inverse of keeping rows. So another way to do this task is to negate (~) the boolean mask for dropping rows and filter the dataframe by it.

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0) df1 = df[~msk] 

query() the rows to keep

pd.DataFrame.query() is a pretty readable API for filtering rows to keep. It also "understands" and/or etc. So the following works.

# negate the condition to drop df1 = df.query("not ((one > 0 or two > 0 or three > 0) and four < 0)") # the same condition transformed using de Morgan's laws df1 = df.query("one <= 0 and two <= 0 and three <= 0 or four >= 0") 

All of the above perform the following transformation:

result

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.