32

I have a dataframe customers with some "bad" rows; the key in this dataframe is CustomerID. I know I should drop these rows. I have a list called badcu, [23770, 24572, 28773, ...], in which each value corresponds to a different "bad" customer.

I also have another dataframe, let's call it sales, from which I want to drop all the records for the bad customers, i.e. the ones in the badcu list.

If I do the following

sales[sales.CustomerID.isin(badcu)] 

I get a dataframe with precisely the records I want to drop, but if I do

sales.drop(sales.CustomerID.isin(badcu)) 

it returns a dataframe with the first row dropped (which is a legitimate order) and the rest of the rows intact (it doesn't delete the bad ones). I think I know why this happens, but I still don't know how to drop the rows with the bad customer IDs.

  • You should drop by index. Commented Apr 7, 2017 at 4:29
  • Use sales[~sales.CustomerID.isin(badcu)] Commented Apr 7, 2017 at 4:38

3 Answers

77

You need

new_df = sales[~sales.CustomerID.isin(badcu)] 
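A minimal sketch of how this works, using made-up sample data (the Amount column and the extra customer IDs are assumptions, since the real sales frame is not shown). isin() returns a boolean mask that is True for the bad customers, and ~ inverts it so that only the remaining rows are selected:

```python
import pandas as pd

# Hypothetical sample data; the real sales frame is not shown in the question.
sales = pd.DataFrame({
    "CustomerID": [23770, 10001, 24572, 10002, 28773],
    "Amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})
badcu = [23770, 24572, 28773]

# True where the row belongs to a bad customer.
mask = sales["CustomerID"].isin(badcu)

# ~ inverts the mask, so boolean indexing keeps only the good rows.
new_df = sales[~mask]
print(new_df)
```

Note that this returns a new dataframe and leaves sales itself unchanged; assign the result back to sales if you want to overwrite it.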

3 Comments

I used your method to exclude rows of my dataframe based on iphone numbers, but it doesn't work. Weird.
The error is as follows: TypeError: isin() takes 2 positional arguments but 69 were given
@ahbon, did you pass a list of arguments?
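One plausible cause of that TypeError, though the commenter's code is not shown so this is only a guess, is that the values were passed to isin() as separate positional arguments (for example by unpacking with *) rather than as a single list. The sketch below uses a hypothetical Phone column and made-up numbers:

```python
import pandas as pd

# Hypothetical data; the commenter's dataframe and phone list are not shown.
df = pd.DataFrame({"Phone": ["555-0100", "555-0101", "555-0102"]})
bad_phones = ["555-0100", "555-0102"]

# Unpacking the list passes each element as a separate positional argument,
# which isin() rejects with a TypeError.
try:
    df["Phone"].isin(*bad_phones)
    raised = False
except TypeError:
    raised = True

# Passing the list itself as the single argument works.
kept = df[~df["Phone"].isin(bad_phones)]
print(raised, kept["Phone"].tolist())
```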
7

You can also use query

sales.query('CustomerID not in @badcu') 
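As a small illustration with made-up data (the Amount column and the good customer IDs are assumptions), the @ prefix tells query to look up badcu as a Python variable in the calling scope rather than as a column name:

```python
import pandas as pd

# Hypothetical sample data; the real sales frame is not shown in the question.
sales = pd.DataFrame({
    "CustomerID": [23770, 10001, 24572, 10002, 28773],
    "Amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})
badcu = [23770, 24572, 28773]

# @badcu refers to the local variable badcu; "not in" keeps rows whose
# CustomerID is absent from that list.
kept = sales.query("CustomerID not in @badcu")
print(kept)
```

Like boolean indexing, query returns a new dataframe rather than modifying sales.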

2 Comments

I also used this method to exclude rows of my dataframe based on iphone numbers, but it doesn't work. Weird.
@ahbon, this may be a bit late, but how did it not work? What error did you get? A different result?
5

I think the best way is to drop by index. Try it and let me know.

sales.drop(sales[sales.CustomerID.isin(badcu)].index.tolist()) 
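The reason this approach needs .index is that drop interprets its argument as index labels, not as a boolean mask. A minimal sketch with made-up data and a deliberately non-numeric index (both assumptions) makes the labels visible:

```python
import pandas as pd

# Hypothetical sample data with string index labels, to show that
# drop() works on labels rather than on row positions or masks.
sales = pd.DataFrame(
    {
        "CustomerID": [23770, 10001, 24572, 10002, 28773],
        "Amount": [10.0, 20.0, 30.0, 40.0, 50.0],
    },
    index=["a", "b", "c", "d", "e"],
)
badcu = [23770, 24572, 28773]

# Select the bad rows first, then take their index labels.
bad_labels = sales[sales["CustomerID"].isin(badcu)].index

# drop() returns a new frame without those labels; sales is unchanged.
cleaned = sales.drop(bad_labels)
print(cleaned)
```

If the index contains duplicate labels, dropping a label removes every row that carries it, so the boolean-mask form may be the safer choice in that case.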
