0

Suppose I have two dataframes

df1 = pd.DataFrame({"A" : [1,1,2,5], "B" : [1,1,4,5], "C" : ["Adam","Bella","Charlie","Dan"]}) df2 = pd.DataFrame({"A" : [1,1,3,5], "B" : [1,3,6,5]}) 

and I want to delete the rows in df1 that have the same values of A and B with df2

I do this by

for i, row_1 in df1.iterrows(): for j, row_2 in df2.iterrows(): if row_1["A"] == row_2["A"] and row_1["B"] == row_2["B"]: index = i df1.drop([index], axis=0, inplace=False) 

which resulted in, as intended

 A B C 2 2 4 Charlie 

I was wondering if there was a much easier/faster way to do this especially if the data frame is large then it is not ideal to iterate over all the rows.

3 Answers 3

1

You can left-merge with the indicator parameter to flag the rows that match; then query to filter the rows that come only from df1:

out = df1.merge(df2, how='left', indicator=True).query('_merge=="left_only"').drop(columns=['_merge']) 

Output:

 A B C 2 2 4 Charlie 
Sign up to request clarification or add additional context in comments.

4 Comments

How does the above code satisfy this condition row_1["A"] == row_2["A"] and row_1["B"] == row_2["B"]
@PandaPandas it merges on the common columns (in this case A and B); i.e. it matches the A and B values in df1 with the A and B values in df2
How would this work if the columns for the two dataframe did not share the same strings
@PandaPandas do you mean common column names? because columns don't contain strings. If there are no common column names, you'll have to specify which columns to merge on. So df1.merge(df2, how='left', left_on=['A','B'], right_on=['AA','BB'], indicator=True) etc.
1

Here is another way:

df1.loc[~df1.set_index(['A','B']).index.isin(df2.to_records(index=False).tolist())] 

1 Comment

How would this work if the columns for the two dataframe did not share the same strings
0
#!pip install siuba from siuba import anti_join anti_join(df1, df2, on = ['A', 'B']) A B C 2 2 4 Charlie 

if you want to anti_join on all shared columns:

anti_join(df1, df2) A B C 2 2 4 Charlie 

2 Comments

How would this work if the columns for the two dataframe did not share the same strings
@PandaPandas will just give an empty dataframe

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.