Merging pandas dataframes and keeping the rows where the merge criterion does not match

Question

I have these two dataframes:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'B': ['B0', 'B1', 'B2']})

I would like to merge these two dataframes on the entries in column 'A'. However, I do not want to keep the rows that match; I want the ones that do not match each other.

That means, I would like to get a new dataframe which looks like this one:

df_new = pd.DataFrame({'A':['A3'], 'B':['B2']})

How could I do this?

Thanks a lot!

What about how='outer' in your pd.merge()?

– G. Anderson, Commented Aug 3, 2020 at 20:49

ipj · Accepted Answer · 2020-08-03 21:08:23Z

1

A merge with an outer join gives you a similar result:

df1.merge(df2, how = 'outer', on = 'A', indicator = True)

    A  B_x  B_y      _merge
0  A0   B0   B0        both
1  A1   B1   B1        both
2  A2   B2  NaN   left_only
3  A3  NaN   B2  right_only

which can be filtered by query:

df1.merge(df2, how = 'outer', on = 'A', indicator = True).query("_merge != 'both'")

    A  B_x  B_y      _merge
2  A2   B2  NaN   left_only
3  A3  NaN   B2  right_only

Note that indicator = True creates the _merge column, which records where each row came from (both, left_only or right_only) and is what the query filters on.
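The question asks only for the row that is present in df2 but missing from df1. One possible way to get exactly that frame from the outer merge is to keep just the right_only rows and rename B_y back to B. This is a minimal sketch under that reading of the question; the names merged, right_only and df_new are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'B': ['B0', 'B1', 'B2']})

# Outer merge on 'A'; indicator=True adds the _merge column.
merged = df1.merge(df2, how='outer', on='A', indicator=True)

# Keep only rows whose key appears in df2 but not in df1.
right_only = merged.query("_merge == 'right_only'")

# B_y holds df2's 'B' values; rename it back and drop the helper columns.
df_new = (
    right_only[['A', 'B_y']]
    .rename(columns={'B_y': 'B'})
    .reset_index(drop=True)
)
print(df_new)
```

If the rows that exist only in df1 are wanted as well, filtering on "_merge != 'both'" as in the answer keeps both sides.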


4 Comments

Tobitor Over a year ago

Thank you! If I use on = ['A', 'B'], then are the datasets first merged by 'A' and afterwards by column 'B'?

ipj Over a year ago

When a list is passed as the on argument, all columns in the list are used together as a composite key that joins the dataframes. It's like `(df1.A == df2.A) & (df1.B == df2.B)`, similar to a SQL join clause in relational databases.
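The composite-key behaviour can be sketched as follows. Note that df2_alt is hypothetical data, not from the question: it differs from df2 only in the 'B' value of the A1 row, so that matching on 'A' alone and matching on both columns give different results.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
# Hypothetical variant of df2: the A1 row has 'B9' instead of 'B1'.
df2_alt = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'B': ['B0', 'B9', 'B2']})

# Key is 'A' only: A1 matches, regardless of the 'B' values.
by_a = df1.merge(df2_alt, how='outer', on='A', indicator=True)
unmatched_a = by_a[by_a['_merge'] != 'both']

# Key is the pair ('A', 'B'): a row matches only if both values agree.
by_ab = df1.merge(df2_alt, how='outer', on=['A', 'B'], indicator=True)
unmatched_ab = by_ab[by_ab['_merge'] != 'both']

print(unmatched_a)
print(unmatched_ab)
```

With the pair as key, the two A1 rows no longer match each other, so each shows up as unmatched on its own side.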

Tobitor Over a year ago

OK, so that means the dataframes are first joined on the first column in the list and afterwards on the second, right?

ipj Over a year ago

I'm not sure there is any difference between on = ['A', 'B'] and on = ['B', 'A'] when it comes to the implementation of this method. From the perspective of merge logic and result, it's exactly the same.
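A quick check on the question's data, as a sketch: it compares the two key orders by the set of resulting rows only. The helper as_rows is introduced here for illustration, and the rows are sorted before comparing because the row order of the merged frames is not assumed to be identical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'B': ['B0', 'B1', 'B2']})

ab = df1.merge(df2, how='outer', on=['A', 'B'], indicator=True)
ba = df1.merge(df2, how='outer', on=['B', 'A'], indicator=True)

def as_rows(df):
    # Illustrative helper: sorted (A, B, _merge) tuples, ignoring row order.
    return sorted(zip(df['A'], df['B'], df['_merge'].astype(str)))

print(as_rows(ab) == as_rows(ba))
```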

sammywemmy · Accepted Answer · 2020-08-03 20:50:13Z

1

Try this, using isin:

df2.loc[~df2.A.isin(df1.A)]

    A   B
2  A3  B2
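A self-contained sketch of the isin approach. Here the boolean mask is built from df2's own column, so the selection does not rely on df1 and df2 happening to share the same index; the names mask and df_new are chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'B': ['B0', 'B1', 'B2']})

# True for each row of df2 whose 'A' value does not occur in df1['A'].
mask = ~df2['A'].isin(df1['A'])

# Select those rows and renumber the index from 0.
df_new = df2.loc[mask].reset_index(drop=True)
print(df_new)
```

This avoids the merge entirely, which keeps the original column names but only returns the rows unique to df2.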
