1

I have these two dataframes:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],'B': ['B0', 'B1', 'B2']}) df2 = pd.DataFrame({'A': ['A0', 'A1', 'A3'],'B': ['B0', 'B1', 'B2']}) 

I would like to merge these two dataframes now by the entries in column 'A'. But I do not want to keep the rows which match but the ones which do not match to each other.

That means, I would like to get a new dataframe which looks like this one:

df_new = pd.DataFrame({'A':['A3'], 'B':['B2']}) 

How could I do this?

Thanks a lot!

1
  • What about how='outer' in your pd.merge()? Commented Aug 3, 2020 at 20:49

2 Answers 2

1

Merge with outer join gives You similar result:

df1.merge(df2, how = 'outer', on = 'A', indicator = True) 
 A B_x B_y _merge 0 A0 B0 B0 both 1 A1 B1 B1 both 2 A2 B2 NaN left_only 3 A3 NaN B2 right_only 

which can be filtered by query:

df1.merge(df2, how = 'outer', on = 'A', indicator = True).query("_merge != 'both'") 
 A B_x B_y _merge 2 A2 B2 NaN left_only 3 A3 NaN B2 right_only 

Note indicator = True created column _merge suitable to filter rows.

Sign up to request clarification or add additional context in comments.

4 Comments

Thank you! If I use on = ['A', 'B'] than the datasets are first merged by 'A' and afterwards by column 'B'?
When a list is passed as on argument, then all columns in list are used as composite key which joins dataframes. It's like `df1.A==df2.A & df1.B == df2.B', similar to SQL join clause in relational databases.
Ok, that means, that the columns are first joined by the first argument and afterwards by the second, right?
I'm not sure there is any difference between on = ['A', 'B'] and on = ['B', 'A'] when it comes to implementation of this method. From perspective of merge logic and result it's exactly the same.
1

Try this, using isin :

df2.loc[~df1.A.isin(df2.A)] A B 2 A3 B2 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.