
I am doing a dataframe outer join using multiple columns:

DF1:

ColumnA  ColumnB  ColumnC  ColumnD
1        2        3        4
1        2        3        4

DF2:

ColumnE  ColumnF  ColumnG  ColumnH
1        2        3        4
1        2        3        4

Merging code:

df = pd.merge(DF1, DF2,
              left_on=['ColumnA', 'ColumnB', 'ColumnC', 'ColumnD'],
              right_on=['ColumnE', 'ColumnF', 'ColumnG', 'ColumnH'],
              how='outer')

Actual outcome:

ColumnA  ColumnB  ColumnC  ColumnD  ColumnE  ColumnF  ColumnG  ColumnH
1        2        3        4        1        2        3        4
1        2        3        4        1        2        3        4
1        2        3        4        1        2        3        4
1        2        3        4        1        2        3        4

Expected outcome (each row should appear only twice, because the combination of columns matches exactly in the two datasets):

ColumnA  ColumnB  ColumnC  ColumnD  ColumnE  ColumnF  ColumnG  ColumnH
1        2        3        4        1        2        3        4
1        2        3        4        1        2        3        4

Can someone advise where I am going wrong?

  • This happens because the key columns contain duplicate values in both data frames. If each data frame had 3 rows instead of 2, then 9 rows would appear instead of 4. Please check my answer :) Commented Nov 19, 2019 at 1:10

2 Answers


You have identical duplicate rows in both df1 and df2. The merge pairs every duplicate in df1 with every matching duplicate in df2, so the 2 rows on each side produce 2 × 2 = 4 rows in the merged df. A simple solution is to make one dataframe unique with drop_duplicates and then merge:

df = pd.merge(df1.drop_duplicates(), df2,
              left_on=['ColumnA', 'ColumnB', 'ColumnC', 'ColumnD'],
              right_on=['ColumnE', 'ColumnF', 'ColumnG', 'ColumnH'],
              how='outer')

Out[742]:
   ColumnA  ColumnB  ColumnC  ColumnD  ColumnE  ColumnF  ColumnG  ColumnH
0        1        2        3        4        1        2        3        4
1        1        2        3        4        1        2        3        4
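To see where the extra rows come from, here is a minimal, self-contained sketch that rebuilds the two frames from the question and compares the plain merge with the drop_duplicates version. The names `keys_left`, `keys_right`, `naive` and `deduped` are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the two frames from the question: two identical rows each.
df1 = pd.DataFrame({"ColumnA": [1, 1], "ColumnB": [2, 2],
                    "ColumnC": [3, 3], "ColumnD": [4, 4]})
df2 = pd.DataFrame({"ColumnE": [1, 1], "ColumnF": [2, 2],
                    "ColumnG": [3, 3], "ColumnH": [4, 4]})

keys_left = ["ColumnA", "ColumnB", "ColumnC", "ColumnD"]
keys_right = ["ColumnE", "ColumnF", "ColumnG", "ColumnH"]

# Plain merge: each of the 2 left rows matches each of the 2 right rows.
naive = pd.merge(df1, df2, left_on=keys_left, right_on=keys_right, how="outer")

# With the left side deduplicated, the single left row matches the 2 right rows.
deduped = pd.merge(df1.drop_duplicates(), df2,
                   left_on=keys_left, right_on=keys_right, how="outer")

print(len(naive))    # 4
print(len(deduped))  # 2
```

Note that after deduplicating one side, the number of rows per key follows the other dataframe (df2 here), so this only gives the expected result when the duplicate counts in df1 do not need to be preserved.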

So we need to merge on an additional key, created by cumcount, which numbers the repeated rows within each group of identical rows:

df1 = df1.assign(Key=df1.groupby(list(df1)).cumcount())
df2 = df2.assign(Key=df2.groupby(list(df2)).cumcount())
df1.merge(df2,
          left_on=['ColumnA', 'ColumnB', 'ColumnC', 'ColumnD', 'Key'],
          right_on=['ColumnE', 'ColumnF', 'ColumnG', 'ColumnH', 'Key'],
          how='outer')

Out[19]:
   ColumnA  ColumnB  ColumnC  ColumnD  Key  ColumnE  ColumnF  ColumnG  ColumnH
0        1        2        3        4    0        1        2        3        4
1        1        2        3        4    1        1        2        3        4
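The same idea as a self-contained sketch, rebuilding the frames from the question so it can be run as is. The `validate="one_to_one"` argument is an optional safeguard added here, not part of the original answer: it asks pandas to raise an error if the merge keys are not unique on both sides. The name `merged` is illustrative.

```python
import pandas as pd

# Rebuild the two frames from the question: two identical rows each.
df1 = pd.DataFrame({"ColumnA": [1, 1], "ColumnB": [2, 2],
                    "ColumnC": [3, 3], "ColumnD": [4, 4]})
df2 = pd.DataFrame({"ColumnE": [1, 1], "ColumnF": [2, 2],
                    "ColumnG": [3, 3], "ColumnH": [4, 4]})

# Number each row within its group of identical rows: 0, 1, ...
# The first duplicate gets Key 0, the second gets Key 1, and so on.
df1 = df1.assign(Key=df1.groupby(list(df1.columns)).cumcount())
df2 = df2.assign(Key=df2.groupby(list(df2.columns)).cumcount())

# Including Key in the join makes every key combination unique,
# so the n-th duplicate on the left pairs only with the n-th on the right.
merged = df1.merge(
    df2,
    left_on=["ColumnA", "ColumnB", "ColumnC", "ColumnD", "Key"],
    right_on=["ColumnE", "ColumnF", "ColumnG", "ColumnH", "Key"],
    how="outer",
    validate="one_to_one",  # assumption: added as a safety check
)

print(len(merged))  # 2
```

Unlike drop_duplicates, this keeps the duplicate counts of both sides: if one dataframe had a third identical row, the outer join would keep it as an unmatched row with NaN on the other side.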

1 Comment

I never thought about adding an additional key :) +1
