Combine two dataframes where column values match

Question

I have two dataframes containing similar columns:

 ID prop1 1 UUU &&& 2 III *** 3 OOO ))) 4 PPP %%%

and

 ID prop2 1 UUU 1234 2 WWW 4567 3 III 7890 5 EEE 0123 6 OOO 3456 7 RRR 6789 8 PPP 9012

I need to merge these two dataframes where the IDs match, and add the prop2 column to the original.

 ID prop1 prop1 1 UUU &&& 1234 2 III *** 7890 3 OOO ))) 3456 4 PPP %%% 9012

Ive tried every combination of merge, join, concat, for, iter, etc. It will either fail to merge, lose the index, or straight-up drop the column values.

When you merge on a level that isn't the index, bringing along the index is nonsensical -- if there is a match which index should you use, the one from left or the one from right? So rightfully so, pandas just puts an Int64Index on the result. If you need to bring the index along, then you'll need to bring it into the columns before the merge. — ALollz
– ALollz, Commented Apr 28, 2021 at 19:58

SeaBean · Accepted Answer · 2021-04-28 20:31:03Z

You can use pd.merge():

pd.merge(df1, df2, on='ID')

Output:

 ID prop1 prop2 0 UUU &&& 1234 1 III *** 7890 2 OOO ))) 3456 3 PPP %%% 9012

You can also use df.merge() as follows::

df1.merge(df2, on='ID')

Same result.

The default parameter on .merge() no matter using pd.merge() or df.merge() is how='inner'. So you are already doing an inner join without specifying how= parameter.

More complex scenario:

If you require the more complicated situation to maintain the index of df1 1, 2, 3, 4 instead of 0, 1, 2, 3, you can do it by resetting index before merge and then set index on the interim index column produced when resetting index:

df1.reset_index().merge(df2, on='ID').set_index('index')

Output:

 ID prop1 prop2 index 1 UUU &&& 1234 2 III *** 7890 3 OOO ))) 3456 4 PPP %%% 9012

Now, the index 1 2 3 4 of original df1 are kept.

Optionally, if you don't want the axis label index appear on top of the row index, you can do a rename_axis() as follows:

df1.reset_index().merge(df2, on='ID').set_index('index').rename_axis(index=None)

Output:

 ID prop1 prop2 1 UUU &&& 1234 2 III *** 7890 3 OOO ))) 3456 4 PPP %%% 9012

@cdubbs You have to reassign to df1 like: df1 = df1.reset_index().merge(df2, on='ID').set_index('index').rename_axis(index=None) or assign to a new name.
@cdubbs I assumed df1 is the first dataframe and df2 is the second. Is that what you name them. If no, replace df1, df2 with your dataframe names.
@cdubbs I tested with creating the test dataframes by: df1 = pd.DataFrame({'ID': {1: 'UUU', 2: 'III', 3: 'OOO', 4: 'PPP'}, 'prop1': {1: '&&&', 2: '***', 3: ')))', 4: '%%%'}}) and df2 = pd.DataFrame({'ID': {1: 'UUU', 2: 'WWW', 3: 'III', 5: 'EEE', 6: 'OOO', 7: 'RRR', 8: 'PPP'}, 'prop2': {1: 1234, 2: 4567, 3: 7890, 5: 123, 6: 3456, 7: 6789, 8: 9012}}) The test dataframes work fine for me and they should be the same as your sample data.
Hi @cdubbs Is it ok now ? If no, is there same empty dataframe with defining df1 df2 with my codes on the comment above ?

acrobat · Accepted Answer · 2021-04-28 20:42:48Z

You can also use .map to add the prop2 values to your original dataframe, where the ID column values match.

df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy())

Should there be any IDs in your original dataframe that aren't also in the second one (and so don't have a prop2 value to bring across, you can fill those holes by adding .fillna() with the value of your choice.

df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy()).fillna(your_fill_value_here)

Collectives™ on Stack Overflow

Combine two dataframes where column values match

2 Answers 2

More complex scenario:

5 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

More complex scenario:

5 Comments

Comments

Related