Pandas merging on different size dataframes based on one column

Question

I have two dataframes of different sizes.

In df1, I have date, time, username, email address, phone number, and duration from logs, but the email address and phone number columns contain only empty strings.

In df2, I have every username, email address, and phone number from the database.

How can I merge df2 into df1 based on the username? That is, the size of df1 should stay the same, but its email address and phone number columns should be populated with the matching data from df2.

Assume username is unique.

jezrael · Accepted Answer · 2018-10-05 07:57:04Z

Use merge with a left join and the suffixes parameter, then remove the original email address and phone number columns (the ones suffixed with _):

    import pandas as pd

    df1 = pd.DataFrame({
        'username': list('abccdd'),
        'email address': [''] * 6,
        'phone number': [''] * 6,
        'duration': [5, 3, 6, 9, 2, 4],
    })
    print(df1)

      username email address phone number  duration
    0        a                                    5
    1        b                                    3
    2        c                                    6
    3        c                                    9
    4        d                                    2
    5        d                                    4

    df2 = pd.DataFrame({
        'username': list('abcd'),
        'email address': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
        'phone number': range(4)
    })
    print(df2)

      username    email address  phone number
    0        a  [email protected]             0
    1        b  [email protected]             1
    2        c  [email protected]             2
    3        d  [email protected]             3

    df = (df1.merge(df2, on='username', how='left', suffixes=('_', ''))
             .drop(['email address_', 'phone number_'], axis=1)
             .reindex(columns=df1.columns))
    print(df)

      username    email address  phone number  duration
    0        a  [email protected]             0         5
    1        b  [email protected]             1         3
    2        c  [email protected]             2         6
    3        c  [email protected]             2         9
    4        d  [email protected]             3         2
    5        d  [email protected]             3         4
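Because the question relies on usernames being unique in the lookup table, it can help to have the merge check that assumption. The following is only a minimal sketch: the frames `logs` and `users` and their contents are hypothetical sample data, not the asker's. It uses the `validate` parameter of `merge` and confirms that the row count is unchanged after a left join.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
logs = pd.DataFrame({
    'username': ['a', 'b', 'c', 'c', 'zz'],   # 'zz' has no match in users
    'email address': [''] * 5,
    'phone number': [''] * 5,
    'duration': [5, 3, 6, 9, 2],
})
users = pd.DataFrame({
    'username': ['a', 'b', 'c'],
    'email address': ['a@example.com', 'b@example.com', 'c@example.com'],
    'phone number': ['100', '101', '102'],
})

# Drop the empty placeholder columns, then left-join on username.
# validate='many_to_one' raises pandas.errors.MergeError if a username
# appears more than once in `users`, which would otherwise duplicate rows.
keep = logs.columns.difference(['email address', 'phone number'])
merged = (logs[keep]
          .merge(users, on='username', how='left', validate='many_to_one')
          .reindex(columns=logs.columns))

# The left join keeps every row of `logs`; unmatched usernames get NaN.
assert len(merged) == len(logs)
print(merged)
```

If a username in the lookup table turns out to be duplicated, the merge fails loudly instead of silently growing the result.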

Another solution uses difference to select all column names except those given in the list, then reindex to restore the same column order as in df1:

    c = df1.columns.difference(['email address', 'phone number'])
    df = df1[c].merge(df2, on='username', how='left').reindex(columns=df1.columns)
    print(df)

      username    email address  phone number  duration
    0        a  [email protected]             0         5
    1        b  [email protected]             1         3
    2        c  [email protected]             2         6
    3        c  [email protected]             2         9
    4        d  [email protected]             3         2
    5        d  [email protected]             3         4

Thanks! The 2nd one is much better, since I do not need to reorder my columns again.

Joe · Accepted Answer · 2018-10-05 08:27:52Z

You can use this:

    # how='left' keeps every row of df1, even if a username is missing from df2
    df = df1[['username', 'date', 'time', 'duration']].merge(df2, on='username', how='left')

Example: df1

       date  duration email address phone number   time username
    0  2015         5                             14:00       aa
    1  2016        10                             16:00       bb

df2

      email address  phone number username
    0          rrr@        333444       aa
    1           tt@        555533       bb

Output:

      username  date   time  duration email address  phone number
    0       aa  2015  14:00         5          rrr@        333444
    1       bb  2016  16:00        10           tt@        555533
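One thing to watch with this approach: `merge` defaults to an inner join (`how='inner'`), so any row of df1 whose username has no match in df2 is dropped unless `how='left'` is passed. A minimal sketch of the difference, using hypothetical data in which the username 'cc' is an invented extra row that appears only in the logs:

```python
import pandas as pd

# Hypothetical data: 'cc' appears in the logs but not in the user table.
df1 = pd.DataFrame({
    'username': ['aa', 'bb', 'cc'],
    'date': [2015, 2016, 2017],
    'time': ['14:00', '16:00', '18:00'],
    'duration': [5, 10, 7],
})
df2 = pd.DataFrame({
    'username': ['aa', 'bb'],
    'email address': ['rrr@', 'tt@'],
    'phone number': [333444, 555533],
})

inner = df1.merge(df2, on='username')              # how='inner' is the default
left = df1.merge(df2, on='username', how='left')   # keeps all rows of df1

print(len(inner), len(left))   # prints: 2 3
```

With the left join, the unmatched 'cc' row is kept and its email address and phone number are filled with NaN.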
