1

I have two DataFrames of different sizes.

df1 comes from logs and has the columns date, time, username, email address, phone number and duration. However, the email address and phone number columns contain only empty strings.

df2 comes from the database and has the username, email address and phone number of every user.

How can I merge df2 into df1 based on the username? That is, df1 should keep the same number of rows, but its email address and phone number columns should be populated with the matching data from df2.

Assume username is unique.

2 Answers

1

Use merge with a left join and the suffixes parameter, then drop the original email address and phone number columns (the ones that receive the _ suffix):

    import pandas as pd

    df1 = pd.DataFrame({
        'username': list('abccdd'),
        'email address': [''] * 6,
        'phone number': [''] * 6,
        'duration': [5, 3, 6, 9, 2, 4],
    })
    print(df1)
      username email address phone number  duration
    0        a                                    5
    1        b                                    3
    2        c                                    6
    3        c                                    9
    4        d                                    2
    5        d                                    4

    df2 = pd.DataFrame({
        'username': list('abcd'),
        'email address': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
        'phone number': range(4),
    })
    print(df2)
      username     email address  phone number
    0        a [email protected]             0
    1        b [email protected]             1
    2        c [email protected]             2
    3        d [email protected]             3

    # The left frame's placeholder columns get the '_' suffix and are dropped
    df = (df1.merge(df2, on='username', how='left', suffixes=('_', ''))
             .drop(['email address_', 'phone number_'], axis=1)
             .reindex(columns=df1.columns))
    print(df)
      username     email address  phone number  duration
    0        a [email protected]             0         5
    1        b [email protected]             1         3
    2        c [email protected]             2         6
    3        c [email protected]             2         9
    4        d [email protected]             3         2
    5        d [email protected]             3         4

Another solution uses difference to select every column name except those in the list, then reindex to restore the same column order as in df1:

    # All df1 columns except the two empty placeholders
    c = df1.columns.difference(['email address', 'phone number'])
    df = df1[c].merge(df2, on='username', how='left').reindex(columns=df1.columns)
    print(df)
      username     email address  phone number  duration
    0        a [email protected]             0         5
    1        b [email protected]             1         3
    2        c [email protected]             2         6
    3        c [email protected]             2         9
    4        d [email protected]             3         2
    5        d [email protected]             3         4
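
Both approaches keep df1 the same size only if username really is unique in df2, as the question assumes. As a rough sketch, merge's validate parameter can check that assumption at merge time. The e-mail values and the name merged below are made up for illustration, not taken from the question:

```python
import pandas as pd

# Toy frames modelled on the example above; the e-mail values are invented.
df1 = pd.DataFrame({
    'username': list('abccdd'),
    'email address': [''] * 6,
    'phone number': [''] * 6,
    'duration': [5, 3, 6, 9, 2, 4],
})
df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@example.com', 'b@example.com',
                      'c@example.com', 'd@example.com'],
    'phone number': range(4),
})

# Drop the empty placeholder columns, then left-join the real values.
# validate='many_to_one' raises pandas.errors.MergeError if a username
# occurs more than once in df2.
keep = df1.columns.difference(['email address', 'phone number'])
merged = (df1[keep]
          .merge(df2, on='username', how='left', validate='many_to_one')
          .reindex(columns=df1.columns))

# With unique keys on the right, a left join returns one row per df1 row.
assert len(merged) == len(df1)
print(merged)
```

If df2 did contain a repeated username, the merge would raise instead of silently duplicating rows of df1.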

1 Comment

Thanks! The second one is much better, since I do not need to reorder my columns again.
1

You can use this:

    df = df1[['username', 'date', 'time', 'duration']].merge(df2, on='username')

Example: df1

       date  duration email address phone number  time username
    0  2015         5                            14:00       aa
    1  2016        10                            16:00       bb

df2

      email address  phone number username
    0          rrr@        333444       aa
    1           tt@        555533       bb

Output:

      username  date  time  duration email address  phone number
    0       aa  2015 14:00         5          rrr@        333444
    1       bb  2016 16:00        10           tt@        555533
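
One caveat: merge defaults to how='inner', so a username that appears in df1 but not in df2 would be dropped from the result. The sketch below illustrates the difference. The extra user 'cc' and the names inner and left are hypothetical additions for the demonstration:

```python
import pandas as pd

# Hypothetical data: 'cc' appears in the logs but not in the database.
df1 = pd.DataFrame({
    'username': ['aa', 'bb', 'cc'],
    'date': [2015, 2016, 2017],
    'time': ['14:00', '16:00', '18:00'],
    'duration': [5, 10, 7],
})
df2 = pd.DataFrame({
    'username': ['aa', 'bb'],
    'email address': ['rrr@', 'tt@'],
    'phone number': [333444, 555533],
})

inner = df1.merge(df2, on='username')             # default how='inner'
left = df1.merge(df2, on='username', how='left')  # keeps every df1 row

print(len(inner))  # 2: the 'cc' row is gone
print(len(left))   # 3: same number of rows as df1

# Rows without a match in df2 get NaN in the columns taken from df2.
print(left[left['email address'].isna()])
```

If every username in the logs is guaranteed to exist in the database, both joins give the same result. Passing how='left' simply makes the "df1 keeps its size" requirement explicit.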
