  1. I have my original DataFrame (df1).
  2. I create a new DataFrame (df2) with only some rows from the first one (df1).
  3. I add some columns to this new DataFrame (df2).
  4. Now I want to update the first DataFrame (df1) with my new content (df2).

So I need to merge two DataFrames, where the second DataFrame has more columns and fewer rows than the first.

    import pandas as pd
    print(pd.__version__)
    # 0.24.1

    index1 = [1, 2, 3, 4]
    columns1 = ['a', 'b', 'c']
    data1 = [
        ['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3'],
        ['a4', 'b4', 'c4']]

    index2 = [1, 4]
    columns2 = ['b', 'c', 'd', 'e']
    data2 = [
        ['b1', 'c1', '<D1', 'e1'],
        ['b4', '<C4', 'd4', 'e4']]

    df1 = pd.DataFrame(index=index1, columns=columns1, data=data1)
    df2 = pd.DataFrame(index=index2, columns=columns2, data=data2)

    print(df1)
    #     a   b   c
    # 1  a1  b1  c1
    # 2  a2  b2  c2
    # 3  a3  b3  c3
    # 4  a4  b4  c4

    print(df2)
    #     b    c    d   e
    # 1  b1   c1  <D1  e1
    # 4  b4  <C4   d4  e4

    # What I want:
    #     a   b    c    d    e
    # 1  a1  b1   c1  <D1   e1
    # 2  a2  b2   c2  NaN  NaN
    # 3  a3  b3   c3  NaN  NaN
    # 4  a4  b4  <C4   d4   e4

I tried, but I'm lost among all the .merge, .update, .concat, .join, .combine_first etc. methods and their parameters. How can I simply merge these two DataFrames?

1 Answer

I couldn't do it in one line, but this should work:

    df1.update(df2)
    df1 = df1.merge(df2, how='left')

And then "merge" resets the index (it joins on the shared columns b and c and ignores the original index), so if you still want 1 to 4:

    df1.index = index1

    Out[]:
        a   b    c    d    e
    1  a1  b1   c1  <D1   e1
    2  a2  b2   c2  NaN  NaN
    3  a3  b3   c3  NaN  NaN
    4  a4  b4  <C4   d4   e4
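
If re-assigning the index by hand feels fragile, one possible variation (a sketch, not part of the original answer) is to keep `update` for the overlapping columns and swap `merge` for `join`, which aligns on the index by default. The names `new_cols` and `result` are only illustrative:

```python
import pandas as pd

index1 = [1, 2, 3, 4]
df1 = pd.DataFrame(
    index=index1,
    columns=['a', 'b', 'c'],
    data=[['a1', 'b1', 'c1'],
          ['a2', 'b2', 'c2'],
          ['a3', 'b3', 'c3'],
          ['a4', 'b4', 'c4']])
df2 = pd.DataFrame(
    index=[1, 4],
    columns=['b', 'c', 'd', 'e'],
    data=[['b1', 'c1', '<D1', 'e1'],
          ['b4', '<C4', 'd4', 'e4']])

# Step 1: overwrite the overlapping cells of df1 in place (aligned on index).
df1.update(df2)

# Step 2: pick the columns that exist only in df2 (hypothetical helper name).
new_cols = df2.columns.difference(df1.columns)

# Step 3: join on the index, so df1 keeps its 1..4 labels.
result = df1.join(df2[new_cols])
print(result)
```

Because `join` matches rows by index label rather than by column values, there is no need to restore `index1` afterwards.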

2 Comments

It works for me, thank you. Kind of strange merge behaviour but it's ok!
The merge thing is really strange! I tried to figure a way around it but this is probably the simplest. I found this if you're interested: stackoverflow.com/questions/11976503/…
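
For a one-line route, `combine_first` may also be worth trying. This is a sketch under two assumptions: df2 contains no NaN that should overwrite a value in df1 (NaN cells in the caller are filled from the argument), and it is acceptable for the result's columns to come back in sorted order. The name `result` is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(
    index=[1, 2, 3, 4],
    columns=['a', 'b', 'c'],
    data=[['a1', 'b1', 'c1'],
          ['a2', 'b2', 'c2'],
          ['a3', 'b3', 'c3'],
          ['a4', 'b4', 'c4']])
df2 = pd.DataFrame(
    index=[1, 4],
    columns=['b', 'c', 'd', 'e'],
    data=[['b1', 'c1', '<D1', 'e1'],
          ['b4', '<C4', 'd4', 'e4']])

# df2's values take priority; anything df2 lacks is filled from df1.
# The result covers the union of both indexes and both column sets.
result = df2.combine_first(df1)
print(result)
```

Unlike the `update` approach, this leaves df1 itself untouched and returns a new DataFrame.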