2

I'm trying to merge certain data frames into a single data frame and fill in the missing values using pandas. An example case is as follows:

import pandas as pd data1 = {'SKU' : ['C1', 'D1'], 'Description' : ['c2', 'd'], 'Unit Cost' : [0.2, 1.5], 'Qty1' : [18, 10]} idx1 = ['RM0001', 'RM0004'] data2 = {'SKU' : ['C1', np.nan], 'Description' : ['c', 'e'], 'Qty2' : [15, 8]} idx2 = ['RM0001', 'RM0010'] data3 = {'SKU' : ['D1', 'E1'], 'Description' : ['d', 'e'], 'Qty3' : [7, 9]} idx3 = ['RM0004', 'RM0010'] df1 = pd.DataFrame(data1, index=idx1) df2 = pd.DataFrame(data2, index=idx2) df3 = pd.DataFrame(data3, index=idx3) 

And the desired output would be of the form:

 SKU Description Unit Cost Qty1 Qty2 Qty3 RM0001 C1 c 0.2 18.0 15.0 NaN RM0004 D1 d 1.5 10.0 NaN 7.0 RM0010 E1 e NaN NaN 8.0 9.0 

I've tried various pd.merge and functools.reduce applications, but none are giving me the output I'd like. I'm still learning pandas, so I think I'm missing something, since this doesn't feel like it should be too complicated. A small explanation of the steps (or a link to an excellent source) would be appreciated.

1 Answer 1

1

Try combine_first, which you can chain them:

df1.combine_first(df2).combine_first(df3) 

Output:

 Description Qty1 Qty2 Qty3 SKU Unit Cost RM0001 c2 18.0 15.0 NaN C1 0.2 RM0004 d 10.0 NaN 7.0 D1 1.5 RM0010 e NaN 8.0 9.0 E1 NaN 

Or you can use concat with groupby:

pd.concat([df1,df2,df3]).groupby(level=0).first() 

Output:

 SKU Description Unit Cost Qty1 Qty2 Qty3 RM0001 C1 c2 0.2 18.0 15.0 NaN RM0004 D1 d 1.5 10.0 NaN 7.0 RM0010 E1 e NaN NaN 8.0 9.0 
Sign up to request clarification or add additional context in comments.

4 Comments

Thank you, I didn't know about combine_first. Diving into the documentation now. However, it seems like your output is slightly off, since the c2 is suppose to be c.
@Matt c coming from df2 where df1 has c2. You may want to reverse the order, e.g. df3.combine_first(df2).combine_first(df1) or use .last() in the concat approach.
Thank you! Works perfectly. For clarification, what is the level=0 argument doing in groupby?
level=0 groups the rows with the same index (at level 0).

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.