3
    df1 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c1':[1,2,3]})
    df2 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c2':[4,5,6]})
    df3 = pd.DataFrame({'a':['id1','id4','id5'],'b':['Q','Q','Q'],'c1':[7,8,9]})

I'm trying to concatenate df1, df2, and df3 into one DataFrame:

    a    b  c1  c2
    id1  W  1   4
    id2  W  2   5
    id3  W  3   6
    id1  Q  7   NA
    id4  Q  8   NA
    id5  Q  9   NA

I tried:

    l = [d.set_index(['a','b']) for d in [df1,df2,df3]]
    pd.concat(l, axis=1)

but the output isn't what I expected:

            c1   c2   c1
    a   b
    id1 W  1.0  4.0  NaN
    id2 W  2.0  5.0  NaN
    id3 W  3.0  6.0  NaN
    id1 Q  NaN  NaN  7.0
    id4 Q  NaN  NaN  8.0
    id5 Q  NaN  NaN  9.0

5 Answers

1

You can join MultiIndex Series created by DataFrame.stack:

    l = [d.set_index(['a','b']).stack() for d in [df1,df2,df3]]
    df = pd.concat(l).unstack().sort_index(level=[1,0], ascending=[False, True])
    print (df)
            c1   c2
    a   b
    id1 W  1.0  4.0
    id2 W  2.0  5.0
    id3 W  3.0  6.0
    id1 Q  7.0  NaN
    id4 Q  8.0  NaN
    id5 Q  9.0  NaN
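To see why stacking helps, here is a minimal sketch of the intermediate steps using the question's data. The names `stacked`, `long_form` and `wide` are only illustrative. The duplicate check reflects the assumption that no two frames supply a value for the same (a, b, column) combination, since `unstack` cannot reshape an index with duplicate entries.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# Each frame becomes a Series indexed by (a, b, original column name),
# so 'c1' from df1 and 'c1' from df3 end up under the same label.
stacked = [d.set_index(['a', 'b']).stack() for d in (df1, df2, df3)]
long_form = pd.concat(stacked)

# unstack() requires each (a, b, column) combination to appear at most once.
assert not long_form.index.duplicated().any()

# Move the innermost level (the column names) back into columns.
wide = long_form.unstack()
print(wide)
```

If the frames did overlap on some (a, b, column) combination, the duplicates would have to be resolved (for example by dropping or aggregating them) before calling `unstack`.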

If each DataFrame has only three columns (the two key columns plus one value column), use DataFrame.squeeze, or select the first column with iloc[:, 0], to get a list of Series:

    l = [d.set_index(['a','b']).squeeze() for d in [df1,df2,df3]]
    keys = [x.name for x in l]
    df = (pd.concat(l, axis=0, keys=keys)
            .unstack(0)
            .sort_index(level=[1,0], ascending=[False, True]))
    print (df)
            c1   c2
    a   b
    id1 W  1.0  4.0
    id2 W  2.0  5.0
    id3 W  3.0  6.0
    id1 Q  7.0  NaN
    id4 Q  8.0  NaN
    id5 Q  9.0  NaN

    l = [d.set_index(['a','b']).iloc[:, 0] for d in [df1,df2,df3]]
    keys = [x.name for x in l]
    df = (pd.concat(l, axis=0, keys=keys)
            .unstack(0)
            .sort_index(level=[1,0], ascending=[False, True]))
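The two variants differ when a frame has more than one value column. A small sketch with made-up frames (`one_value` and `two_values` are hypothetical names, not from the question): `squeeze` only returns a Series when there is a single column (or a single row), while `iloc[:, 0]` always returns the first column as a Series.

```python
import pandas as pd

one_value = pd.DataFrame(
    {'a': ['id1', 'id2'], 'b': ['W', 'W'], 'c1': [1, 2]}
).set_index(['a', 'b'])
two_values = pd.DataFrame(
    {'a': ['id1', 'id2'], 'b': ['W', 'W'], 'c1': [1, 2], 'c2': [4, 5]}
).set_index(['a', 'b'])

squeezed_one = one_value.squeeze()     # single column: becomes a Series named 'c1'
squeezed_two = two_values.squeeze()    # two rows, two columns: stays a DataFrame
first_column = two_values.iloc[:, 0]   # always a Series, but 'c2' is silently dropped

print(type(squeezed_one).__name__, type(squeezed_two).__name__, type(first_column).__name__)
```

So for frames with several value columns, the `stack` version is likely the safer choice of the three.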

Another idea is to chain the DataFrames in the list with DataFrame.combine_first:

    from functools import reduce

    dfs = [d.set_index(['a','b']) for d in [df1,df2,df3]]
    df = (reduce(lambda x, y: x.combine_first(y), dfs)
              .sort_index(level=[1,0], ascending=[False, True]))
    print (df)
            c1   c2
    a   b
    id1 W  1.0  4.0
    id2 W  2.0  5.0
    id3 W  3.0  6.0
    id1 Q  7.0  NaN
    id4 Q  8.0  NaN
    id5 Q  9.0  NaN
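One thing to keep in mind with `combine_first` is precedence: the calling frame's non-null values win, and only its missing cells are filled from the other frame. The sketch below uses two small hypothetical frames (`left` and `right`, not from the question) to show this. With `reduce`, it means earlier frames in the list take priority wherever the data overlaps.

```python
import pandas as pd

left = pd.DataFrame({'c1': [1.0, None]}, index=['x', 'y'])
right = pd.DataFrame({'c1': [10.0, 20.0], 'c2': [5.0, 6.0]}, index=['x', 'y'])

merged = left.combine_first(right)

# Row 'x': left already has c1 = 1.0, so right's 10.0 is ignored.
# Row 'y': left's c1 is missing, so it is filled with right's 20.0.
# Column 'c2' exists only in right, so it is taken over as-is.
print(merged)
```

In the question's data the three frames never provide different values for the same cell, so the order of the list does not change the result there.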

1

First, merge df1 and df2 on columns a and b:

df_try_1 = df1.merge(df2, on=["a","b"])

Then concatenate the result with df3:

df_try_2 = pd.concat([df_try_1, df3], axis=0)
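For reference, a self-contained sketch of these two steps on the question's data. The `ignore_index=True` argument is an addition of mine, not part of the answer; without it the row labels would repeat as 0, 1, 2, 0, 1, 2.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

# Step 1: the default inner merge on (a, b) puts c1 and c2 side by side.
df_try_1 = df1.merge(df2, on=['a', 'b'])

# Step 2: stack df3 underneath; df3 has no c2 column, so those cells become NaN.
df_try_2 = pd.concat([df_try_1, df3], axis=0, ignore_index=True)
print(df_try_2)
```

Note that this relies on df1 and df2 sharing exactly the same (a, b) pairs. With the default inner merge, any pair present in only one of them would be dropped.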



0

Try:

    a = df1.merge(df2[['a','c2']], on='a', how='left')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    l = pd.concat([a, df3])


0

This should work in your case:

    df = pd.merge(pd.merge(df1, df2, how='outer', on=['a', 'b']), df3, how='outer', on=['a', 'b'])
    df.set_index(['a', 'b'], inplace=True)
    df.columns = ['c1', 'c2', 'c3']
    print(df)

Result:

            c1   c2   c3
    a   b
    id1 W  1.0  4.0  NaN
    id2 W  2.0  5.0  NaN
    id3 W  3.0  6.0  NaN
    id1 Q  NaN  NaN  7.0
    id4 Q  NaN  NaN  8.0
    id5 Q  NaN  NaN  9.0
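This keeps df3's values in a separate c3 column, whereas the question wants them under c1. One possible way to fold them back, sketched under the assumption that df1 and df3 never supply c1 for the same (a, b) pair; the `_df3` suffix and the name `folded` are arbitrary choices of mine:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c1': [1, 2, 3]})
df2 = pd.DataFrame({'a': ['id1', 'id2', 'id3'], 'b': ['W', 'W', 'W'], 'c2': [4, 5, 6]})
df3 = pd.DataFrame({'a': ['id1', 'id4', 'id5'], 'b': ['Q', 'Q', 'Q'], 'c1': [7, 8, 9]})

merged = (
    df1.merge(df2, how='outer', on=['a', 'b'])
       # Keep the left c1 under its own name; df3's c1 becomes 'c1_df3'.
       .merge(df3, how='outer', on=['a', 'b'], suffixes=('', '_df3'))
)

# Fill the gaps in c1 with df3's values, then drop the helper column.
merged['c1'] = merged['c1'].combine_first(merged['c1_df3'])
folded = merged.drop(columns='c1_df3').set_index(['a', 'b']).sort_index()
print(folded)
```

The rows come out sorted by the index here, so reorder them afterwards if the W-before-Q order from the question matters.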


0

I think merge is your best shot for that.

    df = df1.combine_first(df2)
    pd.merge(df, df3, on=['a', 'b', 'c1'], how='outer')

This results in the expected output:

         a  b  c1   c2
    0  id1  W   1  4.0
    1  id2  W   2  5.0
    2  id3  W   3  6.0
    3  id1  Q   7  NaN
    4  id4  Q   8  NaN
    5  id5  Q   9  NaN
