I have multiple data frames that I would like to concatenate. Some of them lack certain columns, and the missing values should be filled with NA.
    import pandas as pd

    df1_1 = pd.DataFrame({'id':[1,1,2,2,3,3], 'age':[22,22,55,55,53,53], 'group':1, 'y':[1,2,3,4,5,6]})
    df1_2 = pd.DataFrame({'id':[1,1,2,2,3,3], 'age':[22,22,55,55,53,53], 'group':1, 'w':[7,8,9,10,11,12]})
    df2_1 = pd.DataFrame({'id':[4,4,5,5], 'age':[39,39,54,54], 'group':2, 'y':[1,2,3,4]})
    df2_2 = pd.DataFrame({'id':[4,4,5,5], 'age':[39,39,54,54], 'group':2, 'w':[5,6,7,8]})
    df3 = pd.DataFrame({'id':[1,1,6,6,7,7,8,8], 'age':[23,23,63,63,43,43,25,25], 'group':3, 'w':[1,2,3,4,5,6,7,8]})

Desired output:
    id  age  group   y   w
     1   22      1   1   7
     1   22      1   2   8
     2   55      1   3   9
     2   55      1   4  10
     3   53      1   5  11
     3   53      1   6  12
     4   39      2   1   5
     4   39      2   2   6
     5   54      2   3   7
     5   54      2   4   8
     1   23      3  NA   1
     1   23      3  NA   2
     6   63      3  NA   3
     6   63      3  NA   4
     7   43      3  NA   5
     7   43      3  NA   6
     8   25      3  NA   7
     8   25      3  NA   8

I tried:
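    from functools import reduce

    dfs = [df1_1, df1_2, df2_1, df2_2, df3]

    # Attempt 1: chained outer merges on the shared key columns
    df_merged = reduce(lambda left, right: pd.merge(left, right, on=['id', 'group', 'age'], how='outer'), dfs)

    # Attempt 2: row-wise concatenation, keeping the union of columns
    df_merged = pd.concat(dfs, join='outer', axis=0)

but neither attempt produced the desired output.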
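One direction that might work is sketched below. It rests on an assumption the data suggests but that is not stated anywhere above: rows sharing the same `id`, `age`, and `group` pair up by their order of appearance within each frame (first `y` with first `w`, and so on). The helper column `n` is a hypothetical name introduced only to carry that pairing.

```python
import pandas as pd

df1_1 = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3], 'age': [22, 22, 55, 55, 53, 53], 'group': 1, 'y': [1, 2, 3, 4, 5, 6]})
df1_2 = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3], 'age': [22, 22, 55, 55, 53, 53], 'group': 1, 'w': [7, 8, 9, 10, 11, 12]})
df2_1 = pd.DataFrame({'id': [4, 4, 5, 5], 'age': [39, 39, 54, 54], 'group': 2, 'y': [1, 2, 3, 4]})
df2_2 = pd.DataFrame({'id': [4, 4, 5, 5], 'age': [39, 39, 54, 54], 'group': 2, 'w': [5, 6, 7, 8]})
df3 = pd.DataFrame({'id': [1, 1, 6, 6, 7, 7, 8, 8], 'age': [23, 23, 63, 63, 43, 43, 25, 25], 'group': 3, 'w': [1, 2, 3, 4, 5, 6, 7, 8]})

dfs = [df1_1, df1_2, df2_1, df2_2, df3]
keys = ['id', 'age', 'group']

# Assumption: within each frame, the k-th row of a given (id, age, group)
# belongs with the k-th row of the same key in the other frames.
# 'n' (hypothetical helper column) numbers those repeats 0, 1, ...
numbered = [d.assign(n=d.groupby(keys).cumcount()) for d in dfs]

# Stack everything; columns a frame lacks are filled with NaN.
stacked = pd.concat(numbered, ignore_index=True)

# Collapse rows that share (id, age, group, n): first() keeps the first
# non-null value per column, so y and w end up on the same row.
# sort=False preserves the order in which the keys first appear.
out = (
    stacked.groupby(keys + ['n'], sort=False, as_index=False)
    .first()
    .drop(columns='n')[['id', 'age', 'group', 'y', 'w']]
)

print(out)
```

With this approach `y` and `w` come back as float columns because of the NaN values introduced during stacking; casting them to pandas' nullable `Int64` dtype is one option if integer display matters. If the rows are not meant to pair by position, a different key would be needed in place of `n`.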