1

I have three dataframes

Dataframe df1:

date A 0 2022-04-11 1 1 2022-04-12 2 2 2022-04-14 26 3 2022-04-16 2 4 2022-04-17 1 5 2022-04-20 17 6 2022-04-21 14 7 2022-04-22 1 8 2022-04-23 9 9 2022-04-24 1 10 2022-04-25 5 11 2022-04-26 2 12 2022-04-27 21 13 2022-04-28 9 14 2022-04-29 17 15 2022-04-30 5 16 2022-05-01 8 17 2022-05-07 1241217 18 2022-05-08 211 19 2022-05-09 1002521 20 2022-05-10 488739 21 2022-05-11 12925 22 2022-05-12 57 23 2022-05-13 8515098 24 2022-05-14 1134576 

Dateframe df2:

date B 0 2022-04-12 8 1 2022-04-14 7 2 2022-04-16 2 3 2022-04-19 2 4 2022-04-23 2 5 2022-05-07 2 6 2022-05-08 5 7 2022-05-09 2 8 2022-05-14 1 

Dataframe df3:

date C 0 2022-04-12 6 1 2022-04-13 1 2 2022-04-14 2 3 2022-04-20 3 4 2022-04-21 9 5 2022-04-22 25 6 2022-04-23 56 7 2022-04-24 49 8 2022-04-25 68 9 2022-04-26 71 10 2022-04-27 40 11 2022-04-28 44 12 2022-04-29 27 13 2022-04-30 34 14 2022-05-01 28 15 2022-05-07 9 16 2022-05-08 20 17 2022-05-09 24 18 2022-05-10 21 19 2022-05-11 8 20 2022-05-12 8 21 2022-05-13 14 22 2022-05-14 25 23 2022-05-15 43 24 2022-05-16 36 25 2022-05-17 29 26 2022-05-18 28 27 2022-05-19 17 28 2022-05-20 6 

I would like to merge df1, df2, df3 in a single dataframe with columns date, A, B, C, in such a way that date contains all dates which appeared in df1 and/or df2 and/or df3 (without repetition), and if a particular date was not in any of the dataframes, then for the respective column I put value 0.0. So, I would like to have something like that:

date A B C 0 2022-04-11 1.0 0.0 0.0 1 2022-08-12 2.0 8.0 6.0 2 2022-08-13 0.0 0.0 1.0 ... 

I tried to use this method

merge1 = pd.merge(df1, df2, how='outer') sorted_merge1 = merge1.sort_values(by=['date'], ascending=False) full_merge = pd.merge(sorted_merg1, df3, how='outer') 

However, it seems it skips the dates which are not common for all three dataframes.

4
  • merge on is missing? Commented Aug 5, 2022 at 9:52
  • 1
    Does this answer your question? Merge multiple DataFrames Pandas Commented Aug 5, 2022 at 9:57
  • please try adding on=['date'] in the parameters of both merges. Commented Aug 5, 2022 at 9:58
  • Refactoring the accepted answer from the suggested duplicate (Merge multiple DataFrames Pandas): dfs = [df.set_index(['date']) for df in [df1, df2, df3]] and then pd.concat(dfs,axis=1).reset_index().fillna(0). You could of course also create a df.merge() chain (as here, but that won't scale. Commented Aug 5, 2022 at 9:58

1 Answer 1

1

Try this,

print(pd.merge(df1, df2, on='date', how='outer').merge(df3, on='date', how='outer').fillna(0)) 

O/P:

 date A B C 0 2022-04-11 1.0 0.0 0.0 1 2022-04-12 2.0 8.0 6.0 2 2022-04-14 26.0 7.0 2.0 3 2022-04-16 2.0 2.0 0.0 4 2022-04-17 1.0 0.0 0.0 5 2022-04-20 17.0 0.0 3.0 6 2022-04-21 14.0 0.0 9.0 7 2022-04-22 1.0 0.0 25.0 8 2022-04-23 9.0 2.0 56.0 9 2022-04-24 1.0 0.0 49.0 10 2022-04-25 5.0 0.0 68.0 11 2022-04-26 2.0 0.0 71.0 12 2022-04-27 21.0 0.0 40.0 13 2022-04-28 9.0 0.0 44.0 14 2022-04-29 17.0 0.0 27.0 15 2022-04-30 5.0 0.0 34.0 16 2022-05-01 8.0 0.0 28.0 17 2022-05-07 1241217.0 2.0 9.0 18 2022-05-08 211.0 5.0 20.0 19 2022-05-09 1002521.0 2.0 24.0 20 2022-05-10 488739.0 0.0 21.0 21 2022-05-11 12925.0 0.0 8.0 22 2022-05-12 57.0 0.0 8.0 23 2022-05-13 8515098.0 0.0 14.0 24 2022-05-14 1134576.0 1.0 25.0 25 2022-04-19 0.0 2.0 0.0 26 2022-04-13 0.0 0.0 1.0 27 2022-05-15 0.0 0.0 43.0 28 2022-05-16 0.0 0.0 36.0 29 2022-05-17 0.0 0.0 29.0 30 2022-05-18 0.0 0.0 28.0 31 2022-05-19 0.0 0.0 17.0 32 2022-05-20 0.0 0.0 6.0 ​ 
  • perform merge chain and fill NaN with 0
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.