Python: Pandas merge three dataframes on date, keeping all dates [duplicate]

Question

I have three dataframes

Dataframe df1:

date A 0 2022-04-11 1 1 2022-04-12 2 2 2022-04-14 26 3 2022-04-16 2 4 2022-04-17 1 5 2022-04-20 17 6 2022-04-21 14 7 2022-04-22 1 8 2022-04-23 9 9 2022-04-24 1 10 2022-04-25 5 11 2022-04-26 2 12 2022-04-27 21 13 2022-04-28 9 14 2022-04-29 17 15 2022-04-30 5 16 2022-05-01 8 17 2022-05-07 1241217 18 2022-05-08 211 19 2022-05-09 1002521 20 2022-05-10 488739 21 2022-05-11 12925 22 2022-05-12 57 23 2022-05-13 8515098 24 2022-05-14 1134576

Dateframe df2:

date B 0 2022-04-12 8 1 2022-04-14 7 2 2022-04-16 2 3 2022-04-19 2 4 2022-04-23 2 5 2022-05-07 2 6 2022-05-08 5 7 2022-05-09 2 8 2022-05-14 1

Dataframe df3:

date C 0 2022-04-12 6 1 2022-04-13 1 2 2022-04-14 2 3 2022-04-20 3 4 2022-04-21 9 5 2022-04-22 25 6 2022-04-23 56 7 2022-04-24 49 8 2022-04-25 68 9 2022-04-26 71 10 2022-04-27 40 11 2022-04-28 44 12 2022-04-29 27 13 2022-04-30 34 14 2022-05-01 28 15 2022-05-07 9 16 2022-05-08 20 17 2022-05-09 24 18 2022-05-10 21 19 2022-05-11 8 20 2022-05-12 8 21 2022-05-13 14 22 2022-05-14 25 23 2022-05-15 43 24 2022-05-16 36 25 2022-05-17 29 26 2022-05-18 28 27 2022-05-19 17 28 2022-05-20 6

I would like to merge df1, df2, df3 in a single dataframe with columns date, A, B, C, in such a way that date contains all dates which appeared in df1 and/or df2 and/or df3 (without repetition), and if a particular date was not in any of the dataframes, then for the respective column I put value 0.0. So, I would like to have something like that:

date A B C 0 2022-04-11 1.0 0.0 0.0 1 2022-08-12 2.0 8.0 6.0 2 2022-08-13 0.0 0.0 1.0 ...

I tried to use this method

merge1 = pd.merge(df1, df2, how='outer') sorted_merge1 = merge1.sort_values(by=['date'], ascending=False) full_merge = pd.merge(sorted_merg1, df3, how='outer')

However, it seems it skips the dates which are not common for all three dataframes.

Does this answer your question? Merge multiple DataFrames Pandas — ouroboros1
– ouroboros1, Commented Aug 5, 2022 at 9:57
please try adding on=['date'] in the parameters of both merges. — Triki Sadok
– Triki Sadok, Commented Aug 5, 2022 at 9:58
Refactoring the accepted answer from the suggested duplicate (Merge multiple DataFrames Pandas): dfs = [df.set_index(['date']) for df in [df1, df2, df3]] and then pd.concat(dfs,axis=1).reset_index().fillna(0). You could of course also create a df.merge() chain (as here, but that won't scale. — ouroboros1
– ouroboros1, Commented Aug 5, 2022 at 9:58

Mohamed Thasin ah · Accepted Answer · 2022-08-05 09:56:26Z

Try this,

print(pd.merge(df1, df2, on='date', how='outer').merge(df3, on='date', how='outer').fillna(0))

O/P:

 date A B C 0 2022-04-11 1.0 0.0 0.0 1 2022-04-12 2.0 8.0 6.0 2 2022-04-14 26.0 7.0 2.0 3 2022-04-16 2.0 2.0 0.0 4 2022-04-17 1.0 0.0 0.0 5 2022-04-20 17.0 0.0 3.0 6 2022-04-21 14.0 0.0 9.0 7 2022-04-22 1.0 0.0 25.0 8 2022-04-23 9.0 2.0 56.0 9 2022-04-24 1.0 0.0 49.0 10 2022-04-25 5.0 0.0 68.0 11 2022-04-26 2.0 0.0 71.0 12 2022-04-27 21.0 0.0 40.0 13 2022-04-28 9.0 0.0 44.0 14 2022-04-29 17.0 0.0 27.0 15 2022-04-30 5.0 0.0 34.0 16 2022-05-01 8.0 0.0 28.0 17 2022-05-07 1241217.0 2.0 9.0 18 2022-05-08 211.0 5.0 20.0 19 2022-05-09 1002521.0 2.0 24.0 20 2022-05-10 488739.0 0.0 21.0 21 2022-05-11 12925.0 0.0 8.0 22 2022-05-12 57.0 0.0 8.0 23 2022-05-13 8515098.0 0.0 14.0 24 2022-05-14 1134576.0 1.0 25.0 25 2022-04-19 0.0 2.0 0.0 26 2022-04-13 0.0 0.0 1.0 27 2022-05-15 0.0 0.0 43.0 28 2022-05-16 0.0 0.0 36.0 29 2022-05-17 0.0 0.0 29.0 30 2022-05-18 0.0 0.0 28.0 31 2022-05-19 0.0 0.0 17.0 32 2022-05-20 0.0 0.0 6.0

perform merge chain and fill NaN with 0

Collectives™ on Stack Overflow

Python: Pandas merge three dataframes on date, keeping all dates [duplicate]

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related