I have three dataframes I would like where I want to merge or join them based on the month column/field, then group by title.
df1:
Month Year TotalNumberofStreams TitleSortName 9 2018 1529 Movie A 9 2018 368 Movie B 1 2018 703 Movie C 1 2018 2278 Movie D 1 2018 382 Movie E df2:
Month Year video_view TitleSortName 9 2018 3 Movie A 9 2018 6 Movie B 3 2017 9 Movie C 3 2017 4 Movie D 3 2017 3 Movie E df3:
Month Year Views TitleSortName 9 2018 243 Movie A 9 2018 156 Movie B 9 2018 133 Movie C Desired Output:
Month Year Views video_view views TotalNumberofStreams TitleSortName 9 2018 NaN NaN NaN 1529 Movie A 9 2018 NaN 3 NaN NaN Movie A 9 2018 243 NaN NaN NaN Movie A Attempts:
I tried merging based on TitleSortName, with this code here:
merge=df1.merge(df2, how='outer',left_on='TitleSortName',right_on='TitleSortName') however this returns duplicates, and a lot of data that makes me do even more cleaning.
I also attempted to join based on month:
join_df = df1.join(df2.set_index('Month'),on='Month') this returns Value Error: Pandas join issue: columns overlap but no suffix specified
Im looking through different articles online, and I see maybe I can use a for loop to iterate through the month column and save the rows to a list that are alike and return a the rows how I desire, as well as lambda join functions, for example a:
lambda x: "/" .join(x), based on the desired columns is there an easier way to do this or any way to achieve the result i want at all?
from functools import reduce df = reduce(lambda left,right: pd.merge(left,right,on=['Month','Year','TitleSortName']), [df1, df2, df3])? If yes, then it is dupeNaNand why only Movie A 3 times?