0

I have three dataframes I would like where I want to merge or join them based on the month column/field, then group by title.

df1:

Month Year TotalNumberofStreams TitleSortName 9 2018 1529 Movie A 9 2018 368 Movie B 1 2018 703 Movie C 1 2018 2278 Movie D 1 2018 382 Movie E 

df2:

Month Year video_view TitleSortName 9 2018 3 Movie A 9 2018 6 Movie B 3 2017 9 Movie C 3 2017 4 Movie D 3 2017 3 Movie E 

df3:

 Month Year Views TitleSortName 9 2018 243 Movie A 9 2018 156 Movie B 9 2018 133 Movie C 

Desired Output:

Month Year Views video_view views TotalNumberofStreams TitleSortName 9 2018 NaN NaN NaN 1529 Movie A 9 2018 NaN 3 NaN NaN Movie A 9 2018 243 NaN NaN NaN Movie A 

Attempts:

I tried merging based on TitleSortName, with this code here:

merge=df1.merge(df2, how='outer',left_on='TitleSortName',right_on='TitleSortName') 

however this returns duplicates, and a lot of data that makes me do even more cleaning.

I also attempted to join based on month:

join_df = df1.join(df2.set_index('Month'),on='Month') 

this returns Value Error: Pandas join issue: columns overlap but no suffix specified

Im looking through different articles online, and I see maybe I can use a for loop to iterate through the month column and save the rows to a list that are alike and return a the rows how I desire, as well as lambda join functions, for example a:

lambda x: "/" .join(x), based on the desired columns 

is there an easier way to do this or any way to achieve the result i want at all?

2
  • Do you need from functools import reduce df = reduce(lambda left,right: pd.merge(left,right,on=['Month','Year','TitleSortName']), [df1, df2, df3]) ? If yes, then it is dupe Commented Sep 18, 2019 at 4:59
  • 1
    Your desired output makes no sense. Why are there so many NaN and why only Movie A 3 times? Commented Sep 18, 2019 at 5:02

1 Answer 1

1

Your group-by doesnt make sense. But for the merge, you can do this.

 df1 = pd.DataFrame(np.array([ [9, 2018, 1529,'A'], [9,2018, 368, 'B'], [1,2018, 703, 'C'], [1,2018,2278,'D']]), columns=['Month', 'Year', 'TotalNumberOfStreams','Title']) df2 = pd.DataFrame(np.array([ [9,2018, 3, 'A'], [9,2018, 6, 'B'], [3,2017,5, 'C']]), columns=['Month', 'Year', 'Video Views','Title']) df3 = pd.DataFrame(np.array([ [9,2018,243,'A'], [9,2018,156,'B']]), columns=['Month', 'Year', 'Total Views','Title']) merged_df=df1.merge(df2,on=['Month','Year','Title']).merge(df3,on=['Month','Year','Title']) merged_df Out[32]: Month Year TotalNumberOfStreams Title Video Views Total Views 0 9 2018 1529 A 3 243 1 9 2018 368 B 6 156 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.