
I have a data frame df:

     fruit      date  volume
 0   apple  20141001    2000
 1   apple  20141101    1800
 2   apple  20141201    2200
 3  orange  20141001    1900
 4  orange  20141101    2000
 5  orange  20141201    3000
 ...

and I have the following two data frames:

apple:

        date  price
 0  20141001    2
 1  20141101    2.5
 2  20141201    3

orange:

        date  price
 0  20141001    1.5
 1  20141101    2
 2  20141201    2

How can I merge all of these into the following data frame?

     fruit      date  price  volume
 0   apple  20141001    2      2000
 1   apple  20141101    2.5    1800
 2   apple  20141201    3      2200
 3  orange  20141001    1.5    1900
 4  orange  20141101    2      2000
 5  orange  20141201    2      3000
 ...

This is just an example; in my real work I have hundreds of 'fruit' whose price data need to be merged into the first data frame.

Should I use merge or join? What is the difference between them? Thank you.

  • join uses merge underneath. As a rule of thumb: if you are going to join on indices, use join; if not, use merge. – Commented Feb 3, 2015 at 13:25
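The rule of thumb in the comment above can be illustrated with a minimal sketch (the frames and values here are made up for illustration): join aligns on the index by default, while merge aligns on columns.

```python
import pandas as pd

# hypothetical frames for illustration, keyed by date
left = pd.DataFrame({'volume': [2000, 1800]}, index=[20141001, 20141101])
right = pd.DataFrame({'price': [2.0, 2.5]}, index=[20141001, 20141101])

# join: aligns the two frames on their shared index
joined = left.join(right)

# merge: aligns on columns, so first promote the index to a 'date' column
merged = (left.rename_axis('date').reset_index()
              .merge(right.rename_axis('date').reset_index(), on='date'))
```

Both produce the same rows here; the difference is only where the alignment key lives (index vs. column).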

1 Answer


For your sample data you can achieve what you want by performing concat twice; this assumes that the last 2 dfs align with the master df. The inner concat concatenates the 2 supplementary dfs into a single df row-wise, and the outer concat concatenates column-wise:

In [56]:
# this concats the 2 supplementary dfs row-wise into a single df
pd.concat([df1, df2], ignore_index=True)
Out[56]:
       date  price
0  20141001    2.0
1  20141101    2.5
2  20141201    3.0
3  20141001    1.5
4  20141101    2.0
5  20141201    2.0

In [54]:
# now concat column-wise with the main df
pd.concat([df, pd.concat([df1, df2], ignore_index=True)], axis=1)
Out[54]:
    fruit      date  volume      date  price
0   apple  20141001    2000  20141001    2.0
1   apple  20141101    1800  20141101    2.5
2   apple  20141201    2200  20141201    3.0
3  orange  20141001    1900  20141001    1.5
4  orange  20141101    2000  20141101    2.0
5  orange  20141201    3000  20141201    2.0

However, for your real data you will need to add the price column for each fruit:

In [55]:
df[df['fruit'] == 'apple'].merge(df1, on='date')
Out[55]:
   fruit      date  volume  price
0  apple  20141001    2000    2.0
1  apple  20141101    1800    2.5
2  apple  20141201    2200    3.0

and repeat for each fruit.

An approach to your real data problem would be to add a 'fruit' column to each supplemental df, concatenate all these and then merge back using 'fruit' and 'date' columns as the keys:

In [57]:
df1['fruit'] = 'apple'
df2['fruit'] = 'orange'
fruit_df = pd.concat([df1, df2], ignore_index=True)
fruit_df
Out[57]:
       date  price   fruit
0  20141001    2.0   apple
1  20141101    2.5   apple
2  20141201    3.0   apple
3  20141001    1.5  orange
4  20141101    2.0  orange
5  20141201    2.0  orange

In [58]:
df.merge(fruit_df, on=['fruit', 'date'])
Out[58]:
    fruit      date  volume  price
0   apple  20141001    2000    2.0
1   apple  20141101    1800    2.5
2   apple  20141201    2200    3.0
3  orange  20141001    1900    1.5
4  orange  20141101    2000    2.0
5  orange  20141201    3000    2.0

2 Comments

Is it possible to use groupby?
@user6396 You could group by fruit and date and then set the price. Your problem, though, is that if you have lots of dfs with no identifier in them (i.e. no fruit column), then you can't merge them back without setting all the rows that match the dates to the same price. What you could do is add the fruit column to each supplementary df, concatenate all these supplementary dfs, and then merge them back using the fruit and date columns as the keys; that is how I would approach this.
