10

I am trying to join two data frames. They look like this:

DF1 =
ID  COUNTRY  YEAR  V1  V2  V3  V4
12  USA      2012  x   y   z   a
13  USA      2013  x   y   z   a
14  RUSSIA   2012  x   y   z   a

DF2 =
ID  COUNTRY  YEAR  TRACT
9   USA      2000  A
13  USA      2013  B

The desired end goal is:

DF3 =
ID  COUNTRY  YEAR  V1  V2  V3  V4  TRACT
9   USA      2000                  A
12  USA      2012  x   y   z   a
13  USA      2013  x   y   z   a   B
14  RUSSIA   2012  x   y   z   a

I've been trying to use pd.merge and the .join function with the how='outer' setting, without success:

df3 = pd.merge(df1,df2,how='outer',left_on=['ID','Country','Year'],right_on=['ID',"Country","Year"]) 
1
  • Other than your ID column, what you have should work. What is your merge giving you? Commented Feb 21, 2015 at 11:46

2 Answers

15

Try this:

df1.merge(df2, how='outer', on=['ID', 'COUNTRY', 'YEAR'])

(the column names should be in caps based on your input tables)
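To make the suggestion concrete, here is a minimal sketch that rebuilds the two frames from the question and runs the outer merge on the upper-case keys. The V1 to V4 values are the placeholders from the question, and the final sort by ID is only there to match the row order of the desired DF3.

```python
import pandas as pd

# Sample frames rebuilt from the question; x/y/z/a are placeholder values.
df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x"] * 3,
    "V2": ["y"] * 3,
    "V3": ["z"] * 3,
    "V4": ["a"] * 3,
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

# The key names must match the actual column labels exactly (case-sensitive).
keys = ["ID", "COUNTRY", "YEAR"]

# Outer merge keeps rows found in either frame; unmatched cells become NaN.
df3 = (
    df1.merge(df2, how="outer", on=keys)
       .sort_values("ID")
       .reset_index(drop=True)
)
print(df3)
```

Rows that exist in only one frame (ID 9, 12 and 14 here) are kept, with NaN in the columns that come from the other frame.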


2

Have you tried

df1.join(df2) 

You can add parameters later, but it should work.
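One caveat worth spelling out: DataFrame.join aligns on the index by default, and with the frames as posted the shared ID, COUNTRY and YEAR columns collide, so a bare df1.join(df2) raises a ValueError about overlapping columns. A sketch of one way around that, assuming the three key columns identify a row, is to move them into the index first. The frames below are trimmed versions of the question's data.

```python
import pandas as pd

# Trimmed versions of the question's frames (only V1 kept for brevity).
df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x", "x", "x"],
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

# A plain join fails because both frames share column names.
overlap_error = None
try:
    df1.join(df2)
except ValueError as exc:
    overlap_error = str(exc)

# Move the key columns into the index so join can align on them.
keys = ["ID", "COUNTRY", "YEAR"]
joined = (
    df1.set_index(keys)
       .join(df2.set_index(keys), how="outer")
       .reset_index()
)
print(joined)
```

For a join on ordinary columns like this, merge with on= is usually the more direct tool; join is most convenient when the keys already live in the index.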

1 Comment

I had a similar problem that turned out to be pandas not correctly typing the index field. In read_csv, I set the dtype of the index field to str, but forgot to set engine='c', so it didn't work. RAM immediately maxed out and the machine locked up until it threw a memory error several hours later. Run time after fixing was about 2 minutes, including writing a 60 MB file. Pandas should at least throw a warning, but doesn't.
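As a rough illustration of the typing point, the sketch below reads two small in-memory CSVs with explicit dtypes for the key columns and checks that the key dtypes agree before merging. The CSV contents and the key_dtypes mapping are made up for the example; they are not the commenter's data or settings.

```python
import io

import pandas as pd

# Hypothetical CSV contents; note the leading zero in "09".
csv_a = io.StringIO("ID,COUNTRY,YEAR,V1\n12,USA,2012,x\n13,USA,2013,x\n")
csv_b = io.StringIO("ID,COUNTRY,YEAR,TRACT\n09,USA,2000,A\n13,USA,2013,B\n")

# Declare key dtypes up front instead of relying on type inference.
key_dtypes = {"ID": str, "COUNTRY": str, "YEAR": int}
a = pd.read_csv(csv_a, dtype=key_dtypes)
b = pd.read_csv(csv_b, dtype=key_dtypes)

# Fail early if any key column ended up with different dtypes.
keys = list(key_dtypes)
mismatched = [k for k in keys if a[k].dtype != b[k].dtype]
if mismatched:
    raise TypeError(f"key dtypes differ for columns: {mismatched}")

merged = a.merge(b, how="outer", on=keys)
print(merged)
```

Reading ID as a string also preserves values such as "09", which would otherwise be parsed as the integer 9.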
