pandas combining dataframe

Question

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import pickle

    java = pickle.load(open('JavaSafe.p','rb'))      ##import 2d array
    python = pickle.load(open('PythonSafe.p','rb'))  ##import 2d array

    javaFrame = pd.DataFrame(java,columns=['Town','Java Jobs'])
    pythonFrame = pd.DataFrame(python,columns=['Town','Python Jobs'])

    javaFrame = javaFrame.sort_values(by='Java Jobs',ascending=False)
    pythonFrame = pythonFrame.sort_values(by='Python Jobs',ascending=False)

    print(javaFrame,"\n",pythonFrame)

This code produces the following output:

                 Town  Java Jobs
    435       York,NY       3593
    212    NewYork,NY       3585
    584    Seattle,WA       2080
    624    Chicago,IL       1920
    301     Boston,MA       1571
    ...
    79     Holland,MI          5
    38   Manhattan,KS          5
    497     Vernon,IL          5
    30     Clayton,MO          5
    90    Waukegan,IL          5

    [653 rows x 2 columns]

                 Town  Python Jobs
    160    NewYork,NY         2949
    11        York,NY         2938
    349    Seattle,WA         1321
    91     Chicago,IL         1312
    167     Boston,MA         1117
    383    Hanover,NH            5
    209   Bulverde,TX            5
    203  Salisbury,NC            5
    67    Rockford,IL            5
    256    Ventura,CA            5

    [416 rows x 2 columns]

I want to make a new DataFrame that uses the town names as its index and has one column each for Java and Python. However, some of the towns will only have results for one of the languages.

Given your original code, you could also do: result = pd.merge(pythonFrame, javaFrame, on='Town', how='outer').set_index('Town')
– tipanverella, Commented Jun 3, 2016 at 19:11

unutbu · Accepted Answer · 2016-06-04 00:51:26Z

    import pandas as pd

    javaFrame = pd.DataFrame(
        {'Java Jobs': [3593, 3585, 2080, 1920, 1571, 5, 5, 5, 5, 5],
         'Town': ['York,NY', 'NewYork,NY', 'Seattle,WA', 'Chicago,IL', 'Boston,MA',
                  'Holland,MI', 'Manhattan,KS', 'Vernon,IL', 'Clayton,MO', 'Waukegan,IL']},
        index=[435, 212, 584, 624, 301, 79, 38, 497, 30, 90])

    pythonFrame = pd.DataFrame(
        {'Python Jobs': [2949, 2938, 1321, 1312, 1117, 5, 5, 5, 5, 5],
         'Town': ['NewYork,NY', 'York,NY', 'Seattle,WA', 'Chicago,IL', 'Boston,MA',
                  'Hanover,NH', 'Bulverde,TX', 'Salisbury,NC', 'Rockford,IL', 'Ventura,CA']},
        index=[160, 11, 349, 91, 167, 383, 209, 203, 67, 256])

    result = pd.merge(javaFrame, pythonFrame, how='outer').set_index('Town')
    #               Java Jobs  Python Jobs
    # Town
    # York,NY          3593.0       2938.0
    # NewYork,NY       3585.0       2949.0
    # Seattle,WA       2080.0       1321.0
    # Chicago,IL       1920.0       1312.0
    # Boston,MA        1571.0       1117.0
    # Holland,MI          5.0          NaN
    # Manhattan,KS        5.0          NaN
    # Vernon,IL           5.0          NaN
    # Clayton,MO          5.0          NaN
    # Waukegan,IL         5.0          NaN
    # Hanover,NH          NaN          5.0
    # Bulverde,TX         NaN          5.0
    # Salisbury,NC        NaN          5.0
    # Rockford,IL         NaN          5.0
    # Ventura,CA          NaN          5.0

By default, pd.merge joins two DataFrames on all the columns they have in common. In this case, javaFrame and pythonFrame share only the Town column, so pd.merge joins the two DataFrames on Town.
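To make the default concrete, here is a minimal sketch with two tiny stand-in frames (the variable names `java` and `python` and the three towns are illustrative, taken from the sample data above). It checks which columns the frames share and compares the implicit join with one that names the key explicitly through `on='Town'`.

```python
import pandas as pd

# Small stand-ins for javaFrame and pythonFrame (illustrative data only)
java = pd.DataFrame({'Town': ['York,NY', 'Holland,MI'],
                     'Java Jobs': [3593, 5]})
python = pd.DataFrame({'Town': ['York,NY', 'Hanover,NH'],
                       'Python Jobs': [2938, 5]})

# Columns present in both frames; these are what pd.merge joins on by default
shared = java.columns.intersection(python.columns)

# Implicit key (all shared columns) versus an explicitly named key
implicit = pd.merge(java, python, how='outer')
explicit = pd.merge(java, python, on='Town', how='outer')

# With a single shared column the two calls produce the same frame
same = implicit.equals(explicit)
print(list(shared), same)
```

Naming the key with `on='Town'` is arguably the safer habit: if both frames later gain another column with the same name, the implicit form would silently start joining on that column too.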

how='outer' causes pd.merge to use the union of the keys from both frames. In other words, it causes pd.merge to return rows whose data come from either javaFrame or pythonFrame, even if only one of the DataFrames contains the Town. Missing data is filled with NaNs.
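As a rough illustration of the union behaviour, the sketch below contrasts how='outer' with how='inner' on small stand-in frames (the names `java` and `python` and the data are illustrative). It also passes `indicator=True`, which adds a `_merge` column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

# Small stand-ins for javaFrame and pythonFrame (illustrative data only)
java = pd.DataFrame({'Town': ['York,NY', 'Holland,MI'],
                     'Java Jobs': [3593, 5]})
python = pd.DataFrame({'Town': ['York,NY', 'Hanover,NH'],
                       'Python Jobs': [2938, 5]})

# Outer join: union of the Town keys; indicator=True adds a '_merge' column
outer = pd.merge(java, python, on='Town', how='outer', indicator=True)

# Inner join: only towns present in both frames survive
inner = pd.merge(java, python, on='Town', how='inner')

# Map each town to where its row came from
origin = outer.set_index('Town')['_merge'].astype(str).to_dict()
print(origin)
print(len(outer), len(inner))
```

Because NaN is a floating-point value, the integer job counts are upcast to float in the outer result, which is why the merged output above shows values such as 3593.0.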

result = pd.merge(javaFrame, pythonFrame, how='outer').set_index('Town') is, I think, what they are expecting!

piRSquared · Accepted Answer · 2016-06-03 23:25:17Z

Use pd.concat

    df = pd.concat([df.set_index('Town') for df in [javaFrame, pythonFrame]], axis=1)

                  Java Jobs  Python Jobs
    Boston,MA        1571.0       1117.0
    Bulverde,TX         NaN          5.0
    Chicago,IL       1920.0       1312.0
    Clayton,MO          5.0          NaN
    Hanover,NH          NaN          5.0
    Holland,MI          5.0          NaN
    Manhattan,KS        5.0          NaN
    NewYork,NY       3585.0       2949.0
    Rockford,IL         NaN          5.0
    Salisbury,NC        NaN          5.0
    Seattle,WA       2080.0       1321.0
    Ventura,CA          NaN          5.0
    Vernon,IL           5.0          NaN
    Waukegan,IL         5.0          NaN
    York,NY          3593.0       2938.0
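If the NaNs are unwanted, one possible follow-up is sketched below on small stand-in frames (the names `java`, `python`, `combined`, and `counts` are illustrative). It assumes that a town missing from one source means zero postings for that language, which the question does not state, so treat the `fillna(0)` step as optional.

```python
import pandas as pd

# Small stand-ins for javaFrame and pythonFrame (illustrative data only)
java = pd.DataFrame({'Town': ['York,NY', 'Boston,MA', 'Holland,MI'],
                     'Java Jobs': [3593, 1571, 5]})
python = pd.DataFrame({'Town': ['York,NY', 'Boston,MA', 'Hanover,NH'],
                       'Python Jobs': [2938, 1117, 5]})

# Index both frames by Town, then align them side by side on that index
frames = [frame.set_index('Town') for frame in (java, python)]
combined = pd.concat(frames, axis=1)

# Assumption: a missing town means zero postings, so fill NaN with 0
# and convert the float columns back to integers
counts = combined.fillna(0).astype(int)
print(counts)
```

This approach relies on each Town appearing at most once per frame, since pd.concat with axis=1 aligns rows by index label.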
