1

I have the below table in excel (xls):

Quarter Q1 Q1 Q1 Year 2013 2014 2015 Company A A A PA 500 300 963 NY 200 150 1500 

What I would like to create is the below table by using Python Pandas:

Quarter Year State Company Amount Q1 2013 PA A 500 Q1 2014 PA A 300 Q1 2015 PA A 963 Q1 2013 NY A 200 Q1 2014 NY A 150 Q1 2015 NY A 1500 

Edit: I have used the below code per @jezrael suggestion:

import pandas as p df = p.read_excel('test.xls') df = df.set_index('0') df = df.T.stack().reset_index(name='idx') df.columns = ['Quarter', 'Year', 'Company', 'State', 'Amount'] df = df[['Quarter', 'Year', 'State', 'Company', 'Amount']] df = df.sort_values(by=['State'], ascending=['False']).reset_index(drop=True) print(df) 

I kept getting the below error:

File "pandas\hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12216) KeyError: '0' 

I have also tried with 0, the same error with KeyError: 0

1 Answer 1

1

Very close what you want:

df = pd.read_excel('test.xls', header=None, index_col=0) print df # 1 2 3 #0 #Quarter Q1 Q1 Q1 #Year 2013 2014 2015 #Company A A A #PA 500 300 963 #NY 200 150 1500 #set indexes df = df.T.set_index(['Quarter', 'Year', 'Company']) #stack and reset index df = df.stack().reset_index(name='idx') #rename and reorder columns df.columns = ['Quarter', 'Year', 'Company', 'State', 'Amount'] df = df[['Quarter', 'Year', 'State', 'Company', 'Amount']] #sort by column State df = df.sort_values(by=['State'], ascending=['False']).reset_index(drop=True) print df # Quarter Year State Company Amount #0 Q1 2013 NY A 200 #1 Q1 2014 NY A 150 #2 Q1 2015 NY A 1500 #3 Q1 2013 PA A 500 #4 Q1 2014 PA A 300 #5 Q1 2015 PA A 963 
Sign up to request clarification or add additional context in comments.

6 Comments

instead of df.set_index(0) you need df.set_index('0') at least at python3.4 and pandas 0.17.0
Hi both, I have tried the above code with both 0 and '0' since I have Python 3.5 and I kept getting KeyError: 0 and KeyError: '0' after running both codes. Any advice? Thank you again.
What are column names of input dataframe? You can check it: print df.columns
Here are the column names: Index(['Quarter', 'Q1', 'Q1.1', 'Q1.2'], dtype='object')
Solution was chnged. I add function read_excel to solution and remove problematic df.set_index(0)
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.