2

I have a Pandas dataframe which looks as follows:

Age USA USA USA UK UK UK
Age male female total male female total
2-year-old 2 3 5 1 1 2
3-year-old 8 8 16 7 9 16

In fact I have two row headers (USA + male; USA + female;..)

CSV-File (test.csv):

;USA;USA;USA;UK;UK;UK Age;male;female;total;male;female;total 2-year-old;2;3;5;1;1;2 3-year-old;8;8;16;7;9;16 

My python-code:

df = pd.read_csv('test.csv', delimiter=";", header=[0,1]) df = df.rename(columns={'Unnamed: 0_level_0': 'Age'}) 

How to pivot pandas dataframe to get the following result?

Age Country Gender frequency
2-year-old USA male 2
2-year-old USA female 3
2-year-old UK male 1
2-year-old UK female 1
3-year-old USA male 8
3-year-old USA female 8
3-year-old UK male 7
3-year-old UK female 9

EDIT:

Starting Table:

Kode Country Procedure male male female female
Kode Country Procedure two-year-old three-year-old two-year-old three-year-old
1a US proc_1 4 6 3 6
1a UK proc_1 2 3 5 1
1b US proc_2 15 3 5 2
1b UK proc_2 8 4 7 3

CSV:

Code;Country;Procedure;male;male;female;female Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old 1a;US;proc_1;4;6;3;6 1a;UK;proc_1;2;3;5;1 1b;US;proc_2;15;3;5;2 1b;UK;proc_2;8;4;7;3 

Resulting Table:

Code Country Procedure Gender Age Frequency
1a US proc_1 male two-year-old 4
1a US proc_1 male three-year-old 6
1a US proc_1 female two-year-old 3
1a US proc_1 female three-year-old 6
1a UK proc_1 male two-year-old 2
1a UK proc_1 male three-year-old 3
1a UK proc_1 female two-year-old 5
1a UK proc_1 female three-year-old 1
1b ...

1 Answer 1

1

Use DataFrame.set_index with DataFrame.stack, also if need remove total add drop:

df = (df.drop('total', axis=1, level=1) .set_index(df.columns[0]) .stack([0,1]) .rename_axis(['Age','Country','Gender']) .reset_index(name='frequency')) print (df) Age Country Gender frequency 0 2-year-old UK female 1 1 2-year-old UK male 1 2 2-year-old USA female 3 3 2-year-old USA male 2 4 3-year-old UK female 9 5 3-year-old UK male 7 6 3-year-old USA female 8 7 3-year-old USA male 8 

Or:

df = (df.set_index(df.columns[0]) .stack([0,1]) .rename_axis(['Age','Country','Gender']) .reset_index(name='frequency')) print (df) Age Country Gender frequency 0 2-year-old UK female 1 1 2-year-old UK male 1 2 2-year-old UK total 2 3 2-year-old USA female 3 4 2-year-old USA male 2 5 2-year-old USA total 5 6 3-year-old UK female 9 7 3-year-old UK male 7 8 3-year-old UK total 16 9 3-year-old USA female 8 10 3-year-old USA male 8 11 3-year-old USA total 16 
Sign up to request clarification or add additional context in comments.

3 Comments

Hello jezreal. Could you pls have a look at my edited question. How would I get the required output in this case? Thanks.
@dublin123 - I will be offline now, can you post new question?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.