1

I have a pandas dataframe like this:

COMMIT_ID | FILE_NAME | COMMITTER | CHANGE TYPE ------------------------------------------------------------- 1 | package.json | A | MODIFY 2 | main.js | B | ADD 2 | class.java | B | DELETE 

I want the row values of the file name as column headers and the changetype as the value.

COMMIT_ID | package.json | main.js | class.java | COMMITTER ----------------------------------------------------------------------------- 1 | MODIFY | NONE | NONE | A 2 | NONE | ADD | DELETE | B 

I have tried with pandas.pivot_table but wasn't very successful. Any chance to do this easily?

1 Answer 1

3

I think you need set_index + unstack:

df = df.set_index(['COMMIT_ID','COMMITTER','FILE_NAME'])['CHANGE TYPE'] .unstack() .reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE ADD None 

Solutions with pivot_table - need aggregate function like sum (concatenate strings without separator) or '_'.join (concatenate strings with separator) if duplicates:

print (df) COMMIT_ID FILE_NAME COMMITTER CHANGE TYPE 0 1 package.json A MODIFY 1 2 main.js B ADD 2 2 class.java B DELETE 3 2 class.java B ADD df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='sum').reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETEADD ADD None 

Or:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='_'.join).reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE_ADD ADD None 

Aggregate with first works also, but you can lost duplicates values:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='first').reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE ADD None 

Last for rename columns names add rename_axis:

df = df.rename_axis(None, axis=1) print (df) COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETEADD ADD None 
Sign up to request clarification or add additional context in comments.

1 Comment

Seriously doubt you are a Pandas Robot @jezrael.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.