I think you need set_index + unstack:
df = df.set_index(['COMMIT_ID','COMMITTER','FILE_NAME'])['CHANGE TYPE'] .unstack() .reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE ADD None
Solutions with pivot_table - need aggregate function like sum (concatenate strings without separator) or '_'.join (concatenate strings with separator) if duplicates:
print (df) COMMIT_ID FILE_NAME COMMITTER CHANGE TYPE 0 1 package.json A MODIFY 1 2 main.js B ADD 2 2 class.java B DELETE 3 2 class.java B ADD df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='sum').reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETEADD ADD None
Or:
df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='_'.join).reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE_ADD ADD None
Aggregate with first works also, but you can lost duplicates values:
df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], columns='FILE_NAME', values='CHANGE TYPE', aggfunc='first').reset_index() print (df) FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETE ADD None
Last for rename columns names add rename_axis:
df = df.rename_axis(None, axis=1) print (df) COMMIT_ID COMMITTER class.java main.js package.json 0 1 A None None MODIFY 1 2 B DELETEADD ADD None