How can I use pyspark or pandas to achieve the transformation below? Thanks a lot. The source file is a CSV with the following info:
expected:
I assume that your source CSV file is as follows:
```
 ,Group_1,Group_2
 ,Sub_Group_1,Sub_Group_1
Maturity,7/31/20,7/31/20
0.1,0.2,0.3
0.2,0.3,0.4
```

Note that the first two header rows contain a single space as the name of the first column (this assumption matters for the melt call below).
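If you want to reproduce the steps below, you can create such a file yourself. A minimal sketch (the filename Input.csv is just the name used in the read_csv call below):

```python
# Write the assumed sample CSV to disk so the steps below are reproducible.
# Note the single space before the first comma in the two top header rows.
sample = (
    " ,Group_1,Group_2\n"
    " ,Sub_Group_1,Sub_Group_1\n"
    "Maturity,7/31/20,7/31/20\n"
    "0.1,0.2,0.3\n"
    "0.2,0.3,0.4\n"
)
with open('Input.csv', 'w') as f:
    f.write(sample)
```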
I read it by calling:

```python
import pandas as pd

df = pd.read_csv('Input.csv', header=[0, 1, 2])
```

and the content is:
```
               Group_1     Group_2
           Sub_Group_1 Sub_Group_1
  Maturity     7/31/20     7/31/20
0      0.1         0.2         0.3
1      0.2         0.3         0.4
```

(You should have included this text in your question instead of a picture.)
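If you want to verify the assumption about the space-named levels, you can inspect the parsed column tuples directly; under my assumption the output looks like this:

```python
# The columns form a 3-level MultiIndex; the first column's name is
# (' ', ' ', 'Maturity') if the header cells really contain spaces.
print(df.columns.tolist())
# [(' ', ' ', 'Maturity'),
#  ('Group_1', 'Sub_Group_1', '7/31/20'),
#  ('Group_2', 'Sub_Group_1', '7/31/20')]
```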
The first step is melt:
```python
result = pd.melt(df, id_vars=[(' ', ' ', 'Maturity')],
                 var_name=['Group_Name', 'Sub_Group_Name', 'Date'])
```

Note that in the code above I specified id_vars as a 3-level tuple whose two top elements are spaces (this is why I made the assumption about the initial spaces in the header rows).
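If you would rather not hard-code the spaces (for example, if your real file has empty header cells, which pandas fills with "Unnamed: ..." placeholder names), you can select the id column by position instead. A sketch of this variant:

```python
# Same melt, but the id column is taken positionally, so it works
# whatever the three header levels of the first column actually are.
result = pd.melt(df, id_vars=[df.columns[0]],
                 var_name=['Group_Name', 'Sub_Group_Name', 'Date'])
```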
The result is:
```
  ( , , Maturity) Group_Name Sub_Group_Name     Date value
0             0.1    Group_1    Sub_Group_1  7/31/20   0.2
1             0.2    Group_1    Sub_Group_1  7/31/20   0.3
2             0.1    Group_2    Sub_Group_1  7/31/20   0.3
3             0.2    Group_2    Sub_Group_1  7/31/20   0.4
```

And the only thing left to do is to rename the first column:
```python
result.rename(columns={result.columns[0]: 'Maturity'}, inplace=True)
```

Now, when you print the result, you will get:
```
   Maturity Group_Name Sub_Group_Name     Date value
0       0.1    Group_1    Sub_Group_1  7/31/20   0.2
1       0.2    Group_1    Sub_Group_1  7/31/20   0.3
2       0.1    Group_2    Sub_Group_1  7/31/20   0.3
3       0.2    Group_2    Sub_Group_1  7/31/20   0.4
```

The row ordering is a bit different from your expected output, but this is the way melt works. If you are unhappy about this detail, reorder the rows according to your needs.
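Since you also mentioned pyspark: Spark's CSV reader only supports a single header row, so the simplest route is to do the reshaping in pandas as above and then convert the result. A minimal sketch, assuming you want a Spark DataFrame at the end (the sort step is optional and just illustrates one possible reordering):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Optional: reorder the melted rows, e.g. by Maturity and then Group_Name.
result = result.sort_values(['Maturity', 'Group_Name']).reset_index(drop=True)

# Hand the reshaped pandas DataFrame over to Spark.
sdf = spark.createDataFrame(result)
sdf.show()
```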