1

How can I use pyspark or pandas to achieve the below transformation? Thanks a lot. Source File is csv with following info:

enter image description here

expected:

enter image description here

3
  • can you add a code to create your sample data and what you have tried until now and where you are stuck? Commented Aug 3, 2020 at 3:27
  • the source file is csv file. Commented Aug 3, 2020 at 3:28
  • 1
    Does this answer your question? Unpivot in spark-sql/pyspark Commented Aug 3, 2020 at 3:29

1 Answer 1

1

I assume that your source CSV file is as follows:

 ,Group_1,Group_2 ,Sub_Group_1,Sub_Group_1 Maturity,7/31/20,7/31/20 0.1,0.2,0.3 0.2,0.3,0.4 

Note that:

  • there are initial spaces in first 2 rows (names of 2 top MultiIndex levels for the first column), otherwise read_csv would give them Unnamed... default names,
  • there are actually 3 MultiIndex levels on columns (the third level contains dates).

I read them calling df = pd.read_csv('Input.csv', header=[0,1,2]) and the content is:

 Group_1 Group_2 Sub_Group_1 Sub_Group_1 Maturity 7/31/20 7/31/20 0 0.1 0.2 0.3 1 0.2 0.3 0.4 

(you should have included it instead of a picture).

The first step is melt:

result = pd.melt(df, id_vars=[(' ', ' ', 'Maturity')], var_name=['Group_Name', 'Sub_Group_Name', 'Date']) 

Note that in the above code I specified id_vars as a tuple, with 2 top elements as spaces (this is why I made the assumption about initial spaces).

The result is:

 ( , , Maturity) Group_Name Sub_Group_Name Date value 0 0.1 Group_1 Sub_Group_1 7/31/20 0.2 1 0.2 Group_1 Sub_Group_1 7/31/20 0.3 2 0.1 Group_2 Sub_Group_1 7/31/20 0.3 3 0.2 Group_2 Sub_Group_1 7/31/20 0.4 

And the only thing to do is to rename the first column:

result.rename(columns={result.columns[0]: 'Maturity'}, inplace=True) 

Now, when you print the result, you will get:

 Maturity Group_Name Sub_Group_Name Date value 0 0.1 Group_1 Sub_Group_1 7/31/20 0.2 1 0.2 Group_1 Sub_Group_1 7/31/20 0.3 2 0.1 Group_2 Sub_Group_1 7/31/20 0.3 3 0.2 Group_2 Sub_Group_1 7/31/20 0.4 

The row ordering is a bit different, but this is the way melt works. If you are unhappy about this detail, reorder rows according to your needs.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.