1

How to transpose the following PySpark dataframe?

Following is the pyspark dataframe.

+----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+ |srab|srsbtp|avgm1|avgm2|avgm3|avgm4|avgm4|avgm6|avgm7|avgm8|avgm9| avgm10| avgm11|avgm12| +----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+ |2389| D| null| null| null| null| null| null| null| null| null| null| null| null| |2389| C| null| null| null| null| null| null| null| null| null|54674.1935483871|156820.0| null| +----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+ 

I want to convert above dataframe into the following table

Desired Output:

srab month D C 2389 avgm1 null null 2389 avgm2 null null 2389 avgm3 null null 2389 avgm4 null null 2389 avgm5 null null 2389 avgm6 null null 2389 avgm7 null null 2389 avgm8 null null 2389 avgm9 null null 2389 avgm10 null 54674.19355 2389 avgm11 null 156820 2389 avgm12 null null 

2 Answers 2

1

In Spark SQL, you can unpivot/pivot with union all and conditional aggregation:

select srab, month, max(case when srsbtp = 'D' then avgm1 end) as d, max(case when srsbtp = 'C' then avgm1 end) as c from ( select srab, srsbtp, 'avgm1' as month, avgm1 from mytable union all srab, srsbtp, 'avgm2', avgm2 from mytable union all srab, srsbtp, 'avgm3', avgm3 from mytable ... ) t gorup by srab, month 
Sign up to request clarification or add additional context in comments.

Comments

0

First we can stack the avgm columns to rows, then we can pivot the srsbtp rows to columns.

df.createOrReplaceTempView('table') col_list = ' '.join([f"'{'avgm'+str(i+1)}', {'avgm'+str(i+1)}," for i in range(12)])[:-1] ## col_list is a string ## "'avgm1', avgm1, 'avgm2', avgm2, 'avgm3', avgm3, 'avgm4', avgm4, 'avgm5', avgm5, 'avgm6', avgm6, 'avgm7', avgm7, 'avgm8', avgm8, 'avgm9', avgm9, 'avgm10', avgm10, 'avgm11', avgm11, 'avgm12', avgm12" result = spark.sql(f"select srab, srsbtp, stack(12, {col_list}) as (month, value) from table") \ .groupBy('srab', 'month') \ .pivot('srsbtp') \ .agg(F.sum('value')) \ .orderBy('month') result.show() +----+------+----------------+----+ |srab| month| C| D| +----+------+----------------+----+ |2389| avgm1| null|null| |2389|avgm10|54674.1935483871|null| |2389|avgm11| 156820.0|null| |2389|avgm12| null|null| |2389| avgm2| null|null| |2389| avgm3| null|null| |2389| avgm4| null|null| |2389| avgm5| null|null| |2389| avgm6| null|null| |2389| avgm7| null|null| |2389| avgm8| null|null| |2389| avgm9| null|null| +----+------+----------------+----+ 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.