0

I have a pyspark dataframe in the below format

enter image description hereI

I am tring to create a pyspark dataframe in the below format

enter image description here

I tried doing this by splitting the source dataframe into three by filtering on order value (1,2,3 these are present for the every key) and joined them using a left join using key and was able to get the dataframe in required format.

But is there any better way to solve this in pyspark?

Thanks in advance!

1
  • this might help Commented Dec 28, 2022 at 10:13

1 Answer 1

1

Your ORDER1, ORDER2 and ORDER3 columns are a bit redundant. If you can live without them, you can combine a groupBy and pivot like so:

from pyspark.sql.functions import collect_list df = spark.createDataFrame( [ (1, 1, "ABC"), (1, 2, "BCD"), (1, 3, "DEF"), (2, 1, "EFG"), (2, 2, "GHI"), (2, 3, "HIJ"), (3, 1, "EFG"), (3, 2, "BCD"), (3, 3, "HIJ"), ], ["KEY", "ORDER", "RESP"] ) output = df.groupBy("KEY").pivot("ORDER").agg(collect_list("RESP")) output.show() +---+-----+-----+-----+ |KEY| 1| 2| 3| +---+-----+-----+-----+ | 1|[ABC]|[BCD]|[DEF]| | 2|[EFG]|[GHI]|[HIJ]| | 3|[EFG]|[BCD]|[HIJ]| +---+-----+-----+-----+ 

I'm using collect_list to make sure that if you have more than 1 RESP value for specific combination of KEY and ORDER, that you don't lose it but it just gets put inside of an array. So all the values that you're seeing in this dataframe are arrays (with 1 element for your example).

If you're sure you only have 1 value for a combination of KEY and ORDER and you don't want arrays as values, you can also use first() instead of collect_list():

from pyspark.sql.functions import first ... output = df.groupBy("KEY").pivot("ORDER").agg(first("RESP")) output.show() +---+---+---+---+ |KEY| 1| 2| 3| +---+---+---+---+ | 1|ABC|BCD|DEF| | 2|EFG|GHI|HIJ| | 3|EFG|BCD|HIJ| +---+---+---+---+ 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.