0

i have a requirement to use pivot on below table.

id,week,score 102,1,96 101,1,138 102,1,37 101,1,59 101,2,282 102,2,212 102,2,78 101,2,97 102,3,60 102,3,123 101,3,220 101,3,87 

output

 1 2 3 101 138,59 282,97 220,87 102 96,37 212,78 123,60 

here i need to sort score

i have tried below code but it only works when there is a single record on a particular id

df.groupBy("id").pivot("week").agg(first("score")) 
2
  • 1
    Possible duplicate of How to pivot DataFrame? Commented Dec 28, 2017 at 13:33
  • No shankar, Here i am expecting all records as an array..please check the expected output. Commented Dec 28, 2017 at 17:06

2 Answers 2

3

Instead of first you should be collecting all the values using collect_list, this will give you the result in list

import org.apache.spark.sql.functions._ df.groupBy("id").pivot("week").agg(collect_list("score")).show() 

output :

+---+---------+---------+---------+ |id |1 |2 |3 | +---+---------+---------+---------+ |101|[138, 59]|[282, 97]|[220, 87]| |102|[96, 37] |[212, 78]|[60, 123]| +---+---------+---------+---------+ 
Sign up to request clarification or add additional context in comments.

1 Comment

May i get the above output using sql query in pyspark by using temptable.
1

Equivalent python code for pyspark for the scala answer posted by Prasad Khode is as below

from pyspark.sql import functions as F df.groupBy("id").pivot("week").agg(F.collect_list("score")).show() 

If you check the api document you can see

collect_list(Column e)
Aggregate function: returns a list of objects with duplicates.

You can use collect_set as well which will give you the same output with duplicates removed.

df.groupBy("id").pivot("week").agg(F.collect_set("score")).show() 

api document says the following

collect_set(Column e)
Aggregate function: returns a set of objects with duplicate elements eliminated.

2 Comments

Thanks for your input Ramesh, May i have a chance of getting same output using sql query in pyspark. i have used this query which shows error "select id,list((case when week = 1 then score else null end) as 1,','),list((case when week = 2 then score else null end) as 2,','),list((case when week = 3 then score else null end) as 3,',') from d group by id"
@LUZO, I am afraid I can't be a good help on that as sql queries are nightmares for me. :) you can post another question with much detail explaining your requirements. You will certainly get an answer.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.