0

I am finding it hard to transpose columns in DF. Given below is the base dataframe and the expected output

Student Class Subject Grade Sam 6th Grade Maths A Sam 6th Grade Science A Sam 7th Grade Maths A- Sam 7th Grade Science A Rob 6th Grade Maths A Rob 6th Grade Science A- Rob 7th Grade Maths A- Rob 7th Grade Science B Rob 7th Grade AP A 

Expected output:

Student Class Math_Grade Science_Grade AP_Grade Sam 6th Grade A A Sam 7th Grade A- A Rob 6th Grade A A- Rob 7th Grade A- B A 

Please suggest what is the best way to solve this.

0

2 Answers 2

3

You can group the DataFrame by Student, Class and pivot Subject as follows:

import org.apache.spark.sql.functions._ val df = Seq( ("Sam", "6th Grade", "Maths", "A"), ("Sam", "6th Grade", "Science", "A"), ("Sam", "7th Grade", "Maths", "A-"), ("Sam", "7th Grade", "Science", "A"), ("Rob", "6th Grade", "Maths", "A"), ("Rob", "6th Grade", "Science", "A-"), ("Rob", "7th Grade", "Maths", "A-"), ("Rob", "7th Grade", "Science", "B"), ("Rob", "7th Grade", "AP", "A") ).toDF("Student", "Class", "Subject", "Grade") df. groupBy("Student", "Class").pivot("Subject").agg(first("Grade")). orderBy("Student", "Class"). show // +-------+---------+----+-----+-------+ // |Student| Class| AP|Maths|Science| // +-------+---------+----+-----+-------+ // | Rob|6th Grade|null| A| A-| // | Rob|7th Grade| A| A-| B| // | Sam|6th Grade|null| A| A| // | Sam|7th Grade|null| A-| A| // +-------+---------+----+-----+-------+ 
Sign up to request clarification or add additional context in comments.

Comments

1

Simply you can use pivot and group based on columns.

 case class StudentRecord(Student: String, `Class`: String, Subject: String, Grade: String) val rows = Seq(StudentRecord ("Sam", "6th Grade", "Maths", "A"), StudentRecord ("Sam", "6th Grade", "Science", "A"), StudentRecord ("Sam", "7th Grade", "Maths", "A-"), StudentRecord ("Sam", "7th Grade", "Science", "A"), StudentRecord ("Rob", "6th Grade", "Maths", "A"), StudentRecord ("Rob", "6th Grade", "Science", "A-"), StudentRecord ("Rob", "7th Grade", "Maths", "A-"), StudentRecord ("Rob", "7th Grade", "Science", "B"), StudentRecord ("Rob", "7th Grade", "AP", "A") ).toDF() rows.groupBy("Student", "Class").pivot("Subject").agg(first("Grade")).orderBy(desc("Student"), asc("Class")).show() /** * +-------+---------+----+-----+-------+ * |Student| Class| AP|Maths|Science| * +-------+---------+----+-----+-------+ * | Sam|6th Grade|null| A| A| * | Sam|7th Grade|null| A-| A| * | Rob|6th Grade|null| A| A-| * | Rob|7th Grade| A| A-| B| * +-------+---------+----+-----+-------+ */ 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.