Scala Unpivot Table

Question

SCALA

I have a table with this struct:

FName	SName	Email	Jan 2021	Feb 2021	Mar 2021	Total 2021
Micheal	Scott	[email protected]	4000	5000	3400	50660
Dwight	Schrute	[email protected]	1200	6900	1000	35000
Kevin	Malone	[email protected]	9000	6000	18000	32000

And i want to transform it to:

I tried with 'stack' method but i couldn't get it to work. Thanks

What version of spark & scala are you using?

sparker
– sparker

2022-06-21 15:57:13 +00:00
Commented Jun 21, 2022 at 15:57 — sparker
– sparker, Commented Jun 21, 2022 at 15:57

Leo C · Accepted Answer · 2022-06-21 18:58:32Z

You can flatten the monthly/total columns via explode as shown below:

val df = Seq( ("Micheal", "Scott", "[email protected]", 4000, 5000, 3400, 50660), ("Dwight", "Schrute", "[email protected]", 1200, 6900, 1000, 35000), ("Kevin", "Malone", "[email protected]", 9000, 6000, 18000, 32000) ).toDF("FName","SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021") val moYrCols = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021") // (**) val otherCols = df.columns diff moYrCols val structCols = moYrCols.map{ c => val moYr = split(lit(c), "\\s+") struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value")) } df. withColumn("flattened", explode(array(structCols: _*))). select(otherCols.map(col) :+ $"flattened.*": _*). show /* +-------+-------+------------------+----+-----+-----+ | FName| SName| Email|Year|Month|Value| +-------+-------+------------------+----+-----+-----+ |Micheal| Scott| [email protected]|2021| Jan| 4000| |Micheal| Scott| [email protected]|2021| Feb| 5000| |Micheal| Scott| [email protected]|2021| Mar| 3400| |Micheal| Scott| [email protected]|2021|Total|50660| | Dwight|Schrute|[email protected]|2021| Jan| 1200| | Dwight|Schrute|[email protected]|2021| Feb| 6900| | Dwight|Schrute|[email protected]|2021| Mar| 1000| | Dwight|Schrute|[email protected]|2021|Total|35000| | Kevin| Malone| [email protected]|2021| Jan| 9000| | Kevin| Malone| [email protected]|2021| Feb| 6000| | Kevin| Malone| [email protected]|2021| Mar|18000| | Kevin| Malone| [email protected]|2021|Total|32000| +-------+-------+------------------+----+-----+-----+ */

(**) Use pattern matching in case there are many columns; for example:

val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))

Matt Andruff · Accepted Answer · 2022-06-21 18:51:33Z

val data = Seq( ("Micheal","Scott","[email protected]",4000,5000,3400,50660), ("Dwight","Schrute","[email protected]",1200,6900,1000,35000), ("Kevin","Malone","[email protected]",9000,6000,18000,32000)) ) val columns = Seq("FName","SName","Email","Jan 2021","Feb 2021","Mar 2021","Total 2021") val newColumns = Array( "FName", "SName", "Email","Total 2021" ) val df = spark.createDataFrame( data ).toDF(columns:_*) df .select( struct( (for {column <- df.columns } yield col(column)).toSeq :_* ).as("mystruct")) // create your data set with a column as a struct. .select( $"mystruct.Fname", // refer to sub element of struct with '.' operator $"mystruct.sname", $"mystruct.Email", explode( /make rows for every entry in the array. array( (for {column <- df.columns if !(newColumns contains column) } //filter out the columns we already selected yield // for each element yield the following expression (similar to map) struct( col(s"mystruct.$column").as("value"), // create the value column lit(column).as("date_year")) // create a date column ).toSeq :_* ) // shorthand to pass scala array into varargs for array function ) ) .select( col("*"), // just being lazy instead of typing. col("col.*") // create columns from new column. Seperating the year/date should be easy from here. ).drop($"col") .show(false) +--------------+--------------+------------------+-----+---------+ |mystruct.Fname|mystruct.sname|mystruct.Email |value|date_year| +--------------+--------------+------------------+-----+---------+ |Micheal |Scott |[email protected] |4000 |Jan 2021 | |Micheal |Scott |[email protected] |5000 |Feb 2021 | |Micheal |Scott |[email protected] |3400 |Mar 2021 | |Dwight |Schrute |[email protected]|1200 |Jan 2021 | |Dwight |Schrute |[email protected]|6900 |Feb 2021 | |Dwight |Schrute |[email protected]|1000 |Mar 2021 | |Kevin |Malone |[email protected] |9000 |Jan 2021 | |Kevin |Malone |[email protected] |6000 |Feb 2021 | |Kevin |Malone |[email protected] |18000|Mar 2021 | +--------------+--------------+------------------+-----+---------

Collectives™ on Stack Overflow

Scala Unpivot Table

2 Answers 2

1 Comment

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Related