I have seen a few solutions for unpivoting a Spark dataframe when the number of columns is reasonably low and the column names can be hardcoded. Do you have a scalable solution to unpivot a dataframe with numerous columns?
Below is a toy problem.
Input:
val df = Seq( (1,1,1,0), (2,0,0,1) ).toDF("ID","A","B","C")

+---+---+---+---+
| ID|  A|  B|  C|
+---+---+---+---+
|  1|  1|  1|  0|
|  2|  0|  0|  1|
+---+---+---+---+

Expected result:
+---+-----+-----+
| ID|names|count|
+---+-----+-----+
|  1|    A|    1|
|  1|    B|    1|
|  1|    C|    0|
|  2|    A|    0|
|  2|    B|    0|
|  2|    C|    1|
+---+-----+-----+

The solution should be applicable to datasets with N columns to unpivot, where N is large (say 100 columns).
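For reference, one scalable approach I have been experimenting with (a sketch, not a confirmed answer): build an array of `(name, value)` structs, one per column to unpivot, and `explode` it. The helper name `melt` and its parameter names are my own invention, not a Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object MeltExample {
  // Generic unpivot: works for any number of value columns, no hardcoding.
  def melt(df: DataFrame,
           idCols: Seq[String],
           valueCols: Seq[String],
           nameCol: String = "names",
           valueCol: String = "count"): DataFrame = {
    // One struct per value column: (literal column name, column value),
    // collected into an array and exploded into one row per column.
    val kvs = explode(array(
      valueCols.map(c => struct(lit(c).alias(nameCol), col(c).alias(valueCol))): _*
    ))
    df.select(idCols.map(col) :+ kvs.alias("_kv"): _*)
      .select(idCols.map(col) :+ col(s"_kv.$nameCol") :+ col(s"_kv.$valueCol"): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("melt").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 1, 1, 0), (2, 0, 0, 1)).toDF("ID", "A", "B", "C")
    // Derive the value columns from the schema instead of hardcoding them.
    val valueCols = df.columns.filterNot(_ == "ID").toSeq
    melt(df, Seq("ID"), valueCols).show()

    spark.stop()
  }
}
```

Since the column list is read from `df.columns`, this should scale to ~100 columns without listing them by hand. (If I am not mistaken, newer Spark versions also ship a built-in `Dataset.unpivot`, which would make a helper like this unnecessary.)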