
I'm not sure of a good way to phrase the question, but an example will help. Here is the dataframe that I have, with the columns Name, Type, and Count:

+------+------+-------+
| Name | Type | Count |
+------+------+-------+
| a    | 0    | 5     |
| a    | 1    | 4     |
| a    | 5    | 5     |
| a    | 4    | 5     |
| a    | 2    | 1     |
| b    | 0    | 2     |
| b    | 1    | 4     |
| b    | 3    | 5     |
| b    | 4    | 5     |
| b    | 2    | 1     |
| c    | 0    | 5     |
| c    | ...  | ...   |
+------+------+-------+
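
If you want to reproduce this locally, here is a minimal sketch of how the sample data above might be built in Spark (Scala). The app name and local master are placeholders, and the trailing ... row for c is left out because its values aren't shown:

import org.apache.spark.sql.SparkSession

// Placeholder session setup for a local run
val spark = SparkSession.builder().appName("pivot-example").master("local[*]").getOrCreate()
import spark.implicits._

// Rows copied from the table above; only the first row for "c" is known
val df = Seq(
  ("a", 0, 5), ("a", 1, 4), ("a", 5, 5), ("a", 4, 5), ("a", 2, 1),
  ("b", 0, 2), ("b", 1, 4), ("b", 3, 5), ("b", 4, 5), ("b", 2, 1),
  ("c", 0, 5)
).toDF("Name", "Type", "Count")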

I want to get a new dataframe structured like this where the Type column values have become new columns:

+------+---+-----+---+---+---+---+
| Name | 0 | 1   | 2 | 3 | 4 | 5 |   <- Number columns are types from input
+------+---+-----+---+---+---+---+
| a    | 5 | 4   | 1 | 0 | 5 | 5 |
| b    | 2 | 4   | 1 | 5 | 5 | 0 |
| c    | 5 | ... |   |   |   |   |
+------+---+-----+---+---+---+---+

The columns here are [Name,0,1,2,3,4,5].


1 Answer


You can do this with the pivot function in Spark.

val df2 = df.groupBy("Name").pivot("Type").sum("Count") 

Here, if the Name and the Type are the same for two rows, the Count values are simply added together, but other aggregations are possible as well.
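
For example, a sketch of two alternative aggregations (maximum and average), assuming the same df as above:

import org.apache.spark.sql.functions.avg

// Keep the largest Count per (Name, Type) instead of the sum
val dfMax = df.groupBy("Name").pivot("Type").max("Count")

// agg() accepts any aggregate expression, e.g. the average Count
val dfAvg = df.groupBy("Name").pivot("Type").agg(avg("Count"))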

Resulting dataframe when using the example data in the question:

+----+---+----+----+----+----+----+
|Name|  0|   1|   2|   3|   4|   5|
+----+---+----+----+----+----+----+
|   c|  5|null|null|null|null|null|
|   b|  2|   4|   1|   5|   5|null|
|   a|  5|   4|   1|null|   5|   5|
+----+---+----+----+----+----+----+
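
Note that pivot produces null for (Name, Type) combinations that never occur, whereas the desired output in the question shows 0. If that matters, one way to replace the nulls afterwards (a sketch, assuming df2 from above):

// Replace the nulls introduced by pivot with 0 to match the question's expected output
val df3 = df2.na.fill(0)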
