
I'm not sure of a good way to phrase the question, but an example will help. Here is the dataframe that I have, with the columns Name, Type, and Count:

+------+------+-------+
| Name | Type | Count |
+------+------+-------+
| a    | 0    | 5     |
| a    | 1    | 4     |
| a    | 5    | 5     |
| a    | 4    | 5     |
| a    | 2    | 1     |
| b    | 0    | 2     |
| b    | 1    | 4     |
| b    | 3    | 5     |
| b    | 4    | 5     |
| b    | 2    | 1     |
| c    | 0    | 5     |
| c    | ...  | ...   |
+------+------+-------+
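
If you want to reproduce this locally, here is a minimal sketch of how the sample data above might be built in Spark (Scala). The app name and local master are placeholders, and the trailing ... row for c is left out because its values aren't shown:

import org.apache.spark.sql.SparkSession

// Placeholder session setup for a local run
val spark = SparkSession.builder().appName("pivot-example").master("local[*]").getOrCreate()
import spark.implicits._

// Rows copied from the table above; only the first row for "c" is known
val df = Seq(
  ("a", 0, 5), ("a", 1, 4), ("a", 5, 5), ("a", 4, 5), ("a", 2, 1),
  ("b", 0, 2), ("b", 1, 4), ("b", 3, 5), ("b", 4, 5), ("b", 2, 1),
  ("c", 0, 5)
).toDF("Name", "Type", "Count")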

I want to get a new dataframe structured like this where the Type column values have become new columns:

+------+---+-----+---+---+---+---+
| Name | 0 | 1   | 2 | 3 | 4 | 5 |   <- Number columns are types from input
+------+---+-----+---+---+---+---+
| a    | 5 | 4   | 1 | 0 | 5 | 5 |
| b    | 2 | 4   | 1 | 5 | 5 | 0 |
| c    | 5 | ... |   |   |   |   |
+------+---+-----+---+---+---+---+

The columns here are [Name,0,1,2,3,4,5].


1 Answer


You can do this with the pivot function in Spark.

val df2 = df.groupBy("Name").pivot("Type").sum("Count") 

Here, if the Name and the Type are the same for two rows, the Count values are simply added together, but other aggregations are possible as well.
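
For example, a sketch of two alternative aggregations (maximum and average), assuming the same df as above:

import org.apache.spark.sql.functions.avg

// Keep the largest Count per (Name, Type) instead of the sum
val dfMax = df.groupBy("Name").pivot("Type").max("Count")

// agg() accepts any aggregate expression, e.g. the average Count
val dfAvg = df.groupBy("Name").pivot("Type").agg(avg("Count"))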

Resulting dataframe when using the example data in the question:

+----+---+----+----+----+----+----+
|Name|  0|   1|   2|   3|   4|   5|
+----+---+----+----+----+----+----+
|   c|  5|null|null|null|null|null|
|   b|  2|   4|   1|   5|   5|null|
|   a|  5|   4|   1|null|   5|   5|
+----+---+----+----+----+----+----+
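
Note that pivot produces null for (Name, Type) combinations that never occur, whereas the desired output in the question shows 0. If that matters, one way to replace the nulls afterwards (a sketch, assuming df2 from above):

// Replace the nulls introduced by pivot with 0 to match the question's expected output
val df3 = df2.na.fill(0)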
