Transpose pyspark rows into columns

Question

I'm trying to transpose some of my PySpark DataFrame rows into columns.

I've made many attempts, but I can't seem to get the correct result.

The DataFrame currently looks like this:

ArticleID |Category |Value
1         |Color    |Black
1         |Gender   |Male
2         |Color    |Green
2         |Gender   |Female
3         |Color    |Blue
3         |Gender   |Male

The result I'm trying to get is:

ArticleID |Color |Gender
1         |Black |Male
2         |Green |Female
3         |Blue  |Male

Edit: This question overlaps with the suggested duplicate in some areas, but it requires an aggregation on the first item for each pivoted column:

agg(f.first())

The suggested question only needed to aggregate with numerical operations.
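To illustrate what "aggregating on the first item" means here, the following is a plain-Python sketch, not Spark code. The helper name `pivot_first` is hypothetical and exists only for this illustration. It groups the rows by ArticleID and keeps the first Value seen for each Category, which is the only sensible choice when the values are strings and cannot be summed or averaged. Note that in Spark, which row counts as "first" is not guaranteed after a shuffle; this does not matter for the data above because each (ArticleID, Category) pair occurs once.

```python
# Sample rows from the question: (ArticleID, Category, Value)
rows = [
    (1, "Color", "Black"), (1, "Gender", "Male"),
    (2, "Color", "Green"), (2, "Gender", "Female"),
    (3, "Color", "Blue"), (3, "Gender", "Male"),
]


def pivot_first(rows):
    """Group by article ID and keep the first value seen per category.

    Hypothetical helper for illustration only; it mimics the effect of
    groupBy + pivot + first on a small in-memory list.
    """
    result = {}
    for article_id, category, value in rows:
        cells = result.setdefault(article_id, {})
        # setdefault leaves an existing entry untouched, so the first
        # value wins and later duplicates are ignored.
        cells.setdefault(category, value)
    return result


pivoted = pivot_first(rows)
for article_id, cells in sorted(pivoted.items()):
    print(article_id, cells["Color"], cells["Gender"])
```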

Possible duplicate of How to pivot DataFrame?

– pault, commented Apr 4, 2019 at 14:28

akuiper · Accepted Answer · 2019-04-04 14:07:58Z

Use groupBy + pivot:

import pyspark.sql.functions as f

df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()

+---------+-----+------+
|ArticleID|Color|Gender|
+---------+-----+------+
|        3| Blue|  Male|
|        1|Black|  Male|
|        2|Green|Female|
+---------+-----+------+
