
I'm trying to transpose some of my PySpark DataFrame rows into columns.

I've made many attempts, but I can't seem to get the correct result.

The DataFrame currently looks like this:

ArticleID | Category | Value
1         | Color    | Black
1         | Gender   | Male
2         | Color    | Green
2         | Gender   | Female
3         | Color    | Blue
3         | Gender   | Male

The result I'm trying to get is:

ArticleID | Color | Gender
1         | Black | Male
2         | Green | Female
3         | Blue  | Male

Edit: This question may overlap with the suggested duplicate in some respects, but this one requires aggregating on the first item for each pivoted row:

    agg(f.first('Value'))

The suggested question only deals with numerical aggregations.
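Why an aggregate is needed at all: a pivot can, in general, map more than one input row to the same (ArticleID, Category) cell, so Spark has to be told how to reduce them to one value. The plain-Python sketch below (no Spark involved; the function name `first_value_per_cell` and the duplicate "Grey" row are hypothetical, for illustration only) mimics what a "first" aggregate does: it keeps the first value seen per cell and ignores later ones, which works for strings where numeric aggregates such as sum would not.

```python
def first_value_per_cell(rows):
    """Keep the first Value seen for each (ArticleID, Category) pair.

    rows: iterable of (article_id, category, value) tuples.
    Returns a dict mapping (article_id, category) -> value.
    """
    cells = {}
    for article_id, category, value in rows:
        # setdefault only stores the value if the cell is still empty,
        # so later duplicates are ignored, like a "first" aggregate.
        cells.setdefault((article_id, category), value)
    return cells


rows = [
    (1, "Color", "Black"),
    (1, "Color", "Grey"),  # hypothetical duplicate cell, not in the question's data
    (1, "Gender", "Male"),
]
print(first_value_per_cell(rows))
# {(1, 'Color'): 'Black', (1, 'Gender'): 'Male'}
```

One caveat for the real Spark version: `first` depends on row order, which is not guaranteed after a shuffle, so if genuine duplicates exist the value kept may vary between runs. In the question's data every cell has exactly one row, so this does not matter there.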

1 Answer

Use groupBy + pivot:

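    import pyspark.sql.functions as f

    # df has the columns ArticleID, Category and Value.
    # Each distinct Category becomes a column; first() picks the Value for each cell.
    df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()

    +---------+-----+------+
    |ArticleID|Color|Gender|
    +---------+-----+------+
    |        3| Blue|  Male|
    |        1|Black|  Male|
    |        2|Green|Female|
    +---------+-----+------+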
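To make the semantics of `groupBy('ArticleID').pivot('Category').agg(f.first('Value'))` concrete, here is a rough plain-Python model of it, run on the question's data. It is only an illustration, not how Spark implements pivot; the helper name `pivot_first` is hypothetical, and sorting the pivot values is a choice made here for a stable column order.

```python
def pivot_first(rows, key, pivot_col, value_col):
    """Rough model of df.groupBy(key).pivot(pivot_col).agg(first(value_col)).

    rows: list of dicts. Returns a list of dicts, one per distinct key.
    """
    # Distinct pivot values become the new columns (sorted for a stable order).
    pivot_values = sorted({row[pivot_col] for row in rows})
    grouped = {}
    for row in rows:
        row_cells = grouped.setdefault(row[key], {})
        # Keep only the first value seen for each (key, pivot value) cell.
        row_cells.setdefault(row[pivot_col], row[value_col])
    # A cell with no matching input row becomes None, similar to Spark's null.
    return [
        {key: k, **{p: cells.get(p) for p in pivot_values}}
        for k, cells in grouped.items()
    ]


df_rows = [
    {"ArticleID": 1, "Category": "Color", "Value": "Black"},
    {"ArticleID": 1, "Category": "Gender", "Value": "Male"},
    {"ArticleID": 2, "Category": "Color", "Value": "Green"},
    {"ArticleID": 2, "Category": "Gender", "Value": "Female"},
    {"ArticleID": 3, "Category": "Color", "Value": "Blue"},
    {"ArticleID": 3, "Category": "Gender", "Value": "Male"},
]
for row in pivot_first(df_rows, "ArticleID", "Category", "Value"):
    print(row)
# {'ArticleID': 1, 'Color': 'Black', 'Gender': 'Male'}
# {'ArticleID': 2, 'Color': 'Green', 'Gender': 'Female'}
# {'ArticleID': 3, 'Color': 'Blue', 'Gender': 'Male'}
```

Unlike this sketch, Spark does not guarantee output row order, which is why ArticleID 3 appears first in the output above. In PySpark you can also pass the expected categories explicitly, e.g. `pivot('Category', ['Color', 'Gender'])`; that variant is more efficient because Spark then does not need to compute the distinct pivot values first.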