
One question: how can I transpose rows into columns in PySpark? My original dataframe looks like this:

ID | DATE       | APP       | DOWNLOADS | ACTIVE_USERS
-------------------------------------------------------
 0 | 2021-01-10 | FACEBOOK  |      1000 |         5000
 1 | 2021-01-10 | INSTAGRAM |      9000 |        90000
 2 | 2021-02-10 | FACEBOOK  |      9000 |        72000
 3 | 2021-02-10 | INSTAGRAM |     16000 |       500000

But I need it like this:

ID | DATE       | FACEBOOK - DOWNLOADS | FACEBOOK - ACTIVE_USERS | INSTAGRAM - DOWNLOADS | INSTAGRAM - ACTIVE_USERS
-------------------------------------------------------------------------------------------------------------------
 0 | 2021-01-10 |                 1000 |                    5000 |                  9000 |                    90000
 1 | 2021-02-10 |                 9000 |                   72000 |                 16000 |                   500000

I tried using the answer to this question: Transpose pyspark rows into columns, but I couldn't make it work.

Could you help me please? Thank you!

1 Answer


From your example I assume the "ID" column is not needed for grouping, since it appears to be reset in your desired outcome. That would make the query something like this:

import pyspark.sql.functions as F

df.groupBy("DATE").pivot("APP").agg(
    F.first("DOWNLOADS").alias("DOWNLOADS"),
    F.first("ACTIVE_USERS").alias("ACTIVE_USERS"),
)

We group by DATE, pivot on APP, and take the first value of DOWNLOADS and ACTIVE_USERS for each group. The pivot creates one column per APP value, and each aggregation's alias is appended to the pivoted column name.

Outcome:

+----------+------------------+---------------------+-------------------+----------------------+
|      DATE|FACEBOOK_DOWNLOADS|FACEBOOK_ACTIVE_USERS|INSTAGRAM_DOWNLOADS|INSTAGRAM_ACTIVE_USERS|
+----------+------------------+---------------------+-------------------+----------------------+
|2021-02-10|              9000|                72000|              16000|                500000|
|2021-01-10|              1000|                 5000|               9000|                 90000|
+----------+------------------+---------------------+-------------------+----------------------+
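If you want to sanity-check the reshape without spinning up a Spark session, the same long-to-wide pivot can be sketched in plain pandas. This is only a local illustration of the idea, not the PySpark answer itself; the column-flattening step mimics the APP_METRIC names Spark produces:

```python
import pandas as pd

# Sample data mirroring the question's dataframe
df = pd.DataFrame({
    "DATE": ["2021-01-10", "2021-01-10", "2021-02-10", "2021-02-10"],
    "APP": ["FACEBOOK", "INSTAGRAM", "FACEBOOK", "INSTAGRAM"],
    "DOWNLOADS": [1000, 9000, 9000, 16000],
    "ACTIVE_USERS": [5000, 90000, 72000, 500000],
})

# One row per DATE, one column per (metric, APP) pair;
# "first" matches F.first in the Spark version
wide = df.pivot_table(
    index="DATE",
    columns="APP",
    values=["DOWNLOADS", "ACTIVE_USERS"],
    aggfunc="first",
)

# Flatten the (metric, APP) MultiIndex into APP_METRIC column names
wide.columns = [f"{app}_{metric}" for metric, app in wide.columns]
wide = wide.reset_index()
print(wide)
```

Each DATE ends up as a single row with FACEBOOK_DOWNLOADS, FACEBOOK_ACTIVE_USERS, INSTAGRAM_DOWNLOADS, and INSTAGRAM_ACTIVE_USERS columns, just like the Spark output above.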