
I'm new to PySpark and only know its most basic operations, and my English is not great, so I can't describe this in much detail. The following sample should explain it. Thanks for your answers!

  • I have a DataFrame like this:

| name | id    | flag  | cnt |
|------|-------|-------|-----|
| li   | 19196 | true  | 10  |
| li   | 19196 | false | 15  |
  • I want to convert it to:

| name | id    | flag_true | flag_false |
|------|-------|-----------|------------|
| li   | 19196 | 10        | 15         |

1 Answer


You can use a pivot table for that:

    import pyspark.sql.functions as f

    df.groupBy(['name', 'id'])\
        .pivot('flag')\
        .agg(f.sum('cnt'))\
        .withColumnRenamed('true', 'flag_true')\
        .withColumnRenamed('false', 'flag_false')\
        .show()

That prints:

    +----+-----+----------+---------+
    |name|   id|flag_false|flag_true|
    +----+-----+----------+---------+
    |  li|19196|        15|       10|
    +----+-----+----------+---------+