
I wonder if there is an easy way to combine multiple rows into one in PySpark. I am new to Python and Spark and have been using Spark SQL most of the time.

Here is a data example:

    id  count1  count2  count3
    1   null    1       null
    1   3       null    null
    1   null    null    5
    2   null    1       null
    2   1       null    null
    2   null    null    2

The expected output is:

    id  count1  count2  count3
    1   3       1       5
    2   1       1       2

I have been using Spark SQL to join them multiple times, and I wonder if there is an easier way to do that.

Thank you!

  • I am not sure if it was intended, but in your data it looks like every id has only one non-null value per column? Commented Feb 7, 2020 at 15:22
  • If every id has only one non-null value, you can do groupBy + first with ignorenulls=True. Something like: df.groupBy('id').agg(*[first(c, True).alias(c) for c in df.columns[1:]]) Commented Feb 7, 2020 at 15:25
  • Or groupBy with max: df.groupBy("id").agg(*[max(c).alias(c) for c in df.columns[1:]]).show()... Commented Feb 7, 2020 at 16:15
  • Yes, only one non-null value. Thank you all, I will give it a try! Commented Feb 7, 2020 at 16:58

1 Answer


Spark SQL's sum skips null values (a group that contains only nulls sums to null, not zero), so if you know there are no "overlapping" data elements, just group by the column you wish to aggregate on and sum.

Assuming that you want to keep your original column names (and not sum the id column), you'll need to specify the columns that are summed and then rename them after the aggregation.

    from pyspark.sql.functions import col

    before.show()
    +---+------+------+------+
    | id|count1|count2|count3|
    +---+------+------+------+
    |  1|  null|     1|  null|
    |  1|     3|  null|  null|
    |  1|  null|  null|     5|
    |  2|  null|     1|  null|
    |  2|     1|  null|  null|
    |  2|  null|  null|     2|
    +---+------+------+------+

    # Sum every column except id within each id group.
    after = (
        before
        .groupby('id').sum(*[c for c in before.columns if c != 'id'])
        # Rename sum(count1) etc. back to the original names.
        # Note: id is not selected here, so it is absent from the output.
        .select([col(f"sum({c})").alias(c) for c in before.columns if c != 'id'])
    )

    after.show()
    +------+------+------+
    |count1|count2|count3|
    +------+------+------+
    |     3|     1|     5|
    |     1|     1|     2|
    +------+------+------+