I wonder if there is an easy way to combine multiple rows into one in PySpark. I am new to Python and Spark and have been using Spark SQL most of the time.
Here is a data example:
```
id  count1  count2  count3
1   null    1       null
1   3       null    null
1   null    null    5
2   null    1       null
2   1       null    null
2   null    null    2
```

The expected output is:
```
id  count1  count2  count3
1   3       1       5
2   1       1       2
```

I have been using Spark SQL to join them multiple times, and wonder if there is an easier way to do that.
Thank you!
Use `groupBy` + `first` with `ignorenulls=True`. Something like:

```python
df.groupBy('id').agg(*[first(c, True).alias(c) for c in df.columns[1:]])
```

Or `groupBy` with `max`:

```python
df.groupBy('id').agg(*[max(c).alias(c) for c in df.columns[1:]]).show()
```

Note that `first` and `max` here are the aggregate functions from `pyspark.sql.functions`, not the Python built-ins, so you need to import them.
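For reference, a minimal self-contained sketch of both approaches against the sample data from the question (the `SparkSession` setup and the `df` name are assumptions for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data from the question; Python None becomes SQL null.
df = spark.createDataFrame(
    [
        (1, None, 1, None),
        (1, 3, None, None),
        (1, None, None, 5),
        (2, None, 1, None),
        (2, 1, None, None),
        (2, None, None, 2),
    ],
    ["id", "count1", "count2", "count3"],
)

# Option 1: first() with ignorenulls=True keeps the first non-null
# value seen for each column within each id group.
result_first = df.groupBy("id").agg(
    *[F.first(c, ignorenulls=True).alias(c) for c in df.columns[1:]]
)

# Option 2: max() also works here, since aggregate functions skip
# nulls and each group has at most one non-null value per column.
result_max = df.groupBy("id").agg(
    *[F.max(c).alias(c) for c in df.columns[1:]]
)

result_first.show()
# Expected output (row order may vary):
# +---+------+------+------+
# | id|count1|count2|count3|
# +---+------+------+------+
# |  1|     3|     1|     5|
# |  2|     1|     1|     2|
# +---+------+------+------+
```

One caveat: both variants rely on each `(id, column)` pair having at most one non-null value, as in the sample data. If a group could contain several non-null values, `first(..., ignorenulls=True)` returns an arbitrary one unless the data is explicitly ordered, and `max` returns the largest, so pick whichever semantics you actually want.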