I have a pyspark dataframe like this:
+--------------------+--------------------+
|                name|               value|
+--------------------+--------------------+
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                null|                null|
|                  id|                null|
|                name|                null|
|                 age|                null|
|                food|                null|
|                null|                   1|
|                null|                 Joe|
|                null|                  47|
|                null|               pizza|
+--------------------+--------------------+
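For reference, the DataFrame can be reproduced roughly like this (I'm treating the value column as strings, since it mixes numbers and text):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 10 fully-null rows, then the name rows, then the value rows,
# mirroring the table above.
rows = [(None, None)] * 10 + [
    ("id", None), ("name", None), ("age", None), ("food", None),
    (None, "1"), (None, "Joe"), (None, "47"), (None, "pizza"),
]
df = spark.createDataFrame(rows, ["name", "value"])
df.show()
```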
I want to remove the null values from each individual column so the non-null data lines up.
The desired output is:
+--------------------+--------------------+ | name| value| +--------------------+--------------------+ | id| 1| | name| Joe| | age| 47| | food| pizza| +--------------------+--------------------+
I have tried removing nulls with df.dropna(how='any') and df.dropna(how='all'), and also by separating out the columns and removing the nulls from each one, but then it becomes difficult to join them back together.
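To be concrete, this is roughly what those attempts looked like:

```python
# Dropping rows where *all* columns are null still keeps the
# half-empty rows, so nothing lines up:
df.dropna(how="all").show()

# Dropping rows where *any* column is null removes every row,
# since no row here has both a name and a value:
df.dropna(how="any").show()

# Separating the columns handles the null-dropping part fine...
names = df.select("name").dropna()
values = df.select("value").dropna()

# ...but the two one-column DataFrames share no key, so I don't
# see a clean way to join them back together row by row.
```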