I would like to combine rows in the following Spark DataFrame:
+-------+-------------+--------+
| date  | description | amount |
+-------+-------------+--------+
| 01/10 | first       | 10     |
| null  | second      | null   |
| null  | third       | null   |
| 02/10 | first       | 14     |
| 03/10 | third       | 12     |
| null  | third       | null   |
| null  | second      | null   |
| 04/10 | first       | 15     |
+-------+-------------+--------+

so that when a description spans multiple rows, those rows are merged into one. The result would look like:
+-------+----------------------+--------+
| date  | description          | amount |
+-------+----------------------+--------+
| 01/10 | first, second, third | 10     |
| 02/10 | first                | 14     |
| 03/10 | third, third, second | 12     |
| 04/10 | first                | 15     |
+-------+----------------------+--------+

The null rows have no identifier linking them to the correct date row, other than that they always come directly below it and are null in every column except description.