I have an input DataFrame with the columns id, app, and customer, as shown below.
Input dataframe
+---+---+--------+
|id |app|customer|
+---+---+--------+
|id1|fw |WM      |
|id1|fw |CS      |
|id2|fw |CS      |
|id1|fe |WM      |
|id3|bc |TR      |
|id3|bc |WM      |
+---+---+--------+

Expected output
Using pivot and an aggregate, I want the app values to become column names, with the customer names for each id collected into a list under the matching app column.
Expected dataframe
+---+-------+---+-------+
|id |bc     |fe |fw     |
+---+-------+---+-------+
|id1|0      |WM |[WM,CS]|
|id2|0      |0  |[CS]   |
|id3|[TR,WM]|0  |0      |
+---+-------+---+-------+

What have I tried?
val newDF = df.groupBy("id").pivot("app").agg(expr("coalesce(first(customer),0)")).drop("app")
newDF.show()  // show() returns Unit, so it is called separately from the assignment
+---+---+---+---+
|id |bc |fe |fw |
+---+---+---+---+
|id1|0  |WM |WM |
|id2|0  |0  |CS |
|id3|TR |0  |0  |
+---+---+---+---+

Issue: With this query I do not get the list of customers. For "id1" under "fw" I expect [WM,CS] (as shown in the expected output), but only "WM" appears. Similarly, for "id3" under "bc" only "TR" appears where the list [TR,WM] is expected. Presumably this is because first(customer) keeps a single value per group.
How can I get the list of customers under each app column?
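
One direction that may be worth trying, sketched here under assumptions rather than as a confirmed fix: swap first for collect_list, which gathers every customer value in a group into an array. The object name PivotCollectList, the helper pivotCustomers, and the local SparkSession settings are hypothetical and exist only to make the sketch self-contained. Note that a Spark column holds a single type, so a plain 0 cannot sit next to a list in the same column; missing id/app combinations would instead be expected to come back as an empty array (or null), and the order of elements inside each list is not guaranteed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

// Hypothetical wrapper object, used only to make the sketch runnable.
object PivotCollectList {

  // Pivot the app values into columns and collect all customers
  // for each (id, app) pair into an array column.
  def pivotCustomers(df: DataFrame): DataFrame =
    df.groupBy("id").pivot("app").agg(collect_list("customer"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .appName("pivot-collect-list")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the input DataFrame in the question.
    val df = Seq(
      ("id1", "fw", "WM"),
      ("id1", "fw", "CS"),
      ("id2", "fw", "CS"),
      ("id1", "fe", "WM"),
      ("id3", "bc", "TR"),
      ("id3", "bc", "WM")
    ).toDF("id", "app", "customer")

    // truncate = false so the full arrays are printed.
    pivotCustomers(df).orderBy("id").show(false)

    spark.stop()
  }
}
```

If a literal 0 is really needed for the missing combinations, the array columns would probably have to be converted to strings first (for example with concat_ws), at the cost of no longer having real lists in the result.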