I have some JSON that is in the following format:
{"items": ["234", "454", "434", "534"], "time": "1574290618029", "id": "A1", "user": "Bob"}
{"items": ["432", "123", "765"], "time": "1574200618021", "id": "B1", "user": "Tim"}
{"items": ["437"], "time": "1274600618121", "id": "B1", "user": "Joe"}

Each JSON file is read into a dataframe using spark.read.json(path), looping through the files and unioning them all into a single dataframe.
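For reference, here is a minimal plain-Python sketch (no Spark required) of what those records look like once parsed; it stands in for spark.read.json using only the stdlib json module, with the same three records inlined as a newline-delimited string:

```python
# Hypothetical stand-in for spark.read.json: parse the same
# newline-delimited JSON records with the stdlib json module.
import json

raw = "\n".join([
    '{"items": ["234", "454", "434", "534"], "time": "1574290618029", "id": "A1", "user": "Bob"}',
    '{"items": ["432", "123", "765"], "time": "1574200618021", "id": "B1", "user": "Tim"}',
    '{"items": ["437"], "time": "1274600618121", "id": "B1", "user": "Joe"}',
])

# One dict per line, mirroring one row per JSON record in the dataframe.
records = [json.loads(line) for line in raw.splitlines()]
print(records[0]["items"])  # → ['234', '454', '434', '534']
```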
df.show() shows something like this:
|items| time| id| user|
|["234", "454", "434", "534"]| "1574290618029"| "A1"| "Bob"|
|["432", "123", "765"]| "1574200618021"| "B1"| "Tim"|
|["437"]| "1274600618121"| "B1"| "Joe"|

Doing a df.select(df.id, df.time, df.user, explode(df.items)).show() results in something very close, but not quite what I'm looking for:
|id| time| user| col|
|A1| 1574290618029| Bob| 234|
|A1| 1574290618029| Bob| 454|
|A1| 1574290618029| Bob| 434|
|A1| 1574290618029| Bob| 534|
|B1| 1574200618021| Tim| 432|
|B1| 1574200618021| Tim| 123|
|B1| 1574200618021| Tim| 765|
|B1| 1274600618121| Joe| 437|

What I actually need is the data in a format like this:
|id| time| user| item_num| col|
|A1| 1574290618029| Bob| item1| 234|
|A1| 1574290618029| Bob| item2| 454|
|A1| 1574290618029| Bob| item3| 434|
|A1| 1574290618029| Bob| item4| 534|
|B1| 1574200618021| Tim| item1| 432|
|B1| 1574200618021| Tim| item2| 123|
|B1| 1574200618021| Tim| item3| 765|
|B1| 1574200618021| Tim| item4| NA|
|B1| 1274600618121| Joe| item1| 437|
|B1| 1274600618121| Joe| item2| NA|
|B1| 1274600618121| Joe| item3| NA|
|B1| 1274600618121| Joe| item4| NA|

Is there a simple way to accomplish this using explode that I'm not seeing? I'm also very new to Spark, so please excuse me if there is an obvious answer I've missed.
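To make the target layout concrete, here is a plain-Python sketch (not Spark) of the transformation being asked for: every items list is padded with "NA" up to the longest list's length, then one (id, time, user, item_num, col) row is emitted per position. The record literals and the "NA" filler are taken from the tables above; this is just an illustration of the desired shape, not the Spark solution itself:

```python
records = [
    {"id": "A1", "time": "1574290618029", "user": "Bob",
     "items": ["234", "454", "434", "534"]},
    {"id": "B1", "time": "1574200618021", "user": "Tim",
     "items": ["432", "123", "765"]},
    {"id": "B1", "time": "1274600618121", "user": "Joe",
     "items": ["437"]},
]

# Pad every items list to the global maximum length, then emit one
# (id, time, user, item_num, col) row per position.
max_len = max(len(r["items"]) for r in records)
rows = []
for r in records:
    padded = r["items"] + ["NA"] * (max_len - len(r["items"]))
    for i, value in enumerate(padded, start=1):
        rows.append((r["id"], r["time"], r["user"], f"item{i}", value))

for row in rows:
    print(row)
# First row: ('A1', '1574290618029', 'Bob', 'item1', '234')
# 12 rows in total, including the NA-padded ones.
```

In Spark terms, the analogous approach would be padding the array column to a fixed length before exploding it, so the missing positions survive as placeholder rows rather than being dropped.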