I want to create a new DataFrame from an existing DataFrame in PySpark. The DataFrame "df" contains a column named "data" whose rows are dictionaries stored as strings, and the keys of each dictionary are not fixed. For example, "name" and "address" are the keys of the first row's dictionary, but other rows may have different keys. The following is an example:
data
-------------------------------------------------------
{"name": "sam", "address": "uk"}
{"name": "jack", "address": "aus", "occupation": "job"}

How do I convert this into a DataFrame with individual columns, like the following?

name    address    occupation
sam     uk         null
jack    aus        job
Is "df" a pandas DataFrame? Or is the "data" column actually of type StringType() or MapType()? Edit your question with the output of df.select('data').printSchema(). Better yet, provide a reproducible example. Maybe you're looking for this answer.
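Assuming the "data" column really is a JSON string (as the comment above asks you to confirm), one common approach is to re-read the strings through Spark's JSON reader, e.g. `spark.read.json(df.rdd.map(lambda r: r.data))`, which infers the union of all keys across rows and fills missing keys with null. The sketch below illustrates that union-of-keys idea in plain Python (the `rows` sample data and variable names are illustrative, not from the original post):

```python
import json

# Sample rows mirroring the question's "data" column: each row is a JSON
# string, and the set of keys varies from row to row.
rows = [
    '{"name": "sam", "address": "uk"}',
    '{"name": "jack", "address": "aus", "occupation": "job"}',
]

# In PySpark, the equivalent would be roughly:
#   parsed = spark.read.json(df.rdd.map(lambda r: r.data))
#   parsed.show()

# The same idea in plain Python: parse every row, take the union of all
# keys as the column set, and fill missing keys with None (Spark's null).
dicts = [json.loads(r) for r in rows]
columns = sorted({k for d in dicts for k in d})
table = [{c: d.get(c) for c in columns} for d in dicts]
```

Here `table[0]` has `occupation` set to `None`, matching the null cell in the desired output above.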