Please bear with me if there are any mistakes, as this is my first post.
This is the DataFrame df: column 'a' is a string and the rest are floats.
I have added an image of the DataFrame because the formatting kept getting messed up when I entered the data manually.
Given the DataFrame df, I want to group by column 'a' and find the min and max of each of the other columns. I want the output as a dictionary, so I converted the resulting PySpark DataFrame with toJSON and then used json.loads to turn it into a dictionary.
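To make the aggregation step concrete without a Spark cluster, here is a plain-Python sketch of what the grouped min/max is expected to produce. The sample values are hypothetical (the real data is only in the image), and `group_min_max` is an illustrative helper, not a PySpark function. Each returned dict is meant to resemble one parsed row of `df_cache.toJSON().collect()`; note that Spark keeps the grouping column 'a' at the top level of each row by default, so the sketch includes it too.

```python
from collections import defaultdict

# Hypothetical stand-in for df: 'a' is a string, 'b' and 'c' are floats.
data = [
    {"a": "10", "b": 1.0, "c": 5.0},
    {"a": "10", "b": 3.0, "c": 2.0},
    {"a": "20", "b": 4.0, "c": 7.0},
    {"a": "30", "b": 6.0, "c": 0.5},
    {"a": "30", "b": 2.5, "c": 9.0},
]

def group_min_max(rows, key, cols):
    """Plain-Python analogue of df.groupby(key).agg(struct(first, max, min) per col).

    Returns one dict per group: {key: value, col: {key: value, "max": ..., "min": ...}}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for key_value, members in groups.items():
        record = {key: key_value}  # Spark retains the grouping column by default
        for col in cols:
            values = [m[col] for m in members]
            record[col] = {key: key_value, "max": max(values), "min": min(values)}
        result.append(record)
    return result

aggregated = group_min_max(data, "a", ["b", "c"])
print(aggregated[0])
# {'a': '10', 'b': {'a': '10', 'max': 3.0, 'min': 1.0}, 'c': {'a': '10', 'max': 5.0, 'min': 2.0}}
```

There is one record per distinct value of 'a', so picking a single element of the result only ever shows one group.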
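Code snippet:

    import json
    import pyspark.sql.functions as F

    cols = ['b', 'c']
    # One struct per column holding the group key, the max and the min
    req_cols = [F.struct(F.first('a').alias('a'), F.max(col).alias('max'), F.min(col).alias('min')).alias(col)
                for col in cols]
    df_cache = df.groupby('a').agg(*req_cols).cache()
    # toJSON() yields one JSON string per row; [0] keeps only the first row
    result = json.loads(df_cache.toJSON().collect()[0])

My output: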
    {
      "b": { "max": ..., "min": ..., "a": "10" },
      "c": { "max": ..., "min": ..., "a": "10" }
    }

Required output:
    {
      "b_10": { "max": ..., "min": ..., "a": "10" },
      "c_10": { "max": ..., "min": ..., "a": "10" },
      "b_20": { "max": ..., "min": ..., "a": "20" },
      "c_20": { "max": ..., "min": ..., "a": "20" },
      "b_30": { "max": ..., "min": ..., "a": "30" },
      "c_30": { "max": ..., "min": ..., "a": "30" }
    }
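A likely reason only a='10' shows up is that `toJSON().collect()` returns one JSON string per group, and `[0]` keeps just the first one. The sketch below assumes `json_rows` is the full list returned by `df_cache.toJSON().collect()`; the strings here are hypothetical placeholders, and `to_required_dict` is an illustrative helper name. It parses every row and builds the `<column>_<a>` keys from the required output.

```python
import json

# Hypothetical stand-in for df_cache.toJSON().collect(): one JSON string per group.
# The numbers are made up; the real values are only in the question's image.
json_rows = [
    '{"a":"10","b":{"a":"10","max":3.0,"min":1.0},"c":{"a":"10","max":5.0,"min":2.0}}',
    '{"a":"20","b":{"a":"20","max":4.0,"min":4.0},"c":{"a":"20","max":7.0,"min":7.0}}',
    '{"a":"30","b":{"a":"30","max":6.0,"min":2.5},"c":{"a":"30","max":9.0,"min":0.5}}',
]

def to_required_dict(json_rows, cols):
    """Flatten all grouped rows into {"<col>_<a>": {"a": ..., "max": ..., "min": ...}}."""
    result = {}
    for raw in json_rows:  # parse every row, not only json_rows[0]
        row = json.loads(raw)
        for col in cols:
            struct = row[col]
            result[f"{col}_{struct['a']}"] = struct
    return result

required = to_required_dict(json_rows, ["b", "c"])
print(list(required))
# ['b_10', 'c_10', 'b_20', 'c_20', 'b_30', 'c_30']
```

If the JSON round trip is not needed, iterating over `df_cache.collect()` and calling `row.asDict(recursive=True)` on each `Row` should give the same nested dicts directly.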