This is my data:

CouponNbr,ItemNbr,TypeCode,DeptNbr,MPQ
10,2,1,10,1
10,3,4,50,2
11,2,1,10,1
11,3,4,50,2

I want to group it in Spark so that it looks like this:
CouponNbr,ItemsInfo
10,[[2,1,10,1],[3,4,50,2]]
11,[[2,1,10,1],[3,4,50,2]]

I tried to group it and convert it to a dictionary with the following code:
df.groupby("CouponNbr").apply(lambda x:x[["ItemNbr","TypeCode","DeptNbr","MPQ"]].to_dict("r")) But this is in pandas and it returns the following
CouponNbr,ItemsInfo
10,[{'ItemNbr': 2, 'TypeCode': 1, 'DeptNbr': 10, 'MPQ': 1}, {'ItemNbr': 3, 'TypeCode': 4, 'DeptNbr': 50, 'MPQ': 2}]
11,[{'ItemNbr': 2, 'TypeCode': 1, 'DeptNbr': 10, 'MPQ': 1}, {'ItemNbr': 3, 'TypeCode': 4, 'DeptNbr': 50, 'MPQ': 2}]

Is there a way I could achieve the format I need in PySpark? Thanks.
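For reference, here is a minimal sketch of one way to get that shape in PySpark, using groupBy with collect_list over an array column (collect_list and array are standard pyspark.sql.functions; the DataFrame is rebuilt from the sample data above so the snippet is self-contained):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Rebuild the sample data so the snippet runs on its own.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(10, 2, 1, 10, 1), (10, 3, 4, 50, 2),
     (11, 2, 1, 10, 1), (11, 3, 4, 50, 2)],
    ["CouponNbr", "ItemNbr", "TypeCode", "DeptNbr", "MPQ"],
)

# Pack the per-item columns into one array per row, then collect those
# arrays into a single list per CouponNbr.
result = df.groupBy("CouponNbr").agg(
    F.collect_list(
        F.array("ItemNbr", "TypeCode", "DeptNbr", "MPQ")
    ).alias("ItemsInfo")
)
result.show(truncate=False)
# Expected shape: 10 -> [[2, 1, 10, 1], [3, 4, 50, 2]]
#                 11 -> [[2, 1, 10, 1], [3, 4, 50, 2]]
# Note: collect_list does not guarantee element order after a shuffle.

If named fields are preferred over plain arrays, F.struct can be substituted for F.array in the same aggregation.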