I need to apply an aggregation function to a stream of data with Apache Spark Streaming (NOT Spark Streaming SQL).
In my case I have a Kafka producer that sends messages in JSON format: {'a': String, 'b': String, 'c': Integer, 'd': Double}.
I need to group by attributes 'a' and 'b' every 5 seconds, and apply an aggregation function (e.g. average, sum, min, or max) to the other two attributes.
How can I do that?
Thanks
Comments:

- reduce function? spark.apache.org/docs/latest/…
- Given input {'a': String, 'b': String, 'c': Integer, 'd': Double}, the resulting schema (with an AVG aggregate function) should be {'GROUPBYa': String, 'GROUPBYb': String, 'AVGc': Integer, 'AVGd': Double}.
- Use transform or foreachRDD and apply any arbitrary RDD function, or convert to DataFrames and use the DataFrame aggregation API.
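To make the reduce-based approach concrete, here is a minimal Spark-free sketch of the per-window logic: map each JSON record to a ((a, b), (sum_c, sum_d, count)) pair, reduce with an associative function, then divide by the count to get averages. In Spark Streaming the same `to_pair` and `combine` functions would be handed to `map` and `reduceByKeyAndWindow` over a 5-second window; the function and record names here are illustrative, not from the original post.

```python
import json

def to_pair(msg):
    """Map a JSON message to ((a, b), (sum_c, sum_d, count))."""
    r = json.loads(msg)
    return (r["a"], r["b"]), (r["c"], r["d"], 1)

def combine(v1, v2):
    """Associative, commutative reduce over the value triples --
    the shape of function reduceByKeyAndWindow expects."""
    return (v1[0] + v2[0], v1[1] + v2[1], v1[2] + v2[2])

def avg_per_key(messages):
    """Simulate one window: reduce per key, then finish sums into averages."""
    acc = {}
    for msg in messages:
        k, v = to_pair(msg)
        acc[k] = combine(acc[k], v) if k in acc else v
    # final pass: (sum_c, sum_d, n) -> (avg_c, avg_d)
    return {k: (sc / n, sd / n) for k, (sc, sd, n) in acc.items()}
```

With Spark, the equivalent pipeline would look roughly like `stream.map(to_pair).reduceByKeyAndWindow(combine, windowDuration=5)`, followed by a `mapValues` that divides the sums by the count.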