I have a `Dataset[Row]` where each row is a JSON string. I want to just print the JSON stream, or count the JSON stream per batch.

Here is my code so far:
```scala
val ds = sparkSession.readStream()
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topicName)
  .option("checkpointLocation", hdfsCheckPointDir)
  .load()

val ds1 = ds.select(from_json(col("value").cast("string"), schema) as 'payload)
val ds2 = ds1.select($"payload.info")

val query = ds2.writeStream
  .outputMode("append")
  .queryName("table")
  .format("memory")
  .start()

query.awaitTermination()
```

Then I query the in-memory table:

```sql
select * from table; -- don't see anything and there are no errors
```

However, when I run my Kafka consumer separately (independent of Spark), I can see the data.

My question really is: what do I need to do to just print the data I am receiving from Kafka using Structured Streaming? The messages in Kafka are JSON-encoded strings, so I am converting the JSON-encoded strings to a struct and eventually to a Dataset. I am using Spark 2.1.0.
I also tried

```scala
val query = ds.writeStream.outputMode("append").format("console").start()
```

but that didn't work either.
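For reference, here is a minimal, self-contained sketch of what I am trying to achieve (the broker address, topic name, and object name are placeholders, not my real values). It casts Kafka's binary `value` column to a string and prints each micro-batch to the console:

```scala
import org.apache.spark.sql.SparkSession

object KafkaConsolePrint {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("KafkaConsolePrint")
      .getOrCreate()

    // Read the raw Kafka stream; key/value arrive as binary columns.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topicName")                    // placeholder
      .load()

    // Cast the binary value to a string so the JSON is human-readable.
    val json = raw.selectExpr("CAST(value AS STRING) AS json")

    // Print every batch to stdout; truncate=false keeps full JSON strings.
    val query = json.writeStream
      .outputMode("append")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

This is only a sketch of the pipeline shape I am after, assuming a locally reachable broker; the question is why my version of it produces no output.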