I have three text files in my directory:
```
a.txt: A B C D A E F
b.txt: A B C D A E
c.txt: A B C D A E G
```

I use the following streaming query:
```scala
import java.util.Date
import org.apache.spark.sql.types.StructType
import spark.implicits._

val schema = new StructType().add("value", "string")

val lines = spark
  .readStream
  .schema(schema)
  .option("maxFilesPerTrigger", 1)
  .text(...)
  .as[String]

val wordCounts = lines.flatMap(_.split("\\s+")).groupBy("value").count()

val query = wordCounts.writeStream
  .queryName("t")
  .outputMode("update") // <-- output mode: update
  .format("memory")
  .start()

while (true) {
  spark.sql("select * from t").show(truncate = false)
  println(new Date())
  Thread.sleep(1000)
}
```

The query always ends up showing the following result:
```
+-----+-----+
|value|count|
+-----+-----+
|A    |2    |
|B    |1    |
|C    |1    |
|D    |1    |
|E    |1    |
|A    |4    |
|B    |2    |
|C    |2    |
|D    |2    |
|E    |2    |
|G    |1    |
|A    |6    |
|B    |3    |
|C    |3    |
|D    |3    |
|E    |3    |
|F    |1    |
+-----+-----+
```

It looks like each file's result is appended to the output (as it would be in Append output mode), and I'm not sure I understand what Update mode means. How does Update output mode work?
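The output shown can be reproduced with a minimal plain-Scala sketch of the semantics I seem to be observing: each trigger emits only the rows whose aggregate changed in that trigger, and the memory sink appends those rows to the table it already holds. This is a simulation, not Spark itself, and it assumes the files happened to be picked up in the order b.txt, c.txt, a.txt (with `maxFilesPerTrigger = 1` the order is not guaranteed):

```scala
object UpdateModeSimulation {
  // One batch per trigger, in the order implied by the output above (assumed: b, c, a)
  val batches: Seq[Seq[String]] = Seq(
    "A B C D A E",   // b.txt
    "A B C D A E G", // c.txt
    "A B C D A E F"  // a.txt
  ).map(_.split("\\s+").toSeq)

  def run(): Seq[(String, Long)] = {
    val counts     = scala.collection.mutable.Map.empty[String, Long]          // streaming state
    val memorySink = scala.collection.mutable.ArrayBuffer.empty[(String, Long)] // in-memory table "t"
    for (batch <- batches) {
      // Keys whose aggregate changed in this trigger, in first-seen order
      val updated = scala.collection.mutable.LinkedHashSet.empty[String]
      for (w <- batch) {
        counts(w) = counts.getOrElse(w, 0L) + 1
        updated += w
      }
      // Update mode: only the updated rows are written; the sink appends them
      for (w <- updated) memorySink += ((w, counts(w)))
    }
    memorySink.toSeq
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (w, c) => println(s"$w $c") }
}
```

Running this yields the same 17 rows as the table above, which is what makes me suspect the duplication comes from the sink appending each trigger's updates rather than from the aggregation itself.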