I'm investigating using Apache Spark for an application. I'm especially interested in the structured streaming mode using temporary views and full SQL queries (for simplicity and low latency).

The application would need to run multiple queries (tens, possibly hundreds) on a single input stream of data. Is there a way to avoid having Spark re-read the input for each query?

    The question is not clear. Multiple aggregations in a single streaming query should be possible if the group key is the same (just add all the aggregations to agg()). If you mean arbitrary aggregations, for example with different group keys, then they have to be separate streaming queries and the input will be re-read. The workaround is Mike's answer below. The concept of forking/splitting a stream doesn't exist in Spark, whereas it does in other streaming frameworks, so your next bet would be to try Flink and check whether it satisfies your requirement. Commented Jan 23, 2021 at 3:01
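As a rough illustration of the same-group-key case mentioned in this comment, here is a minimal sketch. The `key` and `value` column names, the `SingleQueryAggregations` object, and the use of the built-in `rate` source are assumptions made only for the example; they are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, max}

object SingleQueryAggregations {
  // Several aggregations over the same group key, computed in one query.
  // Assumes hypothetical columns "key" and "value".
  def aggregateByKey(events: DataFrame): DataFrame =
    events.groupBy(col("key")).agg(
      count("*").as("n"),
      avg("value").as("avg_value"),
      max("value").as("max_value")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("single-query-aggregations")
      .master("local[*]")
      .getOrCreate()

    // The built-in "rate" source stands in for the real input stream.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .selectExpr("value % 5 AS key", "value")

    // One streaming query, so the source is read once per micro-batch.
    val query = aggregateByKey(events).writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

This only helps when every aggregation shares the grouping; queries with different group keys would still need separate streaming queries or the foreachBatch approach.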

1 Answer

Multiple streaming queries within the same Spark Structured Streaming application run concurrently and independently, in the sense that each one makes its own progress while reading the same source. Because of that, caching or persisting the source across queries is not possible.

Unless you use the following foreachBatch pattern for your streaming queries, there is no standard way to cache the input source.

streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()
  batchDF.write.format(...).save(...)  // location 1
  batchDF.write.format(...).save(...)  // location 2
  batchDF.unpersist()
}

More details are given in the Structured Streaming Programming Guide, in the section on using foreach and foreachBatch.
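Since the question is about running many SQL queries over one stream, the pattern can be extended roughly as follows. This is a sketch under assumptions, not a definitive implementation: the view name `events`, the two queries, the `key`/`value` columns, the Parquet sink, and the `/tmp/shared-batch-out` path are all hypothetical, and the `rate` source stands in for the real input.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SharedBatchQueries {
  // Hypothetical query names and SQL text; "events" is an assumed view name.
  val queries: Map[String, String] = Map(
    "counts_by_key" -> "SELECT key, COUNT(*) AS n FROM events GROUP BY key",
    "max_by_key"    -> "SELECT key, MAX(value) AS max_value FROM events GROUP BY key"
  )

  // Runs every query against one persisted micro-batch.
  def processBatch(outputRoot: String)(batchDF: DataFrame, batchId: Long): Unit = {
    batchDF.persist()
    try {
      batchDF.createOrReplaceTempView("events")
      queries.foreach { case (name, sqlText) =>
        // Use the batch's own session so the temp view is visible to the query.
        batchDF.sparkSession.sql(sqlText)
          .write.mode("append").parquet(s"$outputRoot/$name")
      }
    } finally {
      batchDF.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shared-batch-queries")
      .master("local[*]")
      .getOrCreate()

    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .selectExpr("value % 5 AS key", "value")

    // An explicitly typed function value avoids overload ambiguity in foreachBatch.
    val handler: (DataFrame, Long) => Unit = processBatch("/tmp/shared-batch-out") _

    val query = events.writeStream.foreachBatch(handler).start()
    query.awaitTermination()
  }
}
```

Each SQL statement here sees only the rows of the current micro-batch, so aggregations are per batch rather than running totals over the whole stream.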

1 Comment

Just to be clear: my understanding is that, even without foreachBatch, the multiple queries will still use the same micro-batch. Or is that not so?
