Below is my use case for Spark Structured Streaming:

Step 1: StreamA = loaded from Kafka topicA, containing events of type A

Step 2: StreamB = loaded from Kafka topicB, containing events of type B

Step 3: JoinedStream = StreamA inner join StreamB on id

Step 4: Insert the matched data into a database

I don't need the matched data for further processing. Will Spark clear the join state once rows have been matched?

If not, how do I clear it without a watermark?

1 Answer

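For an inner join, the watermark and time-constraint join conditions are optional as per the doc (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#inner-joins-with-optional-watermarking), but without them Spark does not clear the state. A row that has already matched is kept, because a later row from the other stream could still match it.

To avoid unbounded state, you have to define additional join conditions such that indefinitely old inputs cannot match with future inputs and therefore can be cleared from the state. In other words, you will have to do the following additional steps in the join:

1. Define watermark delays on both inputs, so the engine knows how late each input can be.

2. Define a constraint on event time across the two inputs (a time-range join condition, or a join on event-time windows), so the engine can work out when old rows of one input are no longer needed for matches with the other.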
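A minimal PySpark sketch of such a join for the use case in the question. Several things here are assumptions, not taken from the question: the id is carried in the Kafka record key, the Kafka record `timestamp` is used as event time, the broker is at `localhost:9092`, and the delays (2 hours, 3 hours) and the 1 hour match window are placeholder values. The helper names `bounded_join_condition`, `read_topic` and `joined_stream` are hypothetical.

```python
def bounded_join_condition(max_gap: str = "1 hour") -> str:
    """Build a SQL join condition: same id, and the B event
    occurs no more than max_gap after the A event."""
    return (
        "a_id = b_id AND "
        "b_time >= a_time AND "
        f"b_time <= a_time + interval {max_gap}"
    )


def read_topic(spark, topic: str, prefix: str, bootstrap: str = "localhost:9092"):
    """Read one Kafka topic as a streaming DataFrame with prefixed columns.

    Assumption: the id is the record key and the Kafka record
    timestamp serves as event time.
    """
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap)
        .option("subscribe", topic)
        .load()
        .selectExpr(
            f"CAST(key AS STRING) AS {prefix}_id",
            f"timestamp AS {prefix}_time",
            f"CAST(value AS STRING) AS {prefix}_value",
        )
    )


def joined_stream(spark):
    """Inner join of topicA and topicB whose state can be cleaned up."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql.functions import expr

    # Step 1: a watermark delay on each input (placeholder values).
    stream_a = read_topic(spark, "topicA", "a").withWatermark("a_time", "2 hours")
    stream_b = read_topic(spark, "topicB", "b").withWatermark("b_time", "3 hours")

    # Step 2: a time-range constraint in the join condition.
    return stream_a.join(stream_b, expr(bounded_join_condition("1 hour")), "inner")
```

Calling `joined_stream(spark)` with an active `SparkSession` returns the joined streaming DataFrame; writing it to the database (Step 4) would still need a sink such as `writeStream.foreachBatch(...)`. With both the watermarks and the time-range condition in place, Spark can drop buffered rows once they are too old to match anything.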

Is there any reason not to include a watermark and time conditions to clear the state?
