Suppose Spark Streaming receives 50 lines of messages with a batch interval of 10 seconds, and the interval ends after 40.5 lines have arrived, so the remainder falls into the next 10-second interval. The first 40.5 lines then form one RDD, which is processed first. In my use case the first 40 lines make sense, but the trailing half line does not, and the same goes for the leading half line of the second RDD. Is my question even valid? Please advise how to handle this.

Thanks Bill.

1 Answer

This cannot happen. An element has either been received, in which case it is part of the current window, or it has not, in which case it will be included in the next one. File-based sources require atomic file creation, so a situation where only part of a file is loaded is simply not possible.
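To make the "whole element or nothing" behaviour concrete, here is a small plain-Python sketch. It is only a simplified model of a line-based receiver, not Spark's actual implementation: the function name `assign_to_batches`, the `(arrival_time, fragment)` input format, and the rule that a record belongs to the interval in which its final newline arrives are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_to_batches(chunks, batch_interval=10.0):
    """Toy model (hypothetical, not Spark code) of a line-based receiver.

    chunks: iterable of (arrival_time_seconds, text_fragment) pairs.
    Returns a dict mapping batch index to the list of complete lines in it.
    """
    batches = defaultdict(list)
    pending = ""  # bytes received so far that do not yet form a full line
    for arrival_time, fragment in chunks:
        pending += fragment
        # Everything before the last newline is complete; the tail stays buffered.
        *complete, pending = pending.split("\n")
        batch_index = int(arrival_time // batch_interval)
        for line in complete:
            batches[batch_index].append(line)
    return dict(batches)


# "line 3" starts arriving at 9.9 s but is only completed at 10.1 s.
chunks = [
    (2.0, "line 1\nline 2\n"),
    (9.9, "line 3 first half"),
    (10.1, " and second half\nline 4\n"),
]
batches = assign_to_batches(chunks)
for index in sorted(batches):
    print(index, batches[index])
```

In this model the half-received line never appears in the first batch; it shows up whole in the second one. If a single logical message in your use case spans several lines, that grouping is an application-level concern and has to be handled by how you frame the records, not by the batch boundary.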
