
I have two questions on Spark Streaming:

  1. I have a Spark Streaming application running and collecting data in 20-second batch intervals. Out of 4000 batches, 18 failed with the exception:

Could not compute split, block input-0-1464774108087 not found

I assume the data size was bigger than Spark's available memory at that point; the app's StorageLevel is also MEMORY_ONLY.

Please advise how to fix this.

  2. Also, in the command I use below, I set executor memory to 20G (total RAM on the data nodes is 140G). Does that mean all of that memory is reserved in full for this app, and what happens if I have multiple Spark Streaming applications?

Wouldn't I run out of memory after a few applications? Do I need that much memory at all?

/usr/iop/4.1.0.0/spark/bin/spark-submit --master yarn --deploy-mode client \
  --jars /home/blah.jar --num-executors 8 --executor-cores 5 \
  --executor-memory 20G --driver-memory 12G --driver-cores 8 \
  --class com.ccc.nifi.MyProcessor Nifi-Spark-Streaming-20160524.jar
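For the second question, the reservation arithmetic can be sketched as follows. This is a back-of-envelope estimate assuming Spark 1.x's default YARN overhead of max(384 MB, 10% of executor memory); YARN holds these containers for the lifetime of the application, so a second application with the same settings would double the reservation:

```python
# Back-of-envelope memory math for the spark-submit command above.
# Assumption (Spark 1.x on YARN): each executor container is sized as
# executor memory plus an overhead of max(384 MB, 10% of the heap),
# unless spark.yarn.executor.memoryOverhead overrides it.

def yarn_container_gb(heap_gb, overhead_fraction=0.10, min_overhead_gb=0.384):
    """Approximate size of one YARN container for a given heap request."""
    return heap_gb + max(min_overhead_gb, heap_gb * overhead_fraction)

num_executors = 8
executor_gb = 20

per_executor = yarn_container_gb(executor_gb)
executors_total = num_executors * per_executor

# In client mode the 12G driver runs on the submitting host, not in a
# YARN container, so it is not part of the cluster-side reservation.
print(f"per executor: ~{per_executor:.1f} GB, "
      f"cluster reservation: ~{executors_total:.0f} GB")
```

So this one application asks YARN for roughly 176 GB of containers across the cluster, held for as long as the streaming job runs, which is why running several such applications side by side exhausts memory quickly.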

1 Answer


It seems your executor memory might be getting full. Try these optimization techniques:

  1. Use StorageLevel MEMORY_AND_DISK instead of MEMORY_ONLY, so received blocks spill to disk rather than being evicted.
  2. Use Kryo serialization, which is faster and more compact than the default Java serialization, if you go for caching with memory and serialization.
  3. Check for GC pauses; you can see them per task in the Spark UI.
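The first two suggestions can be sketched against the original spark-submit command. This is a hedged sketch: `spark.serializer` is the standard Spark setting for Kryo, while the storage level has to be chosen in the application code itself (the `stream` variable below is a placeholder):

```shell
# Sketch only: enable Kryo via configuration; the storage level is set in code.
/usr/iop/4.1.0.0/spark/bin/spark-submit --master yarn --deploy-mode client \
  --jars /home/blah.jar --num-executors 8 --executor-cores 5 \
  --executor-memory 20G --driver-memory 12G --driver-cores 8 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class com.ccc.nifi.MyProcessor Nifi-Spark-Streaming-20160524.jar

# In the application code (Scala), persist with a disk-spilling level, e.g.:
#   stream.persist(StorageLevel.MEMORY_AND_DISK_SER)
```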