I have two questions about Spark Streaming:
- I have a Spark Streaming application running and collecting data in 20-second batch intervals. Out of 4000 batches, 18 failed with this exception:
Could not compute split, block input-0-1464774108087 not found
I assume the data size exceeded the memory available to Spark at that point; note that the app's StorageLevel is MEMORY_ONLY.
Please advise how to fix this.
- Also, in the command below I set the executor memory to 20G (total RAM on the data nodes is 140G). Does that mean all of that memory is reserved for this app? What happens if I run multiple Spark Streaming applications? Would I not run out of memory after a few applications? Do I need that much memory at all?
/usr/iop/4.1.0.0/spark/bin/spark-submit --master yarn --deploy-mode client \
  --jars /home/blah.jar --num-executors 8 --executor-cores 5 \
  --executor-memory 20G --driver-memory 12G --driver-cores 8 \
  --class com.ccc.nifi.MyProcessor Nifi-Spark-Streaming-20160524.jar
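To put the memory question in numbers, here is a rough sketch of how much memory the executors in this command ask YARN for. It assumes the Spark 1.x default for spark.yarn.executor.memoryOverhead, which is max(384 MB, 10% of the executor memory). It counts only the executor containers. In client mode the 12G driver runs inside the local spark-submit process rather than in a YARN container.

```shell
#!/bin/sh
# Rough estimate of the YARN memory requested for executors by the command above.
# Assumption: overhead per executor = max(384 MB, 10% of --executor-memory),
# i.e. the Spark 1.x default for spark.yarn.executor.memoryOverhead.

num_executors=8                  # --num-executors 8
executor_mem_mb=$((20 * 1024))   # --executor-memory 20G, in MB

# Off-heap overhead that YARN adds on top of --executor-memory
overhead_mb=$((executor_mem_mb * 10 / 100))
if [ "$overhead_mb" -lt 384 ]; then
  overhead_mb=384
fi

# Size of one executor container and of all executors together
container_mb=$((executor_mem_mb + overhead_mb))
total_mb=$((container_mb * num_executors))

echo "per-executor container: ${container_mb} MB"
echo "all executors: ${total_mb} MB ($((total_mb / 1024)) GB)"
```

Under these assumptions the command requests about 22 GB per executor, or 176 GB across the 8 executors. Depending on scheduler settings, YARN may round each container up further. Whether this fits, and how much is left for other applications, depends on the number of data nodes and on how much of each node's RAM is offered to YARN (yarn.nodemanager.resource.memory-mb).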
Thanks,
Pradeep