
I made a standalone cluster and wanted to find the fastest way to process my app. My machine has 12 GB of RAM. Here are some results I tried.

Test A (took 15 min): 1 worker node, spark.executor.memory = 8g, spark.driver.memory = 6g
Test B (took 8 min): 2 worker nodes, spark.executor.memory = 4g, spark.driver.memory = 6g
Test C (took 6 min): 2 worker nodes, spark.executor.memory = 6g, spark.driver.memory = 6g
Test D (took 6 min): 3 worker nodes, spark.executor.memory = 4g, spark.driver.memory = 6g
Test E (took 6 min): 3 worker nodes, spark.executor.memory = 6g, spark.driver.memory = 6g
  1. Comparing Test A and Test B: Test B just added one more worker (with the same total memory, 4 GB × 2 = 8 GB), but it made the app faster. Why did that happen?
  2. Tests C, D, and E tried to use much more memory than the machine has, but they still worked, and even faster. Is the configured memory size just an upper limit?
  3. The app does not keep getting faster just by adding worker nodes. How should I find the optimal number of workers and executor memory size?
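For context, a configuration like Test C can be submitted to a standalone master roughly like this (the master URL, main class, and jar path below are placeholders; adjust them to your setup):

```shell
# Hypothetical submit command; --executor-memory and --driver-memory
# correspond to spark.executor.memory and spark.driver.memory in Test C.
bin/spark-submit \
  --master spark://<master-host>:7077 \
  --class com.example.MyApp \
  --executor-memory 6g \
  --driver-memory 6g \
  path/to/my-app.jar
```

Note that --driver-memory should be passed on the command line (or in spark-defaults.conf) rather than set programmatically, since the driver JVM is already running by the time application code executes.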
  • Quick question: did you restart your cluster after every test? I'm just wondering whether caching is behind the performance increase in the subsequent runs. Commented Mar 24, 2016 at 15:10
  • @charlesgomes is restarting really needed? What about running each scenario once or twice before the real measurement trial, for instance? Commented Mar 24, 2016 at 22:32
  • No. The master and workers were started with 'bin/spark-class'. I just resubmitted my app to the cluster with 'bin/spark-submit'. Commented Mar 25, 2016 at 0:13

1 Answer


On Test B, your application was running in parallel on 2 CPUs, so the total time was almost halved.

Regarding memory: the memory setting defines an upper limit. Setting a small amount will make your app perform more GC, and if your heap eventually gets full, you'll get an OutOfMemoryError.

Regarding the most suitable configuration: well, it depends. If your task does not consume much RAM, configure Spark to have as many executors as you have CPUs. Otherwise, size your executors to match the amount of RAM they actually need. Keep in mind that these settings need not be constant; they may change with your application's requirements.
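One way to follow this advice on a standalone cluster is to cap cores and memory explicitly, so the number of concurrent executors tracks the CPU count and the total memory fits in physical RAM. The values below are a hypothetical sketch for a 12 GB machine with 4 cores, not a recommendation:

```shell
# Hypothetical sizing sketch: 1 core per executor, 4 cores total,
# so up to 4 executors of 2 GB each plus a 4 GB driver fit in 12 GB.
bin/spark-submit \
  --master spark://<master-host>:7077 \
  --executor-memory 2g \
  --driver-memory 4g \
  --conf spark.executor.cores=1 \
  --total-executor-cores 4 \
  path/to/my-app.jar
```

--total-executor-cores is specific to standalone (and Mesos) mode; it bounds how many cores the application claims across the whole cluster.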


5 Comments

Does the OutOfMemoryError come from executor memory or driver memory?
It is more likely to come from the executor, since that is the part that processes the data.
Then what is driver memory for? I thought it was for the original data and the fully processed RDD data.
If you perform an action such as RDD.collect(), the result will be stored in driver memory.
Thanks, you helped me a lot :)
