
So, I have a Spark standalone cluster with 16 worker nodes and one master node. I start the cluster with the "sh start-all.sh" command from the master node, in the spark_home/conf folder. The master node has 32 GB RAM and 14 VCPUS, while each worker node has 16 GB RAM and 8 VCPUS. I also have a Spring application which, when it starts (with java -jar app.jar), initializes the Spark context. The spark-env.sh file is:

export SPARK_MASTER_HOST='192.168.100.17'
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=14000mb
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=172800 -Dspark.worker.cleanup.appDataTtl=172800'

I do not have anything in spark-defaults.conf and the code for initializing the spark context programmatically is:

@Bean
public SparkSession sparksession() {
    SparkSession sp = SparkSession
        .builder()
        .master("spark://....")
        .config("spark.cassandra.connection.host", "192.168.100......")
        .appName("biomet")
        .config("spark.driver.memory", "20g")
        .config("spark.driver.maxResultSize", "10g")
        .config("spark.sql.shuffle.partitions", 48)
        .config("spark.executor.memory", "7g")
        .config("spark.sql.pivotMaxValues", "50000")
        .config("spark.sql.caseSensitive", true)
        .config("spark.executor.extraClassPath", "/home/ubuntu/spark-2.4.3-bin-hadoop2.7/jars/guava-16.0.1.jar")
        .config("spark.hadoop.fs.s3a.access.key", "...")
        .config("spark.hadoop.fs.s3a.secret.key", "...")
        .getOrCreate();
    return sp;
}

After all this, the Environment tab of the Spark UI shows spark.driver.maxResultSize 10g and spark.driver.memory 20g, BUT the Executors tab shows 0.0 B / 4.3 GB for the driver's storage memory.

(FYI: I used to have spark.driver.memory at 10g (programmatically set), and the Executors tab also said 4.3 GB, but now it seems I cannot change it. But I am thinking: even when I had it at 10g, wasn't it supposed to give me more than 4.3 GB?!)

How can I change the driver memory? I tried setting it from spark-defaults.conf, but nothing changed. Even if I do not set the driver memory at all (or set it to less than 4.3 GB), the Executors tab still says 4.3 GB.

1 Answer


I suspect that you're running your application in client mode; per the documentation:

Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command line option in the client mode. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point.
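To illustrate the documented flag: if the job were launched through spark-submit in client mode, the driver heap would be set on the command line, not in SparkConf. A sketch, reusing the master address from the question's spark-env.sh (the jar name and default port 7077 are assumptions):

```shell
# Client mode: the driver JVM is created by spark-submit itself,
# so its heap must be fixed at launch time via --driver-memory.
spark-submit \
  --master spark://192.168.100.17:7077 \
  --deploy-mode client \
  --driver-memory 20g \
  app.jar
```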

In the current case, the Spark job is submitted from the application, so the application itself is the driver, and its memory is regulated as usual for Java applications: via -Xmx, etc. (This would also explain the 4.3 GB figure: the Storage Memory column reports roughly (heap − 300 MB) × spark.memory.fraction (0.6 by default), so it reflects the JVM's actual max heap, not the spark.driver.memory value shown in the Environment tab.)
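Concretely, for a Spring application started with java -jar as in the question, a minimal sketch of the launch command (the 20g value mirrors the spark.driver.memory the asker wanted; the jar name is taken from the question):

```shell
# The Spring app's JVM *is* the Spark driver here, so size its heap
# directly with -Xmx; spark.driver.memory in SparkConf is ignored
# because the JVM has already started by the time it is read.
java -Xmx20g -jar app.jar
```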


8 Comments

Yes, it is in client mode. OK, so I erased it from the SparkConf. I tried putting it in the spark-defaults.conf file, but it is as if it does not even read this file. From my understanding, as you also said, I cannot set the driver memory programmatically because the JVM has already started (using sh start-all.sh). But then how am I supposed to set the driver memory? There is only the spark-env.sh file, and there is no option in that file for setting the driver memory in a client-mode standalone Spark cluster.
you need to start spark-submit with --driver-memory option
But I am not using spark-submit anywhere... You mean just try "spark-submit --driver-memory=30g" before I use "sh start-all.sh" or start my application?
Yes, use spark-submit. I don't know what start-all.sh is doing, but I suspect that it's calling spark-submit under the hood.
In that case, the driver is the application that you're executing - just set -Xmx as usual.
