
Given that a single node has multiple GPUs, is there a way to automatically limit CPU and memory usage depending on the number of GPUs requested?

In particular, if the user's job script requests 2 GPUs, then the job should automatically be restricted to 2*BaseMEM and 2*BaseCPU, where BaseMEM = TotalMEM/numGPUs and BaseCPU = numCPUs/numGPUs, defined on a per-node basis.
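For example (illustrative numbers): on a node with 4 GPUs, 64 CPUs and 512 GB of RAM, BaseCPU = 64/4 = 16 and BaseMEM = 512/4 = 128 GB, so a 2-GPU job would be capped at 32 CPUs and 256 GB.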

Is it possible to configure SLURM this way? If not, can one alternatively "virtually" split a multi-GPU machine into multiple nodes with the appropriate CPU and MEM count?

1 Answer


On the command line

--cpus-per-gpu $BaseCPU --mem-per-gpu $BaseMEM 
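For example, with illustrative values of 16 CPUs and 64 GB of memory per GPU, a 2-GPU submission would look like this (job.sh is a placeholder for your batch script):

sbatch --gres=gpu:2 --cpus-per-gpu=16 --mem-per-gpu=64G job.sh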

In slurm.conf

DefMemPerGPU=1234
DefCpuPerGPU=1
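As a sketch, assuming a hypothetical partition named gpu whose nodes each have 4 GPUs, 16 CPUs per GPU and 64 GB of memory per GPU (memory values in slurm.conf are in MB), the per-partition defaults would be:

PartitionName=gpu Nodes=node[01-04] DefCpuPerGPU=16 DefMemPerGPU=64000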

Since you can't use variables in slurm.conf, you would need to write a little bash script to calculate $BaseCPU and $BaseMEM, for example:
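A minimal sketch of such a helper, assuming a node named node01, GNU grep, and that the GPU count appears in scontrol's Gres field as gpu:N or gpu:<type>:N (adjust the node name and the parsing for your cluster; job.sh is a placeholder for your batch script):

#!/bin/bash
# Sketch: derive per-GPU CPU and memory shares from a node's totals, then submit.
NODE=node01
NGPUS=2                                  # GPUs this job requests

NODEINFO=$(scontrol show node "$NODE")
TOTAL_CPUS=$(grep -oP 'CPUTot=\K[0-9]+' <<< "$NODEINFO")
TOTAL_MEM=$(grep -oP 'RealMemory=\K[0-9]+' <<< "$NODEINFO")    # in MB
NUM_GPUS=$(grep -oP 'Gres=gpu:(?:[^:, ]+:)?\K[0-9]+' <<< "$NODEINFO")

BaseCPU=$(( TOTAL_CPUS / NUM_GPUS ))
BaseMEM=$(( TOTAL_MEM / NUM_GPUS ))

sbatch --gres=gpu:$NGPUS \
       --cpus-per-gpu=$BaseCPU \
       --mem-per-gpu=${BaseMEM}M \
       job.sh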


1 Comment

Nice! I didn't see this because we have been running an older version. It appears this feature became available in version 19.05 (May 2019).
