
Given that a single node has multiple GPUs, is there a way to automatically limit CPU and memory usage depending on the number of GPUs requested?

In particular, if the user's job script requests 2 GPUs, then the job should automatically be restricted to 2*BaseMEM and 2*BaseCPU, where BaseMEM = TotalMEM/numGPUs and BaseCPU = numCPUs/numGPUs, defined on a per-node basis.
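For example (illustrative numbers): on a node with 4 GPUs, 64 CPUs and 512 GB of RAM, BaseCPU = 64/4 = 16 and BaseMEM = 512/4 = 128 GB, so a 2-GPU job would be capped at 32 CPUs and 256 GB.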

Is it possible to configure SLURM this way? If not, can one alternatively "virtually" split a multi-GPU machine into multiple nodes with the appropriate CPU and MEM count?

1 Answer


On the command line

--cpus-per-gpu $BaseCPU --mem-per-gpu $BaseMEM 
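For example, with illustrative values of 16 CPUs and 64 GB of memory per GPU, a 2-GPU submission would look like this (job.sh is a placeholder for your batch script):

sbatch --gres=gpu:2 --cpus-per-gpu=16 --mem-per-gpu=64G job.sh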

In slurm.conf

DefMemPerGPU=1234
DefCpuPerGPU=1
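As a sketch, assuming a hypothetical partition named gpu whose nodes each have 4 GPUs, 16 CPUs per GPU and 64 GB of memory per GPU (memory values in slurm.conf are in MB), the per-partition defaults would be:

PartitionName=gpu Nodes=node[01-04] DefCpuPerGPU=16 DefMemPerGPU=64000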

Since you can't use variables in slurm.conf, you would need to write a little bash script to calculate $BaseCPU and $BaseMEM, for example:
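A minimal sketch of such a helper, assuming a node named node01, GNU grep, and that the GPU count appears in scontrol's Gres field as gpu:N or gpu:<type>:N (adjust the node name and the parsing for your cluster; job.sh is a placeholder for your batch script):

#!/bin/bash
# Sketch: derive per-GPU CPU and memory shares from a node's totals, then submit.
NODE=node01
NGPUS=2                                  # GPUs this job requests

NODEINFO=$(scontrol show node "$NODE")
TOTAL_CPUS=$(grep -oP 'CPUTot=\K[0-9]+' <<< "$NODEINFO")
TOTAL_MEM=$(grep -oP 'RealMemory=\K[0-9]+' <<< "$NODEINFO")    # in MB
NUM_GPUS=$(grep -oP 'Gres=gpu:(?:[^:, ]+:)?\K[0-9]+' <<< "$NODEINFO")

BaseCPU=$(( TOTAL_CPUS / NUM_GPUS ))
BaseMEM=$(( TOTAL_MEM / NUM_GPUS ))

sbatch --gres=gpu:$NGPUS \
       --cpus-per-gpu=$BaseCPU \
       --mem-per-gpu=${BaseMEM}M \
       job.sh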


1 Comment

Nice! I didn't see this because we have been running an older version. It appears this feature became available in version 19.05 (May 2019).
