
We are looking for some advice on slurm salloc GPU allocations. Currently, given:

% salloc -n 4 -c 2 --gres=gpu:1
% srun env | grep CUDA
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0

However, we desire more than just device 0 to be used.
Is there a way to specify an salloc with srun/mpirun to get the following?

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3

The goal is for each task to get 1 GPU, with overall GPU usage spread across the 4 available devices (see gres.conf below), rather than all tasks getting device 0.

That way each task is not waiting on device 0 to free up from other tasks, as is currently the case.

Or is this expected behavior even if we have more than 1 gpu available/free (4 total) for the 4 tasks? What are we missing or misunderstanding?

  • salloc / srun parameter?
  • slurm.conf or gres.conf setting?

Summary: We want to use slurm and MPI such that each rank/task uses 1 GPU, but the job can spread the ranks across the 4 GPUs. Currently it appears we are limited to device 0 only. We also want to avoid multiple srun submissions within a single salloc/sbatch, because of how we use MPI.

OS: CentOS 7

Slurm version: 16.05.6

Are we forced to use wrapper-based methods for this?

Are there differences between slurm versions (14 to 16) in how GPUs are allocated?

Thank you!

Reference: gres.conf

Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3
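For context, the matching slurm.conf entries would be along the lines of the sketch below (the node name and CPU count are placeholders, not our actual file):

# slurm.conf sketch; NodeName and CPUs are placeholders
GresTypes=gpu
NodeName=gpunode01 Gres=gpu:4 CPUs=8 State=UNKNOWN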

3 Answers


First of all, try requesting four GPUs with

% salloc -n 4 -c 2 --gres=gpu:4

With --gres=gpu:1, it is the expected behaviour that all tasks see only one GPU. With --gres=gpu:4, the output would be

CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3

To get what you want, you can use a wrapper script, or modify your srun command like this:

srun bash -c 'CUDA_VISIBLE_DEVICES=$SLURM_PROCID env' | grep CUDA 

then you will get

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3
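The wrapper-script route mentioned above could be sketched roughly as follows (the file name gpu_wrapper.sh is made up, and the script assumes the job has been allocated all GPUs on the node; SLURM_LOCALID is the task's node-local rank set by srun):

#!/bin/bash
# gpu_wrapper.sh -- illustrative sketch, not a tested production script
# Bind each task to the GPU whose index matches its node-local rank
export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"

It would be launched as, for example, srun ./gpu_wrapper.sh ./my_mpi_app (my_mpi_app being a placeholder for the real binary).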

2 Comments

Thank you for the reply. We were expecting --gres=gpu:1 to really mean --gres_per_task=gpu:1, like the behavior of the -c, --cpus-per-task= option, but it appears to behave more like --gres_per_node=gpu:1. We are also hoping to avoid any wrapper-based methods. We had assumed slurm would be able to handle this use case, since we expected it to be fairly common.
@CharlieHemlock Yes, --gres is per node, not per task. I am not sure a per-task request would be that common. Most of the time, either the tasks are independent and are submitted as job arrays, or they are not independent and are part of an MPI job that then has full control over all GPUs of the node and distributes tasks to the GPUs in whatever way is best for the application at hand.
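As an illustration of the job-array pattern mentioned in this comment, a minimal sbatch script could look like the sketch below (counts and the application name are placeholders):

#!/bin/bash
# Minimal job-array sketch: four independent jobs, each requesting one GPU
#SBATCH --array=0-3
#SBATCH --gres=gpu:1
#SBATCH -c 2
# my_app is a placeholder for the real binary
srun ./my_app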

This feature (a per-task GPU request) is planned for 19.05. See https://bugs.schedmd.com/show_bug.cgi?id=4979 for details.

Be warned that the 'srun bash ...' solution suggested above will break if your job doesn't request all GPUs on that node, because another process may be in control of GPU 0.
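Once that feature is available, a per-task request should be expressible directly; a hedged sketch of what that might look like on Slurm 19.05 or newer (not applicable to the 16.05 cluster in the question):

% salloc -n 4 -c 2 --gpus-per-task=1
% srun env | grep CUDA

with each task then expected to be bound to its own GPU (exactly how CUDA_VISIBLE_DEVICES is numbered per task depends on the cgroup/device-constraint configuration).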



To accomplish one GPU per task you need to use the --gpu-bind switch of the srun command. For example, if I have three nodes with 8 GPUs each and I wish to run eight tasks per node each bound to a unique GPU, the following command would do the trick:

srun -p gfx908_120 -n 24 -G gfx908_120:24 --gpu-bind=single:1 -l bash -c 'echo $(hostname):$ROCR_VISIBLE_DEVICES' 
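On an NVIDIA node the analogous command would presumably be along these lines (partition name, task count, and GPU count are placeholders, and these GPU options require a much newer Slurm than the 16.05 in the question; the visible-device variable is CUDA_VISIBLE_DEVICES rather than ROCR_VISIBLE_DEVICES):

srun -p gpu_partition -n 4 --gpus=4 --gpu-bind=single:1 -l bash -c 'echo $(hostname):$CUDA_VISIBLE_DEVICES'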

