This repository contains the basic ideas for creating a containerized CUDA machine learning workflow with PyTorch and Docker. The project was developed under Python 3.11 on Ubuntu 22.04 LTS.
## Installation of Docker

Check your version with:

```bash
docker -v
```

## Installation of NVIDIA GPU drivers

Check your GPUs:

```bash
nvidia-smi
```

## Enable Docker BuildKit

Set the flag to use Docker's newer build system (this step may become obsolete in future Docker versions). Note that this overwrites any existing `/etc/docker/daemon.json`; if you already have one, merge the `features` key into it instead:

```bash
echo '{ "features": { "buildkit": true } }' | sudo tee /etc/docker/daemon.json > /dev/null
```

## Installation of the NVIDIA Container Toolkit
## Enable the NVIDIA runtime

```bash
sudo nvidia-ctk runtime configure
sudo systemctl restart docker.service
```

## Build the container
```bash
sudo docker build -t introcontcudaml:latest .
```

Ensure that the local file structure for the container mount is present:
```
wd_IntroContCudaML/
├── config
│   └── myconfig.json
├── input
└── output
    └── model.pth   (the container exports its output here)
```

Have a look into the docs folder for an example config file.
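The layout above can be bootstrapped with a short script. A minimal sketch, assuming nothing beyond the directory tree shown; the config keys written here are hypothetical placeholders, so replace them with the example config from the docs folder:

```python
# Sketch: create the expected working-directory layout and a placeholder
# config file. The config keys below are hypothetical, not the project's
# actual schema -- see the example config in the repo's docs folder.
import json
from pathlib import Path


def make_workdir(root: str) -> Path:
    """Create wd_IntroContCudaML/ with config/, input/ and output/ subdirectories."""
    wd = Path(root) / "wd_IntroContCudaML"
    for sub in ("config", "input", "output"):
        (wd / sub).mkdir(parents=True, exist_ok=True)
    # Hypothetical placeholder config; not the real schema.
    placeholder = {"epochs": 5, "batch_size": 32, "output_dir": "data/output"}
    (wd / "config" / "myconfig.json").write_text(json.dumps(placeholder, indent=2))
    return wd
```

Running `make_workdir(".")` once on the host prepares the mount point that the `docker run` command below binds into the container.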
## Run the container

```bash
sudo docker run --gpus all -v ~/Projects/wd_IntroContCudaML:/IntroContCudaML/data introcontcudaml -c /IntroContCudaML/data/config/myconfig.json
```

| option | description |
|---|---|
| `--gpus all` | grants access to NVIDIA GPU resources |
| `-v ~/path/to/wd:/IntroContCudaML/data` | mounts the working directory into the container at the data path |
| `-c /IntroContCudaML/data/config/myconfig.json` | custom flag for specifying the config file |
See also the Docker documentation for `docker container run`.
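The `-c` flag is forwarded to the script running inside the container. A minimal sketch of how such a flag could be parsed and the mounted config loaded; this is illustrative only, and the function names and argument handling in the repository's actual `src/main.py` may differ:

```python
# Illustrative sketch of parsing a -c/--config flag as passed by
# `docker run ... -c <path>`; not the repository's actual code.
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Containerized CUDA ML workflow")
    parser.add_argument(
        "-c", "--config", required=True,
        help="path to the JSON config file, e.g. data/config/myconfig.json",
    )
    return parser.parse_args(argv)


def load_config(path: str) -> dict:
    # Read the JSON config from the mounted data directory.
    return json.loads(Path(path).read_text())


# Example: simulate the flag as the docker run command above would pass it.
args = parse_args(["-c", "/IntroContCudaML/data/config/myconfig.json"])
```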
## Apptainer usage

Assuming `apptainer pull` was used to pull the image and convert it to a SIF file:

```bash
apptainer exec --nv --bind ~/path/to/wd:/IntroContCudaML/data/ --pwd /IntroContCudaML introcontcudaml_latest.sif python3 src/main.py -c data/config/myconfig.json
```