This is a template for a PyTorch project covering training, testing, inference demos, and FastAPI serving, with Docker support.
Use Poetry, a Python venv, or a conda env to install requirements:

- Poetry install full requirements (recommended):

```shell
poetry install --all-groups
```

- Pip install full requirements:

```shell
pip install -r requirements.txt
```
Example training for MNIST digit classification:

```shell
python train.py --cfg configs/mnist_config.yaml
```

Place training data inside the data directory in the following format:
```
data
├── SOURCE_DATASET
│   ├── CLASS 1
│   │   ├── img1
│   │   ├── img2
│   │   └── ...
│   ├── CLASS 2
│   │   ├── img1
│   │   ├── img2
│   │   └── ...
```

Note: ImageNet-style class_dir->subdirs->subdirs->images... is also supported.

```shell
# generate an id to name classmap
python scripts/generate_classmap_from_dataset.py --sd data/SOURCE_DATASET --mp data/ID_2_CLASSNAME_MAP_TXT_FILE

# create train val test split, also creates an index to classname mapping txt file
python scripts/train_val_test_split.py --rd data/SOURCE_DATASET --td data/SOURCE_DATASET_SPLIT --vs VAL_SPLIT_FRAC -ts TEST_SPLIT_FRAC

# OPTIONAL: duplicate train data if necessary
python scripts/duplicate_data.py --rd data/SOURCE_DATASET_SPLIT/train --td data/SOURCE_DATASET_SPLIT/train -n TARGET_NUMBER

# create a custom config file based on configs/classifier_cpu_config.yaml and modify train parameters
cp configs/classifier_cpu_config.yaml configs/custom_classifier_cpu_config.yaml
```

The sample data used in the custom image classification training was downloaded from https://www.kaggle.com/datasets/umairshahpirzada/birds-20-species-image-classification.
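The classmap step above can be sketched in a few lines of stdlib Python. `generate_classmap` below is a hypothetical stand-in for what `scripts/generate_classmap_from_dataset.py` does (the real script's output format may differ):

```python
import os
import tempfile

def generate_classmap(source_dir: str, map_path: str) -> dict:
    """Map each class sub-directory name to an integer id and save as 'id<TAB>name' lines.

    Illustrative sketch only; the project's script may use a different format."""
    classes = sorted(
        d for d in os.listdir(source_dir)
        if os.path.isdir(os.path.join(source_dir, d))
    )
    classmap = {i: name for i, name in enumerate(classes)}
    with open(map_path, "w") as f:
        for i, name in classmap.items():
            f.write(f"{i}\t{name}\n")
    return classmap

# demo on a throwaway dataset layout with two class directories
root = tempfile.mkdtemp()
for cls in ("cat", "dog"):
    os.makedirs(os.path.join(root, cls))
map_file = os.path.join(root, "classmap.txt")
print(generate_classmap(root, map_file))  # {0: 'cat', 1: 'dog'}
```

Sorting the directory names first keeps the id assignment deterministic across runs.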
```shell
# train on custom data with custom config
python train.py --cfg configs/custom_classifier_cpu_config.yaml
```

Convert an existing dataset into the tar archive format used by WebDataset. The data directory must match the structure above.
```shell
# ID_2_CLASSNAME_MAP_TXT_FILE is generated by scripts/train_val_test_split.py
# convert train/val/test splits into tar archives
python scripts/convert_dataset_to_tar.py --sd data/SOURCE_DATA_SPLIT --td data/TARGET_TAR_SPLIT.tar --mp ID_2_CLASSNAME_MAP_TXT_FILE
```

An example configuration for training with the WebDataset format is provided in configs/classifier_webdataset_cpu_config.yaml.
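WebDataset tars group every field of a sample under a shared key prefix, with the extension naming the field (e.g. `000000.jpg` plus `000000.cls`). A minimal stdlib sketch of that layout (this is not the project's conversion script, which also consumes the id-to-classname map):

```python
import io
import os
import tarfile
import tempfile

def add_sample_field(tar: tarfile.TarFile, key: str, ext: str, payload: bytes) -> None:
    """Add one field of a sample; WebDataset groups tar members by the shared key prefix."""
    info = tarfile.TarInfo(name=f"{key}.{ext}")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

tar_path = os.path.join(tempfile.mkdtemp(), "train.tar")
with tarfile.open(tar_path, "w") as tar:
    # sample "000000": image bytes plus its class id as text
    add_sample_field(tar, "000000", "jpg", b"<jpeg bytes here>")
    add_sample_field(tar, "000000", "cls", b"3")

with tarfile.open(tar_path) as tar:
    print(tar.getnames())  # ['000000.jpg', '000000.cls']
```

Keeping the fields of each sample adjacent in the archive is what lets WebDataset stream samples sequentially without an index.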
```shell
# example training with webdataset tar data format
python train.py --cfg configs/classifier_webdataset_cpu_config.yaml
```

Testing is based on CONFIG_FILE. By default, testing is done for MNIST classification.
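A WebDataset config typically differs from the directory-based one mainly in pointing the data source at tar archives. The fragment below is illustrative only; the key names are guesses except `tensorboard_log_dir`, which the logging notes mention, so check configs/classifier_webdataset_cpu_config.yaml for the real schema:

```yaml
# illustrative sketch; see configs/classifier_webdataset_cpu_config.yaml for the real keys
train_data: data/TARGET_TAR_SPLIT_train.tar   # tars produced by scripts/convert_dataset_to_tar.py
val_data: data/TARGET_TAR_SPLIT_val.tar
device: cpu
tensorboard_log_dir: checkpoints/tb_logs
```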
```shell
python test.py --cfg CONFIG_FILE
```

Export a trained model:

```shell
python export.py --cfg CONFIG_FILE -r MODEL_PATH --mode <"ONNX_TS"/"ONNX_DYNAMO"/"TS_TRACE"/"TS_SCRIPT">
```

All TensorBoard logs are saved to the `tensorboard_log_dir` setting in the config file. Logs include train/val epoch accuracy/loss, the model graph, and preprocessed images per epoch.
To start a TensorBoard server reading logs from the experiment directory, exposed on localhost port 6006:

```shell
tensorboard --logdir=TF_LOG_DIR --port=6006
```

Install Docker on the system first:
```shell
bash scripts/build_docker.sh  # builds the docker image
bash scripts/run_docker.sh    # runs the image, creating a shared volume checkpoint_docker outside the container

# inside the docker container
python train.py
```

To use GPUs inside Docker for training/testing, pass the `--gpus` flag:

```shell
--gpus device=0,1  # or: --gpus all
```
Build and run the FastAPI model server in Docker:

```shell
bash server/build_server_docker.sh -m pytorch/onnx
bash server/run_server_docker.sh -h/--http 8080
```

Clean cached builds, __pycache__, .DS_Store files, etc.:
```shell
bash scripts/cleanup.sh
```

Count the number of files in the sub-directories of PATH:

```shell
bash scripts/count_files.sh PATH
```

- Line-by-line GPU memory usage profiling: pytorch_memlab
- Line-by-line time profiling: line_profiler
- https://github.com/victoresque/pytorch-template
- WebDataset https://modelzoo.co/model/webdataset
- PyTorch Ecosystem Tools https://pytorch.org/ecosystem/