dlcalc is a collection of command-line calculators and utilities for deep learning practitioners. It covers:
- Performance Modeling - Estimate training throughput, memory usage, and MFU
- Topology Analysis - Analyze and optimize network topology for distributed training
- Metrics Conversion - Convert between different performance metrics
- Checkpoint Analysis - Inspect and summarize model checkpoints

Install with pip:

    pip install dlcalc

or from source:

    git clone https://github.com/jfc4050/dlcalc
    cd dlcalc
    pip install -e .

After this, the command line tools described below should be available. Depending on your environment, you may need to add `--user` to the `pip install` command so that the tools are installed somewhere on your `$PATH`.
Calculator for estimating performance characteristics of ND-parallel transformer training:

    3dtrn examples/llama3_70b.yaml

We recommend using it alongside profilers such as NVIDIA Nsight Systems or PyTorch Profiler, so that measured profiles can be checked against a theoretical estimate.
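
As a rough illustration of what a theoretical estimate looks like, the sketch below computes MFU and daily token throughput from measured samples per second. It assumes the common approximation of about 6 FLOPs per parameter per token for a forward and backward pass, which ignores attention FLOPs and activation recomputation. The function names are hypothetical and this is not necessarily the formula dlcalc itself uses.

```python
SECONDS_PER_DAY = 86_400


def mfu_from_throughput(
    samples_per_sec: float,
    seqlen: int,
    n_params: float,
    n_accelerators: int,
    peak_tflops_per_accelerator: float,
) -> float:
    """Approximate Model FLOPs Utilization as achieved FLOP/s over peak FLOP/s.

    Assumption: training costs ~6 FLOPs per parameter per token
    (2 for the forward pass, 4 for the backward pass).
    """
    tokens_per_sec = samples_per_sec * seqlen
    achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
    peak_flops_per_sec = n_accelerators * peak_tflops_per_accelerator * 1e12
    return achieved_flops_per_sec / peak_flops_per_sec


def tokens_per_day(samples_per_sec: float, seqlen: int) -> float:
    """Tokens processed in 24 hours at a constant throughput."""
    return samples_per_sec * seqlen * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 512 accelerators at 312 peak TFLOP/s each.
    mfu = mfu_from_throughput(100, 2048, 70e9, 512, 312)
    print(f"MFU: {mfu:.1%}")
    print(f"tokens/day: {tokens_per_day(100, 2048):.3e}")
```

Under these assumptions the example run comes out at roughly 54% MFU. A profile whose step time implies a much lower figure is a hint to look for communication stalls or idle gaps.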
| Tool | Command | Purpose |
|---|---|---|
| Visualizer | topoviz | Generate network topology graphs from Kubernetes clusters |
| Evaluator | topoeval | Analyze topology optimality for DP rings |
| Scheduler | topoassign | Compute topology-aware rank assignments |

    # Visualize cluster topology
    topoviz -h

    # Evaluate training job topology
    topoeval -h

    # Generate optimal rank assignments
    topoassign -h

Convert training throughput to Model FLOPs Utilization (MFU):

    sps2mfu --samples-per-sec 100 --seqlen 2048 --model-size 70b \
        --n-accelerators 512 --tflops-per-accelerator 312

Calculate daily token throughput:

    sps2tpd --samples-per-sec 100 --seqlen 2048

Analyze PyTorch checkpoint contents:

    ckpt-summarize model.pt

To set up a development environment:

    # Install with development dependencies
    pip install -e .[dev]

    # Install pre-commit hooks
    pre-commit install

To run the checks:

    # Run all checks (formatting, linting, type checking, tests)
    bash checks

To run the tests:

    # Run full test suite
    pytest tests/ -v

    # Run with coverage
    pytest tests/ --cov=dlcalc --cov-report=term-missing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Report bugs
- Request features
- Read the docs