# fitcheck

Know before you train — VRAM estimation for LLM fine-tuning.

fitcheck predicts GPU memory usage from first principles. Given a model, GPU, and training method, it tells you whether your config will fit — before you spend an hour discovering it won't.

## Why fitcheck?

Fine-tuning LLMs usually means guessing at batch sizes and hoping you don't OOM. The feedback loop is brutal: pick a config, wait for the run to start, crash two minutes in, adjust, repeat.

fitcheck collapses that loop. It computes each VRAM component — model weights, optimizer states, gradients, activations, the logits buffer, eval KV-cache spikes — and produces a breakdown with confidence bounds.

## What it computes

| Component | What it is | Why it matters |
|---|---|---|
| Model weights | Base params in training dtype (bf16/NF4) | 4.2 GB for QLoRA 8B, 16 GB for full bf16 |
| Optimizer states | AdamW momentum + variance per trainable param | Dominates full fine-tune (~60 GB for 8B) |
| Gradients | One gradient per trainable param | Small for LoRA, huge for full FT |
| Activations | Per-layer stored tensors for backward pass | Flash-attention-aware, scales with batch × seq |
| Logits buffer | batch × seq × vocab × 4 bytes (float32) | The surprise OOM — 2 GB at bs=4 with 128k vocab |
| Eval KV-cache | Spike during evaluation steps | Can exceed training steady-state |
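Two of the rows above reduce to simple arithmetic, which is worth sanity-checking by hand. The sketch below is illustrative and not part of fitcheck's API: the function names are made up, and the 8.03B parameter count used for Llama-3.1-8B is an approximation.

```python
GIB = 1024**3

def logits_buffer_bytes(batch: int, seq: int, vocab: int) -> int:
    """Logits are materialized in float32: batch x seq x vocab x 4 bytes."""
    return batch * seq * vocab * 4

def adamw_state_bytes(trainable_params: int) -> int:
    """AdamW keeps two fp32 states (momentum + variance) per trainable param."""
    return trainable_params * 2 * 4

# Logits at bs=4, seq=1024, 128k vocab -- roughly the "2 GB surprise" row
print(f"{logits_buffer_bytes(4, 1024, 128256) / GIB:.2f} GiB")  # ≈ 1.96 GiB

# AdamW states for a full fine-tune of ~8.03B params -- the "~60 GB" row
print(f"{adamw_state_bytes(8_030_000_000) / GIB:.1f} GiB")  # ≈ 59.8 GiB
```

This also makes clear why LoRA-style methods shrink the optimizer and gradient rows so dramatically: only the adapter parameters are trainable, so both formulas scale with the adapter size rather than the base model.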

## Quick Start

```bash
pip install -r requirements.txt

# Run tests
pytest
```
```python
from fitcheck.hub.resolver import resolve_from_config
from fitcheck.hardware.registry import get_hardware
from fitcheck.profilers.vram.engine import VRAMEstimator
from fitcheck.models.profiles import TrainingMethod, LoRAConfig

# QLoRA Llama 8B on an RTX 3090
estimator = VRAMEstimator()
breakdown = estimator.estimate(
    model=resolve_from_config("meta-llama/Llama-3.1-8B", config),
    hardware=get_hardware("3090"),
    method=TrainingMethod.QLORA,
    batch_size=4,
    seq_len=1024,
    lora_config=LoRAConfig(rank=16),
)
print(f"Steady-state: {breakdown.steady_state_gb:.1f} GB")
print(f"Usable VRAM: {get_hardware('3090').usable_vram_gb} GB")
# Steady-state: 16.6 GB
# Usable VRAM: 22.8 GB ← fits with 6 GB headroom
```

## Development

```bash
pytest                                                # Run all tests (51)
pytest tests/fitcheck/profilers/test_estimator.py -v  # End-to-end estimator tests
ruff format fitcheck tests                            # Format code
```

## License

MIT
