Supercharge Your Model Training
Updated Mar 25, 2026 - Python
Efficient Deep Learning Systems course materials (HSE, YSDA)
Run more RL experiments. Wait less for GPUs.
Designing IT and ML Applications using Systems Thinking Approach at IIT Bhilai (CS559)
LICITRA v1 evidence — superseded by licitra-mmr-evidence
LICITRA v1 — superseded by licitra-mmr-core
Structured notes on designing scalable and fault-tolerant ML systems, to refresh your knowledge and help you prepare for a system design interview. Covers system design, MLOps, and case studies.
End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.
Cloud-native machine learning platform for real-time inflation nowcasting
Evidence-based roadmap to becoming an AI System Engineer. Mathematical foundations, ML systems, production habits, and proof-backed progression.
Experimental web application demonstrating how an offline-trained financial fraud detection model can be exposed through a web interface. Built with Flask and a pre-trained XGBoost model to showcase ML inference flow, feature engineering, and result communication — not a production fraud prevention system.
Production-style real-time ML feature store with low-latency inference
Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.
Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.
Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,
Public engineering notes (ML systems, CV, MIT courses). Notes-only; sources linked.
Autonomous training optimizer for nanoGPT using multi-agent patch search, empirical validation, and rollback-safe execution. TinyShakespeare val_loss improved from ~4.17 to ~1.8454.
An automated preprocessing pipeline for Telco Customer Churn data, including cleaning, feature engineering, and CI with GitHub Actions.