ml-systems

anastasiamkh / engineering-machine-learning-systems

Structured notes on designing scalable and fault-tolerant ML systems, to refresh your knowledge and help you prepare for a system design interview. Covers system design, MLOps, and case studies.

Updated Jan 18, 2025

dahlp94 / feed-ranking-engine

End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.

machine-learning recommendation-system learning-to-rank ml-systems feed-ranking

Updated Dec 31, 2025
Python

finnpwalsh / macro-nowcast

Cloud-native machine learning platform for real-time inflation nowcasting.

aws terraform macroeconomics mlops ml-systems

Updated Mar 23, 2026
Python

4F71 / project-ai-system-engineer

Evidence-based roadmap to becoming an AI System Engineer. Mathematical foundations, ML systems, production habits, and proof-backed progression.

machine-learning linear-algebra software-engineering cosine-similarity mlops learning-in-public ai-engineering mathematics-for-ml ml-systems vector-operations

Updated Feb 15, 2026
Jupyter Notebook

711nishtha / financial-fraud-detection-app

Experimental web application demonstrating how an offline-trained financial fraud detection model can be exposed through a web interface. Built with Flask and a pre-trained XGBoost model to showcase ML inference flow, feature engineering, and result communication — not a production fraud prevention system.

flask web-application fraud-detection machine-learning-demo applied-ml ml-systems model-inference

Updated Jan 24, 2026
HTML

dileepkreddy5 / real-time-ml-feature-store

Production-style real-time ML feature store with low-latency inference

redis streaming kafka prometheus low-latency feature-store fastapi real-time-ml ml-systems ml-inference

Updated Feb 22, 2026
Python

kuttivicky / Waymo-e2e-profiler

Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.

performance-engineering deep-learning async cuda pytorch gpu-optimization nvtx ml-systems nsight-systems automomous-driving

Updated Feb 12, 2026
Python

karun2328 / inference_pipeline

Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.

gpu inference pytorch quantization tensorrt onnxruntime ml-systems

Updated Jan 16, 2026
Python

fractal360 / risk-gate-api

Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,

terraform aws-ecs system-architecture policy-enforcement fastapi applied-ml ml-systems ai-governance llm-safety deterministic-systems auditability decision-gate

Updated Jan 14, 2026
Python

alfonsocruzvelasco / learning-notes

Public engineering notes (ML systems, CV, MIT courses). Notes-only; sources linked.

computer-science mit algorithms data-structures learning-notes ml-systems

Updated Jan 18, 2026

crasofuentes-hub / swarm-forge

Autonomous training optimizer for nanoGPT using multi-agent patch search, empirical validation, and rollback-safe execution. TinyShakespeare val_loss improved from ~4.17 to ~1.8454.

engineering ci rollback transformers pytorch multi-agent reproducibility ray governance multi-agent-systems automl autonomous-agents checkpointing ml-systems nanogpt training-optimization auditability tinyshakespeare

Updated Mar 20, 2026
Python

nikita74939 / Eksperimen_SML_Nikita

An automated preprocessing pipeline for Telco Customer Churn data, including cleaning, feature engineering, and CI with GitHub Actions.

machine-learning data-preprocessing data-pipeline github-actions ml-systems telco-churn

Updated Dec 20, 2025
Jupyter Notebook

Here are 32 public repositories matching this topic...

mosaicml / composer

mryab / efficient-dl-systems

rlops / rlix

gagan-iitb / ComputerSysDesign

ovshake / mlsys-for-dummies

narendrakumarnutalapati / licitra-evidence

narendrakumarnutalapati / licitra-core

4F71 / 4F71

anastasiamkh / engineering-machine-learning-systems

dahlp94 / feed-ranking-engine