Designed for software engineers crossing over into AI, this guide focuses on system architecture, deployment patterns, and operational rigor for LLMs, RAG, prompt engineering, agents, and evals.
This guide is for you if:
- You're a senior software engineer (5+ years) moving into AI/ML engineering
- You're preparing for system design interviews at AI-focused companies or big tech AI teams
- You build distributed systems and want to understand how AI components change the design
- You want to go from "I've used the OpenAI API" to "I can design and defend a production AI system"
This guide is NOT for you if:
- You're looking for ML theory or math (read Goodfellow's Deep Learning textbook instead)
- You want paper summaries without practical context
- You're a researcher who needs academic rigor over engineering pragmatism
How transformers work, tokenization, context windows, when to fine-tune vs RAG.
- 01-transformer-intuition — How transformers work, no math
- 02-tokenization — Tokens are money
- 03-attention-mechanisms — Self-attention, KV cache, Flash Attention
- 04-context-windows — Long context tradeoffs
- 05-training-pipeline — RAG vs fine-tune vs prompt (THE decision)
- 06-model-landscape — Model comparison table
- 07-small-language-models — When to use Phi/Gemma instead of GPT-4o
- 08-quantization — INT8/INT4, GGUF, running models on cheap hardware
- 09-fine-tuning — LoRA, QLoRA, when NOT to fine-tune
- 10-distillation-and-pruning — Making models cheaper
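"Tokens are money" is worth internalizing early. As a back-of-the-envelope sketch (the ~4 characters/token ratio is a common heuristic for English text, and the prices here are illustrative placeholders, not current list prices — real counting uses the model's actual tokenizer, e.g. tiktoken):

```python
# Rough token/cost estimator. The 4-chars-per-token ratio is a common rule of
# thumb for English; real tokenizers vary by model and language. Prices below
# are illustrative placeholders, not current list prices.

def estimate_tokens(text: str) -> int:
    """Approximate token count via the ~4 chars/token heuristic."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, output_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one call, given per-million-token prices."""
    in_tokens = estimate_tokens(prompt)
    return (in_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

prompt = "Summarize the following document section by section. " * 100
cost = estimate_cost(prompt, output_tokens=500,
                     in_price_per_m=2.50, out_price_per_m=10.00)
```

Even this crude math makes cost conversations concrete: a 100x jump in prompt length is a 100x jump in input spend.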
Chain-of-thought (CoT), structured generation, prompt optimization, injection defense.
- 01-prompting-patterns — Zero-shot to Tree of Thought
- 02-context-engineering — The underrated skill that separates good from great
- 03-structured-generation — Instructor, JSON mode, Outlines
- 04-prompt-optimization — DSPy, meta-prompting, eval-driven
- 05-prompt-security — Injection attacks and defenses
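The core idea behind structured-generation libraries like Instructor is "parse, validate, retry on failure." A minimal stdlib-only sketch of that loop's validation half (the `Ticket` schema and allowed priorities are hypothetical; Instructor does this with Pydantic models plus automatic re-prompting):

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration; real setups use Pydantic models.
@dataclass
class Ticket:
    title: str
    priority: str

def parse_ticket(raw: str) -> Ticket:
    """Parse model output into a typed object, failing loudly on drift.
    A raised error is the signal to re-prompt the model with the message."""
    data = json.loads(raw)  # raises on non-JSON output
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError(f"bad priority: {data.get('priority')!r}")
    return Ticket(title=data["title"], priority=data["priority"])

ticket = parse_ticket('{"title": "Login broken", "priority": "high"}')
```

The point is that validation errors become feedback to the model, not exceptions for your users.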
The complete RAG stack: chunking, embeddings, vector DBs, hybrid search, advanced patterns.
- 01-rag-fundamentals — What/why/when, the naive pipeline
- 02-embedding-models — MTEB, dimensions, Matryoshka
- 03-vector-indexing — HNSW vs IVF, FAISS
- 04-vector-databases — Decision matrix, cost at scale
- 05-chunking-strategies — THE key lever most teams get wrong
- 06-hybrid-search — Dense + BM25 + RRF
- 07-reranking — Cross-encoders, Cohere, two-stage
- 08-query-transformation — HyDE, multi-query, decomposition
- 09-advanced-rag-patterns — GraphRAG, Agentic RAG, Self-RAG, CRAG
- 10-multimodal-rag — ColPali, PDFs with tables and images
- 11-rag-evaluation — RAGAS, debug flowchart
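The RRF step in hybrid search is small enough to show in full. A sketch of Reciprocal Rank Fusion merging a dense ranking with a BM25 ranking (k=60 is the constant from the original RRF paper; the doc IDs are made up):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. dense retrieval + BM25) with RRF.
    Each document scores sum(1 / (k + rank)) across the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
bm25  = ["doc_b", "doc_d", "doc_a"]   # ranked by keyword match
fused = reciprocal_rank_fusion([dense, bm25])
```

Note that RRF only needs ranks, not scores, which is why it fuses retrievers with incomparable scoring scales so cleanly.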
ReAct, tool use, MCP, LangGraph, multi-agent systems, memory.
- 01-agent-fundamentals — ReAct, perception-action loop, failure modes
- 02-tool-use-and-function-calling — OpenAI vs Claude vs Gemini formats
- 03-mcp-protocol — Full MCP, server code, security
- 04-langchain-overview — What it does well and where it falls short
- 05-langgraph-deep-dive — Stateful graphs, persistence, human-in-the-loop
- 06-dspy-framework — Compile, don't prompt
- 07-crewai-and-autogen — Honest assessment of multi-agent frameworks
- 08-llamaindex-haystack — Data frameworks vs orchestration frameworks
- 09-multi-agent-systems — When you actually need multiple agents
- 10-memory-and-state — Memory tiers, Mem0, Zep, checkpointing
- 11-agentic-patterns — Reflection, map-reduce, DAG patterns
- 12-browser-and-computer-use — Playwright, Claude Computer Use
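The agent loop underneath all of these frameworks is the same few lines: the model either requests a tool call or answers, and a hard step limit guards against the runaway-loop failure mode. A sketch with a mocked model (`fake_model` stands in for a real chat-completions call with function calling; the weather tool and message shapes are illustrative, not any provider's exact format):

```python
import json

# Illustrative tool registry; real systems describe tools via JSON schemas.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

def fake_model(messages: list[dict]) -> dict:
    """Pretend LLM: requests a tool once, then answers from its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Paris"})}}
    return {"content": "It is 22C and sunny in Paris."}

def run_agent(user_msg: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):  # hard step limit: a key failure-mode guard
        reply = fake_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # model chose to answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": result})
    return "step limit reached"

answer = run_agent("What's the weather in Paris?")
```

Everything the frameworks add — state graphs, persistence, human approval gates — wraps this loop rather than replacing it.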
How to actually measure if your system works: RAGAS, LLM-as-judge, production eval.
- 01-eval-fundamentals — Why eval is hard, the eval pipeline
- 02-retrieval-and-rag-eval — Precision@K, MRR, NDCG, RAGAS
- 03-llm-as-judge — Pointwise vs pairwise, calibration
- 04-agent-and-e2e-eval — Task completion, A/B testing, continuous eval
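The retrieval metrics above are short enough to implement directly, which is also the best way to remember what they measure. A sketch with binary relevance (the doc IDs are made up; graded-relevance NDCG generalizes the same formula):

```python
import math

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """DCG of the top-k (binary gains) divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d2"}
```

Precision@K ignores order within the top K, MRR only cares about the first hit, and NDCG rewards putting relevant documents higher — which is why they disagree, and why you track more than one.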
Observability, guardrails, caching, inference infra, cost optimization.
- 01-observability-and-tracing — LangSmith vs Langfuse, what to log
- 02-guardrails-and-safety — Defense-in-depth, NeMo, LlamaGuard, Presidio
- 03-caching-strategies — Multi-layer caching, semantic cache
- 04-inference-infrastructure — GPU table, vLLM vs TGI, auto-scaling
- 05-drift-and-monitoring — Drift types, detection, remediation
- 06-mlops-for-llms — CI/CD, prompt versioning, blue-green
- 07-cost-optimization — Token optimization, model routing, batch
Interview framework, 5 full case studies, 30 practice problems, 60+ conceptual questions.
- 01-interview-framework — The 45-min structure. Worth the whole repo.
- 02-design-patterns-catalog — Full catalog with decision tree
- 03-architecture-templates — 6 reference architectures with cost models
- 04-case-enterprise-rag — Full worked design: enterprise knowledge base
- 05-case-code-assistant — Full worked design: GitHub Copilot-style
- 06-case-customer-support — Full worked design: support automation
- 07-case-doc-intelligence — Full worked design: document understanding
- 08-case-search-engine — Full worked design: AI-powered search
- 09-practice-problems — 30 problems with solution skeletons
- 10-conceptual-questions — 60+ questions with full conversational answers
Model pricing, glossary, cost formulas, essential papers.
- model-pricing-reference — Current pricing for all major models
- glossary — Terms defined in plain English
- cost-estimation-formulas — Spreadsheet-ready formulas
- essential-papers — The 20 papers worth reading
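The spreadsheet-ready cost formula reduces to one line of arithmetic. A sketch with illustrative numbers (the prices and traffic figures below are placeholders, not current rates):

```python
# Monthly cost model:
#   cost = requests * (in_tokens * in_price + out_tokens * out_price) / 1e6
# Prices are per million tokens; all numbers below are illustrative.

def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float,
                 days: int = 30) -> float:
    per_request = (in_tokens * in_price_per_m
                   + out_tokens * out_price_per_m) / 1_000_000
    return requests_per_day * days * per_request

# e.g. 10k requests/day, 2k-token prompts, 500-token outputs
cost = monthly_cost(10_000, 2_000, 500,
                    in_price_per_m=2.50, out_price_per_m=10.00)
```

Running this before a design review turns "RAG with long contexts" from a vibe into a dollar figure — and makes levers like caching, model routing, and prompt trimming easy to compare.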
Working implementations: RAG pipeline, LangGraph agent, MCP server, eval pipeline.
- 01-basic-rag — Minimal RAG in 100 lines
- 02-advanced-rag — Hybrid search + reranking
- 03-langgraph-agent — Stateful agent with tools
- 04-mcp-server — Working MCP server
- 05-eval-pipeline — RAGAS + LLM-as-judge
- 06-semantic-cache — Semantic caching with Redis
- 07-structured-output — Instructor + Pydantic
The guide is intentionally opinionated. If you disagree with a recommendation, open an issue with your reasoning and production evidence. PRs welcome for:
- Factual errors or outdated information (especially model specs and pricing)
- Missing failure modes from your production experience
