AI Engineer building tools to understand what language models are doing internally, and catching them when they fail.
Working on mechanistic interpretability, LLM evaluation, and trust/safety tooling. Most of my projects are small, sharp, and designed to answer one question well.
Trust Bench - Interpretability toolkit that probes SAE features, activations, and circuits in Llama 3.1 8B. Three probes (feature survey, hallucination, cross-lingual), statistical analysis, publication-quality visualizations, and a CLI.
| Project | What it does |
|---|---|
| sae-explorer | Found a single SAE feature (#10543) that fires on "and" across six languages in Gemma 2 2B. Zero false positives. |
| superposition-viz | Reproduces Anthropic's Toy Models of Superposition. Found phase transition at 0.7 sparsity. |
| activation-atlas | Layer-by-layer UMAP projections showing how neural networks organize learned representations. |
| scaling-laws | Train transformers from 100K to 10M params, fit power laws, plot the curves. Do they hold at toy scale? |
| loss-landscape | 3D surface plots of loss landscapes around trained weights. Sharpness comparison across training configs. |
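The power-law fit that scaling-laws performs can be sketched like this. The constants below are illustrative, not results from the repo, and for brevity the irreducible-loss floor `c` is assumed known here (in practice it would be fit jointly with the other parameters):

```python
import numpy as np

# Synthetic loss curve: L(N) = a * N^(-alpha) + c (irreducible loss).
# These constants are illustrative, not results from scaling-laws.
a, alpha, c = 10.0, 0.35, 1.5
params = np.array([1e5, 3e5, 1e6, 3e6, 1e7])  # 100K to 10M params
loss = a * params**-alpha + c

# Subtract the floor, then the power law is a straight line in log-log space.
slope, intercept = np.polyfit(np.log(params), np.log(loss - c), 1)
fitted_alpha = -slope
print(f"fitted exponent: {fitted_alpha:.3f}")  # recovers alpha = 0.350
```

Fitting in log-log space keeps the regression linear; whether the fitted exponent stays stable from 100K to 10M parameters is exactly the "do they hold at toy scale?" question.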

| Project | What it does |
|---|---|
| calibration-probe | Measure how well LLMs know what they know. Reliability diagrams and ECE across prompting strategies. |
| attention-bench | Benchmark MHA vs GQA vs MQA vs Sliding Window. Train small transformers, compare perplexity and throughput. |
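A minimal sketch of the expected calibration error (ECE) that calibration-probe reports, using equal-width confidence bins; the repo's exact binning strategy may differ:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: per-bin |accuracy - mean confidence|, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# A perfectly calibrated model scores 0; this toy model is overconfident.
conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
hit  = [1,   1,   0,   0,   1,   0]
print(round(expected_calibration_error(conf, hit), 3))  # 0.3
```

ECE is the scalar summary of the same information a reliability diagram shows visually: how far each confidence bin's accuracy sits from the diagonal.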

| Project | What it does |
|---|---|
| crux | Terminal dashboard for tracking token usage across AI coding tools. |
| tokenizer-arena | Compare how different LLM tokenizers handle the same text. Color-coded token boundaries. |
| gguf-inspect | Inspect GGUF model files from the terminal. Architecture, quantization, tensors, memory estimates. |
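The starting point for a tool like gguf-inspect is the fixed-size GGUF header. A minimal parser sketch, assuming the standard little-endian layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata-KV count); the demo file is synthetic:

```python
import struct
import tempfile

def read_gguf_header(path):
    """Parse the fixed-size GGUF header: magic, version, tensor/KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}

# Demo on a synthetic header (version 3, 291 tensors, 24 metadata entries).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
    path = tmp.name
print(read_gguf_header(path))
```

Everything else in the file (metadata key-value pairs, tensor info, quantization types) is read sequentially after these 24 bytes, which is what makes header-only inspection cheap even for multi-gigabyte models.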
Recent posts on amaljithkuttamath.github.io/work:
- A Single Neuron for 'And' in Six Languages - Cross-lingual SAE features in Gemma 2 2B
- Why Trust Bench - The case for probing LLM internals
- How Large Language Models Actually Work - From raw text to trained model, in 200 lines of Python

