You get an OpenAI invoice for $800 this month. You have no idea which function caused it.
Langfuse and Braintrust require you to sign up, configure API keys, and reroute traffic through their cloud proxy just to see what's happening. tokenspy intercepts in-process — one decorator, no proxy, no account, no monthly fee.
import tokenspy @tokenspy.profile def run_pipeline(query): docs = fetch_and_summarize(query) # ← costs $0.038? entities = extract_entities(docs) # ← or this? return generate_report(entities) # ← or this? run_pipeline("Analyze Q3 earnings") tokenspy.report()╔══════════════════════════════════════════════════════════════════════╗ ║ tokenspy cost report ║ ║ total: $0.0523 · 18,734 tokens · 3 calls ║ ╠══════════════════════════════════════════════════════════════════════╣ ║ fetch_and_summarize $0.038 ████████████░░░░ 73% ║ ║ └─ gpt-4o $0.038 ████████████░░░░ 73% ║ ║ generate_report $0.011 ████░░░░░░░░░░░░ 21% ║ ║ extract_entities $0.003 █░░░░░░░░░░░░░░░ 6% ║ ╠══════════════════════════════════════════════════════════════════════╣ ║ 🔴 fetch_and_summarize → switch to gpt-4o-mini: 94% cheaper ║ ╚══════════════════════════════════════════════════════════════════════╝ Now you know: fetch_and_summarize is burning 73% of your budget.
Everything Langfuse and Braintrust do, without sending a single byte to the cloud.
| Feature | v0.1 | v0.2.0 |
|---|---|---|
| Cost flame graph | ✅ | ✅ |
| Budget alerts | ✅ | ✅ |
| SQLite persistence | ✅ | ✅ |
| Structured tracing (Trace + Span) | ❌ | ✅ |
| OpenTelemetry export | ❌ | ✅ |
| Evaluations + datasets | ❌ | ✅ |
| Prompt versioning | ❌ | ✅ |
| Live web dashboard | ❌ | ✅ |
pip install tokenspy[server] tokenspy serve # → http://localhost:72345 tabs: Overview · Traces · Evaluations · Prompts · Settings — all your LLM data, local, real-time.
tokenspy.init(persist=True) with tokenspy.trace("pipeline", input={"query": q}) as t: with tokenspy.span("retrieve") as s: docs = fetch(q); s.update(output=docs) with tokenspy.span("generate", span_type="llm") as s: answer = llm_call(docs) # ← auto-linked to span t.update(output=answer) t.score("quality", 0.9)from tokenspy.eval import scorers ds = tokenspy.dataset("qa-golden") ds.add(input={"q": "Capital of France?"}, expected_output="Paris") exp = tokenspy.experiment("gpt4o-mini-v1", dataset="qa-golden", fn=my_fn, scorers=[scorers.exact_match]) exp.run().summary()p = tokenspy.prompts.push("summarizer", "Summarize in {{style}}: {{text}}") p.compile(style="concise", text="...") tokenspy.prompts.set_production("summarizer", version=2)@tokenspy.profile(budget_usd=0.10, on_exceeded="raise") def strict_agent(query): ... # raises BudgetExceededError if cost > $0.10| Langfuse | Braintrust | tokenspy | |
|---|---|---|---|
| Requires cloud proxy | ✅ yes | ✅ yes | ❌ no |
| Requires signup | ✅ yes | ✅ yes | ❌ no |
| Data leaves your machine | ✅ yes | ✅ yes | ❌ never |
| Works offline | ❌ no | ❌ no | ✅ yes |
| Zero core dependencies | ❌ no | ❌ no | ✅ yes |
| Structured tracing | ✅ yes | ✅ yes | ✅ yes |
| Evaluations + datasets | ✅ yes | ✅ yes | ✅ yes |
| LLM-as-judge scoring | ✅ yes | ✅ yes | ✅ yes |
| Prompt versioning | ✅ yes | ✅ yes | ✅ yes |
| OpenTelemetry export | ⚡ partial | ❌ no | ✅ yes |
| Flame graph by function | ❌ no | ❌ no | ✅ yes |
@decorator API | ❌ no | ❌ no | ✅ yes |
| Budget alerts | ⚡ partial | ⚡ partial | ✅ yes |
| Git commit cost tracking | ❌ no | ❌ no | ✅ yes |
| GitHub Actions cost diff | ❌ no | ❌ no | ✅ yes |
| Monthly cost | $0–$250 | $0–$300 | free forever |
| Provider | Auto-instrumented |
|---|---|
| OpenAI | chat.completions.create (sync + async + streaming) |
| Anthropic | messages.create (sync + async + streaming) |
| Google Gemini | generate_content |
| LangChain / LangGraph | Callback handler |
Exports: OpenTelemetry → Grafana, Jaeger, Datadog, Honeycomb (docs)
CI: GitHub Actions cost diff per PR (docs)
pip install tokenspy # core (zero dependencies) pip install tokenspy[otel] # + OpenTelemetry export pip install tokenspy[server] # + web dashboard (fastapi + uvicorn) pip install tokenspy[all] # openai + anthropic + langchain→ Full documentation at pinakimishra95.github.io/tokenspy
| Guide | Description |
|---|---|
| Tracing | Trace + Span context managers, auto LLM linking, scores |
| Evaluations & Datasets | Datasets, scorers, llm_judge, experiment comparison |
| Prompt Versioning | push / pull / compile / set_production |
| Web Dashboard | Local dashboard, REST API |
| OpenTelemetry | OTEL export to Grafana, Jaeger, Datadog |
| GitHub Actions | Cost diff annotations per PR |
git clone https://github.com/pinakimishra95/tokenspy cd tokenspy && pip install -e ".[dev]" pytest tests/ # 139 tests, ~0.3sIssues and PRs welcome — especially new provider support and pricing updates.
MIT © Pinaki Mishra. See LICENSE.


