Display Name
Iris
Category
Tooling
Sub-Category
General
Primary Link
https://github.com/iris-eval/mcp-server
Author Name
iris-eval
Author Link
License
MIT
Description
MCP-native agent evaluation server that scores output quality, catches safety failures, and enforces cost budgets. Ships 12 deterministic eval rules across 4 categories (completeness, relevance, safety, cost), including PII detection, prompt injection scanning, and hallucination markers. Zero-config — add it to your MCP config and any compatible agent discovers it automatically. Includes a real-time web dashboard with trace visualization, hierarchical span trees with per-tool-call latency and token usage, and SQLite-backed storage. Works with Claude Desktop, Claude Code, Cursor, and Windsurf. Glama AAA rated. The eval layer that infrastructure monitoring misses — your observability sees 200 OK, but Iris sees that the agent leaked a Social Security number in its response.
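To make the "deterministic eval rules" concrete, here is a minimal sketch of what a regex-based PII rule could look like. This is an illustration only — the rule name, `EvalResult` shape, and `checkPii` function are hypothetical and not Iris's actual API:

```typescript
// Hypothetical sketch of a deterministic PII eval rule (not Iris's real API).
// Flags US Social Security numbers in agent output — the failure mode
// described in the submission.

interface EvalResult {
  rule: string;
  passed: boolean;
  findings: string[];
}

// SSN pattern: three digits, two digits, four digits, dash-separated.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

function checkPii(output: string): EvalResult {
  const findings = output.match(SSN_PATTERN) ?? [];
  return {
    rule: "safety/pii-ssn",
    passed: findings.length === 0,
    findings,
  };
}

// A "200 OK" response can still fail the eval:
const result = checkPii("Your SSN is 123-45-6789.");
console.log(result.passed); // false
```

Because the rule is a pure function of the output text, it is deterministic: the same response always produces the same score, which is what makes this class of checks cheap to run on every trace.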
Validate Claims
Install and test with:
- Add to Claude Code:
  claude mcp add --transport stdio iris-eval -- npx @iris-eval/mcp-server
- Or add to any MCP config (Claude Desktop, Cursor, etc.):
  { "mcpServers": { "iris-eval": { "command": "npx", "args": ["@iris-eval/mcp-server"] } } }
- Launch the dashboard to verify:
  npx @iris-eval/mcp-server --dashboard
  Open http://localhost:6920
Specific Task(s)
Add Iris to your Claude Code MCP config, then run any agent task. Iris automatically logs the trace and evaluates the output. Open the dashboard at localhost:6920 to see trace trees, eval scores, cost breakdowns, and safety flags.
Specific Prompt(s)
After adding Iris as an MCP server, try any normal prompt. Iris works passively — it evaluates agent outputs without requiring special prompts. Then check the dashboard for results.
Recommendation Checklist
- No prior submission for this resource
- Repository is at least one week old
- All links are working/verified
- No other open issues for this resource
- I am a human (not a bot) submitting this