Resource Recommendation: Iris — MCP-native agent evaluation server #1177

@irparent

Display Name

Iris

Category

Tooling

Sub-Category

General

Primary Link

https://github.com/iris-eval/mcp-server

Author Name

iris-eval

Author Link

https://github.com/iris-eval

License

MIT

Description

MCP-native agent evaluation server that scores output quality, catches safety failures, and enforces cost budgets. Ships 12 deterministic eval rules across 4 categories (completeness, relevance, safety, cost) including PII detection, prompt injection scanning, and hallucination markers. Zero-config — add it to your MCP config and any compatible agent discovers it automatically. Includes a real-time web dashboard with trace visualization, hierarchical span trees with per-tool-call latency and token usage, and SQLite-backed storage. Works with Claude Desktop, Claude Code, Cursor, and Windsurf. Glama AAA rated. The eval layer that infrastructure monitoring misses — your observability sees 200 OK, but Iris sees the agent leaked a social security number in its response.
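To make "deterministic eval rule" concrete, here is a minimal sketch of what a pattern-based PII check could look like. It is an illustration only: the regex, the function name `contains_ssn_like`, and the matching policy are assumptions for this example, not Iris's actual rule implementation.

```python
import re

# Hypothetical pattern: a US SSN-shaped token such as 123-45-6789.
# A real rule would likely also handle other separators and reject invalid ranges.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn_like(text: str) -> bool:
    """Return True if the text contains an SSN-shaped token."""
    return SSN_PATTERN.search(text) is not None


if __name__ == "__main__":
    # An HTTP-level monitor would report success for both responses;
    # only a content-level check distinguishes them.
    print(contains_ssn_like("Your order has shipped."))
    print(contains_ssn_like("The customer's SSN is 123-45-6789."))
```

Because the check is a fixed pattern with no model call, the same output always produces the same verdict, which is the property the word "deterministic" implies here.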

Validate Claims

Install and test with:

  1. Add to Claude Code:
    claude mcp add --transport stdio iris-eval -- npx @iris-eval/mcp-server

  2. Or add to any MCP config (Claude Desktop, Cursor, etc.):
    {
      "mcpServers": {
        "iris-eval": { "command": "npx", "args": ["@iris-eval/mcp-server"] }
      }
    }

  3. Launch the dashboard to verify:
    npx @iris-eval/mcp-server --dashboard
    Open http://localhost:6920
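
If an MCP config already lists other servers, the entry from step 2 needs to be merged in rather than pasted over the file. The sketch below shows one way to do that on an in-memory config. The helper name `add_iris_server` and the sample `other-server` entry are hypothetical; the location of the config file varies by client and is deliberately not assumed here.

```python
import json

# Server entry copied from the config snippet in step 2.
IRIS_ENTRY = {"command": "npx", "args": ["@iris-eval/mcp-server"]}


def add_iris_server(config: dict) -> dict:
    """Return a copy of an MCP config with the iris-eval entry added.

    Existing servers are kept, and the input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["iris-eval"] = dict(IRIS_ENTRY)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {
        "mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}
    }
    print(json.dumps(add_iris_server(existing), indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through `add_iris_server`, and write it back with `json.dump`.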

Specific Task(s)

Add Iris to your Claude Code MCP config, then run any agent task. Iris automatically logs the trace and evaluates the output. Open the dashboard at localhost:6920 to see trace trees, eval scores, cost breakdowns, and safety flags.

Specific Prompt(s)

After adding Iris as an MCP server, try any normal prompt. Iris works passively — it evaluates agent outputs without requiring special prompts. Then check the dashboard for results.

Recommendation Checklist

  • No prior submission for this resource
  • Repository is at least one week old
  • All links are working/verified
  • No other open issues for this resource
  • I am a human (not a bot) submitting this
