Iris v0.1: The agent eval standard for MCP. 12 eval rules, open source
Open Source

See what your AI agents are actually shipping.

The agent eval standard for MCP. Score output quality, catch safety failures, enforce cost budgets — across every agent, every execution. One command to start.

$ npx @iris-eval/mcp-server
Iris Dashboard (localhost:3838)
Agents: 5 | Traces: 1,247 | Cost (7d): $127.43

Total Traces: 1,247 (+12%)
Avg Score: 0.84 (+0.03)
Total Cost: $127.43 (+8%)
PII Alerts: 3 (-2)

Recent Traces (last 24 hours)
research-agent     pass   0.94   $0.12   2.3s   5
code-review-bot    pass   0.87   $0.04   1.1s   3
support-agent      fail   0.32   $0.47   4.8s   7
data-pipeline      pass   0.91   $0.08   0.6s   2
content-writer     warn   0.62   $0.21   3.4s   4

Works with any MCP-compatible agent

Claude Desktop, Cursor, Claude Code, Windsurf, LangChain, CrewAI, MCP SDK, AutoGen

The Problem

Your agents pass every health check.

Infrastructure monitoring tells you the request succeeded. It cannot tell you the answer was wrong. Your agents need a quality gate — something that scores every output for safety, accuracy, and cost before it reaches a user.

What your APM sees
Status: 200 OK
Latency: 143ms
Memory: 245 MB
CPU: 12%
Throughput: 847 req/min
Health: All systems operational
What Iris sees
PII Detected
SSN pattern in output (***-**-6789)
Injection Risk
Prompt manipulation attempt detected
Cost: $0.47 / query
4.7x over $0.10 threshold
Hallucination Markers
"As an AI language model" in output
Tool call #3 error
database_lookup timed out (30s)
Quality Score
0.32 / 1.0 — FAIL

Product

Three tools. One quality standard.

Iris registers as an MCP server. Your agent discovers it and invokes its tools automatically. No SDK. No code changes.

Every execution. Every tool call. Every token.

log_trace captures full agent runs with hierarchical spans, per-tool-call latency, token usage, and cost in USD.

  • Hierarchical span tree with OpenTelemetry-compatible span kinds
  • Per-tool-call latency tracking
  • Token usage breakdown (prompt, completion, total)
  • Arbitrary metadata for custom attribution
Span Tree
AGENT  research-agent          2.3s
├─ LLM   system_prompt         0.1s
├─ TOOL  web_search            0.8s
├─ LLM   summarize_results     0.4s
├─ TOOL  database_query        0.3s
└─ LLM   final_response        0.7s
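
One way to picture the span tree above is as nested records with a kind, a name, and a duration. The sketch below is only an illustration: the `Span` class and its field names are hypothetical and are not Iris's actual `log_trace` schema. It rebuilds the sample tree and sums time per span kind.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span record; field names are illustrative, not Iris's schema."""
    kind: str          # e.g. "AGENT", "LLM", "TOOL"
    name: str
    duration_s: float
    children: list["Span"] = field(default_factory=list)


def time_by_kind(root: Span) -> dict[str, float]:
    """Sum the durations of all spans below `root`, grouped by span kind."""
    totals: dict[str, float] = {}
    stack = list(root.children)
    while stack:
        span = stack.pop()
        totals[span.kind] = round(totals.get(span.kind, 0.0) + span.duration_s, 3)
        stack.extend(span.children)
    return totals


# The sample trace shown in the span tree on this page.
trace = Span("AGENT", "research-agent", 2.3, [
    Span("LLM", "system_prompt", 0.1),
    Span("TOOL", "web_search", 0.8),
    Span("LLM", "summarize_results", 0.4),
    Span("TOOL", "database_query", 0.3),
    Span("LLM", "final_response", 0.7),
])

totals = time_by_kind(trace)
print(totals)
```

For the sample trace this gives 1.2s of LLM time and 1.1s of tool time, which together account for the 2.3s reported on the agent span.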

Built for

Three problems. One MCP server.

Every team building AI agents hits the same walls. Iris was built to tear them down — without touching your code.

Developers shipping MCP agents

You deployed an agent and you have no idea what it's doing.

Iris traces every execution, tool call, and token automatically. No SDK. No code changes. Add it to your MCP config and start seeing everything.

60s
to first trace
Teams monitoring agent costs

Your agent burned $0.47 on a single query and your APM showed 200 OK.

Iris tracks cost per trace, per agent, per time window. Set budget thresholds and get flagged when agents overspend — before finance finds out.

$0.07 avg cost visibility per trace
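
To make the budget idea concrete, here is a minimal sketch of a per-trace cost check. The function names are hypothetical and this is not how Iris implements its thresholds; the figures ($0.47 per query against a $0.10 threshold) are the ones used elsewhere on this page.

```python
from decimal import Decimal


def budget_overage(cost_usd: str, threshold_usd: str) -> Decimal:
    """Return how many times the threshold a trace's cost is."""
    return Decimal(cost_usd) / Decimal(threshold_usd)


def flag_over_budget(costs: dict[str, str], threshold_usd: str) -> list[str]:
    """Return the names of traces whose cost exceeds the threshold."""
    limit = Decimal(threshold_usd)
    return [name for name, cost in costs.items() if Decimal(cost) > limit]


# Two per-trace costs from the sample dashboard; the $0.10 limit is assumed.
sample_costs = {"support-agent": "0.47", "data-pipeline": "0.08"}

print(budget_overage("0.47", "0.10"))          # 4.7
print(flag_over_budget(sample_costs, "0.10"))  # ['support-agent']
```

Using `Decimal` rather than floats keeps small dollar amounts exact, which matters once per-trace costs are summed over thousands of runs.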
Companies preventing PII leaks

Your agent leaked a Social Security number in its output and nobody noticed for 3 months.

Iris evaluates every output against 12 built-in rules including PII detection (SSN, credit card, phone, email), prompt injection, and hallucination markers. Real-time, every trace.

12 built-in safety eval rules
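
A heuristic PII rule of the kind described above can be approximated with regular expressions. The patterns and function names below are assumptions for illustration only; Iris's built-in rules may match differently and cover more cases (credit cards, phone numbers).

```python
import re

# Illustrative patterns only; not the patterns Iris ships with.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_pii(text: str) -> list[str]:
    """Return a label for each PII pattern found in `text`."""
    findings = []
    if SSN_RE.search(text):
        findings.append("ssn")
    if EMAIL_RE.search(text):
        findings.append("email")
    return findings


def mask_ssn(text: str) -> str:
    """Mask all but the last four digits of any SSN-shaped string."""
    return SSN_RE.sub(r"***-**-\3", text)


output = "Customer SSN is 123-45-6789, email jane@example.com."
print(find_pii(output))   # ['ssn', 'email']
print(mask_ssn(output))   # Customer SSN is ***-**-6789, email jane@example.com.
```

Pattern checks like these are deterministic and cheap, which is why they can run on every trace. The trade-off is that they only catch PII that follows a recognizable format.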

Join the community

3
MCP tools
log_trace, evaluate_output, get_traces
12
Built-in eval rules
Completeness, relevance, safety, cost
<0ms
Eval latency
Heuristic rules. Fast and deterministic.
0
Lines of code to integrate
Add to MCP config. You're done.

Open Source — Free Forever to Self-Host

60 seconds to first trace.

Install Iris locally and start seeing what your agents are doing. Works with Claude Desktop, Cursor, Windsurf, or any MCP-compatible agent. Free, MIT-licensed, your data stays on your machine.

claude_desktop_config.json
{
  "mcpServers": {
    "iris-eval": {
      "command": "npx",
      "args": ["@iris-eval/mcp-server"]
    }
  }
}
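
If the config file already lists other MCP servers, the `iris-eval` entry has to be merged in rather than pasted over them. The helper below is a hypothetical sketch of that merge, not part of Iris. The config file's location differs by client and operating system, so the path is passed in rather than assumed.

```python
import json
import tempfile
from pathlib import Path


def add_iris_server(config_path: Path) -> dict:
    """Add the iris-eval entry to an MCP client config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["iris-eval"] = {"command": "npx", "args": ["@iris-eval/mcp-server"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file so no real config is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    path.write_text(json.dumps(existing))
    merged = add_iris_server(path)
    print(sorted(merged["mcpServers"]))  # ['iris-eval', 'other-server']
```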
Terminal
$ npm install -g @iris-eval/mcp-server
$ iris-mcp --dashboard
✓ Dashboard running at http://localhost:3838
Cursor
Install Iris in Cursor

One-click install for Cursor IDE.
No config file needed.

Pricing

Free to self-host. Cloud when you're ready.

The open-source core is MIT licensed with no limits. The cloud adds team dashboards, alerting, and managed infrastructure — starting free.

Open Source

Self-Hosted

$0 forever

Everything you need to evaluate your MCP agents in production. Your machine, your data, your eval rules.

  • 3 MCP tools (log_trace, evaluate_output, get_traces)
  • 12 built-in eval rules + custom rules
  • Web dashboard with trace visualization
  • SQLite storage — zero infrastructure
  • Production security (auth, rate limiting)
  • Cost tracking per trace
  • Docker + npm + npx install
  • Community support (GitHub + Discord)
Free (Coming Soon)

Cloud Starter

$0/month

Run evaluations in the cloud with no commitment. Same eval engine, managed for you. No credit card.

  • Everything in Self-Hosted, plus:
  • 10,000 evaluations / month
  • 7-day eval history
  • 1 team member
  • Managed PostgreSQL
  • Personal dashboard
  • No credit card required
Most Popular (Coming Soon)

Cloud Pro

$49/month

For teams that need shared eval results, alerting on quality regressions, and room to scale.

  • Everything in Starter, plus:
  • 25,000 evaluations included
  • $0.005 per additional evaluation
  • 90-day eval history
  • Unlimited team members
  • Team dashboards with shared views
  • Alerting (webhook + email)
  • API key management
  • CSV / JSON data export
  • Priority support
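
As a worked example of the Cloud Pro numbers listed above, the sketch below estimates a monthly bill from an evaluation count. It assumes the 25,000 included evaluations reset monthly and that overage is billed linearly at the listed rate. The tier is not yet launched, so treat this as arithmetic on the published figures rather than a billing guarantee.

```python
from decimal import Decimal

BASE_USD = Decimal("49")         # Cloud Pro monthly price
INCLUDED_EVALS = 25_000          # evaluations included (assumed per month)
OVERAGE_USD = Decimal("0.005")   # price per additional evaluation


def pro_monthly_cost(evaluations: int) -> Decimal:
    """Estimate the Cloud Pro bill for a month with `evaluations` runs."""
    extra = max(0, evaluations - INCLUDED_EVALS)
    return BASE_USD + extra * OVERAGE_USD


print(pro_monthly_cost(20_000))  # 49
print(pro_monthly_cost(40_000))  # 124.000
```

At 40,000 evaluations the 15,000 over the allowance cost $75, for a total of $124.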
Custom (Coming Soon)

Enterprise

Custom

For organizations that need audit-grade evaluation records, compliance, and dedicated support.

  • Everything in Pro, plus:
  • SSO / SAML (Okta, Azure AD, Google)
  • RBAC with custom roles
  • Audit logs with export
  • SOC 2 Type II documentation
  • Custom retention policies
  • SLA with uptime guarantee
  • Dedicated support + onboarding
  • EU AI Act compliance support

All plans include unlimited eval rules, both transports (stdio + HTTP), and full API access.
Waitlist members get founding-member pricing and a direct line to shape the roadmap.

Get early access to Iris Cloud

No spam. One email when the cloud tier launches.

I kept running into the same problem building AI agents: once they're running, you have no visibility into what they're actually doing. Traditional monitoring tells you the request succeeded. It can't tell you the agent leaked PII, hallucinated an answer, or burned through your budget on a single query.

So I built Iris — an MCP server that any agent discovers and uses automatically. No SDK. No code changes. Just add it to your config and start seeing everything.

Ian Parent
Founder & Builder

Roadmap

Built in public. Shipping fast.

v0.1 (Released)

Core MCP Server

3 tools, 12 eval rules, SQLite storage, web dashboard, production security

v0.2 (Planned)

Cloud Tier

PostgreSQL, multi-tenancy, team dashboards, API key management

v0.3 (Planned)

Alerting & Retention

Alert rules, webhooks, email notifications, retention policies

v0.4 (Planned)

LLM-as-Judge

Semantic evaluation, OpenTelemetry export, drift detection, A/B testing

v0.5 (Planned)

Enterprise

SSO/SAML, RBAC, audit logs, SOC 2 compliance