See what your AI agents are actually doing
The agent eval standard for MCP. Score output quality, catch safety failures, enforce cost budgets — across every agent, every execution. One command to start.
Works with any MCP-compatible agent
The Problem
Your agents pass every health check.
Infrastructure monitoring tells you the request succeeded. It cannot tell you the answer was wrong. Your agents need a quality gate — something that scores every output for safety, accuracy, and cost before it reaches a user.
Product
Three tools. One quality standard.
Iris registers as an MCP server. Your agent discovers it and invokes its tools automatically. No SDK. No code changes.
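Registration is just a config entry. A minimal sketch, assuming an npx-installable package (the `iris-mcp` name below is illustrative, not the published package):

```jsonc
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["-y", "iris-mcp"]  // hypothetical package name; check the docs for the real one
    }
  }
}
```

Any MCP client that reads this config, such as Claude Desktop or Cursor, lists Iris's three tools right next to the agent's own.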
Every execution. Every tool call. Every token.
log_trace captures full agent runs with hierarchical spans, per-tool-call latency, token usage, and cost in USD. A sketch of the call shape follows the list below.
- Hierarchical span tree with OpenTelemetry-compatible span kinds
- Per-tool-call latency tracking
- Token usage breakdown (prompt, completion, total)
- Arbitrary metadata for custom attribution
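Here is what a log_trace payload might look like; every field name is an assumption drawn from the list above, not the documented schema:

```jsonc
// Hypothetical log_trace arguments; all field names are illustrative.
{
  "tool": "log_trace",
  "arguments": {
    "agent": "support-bot",             // arbitrary metadata for attribution
    "spans": [{
      "name": "resolve_ticket",
      "kind": "AGENT",                  // OpenTelemetry-compatible span kind
      "children": [{
        "name": "search_kb",            // one child span per tool call
        "kind": "TOOL",
        "latency_ms": 412               // per-tool-call latency
      }]
    }],
    "tokens": { "prompt": 1840, "completion": 312, "total": 2152 },
    "cost_usd": 0.0191                  // cost attributed in USD
  }
}
```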
Built for
Three problems. One MCP server.
Every team building AI agents hits the same walls. Iris was built to tear them down — without touching your code.
“You deployed an agent and you have no idea what it's doing.”
Iris traces every execution, tool call, and token automatically. No SDK. No code changes. Add it to your MCP config and start seeing everything.
“Your agent burned $0.47 on a single query and your APM showed 200 OK.”
Iris tracks cost per trace, per agent, per time window. Set budget thresholds and get flagged when agents overspend — before finance finds out.
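As a hedged sketch, a budget check could be a get_traces query filtered by cost; these parameter names are illustrative, not the documented API:

```jsonc
// Hypothetical get_traces arguments: surface expensive traces from the last day.
{
  "tool": "get_traces",
  "arguments": {
    "agent": "support-bot",           // illustrative agent name
    "min_cost_usd": 0.10,             // assumed filter: traces above ten cents
    "since": "2026-02-01T00:00:00Z"   // assumed time-window parameter
  }
}
```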
“Your agent leaked a Social Security number in its output and nobody noticed for 3 months.”
Iris evaluates every output against 12 built-in rules including PII detection (SSN, credit card, phone, email), prompt injection, and hallucination markers. Real-time, every trace.
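A sketch of what an evaluate_output round trip might look like; the rule identifiers and response shape are assumptions for illustration:

```jsonc
// Hypothetical request
{
  "tool": "evaluate_output",
  "arguments": {
    "output": "Sure! Your SSN 123-45-6789 is on file.",
    "rules": ["pii", "prompt_injection", "hallucination"]  // illustrative rule IDs
  }
}
// One possible response
{
  "passed": false,
  "violations": [
    { "rule": "pii", "match": "ssn", "severity": "critical" }
  ]
}
```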
Join the community
Open Source — Free Forever to Self-Host
60 seconds to first trace.
Install Iris locally and start seeing what your agents are doing. Works with Claude Desktop, Cursor, Windsurf, or any MCP-compatible agent. Free, MIT-licensed, your data stays on your machine.
Pricing
Free to self-host. Cloud when you're ready.
The open-source core is MIT licensed with no limits. The cloud adds team dashboards, alerting, and managed infrastructure — starting free.
Self-Hosted
Everything you need to evaluate your MCP agents in production. Your machine, your data, your eval rules.
- 3 MCP tools (log_trace, evaluate_output, get_traces)
- 12 built-in eval rules + custom rules
- Web dashboard with trace visualization
- SQLite storage — zero infrastructure
- Production security (auth, rate limiting)
- Cost tracking per trace
- Docker + npm + npx install
- Community support (GitHub + Discord)
Cloud Starter
Run evaluations in the cloud with no commitment. Same eval engine, managed for you. No credit card.
- Everything in Self-Hosted, plus:
- 10,000 evaluations / month
- 7-day eval history
- 1 team member
- Managed PostgreSQL
- Personal dashboard
- No credit card required
Cloud Pro
For teams that need shared eval results, alerting on quality regressions, and room to scale.
- Everything in Starter, plus:
- 25,000 evaluations / month included
- $0.005 per additional evaluation
- 90-day eval history
- Unlimited team members
- Team dashboards with shared views
- Alerting (webhook + email)
- API key management
- CSV / JSON data export
- Priority support
Enterprise
For organizations that need audit-grade evaluation records, compliance, and dedicated support.
- Everything in Pro, plus:
- SSO / SAML (Okta, Azure AD, Google)
- RBAC with custom roles
- Audit logs with export
- SOC 2 Type II documentation
- Custom retention policies
- SLA with uptime guarantee
- Dedicated support + onboarding
- EU AI Act compliance support
All plans include unlimited eval rules, both transports (stdio + HTTP), and full API access.
Waitlist members get founding-member pricing and a direct line to shape the roadmap.
Get early access to Iris Cloud
No spam. One email when the cloud tier launches.
“I kept running into the same problem building AI agents: once they're running, you have no visibility into what they're actually doing. Traditional monitoring tells you the request succeeded. It can't tell you the agent leaked PII, hallucinated an answer, or burned through your budget on a single query.
So I built Iris — an MCP server that any agent discovers and uses automatically. No SDK. No code changes. Just add it to your config and start seeing everything.”
Research
Publications and insights.
Original research on MCP agent observability, evaluation methodology, and the evolving landscape of AI agent infrastructure.
The State of MCP Agent Observability
The gap between deploying AI agents and understanding what they're doing. Covers protocol-native observability, heuristic vs. semantic eval, cost visibility, and EU AI Act implications.
Read report
Why Your AI Agents Need Observability
AI agents fail silently. Traditional monitoring can't see the difference between a correct response and a hallucinated one. Why protocol-native observability changes the equation.
Read post
MCP Agent Observability Survey 2026
We're collecting data on how teams evaluate, monitor, and track costs for AI agents in production.
Roadmap
Built in public. Shipping fast.
Core MCP Server
3 tools, 12 eval rules, SQLite storage, web dashboard, production security
Cloud Tier
PostgreSQL, multi-tenancy, team dashboards, API key management
Alerting & Retention
Alert rules, webhooks, email notifications, retention policies
LLM-as-Judge
Semantic evaluation, OpenTelemetry export, drift detection, A/B testing
Enterprise
SSO/SAML, RBAC, audit logs, SOC 2 compliance