Evaluation & QA Service Guide

Run automated self-checks, rubric scoring, and unit tests to ensure outputs meet quality standards before delivery.

Test harness

`eval.run_unit_tests` executes test suites inside isolated runners, collecting pass/fail metrics and logs that the orchestrator can enforce as quality gates; see the invocation sketch after the list below.

  • Supports custom commands, timeouts, and environment variables.
  • Artifacts like coverage reports attach to episodic memory.
  • Failures can trigger plan branches for remediation.
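
A minimal invocation sketch using the MCP Python client. The server launch command (`eval-server`) and the tool's argument names (`command`, `timeout_seconds`, `env`) are assumptions for illustration, not the service's documented schema.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_tests() -> None:
    # The launch command is a placeholder for your evaluation server binary.
    params = StdioServerParameters(command="eval-server")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Argument names below are assumptions about the tool's input schema.
            result = await session.call_tool(
                "eval.run_unit_tests",
                arguments={
                    "command": "pytest -q",   # custom test command
                    "timeout_seconds": 300,   # per-run timeout
                    "env": {"CI": "1"},       # extra environment variables
                },
            )
            # Pass/fail metrics and logs come back as tool result content.
            print(result.content)

asyncio.run(run_tests())
```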

Rubric scoring

`eval.score_qa` evaluates text outputs against structured rubrics, enabling guardrails for summaries, drafts, or analyses.

  • Scores include rationales so humans can understand the verdict.
  • Thresholds decide whether the orchestrator retries, requests approval, or ships the result (see the sketch after this list).
  • Rubrics are versioned for auditability.
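
A self-contained sketch of the threshold logic described above; the threshold values, the 0..1 score scale, and the field names are illustrative assumptions rather than the service's actual result shape.

```python
from dataclasses import dataclass

# Illustrative thresholds; real values would come from rubric configuration.
SHIP_THRESHOLD = 0.8
APPROVAL_THRESHOLD = 0.5

@dataclass
class RubricScore:
    value: float     # normalized 0..1 score from eval.score_qa (assumed scale)
    rationale: str   # human-readable explanation of the verdict

def next_action(score: RubricScore) -> str:
    """Map a rubric score onto the orchestrator's next step."""
    if score.value >= SHIP_THRESHOLD:
        return "ship"
    if score.value >= APPROVAL_THRESHOLD:
        return "request_approval"
    return "retry"

print(next_action(RubricScore(0.65, "Accurate summary, but omits key caveats.")))
# -> request_approval
```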

Integration

Evaluation steps slot directly into orchestrator plans so every workflow can include automated QA before completion.

  • Attach evaluation nodes inline or as final gates before writer actions, as in the plan sketch below.
  • Results propagate into notifications and final reports.
  • APIs expose telemetry for dashboards and alerting.
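
A sketch of an orchestrator plan with an evaluation node as a final gate. The `PlanNode` structure and the `writer.compose`/`writer.publish` tool names are hypothetical stand-ins for the orchestrator's actual plan schema and writer actions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class PlanNode:
    name: str
    tool: str
    args: dict = field(default_factory=dict)
    on_fail: str | None = None   # branch target when the gate fails

plan = [
    PlanNode("draft", "writer.compose"),
    # Final QA gate placed before the writer action that delivers the output.
    PlanNode("qa_gate", "eval.score_qa",
             args={"rubric_id": "summary-quality"},
             on_fail="draft"),   # a failed gate loops back to drafting
    PlanNode("deliver", "writer.publish"),
]

for node in plan:
    suffix = f" (on_fail -> {node.on_fail})" if node.on_fail else ""
    print(f"{node.name}: {node.tool}{suffix}")
```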