Evaluation & QA Service Guide
Run automated self-checks, rubric scoring, and unit tests to ensure outputs meet quality standards before delivery.
Test harness
`eval.run_unit_tests` executes test suites inside isolated runners, collecting pass/fail metrics and logs that the orchestrator can act on (a minimal call is sketched after the list below).
- Supports custom commands, timeouts, and environment variables.
- Artifacts like coverage reports attach to episodic memory.
- Failures can trigger plan branches for remediation.
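A minimal sketch of invoking the test harness from a plan step. The `client.call(...)` interface, the parameter names (`command`, `cwd`, `timeout_seconds`, `env`), and the response fields (`failed`, log/artifact references) are assumptions for illustration; this guide only names the `eval.run_unit_tests` capability itself.

```python
from typing import Any, Dict

def run_test_gate(client: Any, repo_path: str) -> Dict[str, Any]:
    """Run the project's tests in an isolated runner and return the raw result."""
    result = client.call(
        "eval.run_unit_tests",
        {
            "command": "pytest -q --cov=src",  # custom test command
            "cwd": repo_path,
            "timeout_seconds": 600,            # abort hung runs
            "env": {"CI": "1"},                # extra environment variables
        },
    )
    # The result is assumed to carry pass/fail counts plus log and artifact refs.
    if result.get("failed", 0) > 0:
        # A failing run is the point where the orchestrator would branch
        # into a remediation plan rather than proceeding to delivery.
        print("Tests failed; flagging for a remediation branch.")
    return result
```

Keeping the gate as a single step that returns structured results lets the orchestrator decide on retries or branching without the step itself knowing the rest of the plan.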
Rubric scoring
`eval.score_qa` evaluates text outputs against structured rubrics, enabling guardrails for summaries, drafts, or analyses (see the sketch after this list).
- Scores include rationales so humans can understand the verdict.
- Thresholds decide whether the orchestrator retries, requests approval, or ships the result.
- Rubrics are versioned for auditability.
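A minimal sketch of rubric scoring driving a ship/approve/retry decision. The rubric schema, the request parameters, and the response fields (`criteria`, `score`, `rationale`, `overall`) are illustrative assumptions, as are the 0.85 and 0.60 thresholds; only the `eval.score_qa` capability name comes from this guide.

```python
from typing import Any, Dict

# Rubrics are versioned so a score can always be traced to the exact criteria used.
SUMMARY_RUBRIC: Dict[str, Any] = {
    "id": "summary-quality",
    "version": "1.2.0",
    "criteria": [
        {"name": "faithfulness", "weight": 0.5},
        {"name": "coverage", "weight": 0.3},
        {"name": "clarity", "weight": 0.2},
    ],
}

def gate_summary(client: Any, draft: str) -> str:
    """Score a draft against the rubric and map the result to a decision."""
    result = client.call("eval.score_qa", {"text": draft, "rubric": SUMMARY_RUBRIC})
    # Each criterion is assumed to return a score plus a human-readable rationale.
    for criterion in result["criteria"]:
        print(f"{criterion['name']}: {criterion['score']:.2f} - {criterion['rationale']}")
    overall = result["overall"]
    if overall >= 0.85:
        return "ship"               # meets the bar: deliver as-is
    if overall >= 0.60:
        return "request_approval"   # borderline: escalate to a human reviewer
    return "retry"                  # below threshold: regenerate and re-score
```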
Integration
Evaluation steps slot directly into orchestrator plans so every workflow can include automated QA before completion; a plan-wiring sketch follows the list below.
- Attach evaluation nodes inline or as final gates before writer actions.
- Results propagate into notifications and final reports.
- APIs expose telemetry for dashboards and alerting.
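A sketch of wiring an evaluation node into a plan as a final gate before a writer action. The plan/step schema, the field names (`on_fail`, `after`, `notifications`), and the action names `llm.write_report` and `writer.save_document` are hypothetical; the only capability taken from this guide is `eval.score_qa`.

```python
from typing import Any, Dict

# Illustrative plan: the QA gate sits between drafting and the writer action,
# and its scores are forwarded into notifications and the final report.
plan: Dict[str, Any] = {
    "name": "weekly-report",
    "steps": [
        {"id": "draft", "action": "llm.write_report"},
        {
            "id": "qa_gate",
            "action": "eval.score_qa",
            "inputs": {"text": "$draft.output", "rubric_id": "report-quality"},
            "on_fail": "draft",  # loop back for another drafting attempt
        },
        {
            "id": "publish",
            "action": "writer.save_document",  # writer action runs only after the gate
            "after": ["qa_gate"],
        },
    ],
    "notifications": {"include": ["qa_gate.scores"]},
}
```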