Evaluation & QA Service Guide
Run automated self-checks, rubric scoring, and unit tests to ensure outputs meet quality standards before delivery.
Test harness
`eval.run_unit_tests` executes test suites inside isolated runners, collecting pass/fail metrics and logs that the orchestrator can act on (a minimal call is sketched after the list below).
- Supports custom commands, timeouts, and environment variables.
- Artifacts like coverage reports attach to episodic memory.
- Failures can trigger plan branches for remediation.
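A minimal sketch of invoking the test harness from a plan step. The `client.call(...)` interface, the parameter names (`command`, `cwd`, `timeout_seconds`, `env`), and the response fields (`failed`, log/artifact references) are assumptions for illustration; this guide only names the `eval.run_unit_tests` capability itself.

```python
from typing import Any, Dict

def run_test_gate(client: Any, repo_path: str) -> Dict[str, Any]:
    """Run the project's tests in an isolated runner and return the raw result."""
    result = client.call(
        "eval.run_unit_tests",
        {
            "command": "pytest -q --cov=src",  # custom test command
            "cwd": repo_path,
            "timeout_seconds": 600,            # abort hung runs
            "env": {"CI": "1"},                # extra environment variables
        },
    )
    # The result is assumed to carry pass/fail counts plus log and artifact refs.
    if result.get("failed", 0) > 0:
        # A failing run is the point where the orchestrator would branch
        # into a remediation plan rather than proceeding to delivery.
        print("Tests failed; flagging for a remediation branch.")
    return result
```

Keeping the gate as a single step that returns structured results lets the orchestrator decide on retries or branching without the step itself knowing the rest of the plan.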
Rubric scoring
`eval.score_qa` evaluates text outputs against structured rubrics, enabling guardrails for summaries, drafts, or analyses (see the sketch after this list).
- Scores include rationales so humans can understand the verdict.
- Thresholds decide whether the orchestrator retries, requests approval, or ships the result.
- Rubrics are versioned for auditability.
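A minimal sketch of rubric scoring driving a ship/approve/retry decision. The rubric schema, the request parameters, and the response fields (`criteria`, `score`, `rationale`, `overall`) are illustrative assumptions, as are the 0.85 and 0.60 thresholds; only the `eval.score_qa` capability name comes from this guide.

```python
from typing import Any, Dict

# Rubrics are versioned so a score can always be traced to the exact criteria used.
SUMMARY_RUBRIC: Dict[str, Any] = {
    "id": "summary-quality",
    "version": "1.2.0",
    "criteria": [
        {"name": "faithfulness", "weight": 0.5},
        {"name": "coverage", "weight": 0.3},
        {"name": "clarity", "weight": 0.2},
    ],
}

def gate_summary(client: Any, draft: str) -> str:
    """Score a draft against the rubric and map the result to a decision."""
    result = client.call("eval.score_qa", {"text": draft, "rubric": SUMMARY_RUBRIC})
    # Each criterion is assumed to return a score plus a human-readable rationale.
    for criterion in result["criteria"]:
        print(f"{criterion['name']}: {criterion['score']:.2f} - {criterion['rationale']}")
    overall = result["overall"]
    if overall >= 0.85:
        return "ship"               # meets the bar: deliver as-is
    if overall >= 0.60:
        return "request_approval"   # borderline: escalate to a human reviewer
    return "retry"                  # below threshold: regenerate and re-score
```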
Integration
Evaluation steps slot directly into orchestrator plans so every workflow can include automated QA before completion; a plan-wiring sketch follows the list below.
- Attach evaluation nodes inline or as final gates before writer actions.
- Results propagate into notifications and final reports.
- APIs expose telemetry for dashboards and alerting.
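A sketch of wiring an evaluation node into a plan as a final gate before a writer action. The plan/step schema, the field names (`on_fail`, `after`, `notifications`), and the action names `llm.write_report` and `writer.save_document` are hypothetical; the only capability taken from this guide is `eval.score_qa`.

```python
from typing import Any, Dict

# Illustrative plan: the QA gate sits between drafting and the writer action,
# and its scores are forwarded into notifications and the final report.
plan: Dict[str, Any] = {
    "name": "weekly-report",
    "steps": [
        {"id": "draft", "action": "llm.write_report"},
        {
            "id": "qa_gate",
            "action": "eval.score_qa",
            "inputs": {"text": "$draft.output", "rubric_id": "report-quality"},
            "on_fail": "draft",  # loop back for another drafting attempt
        },
        {
            "id": "publish",
            "action": "writer.save_document",  # writer action runs only after the gate
            "after": ["qa_gate"],
        },
    ],
    "notifications": {"include": ["qa_gate.scores"]},
}
```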