Plan → Execute → Verify

How Personal Assistant System (PAS) runs autonomous work safely

The orchestrator plans directed workflows, uses approved tools, and verifies results—all on your machine by default.

The Orchestrator

The orchestrator converts natural-language goals into a directed acyclic graph (DAG) of steps. It enforces autonomy budgets, handles retries, and pauses automatically when an approval is required. Every decision is logged to episodic memory so you can replay a run step by step.
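
In code, such a plan might look like the following sketch. The Node and Budget shapes are illustrative assumptions, not the actual PAS schema; only the tool names come from this page.

    from dataclasses import dataclass, field

    @dataclass
    class Budget:
        max_seconds: float      # time budget for the whole run
        max_tokens: int         # token budget across all LLM calls
        max_cost_usd: float     # hard cost ceiling

    @dataclass
    class Node:
        id: str
        tool: str                                   # e.g. "docs.search_docs"
        depends_on: list[str] = field(default_factory=list)
        max_retries: int = 2                        # retried before falling back
        requires_approval: bool = False             # run pauses here if True

    # "Summarize latest sprint notes" expressed as a directed plan.
    plan = [
        Node("search", "docs.search_docs"),
        Node("recall", "vector.search", depends_on=["search"]),
        Node("draft", "llm-local.generate", depends_on=["search", "recall"]),
        Node("publish", "docs.write_doc", depends_on=["draft"],
             requires_approval=True),
    ]
    budget = Budget(max_seconds=300, max_tokens=50_000, max_cost_usd=0.0)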

Directed planning

Goals become DAGs with explicit dependencies, retries, and fallbacks.

Autonomy budgets

Bound each run with time, token, and cost budgets that cannot be exceeded without approval.

Approvals

Pause at high-risk nodes. Continue only when an approver confirms the action.

Audit trail

Every step writes structured logs to episodic memory and emits metrics.

API endpoints: /tasks, /tasks/{id}/run, /approvals
DAG validated against tool schemas
Budget remaining: 34% compute / 68% token
Pending approvals: 1 (docs.write_doc)
Policy mode: constrained • Outbound disabled
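
The endpoint paths above are the ones PAS exposes; the request and response fields in this sketch are assumptions for illustration, using only the Python standard library against an assumed local address.

    import json
    import urllib.request

    BASE = "http://localhost:8080"  # assumed local address for the orchestrator

    def post(path: str, payload: dict) -> dict:
        req = urllib.request.Request(
            BASE + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # Create a task from a natural-language goal, then start the run.
    task = post("/tasks", {"goal": "Summarize latest sprint notes."})
    run = post(f"/tasks/{task['id']}/run", {})
    # If the plan hits a gated node (e.g. docs.write_doc), the run pauses and
    # the pending item appears under /approvals until an approver confirms.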

Tools & Memory

Personal Assistant System (PAS) ships with reader tools for code, documents, vector memory, and data. Writer tools are gated with approvals. Vector and knowledge graph stores give the assistant durable recall.

Reader tools first

Search docs, inspect code, query data sources—all scoped by allowlists.

Writer tools (gated)

Diff proposals, doc writes, and outbound comms require explicit approval.

Vector memory

Embed content locally with llm-local and retrieve top-K results without network calls.

Knowledge graph & episodic

Structured facts and chronological logs let the assistant recall decisions and context across sessions.

Tools: docs.search_docs, code.search, vector.search, kg.query, sql.query
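
To make the vector-memory idea concrete, here is a minimal local top-K retrieval sketch. The hashing embed() below is a toy stand-in for a real local embedding model such as llm-local; treat every name in it as an assumption.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy bag-of-words embedding; a real deployment would call a local
        # model (e.g. via llm-local) instead. Runs entirely in-process.
        vec = np.zeros(256)
        for word in text.lower().split():
            vec[hash(word) % 256] += 1.0
        return vec

    def top_k(query: str, corpus: list[str], k: int = 5) -> list[str]:
        # Rank stored texts by cosine similarity to the query; no network calls.
        q = embed(query)
        qn = np.linalg.norm(q) + 1e-9
        scored = []
        for doc in corpus:
            d = embed(doc)
            scored.append((float(q @ d) / (qn * (np.linalg.norm(d) + 1e-9)), doc))
        return [doc for _, doc in sorted(scored, reverse=True)[:k]]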

Retrieval + recall example

Goal
"Summarize latest sprint notes and highlight blockers."
Steps

1. docs.search_docs("sprint notes") → returns markdown files

2. vector.search(top_k=5) → fetch prior decision context

3. llm-local.generate → create summary & blocker table

4. docs.write_doc (gated) → approval required before publishing

Artifacts
Markdown summary, blocker checklist, audit log with hashes.
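
Wired together, the four steps read roughly as follows. The PASClient here is an illustrative stand-in for however PAS actually exposes tool calls; only the tool names match the steps above.

    class PASClient:
        # Illustrative stub for a PAS tool-calling client (assumed API).
        def call(self, tool: str, **kwargs) -> str:
            print(f"calling {tool} with {kwargs}")
            return f"<{tool} result>"

    pas = PASClient()

    notes = pas.call("docs.search_docs", query="sprint notes")          # step 1
    context = pas.call("vector.search", query="sprint notes", top_k=5)  # step 2
    summary = pas.call("llm-local.generate",                            # step 3
                       prompt=f"Summarize, list blockers:\n{notes}\n{context}")

    # Step 4 is gated: the orchestrator records a pending approval and only
    # executes docs.write_doc after an approver confirms.
    pas.call("docs.write_doc", path="sprint-summary.md", body=summary)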

Verification with mcp-eval

Every plan can include evaluation nodes that run before results are published. Failed checks trigger revisions or place the run on hold for approval, ensuring risky actions are reviewed before anything ships.

mcp-eval

Bundle rubric checks, assertions, and unit-test stubs to validate outputs.

Self-check loops

The orchestrator can branch into revision steps if evaluations fail.

Policy integration

Mark specific tools as requiring eval passes before the plan can complete.

Eval flow example

  1. Prepare: orchestrator registers eval nodes in the DAG.
  2. Run: eval.run_unit_tests executes containerized tests (CI-friendly).
  3. Score: eval.score_qa checks rubric adherence for natural language outputs.
  4. Decide: pass → continue; fail → branch into revision or hold for approval.
  5. Record: episodic.append stores outcomes with artifacts for audit.
Connect evaluations to your CI pipelines—every PR can require eval pass results before merging.
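
The Decide step above boils down to a small branching rule. This sketch assumes eval results arrive as a boolean test outcome and a rubric score in [0, 1]; the thresholds are placeholders, not shipped defaults.

    def decide(tests_passed: bool, rubric_score: float,
               pass_at: float = 0.8, revise_at: float = 0.5) -> str:
        # Pass: unit tests green and rubric adherence above the bar.
        if tests_passed and rubric_score >= pass_at:
            return "continue"
        # Near miss: branch the DAG into a revision step and re-evaluate.
        if rubric_score >= revise_at:
            return "revise"
        # Otherwise hold the run until an approver reviews the output.
        return "hold_for_approval"

    # e.g. decide(True, 0.9) -> "continue"; decide(True, 0.6) -> "revise"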

Safety modes & policy controls

Choose the autonomy posture that fits your environment. Switch modes instantly and audit every change.

Constrained (default)

Manual approvals required for writer tools, outbound connections disabled, budgets enforced.

Autonomous

Pre-approved plans can run unattended but still respect budgets and risk classes.

Rate limits & allowlists

Throttle tool usage and restrict access by path, domain, or schema.

Tool risk classes

Tag tools with Low/Medium/High risk and tailor approval rules accordingly.

All features run locally by default. Cloud LLMs are opt-in and require explicit configuration.
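
As a sketch, the policy surface described above might be captured in a config like this; every key and value here is an illustrative assumption, not the shipped schema.

    # Illustrative policy configuration (field names are assumptions).
    POLICY = {
        "mode": "constrained",                # or "autonomous"
        "outbound": False,                    # outbound connections disabled
        "rate_limits": {"docs.write_doc": "5/hour"},
        "allowlists": {
            "paths": ["~/notes/**"],          # reader tools scoped to these paths
            "domains": [],                    # empty list: no outbound domains
        },
        "risk_classes": {
            "docs.search_docs": "low",
            "vector.search": "low",
            "docs.write_doc": "high",
        },
        "approval_rules": {                   # tailor approvals per risk class
            "low": "auto",
            "medium": "manual",
            "high": "manual",
        },
        "cloud_llms": None,                   # opt-in only; off by default
    }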

Plan → Execute → Verify loop

1. Plan: Create DAG with explicit edges, retries, and budgets.

2. Execute: Call approved tools with automatic telemetry and guardrails.

3. Verify: Run eval harnesses and gather artifacts for review.

4. Approve: Pause for consent when policy requires it or when eval signals fail.

5. Publish: Deliver results, log everything, and notify subscribers.

Every node emits structured audit events and stores artifacts in episodic memory. Approvers can replay any run.
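
As a sketch, the five stages compose into a single driver loop like the one below; every helper is a stub standing in for the real orchestrator machinery.

    def run_plan(plan: list[str]) -> None:
        for node in plan:                       # 1. Plan: walk nodes in DAG order
            result = execute(node)              # 2. Execute: approved tool call
            ok = verify(node, result)           # 3. Verify: eval harness
            if not ok and not approve(node):    # 4. Approve: pause on failure
                record(node, "held")            #    run holds until consent
                return
            publish(node, result)               # 5. Publish: deliver and notify
            record(node, "done")                # audit event to episodic memory

    # Stub helpers so the sketch runs; the real system does far more here.
    def execute(node: str) -> str: return f"<{node} output>"
    def verify(node: str, result: str) -> bool: return True
    def approve(node: str) -> bool: return False
    def publish(node: str, result: str) -> None: print("publish:", node)
    def record(node: str, status: str) -> None: print("audit:", node, status)

    run_plan(["search", "recall", "draft", "publish"])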

Ready to see it in action?

Download Personal Assistant System (PAS), run the verification script, and explore the sample plans included in the repo.