LLM Engine Service Guide
Serve private local models for generation and embeddings with opt-in routing to cloud providers when required.
Local-first inference
The default configuration downloads and serves a compact model via the `llm-local` runtime so prompts never leave your machine.
- Supports CPU-only execution with optional GPU acceleration.
- Fixed seed and temperature controls keep outputs reproducible (see the request sketch after this list).
- Model upgrades are versioned and canary tested via the download verifier.
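A minimal sketch of a reproducible generation request follows. The endpoint URL, port, and JSON field names are assumptions for illustration; the `llm-local` runtime's actual request schema is not documented here.

```python
import requests

# Hypothetical local endpoint exposed by the llm-local runtime; the URL,
# port, and field names below are illustrative, not a documented API.
LLM_LOCAL_URL = "http://127.0.0.1:8080/generate"

def generate(prompt: str, seed: int = 42, temperature: float = 0.0) -> str:
    """Request a completion with a fixed seed and temperature so that
    repeated calls with the same inputs return the same output."""
    response = requests.post(
        LLM_LOCAL_URL,
        json={
            "prompt": prompt,
            "seed": seed,               # fixed seed -> reproducible sampling
            "temperature": temperature, # 0.0 -> greedy, fully deterministic
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["text"]

# Two identical calls should yield byte-identical completions.
assert generate("Summarize the release notes.") == generate("Summarize the release notes.")
```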
Echo avoidance tests
Every build runs a non-echo regression test to ensure the model produces novel reasoning, rather than echoing its prompt, before it is promoted.
- `mcp doctor --llm` triggers the verification harness on demand.
- Evaluation metrics (entropy, similarity) stream into episodic memory for auditing; a sketch of these metrics follows this list.
- Failures revert to the previous known-good model automatically.
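The verification harness itself is internal, but a minimal sketch of the two metrics named above might look like this. The whitespace tokenization and the threshold values are assumptions, not the harness's actual implementation.

```python
import math
from collections import Counter

def token_entropy(tokens: list[str]) -> float:
    """Shannon entropy (bits/token) of the completion's token distribution;
    near-zero entropy suggests degenerate, repetitive output."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def prompt_similarity(prompt: str, completion: str) -> float:
    """Jaccard overlap between prompt and completion token sets; a value
    near 1.0 means the model is mostly echoing its input."""
    p, c = set(prompt.split()), set(completion.split())
    return len(p & c) / len(p | c) if p | c else 0.0

def is_echo(prompt: str, completion: str,
            max_similarity: float = 0.8, min_entropy: float = 1.0) -> bool:
    # Thresholds are illustrative; the real harness's cutoffs are not documented.
    return (prompt_similarity(prompt, completion) > max_similarity
            or token_entropy(completion.split()) < min_entropy)
```

A completion that fails either check (too similar to the prompt, or too repetitive to carry new information) would count as an echo, triggering the automatic rollback described above.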
Embeddings
`/embed` delivers dense vectors that integrate with vector memory and search services without exposing documents to the cloud.
- Supports batching and dimension configuration per workspace (see the request sketch after this list).
- Metadata tags track provenance and retention policy requirements.
- Optional cloud connectors can be toggled per call with signed approvals.
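A minimal sketch of a batched `/embed` call follows. The host, port, and request fields (`texts`, `dimensions`, `metadata`) are assumptions for illustration, not the service's confirmed schema.

```python
import requests

# Illustrative call to the /embed endpoint; host, port, and the request
# body shown here are assumptions for this sketch.
EMBED_URL = "http://127.0.0.1:8080/embed"

def embed_batch(texts: list[str], dimensions: int = 384) -> list[list[float]]:
    """Send one batched request instead of one call per document, tagging
    the batch with provenance metadata for retention auditing."""
    response = requests.post(
        EMBED_URL,
        json={
            "texts": texts,
            "dimensions": dimensions,  # per-workspace dimension configuration
            "metadata": {"source": "workspace-docs", "retention": "30d"},
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["vectors"]

vectors = embed_batch(["first document", "second document"])
assert len(vectors) == 2  # one dense vector per input text
```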