Available now

Fast local embeddings for retrieval and recall

Semantic search over your data with local LLM embeddings. Upsert, search, and delete vectors—all stored on your device.

What Vector Memory does

Vector Memory provides fast semantic search over your local data. It works with the LLM Engine to generate embeddings and stores them in a local vector index. Perfect for RAG workflows, knowledge recall, and context-aware assistance.

Core capabilities

  • Upsert vectors with metadata
  • Semantic search with top-K results
  • Delete vectors by ID or filter
  • Scoped indices for multi-tenant use

Integration

  • Works with local LLM embeddings
  • Pairs with Document Tools for RAG
  • Supports quarantine and deletion workflows

Local-first by default: All embeddings and vector storage remain on your device. Encryption is optional for sensitive data.

Who benefits from Vector Memory

Individuals

Personal knowledge recall without cloud dependencies

Example: Save research snippets and recall them later with semantic search—"What did I read about X?"

Teams & Managers

Team knowledgebase retrieval on-premises

Example: Build a private team wiki with semantic search—no data leaves your infrastructure.

Developers & IT

Stable /upsert, /search, /delete contracts for RAG apps

Example: Integrate semantic search into internal tools with predictable REST endpoints.

Security & Compliance

On-disk local store with scoped indices

Control: All vectors stored locally. Optional encryption. Index allowlists prevent cross-contamination.

How it works

1. Chunk text and generate embeddings

Use llm.embed to convert text into semantic vectors via the local LLM.
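
The exact request and response shapes are defined by the API schemas; as a rough Python sketch of this step (the localhost address, route names, and the "embedding" field are assumptions, not the documented contract):

  import requests

  BASE_URL = "http://localhost:8080"   # hypothetical address for the local service

  def call(tool: str, payload: dict) -> dict:
      # Thin wrapper around the local REST contract; adjust to the real schema.
      resp = requests.post(f"{BASE_URL}/{tool}", json=payload, timeout=30)
      resp.raise_for_status()
      return resp.json()

  # Step 1: embed a text chunk with the local LLM.
  chunk = "Q3 roadmap decision: prioritize offline search; the mobile app moves to Q4."
  embedding = call("llm.embed", {"text": chunk})["embedding"]
  print(len(embedding))   # a list of floats, e.g. 384 values when VECTOR_DIM=384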

2. Upsert vectors with metadata

Call vector.upsert to store vectors with IDs and metadata (source, timestamp, tags).
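
Continuing the sketch from step 1 (same hypothetical call() helper; the ID and metadata fields below are illustrative):

  # Step 2: store the vector under an ID, with metadata for provenance and filtering.
  call("vector.upsert", {
      "index": "notes",                              # scoped index name
      "id": "meeting-notes-2024-06-12#chunk-0",
      "vector": embedding,
      "metadata": {
          "source": "meeting-notes-2024-06-12.md",
          "timestamp": "2024-06-12T10:00:00Z",
          "tags": ["roadmap", "q3"],
      },
  })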

3. Search for similar vectors

Use vector.search with a query vector to retrieve top-K most similar results.
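
Continuing the same sketch, a search is just an embedded query plus a top-K lookup (result field names are assumed):

  # Step 3: embed the question, then retrieve the top-K most similar stored vectors.
  query_vec = call("llm.embed", {"text": "What did we decide about the Q3 roadmap?"})["embedding"]
  hits = call("vector.search", {"index": "notes", "vector": query_vec, "top_k": 5})
  for hit in hits.get("results", []):
      print(hit["id"], hit["score"], hit["metadata"]["source"])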

4. Delete or quarantine vectors

Use vector.delete to remove vectors by ID or filter criteria.
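
And clean-up, again with assumed parameter names for the ID list and the metadata filter:

  # Step 4: remove vectors by ID, or everything matching a metadata filter.
  call("vector.delete", {"index": "notes", "ids": ["meeting-notes-2024-06-12#chunk-0"]})
  call("vector.delete", {"index": "notes", "filter": {"source": "meeting-notes-2024-06-12.md"}})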

Storage: Vectors are stored in a local index on disk. No network calls required for search or retrieval.

Example workflows

Private RAG over local docs

Runs entirely offline
Input:

"What did we decide about the Q3 roadmap?"

Steps:
  1. llm.embed (query text)
  2. vector.search (find top-5 similar docs)
  3. docs.get_doc (retrieve full content)
  4. llm.generate (answer with context)
Output:

Answer with citations—all embeddings and search happen locally
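
A self-contained Python sketch of this workflow, under the same caveats as the step-by-step examples above (local address, route names, and response fields are assumptions; the real contract lives in the API schemas):

  import requests

  BASE_URL = "http://localhost:8080"   # hypothetical local service address

  def call(tool: str, payload: dict) -> dict:
      resp = requests.post(f"{BASE_URL}/{tool}", json=payload, timeout=60)
      resp.raise_for_status()
      return resp.json()

  question = "What did we decide about the Q3 roadmap?"

  # 1. Embed the question locally.
  qvec = call("llm.embed", {"text": question})["embedding"]

  # 2. Find the five most similar stored chunks.
  hits = call("vector.search", {"index": "team-docs", "vector": qvec, "top_k": 5})["results"]

  # 3. Fetch the full documents the hits point to (a doc_id field in metadata is assumed).
  docs = [call("docs.get_doc", {"id": h["metadata"]["doc_id"]})["text"] for h in hits]

  # 4. Generate an answer grounded in that context.
  prompt = (
      "Answer using only the context below and cite the sources.\n\n"
      + "\n---\n".join(docs)
      + f"\n\nQuestion: {question}"
  )
  print(call("llm.generate", {"prompt": prompt})["text"])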

Save research snippets for later recall

Runs entirely offline
Input:

"Save this article excerpt about AI safety for later recall"

Steps:
  1. llm.embed (article excerpt)
  2. vector.upsert (store with metadata: source, date, tags)
Output:

Snippet saved. Later: "What did I read about AI safety?" retrieves it instantly.
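
The save-and-recall round trip looks much the same, reusing the call() helper from the RAG sketch above (index name, metadata fields, and the recall query are illustrative):

  # Save: embed the excerpt and upsert it with metadata for later recall.
  excerpt = "Summary of an article on AI safety and evaluation practices."
  call("vector.upsert", {
      "index": "reading",
      "id": "ai-safety-article-2024-06-12",
      "vector": call("llm.embed", {"text": excerpt})["embedding"],
      "metadata": {"source": "https://example.com/ai-safety", "date": "2024-06-12", "tags": ["ai-safety"]},
  })

  # Recall (later): a semantic query over the same index brings it back.
  qvec = call("llm.embed", {"text": "What did I read about AI safety?"})["embedding"]
  print(call("vector.search", {"index": "reading", "vector": qvec, "top_k": 3})["results"])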

Build a team knowledgebase

Runs entirely offline
Input:

Index team docs and enable semantic search

Steps:
  1. docs.index_blob (ingest team docs)
  2. llm.embed (each doc chunk)
  3. vector.upsert (store all chunks)
  4. vector.search (query: "How do we handle X?")
Output:

Team wiki with semantic search—no cloud, no data exfiltration
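
Batch ingestion follows the same pattern in a loop. A sketch, again reusing call() from the RAG example and assuming docs.index_blob returns document IDs and docs.get_doc returns pre-chunked text (neither shape is documented here):

  # Ingest team docs, embed each chunk, and store everything in one scoped index.
  doc_ids = call("docs.index_blob", {"path": "/srv/team-docs"})["doc_ids"]

  for doc_id in doc_ids:
      doc = call("docs.get_doc", {"id": doc_id})
      for i, chunk in enumerate(doc["chunks"]):
          vec = call("llm.embed", {"text": chunk})["embedding"]
          call("vector.upsert", {
              "index": "team-wiki",
              "id": f"{doc_id}#chunk-{i}",
              "vector": vec,
              "metadata": {"doc_id": doc_id, "chunk": i},
          })

  # Query the wiki.
  qvec = call("llm.embed", {"text": "How do we handle on-call escalations?"})["embedding"]
  print(call("vector.search", {"index": "team-wiki", "vector": qvec, "top_k": 5})["results"])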

Technical details

Key endpoints

  • vector.upsert
  • vector.search
  • vector.delete
  • vector.list_indices

View API schemas

Configuration

  • VECTOR_DIM — embedding dimension (default: 384)
  • TOP_K_DEFAULT — default search results (default: 5)
  • INDEX_DIR — local storage path
  • ENCRYPTION — optional (default: false)
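
As a rough illustration of how a deployment might resolve these settings (the fallback index path below is hypothetical; the documented defaults are the values listed above):

  import os
  from pathlib import Path

  VECTOR_DIM = int(os.environ.get("VECTOR_DIM", "384"))            # embedding dimension
  TOP_K_DEFAULT = int(os.environ.get("TOP_K_DEFAULT", "5"))        # default number of search results
  INDEX_DIR = Path(os.environ.get("INDEX_DIR", "~/.vector-memory")).expanduser()  # local storage path
  ENCRYPTION = os.environ.get("ENCRYPTION", "false").lower() in ("1", "true", "yes")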

Performance notes

  • Search: 10-50ms for 10K vectors
  • Upsert: 5-20ms per vector
  • Scales to 100K+ vectors on typical hardware

Observability

  • Index size and vector count
  • Search latency and recall metrics
  • Upsert/delete throughput

Security posture

Local storage by default

All vectors stored on disk. No network calls for search or retrieval.

Optional encryption

Enable encryption for sensitive data. Keys managed locally.

Scoped indices

Separate indices for different use cases. Prevents cross-contamination.

Audit logs

All upsert, search, and delete operations logged locally.

Roadmap & status

Available

Current features

  • Upsert, search, and delete vectors
  • Local LLM embeddings integration
  • Scoped indices and metadata filtering

Planned

Coming soon

  • Larger stores with FAISS/Qdrant adapters
  • Hybrid search (dense + sparse)
  • Vector compression for larger indices

View full roadmap

Ready to build private RAG workflows?

Get started with Vector Memory in minutes. All embeddings and search happen locally.