Fast local embeddings for retrieval and recall
Semantic search over your data with local LLM embeddings. Upsert, search, and delete vectors—all stored on your device.
What Vector Memory does
Vector Memory provides fast semantic search over your local data. It works with the LLM Engine to generate embeddings and stores them in a local vector index. Perfect for RAG workflows, knowledge recall, and context-aware assistance.
Core capabilities
- Upsert vectors with metadata
- Semantic search with top-K results
- Delete vectors by ID or filter
- Scoped indices for multi-tenant use
Integration
- Works with local LLM embeddings
- Pairs with Document Tools for RAG
- Supports quarantine and deletion workflows
Who benefits from Vector Memory
Individuals
Personal knowledge recall without cloud dependencies
Teams & Managers
Team knowledgebase retrieval on-premises
Developers & IT
Stable /upsert, /search, /delete contracts for RAG apps
Security & Compliance
On-disk local store with scoped indices
How it works
Chunk text and generate embeddings
Use llm.embed to convert text into semantic vectors via the local LLM.
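A minimal sketch of the chunking that precedes embedding, assuming simple fixed-size character windows; the `chunk_text` helper and the window sizes are illustrative, not part of Vector Memory.

```python
# Illustrative chunker: fixed-size character windows with overlap.
# The helper and the sizes are assumptions, not part of Vector Memory;
# each resulting chunk is what would be passed to llm.embed.
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

document = "Q3 roadmap notes ... (long text to be chunked and embedded)"
chunks = chunk_text(document)
# Each chunk then goes to llm.embed and comes back as a VECTOR_DIM-length vector.
```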
Upsert vectors with metadata
Call vector.upsert to store vectors with IDs and metadata (source, timestamp, tags).
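The record below only illustrates the kind of payload described here (an ID, the embedding, and metadata with source, timestamp, and tags); the exact field names are assumptions, not a documented schema.

```python
from datetime import datetime, timezone

# Illustrative upsert record; field names are assumptions based on the
# description above (ID, vector, metadata: source, timestamp, tags).
record = {
    "id": "notes-q3-roadmap-0001",
    "vector": [0.012, -0.087, 0.134],  # truncated; a real vector has VECTOR_DIM entries
    "metadata": {
        "source": "notes.md",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tags": ["roadmap", "q3"],
    },
}
# `record` is the shape of what a vector.upsert call would store in the local index.
```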
Search for similar vectors
Use vector.search with a query vector to retrieve top-K most similar results.
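Conceptually, the search step ranks stored vectors by similarity to the query vector and keeps the top K. The pure-NumPy sketch below illustrates that ranking with cosine similarity; the actual index structure and scoring used by vector.search are not specified here.

```python
import numpy as np

# Conceptual illustration only: rank stored vectors by cosine similarity
# to a query vector and keep the top K. The real vector.search index and
# scoring method are not documented on this page.
def top_k(query: np.ndarray, stored: np.ndarray, k: int = 5) -> list[int]:
    q = query / np.linalg.norm(query)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    scores = s @ q                       # cosine similarity per stored vector
    return np.argsort(-scores)[:k].tolist()

stored = np.random.rand(1000, 384)       # 1,000 fake 384-dimensional vectors
query = np.random.rand(384)
print(top_k(query, stored))              # indices of the 5 nearest vectors
```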
Delete or quarantine vectors
Use vector.delete to remove vectors by ID or filter criteria.
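The two payloads below illustrate the deletion modes described here, by ID and by metadata filter; the field names are assumptions.

```python
# Illustrative vector.delete payloads for the two modes described above.
# The field names ("ids", "filter") are assumptions, not a documented schema.
delete_by_id = {"ids": ["notes-q3-roadmap-0001"]}
delete_by_filter = {"filter": {"source": "notes.md", "tags": ["quarantine"]}}
```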
Example workflows
Private RAG over local docs
Runs entirely offline"What did we decide about the Q3 roadmap?"
- llm.embed (query text)
- vector.search (find top-5 similar docs)
- docs.get_doc (retrieve full content)
- llm.generate (answer with context)
Answer with citations—all embeddings and search happen locally
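A rough sketch of that four-step pipeline as JSON over HTTP. The base URL, route layout, and response fields are assumptions; only the /upsert, /search, /delete contracts and the tool names above come from this page.

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint; adjust to your deployment

def call(tool: str, payload: dict) -> dict:
    # Assumed JSON-over-HTTP wrapper around the tools listed above.
    return requests.post(f"{BASE}/{tool}", json=payload, timeout=60).json()

question = "What did we decide about the Q3 roadmap?"
q_vec = call("llm.embed", {"text": question})["embedding"]              # embed the query
hits = call("vector.search", {"vector": q_vec, "top_k": 5})["results"]  # top-5 similar chunks
docs = [call("docs.get_doc", {"id": h["metadata"]["source"]})["text"] for h in hits]

context = "\n\n".join(docs)
answer = call("llm.generate", {"prompt": f"Context:\n{context}\n\nQuestion: {question}"})["text"]
print(answer)  # grounded answer; every step ran against local services
```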
Save research snippets for later recall
Runs entirely offline"Save this article excerpt about AI safety for later recall"
- llm.embed (article excerpt)
- vector.upsert (store with metadata: source, date, tags)
Snippet saved. Later: "What did I read about AI safety?" retrieves it instantly.
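A compact sketch of the save step under the same HTTP assumptions as the RAG sketch above; the ID scheme, routes, and field names are illustrative.

```python
import hashlib
from datetime import datetime, timezone

import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint, as in the RAG sketch
excerpt = "Article excerpt about AI safety ..."

# Embed the excerpt, then upsert it with metadata for later recall.
# Routes, response fields, and the SHA-1 ID scheme are illustrative assumptions.
vec = requests.post(f"{BASE}/llm.embed", json={"text": excerpt}, timeout=60).json()["embedding"]
requests.post(f"{BASE}/vector.upsert", json={
    "id": hashlib.sha1(excerpt.encode()).hexdigest(),
    "vector": vec,
    "metadata": {
        "source": "web-article",
        "date": datetime.now(timezone.utc).date().isoformat(),
        "tags": ["ai-safety", "reading"],
    },
}, timeout=60)
```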
Build a team knowledgebase
Runs entirely offline
Index team docs and enable semantic search
- docs.index_blob (ingest team docs)
- llm.embed (each doc chunk)
- vector.upsert (store all chunks)
- vector.search (query: "How do we handle X?")
Team wiki with semantic search—no cloud, no data exfiltration
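An illustrative bulk-indexing loop for that workflow, under the same assumed HTTP layout; the docs.index_blob response shape and all field names are guesses layered on top of the tool names listed above.

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint, as above

def call(tool: str, payload: dict) -> dict:
    return requests.post(f"{BASE}/{tool}", json=payload, timeout=120).json()

# Ingest each team document, embed its chunks, and upsert them into the index.
# The response shapes ("chunks", "embedding", "results") are assumptions.
for path in ["handbook.md", "runbooks.md", "decisions.md"]:
    doc = call("docs.index_blob", {"path": path})
    for i, chunk in enumerate(doc["chunks"]):
        vec = call("llm.embed", {"text": chunk})["embedding"]
        call("vector.upsert", {
            "id": f"{path}#{i}",
            "vector": vec,
            "metadata": {"source": path, "chunk": i},
        })

# Query the team knowledgebase semantically.
q_vec = call("llm.embed", {"text": "How do we handle X?"})["embedding"]
print(call("vector.search", {"vector": q_vec, "top_k": 5})["results"])
```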
Technical details
Configuration
- VECTOR_DIM — embedding dimension (default: 384)
- TOP_K_DEFAULT — number of results returned by search (default: 5)
- INDEX_DIR — local storage path
- ENCRYPTION — optional encryption at rest (default: false)
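One illustrative way to read these settings from the environment in Python. The INDEX_DIR fallback shown is a placeholder, since no default is stated above, and how Vector Memory actually loads its configuration is an assumption.

```python
import os

# Illustrative configuration loading. Defaults for VECTOR_DIM, TOP_K_DEFAULT,
# and ENCRYPTION mirror the list above; the INDEX_DIR fallback is a placeholder.
config = {
    "VECTOR_DIM": int(os.getenv("VECTOR_DIM", "384")),
    "TOP_K_DEFAULT": int(os.getenv("TOP_K_DEFAULT", "5")),
    "INDEX_DIR": os.getenv("INDEX_DIR", "./vector-index"),
    "ENCRYPTION": os.getenv("ENCRYPTION", "false").lower() == "true",
}
```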
Performance notes
- Search: 10-50ms for 10K vectors
- Upsert: 5-20ms per vector
- Scales to 100K+ vectors on typical hardware
Observability
- Index size and vector count
- Search latency and recall metrics
- Upsert/delete throughput
Security posture
Local storage by default
All vectors stored on disk. No network calls for search or retrieval.
Optional encryption
Enable encryption for sensitive data. Keys managed locally.
Scoped indices
Separate indices for different use cases. Prevents cross-contamination.
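A common way to express scoping is a per-scope index name passed on every call; the `index` field below is an assumption about how Vector Memory addresses scoped indices.

```python
# Illustrative scoping: every upsert/search carries an index (scope) name so
# personal notes and team docs never mix. The "index" field is an assumption.
personal_query = {"index": "personal-notes", "vector": [0.1, 0.2, 0.3], "top_k": 5}
team_query = {"index": "team-kb", "vector": [0.1, 0.2, 0.3], "top_k": 5}
```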
Audit logs
All upsert, search, and delete operations logged locally.
Roadmap & status
Current features
- Upsert, search, and delete vectors
- Local LLM embeddings integration
- Scoped indices and metadata filtering
Coming soon
- Larger stores with FAISS/Qdrant adapters
- Hybrid search (dense + sparse)
- Vector compression for larger indices
Ready to build private RAG workflows?
Get started with Vector Memory in minutes. All embeddings and search happen locally.