Fast local embeddings for retrieval and recall
Semantic search over your data with local LLM embeddings. Upsert, search, and delete vectors—all stored on your device.
What Vector Memory does
Vector Memory provides fast semantic search over your local data. It works with the LLM Engine to generate embeddings and stores them in a local vector index. Perfect for RAG workflows, knowledge recall, and context-aware assistance.
Core capabilities
- Upsert vectors with metadata
- Semantic search with top-K results
- Delete vectors by ID or filter
- Scoped indices for multi-tenant use
Integration
- Works with local LLM embeddings
- Pairs with Document Tools for RAG
- Supports quarantine and deletion workflows
Who benefits from Vector Memory
Individuals
Personal knowledge recall without cloud dependencies
Teams & Managers
Team knowledgebase retrieval on-premises
Developers & IT
Stable /upsert, /search, /delete contracts for RAG apps
Security & Compliance
On-disk local store with scoped indices
How it works
Chunk text and generate embeddings
Use llm.embed to convert text into semantic vectors via the local LLM.
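A minimal sketch of the chunking that precedes embedding, assuming simple fixed-size character windows; the `chunk_text` helper and the window sizes are illustrative, not part of Vector Memory.

```python
# Illustrative chunker: fixed-size character windows with overlap.
# The helper and the sizes are assumptions, not part of Vector Memory;
# each resulting chunk is what would be passed to llm.embed.
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

document = "Q3 roadmap notes ... (long text to be chunked and embedded)"
chunks = chunk_text(document)
# Each chunk then goes to llm.embed and comes back as a VECTOR_DIM-length vector.
```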
Upsert vectors with metadata
Call vector.upsert to store vectors with IDs and metadata (source, timestamp, tags).
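The record below only illustrates the kind of payload described here (an ID, the embedding, and metadata with source, timestamp, and tags); the exact field names are assumptions, not a documented schema.

```python
from datetime import datetime, timezone

# Illustrative upsert record; field names are assumptions based on the
# description above (ID, vector, metadata: source, timestamp, tags).
record = {
    "id": "notes-q3-roadmap-0001",
    "vector": [0.012, -0.087, 0.134],  # truncated; a real vector has VECTOR_DIM entries
    "metadata": {
        "source": "notes.md",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tags": ["roadmap", "q3"],
    },
}
# `record` is the shape of what a vector.upsert call would store in the local index.
```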
Search for similar vectors
Use vector.search with a query vector to retrieve top-K most similar results.
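Conceptually, the search step ranks stored vectors by similarity to the query vector and keeps the top K. The pure-NumPy sketch below illustrates that ranking with cosine similarity; the actual index structure and scoring used by vector.search are not specified here.

```python
import numpy as np

# Conceptual illustration only: rank stored vectors by cosine similarity
# to a query vector and keep the top K. The real vector.search index and
# scoring method are not documented on this page.
def top_k(query: np.ndarray, stored: np.ndarray, k: int = 5) -> list[int]:
    q = query / np.linalg.norm(query)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    scores = s @ q                       # cosine similarity per stored vector
    return np.argsort(-scores)[:k].tolist()

stored = np.random.rand(1000, 384)       # 1,000 fake 384-dimensional vectors
query = np.random.rand(384)
print(top_k(query, stored))              # indices of the 5 nearest vectors
```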
Delete or quarantine vectors
Use vector.delete to remove vectors by ID or filter criteria.
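The two payloads below illustrate the deletion modes described here, by ID and by metadata filter; the field names are assumptions.

```python
# Illustrative vector.delete payloads for the two modes described above.
# The field names ("ids", "filter") are assumptions, not a documented schema.
delete_by_id = {"ids": ["notes-q3-roadmap-0001"]}
delete_by_filter = {"filter": {"source": "notes.md", "tags": ["quarantine"]}}
```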
Example workflows
Private RAG over local docs
Runs entirely offline"What did we decide about the Q3 roadmap?"
- llm.embed (query text)
- vector.search (find top-5 similar docs)
- docs.get_doc (retrieve full content)
- llm.generate (answer with context)
Answer with citations—all embeddings and search happen locally
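A rough sketch of that four-step pipeline as JSON over HTTP. The base URL, route layout, and response fields are assumptions; only the /upsert, /search, /delete contracts and the tool names above come from this page.

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint; adjust to your deployment

def call(tool: str, payload: dict) -> dict:
    # Assumed JSON-over-HTTP wrapper around the tools listed above.
    return requests.post(f"{BASE}/{tool}", json=payload, timeout=60).json()

question = "What did we decide about the Q3 roadmap?"
q_vec = call("llm.embed", {"text": question})["embedding"]              # embed the query
hits = call("vector.search", {"vector": q_vec, "top_k": 5})["results"]  # top-5 similar chunks
docs = [call("docs.get_doc", {"id": h["metadata"]["source"]})["text"] for h in hits]

context = "\n\n".join(docs)
answer = call("llm.generate", {"prompt": f"Context:\n{context}\n\nQuestion: {question}"})["text"]
print(answer)  # grounded answer; every step ran against local services
```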
Save research snippets for later recall
Runs entirely offline"Save this article excerpt about AI safety for later recall"
- llm.embed (article excerpt)
- vector.upsert (store with metadata: source, date, tags)
Snippet saved. Later: "What did I read about AI safety?" retrieves it instantly.
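A compact sketch of the save step under the same HTTP assumptions as the RAG sketch above; the ID scheme, routes, and field names are illustrative.

```python
import hashlib
from datetime import datetime, timezone

import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint, as in the RAG sketch
excerpt = "Article excerpt about AI safety ..."

# Embed the excerpt, then upsert it with metadata for later recall.
# Routes, response fields, and the SHA-1 ID scheme are illustrative assumptions.
vec = requests.post(f"{BASE}/llm.embed", json={"text": excerpt}, timeout=60).json()["embedding"]
requests.post(f"{BASE}/vector.upsert", json={
    "id": hashlib.sha1(excerpt.encode()).hexdigest(),
    "vector": vec,
    "metadata": {
        "source": "web-article",
        "date": datetime.now(timezone.utc).date().isoformat(),
        "tags": ["ai-safety", "reading"],
    },
}, timeout=60)
```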
Build a team knowledgebase
Runs entirely offline
Index team docs and enable semantic search
- docs.index_blob (ingest team docs)
- llm.embed (each doc chunk)
- vector.upsert (store all chunks)
- vector.search (query: "How do we handle X?")
Team wiki with semantic search—no cloud, no data exfiltration
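An illustrative bulk-indexing loop for that workflow, under the same assumed HTTP layout; the docs.index_blob response shape and all field names are guesses layered on top of the tool names listed above.

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local endpoint, as above

def call(tool: str, payload: dict) -> dict:
    return requests.post(f"{BASE}/{tool}", json=payload, timeout=120).json()

# Ingest each team document, embed its chunks, and upsert them into the index.
# The response shapes ("chunks", "embedding", "results") are assumptions.
for path in ["handbook.md", "runbooks.md", "decisions.md"]:
    doc = call("docs.index_blob", {"path": path})
    for i, chunk in enumerate(doc["chunks"]):
        vec = call("llm.embed", {"text": chunk})["embedding"]
        call("vector.upsert", {
            "id": f"{path}#{i}",
            "vector": vec,
            "metadata": {"source": path, "chunk": i},
        })

# Query the team knowledgebase semantically.
q_vec = call("llm.embed", {"text": "How do we handle X?"})["embedding"]
print(call("vector.search", {"vector": q_vec, "top_k": 5})["results"])
```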
Technical details
Configuration
- VECTOR_DIM — embedding dimension (default: 384)
- TOP_K_DEFAULT — number of results returned by search (default: 5)
- INDEX_DIR — local storage path
- ENCRYPTION — optional encryption at rest (default: false)
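One illustrative way to read these settings from the environment in Python. The INDEX_DIR fallback shown is a placeholder, since no default is stated above, and how Vector Memory actually loads its configuration is an assumption.

```python
import os

# Illustrative configuration loading. Defaults for VECTOR_DIM, TOP_K_DEFAULT,
# and ENCRYPTION mirror the list above; the INDEX_DIR fallback is a placeholder.
config = {
    "VECTOR_DIM": int(os.getenv("VECTOR_DIM", "384")),
    "TOP_K_DEFAULT": int(os.getenv("TOP_K_DEFAULT", "5")),
    "INDEX_DIR": os.getenv("INDEX_DIR", "./vector-index"),
    "ENCRYPTION": os.getenv("ENCRYPTION", "false").lower() == "true",
}
```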
Performance notes
- Search: 10-50ms for 10K vectors
- Upsert: 5-20ms per vector
- Scales to 100K+ vectors on typical hardware
Observability
- Index size and vector count
- Search latency and recall metrics
- Upsert/delete throughput
Security posture
Local storage by default
All vectors stored on disk. No network calls for search or retrieval.
Optional encryption
Enable encryption for sensitive data. Keys managed locally.
Scoped indices
Separate indices for different use cases. Prevents cross-contamination.
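A common way to express scoping is a per-scope index name passed on every call; the `index` field below is an assumption about how Vector Memory addresses scoped indices.

```python
# Illustrative scoping: every upsert/search carries an index (scope) name so
# personal notes and team docs never mix. The "index" field is an assumption.
personal_query = {"index": "personal-notes", "vector": [0.1, 0.2, 0.3], "top_k": 5}
team_query = {"index": "team-kb", "vector": [0.1, 0.2, 0.3], "top_k": 5}
```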
Audit logs
All upsert, search, and delete operations logged locally.
Roadmap & status
Current features
- Upsert, search, and delete vectors
- Local LLM embeddings integration
- Scoped indices and metadata filtering
Coming soon
- Larger stores with FAISS/Qdrant adapters
- Hybrid search (dense + sparse)
- Vector compression for larger indices
Ready to build private RAG workflows?
Get started with Vector Memory in minutes. All embeddings and search happen locally.