Vector Memory
Target Architecture — Final-State Design
This page describes the final-state semantic memory of the Knowledge Platform: how knowledge is chunked, embedded, indexed in Qdrant, and retrieved by semantic and hybrid search. It is owned by VectorMemoryService, EmbeddingService, and KnowledgeSearchService.
Vector memory is the platform's similarity memory. Where the knowledge graph captures explicit relationships, vector memory captures meaning: it lets an agent ask "have we solved something like this before?" and retrieve the most semantically relevant blueprints, decisions, code, documentation, and patterns — even when no explicit link exists. It is a core input to the Context Builder.
Pipeline Overview
flowchart LR
Record["MemoryRecord"] --> Chunk["ArtifactChunkingWorker /<br/>MarkdownChunkingWorker"]
Chunk --> Chunks["VectorChunk[]"]
Chunks --> Embed["EmbeddingService<br/>(EmbeddingJob)"]
Embed --> Vec["VectorDocument"]
Vec --> Qdrant[("Qdrant collection")]
Query["search/semantic | search/hybrid"] --> VM["VectorMemoryService"]
VM --> Qdrant
VM --> MI["MetadataIndexService"]
VM --> Fuse["fuse + re-rank"]
Fuse --> Results["ranked results"]
Embeddings
- Embeddings are produced by EmbeddingService through ConnectSoft.Extensions.AI.*, which abstracts the embedding model behind a port so models can be swapped or upgraded without touching callers.
- Each VectorDocument records its modelVersion; when the model is upgraded, the EmbeddingRefreshWorker re-embeds affected cohorts so the index stays consistent (mixed-model search is never served).
- contentHash ties vectors to source content: unchanged content is never re-embedded, which keeps cost and latency down across the multi-tenant estate.
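The re-embedding rule described above can be sketched as a small decision function. This is a minimal illustration, not the platform's implementation: the `VectorDocument` shape is reduced to the two fields the text names, SHA-256 is an assumed hash algorithm, and `needs_embedding` is a hypothetical helper.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorDocument:
    """Hypothetical minimal shape; only these two fields come from the text."""
    content_hash: str   # contentHash in the document
    model_version: str  # modelVersion in the document


def content_hash(text: str) -> str:
    """Hash the source content. SHA-256 is an assumption, not a documented choice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(text: str, existing: Optional[VectorDocument], current_model: str) -> bool:
    """Return True when the content must be (re-)embedded."""
    if existing is None:
        return True  # never embedded before
    if existing.model_version != current_model:
        return True  # model upgraded: refresh so mixed-model search is avoided
    # Same model: re-embed only if the source content actually changed.
    return existing.content_hash != content_hash(text)
```

Checking the model version before the hash means a model upgrade always triggers a refresh, while unchanged content under the current model is skipped.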
Chunking
Large artifacts are split into overlapping VectorChunks so retrieval is precise and stays within token limits:
| Content type | Worker | Strategy |
|---|---|---|
| Artifacts (blueprints, models, configs) | ArtifactChunkingWorker | Structure-aware splitting with overlap |
| Markdown / docs | MarkdownChunkingWorker | Heading-aware sections, maxTokens per chunk |
| Code | CodeEmbeddingWorker (via CodeSymbolExtractionWorker) | Symbol-level chunks (type/method granularity) |
Each chunk records ordinal, span, and tokenCount; the parent VectorDocument records chunkCount and contentHash.
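As a rough illustration of overlapping chunks that carry ordinal, span, and tokenCount, the sketch below uses a sliding window over whitespace-separated tokens. The whitespace tokenizer, the default sizes, and the function name `chunk_with_overlap` are assumptions for illustration; the actual workers are structure-aware or heading-aware and would use a real tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VectorChunk:
    ordinal: int           # position of the chunk within its parent document
    span: Tuple[int, int]  # [start, end) token offsets into the source
    token_count: int
    text: str


def chunk_with_overlap(text: str, max_tokens: int = 200, overlap: int = 20) -> List[VectorChunk]:
    """Split text into windows of at most max_tokens that share `overlap` tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("require max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = text.split()  # whitespace tokens stand in for a real tokenizer
    step = max_tokens - overlap
    chunks: List[VectorChunk] = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(
            VectorChunk(len(chunks), (start, end), end - start, " ".join(tokens[start:end]))
        )
        if end == len(tokens):
            break  # the last window reached the end of the source
        start += step
    return chunks
```

With ten tokens, `max_tokens=4`, and `overlap=1`, the spans are (0, 4), (3, 7), and (6, 10): each chunk repeats the final token of its predecessor.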
Qdrant Collections
Vectors are stored in Qdrant, the primary vector store. Collections are partitioned for isolation and relevance:
| Collection (per tenant) | Holds | Payload filters |
|---|---|---|
| kn-artifacts-{tenantId} | Blueprints, models, contracts, configs | projectId, artifactType, classification |
| kn-docs-{tenantId} | Documentation, design docs | projectId, sourceRef, classification |
| kn-code-{tenantId} | Code symbols | projectId, repository, symbolKind |
| kn-patterns-{tenantId} | Knowledge patterns | category, classification |
| kn-decisions-{tenantId} | Decision records | projectId, status |
Each collection uses an HNSW index for approximate nearest-neighbour search, with payload indexes on tenantId, projectId, kind, and classification, so that filtered ANN search remains tenant-isolated and governance-aware.
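To make the naming scheme and payload filtering concrete, the sketch below resolves a per-tenant collection name and builds a filter as a plain dictionary in the shape of Qdrant's JSON filter conditions (`must` clauses with `key` and `match`). The helper names are hypothetical, no Qdrant client is called, and how VectorMemoryService actually builds its queries is not specified in this page.

```python
from typing import Any, Dict, List, Optional

# Collection kinds taken from the table above.
COLLECTION_KINDS = {"artifacts", "docs", "code", "patterns", "decisions"}


def collection_name(kind: str, tenant_id: str) -> str:
    """Resolve the per-tenant collection name, e.g. kn-docs-{tenantId}."""
    if kind not in COLLECTION_KINDS:
        raise ValueError(f"unknown collection kind: {kind}")
    return f"kn-{kind}-{tenant_id}"


def payload_filter(project_id: Optional[str] = None,
                   classification: Optional[str] = None) -> Dict[str, Any]:
    """Build a filter where every supplied condition must match."""
    must: List[Dict[str, Any]] = []
    if project_id is not None:
        must.append({"key": "projectId", "match": {"value": project_id}})
    if classification is not None:
        must.append({"key": "classification", "match": {"value": classification}})
    return {"must": must}
```

For example, `collection_name("docs", "acme")` yields `kn-docs-acme`, and `payload_filter(project_id="p1")` yields a single `must` clause on `projectId`.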
Managed alternative — Azure AI Search
VectorMemoryService abstracts the backend behind a common port. For tenants or regions that require a fully managed service, Azure AI Search (vector + semantic) is the supported alternative; the rest of the platform — chunking, jobs, fusion, governance — is unchanged. Qdrant remains the default primary store.
Hybrid Search
POST /knowledge/search/hybrid fuses dense-vector results with structured-metadata results, then re-ranks:
- Dense retrieval: ANN over the relevant Qdrant collection(s) for the query embedding.
- Structured retrieval: MetadataIndexService returns exact matches for the hard filters.
- Fusion: results are combined with Reciprocal Rank Fusion (default) or weighted score fusion.
- Re-rank: blended scoring adds recency, quality score, and reuse signals (mirroring the Context Builder ranking).
- Govern: every candidate is access-checked; disallowed sources are dropped or redacted.
Pure semantic (search/semantic) and pure metadata (search/metadata) are available when only one mode is needed. See APIs.
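The default fusion step, Reciprocal Rank Fusion, can be sketched in a few lines: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The constant `k = 60` is a commonly used default and an assumption here, as are the function name and the tie-break by identifier; the platform's actual parameters are not stated in this page.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]    # ids from ANN search, best first
structured = ["b", "d"]    # ids from exact metadata matches, best first
fused = reciprocal_rank_fusion([dense, structured])
```

Because RRF uses ranks rather than raw scores, it does not require the dense similarity scores and the metadata matches to share a scale, which is one reason it is a convenient default over weighted score fusion.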
EmbeddingJob Lifecycle
stateDiagram-v2
[*] --> Queued
Queued --> Running: EmbeddingWorker claims
Running --> Completed: vectors upserted
Running --> Failed: error
Failed --> Retrying: EmbeddingRetryWorker (transient)
Retrying --> Running
Failed --> DeadLettered: max attempts exceeded
Completed --> Refreshing: model upgrade
Refreshing --> Completed: re-embedded
Completed --> [*]
DeadLettered --> [*]
| State | Owner | Notes |
|---|---|---|
| Queued | EmbeddingService | Created from ChunksReady |
| Running | EmbeddingWorker | Embeds chunks, upserts to Qdrant |
| Completed | EmbeddingWorker | Emits EmbeddingCompleted; VectorDocument live |
| Failed / Retrying | EmbeddingRetryWorker | Transient failures retried with capped backoff |
| DeadLettered | — | Preserved for replay after fix |
| Refreshing | EmbeddingRefreshWorker | Re-embeds on model upgrade; emits EmbeddingRefreshed |
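The lifecycle above can be expressed as a transition table plus a capped exponential backoff. This is a sketch under assumptions: the transitions mirror the state diagram, but the base delay, the cap, the doubling schedule, and the helper names are illustrative values that this page does not specify.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    RETRYING = "Retrying"
    DEAD_LETTERED = "DeadLettered"
    REFRESHING = "Refreshing"


# Allowed transitions, copied from the state diagram.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.FAILED: {JobState.RETRYING, JobState.DEAD_LETTERED},
    JobState.RETRYING: {JobState.RUNNING},
    JobState.COMPLETED: {JobState.REFRESHING},
    JobState.REFRESHING: {JobState.COMPLETED},
    JobState.DEAD_LETTERED: set(),
}


def can_transition(src: JobState, dst: JobState) -> bool:
    """Return True if the diagram permits moving from src to dst."""
    return dst in ALLOWED[src]


def next_after_failure(attempts: int, max_attempts: int) -> JobState:
    """Pick the follow-up state for a transient failure (max_attempts is assumed config)."""
    return JobState.RETRYING if attempts < max_attempts else JobState.DEAD_LETTERED


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Capped exponential backoff: base * 2^(attempt - 1), never above cap."""
    return min(cap, base * 2 ** (attempt - 1))
```

With these assumed defaults the delays run 1, 2, 4, ... seconds and level off at 60 from the seventh attempt onward.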
Quality, Freshness, and Governance
- Quality: KnowledgeQualityService scores records; low-quality content is down-ranked in retrieval.
- Freshness: StaleMemoryWorker flags superseded content so the Context Builder prefers current versions.
- Governance: classification is a payload field on every vector; MemoryPolicyService enforces access at retrieval time, and MemoryRedactionService redacts Confidential matches. Secret content is never embedded.
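The governance rules above can be illustrated with a small filter applied to retrieval candidates. Only the Confidential and Secret labels come from the text; the `Candidate` shape, the other classification labels, the `allowed` set, and the `[REDACTED]` placeholder are assumptions, and the real MemoryPolicyService and MemoryRedactionService contracts are not shown in this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    id: str
    classification: str
    text: str


def is_embeddable(classification: str) -> bool:
    """Secret content is never embedded, so it never reaches the index."""
    return classification != "Secret"


def govern(candidates: Iterable[Candidate], allowed: Set[str]) -> List[Candidate]:
    """Drop candidates the caller may not see and redact Confidential matches."""
    governed: List[Candidate] = []
    for candidate in candidates:
        if candidate.classification not in allowed:
            continue  # access check failed: drop the candidate entirely
        if candidate.classification == "Confidential":
            # Keep the hit so ranking is preserved, but hide its content.
            governed.append(Candidate(candidate.id, candidate.classification, "[REDACTED]"))
        else:
            governed.append(candidate)
    return governed
```

Applying the check at retrieval time, as the text describes, means a change in a caller's permissions takes effect without re-indexing any vectors.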
Relationship to Existing Implementation
Implemented
The indexing and embedding design builds on the existing Knowledge & Memory Indices and Knowledge & Memory System documentation.