Vector Memory¶

Target Architecture — Final-State Design

This page describes the final-state semantic memory of the Knowledge Platform: how knowledge is chunked, embedded, indexed in Qdrant, and retrieved by semantic and hybrid search. It is owned by VectorMemoryService, EmbeddingService, and KnowledgeSearchService.

Vector memory is the platform's similarity memory. Where the knowledge graph captures explicit relationships, vector memory captures meaning: it lets an agent ask "have we solved something like this before?" and retrieve the most semantically relevant blueprints, decisions, code, documentation, and patterns — even when no explicit link exists. It is a core input to the Context Builder.

Pipeline Overview¶

flowchart LR
    Record["MemoryRecord"] --> Chunk["ArtifactChunkingWorker /<br/>MarkdownChunkingWorker"]
    Chunk --> Chunks["VectorChunk[]"]
    Chunks --> Embed["EmbeddingService<br/>(EmbeddingJob)"]
    Embed --> Vec["VectorDocument"]
    Vec --> Qdrant[("Qdrant collection")]
    Query["search/semantic | search/hybrid"] --> VM["VectorMemoryService"]
    VM --> Qdrant
    VM --> MI["MetadataIndexService"]
    VM --> Fuse["fuse + re-rank"]
    Fuse --> Results["ranked results"]


Embeddings¶

Embeddings are produced by EmbeddingService through ConnectSoft.Extensions.AI.*, which abstracts the embedding model behind a port so models can be swapped or upgraded without touching callers.
Each VectorDocument records its modelVersion; when the model is upgraded, the EmbeddingRefreshWorker re-embeds affected cohorts so the index stays consistent (mixed-model search is never served).
contentHash ties vectors to source content — unchanged content is never re-embedded, which keeps cost and latency down across the multi-tenant estate.
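
The skip-if-unchanged rule described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the use of SHA-256, the `VectorDocumentMeta` type, and the `needs_embedding` helper are all assumptions introduced here; the source only states that `contentHash` and `modelVersion` are recorded.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorDocumentMeta:
    """Hypothetical subset of the fields a VectorDocument records."""
    content_hash: str
    model_version: str


def content_hash(text: str) -> str:
    # SHA-256 is an assumption; the source only says a contentHash exists.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(text: str, current_model: str,
                    existing: Optional[VectorDocumentMeta]) -> bool:
    """Return True when the content must be embedded or re-embedded."""
    if existing is None:
        return True  # never embedded before
    if existing.model_version != current_model:
        return True  # model upgraded: vector belongs to a stale cohort
    # Same model: re-embed only if the source content actually changed.
    return existing.content_hash != content_hash(text)


doc = VectorDocumentMeta(content_hash("hello"), "embed-v1")
print(needs_embedding("hello", "embed-v1", doc))  # unchanged content
print(needs_embedding("hello", "embed-v2", doc))  # model upgraded
```

Checking the model version before the hash means a model upgrade always triggers a refresh, which matches the rule that mixed-model search is never served.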

Chunking¶

Large artifacts are split into overlapping VectorChunks so retrieval is precise and stays within token limits:

| Content type | Worker | Strategy |
| --- | --- | --- |
| Artifacts (blueprints, models, configs) | `ArtifactChunkingWorker` | Structure-aware splitting with overlap |
| Markdown / docs | `MarkdownChunkingWorker` | Heading-aware sections, `maxTokens` per chunk |
| Code | `CodeEmbeddingWorker` (via `CodeSymbolExtractionWorker`) | Symbol-level chunks (type/method granularity) |

Each chunk records ordinal, span, and tokenCount; the parent VectorDocument records chunkCount and contentHash.
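
As an illustration of overlapping chunks that carry `ordinal`, `span`, and `tokenCount`, here is a minimal sketch. It assumes whitespace tokenization and a fixed sliding window, which is simpler than the structure-aware and heading-aware strategies listed above; the `chunk_with_overlap` function and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VectorChunk:
    ordinal: int            # position of the chunk within the document
    span: Tuple[int, int]   # [start, end) token offsets in the source
    token_count: int
    text: str


def chunk_with_overlap(text: str, max_tokens: int = 200,
                       overlap: int = 20) -> List[VectorChunk]:
    """Split text into windows of at most max_tokens whitespace tokens.

    Consecutive windows share `overlap` tokens so that content near a
    boundary is retrievable from either side.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks: List[VectorChunk] = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(VectorChunk(len(chunks), (start, end), end - start,
                                  " ".join(tokens[start:end])))
        if end == len(tokens):
            break  # last window reached the end of the document
        start += step
    return chunks


chunks = chunk_with_overlap("a b c d e f g h i j", max_tokens=4, overlap=1)
print([c.span for c in chunks])  # [(0, 4), (3, 7), (6, 10)]
```

A production tokenizer would count model tokens rather than whitespace-separated words, so `tokenCount` here is only an approximation.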

Qdrant Collections¶

Vectors are stored in Qdrant, the primary vector store. Collections are partitioned for isolation and relevance:

| Collection (per tenant) | Holds | Payload filters |
| --- | --- | --- |
| `kn-artifacts-{tenantId}` | Blueprints, models, contracts, configs | `projectId`, `artifactType`, `classification` |
| `kn-docs-{tenantId}` | Documentation, design docs | `projectId`, `sourceRef`, `classification` |
| `kn-code-{tenantId}` | Code symbols | `projectId`, `repository`, `symbolKind` |
| `kn-patterns-{tenantId}` | Knowledge patterns | `category`, `classification` |
| `kn-decisions-{tenantId}` | Decision records | `projectId`, `status` |

Each collection uses an HNSW index for approximate nearest-neighbour (ANN) search, with payload indexes on tenantId, projectId, kind, and classification so that filtered ANN search remains tenant-isolated and governance-aware.
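
To make the naming scheme and filtered search concrete, the sketch below builds a per-tenant collection name and a search request body as plain dictionaries. The `must` / `key` / `match` structure follows Qdrant's filter JSON; the helper names (`collection_name`, `build_search_body`) and the choice of which fields to filter on are assumptions for illustration, and the exact request shape should be checked against the Qdrant version in use.

```python
from typing import Any, Dict, List, Optional

# Collection kinds taken from the table above.
KINDS = {"artifacts", "docs", "code", "patterns", "decisions"}


def collection_name(kind: str, tenant_id: str) -> str:
    """Build a per-tenant collection name such as kn-docs-{tenantId}."""
    if kind not in KINDS:
        raise ValueError(f"unknown collection kind: {kind}")
    return f"kn-{kind}-{tenant_id}"


def build_search_body(vector: List[float], tenant_id: str,
                      project_id: Optional[str] = None,
                      classification: Optional[str] = None,
                      limit: int = 10) -> Dict[str, Any]:
    """Assemble a filtered ANN search body (hypothetical helper)."""
    # The tenant filter is always applied, even though collections are
    # already per tenant, as a second layer of isolation.
    must = [{"key": "tenantId", "match": {"value": tenant_id}}]
    if project_id is not None:
        must.append({"key": "projectId", "match": {"value": project_id}})
    if classification is not None:
        must.append({"key": "classification",
                     "match": {"value": classification}})
    return {"vector": vector, "filter": {"must": must},
            "limit": limit, "with_payload": True}


body = build_search_body([0.1, 0.2], "t1", project_id="p1")
print(collection_name("docs", "t1"), body["filter"])
```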

Managed alternative — Azure AI Search

VectorMemoryService abstracts the backend behind a common port. For tenants or regions that require a fully managed service, Azure AI Search (vector + semantic) is the supported alternative; the rest of the platform — chunking, jobs, fusion, governance — is unchanged. Qdrant remains the default primary store.

Hybrid Search¶

POST /knowledge/search/hybrid fuses dense-vector results with structured-metadata results, then re-ranks:

1. Dense retrieval: ANN over the relevant Qdrant collection(s) for the query embedding.
2. Structured retrieval: MetadataIndexService returns exact matches for the hard filters.
3. Fusion: results are combined with Reciprocal Rank Fusion (default) or weighted score fusion.
4. Re-rank: blended scoring adds recency, quality score, and reuse signals (mirroring the Context Builder ranking).
5. Govern: every candidate is access-checked; disallowed sources are dropped or redacted.

Pure semantic (search/semantic) and pure metadata (search/metadata) are available when only one mode is needed. See APIs.
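
The default fusion step, Reciprocal Rank Fusion, can be sketched in a few lines. Each document scores the sum of `1 / (k + rank)` over every ranked list it appears in. The constant `k = 60` is the value commonly used in the literature, not one confirmed by this page, and the function name and tie-breaking rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]     # ids from ANN search, best first
structured = ["b", "d"]     # ids from exact metadata matches
print(reciprocal_rank_fusion([dense, structured]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score normalization between the dense and structured retrievers, which is one reason it is a reasonable default over weighted score fusion.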

EmbeddingJob Lifecycle¶

stateDiagram-v2
    [*] --> Queued
    Queued --> Running: EmbeddingWorker claims
    Running --> Completed: vectors upserted
    Running --> Failed: error
    Failed --> Retrying: EmbeddingRetryWorker (transient)
    Retrying --> Running
    Failed --> DeadLettered: max attempts exceeded
    Completed --> Refreshing: model upgrade
    Refreshing --> Completed: re-embedded
    Completed --> [*]
    DeadLettered --> [*]


| State | Owner | Notes |
| --- | --- | --- |
| `Queued` | `EmbeddingService` | Created from `ChunksReady` |
| `Running` | `EmbeddingWorker` | Embeds chunks, upserts to Qdrant |
| `Completed` | `EmbeddingWorker` | Emits `EmbeddingCompleted`; `VectorDocument` live |
| `Failed` / `Retrying` | `EmbeddingRetryWorker` | Transient failures retried with capped backoff |
| `DeadLettered` | (none) | Preserved for replay after fix |
| `Refreshing` | `EmbeddingRefreshWorker` | Re-embeds on model upgrade; emits `EmbeddingRefreshed` |
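
The lifecycle above can be expressed as a transition table plus a capped backoff. This is a sketch under stated assumptions: the transition set is copied from the state diagram, while the backoff base, the cap, `max_attempts`, and the rule that non-transient failures go straight to `DeadLettered` are hypothetical values chosen for illustration.

```python
from typing import Dict, Set

# Allowed transitions, taken from the state diagram.
TRANSITIONS: Dict[str, Set[str]] = {
    "Queued": {"Running"},
    "Running": {"Completed", "Failed"},
    "Failed": {"Retrying", "DeadLettered"},
    "Retrying": {"Running"},
    "Completed": {"Refreshing"},
    "Refreshing": {"Completed"},
    "DeadLettered": set(),
}


def advance(current: str, target: str) -> str:
    """Return target if the transition is legal, otherwise raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


def backoff_seconds(attempt: int, base: float = 2.0,
                    cap: float = 300.0) -> float:
    """Exponential backoff, capped; attempt numbering starts at 1."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))


def after_failure(attempt: int, transient: bool,
                  max_attempts: int = 5) -> str:
    """Decide the next state for a Failed job (assumed policy)."""
    if transient and attempt < max_attempts:
        return advance("Failed", "Retrying")
    return advance("Failed", "DeadLettered")


print(after_failure(attempt=1, transient=True), backoff_seconds(1))
```

Keeping the transitions in one table makes illegal moves, such as `Queued` directly to `Completed`, fail loudly instead of silently corrupting job state.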

Quality, Freshness, and Governance¶

Quality — KnowledgeQualityService scores records; low-quality content is down-ranked in retrieval.
Freshness — StaleMemoryWorker flags superseded content so the builder prefers current versions.
Governance — classification is a payload field on every vector; MemoryPolicyService enforces access at retrieval time, and MemoryRedactionService redacts Confidential matches. Secret content is never embedded.
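
A retrieval-time governance pass might look like the sketch below. It assumes a caller-supplied `is_allowed` callback standing in for MemoryPolicyService and a fixed `[REDACTED]` placeholder standing in for MemoryRedactionService; neither reflects the real service contracts, and the `Internal` classification value is hypothetical (only Confidential and Secret are named on this page).

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    id: str
    classification: str  # payload field carried on every vector
    text: str


def govern(candidates: Iterable[Candidate],
           is_allowed: Callable[[Candidate], bool]) -> List[Candidate]:
    """Drop disallowed candidates and redact Confidential ones."""
    result: List[Candidate] = []
    for c in candidates:
        if c.classification == "Secret":
            continue  # defensive: Secret content should never be indexed
        if not is_allowed(c):
            continue  # access check failed: drop the candidate
        if c.classification == "Confidential":
            c = replace(c, text="[REDACTED]")  # keep the hit, hide the body
        result.append(c)
    return result


hits = [
    Candidate("1", "Internal", "public design note"),
    Candidate("2", "Confidential", "pricing model"),
    Candidate("3", "Internal", "other tenant project"),
]
visible = govern(hits, is_allowed=lambda c: c.id != "3")
print([(c.id, c.text) for c in visible])
```

Running the check after retrieval, rather than relying only on payload filters, means a misconfigured filter cannot leak content on its own.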

Relationship to Existing Implementation¶

Implemented

The indexing and embedding design builds on the existing Knowledge & Memory Indices and Knowledge & Memory System documentation.
