Vector Memory

Target Architecture — Final-State Design

This page describes the final-state semantic memory of the Knowledge Platform: how knowledge is chunked, embedded, indexed in Qdrant, and retrieved by semantic and hybrid search. The capability is owned by VectorMemoryService, EmbeddingService, and KnowledgeSearchService.

Vector memory is the platform's similarity memory. Where the knowledge graph captures explicit relationships, vector memory captures meaning: it lets an agent ask "have we solved something like this before?" and retrieve the most semantically relevant blueprints, decisions, code, documentation, and patterns — even when no explicit link exists. It is a core input to the Context Builder.

Pipeline Overview

flowchart LR
    Record["MemoryRecord"] --> Chunk["ArtifactChunkingWorker /<br/>MarkdownChunkingWorker"]
    Chunk --> Chunks["VectorChunk[]"]
    Chunks --> Embed["EmbeddingService<br/>(EmbeddingJob)"]
    Embed --> Vec["VectorDocument"]
    Vec --> Qdrant[("Qdrant collection")]
    Query["search/semantic | search/hybrid"] --> VM["VectorMemoryService"]
    VM --> Qdrant
    VM --> MI["MetadataIndexService"]
    VM --> Fuse["fuse + re-rank"]
    Fuse --> Results["ranked results"]

Embeddings

  • Embeddings are produced by EmbeddingService through ConnectSoft.Extensions.AI.*, which abstracts the embedding model behind a port so models can be swapped or upgraded without touching callers.
  • Each VectorDocument records its modelVersion; when the model is upgraded, the EmbeddingRefreshWorker re-embeds affected cohorts so the index stays consistent (mixed-model search is never served).
  • contentHash ties vectors to source content — unchanged content is never re-embedded, which keeps cost and latency down across the multi-tenant estate.
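
The re-embedding rule described above can be sketched as a small decision function. This is a minimal illustration, not the platform's implementation: the choice of SHA-256, the `needs_embedding` helper, and the shape of the `VectorDocument` dataclass are assumptions made for the example; only the `contentHash` and `modelVersion` concepts come from this page.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorDocument:
    """Hypothetical subset of the fields this page names."""
    content_hash: str
    model_version: str


def content_hash(text: str) -> str:
    # SHA-256 is an assumption; the page does not name the hash function.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(existing: Optional[VectorDocument], text: str, current_model: str) -> bool:
    """Return True when the content must be (re-)embedded."""
    if existing is None:
        return True  # never embedded before
    if existing.model_version != current_model:
        return True  # model upgraded: the refresh path re-embeds
    # Same model: re-embed only if the source content changed.
    return existing.content_hash != content_hash(text)
```

With this shape, an unchanged record under an unchanged model is skipped, which is the cost-saving behaviour the bullets describe.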

Chunking

Large artifacts are split into overlapping VectorChunks so retrieval is precise and stays within token limits:

| Content type | Worker | Strategy |
|---|---|---|
| Artifacts (blueprints, models, configs) | ArtifactChunkingWorker | Structure-aware splitting with overlap |
| Markdown / docs | MarkdownChunkingWorker | Heading-aware sections, maxTokens per chunk |
| Code | CodeEmbeddingWorker (via CodeSymbolExtractionWorker) | Symbol-level chunks (type/method granularity) |

Each chunk records ordinal, span, and tokenCount; the parent VectorDocument records chunkCount and contentHash.
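
As a rough illustration of overlapping chunks that carry `ordinal`, `span`, and `tokenCount`, the sketch below splits text into fixed-size windows. It is deliberately simpler than the structure-aware and heading-aware strategies in the table: whitespace splitting stands in for a real tokenizer, and the `chunk_with_overlap` function and its default sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VectorChunk:
    ordinal: int            # position of the chunk within its parent document
    span: Tuple[int, int]   # [start, end) token offsets in the source
    token_count: int
    text: str


def chunk_with_overlap(text: str, max_tokens: int = 200, overlap: int = 20) -> List[VectorChunk]:
    """Split text into windows of at most max_tokens that share `overlap` tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("require max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    step = max_tokens - overlap
    chunks: List[VectorChunk] = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(
            VectorChunk(len(chunks), (start, end), end - start, " ".join(tokens[start:end]))
        )
        if end == len(tokens):
            break  # the final window reached the end of the document
        start += step
    return chunks
```

For a ten-token input with `max_tokens=4` and `overlap=1`, the spans come out as (0, 4), (3, 7), and (6, 10), so each chunk repeats the last token of its predecessor.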

Qdrant Collections

Vectors are stored in Qdrant, the primary vector store. Collections are partitioned for isolation and relevance:

| Collection (per tenant) | Holds | Payload filters |
|---|---|---|
| kn-artifacts-{tenantId} | Blueprints, models, contracts, configs | projectId, artifactType, classification |
| kn-docs-{tenantId} | Documentation, design docs | projectId, sourceRef, classification |
| kn-code-{tenantId} | Code symbols | projectId, repository, symbolKind |
| kn-patterns-{tenantId} | Knowledge patterns | category, classification |
| kn-decisions-{tenantId} | Decision records | projectId, status |

Each collection uses an HNSW index for approximate nearest-neighbour (ANN) search, with payload indexes on tenantId, projectId, kind, and classification, so that filtered ANN search remains tenant-isolated and governance-aware.
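
To make the naming scheme and payload filtering concrete, here is a sketch that derives a per-tenant collection name and builds a filtered search request body as a plain dictionary. The body follows the general shape of Qdrant's JSON filter syntax (`must` conditions with `key` and `match`), but it should be checked against the Qdrant version in use. The helper names, and the choice to add a `tenantId` condition on top of the per-tenant collection, are assumptions for illustration.

```python
from typing import Any, Dict, List

# Collection kinds taken from the table above.
COLLECTION_KINDS = {"artifacts", "docs", "code", "patterns", "decisions"}


def collection_name(kind: str, tenant_id: str) -> str:
    """Build the per-tenant collection name, e.g. kn-docs-{tenantId}."""
    if kind not in COLLECTION_KINDS:
        raise ValueError(f"unknown collection kind: {kind}")
    return f"kn-{kind}-{tenant_id}"


def build_search_body(
    vector: List[float],
    tenant_id: str,
    filters: Dict[str, str],
    limit: int = 10,
) -> Dict[str, Any]:
    """Assemble a filtered ANN search body (hypothetical helper)."""
    # Assumption: tenantId is always filtered, even inside a per-tenant collection.
    must = [{"key": "tenantId", "match": {"value": tenant_id}}]
    # Sorted keys keep the generated body deterministic.
    must += [{"key": k, "match": {"value": v}} for k, v in sorted(filters.items())]
    return {
        "vector": vector,
        "filter": {"must": must},
        "limit": limit,
        "with_payload": True,
    }
```

Every condition in `must` has to hold, so a query scoped to one project cannot return points from another project or another tenant.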

Managed alternative — Azure AI Search

VectorMemoryService abstracts the backend behind a common port. For tenants or regions that require a fully managed service, Azure AI Search (vector + semantic) is the supported alternative; the rest of the platform — chunking, jobs, fusion, governance — is unchanged. Qdrant remains the default primary store.

Hybrid Search

POST /knowledge/search/hybrid fuses dense-vector results with structured-metadata results, then re-ranks:

  1. Dense retrieval — ANN over the relevant Qdrant collection(s) for the query embedding.
  2. Structured retrieval — MetadataIndexService returns exact matches for the hard filters.
  3. Fusion — results are combined with Reciprocal Rank Fusion (default) or weighted score fusion.
  4. Re-rank — blended scoring adds recency, quality score, and reuse signals (mirroring the Context Builder ranking).
  5. Govern — every candidate is access-checked; disallowed sources are dropped or redacted.
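
The fusion step can be illustrated with a minimal Reciprocal Rank Fusion function. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant `k = 60` is the value commonly used for RRF and is an assumption here, since this page does not state the platform's setting; the function name is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists; higher fused score means more relevant."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]   # hypothetical ANN result order
metadata_hits = ["b", "d"]     # hypothetical exact-match result order
fused = reciprocal_rank_fusion([dense_hits, metadata_hits])
```

In this example `b` ranks first because it appears in both lists, which is the property that makes RRF useful when dense and structured scores are not directly comparable.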

Pure semantic (search/semantic) and pure metadata (search/metadata) are available when only one mode is needed. See APIs.

EmbeddingJob Lifecycle

stateDiagram-v2
    [*] --> Queued
    Queued --> Running: EmbeddingWorker claims
    Running --> Completed: vectors upserted
    Running --> Failed: error
    Failed --> Retrying: EmbeddingRetryWorker (transient)
    Retrying --> Running
    Failed --> DeadLettered: max attempts exceeded
    Completed --> Refreshing: model upgrade
    Refreshing --> Completed: re-embedded
    Completed --> [*]
    DeadLettered --> [*]

| State | Owner | Notes |
|---|---|---|
| Queued | EmbeddingService | Created from ChunksReady |
| Running | EmbeddingWorker | Embeds chunks, upserts to Qdrant |
| Completed | EmbeddingWorker | Emits EmbeddingCompleted; VectorDocument live |
| Failed / Retrying | EmbeddingRetryWorker | Transient failures retried with capped backoff |
| DeadLettered | | Preserved for replay after fix |
| Refreshing | EmbeddingRefreshWorker | Re-embeds on model upgrade; emits EmbeddingRefreshed |
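
The lifecycle can be expressed as a transition table plus a retry rule. The sketch below mirrors the states and edges of the diagram; the backoff base and cap, the helper names, and the decision to dead-letter non-transient failures immediately are assumptions, since this page only says that transient failures are retried with capped backoff until max attempts are exceeded.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    RETRYING = "Retrying"
    DEAD_LETTERED = "DeadLettered"
    REFRESHING = "Refreshing"


# Edges copied from the state diagram.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.FAILED: {JobState.RETRYING, JobState.DEAD_LETTERED},
    JobState.RETRYING: {JobState.RUNNING},
    JobState.COMPLETED: {JobState.REFRESHING},
    JobState.REFRESHING: {JobState.COMPLETED},
    JobState.DEAD_LETTERED: set(),
}


def transition(current: JobState, target: JobState) -> JobState:
    """Return the target state, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def after_failure(attempts: int, max_attempts: int, transient: bool) -> JobState:
    """Pick the next state for a Failed job (assumed policy)."""
    if transient and attempts < max_attempts:
        return JobState.RETRYING
    return JobState.DEAD_LETTERED


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff capped at `cap` seconds (hypothetical values)."""
    return min(cap, base * 2 ** (attempt - 1))
```

Keeping the edges in one table makes illegal moves, such as Queued to Completed, fail loudly instead of silently corrupting job state.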

Quality, Freshness, and Governance

  • Quality — KnowledgeQualityService scores records; low-quality content is down-ranked in retrieval.
  • Freshness — StaleMemoryWorker flags superseded content so the builder prefers current versions.
  • Governance — classification is a payload field on every vector; MemoryPolicyService enforces access at retrieval time, and MemoryRedactionService redacts Confidential matches. Secret content is never embedded.
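
A simplified view of how these three signals might combine at retrieval time is sketched below. Everything beyond the Confidential label is an assumption: the multiplicative scoring, the `stale_penalty` value, the `[REDACTED]` placeholder, and the `Candidate` fields are illustrative and do not describe the behaviour of KnowledgeQualityService, StaleMemoryWorker, MemoryPolicyService, or MemoryRedactionService.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    record_id: str
    score: float          # relevance score from fusion
    classification: str   # payload field on every vector
    quality: float        # assumed 0..1 quality score
    stale: bool           # assumed flag for superseded content
    text: str


def apply_retrieval_policies(
    candidates: Iterable[Candidate],
    allowed: Set[str],
    stale_penalty: float = 0.5,
) -> List[Candidate]:
    """Drop disallowed candidates, redact Confidential ones, and re-score the rest."""
    kept: List[Candidate] = []
    for c in candidates:
        if c.classification not in allowed:
            continue  # access denied: the candidate is dropped
        # Down-rank low-quality and stale content (assumed multiplicative blend).
        adjusted = c.score * c.quality * (stale_penalty if c.stale else 1.0)
        text = "[REDACTED]" if c.classification == "Confidential" else c.text
        kept.append(replace(c, score=adjusted, text=text))
    return sorted(kept, key=lambda c: (-c.score, c.record_id))
```

Secret content needs no branch here because, as stated above, it is never embedded and so never reaches the candidate list.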

Relationship to Existing Implementation

Implemented

The indexing and embedding design builds on the existing Knowledge & Memory Indices and Knowledge & Memory System documentation.