Storage

Target Architecture — Final-State Design

This page describes the final-state polyglot storage of the Knowledge Platform: which data lives where, which service owns it, how it is accessed, and how long it is retained.

The Knowledge Platform deliberately uses the right store for each shape of knowledge. Structured metadata and the graph live in a relational engine; high-dimensional embeddings live in a vector database; large artifact bodies live in object storage; durable, reviewable memory lives in source control; hot context lives in an in-memory cache; and runtime telemetry lives in the observability backend. Each store is owned by specific services and isolated by tenantId.

Storage Map

```mermaid
flowchart TB
    subgraph Services["Knowledge Platform Services"]
        KG["KnowledgeGraphService"]
        MI["MetadataIndexService"]
        Ingest["KnowledgeIngestionService"]
        VM["VectorMemoryService"]
        Embed["EmbeddingService"]
        Art["ArtifactMemoryService"]
        Code["CodebaseKnowledgeService"]
        Prompt["PromptMemoryService"]
        Decision["DecisionMemoryService"]
        Pattern["PatternCatalogService"]
        Runtime["RuntimeMemoryService"]
        CB["ContextBuilderService"]
        Gov["Governance Services"]
    end

    SQL[("PostgreSQL / Azure SQL<br/>metadata + graph")]
    Qdrant[("Qdrant<br/>vectors")]
    Blob[("Azure Blob<br/>artifact bodies")]
    Git[("Azure DevOps Git<br/>source-controlled memory")]
    Redis[("Redis<br/>hot context packages")]
    OTEL[("App Insights / OTEL<br/>runtime telemetry")]

    KG --> SQL
    MI --> SQL
    Ingest --> SQL
    Gov --> SQL
    VM --> Qdrant
    Embed --> Qdrant
    Art --> Blob
    Code --> SQL
    Code --> Qdrant
    Prompt --> Git
    Decision --> Git
    Pattern --> Git
    Runtime --> OTEL
    CB --> Redis
    CB --> SQL
```

Data-to-Store Matrix

| Data | Store | Owner Service | Access Pattern | Retention | Notes |
| --- | --- | --- | --- | --- | --- |
| Memory records, metadata index | PostgreSQL / Azure SQL | KnowledgeIngestionService, MetadataIndexService | High-read structured queries, dedup by contentHash | Project lifetime + 24 months | NHibernate via ConnectSoft.Extensions.PersistenceModel.NHibernate |
| Knowledge graph (nodes, edges, projections) | PostgreSQL / Azure SQL | KnowledgeGraphService | Traversal & neighbourhood queries; projection caching | Project lifetime | Graph projections cached in Redis |
| Embeddings & chunks | Qdrant (Azure AI Search alt.) | VectorMemoryService, EmbeddingService | ANN + filtered hybrid search | Tied to source memory record | Collection per (tenantId, kind); HNSW index |
| Artifact bodies, versions, snapshots | Azure Blob | ArtifactMemoryService | Write-once, read-many; content-addressed | Immutable; cold-tier after 12 months | Path namespaced by tenantId/projectId |
| Code repositories & symbols | PostgreSQL / Azure SQL + Qdrant | CodebaseKnowledgeService, LibraryKnowledgeService | Symbol lookup, dependency traversal, code search | Per repository index lifecycle | Embeddings in Qdrant code collection |
| Prompt templates & versions | Azure DevOps Git | PromptMemoryService | Version-controlled read/commit | Full history (Git) | Reviewable, diffable memory |
| Decision records & alternatives | Azure DevOps Git | DecisionMemoryService | Version-controlled read/commit | Full history (Git) | ADRs as source-controlled memory |
| Knowledge patterns & versions | Azure DevOps Git | PatternCatalogService, TemplateKnowledgeService | Version-controlled read/commit | Full history (Git) | Promoted reusable solutions |
| Prompt runs | PostgreSQL / Azure SQL | PromptMemoryService | Append + analytics queries | 12 months | Links runs to prompt versions + tasks |
| Runtime signals, incidents, feedback | App Insights / OTEL + SQL index | RuntimeMemoryService | Time-series ingest; correlation queries | Raw 30–90 days; aggregates longer | Index in SQL for graph linkage |
| Hot context packages | Redis | ContextBuilderService | Key lookup by contextPackageId | ttlSeconds (default 1800s) | Durable record also kept in SQL for 90 days |
| Governance (policies, decisions, audits, classifications, quality) | PostgreSQL / Azure SQL | Governance services | Policy evaluation; append-only audit | Audits at compliance retention (≈7 yrs) | Audit tables append-only |

Store-by-Store Detail

PostgreSQL / Azure SQL — metadata + graph

The relational system of record for structured metadata, the knowledge graph, governance, and indexes. Accessed through NHibernate mappings in ConnectSoft.Extensions.PersistenceModel.NHibernate. Tenant-first composite indexes back every query; the graph uses adjacency tables (knowledge_node, knowledge_edge) with traversal indexes.
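
To make the adjacency-table design concrete, here is a minimal sketch of a tenant-scoped neighbourhood query. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL / Azure SQL. The table names knowledge_node and knowledge_edge come from this page, but every column name, the index name, and the helper functions are assumptions made for illustration, not the platform's actual schema or NHibernate mappings.

```python
import sqlite3


def open_graph_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical adjacency tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE knowledge_node (
            tenant_id TEXT NOT NULL,
            node_id   TEXT NOT NULL,
            kind      TEXT NOT NULL,
            PRIMARY KEY (tenant_id, node_id)
        );
        CREATE TABLE knowledge_edge (
            tenant_id TEXT NOT NULL,
            source_id TEXT NOT NULL,
            target_id TEXT NOT NULL,
            relation  TEXT NOT NULL,
            PRIMARY KEY (tenant_id, source_id, target_id, relation)
        );
        -- Tenant-first composite index to support reverse traversal.
        CREATE INDEX ix_edge_tenant_target
            ON knowledge_edge (tenant_id, target_id);
        """
    )
    return conn


def neighbourhood(conn: sqlite3.Connection, tenant_id: str,
                  start_id: str, max_depth: int) -> list:
    """Return (node_id, depth) pairs reachable from start_id within max_depth hops.

    Only edges belonging to tenant_id are followed, so a traversal
    never crosses into another tenant's graph.
    """
    return conn.execute(
        """
        WITH RECURSIVE reach(node_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.target_id, r.depth + 1
            FROM knowledge_edge AS e
            JOIN reach AS r ON e.source_id = r.node_id
            WHERE e.tenant_id = ? AND r.depth < ?
        )
        SELECT node_id, MIN(depth)
        FROM reach
        GROUP BY node_id
        ORDER BY MIN(depth), node_id
        """,
        (start_id, tenant_id, max_depth),
    ).fetchall()
```

The depth bound keeps the recursion finite even when the graph contains cycles, and placing tenant_id first in the keys means each traversal step can be served by an index prefix scan.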

Qdrant — vectors

The primary vector store. Collections are partitioned per (tenantId, kind) with payload filters (projectId, classification) so filtered ANN search stays tenant-isolated. Azure AI Search is the supported managed alternative for tenants/regions preferring a fully managed vector service; the VectorMemoryService abstracts the backend behind a common port so the rest of the platform is unaffected.
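
The partitioning and port ideas above can be sketched as follows. This is an illustration under stated assumptions, not the VectorMemoryService implementation: the collection naming scheme, the VectorBackend protocol, and the InMemoryBackend class are all hypothetical. The in-memory backend does exact dot-product scoring rather than ANN. The filter dictionary follows the must / key / match shape that Qdrant's filtering API accepts.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


def collection_name(tenant_id: str, kind: str) -> str:
    """Hypothetical naming scheme: one collection per (tenantId, kind)."""
    return f"{tenant_id}__{kind}"


def payload_filter(project_id: str,
                   classification: Optional[str] = None) -> dict:
    """Build a filter in the must/key/match shape used by Qdrant."""
    must = [{"key": "projectId", "match": {"value": project_id}}]
    if classification is not None:
        must.append({"key": "classification",
                     "match": {"value": classification}})
    return {"must": must}


class VectorBackend(Protocol):
    """Hypothetical port; a Qdrant or Azure AI Search adapter would implement it."""

    def search(self, collection: str, vector: list, flt: dict,
               limit: int) -> list: ...


@dataclass
class InMemoryBackend:
    """Exact (non-ANN) stand-in backend, for illustration and tests only."""

    points: dict = field(default_factory=dict)

    def upsert(self, collection: str, point_id: str, vector: list,
               payload: dict) -> None:
        self.points.setdefault(collection, []).append(
            (point_id, vector, payload))

    def search(self, collection: str, vector: list, flt: dict,
               limit: int) -> list:
        def matches(payload: dict) -> bool:
            return all(payload.get(c["key"]) == c["match"]["value"]
                       for c in flt["must"])

        def score(candidate: list) -> float:
            return sum(a * b for a, b in zip(candidate, vector))

        hits = [(score(vec), pid)
                for pid, vec, payload in self.points.get(collection, [])
                if matches(payload)]
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return [pid for _, pid in hits[:limit]]


def search_memory(backend: VectorBackend, tenant_id: str, kind: str,
                  project_id: str, vector: list, limit: int = 5) -> list:
    """Callers depend only on the port, never on a concrete backend."""
    return backend.search(collection_name(tenant_id, kind), vector,
                          payload_filter(project_id), limit)
```

Because the tenant is part of the collection name, a search cannot return another tenant's points even if a payload filter is misconfigured; the payload filter then narrows results within the tenant.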

Azure Blob — artifact bodies

Object storage for full artifact bodies and snapshots. Writes are content-addressed (contentHash) and immutable; superseded versions are retained for lineage and cold-tiered. storageRef URIs (blob://artifacts/{tenant}/{project}/{artifactId}/{version}) are stored in SQL so metadata queries never touch blobs.
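
A minimal sketch of the write-once, content-addressed behaviour described above. The storageRef layout is taken from this page; the use of SHA-256 for contentHash and the WriteOnceStore class are assumptions for illustration, with a plain dictionary standing in for Azure Blob.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Assumed hashing scheme: hex-encoded SHA-256 of the artifact body."""
    return hashlib.sha256(body).hexdigest()


def storage_ref(tenant: str, project: str, artifact_id: str,
                version: str) -> str:
    """Build a storageRef URI in the layout given on this page."""
    return f"blob://artifacts/{tenant}/{project}/{artifact_id}/{version}"


class WriteOnceStore:
    """Hypothetical in-memory stand-in for an immutable blob container."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, ref: str, body: bytes) -> str:
        """Store body under ref and return its contentHash.

        Re-writing identical content is a no-op; writing different
        content to an existing ref is rejected, which keeps every
        version immutable.
        """
        digest = content_hash(body)
        existing = self._blobs.get(ref)
        if existing is not None:
            if content_hash(existing) != digest:
                raise ValueError(f"{ref} is immutable")
            return digest
        self._blobs[ref] = body
        return digest

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]
```

In this sketch a new revision of an artifact gets a new version segment in its storageRef instead of overwriting the old blob, so superseded versions stay available for lineage.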

Azure DevOps Git — source-controlled memory

Durable, reviewable memory — prompt templates, decision records, and knowledge patterns — lives in Git so that memory itself is versioned, diffable, and subject to pull-request review. This is what makes the factory's most important knowledge auditable and human-governable.

Redis — hot context packages

Low-latency cache for built Context Packages keyed by contextPackageId, honouring ttlSeconds. Also caches graph projections. A cache miss transparently rebuilds from the durable SQL record / source stores.
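
The cache-miss behaviour can be sketched as a cache-aside lookup. This is an illustration only: TtlCache is a hypothetical in-memory stand-in for Redis, and the rebuild callback represents whatever logic reloads a Context Package from the durable SQL record or source stores. The 1800-second default mirrors the ttlSeconds default listed in the matrix.

```python
import time
from typing import Any, Callable, Optional


class TtlCache:
    """Hypothetical in-memory stand-in for Redis key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value


def get_context_package(cache: TtlCache, context_package_id: str,
                        rebuild: Callable[[str], Any],
                        ttl_seconds: float = 1800) -> Any:
    """Return the cached package, rebuilding and re-caching on a miss."""
    cached = cache.get(context_package_id)
    if cached is not None:
        return cached
    package = rebuild(context_package_id)
    cache.set(context_package_id, package, ttl_seconds)
    return package
```

Callers see the same result whether the package came from the cache or was rebuilt; only latency differs.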

App Insights / OTEL — runtime telemetry

Runtime signals from the Observability & Feedback Platform land in the OTEL/Application Insights backend; RuntimeMemoryService indexes their identity and linkage in SQL so signals join the knowledge graph and close the improvement loop.

Cross-Cutting Storage Rules

  • Single writer: only the owning service writes a given store; everyone else reads via API/events.
  • Tenant isolation: tenantId namespaces every table partition, Qdrant collection, and Blob/Git path.
  • Content-addressable dedup: contentHash prevents redundant storage and re-embedding.
  • Backup & DR: relational and blob stores use geo-redundant backups; Git is inherently replicated; Qdrant collections are snapshotted on a schedule.
  • Observability: all data access is traced (traceId) and metered through ConnectSoft.Extensions.Observability.
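
As an illustration of the dedup rule above, the sketch below skips re-embedding when the same content has already been seen for a tenant. The EmbeddingDeduper class, the SHA-256 hashing, and the embed callback are assumptions for illustration; in the platform this bookkeeping would live in the durable stores, not in process memory.

```python
import hashlib
from typing import Callable


class EmbeddingDeduper:
    """Hypothetical helper: embed each (tenantId, contentHash) pair once."""

    def __init__(self, embed: Callable[[str], list]) -> None:
        self._embed = embed
        self._seen = {}

    def embed_once(self, tenant_id: str, text: str) -> list:
        # Key on tenant first so identical content in two tenants
        # is never shared across the isolation boundary.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (tenant_id, digest)
        if key not in self._seen:
            self._seen[key] = self._embed(text)
        return self._seen[key]
```

Keying on the tenant as well as the hash trades some storage savings for strict isolation: the same document ingested by two tenants is embedded twice.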