Storage
Target Architecture — Final-State Design
This page describes the final-state polyglot storage of the Knowledge Platform: which data lives where, which service owns it, how it is accessed, and how long it is retained.
The Knowledge Platform deliberately uses the right store for each shape of knowledge. Structured metadata and the graph live in a relational engine; high-dimensional embeddings live in a vector database; large artifact bodies live in object storage; durable, reviewable memory lives in source control; hot context lives in an in-memory cache; and runtime telemetry lives in the observability backend. Each store is owned by specific services and isolated by tenantId.
Storage Map
```mermaid
flowchart TB
    subgraph Services["Knowledge Platform Services"]
        KG["KnowledgeGraphService"]
        MI["MetadataIndexService"]
        Ingest["KnowledgeIngestionService"]
        VM["VectorMemoryService"]
        Embed["EmbeddingService"]
        Art["ArtifactMemoryService"]
        Code["CodebaseKnowledgeService"]
        Prompt["PromptMemoryService"]
        Decision["DecisionMemoryService"]
        Pattern["PatternCatalogService"]
        Runtime["RuntimeMemoryService"]
        CB["ContextBuilderService"]
        Gov["Governance Services"]
    end
    SQL[("PostgreSQL / Azure SQL<br/>metadata + graph")]
    Qdrant[("Qdrant<br/>vectors")]
    Blob[("Azure Blob<br/>artifact bodies")]
    Git[("Azure DevOps Git<br/>source-controlled memory")]
    Redis[("Redis<br/>hot context packages")]
    OTEL[("App Insights / OTEL<br/>runtime telemetry")]
    KG --> SQL
    MI --> SQL
    Ingest --> SQL
    Gov --> SQL
    VM --> Qdrant
    Embed --> Qdrant
    Art --> Blob
    Code --> SQL
    Code --> Qdrant
    Prompt --> Git
    Prompt --> SQL
    Decision --> Git
    Pattern --> Git
    Runtime --> OTEL
    Runtime --> SQL
    CB --> Redis
    CB --> SQL
```
Data-to-Store Matrix
| Data | Store | Owner Service | Access Pattern | Retention | Notes |
|---|---|---|---|---|---|
| Memory records, metadata index | PostgreSQL / Azure SQL | KnowledgeIngestionService, MetadataIndexService | High-read structured queries; dedup by `contentHash` | Project lifetime + 24 months | NHibernate via `ConnectSoft.Extensions.PersistenceModel.NHibernate` |
| Knowledge graph (nodes, edges, projections) | PostgreSQL / Azure SQL | KnowledgeGraphService | Traversal & neighbourhood queries; projection caching | Project lifetime | Graph projections cached in Redis |
| Embeddings & chunks | Qdrant (Azure AI Search as managed alternative) | VectorMemoryService, EmbeddingService | ANN + filtered hybrid search | Tied to source memory record | Collection per (`tenantId`, kind); HNSW index |
| Artifact bodies, versions, snapshots | Azure Blob | ArtifactMemoryService | Write-once, read-many; content-addressed | Immutable; cold tier after 12 months | Path namespaced by `tenantId`/`projectId` |
| Code repositories & symbols | PostgreSQL / Azure SQL + Qdrant | CodebaseKnowledgeService, LibraryKnowledgeService | Symbol lookup, dependency traversal, code search | Per repository index lifecycle | Embeddings in Qdrant code collection |
| Prompt templates & versions | Azure DevOps Git | PromptMemoryService | Version-controlled read/commit | Full history (Git) | Reviewable, diffable memory |
| Decision records & alternatives | Azure DevOps Git | DecisionMemoryService | Version-controlled read/commit | Full history (Git) | ADRs as source-controlled memory |
| Knowledge patterns & versions | Azure DevOps Git | PatternCatalogService, TemplateKnowledgeService | Version-controlled read/commit | Full history (Git) | Promoted reusable solutions |
| Prompt runs | PostgreSQL / Azure SQL | PromptMemoryService | Append + analytics queries | 12 months | Links runs to prompt versions and tasks |
| Runtime signals, incidents, feedback | App Insights / OTEL + SQL index | RuntimeMemoryService | Time-series ingest; correlation queries | Raw 30–90 days; aggregates longer | Index in SQL for graph linkage |
| Hot context packages | Redis | ContextBuilderService | Key lookup by `contextPackageId` | `ttlSeconds` (default 1800 s) | Durable record also kept in SQL for 90 days |
| Governance (policies, decisions, audits, classifications, quality) | PostgreSQL / Azure SQL | Governance services | Policy evaluation; append-only audit | Audits at compliance retention (≈7 years) | Audit tables are append-only |
Store-by-Store Detail
PostgreSQL / Azure SQL — metadata + graph
The relational system of record for structured metadata, the knowledge graph, governance, and indexes. Accessed through NHibernate mappings in ConnectSoft.Extensions.PersistenceModel.NHibernate. Tenant-first composite indexes back every query; the graph uses adjacency tables (knowledge_node, knowledge_edge) with traversal indexes.
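To make the adjacency-table layout concrete, here is a minimal sketch of a tenant-scoped neighbourhood query. It assumes hypothetical column names (`tenant_id`, `node_id`, `src_id`, `dst_id`, `relation`) and uses Python's built-in `sqlite3` only so the example is self-contained; the platform itself uses PostgreSQL / Azure SQL through NHibernate. `WITH RECURSIVE` works in both SQLite and PostgreSQL, while Azure SQL (T-SQL) writes the same recursive CTE with a plain `WITH`.

```python
import sqlite3

# Hypothetical columns; only the table names come from the design above.
SCHEMA = """
CREATE TABLE knowledge_node (
    tenant_id TEXT NOT NULL,
    node_id   TEXT NOT NULL,
    kind      TEXT NOT NULL,
    PRIMARY KEY (tenant_id, node_id)
);
CREATE TABLE knowledge_edge (
    tenant_id TEXT NOT NULL,
    src_id    TEXT NOT NULL,
    dst_id    TEXT NOT NULL,
    relation  TEXT NOT NULL
);
-- Tenant-first composite index that serves outbound traversal.
CREATE INDEX ix_edge_tenant_src ON knowledge_edge (tenant_id, src_id);
"""

# Recursive CTE: expand outbound edges hop by hop, never leaving the tenant.
NEIGHBOURHOOD = """
WITH RECURSIVE reach(node_id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst_id, r.depth + 1
    FROM knowledge_edge AS e
    JOIN reach AS r ON e.src_id = r.node_id
    WHERE e.tenant_id = ? AND r.depth < ?
)
SELECT DISTINCT node_id FROM reach WHERE node_id <> ? ORDER BY node_id
"""


def neighbourhood(conn: sqlite3.Connection, tenant_id: str,
                  start_id: str, max_depth: int) -> list[str]:
    """Return node ids reachable from start_id within max_depth hops."""
    rows = conn.execute(NEIGHBOURHOOD, (start_id, tenant_id, max_depth, start_id))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO knowledge_edge VALUES (?, ?, ?, ?)",
    [
        ("t1", "adr-1", "svc-a", "affects"),
        ("t1", "svc-a", "lib-x", "depends_on"),
        ("t2", "adr-1", "svc-z", "affects"),  # another tenant's edge
    ],
)
```

Filtering every recursive step on `tenant_id` is what keeps the traversal inside a single tenant, and the `(tenant_id, src_id)` index is the tenant-first composite index that serves that predicate.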
Qdrant — vectors
The primary vector store. Collections are partitioned per (tenantId, kind) with payload filters (projectId, classification) so filtered ANN search stays tenant-isolated. Azure AI Search is the supported managed alternative for tenants/regions preferring a fully managed vector service; the VectorMemoryService abstracts the backend behind a common port so the rest of the platform is unaffected.
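The sketch below illustrates the port idea together with per-(`tenantId`, kind) collections and payload filtering. `VectorPort`, `InMemoryVectorStore`, `collection_name`, and the `{tenant}__{kind}` naming scheme are hypothetical. The in-memory store performs exact brute-force cosine search as a stand-in for Qdrant's HNSW-based ANN. `to_qdrant_filter` shows how equality filters map onto Qdrant's JSON filter shape (`must` conditions with `match.value`).

```python
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Point:
    point_id: str
    vector: list[float]
    payload: dict[str, str]  # e.g. projectId, classification


class VectorPort(Protocol):
    """Hypothetical backend-neutral port; Qdrant or Azure AI Search adapters would implement it."""

    def upsert(self, tenant_id: str, kind: str, point: Point) -> None: ...

    def search(self, tenant_id: str, kind: str, query: list[float],
               filters: dict[str, str], limit: int) -> list[str]: ...


def collection_name(tenant_id: str, kind: str) -> str:
    # One collection per (tenantId, kind); the naming scheme is an assumption.
    return f"{tenant_id}__{kind}"


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Exact search standing in for an ANN index, for illustration only."""

    def __init__(self) -> None:
        self._collections: dict[str, list[Point]] = {}

    def upsert(self, tenant_id: str, kind: str, point: Point) -> None:
        points = self._collections.setdefault(collection_name(tenant_id, kind), [])
        points[:] = [p for p in points if p.point_id != point.point_id]
        points.append(point)

    def search(self, tenant_id: str, kind: str, query: list[float],
               filters: dict[str, str], limit: int) -> list[str]:
        # Only the caller's (tenant, kind) collection is ever consulted.
        points = self._collections.get(collection_name(tenant_id, kind), [])
        # Apply payload equality filters first, then rank by similarity.
        matches = [p for p in points
                   if all(p.payload.get(k) == v for k, v in filters.items())]
        matches.sort(key=lambda p: cosine(query, p.vector), reverse=True)
        return [p.point_id for p in matches[:limit]]


def to_qdrant_filter(filters: dict[str, str]) -> dict:
    """Translate equality filters into Qdrant's JSON filter structure."""
    return {"must": [{"key": k, "match": {"value": v}} for k, v in filters.items()]}


store: VectorPort = InMemoryVectorStore()
```

A Qdrant or Azure AI Search adapter would implement the same two methods, which is what lets callers stay unaware of the backend.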
Azure Blob — artifact bodies
Object storage for full artifact bodies and snapshots. Writes are content-addressed (contentHash) and immutable; superseded versions are retained for lineage and cold-tiered. storageRef URIs (blob://artifacts/{tenant}/{project}/{artifactId}/{version}) are stored in SQL so metadata queries never touch blobs.
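Here is a minimal sketch of write-once, content-addressed storage. It assumes SHA-256 as the `contentHash` algorithm, which the page does not name. `WriteOnceBlobStore` is a hypothetical in-memory stand-in for Azure Blob. It keeps the `storageRef` format shown above and stores identical bodies only once.

```python
import hashlib


def content_hash(body: bytes) -> str:
    # SHA-256 is an assumption made for illustration.
    return hashlib.sha256(body).hexdigest()


def storage_ref(tenant: str, project: str, artifact_id: str, version: str) -> str:
    # Mirrors blob://artifacts/{tenant}/{project}/{artifactId}/{version}.
    return f"blob://artifacts/{tenant}/{project}/{artifact_id}/{version}"


class WriteOnceBlobStore:
    """In-memory stand-in for Azure Blob: immutable refs, bodies deduplicated by hash."""

    def __init__(self) -> None:
        self._bodies: dict[str, bytes] = {}  # contentHash -> body
        self._refs: dict[str, str] = {}      # storageRef -> contentHash

    def put(self, tenant: str, project: str, artifact_id: str,
            version: str, body: bytes) -> tuple[str, str]:
        ref = storage_ref(tenant, project, artifact_id, version)
        digest = content_hash(body)
        existing = self._refs.get(ref)
        if existing is not None and existing != digest:
            # Versions are immutable; changed content needs a new version.
            raise ValueError(f"{ref} is immutable; publish a new version instead")
        self._bodies.setdefault(digest, body)  # identical bodies stored once
        self._refs[ref] = digest
        return ref, digest  # both would be recorded in SQL metadata

    def get(self, ref: str) -> bytes:
        return self._bodies[self._refs[ref]]
```

Because the `(storageRef, contentHash)` pair is what lands in SQL, metadata queries and dedup checks can run without reading any blob.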
Azure DevOps Git — source-controlled memory
Durable, reviewable memory — prompt templates, decision records, and knowledge patterns — lives in Git so that memory itself is versioned, diffable, and subject to pull-request review. This is what makes the factory's most important knowledge auditable and human-governable.
Redis — hot context packages
Low-latency cache for built Context Packages, keyed by contextPackageId and honouring ttlSeconds. Redis also caches graph projections. A cache miss (including an expired key) transparently rebuilds the package from its durable SQL record or from the source stores.
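The cache-aside behaviour can be sketched as follows. `ContextPackageCache` and its `rebuild` callback are hypothetical. A dict with explicit expiry times stands in for Redis, where the same effect would come from writing the key with an expiry (`SET key value EX <ttlSeconds>`). The injectable clock exists only to make expiry testable.

```python
import time
from typing import Callable

DEFAULT_TTL_SECONDS = 1800  # default ttlSeconds from the data-to-store matrix


class ContextPackageCache:
    """Dict-backed stand-in for Redis: cache-aside lookup with per-key expiry."""

    def __init__(self, rebuild: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rebuild = rebuild  # loads the durable SQL record / source stores
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}  # id -> (expires_at, package)

    def get(self, context_package_id: str,
            ttl_seconds: int = DEFAULT_TTL_SECONDS) -> dict:
        now = self._clock()
        entry = self._entries.get(context_package_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: key present and not yet expired
        # Miss or expired: rebuild transparently and cache with a fresh TTL.
        package = self._rebuild(context_package_id)
        self._entries[context_package_id] = (now + ttl_seconds, package)
        return package
```

Callers never see the difference between a hit and a rebuild; only latency changes.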
App Insights / OTEL — runtime telemetry
Runtime signals from the Observability & Feedback Platform land in the OTEL/Application Insights backend; RuntimeMemoryService indexes their identity and linkage in SQL so signals join the knowledge graph and close the improvement loop.
Cross-Cutting Storage Rules
- Single writer: only the owning service writes a given store; all other services read through its API or events.
- Tenant isolation: `tenantId` namespaces every table partition, Qdrant collection, and Blob/Git path.
- Content-addressable dedup: `contentHash` prevents redundant storage and re-embedding.
- Backup & DR: relational and blob stores use geo-redundant backups; Git is replicated by design, since every clone carries the full history; Qdrant collections are snapshotted on a schedule.
- Observability: all data access is traced (`traceId`) and metered through `ConnectSoft.Extensions.Observability`.
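The single-writer and tenant-isolation rules lend themselves to simple guards. This sketch uses hypothetical data-kind keys and transcribes only part of the data-to-store matrix. Where the matrix names more than one owner (embeddings), the owner set admits both services.

```python
# Partial ownership map transcribed from the data-to-store matrix.
# The data-kind keys are hypothetical labels, not platform identifiers.
OWNERS: dict[str, set[str]] = {
    "artifact_body": {"ArtifactMemoryService"},
    "prompt_template": {"PromptMemoryService"},
    "decision_record": {"DecisionMemoryService"},
    "context_package": {"ContextBuilderService"},
    "embedding": {"VectorMemoryService", "EmbeddingService"},
}


class SingleWriterViolation(PermissionError):
    """Raised when a non-owning service attempts a write."""


def authorize_write(service: str, data_kind: str) -> None:
    owners = OWNERS.get(data_kind, set())
    if service not in owners:
        raise SingleWriterViolation(
            f"{service} may not write {data_kind}; owners: {sorted(owners)}")


def namespaced(tenant_id: str, *parts: str) -> str:
    """Prefix a key or path with tenantId so tenants never share a namespace."""
    if not tenant_id or "/" in tenant_id:
        raise ValueError("tenantId must be a single, non-empty path segment")
    return "/".join((tenant_id, *parts))
```

Rejecting a `tenantId` containing `/` stops one tenant's identifier from forging a path inside another tenant's namespace.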