Data Model
Target Architecture — Final-State Design
This page defines the final-state logical data model of the Knowledge Platform. Physical layout follows the polyglot persistence described in Storage; this page focuses on the logical entities, keys, indexes, retention, and sensitivity rules.
The platform uses polyglot persistence: relational metadata and the graph in PostgreSQL / Azure SQL (via NHibernate), vectors in Qdrant, artifact bodies in Azure Blob, source-controlled memory in Azure DevOps Git, hot context packages in Redis, and runtime telemetry in the App Insights / OTEL backend. Every logical entity is anchored by tenantId and the cross-cutting metadata fields.
Core Graph & Memory Entities (erDiagram)
erDiagram
MemoryRecord ||--o{ VectorDocument : "embedded as"
MemoryRecord ||--o| ArtifactMemory : "body stored as"
MemoryRecord ||--o{ KnowledgeNode : "projected as"
KnowledgeNode ||--o{ KnowledgeEdge : "from"
KnowledgeNode ||--o{ KnowledgeEdge : "to"
VectorDocument ||--o{ VectorChunk : "contains"
MemoryRecord {
string memoryRecordId PK
string tenantId
string projectId
string kind
string artifactId
string contentHash
string classification
float qualityScore
string traceId
datetime createdAt
string status
}
KnowledgeNode {
string nodeId PK
string tenantId
string nodeType
string ref
string title
json properties
}
KnowledgeEdge {
string edgeId PK
string tenantId
string fromNodeId FK
string toNodeId FK
string edgeType
float weight
}
VectorDocument {
string vectorDocumentId PK
string memoryRecordId FK
string tenantId
string collection
string modelVersion
int chunkCount
string contentHash
}
VectorChunk {
string vectorChunkId PK
string vectorDocumentId FK
int ordinal
int tokenCount
}
ArtifactMemory {
string artifactMemoryId PK
string artifactId
string tenantId
string artifactType
string currentVersion
string classification
string storageRef
}
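The core entities in the diagram can be mirrored as plain value types. The following is a minimal sketch, assuming Python dataclasses and snake_case field names; the class layout is illustrative and not the platform's actual NHibernate mapping.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class MemoryRecord:
    # Fields follow the MemoryRecord entity in the diagram.
    memory_record_id: str
    tenant_id: str
    project_id: str
    kind: str
    artifact_id: str
    content_hash: str
    classification: str
    quality_score: float
    trace_id: str
    created_at: datetime
    status: str


@dataclass(frozen=True)
class KnowledgeNode:
    node_id: str
    tenant_id: str
    node_type: str
    ref: str
    title: str
    # Assumption: an empty property bag is an acceptable default.
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class KnowledgeEdge:
    edge_id: str
    tenant_id: str
    from_node_id: str  # FK to KnowledgeNode.node_id
    to_node_id: str    # FK to KnowledgeNode.node_id
    edge_type: str
    weight: float


# Hypothetical sample values, for illustration only.
record = MemoryRecord(
    memory_record_id="m-1", tenant_id="t-1", project_id="p-1", kind="decision",
    artifact_id="a-1", content_hash="abc123", classification="Internal",
    quality_score=0.8, trace_id="tr-1",
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc), status="active",
)
node = KnowledgeNode("n-1", record.tenant_id, "Decision", record.memory_record_id, "Sample")
edge = KnowledgeEdge("e-1", record.tenant_id, node.node_id, node.node_id, "relatesTo", 1.0)
```

Every type carries `tenant_id`, which matches the rule that each logical entity is anchored by `tenantId`.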
Logical Tables and Containers
| Logical entity | Store | Container / table | Primary key | Partitioning |
|---|---|---|---|---|
| `MemoryRecord` | PostgreSQL / Azure SQL | `memory_record` | `memoryRecordId` | by `tenantId` |
| `KnowledgeNode` | PostgreSQL / Azure SQL | `knowledge_node` | `nodeId` | by `tenantId` |
| `KnowledgeEdge` | PostgreSQL / Azure SQL | `knowledge_edge` | `edgeId` | by `tenantId` |
| `GraphProjection` | PostgreSQL / Azure SQL (+Redis) | `graph_projection` | `graphProjectionId` | by `tenantId` |
| `VectorDocument` / `VectorChunk` | Qdrant | collection per `(tenantId, kind)` | `vectorDocumentId` / point id | by collection |
| `EmbeddingJob` | PostgreSQL / Azure SQL | `embedding_job` | `embeddingJobId` | by `tenantId` |
| `ArtifactMemory` / `ArtifactVersion` / `ArtifactSnapshot` | Azure Blob (+SQL index) | `artifact_memory`, blob path | `artifactMemoryId` | by `tenantId`/`projectId` |
| `CodeRepository` / `CodeSymbol` / `CodeIndexJob` | PostgreSQL / Azure SQL (+Qdrant) | `code_repository`, `code_symbol`, `code_index_job` | respective ids | by `tenantId` |
| `PromptTemplate` / `PromptVersion` | Azure DevOps Git (+SQL index) | git path + `prompt_template` | `promptTemplateId` | by `tenantId` |
| `PromptRun` | PostgreSQL / Azure SQL | `prompt_run` | `promptRunId` | by `tenantId` |
| `DecisionRecord` / `DecisionAlternative` | Azure DevOps Git (+SQL index) | git path + `decision_record` | `decisionRecordId` | by `tenantId` |
| `RuntimeSignal` | OTEL backend (+SQL index) | `runtime_signal` | `runtimeSignalId` | by `tenantId`/`environment` |
| `IncidentMemory` / `FeedbackItem` | PostgreSQL / Azure SQL | `incident_memory`, `feedback_item` | respective ids | by `tenantId` |
| `KnowledgePattern` / `PatternVersion` | Azure DevOps Git (+SQL index) | git path + `knowledge_pattern` | `knowledgePatternId` | by `tenantId` |
| Governance aggregates | PostgreSQL / Azure SQL | `memory_access_policy`, `memory_access_decision`, `memory_access_audit`, `memory_classification`, `knowledge_quality_assessment`, `quality_rule` | respective ids | by `tenantId` |
| `ContextBuildRequest` / `ContextPackage` / `ContextSource` | Redis (hot) + SQL (durable) | `context:{id}` / `context_package` | `contextPackageId` | by `tenantId` |
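The container naming in the table can be captured in a few small helpers. This is a sketch under stated assumptions: only the `context:{id}` key shape comes from the table, while the `__` and `/` separators, the segment order, and the function names are hypothetical.

```python
def _segment(value: str) -> str:
    """Reject empty values and separator characters so names cannot collide."""
    if not value or any(ch in value for ch in "/:"):
        raise ValueError(f"invalid name segment: {value!r}")
    return value


def qdrant_collection(tenant_id: str, kind: str) -> str:
    # One collection per (tenantId, kind); the "__" separator is an assumption.
    return f"{_segment(tenant_id)}__{_segment(kind)}"


def blob_path(tenant_id: str, project_id: str, artifact_id: str, version: str) -> str:
    # Partitioned by tenantId/projectId; the trailing segments are an assumption.
    parts = (tenant_id, project_id, artifact_id, version)
    return "/".join(_segment(p) for p in parts)


def context_key(context_package_id: str) -> str:
    # Redis key shape "context:{id}" as listed in the table.
    return f"context:{_segment(context_package_id)}"


print(qdrant_collection("t-1", "code"))
print(blob_path("t-1", "p-1", "a-1", "v3"))
print(context_key("cp-42"))
```

Validating each segment keeps a tenant identifier from injecting a separator and reaching another tenant's namespace.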
Indexes
- Tenant-first composite indexes: every tenant-scoped query uses an index prefixed by `tenantId` (e.g. `(tenantId, projectId, kind)` on `memory_record`).
- Correlation indexes: `(tenantId, traceId)`, `(tenantId, correlationId)`, `(tenantId, artifactId)` enable lifecycle correlation queries.
- Graph traversal: `(tenantId, fromNodeId, edgeType)` and `(tenantId, toNodeId, edgeType)` on `knowledge_edge`; `(tenantId, nodeType, ref)` unique on `knowledge_node`.
- Dedup: unique `(tenantId, contentHash)` on `memory_record` powers ingestion deduplication.
- Vector: Qdrant HNSW index per collection with payload indexes on `tenantId`, `projectId`, `kind`, `classification` for filtered ANN search.
- Full-text / metadata: GIN/B-tree indexes on metadata columns used by `POST /knowledge/search/metadata`.
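The tenant-first and dedup indexes can be tried out against an in-memory database. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL / Azure SQL; the index names, the reduced column set, and the `ingest` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memory_record (
        memoryRecordId TEXT PRIMARY KEY,
        tenantId       TEXT NOT NULL,
        projectId      TEXT NOT NULL,
        kind           TEXT NOT NULL,
        contentHash    TEXT NOT NULL,
        traceId        TEXT
    );
    -- Tenant-first composite index for scoped queries.
    CREATE INDEX ix_memory_record_scope ON memory_record (tenantId, projectId, kind);
    -- Correlation index.
    CREATE INDEX ix_memory_record_trace ON memory_record (tenantId, traceId);
    -- Dedup: the same content may exist once per tenant.
    CREATE UNIQUE INDEX ux_memory_record_hash ON memory_record (tenantId, contentHash);
    """
)


def ingest(record_id: str, tenant_id: str, project_id: str, kind: str, content_hash: str) -> bool:
    """Insert a record; return False when a uniqueness constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO memory_record "
                "(memoryRecordId, tenantId, projectId, kind, contentHash) "
                "VALUES (?, ?, ?, ?, ?)",
                (record_id, tenant_id, project_id, kind, content_hash),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(ingest("m-1", "t-1", "p-1", "doc", "hash-a"))  # first copy is stored
print(ingest("m-2", "t-1", "p-1", "doc", "hash-a"))  # duplicate within the tenant
print(ingest("m-3", "t-2", "p-1", "doc", "hash-a"))  # same content, other tenant
```

Because the unique index leads with `tenantId`, identical content in two tenants is stored twice, which keeps deduplication from leaking information across tenants.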
Multi-Tenancy
- `tenantId` is a mandatory, non-null column/property on every entity and the leading key of every tenant-scoped index.
- Qdrant collections and Blob/Git paths are namespaced by `tenantId` so isolation is physical as well as logical.
- Every query handler asserts the request `tenantId` against the resolved scope before executing; cross-tenant access is impossible by construction.
- Optional dedicated-tier tenants can be assigned isolated Qdrant clusters and SQL databases without changing the logical model.
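The handler-side assertion described above might look like the following sketch. `ResolvedScope`, `TenantScopeError`, and `find_records` are hypothetical names, and the in-memory list stands in for a real store.

```python
from dataclasses import dataclass


class TenantScopeError(PermissionError):
    """Raised when a request targets a tenant outside the resolved scope."""


@dataclass(frozen=True)
class ResolvedScope:
    # Assumption: the scope is resolved from the caller's identity upstream.
    tenant_id: str


def assert_tenant(request_tenant_id: str, scope: ResolvedScope) -> None:
    if not request_tenant_id:
        raise TenantScopeError("tenantId is mandatory")
    if request_tenant_id != scope.tenant_id:
        raise TenantScopeError("request tenantId does not match the resolved scope")


def find_records(records: list[dict], request_tenant_id: str, scope: ResolvedScope) -> list[dict]:
    # The guard runs before any data access.
    assert_tenant(request_tenant_id, scope)
    # The filter uses the scope's tenant, never the raw request value.
    return [r for r in records if r["tenantId"] == scope.tenant_id]


store = [{"tenantId": "t-1", "id": 1}, {"tenantId": "t-2", "id": 2}]
print(find_records(store, "t-1", ResolvedScope("t-1")))
```

Filtering on the scope rather than on the request value means a missed guard would still not return another tenant's rows.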
Sensitivity & Classification
Every MemoryRecord carries a classification (Public / Internal / Confidential / Secret) assigned by MemoryClassificationService:
| Classification | Storage handling | Retrieval handling |
|---|---|---|
| Public | Stored normally; eligible for marketplace listings | Returned without redaction |
| Internal | Standard tenant isolation | Returned within tenant scope |
| Confidential | Stored normally; access via `MemoryAccessPolicy` | Redacted unless explicitly allowed |
| Secret | Never stored as artifact body; only Key Vault references | Excluded; reference only |
See Governance for enforcement.
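The retrieval column of the table can be read as a small decision function. This is an illustrative sketch only: the `Classification` enum, the `[REDACTED]` placeholder, and the `explicitly_allowed` flag are assumptions, and tenant scoping is taken to have been checked before this point.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    SECRET = "Secret"


REDACTED = "[REDACTED]"  # hypothetical placeholder text


def retrieval_body(
    classification: Classification,
    body: str,
    explicitly_allowed: bool = False,
) -> Optional[str]:
    """Return the body a caller may see, or None when only a reference is exposed."""
    if classification is Classification.SECRET:
        # Excluded from results; the caller surfaces a reference instead.
        return None
    if classification is Classification.CONFIDENTIAL and not explicitly_allowed:
        return REDACTED
    # Public and Internal are returned as-is within the tenant scope.
    return body


print(retrieval_body(Classification.CONFIDENTIAL, "salary bands"))
print(retrieval_body(Classification.CONFIDENTIAL, "salary bands", explicitly_allowed=True))
```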
Retention
| Data | Default retention | Lifecycle worker |
|---|---|---|
| `MemoryRecord` & vectors | Project lifetime + 24 months | `RetentionWorker` |
| Superseded `ArtifactVersion` bodies | Retained (immutable lineage); cold-tiered after 12 months | `RetentionWorker` |
| `ContextPackage` (Redis) | `ttlSeconds` (default 1800s); durable record kept 90 days | TTL + `RetentionWorker` |
| `RuntimeSignal` raw telemetry | 30–90 days (OTEL backend); aggregates retained longer | `RetentionWorker` |
| `MemoryAccessAudit` | Compliance retention (e.g. 7 years), append-only | Never auto-deleted before the retention period ends |
| Stale/superseded memory | Flagged then archived | `StaleMemoryWorker` |
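As a worked example of the `ContextPackage` row, the sketch below computes when the hot Redis copy and the durable record expire. It assumes both periods are measured from the package's creation time, which the table does not state; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 1800               # default ttlSeconds from the table
DURABLE_RETENTION = timedelta(days=90)   # durable record kept 90 days


def context_package_expiry(
    created_at: datetime, ttl_seconds: int = DEFAULT_TTL_SECONDS
) -> tuple[datetime, datetime]:
    """Return (hot_expires_at, durable_expires_at), both counted from creation."""
    hot_expires_at = created_at + timedelta(seconds=ttl_seconds)
    durable_expires_at = created_at + DURABLE_RETENTION
    return hot_expires_at, durable_expires_at


def is_expired(expires_at: datetime, now: datetime) -> bool:
    return now >= expires_at


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
hot, durable = context_package_expiry(created)
print(hot.isoformat())      # 2025-01-01T00:30:00+00:00
print(durable.isoformat())  # 2025-04-01T00:00:00+00:00
```

In practice the hot expiry would typically be delegated to Redis key TTLs, leaving only the durable cutoff to a worker such as `RetentionWorker`.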
Data Integrity Rules
- Content-addressable: `contentHash` ties memory records, vectors, and artifact versions to immutable content; unchanged content is never re-embedded or re-projected.
- Single writer per aggregate: only the owning service writes its store; others read via API/events.
- Append-only audit: access and audit tables are never updated in place.
- Referential resolvability: graph edges, lineage links, and context sources must reference existing entities within the same tenant.
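Two of these rules lend themselves to a short sketch: content addressing and referential resolvability. The code below assumes SHA-256 as the hash function (the page does not name one) and uses plain dictionaries in place of real entities; `content_hash`, `needs_embedding`, and `unresolved_edges` are hypothetical helpers.

```python
import hashlib


def content_hash(body: str) -> str:
    # Assumption: SHA-256 over the UTF-8 body; the actual algorithm is not specified.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def needs_embedding(body: str, known_hashes: set[str]) -> bool:
    """Unchanged content (hash already known) is skipped rather than re-embedded."""
    return content_hash(body) not in known_hashes


def unresolved_edges(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return ids of edges whose endpoints are missing or belong to another tenant."""
    node_tenant = {n["nodeId"]: n["tenantId"] for n in nodes}
    bad = []
    for edge in edges:
        endpoints = (edge["fromNodeId"], edge["toNodeId"])
        if any(node_tenant.get(node_id) != edge["tenantId"] for node_id in endpoints):
            bad.append(edge["edgeId"])
    return bad


nodes = [{"nodeId": "n-1", "tenantId": "t-1"}, {"nodeId": "n-2", "tenantId": "t-2"}]
edges = [
    {"edgeId": "e-1", "tenantId": "t-1", "fromNodeId": "n-1", "toNodeId": "n-1"},
    {"edgeId": "e-2", "tenantId": "t-1", "fromNodeId": "n-1", "toNodeId": "n-2"},
]
print(unresolved_edges(nodes, edges))  # ['e-2']
```

A check like this could run at write time in the owning service, so that invalid references are rejected before they reach the graph store.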