Data Model

Target Architecture — Final-State Design

This page defines the final-state logical data model of the Knowledge Platform. Physical layout follows the polyglot persistence described in Storage; this page focuses on the logical entities, keys, indexes, retention, and sensitivity rules.

The platform uses polyglot persistence: relational metadata and the graph in PostgreSQL / Azure SQL (via NHibernate), vectors in Qdrant, artifact bodies in Azure Blob, source-controlled memory in Azure DevOps Git, hot context packages in Redis, and runtime telemetry in the App Insights / OTEL backend. Every logical entity is anchored by tenantId and the cross-cutting metadata fields.

Core Graph & Memory Entities (erDiagram)

erDiagram
    MemoryRecord ||--o{ VectorDocument : "embedded as"
    MemoryRecord ||--o| ArtifactMemory : "body stored as"
    MemoryRecord ||--o{ KnowledgeNode : "projected as"
    KnowledgeNode ||--o{ KnowledgeEdge : "from"
    KnowledgeNode ||--o{ KnowledgeEdge : "to"
    VectorDocument ||--o{ VectorChunk : "contains"

    MemoryRecord {
        string memoryRecordId PK
        string tenantId
        string projectId
        string kind
        string artifactId
        string contentHash
        string classification
        float qualityScore
        string traceId
        datetime createdAt
        string status
    }
    KnowledgeNode {
        string nodeId PK
        string tenantId
        string nodeType
        string ref
        string title
        json properties
    }
    KnowledgeEdge {
        string edgeId PK
        string tenantId
        string fromNodeId FK
        string toNodeId FK
        string edgeType
        float weight
    }
    VectorDocument {
        string vectorDocumentId PK
        string memoryRecordId FK
        string tenantId
        string collection
        string modelVersion
        int chunkCount
        string contentHash
    }
    VectorChunk {
        string vectorChunkId PK
        string vectorDocumentId FK
        int ordinal
        int tokenCount
    }
    ArtifactMemory {
        string artifactMemoryId PK
        string artifactId
        string tenantId
        string artifactType
        string currentVersion
        string classification
        string storageRef
    }
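
The graph entities in the diagram can be mirrored as plain Python dataclasses. This is only a sketch: the field names follow the diagram, but the Python types, the default `weight`, and the `outgoing` helper are assumptions made for illustration, not part of the platform.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class KnowledgeNode:
    node_id: str
    tenant_id: str
    node_type: str
    ref: str
    title: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class KnowledgeEdge:
    edge_id: str
    tenant_id: str
    from_node_id: str
    to_node_id: str
    edge_type: str
    weight: float = 1.0  # default weight is an assumption


def outgoing(edges: List[KnowledgeEdge], node: KnowledgeNode,
             edge_type: Optional[str] = None) -> List[KnowledgeEdge]:
    """Hypothetical helper: edges leaving `node`, limited to the node's tenant."""
    return [
        e for e in edges
        if e.tenant_id == node.tenant_id
        and e.from_node_id == node.node_id
        and (edge_type is None or e.edge_type == edge_type)
    ]
```

Filtering on `tenant_id` before `from_node_id` mirrors the tenant-first key order used throughout the model.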

Logical Tables and Containers

| Logical entity | Store | Container / table | Primary key | Partitioning |
|---|---|---|---|---|
| MemoryRecord | PostgreSQL / Azure SQL | memory_record | memoryRecordId | by tenantId |
| KnowledgeNode | PostgreSQL / Azure SQL | knowledge_node | nodeId | by tenantId |
| KnowledgeEdge | PostgreSQL / Azure SQL | knowledge_edge | edgeId | by tenantId |
| GraphProjection | PostgreSQL / Azure SQL (+Redis) | graph_projection | graphProjectionId | by tenantId |
| VectorDocument / VectorChunk | Qdrant | collection per (tenantId, kind) | vectorDocumentId / point id | by collection |
| EmbeddingJob | PostgreSQL / Azure SQL | embedding_job | embeddingJobId | by tenantId |
| ArtifactMemory / ArtifactVersion / ArtifactSnapshot | Azure Blob (+SQL index) | artifact_memory, blob path | artifactMemoryId | by tenantId/projectId |
| CodeRepository / CodeSymbol / CodeIndexJob | PostgreSQL / Azure SQL (+Qdrant) | code_repository, code_symbol, code_index_job | respective ids | by tenantId |
| PromptTemplate / PromptVersion | Azure DevOps Git (+SQL index) | git path + prompt_template | promptTemplateId | by tenantId |
| PromptRun | PostgreSQL / Azure SQL | prompt_run | promptRunId | by tenantId |
| DecisionRecord / DecisionAlternative | Azure DevOps Git (+SQL index) | git path + decision_record | decisionRecordId | by tenantId |
| RuntimeSignal | OTEL backend (+SQL index) | runtime_signal | runtimeSignalId | by tenantId/environment |
| IncidentMemory / FeedbackItem | PostgreSQL / Azure SQL | incident_memory, feedback_item | respective ids | by tenantId |
| KnowledgePattern / PatternVersion | Azure DevOps Git (+SQL index) | git path + knowledge_pattern | knowledgePatternId | by tenantId |
| Governance aggregates | PostgreSQL / Azure SQL | memory_access_policy, memory_access_decision, memory_access_audit, memory_classification, knowledge_quality_assessment, quality_rule | respective ids | by tenantId |
| ContextBuildRequest / ContextPackage / ContextSource | Redis (hot) + SQL (durable) | context:{id} / context_package | contextPackageId | by tenantId |

Indexes

  • Tenant-first composite indexes: every tenant-scoped query uses an index prefixed by tenantId (e.g. (tenantId, projectId, kind) on memory_record).
  • Correlation indexes: (tenantId, traceId), (tenantId, correlationId), and (tenantId, artifactId) enable lifecycle correlation queries.
  • Graph traversal: (tenantId, fromNodeId, edgeType) and (tenantId, toNodeId, edgeType) on knowledge_edge; unique (tenantId, nodeType, ref) on knowledge_node.
  • Dedup: unique (tenantId, contentHash) on memory_record powers ingestion deduplication.
  • Vector: Qdrant HNSW index per collection, with payload indexes on tenantId, projectId, kind, and classification for filtered ANN search.
  • Full-text / metadata: GIN/B-tree indexes on the metadata columns used by POST /knowledge/search/metadata.
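
The tenant-first and dedup indexes can be sketched with an in-memory SQLite database standing in for PostgreSQL / Azure SQL. The table name memory_record comes from this page; the snake_case column names, the index names, and the `open_store` and `ingest` helpers are assumptions made for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE memory_record (
    memory_record_id TEXT PRIMARY KEY,
    tenant_id        TEXT NOT NULL,
    project_id       TEXT NOT NULL,
    kind             TEXT NOT NULL,
    content_hash     TEXT NOT NULL,
    trace_id         TEXT
);
-- Tenant-first composite index for scoped listing queries.
CREATE INDEX ix_memory_tenant_project_kind
    ON memory_record (tenant_id, project_id, kind);
-- Correlation index for lifecycle queries by trace.
CREATE INDEX ix_memory_tenant_trace
    ON memory_record (tenant_id, trace_id);
-- Dedup: the same content may exist once per tenant.
CREATE UNIQUE INDEX ux_memory_tenant_hash
    ON memory_record (tenant_id, content_hash);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def ingest(conn, record_id, tenant_id, project_id, kind, content_hash) -> bool:
    """Insert a record; return False if a uniqueness constraint rejected it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO memory_record "
        "(memory_record_id, tenant_id, project_id, kind, content_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (record_id, tenant_id, project_id, kind, content_hash),
    )
    return cur.rowcount == 1
```

Because the unique index leads with the tenant column, identical content ingested by two different tenants is stored twice, while a repeat within one tenant is skipped.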

Multi-Tenancy

  • tenantId is a mandatory, non-null column/property on every entity and the leading key of every tenant-scoped index.
  • Qdrant collections and Blob/Git paths are namespaced by tenantId so isolation is physical as well as logical.
  • Every query handler asserts the request tenantId against the resolved scope before executing; a mismatch rejects the request, so cross-tenant reads and writes are blocked before any store is touched.
  • Optional dedicated-tier tenants can be assigned isolated Qdrant clusters and SQL databases without changing the logical model.
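
A minimal sketch of the scope assertion and the tenant namespacing described above. Everything named here is hypothetical: `TenantScopeError`, `assert_tenant_scope`, the `__` separator in collection names, and the blob path layout are illustrative choices, not the platform's actual conventions.

```python
import re

# Assumption: ids are restricted to characters that cannot escape a namespace
# (no "/", "..", or separators).
_SAFE_ID = re.compile(r"^[A-Za-z0-9-]+$")


class TenantScopeError(PermissionError):
    """Raised when a request targets a tenant outside the resolved scope."""


def assert_tenant_scope(request_tenant_id: str, resolved_tenant_id: str) -> None:
    """Reject the request unless it targets the caller's resolved tenant."""
    if not request_tenant_id or request_tenant_id != resolved_tenant_id:
        raise TenantScopeError("request tenantId does not match resolved scope")


def _checked(value: str) -> str:
    if not _SAFE_ID.match(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def qdrant_collection_name(tenant_id: str, kind: str) -> str:
    """One collection per (tenantId, kind); the '__' separator is an assumption."""
    return f"{_checked(tenant_id)}__{_checked(kind)}"


def blob_path(tenant_id: str, project_id: str, artifact_id: str, version: str) -> str:
    """Blob path prefixed by tenantId/projectId; the layout is an assumption."""
    parts = (tenant_id, project_id, artifact_id, version)
    return "/".join(_checked(p) for p in parts)
```

Validating identifiers before building names matters because a namespace is only an isolation boundary if an id cannot contain the separator.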

Sensitivity & Classification

Every MemoryRecord carries a classification (Public / Internal / Confidential / Secret) assigned by MemoryClassificationService:

| Classification | Storage handling | Retrieval handling |
|---|---|---|
| Public | Stored normally; eligible for marketplace listings | Returned without redaction |
| Internal | Standard tenant isolation | Returned within tenant scope |
| Confidential | Stored normally; access via MemoryAccessPolicy | Redacted unless explicitly allowed |
| Secret | Never stored as artifact body; only Key Vault references | Excluded; reference only |

See Governance for enforcement.
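
One possible reading of the retrieval column, written as a small function. The record shape (a dict with `body` and `reference` keys), the `[REDACTED]` placeholder, and the `retrieval_view` name are assumptions; in the platform the `explicitly_allowed` flag would presumably come from a MemoryAccessPolicy decision.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    SECRET = "Secret"


REDACTED = "[REDACTED]"  # placeholder text is an assumption


def retrieval_view(record: dict, request_tenant_id: str,
                   explicitly_allowed: bool = False) -> Optional[dict]:
    """Return what retrieval may expose for `record`, or None if nothing."""
    level = Classification(record["classification"])
    if level is Classification.PUBLIC:
        return dict(record)  # returned without redaction
    if record["tenantId"] != request_tenant_id:
        return None  # everything else stays inside the tenant
    if level is Classification.INTERNAL:
        return dict(record)
    if level is Classification.CONFIDENTIAL:
        if explicitly_allowed:
            return dict(record)
        return {**record, "body": REDACTED}
    # Secret: the body is never returned, only a reference to it.
    return {
        "memoryRecordId": record["memoryRecordId"],
        "classification": level.value,
        "reference": record.get("reference"),
    }
```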

Retention

| Data | Default retention | Lifecycle worker |
|---|---|---|
| MemoryRecord & vectors | Project lifetime + 24 months | RetentionWorker |
| Superseded ArtifactVersion bodies | Retained (immutable lineage); cold-tiered after 12 months | RetentionWorker |
| ContextPackage (Redis) | ttlSeconds (default 1800s); durable record kept 90 days | TTL + RetentionWorker |
| RuntimeSignal raw telemetry | 30–90 days (OTEL backend); aggregates retained longer | RetentionWorker |
| MemoryAccessAudit | Compliance retention (e.g. 7 years), append-only | never auto-deleted before retention |
| Stale/superseded memory | Flagged then archived | StaleMemoryWorker |
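
The retention defaults can be turned into concrete deadlines. The sketch below assumes that "project lifetime" ends at a known project close timestamp and that months are calendar months; the function names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone

DEFAULT_CONTEXT_TTL_SECONDS = 1800            # ttlSeconds default from the table
DURABLE_CONTEXT_RETENTION = timedelta(days=90)
MEMORY_RETENTION_MONTHS = 24


def add_months(moment: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length."""
    total = moment.month - 1 + months
    year = moment.year + total // 12
    month = total % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def context_package_deadlines(created_at: datetime, ttl_seconds=None):
    """Return (hot-cache expiry, durable-record expiry) for a ContextPackage."""
    ttl = DEFAULT_CONTEXT_TTL_SECONDS if ttl_seconds is None else ttl_seconds
    return (created_at + timedelta(seconds=ttl),
            created_at + DURABLE_CONTEXT_RETENTION)


def memory_record_purge_after(project_closed_at: datetime) -> datetime:
    """Earliest purge time for a MemoryRecord and its vectors."""
    return add_months(project_closed_at, MEMORY_RETENTION_MONTHS)
```

For example, a package created at midnight UTC on 1 January 2025 leaves the hot cache at 00:30 the same day, and its durable record becomes eligible for cleanup on 1 April 2025.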

Data Integrity Rules

  • Content-addressable: contentHash ties memory records, vectors, and artifact versions to immutable content; unchanged content is never re-embedded or re-projected.
  • Single writer per aggregate: only the owning service writes its store; others read via API/events.
  • Append-only audit: access and audit tables are never updated in place.
  • Referential resolvability: graph edges, lineage links, and context sources must reference existing entities within the same tenant.
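
Two of these rules are easy to illustrate: skipping work for unchanged content, and checking that edges resolve within their tenant. The sketch assumes SHA-256 as the hash (this page does not name an algorithm) and uses dict-shaped entities; `needs_embedding` and `unresolved_edges` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def content_hash(body: bytes) -> str:
    """Hash of the content; SHA-256 is an assumption."""
    return hashlib.sha256(body).hexdigest()


def needs_embedding(body: bytes, tenant_id: str,
                    seen: Set[Tuple[str, str]]) -> bool:
    """True the first time a tenant presents this content, False afterwards."""
    key = (tenant_id, content_hash(body))
    if key in seen:
        return False
    seen.add(key)
    return True


def unresolved_edges(edges: List[Dict[str, str]],
                     nodes: List[Dict[str, str]]) -> List[str]:
    """Return edgeIds whose endpoints are missing or belong to another tenant."""
    known = {(n["tenantId"], n["nodeId"]) for n in nodes}
    return [
        e["edgeId"] for e in edges
        if (e["tenantId"], e["fromNodeId"]) not in known
        or (e["tenantId"], e["toNodeId"]) not in known
    ]
```

Looking up endpoints by the pair (tenantId, nodeId) rather than nodeId alone means an edge that points at a real node in a different tenant is still reported as unresolved.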