Aggregate Roots
Target Architecture — Final-State Design
This page describes the final-state aggregate roots of the Observability & Feedback Platform. Each aggregate enforces its own invariants, publishes any domain events it raises through the canonical event envelope, and is persisted by a single owning service. Every aggregate carries tenantId as its isolation boundary.
The platform owns eleven aggregate roots across its seven bounded contexts. Aggregates are the consistency boundaries of the domain: each is loaded, mutated, and saved as a whole by exactly one service through a single repository.
TraceRecord
Purpose — A correlated, queryable projection of an end-to-end distributed trace, anchored by traceId, spanning factory services, agents, and generated SaaS.
Fields — traceId (identity), tenantId, projectId, moduleId, rootSpanName, startedAt, durationMs, status, dimensions (required telemetry dimensions), spanCount.
Entities — Span (spanId, parentSpanId, name, moduleId, durationMs, attributes), SpanEvent (timestamp, name, attributes).
Value Objects — TraceStatus (ok | error | unset), TelemetryDimensions (the eleven required dimensions), TimeRange.
Invariants — exactly one root span (null parentSpanId); every span shares the aggregate's traceId and tenantId; durationMs ≥ 0; spans form a valid parent/child tree.
Domain Events — TraceRecorded.
Repository — ITraceRecordRepository (query by traceId, tenant, time range). Read-optimised over Application Insights.
Persistence — Application Insights (span store); correlated projection cached for query. No relational mutation path.
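The TraceRecord invariants lend themselves to a single validation pass when spans are assembled into the aggregate. The sketch below is a minimal, hypothetical Python validator (the `Span` dataclass and `validate_trace` are illustrative names, not platform code); it models only the fields the invariants touch and treats the one span whose `parent_span_id` is `None` as the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    trace_id: str
    tenant_id: str
    duration_ms: int


def validate_trace(trace_id: str, tenant_id: str, spans: list[Span]) -> None:
    """Raise ValueError if the spans break a TraceRecord invariant."""
    roots = [s for s in spans if s.parent_span_id is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")

    by_id: dict[str, Span] = {}
    for s in spans:
        if s.trace_id != trace_id or s.tenant_id != tenant_id:
            raise ValueError(f"span {s.span_id} belongs to another trace or tenant")
        if s.duration_ms < 0:
            raise ValueError(f"span {s.span_id} has a negative duration")
        if s.span_id in by_id:
            raise ValueError(f"duplicate spanId {s.span_id}")
        by_id[s.span_id] = s

    # Walk each span up to the root: every parent must exist in this trace and
    # no span may be revisited, which rules out cycles and detached subtrees.
    for s in spans:
        seen: set[str] = set()
        current = s
        while current.parent_span_id is not None:
            if current.span_id in seen:
                raise ValueError(f"cycle detected at span {current.span_id}")
            seen.add(current.span_id)
            parent = by_id.get(current.parent_span_id)
            if parent is None:
                raise ValueError(f"span {current.span_id} has an unknown parent")
            current = parent
```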
LogRecordReference
Purpose — A governed, tenant-scoped index reference to structured Serilog log entries stored in Log Analytics — the platform does not duplicate log bodies.
Fields — referenceId, tenantId, projectId, moduleId, traceId, executionId, level, timestamp, logAnalyticsRef (workspace + query handle).
Entities — none (reference aggregate).
Value Objects — LogLevel (Verbose | Debug | Information | Warning | Error | Fatal), LogAnalyticsRef.
Invariants — every reference is tenant-scoped; traceId is required for correlation; references are immutable once created.
Domain Events — none (query-only context; emits no integration events).
Repository — ILogRecordReferenceRepository (resolve and authorize Log Analytics queries by tenant/trace).
Persistence — Log Analytics is the system of record; reference metadata in Azure SQL / PostgreSQL for governance and access control.
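As a minimal sketch of the LogRecordReference rules, assuming `logAnalyticsRef` splits into a workspace id and a query handle (both hypothetical field names), a frozen Python dataclass gives immutability for free, and a small `authorize_query` helper (also hypothetical) shows the tenant check a repository would apply before resolving a Log Analytics query. Only a subset of the listed fields is modelled.

```python
from dataclasses import dataclass
from datetime import datetime

LOG_LEVELS = {"Verbose", "Debug", "Information", "Warning", "Error", "Fatal"}


@dataclass(frozen=True)  # frozen: a reference cannot change once created
class LogRecordReference:
    reference_id: str
    tenant_id: str
    trace_id: str
    level: str
    timestamp: datetime
    workspace_id: str   # hypothetical half of logAnalyticsRef
    query_handle: str   # hypothetical half of logAnalyticsRef

    def __post_init__(self) -> None:
        if not self.tenant_id:
            raise ValueError("every reference must be tenant-scoped")
        if not self.trace_id:
            raise ValueError("traceId is required for correlation")
        if self.level not in LOG_LEVELS:
            raise ValueError(f"unknown log level {self.level!r}")


def authorize_query(ref: LogRecordReference, caller_tenant_id: str) -> str:
    """Hand out the Log Analytics query handle only to the owning tenant."""
    if ref.tenant_id != caller_tenant_id:
        raise PermissionError("cross-tenant log access denied")
    return ref.query_handle
```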
MetricSeries
Purpose — An aggregated time-series of a single metric, grouped by required dimensions, used by dashboards, alerts, SLOs, and anomaly detection.
Fields — seriesId, tenantId, metricName, unit, groupKey (dimension key), window, points (timestamp/value), aggregation (sum | avg | p50 | p95 | p99 | count).
Entities — MetricPoint (timestamp, value).
Value Objects — MetricKey (metric name + dimension key), AggregationKind, Window.
Invariants — points are ordered and non-overlapping within a window; aggregation is deterministic per (metricName, window, groupKey); values are consistent with the series unit.
Domain Events — MetricAggregated.
Repository — IMetricSeriesRepository (upsert by deterministic key; query by metric/time range/group).
Persistence — Application Insights / metric store for points; series metadata in Azure SQL.
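The MetricSeries invariants depend on two deterministic pieces: the key the repository upserts by, and the aggregation itself. The sketch below assumes the key is a SHA-256 hash of (tenantId, metricName, window, groupKey) with dimensions sorted, and it computes percentiles with the nearest-rank method; both are illustrative choices, and `series_key`, `check_points`, and `aggregate` are hypothetical names.

```python
import hashlib

AGGREGATIONS = {"sum", "avg", "p50", "p95", "p99", "count"}


def series_key(tenant_id: str, metric_name: str, window: str,
               group_key: dict[str, str]) -> str:
    """Deterministic upsert key: the same inputs always map to the same id."""
    # Sort dimensions so a different insertion order yields the same key.
    # Assumes identifiers contain no '|', ',' or '=' characters.
    dims = ",".join(f"{k}={v}" for k, v in sorted(group_key.items()))
    canonical = f"{tenant_id}|{metric_name}|{window}|{dims}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_points(points: list[tuple[int, float]]) -> None:
    """Points must be strictly ordered by timestamp, so no two share a slot."""
    timestamps = [ts for ts, _ in points]
    if any(later <= earlier for earlier, later in zip(timestamps, timestamps[1:])):
        raise ValueError("points must be strictly increasing in timestamp")


def aggregate(values: list[float], kind: str) -> float:
    if kind not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation {kind!r}")
    if not values:
        raise ValueError("cannot aggregate an empty window")
    if kind == "sum":
        return sum(values)
    if kind == "avg":
        return sum(values) / len(values)
    if kind == "count":
        return float(len(values))
    # Nearest-rank percentile on sorted values; integer ceiling division
    # keeps the rank exact, so the result never depends on float drift.
    pct = int(kind[1:])
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]
```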
DashboardDefinition
Purpose — A reusable, multi-tenant dashboard composed of panels over metrics, traces, cost, and quality.
Fields — dashboardId, tenantId, name, scope (project/module), panels, version, createdAt, updatedAt.
Entities — Panel (type, metric/sloId, groupBy, layout).
Value Objects — DashboardScope, PanelType (timeseries | stat | table | trace | heatmap).
Invariants — unique name per tenant+scope; at least one panel; panel metric/SLO references must resolve within the tenant.
Domain Events — DashboardDefined, DashboardUpdated (internal).
Repository — IDashboardDefinitionRepository.
Persistence — Azure SQL / PostgreSQL (NHibernate); optionally projected to App Insights workbooks / Grafana.
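A hypothetical validator for the DashboardDefinition invariants might look like the following. It assumes the caller supplies the dashboards already stored and a map from tenantId to the metric names and sloIds registered for that tenant; `validate_dashboard`, `refs_by_tenant`, and the `scope` string encoding are illustrative, not part of the documented design.

```python
from dataclasses import dataclass, field

PANEL_TYPES = {"timeseries", "stat", "table", "trace", "heatmap"}


@dataclass
class Panel:
    panel_type: str
    ref: str  # metric name or sloId the panel reads from


@dataclass
class DashboardDefinition:
    dashboard_id: str
    tenant_id: str
    name: str
    scope: str  # hypothetical encoding, e.g. "project:p1" or "module:m7"
    panels: list[Panel] = field(default_factory=list)


def validate_dashboard(candidate: DashboardDefinition,
                       existing: list[DashboardDefinition],
                       refs_by_tenant: dict[str, set[str]]) -> None:
    """Raise ValueError if the candidate breaks a DashboardDefinition invariant."""
    if not candidate.panels:
        raise ValueError("a dashboard needs at least one panel")
    # Only references registered for the candidate's own tenant can resolve.
    tenant_refs = refs_by_tenant.get(candidate.tenant_id, set())
    for panel in candidate.panels:
        if panel.panel_type not in PANEL_TYPES:
            raise ValueError(f"unknown panel type {panel.panel_type!r}")
        if panel.ref not in tenant_refs:
            raise ValueError(f"{panel.ref!r} does not resolve in tenant {candidate.tenant_id}")
    for other in existing:
        same_slot = (other.tenant_id, other.scope, other.name) == (
            candidate.tenant_id, candidate.scope, candidate.name)
        if same_slot and other.dashboard_id != candidate.dashboard_id:
            raise ValueError("dashboard name must be unique per tenant and scope")
```

Passing the candidate's own stored copy in `existing` is allowed, so the same check can serve both creation and update.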
AlertRule
Purpose — A condition over metric series (or signals) that, when met, raises an alert trigger and optionally opens an incident.
Fields — alertRuleId, tenantId, name, scope, condition (metric, operator, threshold, forMinutes), severity, actions, enabled, version.
Entities — AlertAction (type, target).
Value Objects — AlertCondition, Severity (info | warning | high | critical), AlertScope.
Invariants — condition references a resolvable metric; forMinutes ≥ 0; a disabled rule never triggers; unique name per tenant+scope.
Domain Events — AlertTriggered.
Repository — IAlertRuleRepository.
Persistence — Azure SQL / PostgreSQL (NHibernate).
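The `forMinutes` field implies that a condition must hold for a sustained period before AlertTriggered is raised. The Python sketch below is one plausible reading, assuming one sample per minute with no gaps and a small operator vocabulary (`>`, `>=`, `<`, `<=`), neither of which the source specifies; `AlertCondition` and `should_trigger` are hypothetical names, and resolving the metric name is left to the repository.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator vocabulary; the source does not enumerate operators.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class AlertCondition:
    metric: str
    op: str
    threshold: float
    for_minutes: int

    def __post_init__(self) -> None:
        if self.op not in OPERATORS:
            raise ValueError(f"unsupported operator {self.op!r}")
        if self.for_minutes < 0:
            raise ValueError("forMinutes must be >= 0")


def should_trigger(condition: AlertCondition, enabled: bool,
                   samples: list[tuple[int, float]]) -> bool:
    """samples: (minute, value) pairs for condition.metric, ordered by minute.

    Fires when the condition has held continuously for at least for_minutes,
    ending at the latest sample. Assumes one sample per minute, no gaps.
    """
    if not enabled or not samples:
        return False  # a disabled rule never triggers
    breached = OPERATORS[condition.op]
    run_start = None
    for minute, value in samples:
        if breached(value, condition.threshold):
            if run_start is None:
                run_start = minute
        else:
            run_start = None  # any healthy sample resets the run
    if run_start is None:
        return False
    return samples[-1][0] - run_start >= condition.for_minutes
```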
Incident
Purpose — The unit of operational reaction: a tracked problem with lifecycle, severity, trace lineage, and resolution, bridging detection and learning.
Fields — incidentId, tenantId, projectId, moduleId, title, severity, status, source (alert/SLO/trace/manual), traceId, openedAt, acknowledgedAt, resolvedAt, rootCause, dimensions.
Entities — IncidentEvent (timestamp, type, actor, note), EscalationStep (level, target, at).
Value Objects — IncidentStatus (open | acknowledged | investigating | mitigated | resolved), Severity, IncidentSource.
Invariants — status transitions follow the allowed lifecycle (see Workflows); resolvedAt requires rootCause; a resolved incident is immutable except for post-mortem notes; tenantId matches the tenant of the incident's source.
Domain Events — IncidentOpened, IncidentResolved (plus internal IncidentAcknowledged, IncidentEscalated).
Repository — IIncidentRepository.
Persistence — Azure SQL / PostgreSQL (NHibernate); event log appended per transition.
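Because the allowed lifecycle is defined in Workflows rather than here, the transition table in this sketch is an assumption made for illustration only. Under that assumption, the Python class below shows how the Incident aggregate can reject illegal transitions, require a rootCause before resolving, and append an IncidentEvent for every transition; `ALLOWED`, `transition`, and `add_postmortem_note` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# HYPOTHETICAL transition table; the authoritative lifecycle lives in Workflows.
ALLOWED = {
    "open": {"acknowledged"},
    "acknowledged": {"investigating"},
    "investigating": {"mitigated", "resolved"},
    "mitigated": {"investigating", "resolved"},
    "resolved": set(),
}


@dataclass
class IncidentEvent:
    timestamp: datetime
    type: str
    actor: str
    note: str = ""


@dataclass
class Incident:
    incident_id: str
    tenant_id: str
    status: str = "open"
    root_cause: Optional[str] = None
    resolved_at: Optional[datetime] = None
    events: list[IncidentEvent] = field(default_factory=list)
    postmortem_notes: list[str] = field(default_factory=list)

    def transition(self, target: str, actor: str,
                   root_cause: Optional[str] = None) -> None:
        if target not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {target}")
        if target == "resolved":
            if not root_cause:
                raise ValueError("resolving requires a rootCause")
            self.root_cause = root_cause
            self.resolved_at = datetime.now(timezone.utc)
        self.status = target
        # The event log gains one entry per successful transition.
        self.events.append(IncidentEvent(datetime.now(timezone.utc), target, actor))

    def add_postmortem_note(self, note: str) -> None:
        # Post-mortem notes stay writable even after resolution.
        self.postmortem_notes.append(note)
```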
FeedbackItem
Purpose — A durable feedback record distilled from a runtime signal, incident, human, or agent — the platform's primary contribution to the improvement loop.
Fields — feedbackItemId, tenantId, projectId, artifactId, source, sourceId, category, sentiment, summary, detail, traceId, status, createdAt.
Entities — FeedbackAttachment (blob ref, kind).
Value Objects — FeedbackSource (incident | runtime-signal | human | agent), FeedbackCategory (reliability | performance | cost | maintainability | correctness), Sentiment (positive | neutral | negative), FeedbackStatus (captured | routed | applied).
Invariants — summary is required; sourceId is required when source is incident or runtime-signal; artifactId or projectId must be present for attribution; the summary is immutable once routed.
Domain Events — FeedbackItemCreated.
Repository — IFeedbackItemRepository.
Persistence — Azure SQL / PostgreSQL (metadata) + Azure Blob (attachments, exports).
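The FeedbackItem invariants can be enforced at construction time plus one guarded mutator. In this minimal Python sketch, `revise_summary` is a hypothetical method name, and "once routed" is read as any status other than `captured`, since `applied` follows `routed`.

```python
from dataclasses import dataclass
from typing import Optional

SOURCES = {"incident", "runtime-signal", "human", "agent"}
SOURCES_REQUIRING_ID = {"incident", "runtime-signal"}


@dataclass
class FeedbackItem:
    feedback_item_id: str
    tenant_id: str
    source: str
    summary: str
    source_id: Optional[str] = None
    project_id: Optional[str] = None
    artifact_id: Optional[str] = None
    status: str = "captured"  # captured | routed | applied

    def __post_init__(self) -> None:
        if self.source not in SOURCES:
            raise ValueError(f"unknown source {self.source!r}")
        if not self.summary.strip():
            raise ValueError("summary is required")
        if self.source in SOURCES_REQUIRING_ID and not self.source_id:
            raise ValueError(f"sourceId is required when source is {self.source}")
        if not (self.artifact_id or self.project_id):
            raise ValueError("artifactId or projectId is required for attribution")

    def revise_summary(self, new_summary: str) -> None:
        # Once routed (and therefore also once applied), the summary is frozen.
        if self.status != "captured":
            raise ValueError("summary is immutable once routed")
        if not new_summary.strip():
            raise ValueError("summary is required")
        self.summary = new_summary
```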
QualityScore
Purpose — A computed, multi-dimensional quality score for a project (and its artifacts), derived from feedback, incidents, and SLO adherence.
Fields — qualityScoreId, tenantId, projectId, computedAt, overall, dimensions (reliability, performance, cost_efficiency, maintainability, correctness), artifactScores, window.
Entities — ArtifactScore (artifactId, score, openFeedbackCount).
Value Objects — ScoreVector (dimension → 0..1), Window.
Invariants — all scores in [0,1]; overall is a deterministic function of dimensions; recomputation for the same (project, window) is idempotent.
Domain Events — QualityScoreComputed.
Repository — IQualityScoreRepository.
Persistence — Azure SQL / PostgreSQL (NHibernate).
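The source requires only that `overall` be a deterministic function of the dimension scores. The sketch below assumes an equal-weight mean, which is a placeholder rather than the platform's formula, and shows idempotent recomputation with a hypothetical in-memory store keyed by (projectId, window) standing in for IQualityScoreRepository.

```python
DIMENSIONS = ("reliability", "performance", "cost_efficiency",
              "maintainability", "correctness")


def overall_score(vector: dict[str, float]) -> float:
    """HYPOTHETICAL equal-weight mean; the source only requires determinism."""
    if set(vector) != set(DIMENSIONS):
        raise ValueError("score vector must contain exactly the five dimensions")
    for dim in DIMENSIONS:
        if not 0.0 <= vector[dim] <= 1.0:
            raise ValueError(f"{dim} score {vector[dim]} is outside [0, 1]")
    # Summing in a fixed dimension order keeps the float result reproducible.
    return sum(vector[dim] for dim in DIMENSIONS) / len(DIMENSIONS)


class InMemoryQualityScoreStore:
    """Hypothetical stand-in for IQualityScoreRepository."""

    def __init__(self) -> None:
        self._scores: dict[tuple[str, str], float] = {}

    def save(self, project_id: str, window: str, vector: dict[str, float]) -> float:
        score = overall_score(vector)
        # Keyed by (projectId, window): recomputing overwrites, never duplicates.
        self._scores[(project_id, window)] = score
        return score

    def count(self) -> int:
        return len(self._scores)
```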
CostSignal
Purpose — Attribution of model, compute, and infrastructure cost to a tenant/project, with anomaly detection feeding the economic side of the loop.
Fields — costSignalId, tenantId, projectId, period, currency, total, breakdown (category → amount), anomalies, computedAt.
Entities — CostBreakdownLine (category, amount), CostAnomaly (category, detectedAt, deltaPct, baseline).
Value Objects — Money (amount + currency), CostCategory (model_inference | compute | storage | network), Period.
Invariants — total equals the sum of breakdown lines; amounts ≥ 0; anomalies reference an existing category; deterministic per (project, period).
Domain Events — CostAnomalyDetected.
Repository — ICostSignalRepository.
Persistence — Azure SQL / PostgreSQL (metadata) + Azure Blob (detailed cost exports).
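A minimal Python sketch of the CostSignal invariants follows, using `Decimal` so that money sums are exact. It derives `total` from the breakdown lines instead of storing it, which is one way (not necessarily the platform's) to make "total equals the sum of breakdown lines" hold by construction, and it reads "existing category" as a category present in the breakdown. `delta_pct` is a hypothetical helper for the CostAnomaly field of the same name, computed as percentage change against the baseline.

```python
from dataclasses import dataclass, field
from decimal import Decimal

CATEGORIES = {"model_inference", "compute", "storage", "network"}


@dataclass(frozen=True)
class CostAnomaly:
    category: str
    delta_pct: Decimal
    baseline: Decimal


def delta_pct(amount: Decimal, baseline: Decimal) -> Decimal:
    """Percentage change of an amount against its baseline (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (amount - baseline) / baseline * 100


@dataclass
class CostSignal:
    tenant_id: str
    project_id: str
    period: str
    currency: str
    breakdown: dict[str, Decimal]
    anomalies: list[CostAnomaly] = field(default_factory=list)

    def __post_init__(self) -> None:
        for category, amount in self.breakdown.items():
            if category not in CATEGORIES:
                raise ValueError(f"unknown cost category {category!r}")
            if amount < 0:
                raise ValueError(f"{category} amount must be >= 0")
        for anomaly in self.anomalies:
            # Interpreted here as: the anomaly's category appears in the breakdown.
            if anomaly.category not in self.breakdown:
                raise ValueError(f"anomaly references missing category {anomaly.category!r}")

    @property
    def total(self) -> Decimal:
        # Derived from the lines, so it can never disagree with them.
        return sum(self.breakdown.values(), Decimal("0"))
```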
SloDefinition
Purpose — A service-level objective with target, window, and error budget, evaluated continuously to detect breaches.
Fields — sloId, tenantId, projectId, name, indicator (SLI metric), target (e.g. 99.9), window (rolling), errorBudget, budgetRemaining, status.
Entities — BudgetBurnEvent (timestamp, burnedPct).
Value Objects — Sli (metric + good/total definition), SloTarget, SloStatus (healthy | at-risk | breached).
Invariants — target in (0,100]; budgetRemaining in [0, errorBudget]; status breached requires budgetRemaining = 0; the window duration is positive.
Domain Events — SloBreached.
Repository — ISloDefinitionRepository.
Persistence — Azure SQL / PostgreSQL (NHibernate); budget burn computed from MetricSeries.
TelemetryCorrelation
Purpose — The join aggregate that stitches traces, logs, metrics, incidents, and feedback into a single correlated view per traceId — the universal correlation key of the platform.
Fields — traceId (identity), tenantId, projectId, dimensions, traceRef, logRefs, metricRefs, incidentRefs, feedbackRefs, updatedAt.
Entities — CorrelationLink (kind, targetId, store).
Value Objects — TelemetryDimensions, CorrelationKind (trace | log | metric | incident | feedback).
Invariants — exactly one correlation per traceId per tenant (upsert); all linked refs share the same tenantId; links reference resolvable records.
Domain Events — emits internal correlation snapshots; no public integration event.
Repository — ITelemetryCorrelationRepository (upsert by traceId).
Persistence — Application Insights (trace anchor) + Azure SQL (link index).
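The upsert-by-traceId rule and the same-tenant rule can be sketched with an in-memory repository. This is a hypothetical stand-in for ITelemetryCorrelationRepository; it assumes each CorrelationLink carries the tenantId of the record it points to (a field the source does not list) and does not attempt to check that links resolve against their stores.

```python
from dataclasses import dataclass, field

KINDS = {"trace", "log", "metric", "incident", "feedback"}


@dataclass(frozen=True)
class CorrelationLink:
    kind: str
    target_id: str
    store: str
    tenant_id: str  # hypothetical: carried so the same-tenant rule can be checked


@dataclass
class TelemetryCorrelation:
    trace_id: str
    tenant_id: str
    links: set[CorrelationLink] = field(default_factory=set)


class InMemoryCorrelationRepository:
    """Hypothetical stand-in for ITelemetryCorrelationRepository."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], TelemetryCorrelation] = {}

    def upsert(self, tenant_id: str, trace_id: str,
               new_links: list[CorrelationLink]) -> TelemetryCorrelation:
        # Validate every link before mutating, so a rejected upsert changes nothing.
        for link in new_links:
            if link.kind not in KINDS:
                raise ValueError(f"unknown correlation kind {link.kind!r}")
            if link.tenant_id != tenant_id:
                raise ValueError(f"link {link.target_id} belongs to another tenant")
        key = (tenant_id, trace_id)  # one correlation per traceId per tenant
        correlation = self._items.setdefault(key, TelemetryCorrelation(trace_id, tenant_id))
        correlation.links.update(new_links)  # set semantics: re-sent links are not duplicated
        return correlation

    def count(self) -> int:
        return len(self._items)
```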
Aggregate Summary
| Aggregate Root | Bounded Context | Owning Service | Key Events | Primary Store |
|---|---|---|---|---|
| TraceRecord | Tracing | TraceService | TraceRecorded | Application Insights |
| LogRecordReference | Logs | LogQueryService | none | Log Analytics |
| MetricSeries | Metrics & SLO | MetricAggregationService | MetricAggregated | App Insights + SQL |
| DashboardDefinition | Dashboards & Alerts | DashboardService | DashboardDefined | Azure SQL / PostgreSQL |
| AlertRule | Dashboards & Alerts | AlertRuleService | AlertTriggered | Azure SQL / PostgreSQL |
| Incident | Incidents | IncidentService | IncidentOpened, IncidentResolved | Azure SQL / PostgreSQL |
| FeedbackItem | Feedback & Quality | FeedbackService | FeedbackItemCreated | Azure SQL + Blob |
| QualityScore | Feedback & Quality | QualityScoreService | QualityScoreComputed | Azure SQL / PostgreSQL |
| CostSignal | Cost | CostTelemetryService | CostAnomalyDetected | Azure SQL + Blob |
| SloDefinition | Metrics & SLO | SloService | SloBreached | Azure SQL / PostgreSQL |
| TelemetryCorrelation | Tracing | TelemetryCorrelationService | (internal) | App Insights + SQL |