Bounded Contexts
Target Architecture — Final-State Design
This page describes the final-state bounded-context decomposition of the Observability & Feedback Platform. Each context owns its aggregate roots, services, and stores, and communicates with the others only through the canonical event envelope.
The Observability & Feedback Platform is decomposed into seven bounded contexts. Each owns a clear slice of the observability and feedback domain, persists its own aggregates, and integrates with the rest of the factory through events rather than shared state. This lets each context scale independently: the high-volume ingestion contexts (Tracing, Logs, Metrics) scale separately from the transactional contexts (Incidents, Feedback & Quality, Cost).
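This page refers to the canonical event envelope without defining it. As a minimal sketch, assuming an envelope that carries an event type, the emitting context, a tenantId, a traceId, and a payload, a cross-context message could be modelled as follows. The class and every field name are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any
import json
import uuid


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical envelope; field names are assumptions, not the real schema."""

    event_type: str        # e.g. "SloBreached"
    source_context: str    # bounded context that emitted the event
    tenant_id: str         # isolation boundary carried on every event
    trace_id: str          # join key used for correlation
    payload: dict[str, Any]
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Serialise for transport; consumers work from this message,
        # never from the producer's store.
        return json.dumps(asdict(self), sort_keys=True)


envelope = EventEnvelope(
    event_type="SloBreached",
    source_context="Metrics & SLO",
    tenant_id="tenant-a",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    payload={"slo": "checkout-availability"},
)
decoded = json.loads(envelope.to_json())
```

The envelope is immutable so that a consumer cannot alter an event after it has been published.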
Context Map
```mermaid
flowchart TB
    subgraph Ingestion["High-volume ingestion"]
        Tracing["Tracing<br/>TraceRecord, TelemetryCorrelation"]
        Logs["Logs<br/>LogRecordReference"]
        Metrics["Metrics & SLO<br/>MetricSeries, SloDefinition"]
    end
    subgraph Reaction["Detection & reaction"]
        DashAlerts["Dashboards & Alerts<br/>DashboardDefinition, AlertRule"]
        Incidents["Incidents<br/>Incident"]
    end
    subgraph Learning["Learning & economics"]
        FeedbackQuality["Feedback & Quality<br/>FeedbackItem, QualityScore"]
        Cost["Cost<br/>CostSignal"]
    end
    Tracing -->|"correlated signals"| Metrics
    Logs -->|"log-derived metrics"| Metrics
    Metrics -->|"metric series"| DashAlerts
    Metrics -->|"SloBreached"| Incidents
    DashAlerts -->|"AlertTriggered"| Incidents
    Incidents -->|"IncidentResolved"| FeedbackQuality
    Tracing -->|"runtime signals"| FeedbackQuality
    Metrics -->|"usage"| Cost
    Cost -->|"CostAnomalyDetected"| FeedbackQuality
    FeedbackQuality -->|"FeedbackItemCreated"| KP["Knowledge Platform"]
```
Contexts
Tracing
Owns end-to-end distributed traces. Ingests OpenTelemetry spans from factory services, agents, and generated SaaS products and reconstructs the full causal path of a request by traceId. The TelemetryCorrelation aggregate stitches traces, logs, metrics, and feedback into one correlated view, making it the join point for the whole platform.
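Reconstructing a causal path by traceId can be sketched as follows. OpenTelemetry spans carry a trace ID, a span ID, and a parent span ID; the simplified `Span` class and the `causal_path` function below are hypothetical stand-ins for illustration, not the platform's TraceRecord model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Simplified, hypothetical view of an ingested span."""

    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str


def causal_path(spans: list[Span], trace_id: str) -> list[str]:
    """Return the span names of one trace, parents before children,
    indented by depth in the call tree."""
    children: dict[Optional[str], list[Span]] = defaultdict(list)
    for span in spans:
        if span.trace_id == trace_id:
            children[span.parent_span_id].append(span)

    ordered: list[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        for child in children.get(parent_id, []):
            ordered.append("  " * depth + child.name)
            visit(child.span_id, depth + 1)

    visit(None, 0)  # start from the root span(s)
    return ordered


spans = [
    Span("t1", "a", None, "POST /orders"),
    Span("t1", "b", "a", "validate"),
    Span("t1", "c", "a", "persist"),
    Span("t2", "x", None, "GET /health"),
    Span("t1", "d", "c", "sql insert"),
]
path = causal_path(spans, "t1")
# path == ["POST /orders", "  validate", "  persist", "    sql insert"]
```

Spans from other traces (here `t2`) are ignored, and arrival order does not need to match causal order because the tree is rebuilt from parent links.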
Logs
Owns references to structured Serilog logs. The platform does not duplicate log bodies — Log Analytics is the system of record — but it maintains LogRecordReference aggregates that index and govern access to logs by traceId, tenant, project, and module so queries stay tenant-isolated and traceable.
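One way to picture a reference index that stays tenant-isolated is to make the tenant part of every lookup key. The sketch below is an in-memory illustration only; the `log_pointer` field, the `LogReferenceIndex` class, and its methods are assumptions and do not describe how the platform queries Log Analytics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecordReference:
    tenant_id: str
    project: str
    module: str
    trace_id: str
    log_pointer: str  # hypothetical locator into the system of record


class LogReferenceIndex:
    """In-memory stand-in for the reference store (illustrative only)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], list[LogRecordReference]] = {}

    def add(self, ref: LogRecordReference) -> None:
        self._by_key.setdefault((ref.tenant_id, ref.trace_id), []).append(ref)

    def find(self, tenant_id: str, trace_id: str) -> list[LogRecordReference]:
        # The tenant is part of the key, so a caller cannot reach another
        # tenant's references even with a valid trace id.
        return list(self._by_key.get((tenant_id, trace_id), []))


index = LogReferenceIndex()
index.add(LogRecordReference("tenant-a", "shop", "checkout", "t1", "ref-001"))
index.add(LogRecordReference("tenant-a", "shop", "payments", "t1", "ref-002"))

own_refs = index.find("tenant-a", "t1")
foreign_refs = index.find("tenant-b", "t1")
```

Only references are stored; the log bodies stay in the system of record, as described above.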
Metrics & SLO
Owns aggregated time-series (MetricSeries) and service-level objectives (SloDefinition). Rolls raw counters and histograms into queryable series, tracks error budgets, and detects SLO breaches that feed alerting and incidents.
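Error-budget tracking can be illustrated with the usual arithmetic: an SLO target of 99.9% leaves 0.1% of events as the budget, and the budget is consumed by the share of bad events. The sketch below assumes an event-based SLO; the `target` field and both function names are hypothetical, and the real SloDefinition may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloDefinition:
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def error_budget_consumed(slo: SloDefinition, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget used in the window (1.0 = fully spent)."""
    if total_events == 0:
        return 0.0
    allowed_bad = total_events * (1.0 - slo.target)
    if allowed_bad == 0:
        # A 100% target has no budget: any bad event exhausts it.
        return float("inf") if bad_events else 0.0
    return bad_events / allowed_bad


def is_breached(slo: SloDefinition, total_events: int, bad_events: int) -> bool:
    # Assumption: a breach is declared once the budget is fully consumed.
    return error_budget_consumed(slo, total_events, bad_events) >= 1.0


slo = SloDefinition(name="checkout-availability", target=0.999)
half_used = error_budget_consumed(slo, total_events=100_000, bad_events=50)
breached = is_breached(slo, total_events=100_000, bad_events=150)
```

With 100,000 events and a 99.9% target, roughly 100 bad events are allowed, so 50 bad events use about half the budget and 150 exceed it.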
Dashboards & Alerts
Owns reusable, multi-tenant DashboardDefinition views and AlertRule definitions. Dashboards visualise metrics, traces, cost, and quality; alert rules evaluate metrics and signals and raise triggers that can open incidents.
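As an illustration of an alert rule evaluating a metric, the sketch below models a threshold rule that triggers only when a number of consecutive samples violate the threshold. The field names (`comparator`, `threshold`, `for_samples`) are assumptions made for the example, not the actual AlertRule schema.

```python
import operator
from dataclasses import dataclass

_COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical threshold rule; fields are illustrative."""

    name: str
    tenant_id: str
    metric: str
    comparator: str       # one of the keys of _COMPARATORS
    threshold: float
    for_samples: int = 1  # consecutive violating samples required (>= 1)

    def evaluate(self, samples: list[float]) -> bool:
        """Return True when the most recent `for_samples` values all violate."""
        recent = samples[-self.for_samples:]
        if len(recent) < self.for_samples:
            return False  # not enough data to decide
        compare = _COMPARATORS[self.comparator]
        return all(compare(value, self.threshold) for value in recent)


rule = AlertRule(
    name="high-error-rate",
    tenant_id="tenant-a",
    metric="http.error_rate",
    comparator=">",
    threshold=0.05,
    for_samples=3,
)
sustained = rule.evaluate([0.01, 0.06, 0.07, 0.08])
flapping = rule.evaluate([0.06, 0.01, 0.07])
```

Requiring several consecutive violations is one common way to avoid opening incidents on a single noisy sample.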
Incidents
Owns the Incident lifecycle — open, analyse, escalate, mitigate, resolve — with full trace lineage. Incidents are the platform's unit of operational reaction and the bridge between detection (alerts, SLO breaches) and learning (feedback).
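The lifecycle above can be sketched as a small state machine. The state names follow the stages listed in the text, but the transition table is an assumption made for illustration; the real aggregate may permit other paths, such as reopening a resolved incident.

```python
from enum import Enum


class IncidentState(Enum):
    OPEN = "open"
    ANALYSING = "analysing"
    ESCALATED = "escalated"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"


# Assumed transition table, not taken from the platform's design.
ALLOWED_TRANSITIONS = {
    IncidentState.OPEN: {IncidentState.ANALYSING},
    IncidentState.ANALYSING: {IncidentState.ESCALATED, IncidentState.MITIGATED},
    IncidentState.ESCALATED: {IncidentState.MITIGATED},
    IncidentState.MITIGATED: {IncidentState.RESOLVED},
    IncidentState.RESOLVED: set(),
}


class Incident:
    def __init__(self, incident_id: str, tenant_id: str, trace_id: str) -> None:
        self.incident_id = incident_id
        self.tenant_id = tenant_id
        self.trace_id = trace_id  # keeps the lineage back to the triggering trace
        self.state = IncidentState.OPEN
        self.history = [IncidentState.OPEN]

    def transition(self, new_state: IncidentState) -> None:
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


incident = Incident("inc-1", "tenant-a", "t1")
incident.transition(IncidentState.ANALYSING)
incident.transition(IncidentState.MITIGATED)
incident.transition(IncidentState.RESOLVED)
```

Keeping the history on the aggregate gives each incident an auditable record of how it moved from detection to resolution.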
Feedback & Quality
Owns FeedbackItem and QualityScore aggregates. This is the learning context: it distils runtime signals, incidents, and human/agent feedback into durable feedback items, computes quality scores per project and artifact, and publishes them to the Knowledge Platform to close the improvement loop.
Cost
Owns CostSignal aggregates: per-tenant, per-project attribution of model, compute, and infrastructure cost, with anomaly detection. Provides the economic feedback the factory needs to optimise generation cost-effectively.
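The text does not say which anomaly-detection method the Cost context uses. Purely as an illustration, the sketch below uses a simple z-score test over a tenant's recent daily cost; the function name, the threshold of 3.0, and the sample figures are all assumptions.

```python
from statistics import mean, stdev


def is_cost_anomaly(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` sample standard
    deviations from the mean of `history` (a z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily cost for one tenant and project.
daily_cost = [10.0, 11.0, 9.0, 10.5, 9.5]
spike = is_cost_anomaly(daily_cost, latest=25.0)
normal = is_cost_anomaly(daily_cost, latest=10.8)
```

A z-score test is easy to reason about but sensitive to seasonality and outliers in the history, so a production detector would likely need something more robust.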
Context-to-Aggregate Map
| Bounded Context | Aggregate Roots | Primary Store | Key Events | Scaling Profile |
|---|---|---|---|---|
| Tracing | TraceRecord, TelemetryCorrelation | Application Insights | TraceRecorded | High-volume ingestion |
| Logs | LogRecordReference | Log Analytics | None (query-only) | High-volume ingestion |
| Metrics & SLO | MetricSeries, SloDefinition | Application Insights + Azure SQL | MetricAggregated, SloBreached | High-volume aggregation |
| Dashboards & Alerts | DashboardDefinition, AlertRule | Azure SQL / PostgreSQL | AlertTriggered | Transactional |
| Incidents | Incident | Azure SQL / PostgreSQL | IncidentOpened, IncidentResolved | Transactional |
| Feedback & Quality | FeedbackItem, QualityScore | Azure SQL / PostgreSQL + Blob | FeedbackItemCreated, QualityScoreComputed | Transactional |
| Cost | CostSignal | Azure SQL / PostgreSQL + Blob | CostAnomalyDetected | Transactional + batch |
Design Principles
- Ingestion separated from reaction. Tracing, Logs, and Metrics absorb factory-wide telemetry volume; Incidents, Feedback & Quality, and Cost run as transactional services that react to distilled signals. This separation is what lets the platform scale ingestion independently of the learning loop.
- Correlation is a first-class context. TelemetryCorrelation exists specifically to make traceId the universal join key across stores that are otherwise heterogeneous (App Insights, Log Analytics, SQL).
- Events are the only cross-context contract. Contexts never read each other's stores. All cross-context flow uses the canonical envelope (see Events).
- Multi-tenant by construction. Every aggregate carries tenantId; it is an isolation boundary in every store, query, dashboard, and alert.
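The principles that events are the only cross-context contract and that every message carries a tenantId can be sketched with a small in-memory publish/subscribe bus. The `EventBus` class, the plain-dict event shape, and the `type` key are assumptions for illustration; the event names come from the context map, and a real deployment would use a message broker rather than in-process calls.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for the event transport (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        # Enforce the multi-tenant rule at the boundary.
        if not event.get("tenantId"):
            raise ValueError("every event must carry a tenantId")
        for handler in self._handlers[event["type"]]:
            handler(event)


# The Incidents context reacts to events; it never queries the Metrics store.
opened_incidents: list[dict] = []


def open_incident(event: dict) -> None:
    opened_incidents.append(
        {
            "tenantId": event["tenantId"],
            "traceId": event["traceId"],
            "cause": event["type"],
        }
    )


bus = EventBus()
bus.subscribe("SloBreached", open_incident)
bus.subscribe("AlertTriggered", open_incident)
bus.publish({"type": "SloBreached", "tenantId": "tenant-a", "traceId": "t1"})
```

Because the consumer depends only on the event shape, the producing context can change its store or scaling profile without affecting it.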