Observability
Target Architecture — Final-State Design
Observability is built on Serilog + OpenTelemetry exporting to Application Insights, the same stack the factory uses everywhere. The Runtime & Cloud Platform plays two roles: it is itself observed (through its own services), and it is the conduit through which generated runtimes become observable to the rest of the factory.
The platform's job is to make running SaaS legible. It instruments its own services and standardizes telemetry from every generated workload, then forwards the operationally significant signals to the Observability & Feedback Platform where they drive SLOs, incidents, and the factory's learning loop.
Logs
- Structured logging via Serilog with the canonical envelope context (`tenantId`, `projectId`, `moduleId`, `traceId`, `correlationId`) bound to every log entry.
- Platform service logs and generated-workload logs flow to Application Insights; deployment step logs are also persisted to Azure Blob for long-term audit.
- Logs are tenant-scoped and queryable by `traceId` for end-to-end correlation.
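
The envelope binding described above can be illustrated with a small, language-neutral sketch. It uses Python's standard `logging` module purely to show the idea; it is not the Serilog API, and the helper names (`bind_envelope`, `JsonEnvelopeFormatter`), the logger name, and the sample values are assumptions made for illustration.

```python
import io
import json
import logging

# Envelope fields named in the list above.
ENVELOPE_FIELDS = ("tenantId", "projectId", "moduleId", "traceId", "correlationId")


class JsonEnvelopeFormatter(logging.Formatter):
    """Render each record as one JSON object that includes the envelope fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for name in ENVELOPE_FIELDS:
            entry[name] = getattr(record, name, None)
        return json.dumps(entry)


def bind_envelope(logger: logging.Logger, **envelope: str) -> logging.LoggerAdapter:
    """Return a logger that attaches the given envelope to every entry (hypothetical helper)."""
    missing = [name for name in ENVELOPE_FIELDS if name not in envelope]
    if missing:
        raise ValueError(f"envelope is missing: {missing}")
    return logging.LoggerAdapter(logger, extra=envelope)


def demo() -> dict:
    """Log one entry into an in-memory stream and return it as a parsed dict."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonEnvelopeFormatter())

    logger = logging.getLogger("runtime.deployment.demo")  # illustrative logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False

    log = bind_envelope(
        logger,
        tenantId="tenant-42",
        projectId="proj-7",
        moduleId="billing",
        traceId="trace-abc",
        correlationId="corr-123",
    )
    log.info("Deployment step %s completed", "health-gate")
    return json.loads(stream.getvalue())
```

Binding the envelope once, when the scope is created, keeps individual log calls free of correlation plumbing while every entry stays queryable by `traceId`.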
Metrics
- OpenTelemetry metrics exported to Application Insights, covering both platform operations and generated-runtime SLOs.
| Metric | Description |
|---|---|
| `runtime.deployment.duration` | Time from Requested to Completed/RolledBack |
| `runtime.deployment.rollback.rate` | Fraction of deployments that roll back |
| `runtime.health.status` | Healthy/Degraded/Unhealthy per service |
| `runtime.scaling.replicas` | Current vs desired replicas per service |
| `runtime.scaling.actions` | Scale-up/down actions per policy |
| `runtime.drift.open` | Open drift findings by severity |
| `runtime.drift.remediation.time` | Time to remediate a finding |
| `runtime.secret.rotation.age` | Age since last rotation per binding |
| Generated SaaS SLIs | Latency, throughput, error rate, saturation per generated component |
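
As a worked example of two of the definitions above, the sketch below derives `runtime.deployment.duration` and `runtime.deployment.rollback.rate` from a list of deployment records. The `Deployment` record shape and its field names are assumptions for illustration; only the metric definitions come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one deployment's lifecycle."""
    requested_at: datetime
    finished_at: datetime
    final_state: str  # "Completed" or "RolledBack"


def deployment_duration(deployment: Deployment) -> timedelta:
    """Time from Requested to Completed/RolledBack."""
    return deployment.finished_at - deployment.requested_at


def rollback_rate(deployments: Sequence[Deployment]) -> float:
    """Fraction of deployments that rolled back; 0.0 when there are none."""
    if not deployments:
        return 0.0
    rolled_back = sum(1 for d in deployments if d.final_state == "RolledBack")
    return rolled_back / len(deployments)


def sample_deployments() -> list:
    """Four illustrative deployments, one of which rolled back."""
    start = datetime(2024, 1, 1, 12, 0, 0)
    states = ["Completed", "Completed", "RolledBack", "Completed"]
    return [
        Deployment(start, start + timedelta(seconds=90), state)
        for state in states
    ]
```

With the sample data, one rollback out of four deployments gives a rollback rate of 0.25, and each deployment has a duration of 90 seconds.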
Traces
- Distributed tracing via OpenTelemetry; `traceId` propagates from the triggering command through every service, worker, and emitted event.
- A single trace spans provision → configure → deploy → health gate → promote → scale → drift, linking back through the DevOps delivery trace to the originating business intent.
Dashboards
- Runtime Center tiles (see UI) render live service inventory, health, deployments, scaling, and drift.
- Application Insights workbooks provide per-environment and per-tenant operational dashboards.
- Cross-platform SLO dashboards live in Observability & Feedback, fed by this platform's signals.
Alerts
| Alert | Condition | Routing |
|---|---|---|
| Deployment failure | `RuntimeDeploymentRolledBack` or Failed state | On-call + Runtime Center incident |
| Service unhealthy | `HealthCheckCompleted` = Unhealthy beyond threshold | On-call + Observability incident |
| Scaling violation | `ScalingPolicyViolated` (can't meet target) | On-call + capacity review |
| High-severity drift | `RuntimeDriftDetected` severity = High in prod | Governance + on-call |
| Secret rotation overdue | `runtime.secret.rotation.age` exceeds policy | Security + Governance |
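
The alert table can be read as a set of routing rules over incoming signals. The sketch below encodes those rows as a single function. The signal field names (`event`, `state`, `status`, `consecutiveUnhealthy`, `severity`, `stage`, `metric`, `value`) and both threshold constants are assumptions for illustration; the source states the conditions but not their concrete values.

```python
from typing import Optional

UNHEALTHY_THRESHOLD = 3      # hypothetical: alert after more than 3 consecutive Unhealthy checks
MAX_ROTATION_AGE_DAYS = 90   # hypothetical rotation policy


def _alert(name: str, *routing: str) -> dict:
    return {"alert": name, "routing": list(routing)}


def route_alert(signal: dict) -> Optional[dict]:
    """Return the matching alert and its routing, or None when no rule fires."""
    event = signal.get("event")

    if event == "RuntimeDeploymentRolledBack" or signal.get("state") == "Failed":
        return _alert("Deployment failure", "On-call", "Runtime Center incident")

    if (
        event == "HealthCheckCompleted"
        and signal.get("status") == "Unhealthy"
        and signal.get("consecutiveUnhealthy", 0) > UNHEALTHY_THRESHOLD
    ):
        return _alert("Service unhealthy", "On-call", "Observability incident")

    if event == "ScalingPolicyViolated":
        return _alert("Scaling violation", "On-call", "capacity review")

    if (
        event == "RuntimeDriftDetected"
        and signal.get("severity") == "High"
        and signal.get("stage") == "prod"
    ):
        return _alert("High-severity drift", "Governance", "on-call")

    if (
        signal.get("metric") == "runtime.secret.rotation.age"
        and signal.get("value", 0) > MAX_ROTATION_AGE_DAYS
    ):
        return _alert("Secret rotation overdue", "Security", "Governance")

    return None
```

For example, a `RuntimeDriftDetected` signal with severity `High` in `staging` returns `None`, because the drift rule applies only to `prod`.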
Required Dimensions
Every log, metric, trace, and alert carries the cross-cutting dimensions so signals are filterable and correlatable:
- `tenantId`, `projectId`, `moduleId`, `environmentId`, `serviceId`
- `traceId`, `correlationId`
- `stage` (dev/test/staging/prod), `region`, `computeTarget`
- `version` (image/deployment), `deploymentId`
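
One way to enforce the dimension requirement is to validate each signal before it is exported. The following is a minimal sketch under that assumption: the dimension names and the allowed `stage` values come from the list above, while the `validate_signal` helper and the idea of representing a signal as a flat dictionary are hypothetical.

```python
from typing import List

REQUIRED_DIMENSIONS = (
    "tenantId", "projectId", "moduleId", "environmentId", "serviceId",
    "traceId", "correlationId",
    "stage", "region", "computeTarget",
    "version", "deploymentId",
)
ALLOWED_STAGES = ("dev", "test", "staging", "prod")


def missing_dimensions(signal: dict) -> List[str]:
    """List required dimensions that are absent or empty on the signal."""
    return [name for name in REQUIRED_DIMENSIONS if not signal.get(name)]


def validate_signal(signal: dict) -> List[str]:
    """Return human-readable problems; an empty list means the signal is valid."""
    problems = [f"missing dimension: {name}" for name in missing_dimensions(signal)]
    stage = signal.get("stage")
    if stage and stage not in ALLOWED_STAGES:
        problems.append(f"unknown stage: {stage}")
    return problems


def sample_signal() -> dict:
    """A fully dimensioned signal with illustrative values."""
    return {
        "tenantId": "tenant-42", "projectId": "proj-7", "moduleId": "billing",
        "environmentId": "env-1", "serviceId": "svc-api",
        "traceId": "trace-abc", "correlationId": "corr-123",
        "stage": "prod", "region": "westeurope", "computeTarget": "containers",
        "version": "1.4.2", "deploymentId": "dep-9",
    }
```

Rejecting or flagging under-dimensioned signals at the source keeps downstream filters and correlations reliable, since a query on `tenantId` or `traceId` never silently skips records.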
Feeding Observability & Feedback
```mermaid
flowchart LR
    Gen["Generated SaaS Runtimes"] -->|"logs, metrics, traces"| RC["Runtime & Cloud Platform"]
    RC -->|"HealthCheckCompleted, RuntimeDriftDetected, ScalingPolicyApplied, RuntimeDeploymentCompleted"| Obs["Observability & Feedback"]
    Obs -->|"SLOs, alert rules, scaling targets"| RC
    Obs -->|"runtime signals + incidents"| Knowledge["Knowledge Platform"]
```
Generated runtimes emit standardized telemetry because they are produced from factory templates that wire in ConnectSoft.Extensions.Observability / Extensions.Telemetry / Extensions.Logging.Serilog and ConnectSoft.Extensions.Diagnostics.HealthChecks. The Runtime & Cloud Platform normalizes and forwards the operationally significant events as integration events, closing the loop: runtime behaviour becomes feedback that the factory attributes back to the context and decisions that produced each component.
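
The normalize-and-forward step can be sketched as a filter that passes only the operationally significant events and reshapes them into a common envelope. The four event names come from the flow diagram; the integration-event shape (`type`, `traceId`, `tenantId`, `payload`) and the helper names are assumptions for illustration, not the platform's actual contract.

```python
from typing import Iterable, List, Optional

# Events the diagram shows flowing to Observability & Feedback.
FORWARDED_EVENTS = frozenset({
    "HealthCheckCompleted",
    "RuntimeDriftDetected",
    "ScalingPolicyApplied",
    "RuntimeDeploymentCompleted",
})

# Fields lifted out of the raw record into the (hypothetical) envelope.
_ENVELOPE_KEYS = ("event", "traceId", "tenantId")


def to_integration_event(raw: dict) -> Optional[dict]:
    """Normalize one raw runtime event, or return None if it is not forwarded."""
    if raw.get("event") not in FORWARDED_EVENTS:
        return None
    return {
        "type": raw["event"],
        "traceId": raw.get("traceId"),
        "tenantId": raw.get("tenantId"),
        "payload": {k: v for k, v in raw.items() if k not in _ENVELOPE_KEYS},
    }


def forward(raw_events: Iterable[dict]) -> List[dict]:
    """Keep only forwardable events, in their original order."""
    normalized = (to_integration_event(raw) for raw in raw_events)
    return [event for event in normalized if event is not None]
```

Keeping `traceId` and `tenantId` at the top level of each forwarded event preserves the correlation needed to attribute runtime behaviour back to the decisions that produced a component.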