Observability

Target Architecture — Final-State Design

Observability is built on Serilog and OpenTelemetry, exporting to Application Insights, the same stack the factory uses everywhere. The Runtime & Cloud Platform is both an observed system (its own services emit telemetry) and the conduit through which generated runtimes become observable to the rest of the factory.

The platform's job is to make running SaaS legible. It instruments its own services and standardizes telemetry from every generated workload, then forwards the operationally significant signals to the Observability & Feedback Platform where they drive SLOs, incidents, and the factory's learning loop.

Logs

Structured logging via Serilog with the canonical envelope context (tenantId, projectId, moduleId, traceId, correlationId) bound to every log entry.
Platform service logs and generated-workload logs flow to Application Insights; deployment step logs are also persisted to Azure Blob Storage for long-term audit.
Logs are tenant-scoped and queryable by traceId for end-to-end correlation.
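
To illustrate what "bound to every log entry" means, here is a minimal sketch using Python's standard `logging` module. It is not the Serilog API and not the platform's implementation; the formatter, the `bind_envelope` helper, and all example values are hypothetical. Only the five envelope field names come from the text above.

```python
import io
import json
import logging

# Envelope fields named in the text; the values used below are made-up examples.
ENVELOPE_FIELDS = ("tenantId", "projectId", "moduleId", "traceId", "correlationId")


class JsonEnvelopeFormatter(logging.Formatter):
    """Render each record as one JSON object that always carries the envelope."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in ENVELOPE_FIELDS:
            # A missing field is written as null so gaps stay visible in queries.
            entry[field] = getattr(record, field, None)
        return json.dumps(entry)


def bind_envelope(logger: logging.Logger, **envelope: str) -> logging.LoggerAdapter:
    """Return a logger that attaches the given envelope to every entry (hypothetical helper)."""
    unknown = set(envelope) - set(ENVELOPE_FIELDS)
    if unknown:
        raise ValueError(f"unknown envelope fields: {sorted(unknown)}")
    return logging.LoggerAdapter(logger, envelope)


# Usage: write to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonEnvelopeFormatter())
base = logging.getLogger("runtime.deployments")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

log = bind_envelope(
    base,
    tenantId="tenant-a",
    projectId="proj-1",
    moduleId="billing",
    traceId="trace-123",
    correlationId="corr-9",
)
log.info("Deployment step %s finished", "health-gate")
print(stream.getvalue().strip())
```

Because the envelope is bound once to the logger rather than passed at each call site, a query on `traceId` can retrieve every entry written during the same operation.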

Metrics

OpenTelemetry metrics exported to Application Insights, covering both platform operations and generated-runtime SLOs.

| Metric | Description |
| --- | --- |
| `runtime.deployment.duration` | Time from `Requested` to `Completed`/`RolledBack` |
| `runtime.deployment.rollback.rate` | Fraction of deployments that roll back |
| `runtime.health.status` | Healthy/Degraded/Unhealthy per service |
| `runtime.scaling.replicas` | Current vs desired replicas per service |
| `runtime.scaling.actions` | Scale-up/down actions per policy |
| `runtime.drift.open` | Open drift findings by severity |
| `runtime.drift.remediation.time` | Time to remediate a finding |
| `runtime.secret.rotation.age` | Age since last rotation per binding |
| Generated SaaS SLIs | Latency, throughput, error rate, saturation per generated component |
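
As a worked example of how two of these metrics could be derived, the sketch below computes `runtime.deployment.duration` and `runtime.deployment.rollback.rate` from a list of deployment records. The `Deployment` record shape is hypothetical, and taking only finished deployments as the denominator of the rollback rate is an assumption; the source defines the metric only as the fraction of deployments that roll back.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Terminal states taken from the duration metric's description.
TERMINAL_STATES = {"Completed", "RolledBack"}


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one deployment's lifecycle."""
    requested_at: datetime
    finished_at: datetime
    final_state: str  # "Completed" or "RolledBack"


def deployment_duration_seconds(deployment: Deployment) -> float:
    """Seconds elapsed between the Requested state and the terminal state."""
    return (deployment.finished_at - deployment.requested_at).total_seconds()


def rollback_rate(deployments: list) -> float:
    """Fraction of finished deployments that ended in RolledBack (assumed denominator)."""
    finished = [d for d in deployments if d.final_state in TERMINAL_STATES]
    if not finished:
        return 0.0
    rolled_back = sum(1 for d in finished if d.final_state == "RolledBack")
    return rolled_back / len(finished)


# Usage with made-up data: four deployments, one of which rolled back.
start = datetime(2024, 1, 1, 12, 0, 0)
history = [
    Deployment(start, start + timedelta(seconds=90), "Completed"),
    Deployment(start, start + timedelta(seconds=120), "Completed"),
    Deployment(start, start + timedelta(seconds=45), "RolledBack"),
    Deployment(start, start + timedelta(seconds=100), "Completed"),
]
print(deployment_duration_seconds(history[0]), rollback_rate(history))
```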

Traces

Distributed tracing via OpenTelemetry; traceId propagates from the triggering command through every service, worker, and emitted event.
A single trace spans provision → configure → deploy → health gate → promote → scale → drift, linking back through the DevOps delivery trace to the originating business intent.

Dashboards

Runtime Center tiles (see UI) render live service inventory, health, deployments, scaling, and drift.
Application Insights workbooks provide per-environment and per-tenant operational dashboards.
Cross-platform SLO dashboards live in Observability & Feedback, fed by this platform's signals.

Alerts

| Alert | Condition | Routing |
| --- | --- | --- |
| Deployment failure | `RuntimeDeploymentRolledBack` or `Failed` state | On-call + Runtime Center incident |
| Service unhealthy | `HealthCheckCompleted` = Unhealthy beyond threshold | On-call + Observability incident |
| Scaling violation | `ScalingPolicyViolated` (can't meet target) | On-call + capacity review |
| High-severity drift | `RuntimeDriftDetected` severity = High in prod | Governance + on-call |
| Secret rotation overdue | `runtime.secret.rotation.age` exceeds policy | Security + Governance |
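
Two of these rules are simple enough to sketch as pure functions. The routing targets are copied from the table; the function names, the alert record shape, and the 90-day rotation policy in the usage example are assumptions, since the source does not state the policy value.

```python
from datetime import datetime, timedelta, timezone

# Routing targets copied from the alert table.
ALERT_ROUTING = {
    "Deployment failure": ("On-call", "Runtime Center incident"),
    "Service unhealthy": ("On-call", "Observability incident"),
    "Scaling violation": ("On-call", "capacity review"),
    "High-severity drift": ("Governance", "on-call"),
    "Secret rotation overdue": ("Security", "Governance"),
}


def secret_rotation_alert(last_rotated: datetime, now: datetime, max_age: timedelta):
    """Return an alert when the rotation age exceeds the policy, else None."""
    age = now - last_rotated
    if age <= max_age:
        return None
    name = "Secret rotation overdue"
    return {"alert": name, "ageDays": age.days, "routing": ALERT_ROUTING[name]}


def drift_alert(severity: str, stage: str):
    """Return an alert only for High-severity drift detected in prod, else None."""
    if severity == "High" and stage == "prod":
        name = "High-severity drift"
        return {"alert": name, "routing": ALERT_ROUTING[name]}
    return None


# Usage with a hypothetical 90-day rotation policy.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
policy = timedelta(days=90)
overdue = secret_rotation_alert(now - timedelta(days=100), now, policy)
recent = secret_rotation_alert(now - timedelta(days=30), now, policy)
print(overdue, recent, drift_alert("High", "prod"), drift_alert("High", "dev"))
```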

Required Dimensions

Every log, metric, trace, and alert carries the cross-cutting dimensions so signals are filterable and correlatable:

tenantId, projectId, moduleId, environmentId, serviceId
traceId, correlationId
stage (dev/test/staging/prod), region, computeTarget
version (image/deployment), deploymentId
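
One way to enforce this list is a check that runs before a signal is exported. The sketch below assumes a signal is represented as a flat dictionary, which the source does not specify; `missing_dimensions` and `validate_signal` are hypothetical names. The dimension names and stage values are the ones listed above.

```python
# Dimension names and stage values as listed in the text.
REQUIRED_DIMENSIONS = (
    "tenantId", "projectId", "moduleId", "environmentId", "serviceId",
    "traceId", "correlationId",
    "stage", "region", "computeTarget",
    "version", "deploymentId",
)
ALLOWED_STAGES = {"dev", "test", "staging", "prod"}


def missing_dimensions(signal: dict) -> list:
    """Names of required dimensions that are absent or empty, in declared order."""
    return [name for name in REQUIRED_DIMENSIONS if not signal.get(name)]


def validate_signal(signal: dict) -> list:
    """Return human-readable problems; an empty list means the signal is valid."""
    problems = [f"missing {name}" for name in missing_dimensions(signal)]
    stage = signal.get("stage")
    if stage and stage not in ALLOWED_STAGES:
        problems.append(f"invalid stage {stage!r}")
    return problems


# Usage with made-up values: one complete signal and one broken copy.
complete = {name: "example" for name in REQUIRED_DIMENSIONS}
complete["stage"] = "prod"
broken = dict(complete, stage="production", region="")
print(validate_signal(complete), validate_signal(broken))
```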

Feeding Observability & Feedback

flowchart LR
    Gen["Generated SaaS Runtimes"] -->|"logs, metrics, traces"| RC["Runtime & Cloud Platform"]
    RC -->|"HealthCheckCompleted, RuntimeDriftDetected, ScalingPolicyApplied, RuntimeDeploymentCompleted"| Obs["Observability & Feedback"]
    Obs -->|"SLOs, alert rules, scaling targets"| RC
    Obs -->|"runtime signals + incidents"| Knowledge["Knowledge Platform"]


Generated runtimes emit standardized telemetry because they are produced from factory templates that wire in ConnectSoft.Extensions.Observability / Extensions.Telemetry / Extensions.Logging.Serilog and ConnectSoft.Extensions.Diagnostics.HealthChecks. The Runtime & Cloud Platform normalizes and forwards the operationally significant events as integration events, closing the loop: runtime behaviour becomes feedback that the factory attributes back to the context and decisions that produced each component.
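
The normalize-and-forward step can be sketched as a filter plus a mapping. The four event names are the ones shown on the diagram edge into Observability & Feedback; treating that set as exhaustive is an assumption, as are the raw and integration event shapes and the `to_integration_event` name.

```python
# Events shown on the diagram edge into Observability & Feedback.
FORWARDED_EVENTS = frozenset({
    "HealthCheckCompleted",
    "RuntimeDriftDetected",
    "ScalingPolicyApplied",
    "RuntimeDeploymentCompleted",
})


def to_integration_event(raw: dict):
    """Normalize a raw runtime event, or return None if it is not forwarded."""
    if raw.get("type") not in FORWARDED_EVENTS:
        return None
    return {
        "eventType": raw["type"],
        # Correlation fields are copied unchanged so feedback can be traced
        # back to the context that produced the component.
        "tenantId": raw.get("tenantId"),
        "traceId": raw.get("traceId"),
        "payload": raw.get("payload", {}),
    }


# Usage with made-up events: only the operationally significant one passes.
raw_events = [
    {"type": "HealthCheckCompleted", "tenantId": "tenant-a", "traceId": "trace-123",
     "payload": {"status": "Degraded"}},
    {"type": "DebugHeartbeat", "tenantId": "tenant-a", "traceId": "trace-123"},
]
forwarded = [e for e in map(to_integration_event, raw_events) if e is not None]
print(forwarded)
```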