Generated SaaS Observability
Target Architecture — Final-State Design
This page describes the observability architecture generated into every SaaS Product. Instrumentation is generated from the standard stack (Serilog for structured logs, OpenTelemetry for metrics and traces, and Azure Application Insights as the backend), and dashboards and alerts are generated as artifacts. The product's signals feed the factory's Observability & Feedback platform.
Every generated product is observable by construction. Each service and worker emits structured logs, metrics, and distributed traces, correlated through the `traceId` and `correlationId` fields of the canonical event envelope. Generated dashboards and alerts make the running product legible to operators, and the same signals flow back to the factory to close the feedback loop.
Logs
- Structured logging via Serilog with a consistent JSON schema; no unstructured string logs.
- Log context: every entry carries the required dimensions (`tenantId`, `traceId`, `correlationId`, service, operation).
- Sinks: Application Insights / Azure Monitor in production; console in development. PII is redacted at the sink according to data-classification rules.
- Correlation: log scopes enrich entries with the ambient `traceId`, so logs join traces and metrics.
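The logging rules above can be sketched language-neutrally. This is not Serilog code: the hypothetical `log_scope` plays roughly the role of Serilog's `LogContext.PushProperty`, and the hypothetical `format_entry` stands in for a JSON formatter plus redaction at the sink. The `PII_PROPERTIES` set is an assumed placeholder; real rules would come from the product's data-classification configuration.

```python
import contextvars
import json
from contextlib import contextmanager

# Ambient log scope (sketch only; loosely analogous to Serilog's LogContext).
_scope = contextvars.ContextVar("log_scope", default={})

REQUIRED_DIMENSIONS = ("tenantId", "traceId", "correlationId", "service", "operation")
# Hypothetical data-classification rule: property names treated as PII.
PII_PROPERTIES = frozenset({"email", "phone", "ipAddress"})


@contextmanager
def log_scope(**props):
    """Push properties onto the ambient scope for the duration of the block."""
    token = _scope.set({**_scope.get(), **props})  # copy, never mutate the parent scope
    try:
        yield
    finally:
        _scope.reset(token)


def format_entry(level: str, message: str, **props) -> str:
    """Build one structured JSON entry: enrich from scope, enforce dimensions, redact PII."""
    entry = {**_scope.get(), **props}
    missing = [d for d in REQUIRED_DIMENSIONS if d not in entry]
    if missing:
        raise ValueError(f"log entry missing required dimensions: {missing}")
    for key in PII_PROPERTIES:
        if key in entry:
            entry[key] = "[REDACTED]"
    entry["level"] = level
    entry["message"] = message
    return json.dumps(entry, sort_keys=True)
```

Typical use wraps a request or message handler in one `log_scope(...)` so every entry written inside it inherits `traceId` and the other dimensions without repeating them at each call site.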
Metrics
| Metric category | Examples |
|---|---|
| Request | request rate, latency (p50/p95/p99), error rate per route + tenant |
| Messaging | events published/consumed, consumer lag, retry count, dead-letter count |
| Worker | messages processed, processing duration, failures per worker |
| Domain | tenants provisioned, subscriptions activated, notifications sent, reports generated |
| Resource | CPU, memory, connection-pool usage, cache hit ratio |
| Business | active tenants, active users, edition distribution, metered usage |
Metrics are emitted via OpenTelemetry instruments and tagged with the required dimensions below.
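To make the Request row concrete (latency p50/p95/p99 and error rate per route + tenant), here is an in-memory recorder. In the generated product, OpenTelemetry histogram instruments typically aggregate latencies into buckets and the backend computes percentiles; this sketch instead keeps raw samples and uses the nearest-rank method, purely to show what the numbers mean. `RequestMetrics` is a hypothetical name.

```python
import math
from collections import defaultdict


class RequestMetrics:
    """Hypothetical in-memory stand-in for per-route, per-tenant request instruments."""

    def __init__(self) -> None:
        self._latencies_ms = defaultdict(list)  # (route, tenantId) -> latency samples
        self._failures = defaultdict(int)        # (route, tenantId) -> failed request count

    def record(self, route: str, tenant_id: str, latency_ms: float, failed: bool = False) -> None:
        key = (route, tenant_id)
        self._latencies_ms[key].append(latency_ms)
        if failed:
            self._failures[key] += 1

    def percentile(self, route: str, tenant_id: str, p: int) -> float:
        """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
        samples = sorted(self._latencies_ms.get((route, tenant_id), []))
        if not samples:
            raise ValueError("no samples recorded for this route and tenant")
        rank = max(1, math.ceil(p * len(samples) / 100))  # integer math avoids float drift
        return samples[rank - 1]

    def error_rate(self, route: str, tenant_id: str) -> float:
        total = len(self._latencies_ms.get((route, tenant_id), []))
        return self._failures.get((route, tenant_id), 0) / total if total else 0.0
```

Keying every series by `(route, tenantId)` is what allows the same data to be sliced per tenant later; reads use `.get` so querying an unseen tenant does not create an empty series.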
Traces
- Distributed tracing via OpenTelemetry spans across the gateway, services, message consumers, and external calls.
- Trace propagation: `traceId` and `correlationId` flow from the inbound request through synchronous hops and into published events, so an asynchronous side effect (e.g. a notification) shares the originating trace.
- Span attributes include `tenantId`, `eventType`, operation, and outcome.
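In a real service the OpenTelemetry SDK's propagators extract and inject the W3C Trace Context `traceparent` header automatically; the sketch only makes the request-to-event data flow explicit. It assumes a hypothetical `EventEnvelope` subset with snake_case fields, a hypothetical `x-correlation-id` request header, and header keys already lowercased.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass, field

# W3C Trace Context `traceparent` (version 00 layout): version-traceid-parentid-flags.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> tuple[str, str]:
    """Return (trace_id, parent_span_id), rejecting forms the spec marks invalid."""
    m = _TRACEPARENT.match(header.strip())
    if (
        m is None
        or m.group(1) == "ff"        # version ff is forbidden
        or set(m.group(2)) == {"0"}  # all-zero trace-id is invalid
        or set(m.group(3)) == {"0"}  # all-zero parent-id is invalid
    ):
        raise ValueError(f"invalid traceparent: {header!r}")
    return m.group(2), m.group(3)


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical subset of the canonical event envelope."""
    event_type: str
    tenant_id: str
    trace_id: str
    correlation_id: str
    payload: dict = field(default_factory=dict)


def publish_from_request(headers: dict, event_type: str, tenant_id: str, payload: dict) -> EventEnvelope:
    """Carry the inbound request's trace identity into a published event."""
    trace_id, _parent_span = parse_traceparent(headers["traceparent"])
    # Reuse an inbound correlation id if present, else start a new one (assumption).
    correlation_id = headers.get("x-correlation-id") or secrets.token_hex(8)
    return EventEnvelope(event_type, tenant_id, trace_id, correlation_id, payload)


def consumer_traceparent(envelope: EventEnvelope) -> str:
    """Consumer side: a new span id inside the originating trace (sampled flag set)."""
    return f"00-{envelope.trace_id}-{secrets.token_hex(8)}-01"
```

Because the consumer keeps the envelope's `trace_id` and only mints a new span id, the notification handler's spans land in the same trace as the request that triggered it.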
Required dimensions
Every signal (log, metric, trace) carries these dimensions so the factory can correlate and slice consistently:
| Dimension | Source | Purpose |
|---|---|---|
| `tenantId` | token / envelope | Per-tenant slicing and isolation analysis |
| `traceId` | envelope / OpenTelemetry | End-to-end correlation |
| `correlationId` | envelope | Workflow/saga correlation |
| `service` / `moduleId` | runtime | Service attribution |
| `eventType` | envelope | Event-level analysis |
| `environment` | runtime | Environment separation (dev/staging/prod) |
| `version` | build | Release correlation |
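Where the Source column lists two origins, a resolution order is implied. A minimal sketch, assuming "A / B" means "prefer A, fall back to B" (the table does not state this), with plain dictionaries as hypothetical stand-ins for token claims, the envelope, runtime configuration, and build metadata. The tenant-mismatch check is likewise an illustrative assumption.

```python
from __future__ import annotations

from typing import Mapping, Optional


def _first(*candidates: Optional[str]) -> Optional[str]:
    """Return the first non-empty candidate (the assumed 'A / B' precedence)."""
    return next((c for c in candidates if c), None)


def resolve_dimensions(
    token: Mapping[str, str],
    envelope: Mapping[str, str],
    otel_trace_id: Optional[str],
    runtime: Mapping[str, str],
    build: Mapping[str, str],
) -> dict[str, str]:
    token_tenant, envelope_tenant = token.get("tenantId"), envelope.get("tenantId")
    if token_tenant and envelope_tenant and token_tenant != envelope_tenant:
        # Hypothetical isolation check: disagreeing tenants suggest cross-tenant leakage.
        raise ValueError("tenantId in token and envelope disagree")
    dims = {
        "tenantId": _first(token_tenant, envelope_tenant),
        "traceId": _first(envelope.get("traceId"), otel_trace_id),
        "correlationId": envelope.get("correlationId"),
        "service": _first(runtime.get("service"), runtime.get("moduleId")),
        "eventType": envelope.get("eventType"),
        "environment": runtime.get("environment"),
        "version": build.get("version"),
    }
    missing = sorted(k for k, v in dims.items() if not v)
    if missing:
        raise ValueError(f"signal is missing required dimensions: {missing}")
    return dims
```

Resolving all seven dimensions in one place, then attaching the same dictionary to log properties, metric tags, and span attributes, is one way to keep every signal sliceable the same way.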
Dashboards
Generated dashboards (from observability dashboard templates) give operators a ready-made view per product:
- Product Health — request rate, error rate, latency, dependency health, by service and tenant.
- Messaging & Workers — topic throughput, consumer lag, retries, dead-letter trends.
- Tenant Activity — active tenants/users, onboarding funnel, edition distribution.
- SaaS Spine — subscriptions, billing health, feature-flag rollout, notification delivery, report generation.
- Security — auth failures, authorization denials, rate-limit hits, audit volume.
These dashboards are the product-side counterpart to the factory's Observability & Feedback dashboards; product signals are exported upstream.
Alerts
| Alert | Condition | Action |
|---|---|---|
| High error rate | error rate > threshold over window, per service/tenant | Page on-call; link to trace |
| Latency SLO breach | p99 latency > SLO | Notify; autoscale evaluation |
| Consumer lag | Service Bus lag > threshold | Scale workers; investigate poison messages |
| Dead-letter spike | DLQ growth > threshold | Operator replay workflow |
| Provisioning failure | `TenantProvisioningTimedOut` / failure rate | Notify onboarding owner |
| Auth failure spike | auth failures > baseline | Security review |
| Resource saturation | CPU/memory/connection pool > threshold | Autoscale; capacity review |
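The "High error rate" row reads as a sliding-window rule evaluated per service and tenant. A minimal sketch, assuming a fixed time window, timestamps observed in non-decreasing order, in-memory state, and a hypothetical `min_requests` floor; in the generated product such rules would presumably be emitted as backend alert rules rather than hand-written code.

```python
from collections import deque


class ErrorRateAlert:
    """Sliding-window error-rate rule, evaluated per (service, tenantId) key."""

    def __init__(self, threshold: float, window_s: float, min_requests: int = 20) -> None:
        self.threshold = threshold        # e.g. 0.05 for a 5% error rate
        self.window_s = window_s          # window length in seconds
        self.min_requests = min_requests  # hypothetical floor: no paging on tiny samples
        self._events = {}                 # (service, tenantId) -> deque of (timestamp, failed)

    def _prune(self, q: deque, now: float) -> None:
        # Keep only observations inside the window (now - window_s, now].
        while q and q[0][0] <= now - self.window_s:
            q.popleft()

    def observe(self, service: str, tenant_id: str, ts: float, failed: bool) -> None:
        q = self._events.setdefault((service, tenant_id), deque())
        q.append((ts, failed))
        self._prune(q, ts)

    def firing(self, service: str, tenant_id: str, now: float) -> bool:
        q = self._events.get((service, tenant_id))
        if q is None:
            return False
        self._prune(q, now)
        if len(q) < self.min_requests:
            return False
        errors = sum(1 for _, failed in q if failed)
        return errors / len(q) > self.threshold
```

Evaluating per `(service, tenantId)` means one noisy tenant can page on its own without its errors being diluted across every other tenant's traffic.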
How observability contributes to the pillars
- Traceability: `traceId` and `correlationId` on every signal make any request reconstructable end to end and traceable back to factory intent.
- Reusability: instrumentation, dimensions, dashboards, and alerts are generated identically across products.
- Autonomy: agents generate the observability artifacts; SRE-style agents can act on alerts.
- Governance: security metrics and audit volume are observable; PII redaction enforces data classification.
- Observability: this is the observability pillar realized in the generated product and fed back to the factory.
- Multi-tenant scale: `tenantId` on every signal enables per-tenant SLOs, cost attribution, and noisy-neighbor detection.