# Workers
Target Architecture — Final-State Design
This page describes the final-state background workers of the Observability & Feedback Platform. Workers run as hosted services / Azure Functions, consume the canonical event envelope, and are idempotent on eventId.
The platform runs eight background workers that do the heavy, asynchronous work of turning raw telemetry into signals and feedback. Ingestion workers (TraceIngestionWorker, MetricAggregationWorker) handle factory-scale volume on Azure Functions / Container Apps with consumption scaling; reaction workers (AlertEvaluationWorker, IncidentAnalysisWorker, FeedbackCreationWorker, QualityScoreWorker, CostAnomalyWorker, TelemetryCorrelationWorker) run as scheduled or event-driven hosted services.
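The envelope contract can be pictured as a small immutable record. This is a minimal sketch under stated assumptions. The Python fields `event_id`, `tenant_id` and `trace_id` mirror the envelope's `eventId`, `tenantId` and `traceId`. `event_type`, `payload`, `derive_event`, `EVENT_NAMESPACE` and the UUIDv5 derivation are hypothetical.

The sketch shows one way a worker might build the envelope of an event it emits:

- The child event keeps the inbound `tenantId` and `traceId`.
- Its `eventId` is derived deterministically from the inbound one, so re-handling a re-delivered event emits the same child `eventId` rather than a new duplicate.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Mapping

# Hypothetical namespace for derived event ids (not defined by the platform).
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.invalid")


@dataclass(frozen=True)
class EventEnvelope:
    # event_id / tenant_id / trace_id mirror eventId, tenantId, traceId;
    # event_type and payload are hypothetical field names.
    event_id: str
    tenant_id: str
    trace_id: str
    event_type: str
    payload: Mapping[str, Any] = field(default_factory=dict)


def derive_event(
    inbound: EventEnvelope,
    event_type: str,
    payload: Mapping[str, Any],
    discriminator: str = "",
) -> EventEnvelope:
    """Build the envelope for an event emitted while handling `inbound`.

    The child keeps the inbound tenant and trace ids. Its event id is a UUIDv5
    over (inbound event id, event type, discriminator), so handling a
    re-delivered inbound event yields the same child id, which downstream
    dedup can drop. Use `discriminator` when one inbound event produces
    several children of the same type.
    """
    name = f"{inbound.event_id}:{event_type}:{discriminator}"
    return EventEnvelope(
        event_id=str(uuid.uuid5(EVENT_NAMESPACE, name)),
        tenant_id=inbound.tenant_id,
        trace_id=inbound.trace_id,
        event_type=event_type,
        payload=dict(payload),
    )
```

For example, handling `AlertTriggered` twice and emitting `IncidentOpened` each time produces two envelopes with the same `event_id`. A random UUID here would quietly defeat the idempotency guarantees described under Worker Design.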
## Worker Catalog
| Worker | Trigger | Purpose | Input | Output | Retry | Idempotency |
|---|---|---|---|---|---|---|
| TraceIngestionWorker | OTLP stream / queue | Normalise and project incoming spans into correlated TraceRecord views. | OTLP spans | TraceRecord, TraceRecorded | Exponential backoff; DLQ after 5 | Dedup on span + traceId; upsert by traceId |
| MetricAggregationWorker | Schedule (1m/5m/1h) | Roll raw counters/histograms into MetricSeries. | Raw metrics | MetricSeries, MetricAggregated | Re-run window on failure (idempotent) | Deterministic per (metric, window, key) |
| AlertEvaluationWorker | Schedule (per rule cadence) | Evaluate AlertRule conditions against metrics/signals. | MetricSeries, AlertRule | AlertTriggered | Retry next cadence | Dedup trigger per (ruleId, window) |
| IncidentAnalysisWorker | Event: AlertTriggered, SloBreached | Correlate triggers into incidents; enrich with trace lineage. | Triggers, correlation views | Incident, IncidentOpened | Backoff; DLQ after 5 | Dedup on (source, sourceId); idempotent key |
| FeedbackCreationWorker | Event: IncidentResolved, runtime signals | Distil incidents/signals into FeedbackItem. | Incidents, signals | FeedbackItem, FeedbackItemCreated | Backoff; DLQ after 5 | Idempotency key from sourceId |
| QualityScoreWorker | Schedule + event: FeedbackItemCreated | Recompute QualityScore per project/artifact. | Feedback, incidents, SLO status | QualityScore, QualityScoreComputed | Re-run (idempotent) | Deterministic per (project, window) |
| CostAnomalyWorker | Schedule (hourly/daily) | Attribute cost and detect anomalies. | Metering/usage, CostSignal | CostSignal, CostAnomalyDetected | Re-run window | Dedup anomaly per (category, day) |
| TelemetryCorrelationWorker | Event (any) | Stitch traces, logs, metrics, feedback into TelemetryCorrelation by traceId. | All platform events | TelemetryCorrelation snapshot | Backoff; DLQ after 5 | Upsert by traceId |
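Several Idempotency entries in the catalog are deterministic keys over a time window:

- (metric, window, key) for MetricAggregationWorker
- (ruleId, window) for AlertEvaluationWorker
- (project, window) for QualityScoreWorker

Below is a minimal sketch of such a key for the metric case. It assumes two things this page does not specify: windows are aligned to UTC epoch boundaries, and the key is a SHA-256 over the joined fields. `window_start` and `aggregation_key` are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Window sizes from the MetricAggregationWorker schedule (1m/5m/1h).
WINDOWS = {
    "1m": timedelta(minutes=1),
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, window: str) -> datetime:
    """Align a timezone-aware timestamp down to the start of its UTC window."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    ts = ts.astimezone(timezone.utc)  # same instant, same key, whatever the offset
    return ts - (ts - _EPOCH) % WINDOWS[window]


def aggregation_key(metric: str, window: str, ts: datetime, series_key: str) -> str:
    """Deterministic key per (metric, window, key).

    Every sample in the same window maps to the same key, and re-running the
    window recomputes it exactly, so the aggregate can be upserted rather than
    appended a second time.
    """
    start = window_start(ts, window).isoformat()
    raw = "|".join((metric, window, start, series_key))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Because the key depends only on its inputs, a failed window can simply be re-run, which is what the Retry column relies on for the scheduled workers.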
## Event-Flow Diagram
```mermaid
flowchart TB
OTLP["OTLP spans / metrics<br/>from Runtime & Agents"] --> TraceW["TraceIngestionWorker"]
OTLP --> MetricW["MetricAggregationWorker"]
TraceW -->|"TraceRecorded"| CorrW["TelemetryCorrelationWorker"]
MetricW -->|"MetricAggregated"| CorrW
MetricW -->|"series"| AlertW["AlertEvaluationWorker"]
MetricW -->|"series"| SloEval["SloService (breach eval)"]
AlertW -->|"AlertTriggered"| IncW["IncidentAnalysisWorker"]
SloEval -->|"SloBreached"| IncW
IncW -->|"IncidentOpened"| Store[("Incidents store")]
IncW -->|"IncidentResolved"| FbW["FeedbackCreationWorker"]
TraceW -->|"runtime signals"| FbW
FbW -->|"FeedbackItemCreated"| QualW["QualityScoreWorker"]
FbW -->|"FeedbackItemCreated"| KP["Knowledge Platform"]
MetricW -->|"usage"| CostW["CostAnomalyWorker"]
CostW -->|"CostAnomalyDetected"| FbW
QualW -->|"QualityScoreComputed"| KP
CorrW -->|"correlated views"| IncW
CorrW -->|"correlated views"| QualW
```
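The named event edges in the diagram can be read as a subscription table. The sketch below encodes them as data under two assumptions:

- The unlabelled data flows ("series", "usage", "runtime signals", "correlated views") are left out because the diagram does not name them as events.
- TelemetryCorrelationWorker is a wildcard subscriber, because the catalog lists its trigger as Event (any).

`EDGES`, `ROUTES`, `consumers_for` and `events_for` are hypothetical names, not a platform API.

```python
from collections import defaultdict

# (event type, destination) pairs for the named event edges in the diagram.
EDGES: list[tuple[str, str]] = [
    ("TraceRecorded", "TelemetryCorrelationWorker"),
    ("MetricAggregated", "TelemetryCorrelationWorker"),
    ("AlertTriggered", "IncidentAnalysisWorker"),
    ("SloBreached", "IncidentAnalysisWorker"),
    ("IncidentOpened", "Incidents store"),
    ("IncidentResolved", "FeedbackCreationWorker"),
    ("CostAnomalyDetected", "FeedbackCreationWorker"),
    ("FeedbackItemCreated", "QualityScoreWorker"),
    ("FeedbackItemCreated", "Knowledge Platform"),
    ("QualityScoreComputed", "Knowledge Platform"),
]

# The catalog lists TelemetryCorrelationWorker's trigger as "Event (any)".
WILDCARD_SUBSCRIBERS = ("TelemetryCorrelationWorker",)


def build_routes(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group destinations by event type, preserving diagram order."""
    routes: dict[str, list[str]] = defaultdict(list)
    for event_type, destination in edges:
        routes[event_type].append(destination)
    return dict(routes)


ROUTES = build_routes(EDGES)


def consumers_for(event_type: str) -> list[str]:
    """Every destination for an event type, wildcard subscribers included, without duplicates."""
    found = ROUTES.get(event_type, []) + list(WILDCARD_SUBSCRIBERS)
    return list(dict.fromkeys(found))


def events_for(destination: str) -> list[str]:
    """Event types a destination receives through explicit edges."""
    return sorted({event for event, dest in EDGES if dest == destination})
```

Keeping the routing as data makes it easy to check that the deployed subscriptions match the diagram. For example, FeedbackCreationWorker receives both `IncidentResolved` and `CostAnomalyDetected`.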
## Worker Design
- Idempotency: every worker deduplicates on the envelope `eventId` and computes a handler-specific idempotency key (e.g. `eventId` + handler name, or a deterministic key per aggregation window). Re-delivery never produces duplicate aggregates, incidents, or feedback.
- Retry and poison handling: event-driven workers use exponential backoff and move unprocessable messages to a dead-letter subqueue with the full envelope preserved for replay (see event envelope consumer rules). Scheduled workers simply re-run the window, which is safe because their output is deterministic.
- Tenant guard: every handler asserts `tenantId` from the envelope against the operation scope before touching a store.
- Trace propagation: workers continue the inbound `traceId` into their OTEL spans and Serilog context, so the platform's own processing is itself traceable.
- Scaling: `TraceIngestionWorker` and `MetricAggregationWorker` scale on queue depth / schedule concurrency; reaction workers scale on subscription backlog. See Deployment.
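The first three rules above compose into one consume path: tenant guard first, then dedup on the handler key, then bounded retries, then dead-lettering. This is a minimal sketch, not the platform's implementation. The following pieces are assumptions:

- `IdempotentConsumer`, `TenantMismatch` and `backoff_delay` are hypothetical names.
- The envelope is a plain mapping with `eventId` and `tenantId` keys.
- The backoff uses full jitter.
- Every exception is treated as retryable.
- In-memory collections stand in for the durable dedup store and the dead-letter subqueue.

A real worker would also need to record the idempotency key atomically with the handler's writes.

```python
import hashlib
import random
import time
from typing import Any, Callable, Mapping

Envelope = Mapping[str, Any]

MAX_ATTEMPTS = 5  # matches "DLQ after 5" in the worker catalog


class TenantMismatch(Exception):
    """Raised by the tenant guard when the envelope tenantId is out of scope."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


class IdempotentConsumer:
    """Hypothetical wrapper around one event handler (not a real SDK type)."""

    def __init__(
        self,
        handler_name: str,
        handle: Callable[[Envelope], None],
        scope_tenant_id: str,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.handler_name = handler_name
        self.handle = handle
        self.scope_tenant_id = scope_tenant_id
        self.sleep = sleep
        self.processed: set[str] = set()  # in-memory stand-in for a durable dedup store
        self.dead_letters: list[dict[str, Any]] = []  # stand-in for the dead-letter subqueue

    def idempotency_key(self, envelope: Envelope) -> str:
        # eventId + handler name, one of the key shapes described above.
        raw = f"{envelope['eventId']}:{self.handler_name}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def consume(self, envelope: Envelope) -> str:
        # Tenant guard: reject before any store is touched.
        if envelope["tenantId"] != self.scope_tenant_id:
            raise TenantMismatch(envelope["tenantId"])
        key = self.idempotency_key(envelope)
        if key in self.processed:
            return "duplicate"  # re-delivery: no side effects
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                self.handle(envelope)
            except Exception:
                # Simplification: every failure is treated as transient.
                if attempt < MAX_ATTEMPTS:
                    self.sleep(backoff_delay(attempt))
            else:
                self.processed.add(key)
                return "processed"
        # Poison message: keep the full envelope so it can be replayed later.
        self.dead_letters.append(dict(envelope))
        return "dead-lettered"
```

The `sleep` parameter is injectable so that tests and replays can skip real delays. The jitter spreads retries from many consumers that failed at the same moment.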