Workflows¶
Target Architecture — Final-State Design
This page describes the final-state operational workflows of the Integration Platform. Each workflow is event-driven, idempotent, tenant-scoped, and correlated by traceId so the full path from trigger to outcome is traceable in Observability.
The platform's behaviour is defined by four workflows: outbound webhook delivery (with retry and dead-lettering), the integration run lifecycle, credential rotation, and the cross-cutting failure / retry / poison handling that all the others feed into.
Webhook Delivery (Outbound) with Retry¶
When a factory event matches an active subscription, the WebhookDeliveryWorker delivers it to the external endpoint with a signed payload, retrying with exponential backoff and dead-lettering on exhaustion.
sequenceDiagram
participant Bus as Azure Service Bus
participant GW as WebhookGatewayService
participant W as WebhookDeliveryWorker
participant EP as External Endpoint
participant KV as Key Vault
Bus->>GW: factory event (e.g. DeploymentPromoted)
GW->>GW: match active subscriptions (tenant-scoped)
GW->>W: enqueue WebhookDelivery (Pending)
W->>KV: resolve signing secret (signingSecretRef)
KV-->>W: signing key (in-memory only)
W->>W: sign payload (HMAC)
W->>EP: POST payload + signature
alt 2xx response
EP-->>W: 200 OK
W->>GW: mark Delivered
W->>Bus: publish WebhookDelivered
else non-2xx / timeout
EP-->>W: 5xx / timeout
W->>GW: record DeliveryAttempt (failed)
W->>W: backoff (exponential)
Note over W: retry up to max attempts
W->>EP: POST (retry)
alt still failing after cap
W->>GW: mark DeadLettered + IntegrationFailure
W->>Bus: publish IntegrationFailed
end
end
Integration Run Lifecycle¶
Every outbound operation (open PR, send SMS, create charge) is an IntegrationRun that moves through a strict state machine. Inbound deliveries reuse the same failure/retry semantics.
stateDiagram-v2
[*] --> Pending
Pending --> Running : ExecuteIntegrationRun (connection Active + Healthy)
Running --> Completed : vendor 2xx / success
Running --> Failed : vendor error / timeout
Failed --> Running : IntegrationRetryWorker (transient, within cap)
Failed --> DeadLettered : poison or retry cap reached
Completed --> [*]
DeadLettered --> [*]
note right of Running
Each attempt recorded as RunAttempt;
traceId propagated to vendor call span
end note
note right of Failed
IntegrationFailure created and classified:
Transient / RateLimited / Auth / Validation / Poison
end note
State transitions are monotonic and each emits a canonical event: entering Completed emits IntegrationRunCompleted; entering Failed or DeadLettered emits IntegrationFailed. See Events.
Credential Rotation¶
Rotation is triggered on a schedule (rotation policy) or on demand via POST /integrations/credentials/rotate. The prior secret version stays active until the new version passes verification, so there is no window of broken connectivity.
sequenceDiagram
participant Sch as Scheduler / API
participant CR as CredentialRotationWorker
participant KV as Key Vault
participant Svc as Owning Integration Service
participant Bus as Azure Service Bus
Sch->>CR: RotateCredential (credentialId, reason)
CR->>KV: create new secret version
KV-->>CR: newVersionRef
CR->>CR: mark credential Rotating
CR->>Svc: verification IntegrationRun (test call with new version)
alt verification succeeds
Svc-->>CR: run Completed
CR->>KV: promote new version (deactivate prior)
CR->>CR: mark credential Active
CR->>Bus: publish CredentialRotated
else verification fails
Svc-->>CR: run Failed
CR->>KV: keep prior version active
CR->>CR: mark credential Active (prior)
CR->>Bus: publish IntegrationFailed (rotation)
end
Failure / Retry / Poison Handling¶
All failures across delivery, runs, and health probes converge on the IntegrationFailure aggregate, which is classified to decide the next action. The IntegrationRetryWorker re-executes transient failures within policy; poison messages are dead-lettered with the full envelope preserved for replay.
flowchart TB
Failure["IntegrationFailed event"] --> Classify{"Classify failure"}
Classify -->|Transient| Backoff["Backoff per RetryPolicy"]
Classify -->|RateLimited| Wait["Wait for rate window"]
Classify -->|Auth| Rotate["Trigger CredentialRotationWorker"]
Classify -->|Validation| Reject["Reject (no retry) + escalate"]
Classify -->|Poison| DLQ["Dead-letter + preserve envelope"]
Backoff --> Retry["IntegrationRetryWorker re-executes"]
Wait --> Retry
Retry -->|success| Ok["IntegrationRunCompleted + IntegrationFailureResolved"]
Retry -->|"cap reached"| DLQ
Rotate -->|rotated| Retry
DLQ --> Escalate["Escalate to Observability & Control Plane"]
Reject --> Escalate
Rules.
- Idempotent retries. Each retry is keyed by
failureId+ attempt counter; redelivery never double-executes a vendor side effect onceCompleted. - Classification-driven action. Only
TransientandRateLimitedfailures auto-retry;Authtriggers rotation;ValidationandPoisonnever silently retry. - Poison preservation. Dead-lettered messages keep the full canonical envelope so they can be replayed after a fix.
- Escalation is audited. Every escalation creates an
EscalationRecordand surfaces to Observability & Feedback and the Control Plane.