
Workflows

Target Architecture — Final-State Design

This page describes the final-state operational workflows of the Integration Platform. Each workflow is event-driven, idempotent, tenant-scoped, and correlated by traceId so the full path from trigger to outcome is traceable in Observability.
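The shared properties above (tenant scoping, traceId correlation, idempotency) can be made concrete with a small event envelope. This is a minimal sketch that assumes a Python representation. The names EventEnvelope, derive, tenant_id, trace_id and event_id are hypothetical and are not taken from the platform's Events contract. It shows one thing: every follow-on event inherits the tenant and traceId of the event that triggered it.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical canonical envelope; field names are illustrative only."""
    event_type: str   # e.g. "DeploymentPromoted", "IntegrationFailed"
    tenant_id: str    # every workflow is tenant-scoped
    trace_id: str     # correlates trigger -> outcome in Observability
    payload: dict
    # Unique per event, so consumers can deduplicate redeliveries.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def derive(parent: EventEnvelope, event_type: str, payload: dict) -> EventEnvelope:
    """Create a follow-on event that keeps the parent's tenant and traceId."""
    return EventEnvelope(event_type=event_type, tenant_id=parent.tenant_id,
                         trace_id=parent.trace_id, payload=payload)
```

For example, a WebhookDelivered event derived from a DeploymentPromoted trigger carries the same trace_id, which is what lets Observability stitch the whole path together.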

The platform's behaviour is defined by four workflows: outbound webhook delivery (with retry and dead-lettering), the integration run lifecycle, credential rotation, and the cross-cutting failure / retry / poison handling that the other three feed into.

Webhook Delivery (Outbound) with Retry

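When a factory event matches an active subscription, the WebhookDeliveryWorker signs the payload (HMAC) with the signing secret resolved from Key Vault and POSTs it to the external endpoint. A non-2xx response or timeout is retried with exponential backoff, and the delivery is dead-lettered once the maximum number of attempts is exhausted.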
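The signing step can be sketched with the standard library's hmac module. The source only says "HMAC"; the choice of SHA-256, hex encoding, and compact sorted-key JSON serialization are assumptions, and the function names sign_payload and verify_signature are hypothetical. The verify function shows what a receiving endpoint would typically do: recompute the signature over the exact body bytes and compare in constant time.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, signing_key: bytes) -> tuple[bytes, str]:
    """Serialize the payload and return (body, hex HMAC-SHA256 signature).

    Assumption: SHA-256 and hex encoding; the platform may use other choices.
    """
    # Deterministic serialization so the signed bytes are exactly the sent bytes.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    signature = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(body: bytes, signature: str, signing_key: bytes) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The signing key itself only ever exists in memory inside the worker, matching the "in-memory only" note in the diagram; only the body and signature leave the process.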

sequenceDiagram
    participant Bus as Azure Service Bus
    participant GW as WebhookGatewayService
    participant W as WebhookDeliveryWorker
    participant EP as External Endpoint
    participant KV as Key Vault

    Bus->>GW: factory event (e.g. DeploymentPromoted)
    GW->>GW: match active subscriptions (tenant-scoped)
    GW->>W: enqueue WebhookDelivery (Pending)
    W->>KV: resolve signing secret (signingSecretRef)
    KV-->>W: signing key (in-memory only)
    W->>W: sign payload (HMAC)
    W->>EP: POST payload + signature
    alt 2xx response
        EP-->>W: 200 OK
        W->>GW: mark Delivered
        W->>Bus: publish WebhookDelivered
    else non-2xx / timeout
        EP-->>W: 5xx / timeout
        W->>GW: record DeliveryAttempt (failed)
        W->>W: backoff (exponential)
        Note over W: retry up to max attempts
        W->>EP: POST (retry)
        alt still failing after cap
            W->>GW: mark DeadLettered + IntegrationFailure
            W->>Bus: publish IntegrationFailed
        end
    end
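The retry branch of the diagram reduces to a bounded loop with exponential backoff. The sketch below is an illustration under stated assumptions, not the WebhookDeliveryWorker itself: the defaults (5 attempts, 1 s base delay, 60 s cap) are hypothetical, and send and sleep are injected callables standing in for the HTTP POST and the timer so the logic can be tested without a network.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DeliveryOutcome:
    status: str                                    # "Delivered" or "DeadLettered"
    attempts: list = field(default_factory=list)   # one (attempt, result) per DeliveryAttempt


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2^(attempt - 1), clamped to cap (attempt is 1-based)."""
    return min(cap, base * (2 ** (attempt - 1)))


def deliver(send: Callable[[], int], sleep: Callable[[float], None],
            max_attempts: int = 5) -> DeliveryOutcome:
    """Retry non-2xx responses and timeouts; dead-letter when attempts run out."""
    outcome = DeliveryOutcome(status="Pending")
    for attempt in range(1, max_attempts + 1):
        try:
            status_code = send()
        except TimeoutError:
            status_code = None  # a timeout counts as a failed attempt
        result = status_code if status_code is not None else "timeout"
        outcome.attempts.append((attempt, result))
        if status_code is not None and 200 <= status_code < 300:
            outcome.status = "Delivered"   # worker would then publish WebhookDelivered
            return outcome
        if attempt < max_attempts:
            sleep(backoff_delay(attempt))  # no sleep after the final attempt
    # Cap reached: worker would record an IntegrationFailure and publish IntegrationFailed.
    outcome.status = "DeadLettered"
    return outcome
```

Production retry policies often add random jitter to the delay so that many failing deliveries do not retry in lockstep; it is left out here to keep the delays deterministic.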

Integration Run Lifecycle

Every outbound operation (open PR, send SMS, create charge) is an IntegrationRun that moves through a strict state machine. Inbound deliveries reuse the same failure/retry semantics.

stateDiagram-v2
    [*] --> Pending
    Pending --> Running : ExecuteIntegrationRun (connection Active + Healthy)
    Running --> Completed : vendor 2xx / success
    Running --> Failed : vendor error / timeout
    Failed --> Running : IntegrationRetryWorker (transient, within cap)
    Failed --> DeadLettered : poison or retry cap reached
    Completed --> [*]
    DeadLettered --> [*]

    note right of Running
        Each attempt recorded as RunAttempt;
        traceId propagated to vendor call span
    end note
    note right of Failed
        IntegrationFailure created and classified:
        Transient / RateLimited / Auth / Validation / Poison
    end note

Runs move only along the transitions shown: Completed and DeadLettered are terminal, and a Failed run re-enters Running only through the IntegrationRetryWorker. Entering Completed emits IntegrationRunCompleted; entering Failed or DeadLettered emits IntegrationFailed. See Events.
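The state diagram and its event rule can be encoded as a transition table. This is a minimal sketch; RunState, ALLOWED, EMITS and IntegrationRun are hypothetical names, and appending to an in-memory list stands in for publishing to Azure Service Bus. Any transition not drawn in the diagram is rejected.

```python
from enum import Enum


class RunState(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    DEAD_LETTERED = "DeadLettered"


# Edges from the state diagram; terminal states have no outgoing edges.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING},
    RunState.RUNNING: {RunState.COMPLETED, RunState.FAILED},
    RunState.FAILED: {RunState.RUNNING, RunState.DEAD_LETTERED},
    RunState.COMPLETED: set(),
    RunState.DEAD_LETTERED: set(),
}

# Canonical event emitted on entering a state; states not listed emit nothing.
EMITS = {
    RunState.COMPLETED: "IntegrationRunCompleted",
    RunState.FAILED: "IntegrationFailed",
    RunState.DEAD_LETTERED: "IntegrationFailed",
}


class IntegrationRun:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.state = RunState.PENDING
        self.events: list[str] = []  # stand-in for publishing to the bus

    def transition(self, target: RunState) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {target.value}")
        self.state = target
        event = EMITS.get(target)
        if event:
            self.events.append(event)
```

A run that fails once and then succeeds on retry emits IntegrationFailed followed by IntegrationRunCompleted; any further transition out of Completed raises.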

Credential Rotation

Rotation is triggered on a schedule (rotation policy) or on demand via POST /integrations/credentials/rotate. The prior secret version stays active until the new version passes verification, so there is no window of broken connectivity.

sequenceDiagram
    participant Sch as Scheduler / API
    participant CR as CredentialRotationWorker
    participant KV as Key Vault
    participant Svc as Owning Integration Service
    participant Bus as Azure Service Bus

    Sch->>CR: RotateCredential (credentialId, reason)
    CR->>KV: create new secret version
    KV-->>CR: newVersionRef
    CR->>CR: mark credential Rotating
    CR->>Svc: verification IntegrationRun (test call with new version)
    alt verification succeeds
        Svc-->>CR: run Completed
        CR->>KV: promote new version (deactivate prior)
        CR->>CR: mark credential Active
        CR->>Bus: publish CredentialRotated
    else verification fails
        Svc-->>CR: run Failed
        CR->>KV: keep prior version active
        CR->>CR: mark credential Active (prior)
        CR->>Bus: publish IntegrationFailed (rotation)
    end
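The verify-then-promote rule can be sketched as follows. FakeVault is an in-memory stand-in and is deliberately not the Azure Key Vault SDK; the verify callable stands in for the verification IntegrationRun; rotate and Credential are hypothetical names. The point it illustrates: the active pointer moves to the new version only after verification succeeds, so clients never see an unverified secret.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeVault:
    """In-memory stand-in for secret versions (not the Azure SDK)."""
    versions: list = field(default_factory=list)
    active: Optional[int] = None  # index of the version clients should use

    def create_version(self, value: str) -> int:
        self.versions.append(value)
        return len(self.versions) - 1


@dataclass
class Credential:
    credential_id: str
    status: str = "Active"


def rotate(cred: Credential, vault: FakeVault, new_value: str,
           verify: Callable[[str], bool]) -> str:
    """Return the event to publish: CredentialRotated or IntegrationFailed."""
    new_ref = vault.create_version(new_value)
    cred.status = "Rotating"
    # The prior version stays active while verification runs against the new one.
    if verify(vault.versions[new_ref]):
        vault.active = new_ref      # promote new version; prior is no longer used
        cred.status = "Active"
        return "CredentialRotated"
    cred.status = "Active"          # still Active, on the prior version
    return "IntegrationFailed"
```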

Failure / Retry / Poison Handling

All failures across delivery, runs, and health probes converge on the IntegrationFailure aggregate, which is classified to decide the next action. The IntegrationRetryWorker re-executes transient failures within policy; poison messages are dead-lettered with the full envelope preserved for replay.

flowchart TB
    Failure["IntegrationFailed event"] --> Classify{"Classify failure"}
    Classify -->|Transient| Backoff["Backoff per RetryPolicy"]
    Classify -->|RateLimited| Wait["Wait for rate window"]
    Classify -->|Auth| Rotate["Trigger CredentialRotationWorker"]
    Classify -->|Validation| Reject["Reject (no retry) + escalate"]
    Classify -->|Poison| DLQ["Dead-letter + preserve envelope"]

    Backoff --> Retry["IntegrationRetryWorker re-executes"]
    Wait --> Retry
    Retry -->|success| Ok["IntegrationRunCompleted + IntegrationFailureResolved"]
    Retry -->|"cap reached"| DLQ
    Rotate -->|rotated| Retry
    DLQ --> Escalate["Escalate to Observability & Control Plane"]
    Reject --> Escalate
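The flowchart's branching can be sketched as a pure function from failure class to next action. The FailureClass values come from the classification above, but the action strings are hypothetical. The classify_http mapping from vendor HTTP results is an assumption for illustration only, since this page does not define how classification is performed, and Poison cannot be derived from an HTTP status in this sketch.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    TRANSIENT = "Transient"
    RATE_LIMITED = "RateLimited"
    AUTH = "Auth"
    VALIDATION = "Validation"
    POISON = "Poison"


def classify_http(status: Optional[int]) -> FailureClass:
    """Assumed mapping from a vendor HTTP result (None means timeout)."""
    if status is None or status >= 500:
        return FailureClass.TRANSIENT      # timeout or server error
    if status == 429:
        return FailureClass.RATE_LIMITED
    if status in (401, 403):
        return FailureClass.AUTH
    return FailureClass.VALIDATION         # other 4xx: retrying will not help


def next_action(kind: FailureClass, attempts_made: int, max_attempts: int) -> str:
    """Map a classified failure to the flowchart's next step (hypothetical names)."""
    if kind is FailureClass.POISON:
        return "dead_letter"
    if kind is FailureClass.VALIDATION:
        return "reject_and_escalate"
    if kind is FailureClass.AUTH:
        return "rotate_credential"         # retry follows a successful rotation
    # Only Transient and RateLimited reach this point and auto-retry.
    if attempts_made >= max_attempts:
        return "dead_letter"               # retry cap reached
    if kind is FailureClass.TRANSIENT:
        return "backoff_then_retry"
    return "wait_for_rate_window"
```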

Rules.

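  • Idempotent retries. Each retry attempt is keyed by failureId + attempt counter, so a redelivered retry message is recognized as a duplicate rather than executed again; once a run is Completed, redelivery never re-executes its vendor side effect.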
  • Classification-driven action. Only Transient and RateLimited failures auto-retry; Auth triggers rotation; Validation and Poison never silently retry.
  • Poison preservation. Dead-lettered messages keep the full canonical envelope so they can be replayed after a fix.
  • Escalation is audited. Every escalation creates an EscalationRecord and surfaces to Observability & Feedback and the Control Plane.
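The idempotent-retries rule can be sketched with an executor that remembers results by key. This is a minimal illustration under stated assumptions: IdempotentExecutor is a hypothetical name, and the in-memory dict and set stand in for whatever durable store the platform uses. A real implementation would need the check-and-record step to be atomic in that store so two concurrent deliveries cannot both execute.

```python
from typing import Callable


class IdempotentExecutor:
    """Sketch of the idempotency rule; storage is in memory (assumption)."""

    def __init__(self) -> None:
        self.seen: dict[str, str] = {}      # "failureId:attempt" -> recorded result
        self.completed_runs: set[str] = set()

    def execute(self, run_id: str, failure_id: str, attempt: int,
                side_effect: Callable[[], str]) -> str:
        # A Completed run never re-executes its vendor side effect.
        if run_id in self.completed_runs:
            return "skipped: run already Completed"
        key = f"{failure_id}:{attempt}"
        if key in self.seen:
            return self.seen[key]           # redelivered message: replay recorded result
        result = side_effect()
        self.seen[key] = result
        if result == "Completed":
            self.completed_runs.add(run_id)
        return result
```

Redelivering the same (failureId, attempt) pair returns the recorded result without calling the vendor again, and any later message for a Completed run is skipped.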