Extension Roadmap

Target Architecture — Final-State Design

This roadmap describes how the Runtime & Cloud Platform extends beyond its nine core services without breaking its contracts. Every extension preserves the canonical event envelope, tenant isolation via RuntimeTenantBinding, Pulumi-declared infrastructure, and full traceability.

Principles

Contract stability — new capabilities are additive; existing events, APIs, and aggregates keep their meaning (breaking changes get a version suffix).
Everything as desired state — new infrastructure is Pulumi-declared and drift-reconciled; nothing is operated out of band.
Autonomy by default — extensions are control loops first, dashboards second: detect, decide, act, then surface to humans.
Tenant-safe — every extension respects the isolation model and quotas in RuntimeTenantBinding.
Trace-complete — new signals carry the cross-cutting dimensions and correlate to the same traceId.
Cloud-portable shape — Azure-first, but abstractions (ComputeTarget, RuntimeService) are kept provider-neutral so additional clouds can be added behind the same model.
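
The contract-stability and trace-complete principles can be illustrated with a minimal sketch. The envelope fields (`event_type`, `tenant_id`, `trace_id`, `payload`), the `.vN` suffix format, and the event names are assumptions made for illustration; the platform's actual canonical envelope may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical envelope shape; the real canonical envelope may differ."""
    event_type: str
    tenant_id: str
    trace_id: str
    payload: Mapping[str, Any] = field(default_factory=dict)


def versioned_type(base: str, version: int) -> str:
    """Version 1 keeps the bare name; breaking versions get a suffix.

    The ".vN" suffix format is an assumption made for this sketch.
    """
    if version < 1:
        raise ValueError("version must be >= 1")
    return base if version == 1 else f"{base}.v{version}"


def derive(parent: EventEnvelope, event_type: str,
           payload: Mapping[str, Any]) -> EventEnvelope:
    """Build a follow-up event that keeps the parent's tenant and trace."""
    return EventEnvelope(event_type, parent.tenant_id, parent.trace_id, dict(payload))
```

In this sketch an extension never mutates an existing event type: it either emits a new type or a suffixed version, and every derived signal inherits the tenant and trace identifiers of the signal that caused it.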

Future Services

Candidate Service	Purpose
`RuntimeCostOptimizationService`	Continuously right-size workloads and recommend/apply cost-saving scaling and SKU changes per tenant.
`RuntimeChaosService`	Inject controlled failures to validate resilience and the rollback/remediation loops.
`RuntimeTrafficManagementService`	Progressive delivery, traffic shaping, and multi-region routing (canary/blue-green at the edge).
`RuntimeComplianceService`	Continuously assert runtime compliance posture (residency, encryption, isolation) against Governance policy.
`RuntimeBackupRestoreService`	Tenant-aware backup, point-in-time restore, and DR orchestration across data stores.
`RuntimeCarbonService`	Surface and optimize the energy and carbon footprint of generated runtimes.
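
As one illustration of what right-sizing in a service such as `RuntimeCostOptimizationService` could involve, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)) and clamps the result to per-tenant bounds. The function name, parameters, and the idea that the bounds come from tenant quotas are assumptions, not a description of the planned service.

```python
import math
from statistics import mean
from typing import Sequence


def recommend_replicas(current: int, cpu_samples: Sequence[float], target: float,
                       min_replicas: int, max_replicas: int) -> int:
    """Recommend a replica count that would bring mean CPU utilization to `target`.

    cpu_samples and target are utilization fractions (0.5 means 50 %).
    min_replicas / max_replicas stand in for hypothetical tenant quota bounds.
    """
    if current < 1 or not cpu_samples or not 0 < target <= 1:
        raise ValueError("need current >= 1, at least one sample, and 0 < target <= 1")
    desired = math.ceil(current * mean(cpu_samples) / target)
    # Never recommend outside the tenant's allowed range.
    return max(min_replicas, min(max_replicas, desired))
```

A recommendation produced this way would still be only a proposal; whether it is applied automatically or routed for approval is a policy decision outside this sketch.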

Future Workers

Candidate Worker	Trigger	Purpose
`CostAnomalyWorker`	Timer + billing signals	Detect cost anomalies and propose remediation.
`CapacityForecastWorker`	Timer	Forecast capacity from usage trends and pre-provision.
`DisasterRecoveryWorker`	Failover signal	Promote a paired region and re-bind tenants.
`ComplianceDriftWorker`	Timer + policy change	Detect compliance drift distinct from config drift.
`CertificateRotationWorker`	Expiry schedule	Rotate TLS certificates alongside secret rotation.
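
The detection step of a worker such as `CostAnomalyWorker` could be as simple as a z-score check against recent history. This is a sketch under stated assumptions: the function name, the daily-cost input, and the default threshold of 3 standard deviations are illustrative choices, not the worker's specified behavior.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_cost_anomaly(history: Sequence[float], latest: float,
                    threshold: float = 3.0) -> bool:
    """Flag `latest` if it deviates from `history` by more than `threshold` sigmas."""
    if len(history) < 2:
        raise ValueError("need at least two historical data points")
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

A production detector would likely account for seasonality and planned scale events; the fixed threshold here only keeps the example short.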

Future APIs

POST /runtime/traffic-policies — declare progressive-delivery and routing policies.
POST /runtime/restore-points and POST /runtime/restores — backup and restore orchestration.
GET /runtime/cost/{environmentId} — per-environment cost and optimization recommendations.
GET /runtime/compliance/{environmentId} — runtime compliance posture.
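
To make the first endpoint more concrete, the sketch below builds a request body that a client might send to `POST /runtime/traffic-policies` for a canary rollout. The roadmap does not define the schema, so every field name (`environmentId`, `strategy`, `routes`, `target`, `weight`) is hypothetical.

```python
import json


def build_traffic_policy(environment_id: str, canary_weight: int) -> str:
    """Serialize a canary policy that splits traffic between two targets.

    All field names are hypothetical; the real API schema is not yet defined.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    body = {
        "environmentId": environment_id,
        "strategy": "canary",
        "routes": [
            {"target": "stable", "weight": 100 - canary_weight},
            {"target": "canary", "weight": canary_weight},
        ],
    }
    return json.dumps(body, sort_keys=True)
```

Validating that the weights always sum to 100 on the client side is a design choice of this sketch; the service would presumably enforce the same invariant.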

Marketplace Opportunities

The Marketplace Platform can offer runtime building blocks as reusable, governed assets:

Runtime topology blueprints — opinionated Pulumi stacks (e.g. "regulated multi-region SaaS", "cost-lean single-region") publishable and reusable across tenants.
Scaling policy packs — curated ScalingPolicy presets per workload archetype.
Health & drift policy packs — reusable health gates and drift-remediation policies.
Runtime add-ons — pre-integrated runtime capabilities (CDN, WAF, cache tiers) installable into an environment.
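
A scaling policy pack could be modeled as a set of named presets that are capped by the installing tenant's quota. In the sketch below the `ScalingPolicy` fields, the archetype names, the preset values, and the `tenant_max_replicas` quota are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical fields; the platform's ScalingPolicy may carry more."""
    min_replicas: int
    max_replicas: int
    target_cpu: float


# A hypothetical pack: one preset per workload archetype.
PACK = {
    "web-api": ScalingPolicy(min_replicas=2, max_replicas=20, target_cpu=0.6),
    "batch-worker": ScalingPolicy(min_replicas=0, max_replicas=50, target_cpu=0.8),
}


def install(archetype: str, tenant_max_replicas: int) -> ScalingPolicy:
    """Return the preset for `archetype`, capped by the tenant's replica quota."""
    preset = PACK[archetype]
    capped_max = min(preset.max_replicas, tenant_max_replicas)
    # Keep min <= max even when the quota is below the preset's minimum.
    capped_min = min(preset.min_replicas, capped_max)
    return replace(preset, min_replicas=capped_min, max_replicas=capped_max)
```

Capping at install time keeps the published preset unchanged for other tenants while ensuring the installed copy never exceeds the quota of the tenant that installs it.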

Agent Opportunities

The platform is a natural home for autonomous Agent Mesh operators that reason over runtime signals and act through the existing APIs:

Runtime SRE Agent — diagnoses incidents from health/drift/scaling signals and proposes or executes governed remediations.
Capacity Planner Agent — tunes scaling policies and node pools from forecast and SLO attainment.
Cost Optimizer Agent — proposes right-sizing and SKU changes, gated by Governance.
Resilience Agent — schedules chaos experiments and validates rollback/DR readiness.
Drift Remediation Agent — classifies drift findings and selects the safest corrective deployment.

Each agent consumes the platform's events as grounding (via the Knowledge Platform), proposes actions as governed commands, and feeds outcomes back to Observability & Feedback — extending the factory's self-improvement loop all the way into production operations.
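
The propose, govern, act, and feed-back cycle described above can be sketched as follows. The types, the example policy, and the in-memory feedback list are hypothetical stand-ins for the Governance and Observability & Feedback integrations, whose real interfaces this roadmap does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ProposedCommand:
    """A hypothetical governed command proposed by an agent."""
    agent: str
    action: str
    environment_id: str
    trace_id: str


def production_needs_human(cmd: ProposedCommand) -> Decision:
    """Example policy (an assumption): agents may not act on prod-* unattended."""
    if cmd.environment_id.startswith("prod-"):
        return Decision.REJECTED
    return Decision.APPROVED


def run_agent_step(cmd: ProposedCommand,
                   policy: Callable[[ProposedCommand], Decision],
                   execute: Callable[[ProposedCommand], None],
                   feedback: List[Tuple[str, str, str]]) -> Decision:
    """Gate the command through policy, execute if approved, record the outcome."""
    decision = policy(cmd)
    if decision is Decision.APPROVED:
        execute(cmd)
    # The outcome is recorded either way so rejections also feed the loop.
    feedback.append((cmd.trace_id, cmd.action, decision.value))
    return decision
```

For example, calling `run_agent_step` with a command targeting a `prod-` environment under `production_needs_human` records a rejection and never invokes `execute`, while the same command against a non-production environment is executed and recorded as approved.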