🧠 Cloud Provisioner Agent Specification¶
☁️ Core Purpose¶
The Cloud Provisioner Agent is the execution-layer enforcer of cloud resource provisioning within the ConnectSoft AI Software Factory. Its purpose is to translate cloud architecture plans into actual cloud infrastructure using Pulumi-based Infrastructure-as-Code (IaC).
It operates under orchestration (e.g., triggered by IaCCoordinator or EnvironmentSetupCoordinator) and receives high-level deployment topology, region selections, and environment overlays from the Cloud Architect Agent, along with secrets and compliance overlays from the Security Engineer Agent.
It ensures that all generated SaaS services, environments, and tenants have their required cloud foundation deployed, versioned, and trace-linked — across all supported regions and clouds.
🛠️ Role in the Platform¶
| Layer | Role |
|---|---|
| Architecture | Implements cloud-region-map, topology, replication, and failover plans |
| Infrastructure | Converts blueprints into deployed, running cloud resources |
| DevOps | Enables pipelines and services to target real endpoints (AKS, Key Vault, etc.) |
| Security | Respects and injects policy-enforced vaults, identity scopes, DNS constraints |
| Environment Execution | Brings up cloud environments per tenant, region, edition, or component need |
🌍 Phase Scope (Azure-First)¶
| Provisioned Resource | Azure Equivalent |
|---|---|
| Compute Cluster | Azure Kubernetes Service (AKS) |
| Secret Store | Azure Key Vault |
| Object Storage | Azure Blob Storage |
| DNS Zones / Entries | Azure DNS |
| Observability | Azure Monitor / Log Analytics |
| Identity (future) | Azure AD App Registrations, Managed Identities |
| Database Layer | Azure PostgreSQL / Azure SQL (optional in Phase 1) |
Each resource is:
- Provisioned via Pulumi
- Annotated with `trace_id`, `blueprint_id`, and `provisioned_by: cloud-provisioner-agent`
- Tagged for environment, region, and compliance scope
📌 Strategic Alignment with Cloud Architect Agent¶
| Cloud Architect Agent Defines | Cloud Provisioner Agent Does |
|---|---|
| `cloud-region-map.yaml` | Deploys to primary/secondary regions |
| `replication-strategy.yaml` | Provisions redundant or geo-replicated resources |
| `resource-compliance-tags` | Applies tags, identity scopes, and access limits |
| `zone-mapping` | Provisions zonal or multi-zone clusters if required |
| `dns-domain-map.yaml` | Allocates zone and creates records |
🔗 Execution Trigger (Orchestrated)¶
Triggered by:
- `IaCCoordinator` (e.g., on environment creation, blueprint activation, tenant onboarding)
- Orchestration events such as:
  - `CreateCloudResourcesForTenant`
  - `ProvisionEditionLevelInfra`
  - `UpdateRegionCapacity`
🧭 Platform Flow Placement¶
flowchart TD
A[Cloud Architect Agent]
B[Infrastructure Architect Agent]
C[Security Engineer Agent]
D[Cloud Provisioner Agent]
E[DevOps Engineer Agent]
F[Azure / Kubernetes Resources]
A --> D
B --> D
C --> D
D --> F
D --> E
🧩 Example Scenario¶
A new service is being deployed for Edition: EU-MultiTenant, requiring:
- AKS cluster in `westeurope`
- Azure DNS with `*.edition.connectsoft.io`
- Azure Key Vault with injected secrets for runtime
- Azure Blob for distributed file storage
- Resource tags: `env=staging`, `region=westeurope`, `edition=EU`
Cloud Architect defines it → Cloud Provisioner Agent renders Pulumi → provisions resources → emits resource map → DevOps Agent consumes outputs.
✅ Summary¶
The Cloud Provisioner Agent:
- ☁️ Turns infrastructure blueprints into real cloud infrastructure
- 🔁 Ensures region- and tenant-specific environments are provisioned
- 🔐 Integrates with secrets, DNS, identity, and observability layers
- 🧭 Respects architectural directives, trace constraints, and cloud policy overlays
- 📊 Emits trace-linked metadata for CI/CD, security, and observability agents
It is a foundational executor — bringing ConnectSoft’s autonomous SaaS deployment vision into physical cloud reality.
📌 Core Responsibilities Overview¶
The Cloud Provisioner Agent is responsible for materializing infrastructure blueprints into provisioned cloud resources across environments and tenants, starting with Azure.
It receives cloud architecture plans, security overlays, and service bindings, and converts them into IaC-managed cloud infrastructure using Pulumi.
Its deliverables are traceable, version-controlled, validated, and aligned with orchestration plans.
🔧 Detailed Responsibilities¶
✅ 1. Pulumi Stack Generation¶
- Render complete stack files per environment, tenant, or blueprint scope:
  - `Pulumi.yaml` (project definition)
  - `Pulumi.{stack}.yaml` (stack config)
  - `index.ts` / `main.cs` (actual resource logic)
- Structure output for GitOps and CI integration
✅ 2. Infrastructure Provisioning¶
- Execute `pulumi up` to provision resources for:
  - AKS clusters
  - Azure Key Vault instances
  - Azure Blob Storage
  - Azure DNS zones and records
  - Azure Monitor / App Insights
  - Resource groups, virtual networks (if required)
- Tag all resources with:
trace_id: trace-789
provisioned_by: cloud-provisioner-agent
environment: staging
edition: EU
region: westeurope
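The tagging rule above can be sketched as a small helper. This is a minimal illustration rather than the agent's actual implementation: `TraceContext` and `buildResourceTags` are hypothetical names, and the tag keys are copied from the example above.

```typescript
// Hypothetical context type; field names are assumptions for illustration.
interface TraceContext {
  traceId: string;
  environment: string;
  edition: string;
  region: string;
}

// Builds the tag set that every provisioned resource is expected to carry.
function buildResourceTags(ctx: TraceContext): Record<string, string> {
  // Refuse to build tags without a trace identifier, so no resource goes untraced.
  if (ctx.traceId.trim() === "") {
    throw new Error("trace_id is required to tag provisioned resources");
  }
  return {
    trace_id: ctx.traceId,
    provisioned_by: "cloud-provisioner-agent",
    environment: ctx.environment,
    edition: ctx.edition,
    region: ctx.region,
  };
}

const exampleTags = buildResourceTags({
  traceId: "trace-789",
  environment: "staging",
  edition: "EU",
  region: "westeurope",
});
```

Building the tags in one place keeps every resource module from repeating (or forgetting) the trace fields.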
✅ 3. Multi-Region & Zonal Strategy Execution¶
- Use `cloud-region-map.yaml` and `replication-strategy.yaml` to:
  - Deploy primary and failover resource groups
  - Configure geo-redundant storage or DNS
  - Provision zonal AKS clusters with SLA-specific constraints
✅ 4. Secrets and Vault Binding¶
- Inject secrets provided by Security Engineer Agent into:
  - Key Vault creation and initial seeding
  - Pulumi `secretsProvider` block
- Emit structured `secret-bindings.json` for downstream agents
✅ 5. Output & Endpoint Metadata Generation¶
- Produce outputs consumed by:
  - DevOps pipelines (for endpoint injection)
  - API Gateways (for DNS mapping)
  - Observability Agent (for log collection and telemetry hooks)
- Example:
{
"aks_cluster_name": "aks-eu-staging-01",
"key_vault_uri": "https://vault-eu-staging.vault.azure.net/",
"storage_url": "https://blob-eu-staging.blob.core.windows.net",
"dns_record": "auth.europe.connectsoft.io"
}
✅ 6. Provisioning Validation & Status Emission¶
- After each deployment:
  - Run `pulumi preview` and compare before/after diffs
  - Emit:
    - `ResourcesProvisioned`
    - `ProvisioningFailed` (if any step fails)
    - Full provisioning log and event metadata
- Store deployment snapshot (`provisioning-output.json`) for trace audit
✅ 7. Traceability & Versioning¶
- All outputs must include:
  - `trace_id`
  - `execution_id`
  - `blueprint_id`
  - `stack_id`
  - `agent_origin: cloud-provisioner-agent`
- Commit stack files to Git (optional) under:
✅ 8. Collaborative Feedback Loop¶
- Respond to signals from:
  - Cloud Architect Agent: topology, region, SLA constraints
  - Security Engineer Agent: vaults, policy blocks, RBAC scopes
  - Infrastructure Architect Agent: service/infra overlays
- Provide infrastructure URIs and secrets back to:
  - DevOps Engineer Agent
  - Observability Agent
  - Platform Coordinator (if applicable)
📦 Summary of Primary Deliverables¶
| Artifact | Description |
|---|---|
| `Pulumi.yaml` | IaC project manifest |
| `Pulumi.dev.yaml` | Config file per environment |
| `index.ts` / `main.cs` | Stack logic with resources |
| `outputs.json` | Emitted resource map |
| `provisioning.log` | CLI summary of provisioning run |
| `provisioned-resources.json` | Full list of resource URIs, tags, metadata |
| `secret-bindings.json` | Used by DevOps and security agents |
| `ResourcesProvisioned` event | Traceable success notification |
✅ Summary¶
The Cloud Provisioner Agent is the infrastructure enabler of ConnectSoft’s AI Software Factory:
- 📦 Generates cloud-native, GitOps-ready IaC artifacts
- ☁️ Provisions Azure infrastructure with trace-linked metadata
- 🔐 Injects secrets, applies policies, and enforces topology plans
- 🔁 Feeds downstream agents with reliable, deployable endpoints
It bridges blueprint design with real cloud execution — securely and autonomously.
📥 Inputs¶
The Cloud Provisioner Agent requires a well-defined set of inputs from upstream agents and orchestration logic. These inputs guide:
- 📦 What to provision
- 🌍 Where to provision (region, zone, environment)
- 🔐 With what policies, secrets, naming conventions, and blueprint scope
All inputs are trace-bound, environment-aware, and cloud-specific.
🔑 Primary Inputs¶
1️⃣ cloud-region-map.yaml¶
Provided by: Cloud Architect Agent
Used to define target regions and zonal requirements.
2️⃣ replication-strategy.yaml¶
Provided by: Cloud Architect Agent
Determines whether to deploy mirrored resources across regions or zones.
3️⃣ environment-overlay.yaml¶
Provided by: Infrastructure Architect Agent
environment: staging
resource_prefix: cs-stg
resource_group: cs-stg-infra
tags:
  env: staging
  edition: EU
  trace_id: trace-789
Used to inject naming, tagging, and grouping conventions.
4️⃣ component-scope.yaml¶
Provided by: Orchestration (e.g., IaCCoordinator)
component: AuthService
execution_id: exec-auth-789
trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi
Connects infrastructure to the service’s lifecycle and audit trail.
5️⃣ infra-plan.yaml¶
Can be composed from blueprint fragments or pre-assembled.
resources:
  - type: aks
    size: standard
    node_count: 3
    k8s_version: "1.28"
  - type: keyvault
    policy: app-only
  - type: storage
    tier: standard
  - type: dns
    fqdn: auth.europe.connectsoft.io
Describes the desired resource topology.
6️⃣ secrets-metadata.yaml¶
Provided by: Security Engineer Agent
secrets:
  - name: AUTH_SECRET
    value_from: vault
    vault_ref: authservice-app-secret
    mount_strategy: env
Used to seed Azure Key Vault and inform DevOps pipeline injection.
7️⃣ resource-constraints.yaml (Optional)¶
Can come from orchestration or platform governance policies.
🧠 Internal Contextual Inputs (Resolved by Agent or Environment)¶
| Field | Description |
|---|---|
| `cloud_provider` | Default: `azure` (future: `aws`, `gcp`) |
| `pulumi_project` | Derived from `component_name` + `region` |
| `pulumi_stack_name` | e.g., `staging-eu-auth` |
| `resource_prefix` | e.g., `cs-stg-auth` |
🧪 Example Consolidated Input Context¶
trace_id: trace-auth-789
execution_id: exec-auth-789
component: AuthService
cloud_provider: azure
region: westeurope
environment: staging
resource_prefix: cs-stg-auth
infra_plan:
  aks: true
  keyvault: true
  dns: auth.europe.connectsoft.io
  storage: standard
secrets:
  - vault_ref: authservice-secret
📎 Input Validation Checklist¶
| Input | Validation |
|---|---|
| `trace_id`, `execution_id` | ✅ Required |
| `primary_region`, `resource_prefix` | ✅ Required |
| `infra_plan` | ✅ Must define at least 1 resource |
| `secrets` | ✅ Must map to valid vault refs |
| `replication` | 🟡 Optional, fallback to single-region mode |
| `environment` | ✅ Used for naming, tags, and stack configs |
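The checklist above could be enforced with a validator along these lines. This is a sketch under assumptions: `ProvisionerInput` and `validateInput` are hypothetical names, the `region` field of the consolidated example is treated as the primary region, and the list of known vault references is supplied by the caller.

```typescript
// Hypothetical subset of the consolidated input context.
interface ProvisionerInput {
  trace_id?: string;
  execution_id?: string;
  region?: string;
  resource_prefix?: string;
  environment?: string;
  infra_plan?: Record<string, unknown>;
  secrets?: { vault_ref?: string }[];
  replication?: Record<string, string>;
}

// Returns a list of validation errors; an empty list means the input passes.
function validateInput(input: ProvisionerInput, knownVaultRefs: string[]): string[] {
  const errors: string[] = [];
  const required = ["trace_id", "execution_id", "region", "resource_prefix", "environment"] as const;
  for (const field of required) {
    if (!input[field]) {
      errors.push(`${field} is required`);
    }
  }
  if (!input.infra_plan || Object.keys(input.infra_plan).length === 0) {
    errors.push("infra_plan must define at least 1 resource");
  }
  for (const secret of input.secrets ?? []) {
    if (!secret.vault_ref || knownVaultRefs.indexOf(secret.vault_ref) === -1) {
      errors.push(`secret vault_ref "${secret.vault_ref ?? ""}" is not a known vault reference`);
    }
  }
  // replication is optional: when absent, single-region mode is assumed.
  return errors;
}
```

Returning all errors at once, rather than throwing on the first, lets the orchestrator report every missing field in a single `ProvisioningFailed` diagnostic.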
🧠 Summary¶
The Cloud Provisioner Agent consumes a composite input model, made up of:
- Architecture inputs (regions, replication, DNS)
- Security overlays (vaults, mount strategies)
- Environment config (tags, naming, resource groups)
- Blueprint links (trace ID, component scope)
This enables the agent to deterministically generate and provision compliant, observable, cloud infrastructure.
📤 Output¶
The Cloud Provisioner Agent emits structured, traceable outputs in the form of:
- ✅ Pulumi stack files and project artifacts
- 📁 Provisioning logs and resource outputs
- 🔐 Secrets injection metadata
- 📡 Events and telemetry spans for orchestration and observability
- 💾 Cloud resource metadata for downstream agents (DevOps, Observability, Security)
✅ Primary Output Artifacts¶
1️⃣ Pulumi Stack Files¶
| File | Description |
|---|---|
| `Pulumi.yaml` | Pulumi project definition (name, runtime, description) |
| `Pulumi.{stack}.yaml` | Stack configuration file (region, secrets provider, tags) |
| `index.ts` / `main.cs` | Program logic to provision resources |
| `stack-output.json` | Output map of provisioned endpoints, IDs, URIs |
| `provisioning.log` | CLI logs from `pulumi up` or `preview` |
All files are committed (or staged) in Git at:
/infrastructure/stacks/{component}/{env}/{region}/
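The path convention above can be resolved with a small helper. This is a sketch: `stackPath` is a hypothetical name, and lowercasing the component is an assumption based on the `authservice` directory used in this document's examples.

```typescript
// Resolves the Git location for a stack from the path convention above.
function stackPath(component: string, env: string, region: string): string {
  const segments = [component, env, region].map((s) => s.trim().toLowerCase());
  // Reject empty or nested segments so the path always has exactly three levels.
  if (segments.some((s) => s === "" || s.indexOf("/") !== -1)) {
    throw new Error("component, env, and region must be non-empty and must not contain '/'");
  }
  return `/infrastructure/stacks/${segments.join("/")}/`;
}

const authStackPath = stackPath("AuthService", "staging", "westeurope");
```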
2️⃣ Cloud Resource Output Map¶
Emitted after successful provisioning:
{
"aks_cluster_name": "cs-stg-auth-aks01",
"aks_kubeconfig_secret": "vault://auth-kubeconfig",
"dns_record": "auth.eu.connectsoft.io",
"key_vault_uri": "https://vault-auth-stg.vault.azure.net/",
"storage_url": "https://csstgautheustorage.blob.core.windows.net"
}
Used by:
- DevOps Engineer Agent (for pipeline/environment injection)
- API Gateway Agent (for DNS mapping)
- Observability Agent (for logs/metrics setup)
3️⃣ Secrets Metadata Output¶
{
  "secrets": [
    {
      "name": "AUTH_SECRET",
      "source": "azure-keyvault",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}
Shared with:
- DevOps Agent for CI/CD env binding
- Security Agent for validation
- Test Agent for runtime test secrets (if allowed)
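One way to derive this output from the secrets input is sketched below. The names `SecretRequest`, `SecretBinding`, and `toSecretBindings` are hypothetical; the output shape follows the JSON example above, and the sketch assumes that only references are emitted, never secret values.

```typescript
// Hypothetical input shape, modeled on secrets-metadata.yaml.
interface SecretRequest {
  name: string;
  vault_ref: string;
  mount_strategy: string;
}

// Output entry, modeled on the secrets metadata example above.
interface SecretBinding {
  name: string;
  source: "azure-keyvault";
  vault_uri: string;
  mount_strategy: string;
}

// Maps requested secrets to binding metadata for a provisioned vault.
// Only names and locations are emitted; secret values never appear here.
function toSecretBindings(secrets: SecretRequest[], vaultUri: string): { secrets: SecretBinding[] } {
  return {
    secrets: secrets.map((s) => ({
      name: s.name,
      source: "azure-keyvault" as const,
      vault_uri: vaultUri,
      mount_strategy: s.mount_strategy,
    })),
  };
}
```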
4️⃣ Deployment Event¶
ResourcesProvisioned¶
{
"event": "ResourcesProvisioned",
"trace_id": "trace-auth-789",
"component": "AuthService",
"execution_id": "exec-auth-789",
"environment": "staging",
"region": "westeurope",
"stack_path": "infrastructure/stacks/authservice/staging/westeurope",
"resource_count": 6,
"timestamp": "2025-05-08T10:00:22Z"
}
Consumed by:
- Orchestration layer
- DevOps Agent
- Dashboards / Observability Agent
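A possible builder for this event is sketched below. `ExecutionContext` and `buildProvisioningEvent` are hypothetical names, and the shape of the `ProvisioningFailed` variant (a `diagnostics` string) is an assumption, since this document only shows the success payload.

```typescript
// Fields shared by both events, taken from the example payload above.
interface ExecutionContext {
  trace_id: string;
  component: string;
  execution_id: string;
  environment: string;
  region: string;
  stack_path: string;
}

type ProvisioningEvent =
  | (ExecutionContext & { event: "ResourcesProvisioned"; resource_count: number; timestamp: string })
  | (ExecutionContext & { event: "ProvisioningFailed"; diagnostics: string; timestamp: string });

// Emits ResourcesProvisioned only on success; otherwise ProvisioningFailed.
function buildProvisioningEvent(
  ctx: ExecutionContext,
  result: { succeeded: boolean; resourceCount: number; diagnostics?: string },
  now: Date
): ProvisioningEvent {
  // Drop milliseconds so the timestamp matches the format shown above.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  if (result.succeeded) {
    return { ...ctx, event: "ResourcesProvisioned", resource_count: result.resourceCount, timestamp };
  }
  return {
    ...ctx,
    event: "ProvisioningFailed",
    diagnostics: result.diagnostics ?? "no diagnostics captured",
    timestamp,
  };
}
```

Passing the clock in as `now` keeps the function deterministic, which makes the emitted event easy to test and replay.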
5️⃣ Pulumi Output Metadata (YAML / JSON)¶
Saved for audit trail:
{
  "project": "authservice-stack",
  "stack": "staging-westeurope",
  "provisioned_by": "cloud-provisioner-agent",
  "resources": [
    { "type": "azure-native:containerservice:ManagedCluster", "name": "aks-auth-eu" },
    { "type": "azure-native:storage:BlobContainer", "name": "authfiles" }
  ]
}
📦 Optional Outputs (Based on Context)¶
| Output | Condition |
|---|---|
| `manual-approval.yaml` | If sensitive resource or environment requires pre-deploy approval |
| `rollback-plan.json` | If preview identifies drift and fallback is enabled |
| `dns-map.yaml` | If multiple endpoints / subdomains need to be shared with API Gateway Agent |
🧠 Metadata Embedded in All Outputs¶
All output artifacts include:
trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
environment: staging
region: westeurope
📂 File Structure Convention (GitOps-Compatible)¶
/infrastructure/stacks/
└── authservice/
    └── staging/
        └── westeurope/
            ├── Pulumi.yaml
            ├── Pulumi.staging-westeurope.yaml
            ├── index.ts
            ├── provisioning.log
            ├── outputs.json
            └── secrets.json
✅ Summary¶
The Cloud Provisioner Agent outputs:
- 📦 All required Pulumi IaC assets
- 📡 Rich cloud output metadata
- 🔐 Fully structured secrets and mount plans
- 🧠 Traceable provisioning logs and telemetry
- 🛰️ Events and files consumed by downstream agents
Its outputs are the bridge between abstract cloud design and concrete deployable infrastructure.
📚 Knowledge Base¶
The Cloud Provisioner Agent has access to a versioned, cloud-specific knowledge base composed of:
- Reusable Pulumi module templates
- Region, naming, and topology policies
- Compliance and tagging requirements
- Deployment patterns per resource type
- Cloud-specific constraints and best practices
- Secret and identity handling strategies
This knowledge is used to generate secure, consistent, policy-aligned cloud infrastructure.
🧱 Core Knowledge Categories¶
1️⃣ Pulumi Template Modules¶
| Resource Type | Pulumi Module |
|---|---|
| AKS | modules/azure/aks-cluster.ts |
| Key Vault | modules/azure/keyvault.ts |
| Blob Storage | modules/azure/blob-storage.ts |
| DNS Zones & Records | modules/azure/dns-record.ts |
| Log Analytics | modules/azure/monitor-insights.ts |
| Virtual Network (Optional) | modules/azure/vnet.ts |
Each template:
- Supports parameter overrides
- Uses trace- and environment-aware naming
- Emits outputs to the final Pulumi stack
2️⃣ Naming Convention Rules¶
naming:
  resource_prefix: cs
  format: "{prefix}-{env}-{region}-{component}"
  allowed_chars: "[a-z0-9-]"
  max_length: 63
Used to auto-resolve cloud resource names (e.g., `cs-stg-weu-auth-aks`).
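The naming rules above can be applied with a resolver like the following sketch. `NamingPolicy` and `resolveResourceName` are hypothetical names; the abbreviations `stg` and `weu`, and passing `auth-aks` as the component to reproduce the example name, are assumptions.

```typescript
interface NamingPolicy {
  format: string;
  allowedChars: RegExp;
  maxLength: number;
}

// Mirrors the naming policy above; allowed_chars is expressed as a full-match regex.
const defaultNamingPolicy: NamingPolicy = {
  format: "{prefix}-{env}-{region}-{component}",
  allowedChars: /^[a-z0-9-]+$/,
  maxLength: 63,
};

// Fills the format placeholders, then enforces the character set and length limit.
function resolveResourceName(
  parts: Record<string, string>,
  policy: NamingPolicy = defaultNamingPolicy
): string {
  const name = policy.format.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = parts[key];
    if (value === undefined) {
      throw new Error(`missing naming part: ${key}`);
    }
    return value.toLowerCase();
  });
  if (!policy.allowedChars.test(name)) {
    throw new Error(`name contains disallowed characters: ${name}`);
  }
  if (name.length > policy.maxLength) {
    throw new Error(`name exceeds ${policy.maxLength} characters: ${name}`);
  }
  return name;
}

const aksName = resolveResourceName({ prefix: "cs", env: "stg", region: "weu", component: "auth-aks" });
```

Failing fast on a non-compliant name is deliberate here: silently truncating or rewriting a name would break the determinism the policy is meant to guarantee.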
3️⃣ Environment & Region Topology Definitions¶
regions:
  staging:
    default: westeurope
    failover: northeurope
    zones: [1, 2]
  production:
    default: francecentral
    failover: westeurope
Guides region-aware provisioning and DNS/FQDN mappings.
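As an illustration, the topology above could be consulted as follows to decide where to deploy. This is a sketch: `RegionTopology`, `regionTopology`, and `deploymentTargets` are hypothetical names, and the data is copied from the YAML above.

```typescript
interface RegionTopology {
  default: string;
  failover?: string;
  zones?: number[];
}

// In-memory copy of the region topology shown above.
const regionTopology: Record<string, RegionTopology> = {
  staging: { default: "westeurope", failover: "northeurope", zones: [1, 2] },
  production: { default: "francecentral", failover: "westeurope" },
};

// Returns the regions to deploy to: the default region, plus the failover
// region when replication is requested and a failover is defined.
function deploymentTargets(env: string, replicate: boolean): string[] {
  const topology = regionTopology[env];
  if (!topology) {
    throw new Error(`no region topology defined for environment: ${env}`);
  }
  return replicate && topology.failover ? [topology.default, topology.failover] : [topology.default];
}
```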
4️⃣ Cloud Resource Classifiers¶
| Class | Rule |
|---|---|
| `stateful` | Require backup or replication (e.g., blob storage) |
| `sensitive` | Must be deployed with key vault + tag `compliance=high` |
| `ephemeral` | Skippable during rollback or low-priority teardown |
Informs retry logic, vault strategy, and disaster recovery scope.
5️⃣ Secrets Mounting Strategies¶
Policy determines which strategy to apply by environment or resource type.
6️⃣ Tagging Policy Rules¶
All resources must be tagged with:
tags:
  trace_id: <REQUIRED>
  blueprint_id: <REQUIRED>
  environment: <REQUIRED>
  provisioned_by: cloud-provisioner-agent
Optionally:
7️⃣ Common Output Resolvers¶
Used to emit resource-specific outputs:
return {
aks_cluster_name: aks.name,
dns_record: dnsRecord.fqdn,
vault_uri: keyVault.vaultUri,
storage_url: storageAccount.primaryBlobEndpoint,
};
8️⃣ Blueprint Resource Map Inference¶
Pre-trained LLM (or lookup index) for:
| Blueprint Use Case | Expected Infra |
|---|---|
| `auth-service` | AKS, DNS, KV, Storage |
| `report-generator` | KV, Blob, LogAnalytics |
| `tenant-onboarding` | DNS zone, Storage, Key Vault, Managed Identity |
Used to auto-expand from use case → provisioning requirements when `infra_plan` is partial.
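The lookup-index variant of this expansion might look like the sketch below. `ResourceKind`, `infraMapIndex`, and `expandInfraPlan` are hypothetical names; the index content mirrors the table above, with Blob treated as the `storage` kind, which is an assumption.

```typescript
type ResourceKind = "aks" | "dns" | "keyvault" | "storage" | "loganalytics" | "managed-identity";

// Lookup index mirroring the use-case table above.
const infraMapIndex: Record<string, ResourceKind[]> = {
  "auth-service": ["aks", "dns", "keyvault", "storage"],
  "report-generator": ["keyvault", "storage", "loganalytics"],
  "tenant-onboarding": ["dns", "storage", "keyvault", "managed-identity"],
};

// Completes a partial plan with the resources expected for the use case.
// Explicitly requested resources keep their order; missing ones are appended.
function expandInfraPlan(useCase: string, partial: ResourceKind[]): ResourceKind[] {
  const expected = infraMapIndex[useCase] ?? [];
  const missing = expected.filter((kind) => partial.indexOf(kind) === -1);
  return partial.concat(missing);
}
```

An unknown use case leaves the plan unchanged, which matches the rule that the agent should not guess resources without a blueprint or default expansion rule.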
📦 Knowledge Source Locations¶
| Asset Type | Location |
|---|---|
| Templates | infrastructure/modules/azure/ |
| Naming/Tagging Rules | infrastructure/policies/naming.yaml |
| Region Topology | cloud-region-map.yaml |
| Secrets Policy | security/overlay-vault.yaml |
| Blueprints to Infra | knowledge/infra-map-index.json |
🧠 Summary¶
The Cloud Provisioner Agent leverages a rich and versioned knowledge base that includes:
- 🧱 Pulumi templates
- 🧭 Region-aware deployment rules
- 🔐 Secrets and vault strategies
- 📛 Naming and tagging policies
- 📦 Use-case to infrastructure mapping
This allows it to provision secure, consistent, and trace-aligned cloud environments — fully autonomously.
🔄 Process Flow¶
flowchart TD
A[Receive Input & Trace Metadata] --> B[Load Infra Plan + Region Map + Vault Info]
B --> C[Resolve Templates + Merge Overlays]
C --> D[Generate Pulumi Stack Files]
D --> E["Run pulumi preview (validate)"]
E --> F{Approved or Auto-Proceed?}
F -- Yes --> G["Run pulumi up (provision)"]
F -- No --> H[Emit PreviewOnly + Await Approval]
G --> I[Verify Resources + Outputs]
I --> J[Emit Outputs, Logs, and Event: ResourcesProvisioned]
🪜 Detailed Step Breakdown¶
✅ Step 1: Receive Execution Context¶
Triggered by orchestration (e.g., IaCCoordinator), receives:
- `trace_id`, `blueprint_id`, `execution_id`
- `infra_plan.yaml`
- `cloud-region-map.yaml`
- `secrets.yaml`
Ensures every run is traceable and bounded to a blueprint scope.
✅ Step 2: Load Knowledge & Merge Overlays¶
- Load:
  - Region topology and resource plan
  - Naming/tagging policy
  - Secrets strategy
- Merge:
  - Environment overlays (`resource_group`, `tags`, `replication`)
✅ Step 3: Template Rendering¶
- Select Pulumi templates (from library)
- Generate:
  - `Pulumi.yaml` (project)
  - `Pulumi.{stack}.yaml` (stack config)
  - `index.ts` / `main.cs` (logic with parameters)
✅ Step 4: Preview Mode (Validation)¶
- Run `pulumi preview`
- If success:
  - Proceed or emit `ProvisioningPreviewReady` event
- If failure:
  - Emit `ProvisioningFailed`
  - Optionally retry with fallback region or settings
✅ Step 5: Execution (Provisioning)¶
If approved or auto_proceed = true:
- Run `pulumi up`
- Track:
  - Resource count
  - Changed vs created
  - Outputs emitted
- Capture full provisioning log
✅ Step 6: Post-Provisioning Output¶
- Resolve:
  - Resource URIs (AKS, DNS, Key Vault, Blob, etc.)
  - DNS records
  - Vault URIs and bound secrets
- Validate:
  - Resource tags
  - Trace metadata
  - Naming format
✅ Step 7: Emit Artifacts¶
Output to Git or blob:
- `stack-output.json`
- `Pulumi.yaml`, `Pulumi.stack.yaml`
- `provisioning.log`
- `secret-bindings.json`
Emit:
- `ResourcesProvisioned` event
- OTEL spans
- Infra snapshot to DevOps and Observability agents
🧠 Embedded Trace Metadata in All Steps¶
Every file, log, and emitted event includes:
trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi
execution_id: exec-auth-789
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging
🔁 Alternate Flows¶
| Condition | Alternate Step |
|---|---|
| Preview failed | Retry with fallback region (if allowed) |
| Vault unreachable | Emit soft failure, mark secrets as pending |
| Manual approval required | Emit preview-only event and await signal |
| Output differs from last | Emit ProvisioningDriftDetected (future) |
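The approval gate and the first and third alternate flows above can be summarized as a single decision function. This is a sketch: `PreviewState`, `NextAction`, and `nextAction` are hypothetical names, and the action labels are illustrative rather than actual event names.

```typescript
type NextAction =
  | "run-pulumi-up"
  | "retry-in-fallback-region"
  | "emit-provisioning-failed"
  | "emit-preview-only-and-await-approval";

interface PreviewState {
  previewSucceeded: boolean;
  autoProceed: boolean;
  manualApprovalRequired: boolean;
  approved: boolean;
  fallbackRegion?: string;
}

// Decides what happens after `pulumi preview`, following the flow described above.
function nextAction(state: PreviewState): NextAction {
  if (!state.previewSucceeded) {
    // A failed preview may be retried only when a fallback region is allowed.
    return state.fallbackRegion ? "retry-in-fallback-region" : "emit-provisioning-failed";
  }
  if (state.manualApprovalRequired && !state.approved) {
    // Manual approval overrides auto-proceed and forces a wait state.
    return "emit-preview-only-and-await-approval";
  }
  if (state.approved || state.autoProceed) {
    return "run-pulumi-up";
  }
  return "emit-preview-only-and-await-approval";
}
```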
🧠 Summary¶
The Cloud Provisioner Agent’s process flow ensures:
- 🛠️ Deterministic rendering from blueprint + region
- ☁️ Secure and compliant provisioning via Pulumi
- 📡 Observable and versioned outputs
- 🔁 Safe retry and approval paths built into the flow
It is modular, policy-bound, traceable, and ready for multi-cloud scaling.
🧩 Skills and Kernel Functions¶
The Cloud Provisioner Agent uses a combination of:
- 📚 Semantic Kernel (SK) Skills — composable functions for planning, transformation, naming, and template expansion
- ⚙️ Domain-specific Pulumi SDK bindings — for real-time provisioning logic
- 🔁 Agent-local orchestration logic — to manage flows, validate diffs, and emit outputs
All skills operate with full trace context and are injected into an execution plan derived from blueprint and environment state.
🔧 Core Semantic Kernel Skills¶
| Skill | Purpose |
|---|---|
| `ResourcePlanResolverSkill` | Merges `infra_plan.yaml` + `region_map.yaml` + overlays |
| `PulumiTemplateSelectorSkill` | Chooses appropriate stack templates (AKS, KV, Blob, etc.) |
| `NamingResolverSkill` | Computes compliant resource names (tagged, traceable, cloud-safe) |
| `SecretsInjectionSkill` | Maps vault references to initial seed values in Pulumi |
| `PulumiRendererSkill` | Composes and writes `Pulumi.yaml`, stack files, and `index.ts` |
| `ProvisioningPreviewSkill` | Runs `pulumi preview` and captures diff / output |
| `PulumiExecutorSkill` | Executes `pulumi up` (if approved or auto-proceed) |
| `OutputFormatterSkill` | Extracts key URIs, secrets, and resource identifiers |
| `EventEmitterSkill` | Emits `ResourcesProvisioned`, `ProvisioningFailed`, etc. |
| `TelemetryTracerSkill` | Injects span metadata, logs, and OTEL context during execution |
🧠 AI-Augmented Kernel Functions¶
These skills may use LLM reasoning for complex planning or decision assistance:
| Function | Description |
|---|---|
| `StackNamingPlanner` | Suggests short, region-safe resource names (63-char limit, lowercase, etc.) |
| `TopologyExpander` | Infers additional resources from partial plans (e.g., DNS implied by AKS) |
| `RegionFallbackAdvisor` | Suggests next best region if primary is unavailable or quota-exceeded |
| `PolicyComplianceChecker` | Detects drift or missing tags in planned output before provisioning |
🔁 Execution Plan Sample (SK-Compatible)¶
steps:
  - use ResourcePlanResolverSkill
  - use PulumiTemplateSelectorSkill
  - use NamingResolverSkill
  - use PulumiRendererSkill
  - use ProvisioningPreviewSkill
  - if approved:
      - use PulumiExecutorSkill
      - use OutputFormatterSkill
      - use EventEmitterSkill(ResourcesProvisioned)
  - else:
      - use EventEmitterSkill(ProvisioningPreviewReady)
🔐 Policy + Secret Skill Integrations¶
- All secrets must flow through `SecretsInjectionSkill`
- If `vault_strategy = vault-agent-sidecar`, inject `sidecar.yaml` template into `index.ts`
- Role assignments (e.g., to services or pipelines) handled via `AccessPolicyComposerSkill` (planned)
📦 Skill Library Versions & Strategy¶
| Source | Versioning Strategy |
|---|---|
| `templates/pulumi/` | SemVer + cloud-provider scope |
| `skills/` | SK plugin folders with unit-tested prompt wrapping |
| `common/overlays/` | YAML-driven, updated per environment baseline |
🧠 Summary¶
The Cloud Provisioner Agent orchestrates its infrastructure logic using:
- 📚 Semantic Kernel skills to render and validate IaC
- 🤖 LLM-enhanced functions to reason about naming, region fallback, and policy compliance
- 🔁 Composable execution plans for traceable, safe, auditable provisioning
It ensures infrastructure is provisioned modularly, predictably, and with full context awareness, at any scale.
🧰 Core Technology Stack¶
| Layer | Technology | Purpose |
|---|---|---|
| Infrastructure as Code | Pulumi (TypeScript/.NET SDK) | Declarative cloud provisioning |
| Cloud Provider (Phase 1) | Azure | Target for AKS, Key Vault, Storage, DNS |
| Agent Execution Runtime | .NET 8 + Semantic Kernel | Agent host + skills engine |
| LLM/AI Reasoning | Azure OpenAI (GPT-4 Turbo) | Stack expansion, naming, fallback logic |
| Observability | OpenTelemetry SDK | Spans: cloud.provision.start, provision.success, provision.failed |
| Orchestration Interface | Internal Orchestrator API / Coordinator | Triggered via IaCCoordinator or env setup FSM |
| CLI Execution | `pulumi preview`, `pulumi up`, `pulumi destroy` | Infrastructure deployment lifecycle |
☁️ Pulumi Configuration Standards¶
- Project root: `/infrastructure/stacks/{component}/{env}/{region}/`
- Runtime: TypeScript by default (.NET support planned)
- Secrets provider: Azure Key Vault (`pulumi config set --secret ...`)
- Stack naming convention: `{env}-{region}-{component}` (e.g., `staging-westeurope-auth`)
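The stack naming convention and the CLI lifecycle can be combined as in the sketch below, which only builds argument lists and does not run anything. `stackName`, `previewArgs`, and `upArgs` are hypothetical helpers; `--stack`, `--diff`, `--yes`, and `--non-interactive` are standard Pulumi CLI flags, but the exact flag set the agent uses is an assumption.

```typescript
// Builds the stack name from the convention above: {env}-{region}-{component}.
function stackName(env: string, region: string, component: string): string {
  return [env, region, component].map((s) => s.trim().toLowerCase()).join("-");
}

// Arguments for a validation-only run with a rich diff.
function previewArgs(stack: string): string[] {
  return ["preview", "--diff", "--stack", stack, "--non-interactive"];
}

// Arguments for an unattended provisioning run; --yes skips the interactive
// confirmation, so this should only be used after approval or auto-proceed.
function upArgs(stack: string): string[] {
  return ["up", "--yes", "--stack", stack, "--non-interactive"];
}

const authStack = stackName("staging", "westeurope", "auth");
```

The resulting arrays can be handed to whatever process runner the agent host uses, with the project root above as the working directory.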
🌐 Azure Services Targeted¶
| Resource | Pulumi Module Used |
|---|---|
| AKS Cluster | @pulumi/azure-native.containerservice.ManagedCluster |
| Key Vault | @pulumi/azure-native.keyvault.Vault |
| DNS Zone & Record | @pulumi/azure-native.network.DnsZone |
| Blob Storage | @pulumi/azure-native.storage.StorageAccount |
| Log Analytics | @pulumi/azure-native.insights.* |
| Virtual Network (optional) | @pulumi/azure-native.network.VirtualNetwork |
All modules are used via centralized module templates (`modules/azure/*.ts`).
📦 GitOps & Storage Integration¶
| Use | Tool |
|---|---|
| IaC Versioning | Git repo under /infrastructure/stacks/... |
| Logs + output files | Azure Blob Storage (trace-tagged) |
| Secret injection | Azure Key Vault (shared or per-service) |
| Snapshot storage | outputs.json, provisioning.log uploaded to blob/archive path |
🔐 Security & Secrets Management¶
| Use Case | Tech |
|---|---|
| Vault Seeding | Pulumi secret config + ARM access policy |
| Sidecar Injection | Agent-side template renders vault-agent manifest (if required) |
| Identity Binding (future) | Managed Identity + App Registration support |
All secrets flow through `SecretsInjectionSkill` and conform to overlay policy.
📊 Observability Stack¶
| Tool | Role |
|---|---|
| OpenTelemetry | Emitted spans (start, preview, success/fail) with trace_id, execution_id |
| Grafana Dashboards | Metrics visualization (resource count, duration, error rate) |
| Azure Monitor | Logs from provisioning runs, validation failures |
| Structured Logs | Emitted via ILogger → forwarded to blob, Azure Monitor, or Loki |
🔁 Agent Trigger & Integration¶
| Interface | Trigger |
|---|---|
| Orchestrator API | POST /provision/stack |
| Git hook | PR or branch with infra-plan.yaml + trace_id |
| Manual CLI | dotnet run provision --trace trace-auth-789 |
🧪 Validation Tools¶
| Check | Tool |
|---|---|
| Template lint | eslint (with typescript-eslint) |
| Pulumi preview | pulumi preview --diff --stack ... |
| Cloud quota check | (planned) Azure Resource Graph query via SDK |
| Stack diff validation | Hash-based output delta + drift detection |
🧠 Summary¶
The Cloud Provisioner Agent’s tech stack enables:
- 🔁 Automated, GitOps-friendly infrastructure provisioning
- ☁️ Azure-native cloud resources delivered via Pulumi
- 📡 Observable, trace-tagged provisioning lifecycle
- 🔐 Secure secret and environment injection
- 🤖 AI-assisted planning, naming, and recovery
This stack ensures modular, reproducible, multi-environment infrastructure delivery with minimal manual ops.
📜 System Prompt¶
The System Prompt is a persistent LLM instruction that ensures the Cloud Provisioner Agent:
- Operates securely and deterministically
- Applies cloud infrastructure best practices
- Honors blueprint-level traceability and naming rules
- Executes provisioning within policy and region constraints
- Emits complete, structured outputs for downstream automation
It is injected on agent startup and used across Semantic Kernel planning, skills, and function chains.
📋 System Prompt (Full Text)¶
You are the Cloud Provisioner Agent in the ConnectSoft AI Software Factory.
Your responsibility is to generate and provision secure, traceable, environment-specific cloud infrastructure using Infrastructure-as-Code (IaC) — primarily Pulumi for Azure.
You consume structured inputs including:
- Blueprint ID, Trace ID, Execution ID (for traceability)
- Infra plan YAML defining AKS, Key Vault, Storage, DNS, etc.
- Region overlays and zone definitions (primary, failover, SLA)
- Security overlays including secrets and vault mappings
- Environment overlays such as resource groups and tags
You must:
- Select and render the appropriate Pulumi stack templates
- Merge overlays for environment, region, and secrets
- Apply naming conventions and tagging policies
- Validate the output using `pulumi preview`
- Provision cloud infrastructure only if preview is successful or auto-approve is true
- Emit outputs including stack files, provisioning log, and a full map of resource URIs and secrets
- Tag all resources with `trace_id`, `blueprint_id`, `environment`, and `agent_origin: cloud-provisioner-agent`
You must not:
- Provision anything if `trace_id` or `blueprint_id` is missing
- Use wildcard names, untagged resources, or unvalidated secrets
- Guess resource plans if `infra_plan` is missing (unless instructed by blueprint or default expansion rules)
All output must be deterministic, compliant with cloud-specific rules, and versioned for reproducibility.
Emit the `ResourcesProvisioned` event only if the stack completes successfully. Otherwise, emit `ProvisioningFailed` with diagnostics.
✅ Scope Imposed by Prompt¶
| Category | Constraint |
|---|---|
| Provisioning | Requires explicit trace, blueprint, and region |
| Secrets | Must match provided secrets-metadata.yaml or vault overlay |
| Naming | Uses enforced convention: {prefix}-{env}-{region}-{component} |
| Retry | Allowed only after preview or explicit fallback instruction |
| Emission | Event + structured output required for downstream agents |
🔐 Compliance Notes¶
- Resource tags enforced at provision time
- Regions validated against `cloud-region-map.yaml`
- Secret mount strategy must be explicit (`env`, `volume`, or `sidecar`)
- Naming must avoid uppercase, special characters, or disallowed suffixes
📦 Output Obligations per Prompt¶
- ✅ Pulumi stack files (YAML + TS)
- ✅ Resource output map (e.g., `outputs.json`)
- ✅ `provisioning.log`
- ✅ Secrets binding file (`secret-bindings.json`)
- ✅ `ResourcesProvisioned` or `ProvisioningFailed` event
🧠 Summary¶
The System Prompt ensures that the Cloud Provisioner Agent:
- 🧱 Renders reproducible, secure infrastructure
- 🔁 Follows environment-specific overlays and naming
- ☁️ Provisions only after successful preview
- 📡 Emits traceable events and outputs
- 🔒 Never bypasses security or compliance gates
It defines the agent as a safe, deterministic, cloud-native executor for all provisioned environments.
📥 Input Prompt Template¶
This template defines the structured YAML or JSON input passed to the Cloud Provisioner Agent when invoked by:
- The orchestration layer (e.g., `IaCCoordinator`)
- Environment setup workflows
- Manual invocations during sandbox or testing flows
It encapsulates all context needed to plan, generate, and provision the cloud resources for a specific service, region, and environment.
📋 YAML Input Prompt Template¶
trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
component: AuthService
agent_origin: orchestrator
cloud_provider: azure
environment: staging
region: westeurope
resource_group: cs-stg-rg-auth
resource_prefix: cs-stg-auth
auto_proceed: true
infra_plan:
  aks:
    node_count: 3
    k8s_version: "1.28"
  keyvault:
    policy: app-only
  blob_storage:
    tier: standard
  dns:
    fqdn: auth.europe.connectsoft.io
replication:
  storage: geo-redundant
  dns: failover
secrets:
  - name: AUTH_SECRET
    vault_ref: authservice-app-secret
    mount_strategy: env
tags:
  env: staging
  edition: EU
  provisioned_by: cloud-provisioner-agent
✅ Required Fields¶
| Field | Purpose |
|---|---|
| trace_id, execution_id, blueprint_id | Ensures full traceability |
| component, environment, region | Determines scope and naming |
| infra_plan | Describes what resources to provision |
| secrets[] | Vault references and mount strategy |
| tags | Mandatory for all provisioned resources |
| auto_proceed | Allows provisioning without human approval after preview |
🧪 Example JSON (API-Compatible Format)¶
{
  "trace_id": "trace-invoice-500",
  "execution_id": "exec-invoice-500-b",
  "component": "InvoiceService",
  "blueprint_id": "invoice-platform-v1",
  "environment": "production",
  "region": "francecentral",
  "cloud_provider": "azure",
  "auto_proceed": false,
  "infra_plan": {
    "dns": {
      "fqdn": "invoices.connectsoft.io"
    },
    "keyvault": {
      "policy": "rbac"
    },
    "blob_storage": {
      "tier": "hot"
    }
  },
  "secrets": [
    {
      "name": "INVOICE_SECRET",
      "vault_ref": "invoice-token",
      "mount_strategy": "vault-agent-sidecar"
    }
  ],
  "tags": {
    "env": "production",
    "trace_id": "trace-invoice-500"
  }
}
🔐 Input Validation Checklist¶
| Field | Required? | Notes |
|---|---|---|
| trace_id | ✅ | Mandatory; blocks execution if missing |
| infra_plan | ✅ | At least one resource must be specified |
| secrets.vault_ref | ✅ | Must match known vault alias or key |
| region | ✅ | Must be present in cloud-region-map.yaml |
| resource_prefix | 🟡 | Auto-resolved if not present |
| auto_proceed | 🟡 | Defaults to false (manual approval) |
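The checklist above could be enforced by a small pre-flight validator. The following is a minimal sketch rather than the agent's actual implementation: ProvisionInput, ValidationResult, and validateInput are hypothetical names, and the knownRegions array stands in for whatever cloud-region-map.yaml contains.

```typescript
// Hypothetical shape of the parsed input prompt (checklist fields only).
interface ProvisionInput {
  trace_id?: string;
  region?: string;
  resource_prefix?: string;
  auto_proceed?: boolean;
  infra_plan?: { [resource: string]: unknown };
  secrets?: { name: string; vault_ref?: string; mount_strategy?: string }[];
}

interface ValidationResult {
  errors: string[];     // blocking problems; execution must not start if non-empty
  autoProceed: boolean; // resolved value, false when the field is omitted
}

// knownRegions is an assumption: the regions listed in cloud-region-map.yaml.
function validateInput(input: ProvisionInput, knownRegions: string[]): ValidationResult {
  const errors: string[] = [];
  if (!input.trace_id) {
    errors.push("trace_id is required");
  }
  if (!input.infra_plan || Object.keys(input.infra_plan).length === 0) {
    errors.push("infra_plan must specify at least one resource");
  }
  if (!input.region || knownRegions.indexOf(input.region) === -1) {
    errors.push("region must be present in cloud-region-map.yaml");
  }
  for (const secret of input.secrets || []) {
    if (!secret.vault_ref) {
      errors.push("secret " + secret.name + " is missing vault_ref");
    }
  }
  return { errors: errors, autoProceed: input.auto_proceed === true };
}
```

Returning a list of errors instead of throwing on the first one lets the agent report every problem in a single ProvisioningFailed payload.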
📦 Optional Extended Fields¶
| Field | Description |
|---|---|
| dns_map_override | Override default DNS entries or inject zone hints |
| manual_approval_required | Forces wait state after preview |
| cost_estimate_required | Requests cost projection before execution |
| fallback_region | Allows automatic retry if region unavailable |
🧠 Summary¶
The Input Prompt Template enables the Cloud Provisioner Agent to:
- 📦 Receive infrastructure plans in a structured, deterministic way
- 🌍 Provision based on region, environment, and component scope
- 🔐 Respect secret policies and vault constraints
- 📊 Emit traceable outputs and telemetry linked to blueprint lineage
It guarantees that every provisioning run starts with a complete, auditable definition — with no guessing, and full policy alignment.
📤 Output Expectations¶
Every provisioning operation produces:
- 🧱 Infrastructure-as-Code artifacts (Pulumi stack files)
- 📄 Cloud resource output maps
- 📊 Provisioning logs
- 📡 Telemetry spans and events
- 🧩 Secrets binding metadata
- 💾 Snapshot files for GitOps and audit
These outputs are fully traceable to their blueprint and environment scope.
✅ Output Artifacts¶
1️⃣ Pulumi Stack Files¶
| File | Purpose |
|---|---|
| Pulumi.yaml | Defines the project and Pulumi runtime |
| Pulumi.{stack}.yaml | Contains config values, tags, and stack settings |
| index.ts or main.cs | Infrastructure logic (calls to Pulumi SDK) |
| Pulumi.lock.yaml | Pins dependency versions |
| outputs.json | Key URIs, resource IDs, endpoints, DNS, storage URLs |
All files are saved in the per-stack directory described under GitOps-Compatible File Structure.
2️⃣ Provisioning Log¶
[pulumi] Preview succeeded: 4 to create, 0 to change, 0 to delete
[pulumi] Updated resources:
- azure-native:containerservice:ManagedCluster
- azure-native:storage:BlobContainer
...
Saved as provisioning.log for debugging and audit purposes.
3️⃣ Resource Output Map (Structured JSON)¶
{
"aks_cluster_name": "aks-stg-auth",
"kubeconfig_vault_key": "vault://auth-kubeconfig",
"dns_record": "auth.stg.connectsoft.io",
"vault_uri": "https://vault-auth-stg.vault.azure.net",
"blob_url": "https://csstgauth.blob.core.windows.net"
}
Used by:
- DevOps Engineer Agent
- Observability Agent
- API Gateway Agent (DNS setup)
4️⃣ Secrets Metadata Output¶
{
  "secrets": [
    {
      "name": "AUTH_SECRET",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}
Sent to:
- DevOps pipelines (for runtime binding)
- QA/Test Agent (if permitted)
5️⃣ Provisioning Events¶
✅ ResourcesProvisioned¶
{
"event": "ResourcesProvisioned",
"trace_id": "trace-auth-789",
"component": "AuthService",
"stack": "staging-westeurope",
"resource_count": 5,
"region": "westeurope",
"status": "success",
"timestamp": "2025-05-08T12:35:42Z"
}
Sent to Orchestration, Audit Log, DevOps Agent, and dashboards.
❌ ProvisioningFailed¶
{
"event": "ProvisioningFailed",
"reason": "AKS quota exceeded in region",
"trace_id": "trace-auth-789",
"stack": "staging-westeurope"
}
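Downstream consumers can treat the two payloads above as a discriminated union keyed on the event field. This is a sketch under stated assumptions: the interfaces mirror only the fields shown in the samples, and provisioningFailed and describeEvent are hypothetical helper names.

```typescript
// Payload shapes inferred from the two sample events; real payloads may carry more fields.
interface ResourcesProvisionedEvent {
  event: "ResourcesProvisioned";
  trace_id: string;
  component: string;
  stack: string;
  resource_count: number;
  region: string;
  status: "success";
  timestamp: string;
}

interface ProvisioningFailedEvent {
  event: "ProvisioningFailed";
  reason: string;
  trace_id: string;
  stack: string;
}

type ProvisioningEvent = ResourcesProvisionedEvent | ProvisioningFailedEvent;

// Builds the failure payload; the stack name follows the {environment}-{region} samples.
function provisioningFailed(
  traceId: string,
  environment: string,
  region: string,
  reason: string
): ProvisioningFailedEvent {
  return {
    event: "ProvisioningFailed",
    reason: reason,
    trace_id: traceId,
    stack: environment + "-" + region,
  };
}

// Narrowing on the `event` discriminant gives type-safe access to each payload.
function describeEvent(e: ProvisioningEvent): string {
  if (e.event === "ResourcesProvisioned") {
    return e.stack + ": " + e.resource_count + " resources provisioned";
  }
  return e.stack + ": failed (" + e.reason + ")";
}
```

For example, describeEvent(provisioningFailed("trace-auth-789", "staging", "westeurope", "AKS quota exceeded in region")) yields "staging-westeurope: failed (AKS quota exceeded in region)".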
6️⃣ Trace & Audit Metadata¶
Included in all outputs:
trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging
7️⃣ Optional Outputs¶
| Output | When Produced |
|---|---|
| rollback-plan.json | If provisioning partially succeeds |
| dns-map.yaml | If multiple DNS records are generated |
| manual-approval.yaml | If agent requires confirmation before deploy |
| stack-diff.json | When provisioning results deviate from preview or prior state |
📂 GitOps-Compatible File Structure¶
/infrastructure/stacks/
└── authservice/
    └── staging/
        └── westeurope/
            ├── Pulumi.yaml
            ├── Pulumi.staging-westeurope.yaml
            ├── index.ts
            ├── outputs.json
            ├── provisioning.log
            ├── secret-bindings.json
            └── provisioning-event.json
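The layout above implies a simple mapping from component, environment, and region to paths and stack names. A minimal sketch, assuming the component folder is the lowercased component name and the stack is named {environment}-{region}, as in the sample tree (the function names are hypothetical):

```typescript
// Root folder taken from the sample tree above.
const STACKS_ROOT = "/infrastructure/stacks";

// Stack name convention inferred from "staging-westeurope".
function stackName(environment: string, region: string): string {
  return environment + "-" + region;
}

// Directory that holds every artifact for one component/environment/region.
function stackDirectory(component: string, environment: string, region: string): string {
  return [STACKS_ROOT, component.toLowerCase(), environment, region].join("/");
}

// Per-stack Pulumi config file, e.g. Pulumi.staging-westeurope.yaml.
function stackConfigFile(environment: string, region: string): string {
  return "Pulumi." + stackName(environment, region) + ".yaml";
}
```

Deriving all three values from the same inputs keeps the Git path, the Pulumi stack, and the emitted stack field aligned.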
📊 Telemetry Expectations¶
Emits OpenTelemetry spans:
| Span Name | Trigger |
|---|---|
| cloud.provision.start | When agent begins provisioning |
| cloud.provision.success | After successful pulumi up |
| cloud.provision.failed | On failure, with reason + trace_id |
| cloud.provision.preview | When preview is executed (even if not applied) |
🧠 Summary¶
The Cloud Provisioner Agent produces:
- 📦 Pulumi IaC artifacts (stack, project, config)
- 🌍 Cloud resource metadata (URIs, IDs, secrets)
- 📡 Events and telemetry for orchestration and observability
- 📁 Audit-ready logs and GitOps-compatible outputs
- 🔐 Secrets bindings for runtime injection
All outputs are trace-labeled, versionable, and consumable by downstream agents.
🧠 Memory¶
Memory enables the Cloud Provisioner Agent to:
- 📎 Maintain links between blueprints, environments, and actual provisioned resources
- 🔁 Avoid unnecessary reprovisioning (idempotency and state caching)
- 🔍 Detect drift or stack differences during preview
- 📊 Retain secrets metadata, resource tags, output maps, and history
- ⏮️ Enable re-entrancy in partially failed operations or rollback scenarios
🧠 Short-Term Memory (Execution Scope)¶
Stored in the Semantic Kernel context dictionary or in an ephemeral runtime cache.
| Key | Purpose |
|---|---|
| trace_id, execution_id | Carries traceability across steps |
| resolved_stack_name | Used to link Pulumi CLI actions to current operation |
| template_plan | Set of resolved Pulumi templates for the infra plan |
| rendered_files | In-memory representation of rendered files before disk write |
| secrets_map | Current secrets-to-vault mount plan |
| resource_counts | Expected number of resources before/after preview |
Cleared after each execution cycle unless retained in diagnostic mode.
💾 Long-Term Memory (Persistent)¶
Stored in Blob Storage, Azure Cosmos DB, or Git, depending on environment and tenant.
1️⃣ Stack History¶
{
"trace_id": "trace-auth-789",
"component": "AuthService",
"environment": "staging",
"region": "westeurope",
"stack": "staging-westeurope",
"last_modified": "2025-05-08T12:00:00Z",
"resource_count": 6,
"outputs_hash": "9ad7f3c1..."
}
Used to:
- Skip unchanged re-renders
- Compare preview diffs with previous state
- Enable safe re-runs of pulumi up
2️⃣ Output Map History¶
{
  "stack": "authservice-staging-weu",
  "outputs": {
    "aks": "aks-stg-auth-01",
    "dns": "auth.stg.connectsoft.io"
  },
  "revision": 3
}
Retained for:
- Audit logging
- DevOps injection into pipelines
- Rollback targeting
3️⃣ Secrets Injection History¶
{
"vault_uri": "https://vault-auth-stg.vault.azure.net/",
"secret_refs": ["AUTH_SECRET", "KUBECONFIG"],
"mount_strategy": "env"
}
Used to:
- Detect missing or updated secrets
- Enforce policy compliance
- Recreate mount strategies during retry or promotion
4️⃣ Provisioning Logs + Snapshots¶
| File | Purpose |
|---|---|
| provisioning.log | Stored with timestamp and trace reference |
| outputs.json | Full resource outputs |
| stack-diff.json | Diff from last run, for drift detection |
🗂️ Retention Strategy¶
| Memory Type | Retention Duration |
|---|---|
| Stack outputs & logs | 90 days (rotated monthly) |
| Secrets metadata | 30–90 days, depending on policy sensitivity |
| Provisioning diffs | 60 days minimum for audit |
| Successful ResourcesProvisioned events | Archived indefinitely in trace store |
🔐 Access Control¶
- Write access only by Cloud Provisioner Agent
- Read access by:
  - DevOps Agent (for output URI injection)
  - Observability Agent (for infra monitoring)
  - HumanOpsAgent (during rollback or review)
All memory objects include:
🔁 Replay Support (Future)¶
Memory structure enables:
- Provisioning replay with previous parameters
- Promotion-aware copy-to-environment (e.g., staging → prod)
- Drift-aware re-run with preview diff and policy check
🧠 Summary¶
The Cloud Provisioner Agent’s memory system supports:
- 🔁 Safe, idempotent infrastructure provisioning
- 📎 Trace-aware state retention for audits
- 🔍 Preview diff validation and rollback planning
- 📦 Secret history and vault injection tracking
This memory architecture makes cloud provisioning reproducible, compliant, and auditable by design.
🎯 Validation¶
Before any infrastructure is provisioned, the Cloud Provisioner Agent performs multi-layer validation to ensure:
- 🧱 Pulumi stack correctness
- 🔐 Secrets integrity and policy compliance
- 🧭 Region constraints and resource quota awareness
- 📛 Proper naming, tagging, and trace metadata
- 🛰️ Safe preview of changes with user or orchestrator confirmation
Validation protects cloud environments from drift, misconfiguration, overprovisioning, and policy violations.
✅ Validation Stages¶
1️⃣ Stack Structure Validation¶
| Check | Tool/Logic |
|---|---|
| Pulumi file completeness | All required files: Pulumi.yaml, stack file, logic file |
| YAML schema validity | Linter or custom validator (YAML + JSON schemas) |
| Supported resource types | Only whitelisted modules and cloud providers |
2️⃣ Naming & Tagging Enforcement¶
| Check | Enforcement Rule |
|---|---|
| Resource names | Must conform to naming.yaml pattern (e.g., {prefix}-{env}-{region}-{component}) |
| Max length | Enforced per Azure/GCP/AWS limits (e.g., AKS cluster ≤ 63 chars) |
| Required tags | Must include trace_id, blueprint_id, environment, agent_origin |
| Forbidden patterns | No capital letters, underscores, or reserved suffixes |
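The naming and tagging rules in this table can be checked with a few pure functions. This sketch assumes the {prefix}-{env}-{region}-{component} pattern and the 63-character AKS limit quoted above; the function names are hypothetical, and the reserved-suffix check is left out because the document does not list the suffixes.

```typescript
interface NameParts {
  prefix: string;
  env: string;
  region: string;
  component: string;
}

const MAX_AKS_NAME_LENGTH = 63;
const REQUIRED_TAGS = ["trace_id", "blueprint_id", "environment", "agent_origin"];

// Joins the parts with "-" and lowercases the result (lowercasing is an assumption).
function buildResourceName(parts: NameParts): string {
  return [parts.prefix, parts.env, parts.region, parts.component].join("-").toLowerCase();
}

// Returns every rule the name breaks; an empty array means the name is acceptable.
function nameViolations(name: string, maxLength: number = MAX_AKS_NAME_LENGTH): string[] {
  const violations: string[] = [];
  if (name.length > maxLength) {
    violations.push("name exceeds " + maxLength + " characters");
  }
  if (/[A-Z]/.test(name)) {
    violations.push("capital letters are not allowed");
  }
  if (/_/.test(name)) {
    violations.push("underscores are not allowed");
  }
  return violations;
}

// Lists required tags that are absent or empty.
function missingTags(tags: { [key: string]: string }): string[] {
  return REQUIRED_TAGS.filter((key) => !tags[key]);
}
```

Under these assumptions, buildResourceName({ prefix: "cs", env: "stg", region: "weu", component: "AuthService" }) gives "cs-stg-weu-authservice".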
3️⃣ Secrets Validation¶
| Check | Enforcement |
|---|---|
| vault_ref exists | Must map to declared vault secret in overlay or policy |
| mount_strategy valid | Must be env, volume, or vault-agent-sidecar |
| No plaintext secrets | All secrets must resolve to secure mount or Pulumi config secret |
| SecretsProvider setup | Pulumi stack must declare a secrets provider if secrets used |
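The first two checks in this table are easy to express directly. A minimal sketch, assuming the declared vault references arrive as a plain list from the overlay or policy (SecretRequest and secretViolations are hypothetical names):

```typescript
// The three strategies named in the table above.
const MOUNT_STRATEGIES = ["env", "volume", "vault-agent-sidecar"];

interface SecretRequest {
  name: string;
  vault_ref: string;
  mount_strategy: string;
}

// declaredVaultRefs is an assumption: vault secrets declared in the overlay or policy.
function secretViolations(secrets: SecretRequest[], declaredVaultRefs: string[]): string[] {
  const violations: string[] = [];
  for (const secret of secrets) {
    if (declaredVaultRefs.indexOf(secret.vault_ref) === -1) {
      violations.push(secret.name + ": vault_ref '" + secret.vault_ref + "' is not declared");
    }
    if (MOUNT_STRATEGIES.indexOf(secret.mount_strategy) === -1) {
      violations.push(secret.name + ": unknown mount_strategy '" + secret.mount_strategy + "'");
    }
  }
  return violations;
}
```

An unknown mount strategy is a non-retryable error in this specification, so any violation returned here should stop the run rather than trigger a retry.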
4️⃣ Region & Resource Constraints¶
| Constraint | Rule |
|---|---|
| Region availability | Must exist in cloud-region-map.yaml |
| Resource quota (planned) | Soft-check via Azure Resource Graph or Terraform provider |
| Failover support | replication-strategy.yaml must match resource plan (e.g., blob = geo-redundant) |
5️⃣ Pulumi Preview Validation¶
- Runs: pulumi preview
- Checks:
  - Resource count: how many to create/change/delete
  - Diff output: ensure no drift unless expected
  - Errors: quota exceeded, invalid config, unknown provider
If preview fails, emit ProvisioningFailed with error snapshot and halt.
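The resource-count check can be sketched as a parser over the summary line shown in the sample provisioning log ("4 to create, 0 to change, 0 to delete"). That line format comes from this document's example, not from guaranteed Pulumi CLI output, so treat the pattern and the helper names as assumptions.

```typescript
interface PreviewSummary {
  toCreate: number;
  toChange: number;
  toDelete: number;
}

// Extracts the three counters from a summary line, or returns null if the line does not match.
function parsePreviewSummary(line: string): PreviewSummary | null {
  const match = /(\d+) to create, (\d+) to change, (\d+) to delete/.exec(line);
  if (!match) {
    return null;
  }
  return {
    toCreate: parseInt(match[1], 10),
    toChange: parseInt(match[2], 10),
    toDelete: parseInt(match[3], 10),
  };
}

// True when the preview matches the counts the agent expected (resource_counts in short-term memory).
function matchesExpectation(actual: PreviewSummary, expected: PreviewSummary): boolean {
  return (
    actual.toCreate === expected.toCreate &&
    actual.toChange === expected.toChange &&
    actual.toDelete === expected.toDelete
  );
}
```

A mismatch between the parsed summary and the expected counts is the signal to treat the diff as unexpected drift and halt before pulumi up.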
6️⃣ Trace Context Validation¶
| Requirement | Behavior |
|---|---|
| trace_id missing | ❌ Block execution and emit validation error |
| execution_id missing | ❌ Halt; required for observability and log correlation |
| blueprint_id missing | ❌ Block; required for memory and audit linkage |
| component undefined | 🟡 Warn; infer the component from trace metadata when possible |
📄 Example Validation Error (emitted as JSON)¶
{
"event": "ProvisioningFailed",
"trace_id": "trace-auth-789",
"stack": "staging-westeurope",
"reason": "Missing secretsProvider in Pulumi config",
"severity": "high",
"stage": "validation",
"timestamp": "2025-05-08T12:48:01Z"
}
🛠️ Validation Summary Table¶
| Category | Must Pass | Tool Used |
|---|---|---|
| File completeness | ✅ | Internal check |
| YAML/TS syntax | ✅ | yamllint, ESLint (TSLint is deprecated), or schema validation |
| Stack preview pass | ✅ | pulumi preview |
| Tags present | ✅ | Tag policy engine |
| Vault mappings | ✅ | SecretsInjectionSkill |
| Naming constraints | ✅ | NamingResolverSkill |
🧠 Summary¶
The Cloud Provisioner Agent's validation system guarantees:
- 🧱 Well-formed IaC before deployment
- 🔐 Secure, compliant use of secrets
- 📛 Proper tagging and naming across cloud resources
- 📡 Preview visibility and error transparency before execution
- 🧾 Full trace and audit linkage to each provisioning action
Validation is the final gate before cloud infrastructure is deployed — ensuring that ConnectSoft remains secure, predictable, and compliant.
🔁 Retry & Correction Flow¶
Provisioning cloud infrastructure is inherently error-prone due to:
- 🛰️ Cloud-side quota errors or API delays
- 🔐 Vault/secret misconfigurations
- 🧱 YAML or resource plan issues
- 📦 Conflicting or drifted resource states
The Cloud Provisioner Agent must fail safely, retry deterministically, and never provision partial or insecure infrastructure.
✅ Retryable Error Categories¶
| Error Type | Action |
|---|---|
| Pulumi CLI transient failure | Retry up to 3x with exponential backoff |
| Azure API throttling (429) | Backoff and retry within cooldown window |
| DNS resolution delay (e.g., after zone creation) | Wait, re-query, retry binding |
| Vault unavailability | Retry after delay (up to policy-defined max attempts) |
| Stack lock present | Wait, reattempt pulumi up or notify coordinator |
❌ Non-Retryable / Escalation Errors¶
| Error | Action |
|---|---|
| Invalid infra_plan | Abort with ProvisioningFailed (schema or structure error) |
| Missing trace_id or blueprint_id | Hard stop (validation failure) |
| Secrets mount strategy unknown | Fail with clear error, await manual fix |
| Quota exceeded (non-transient) | Emit error, suggest fallback region, notify orchestrator |
🧪 Auto-Correction Strategies (Safe Fallbacks)¶
| Condition | Correction |
|---|---|
| Missing resource_prefix | Derive from component + env + region |
| Undeclared tags | Auto-inject required tags (trace_id, env, etc.) |
| Missing fallback region | Use secondary_region from cloud-region-map.yaml |
| K8s version not provided | Use default LTS version for environment class |
| Empty DNS zone list | Suggest default zone or infer from blueprint + region |
All auto-corrections are tagged in log and span metadata for audit traceability.
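Two of the corrections above (prefix derivation and tag injection) can be sketched as a pure function that returns both the corrected input and an audit list. The abbreviation tables, the "cs" prefix, and the stripping of a "Service" suffix are inferred from sample values in this document; the AutoInjectedTag action name is hypothetical.

```typescript
interface CorrectableInput {
  trace_id: string;
  component: string;
  environment: string;
  region: string;
  resource_prefix?: string;
  tags?: { [key: string]: string };
}

interface Correction {
  action: string;
  field: string;
  value_applied: string;
}

// Assumed abbreviations; the real mapping would come from naming.yaml.
const ENV_ABBREVIATIONS: { [key: string]: string } = { staging: "stg", production: "prod" };
const REGION_ABBREVIATIONS: { [key: string]: string } = { westeurope: "weu", francecentral: "frc" };

function applyAutoCorrections(
  input: CorrectableInput
): { corrected: CorrectableInput; corrections: Correction[] } {
  const corrections: Correction[] = [];
  const tags: { [key: string]: string } = { ...(input.tags || {}) };
  let prefix = input.resource_prefix;

  if (!prefix) {
    const env = ENV_ABBREVIATIONS[input.environment] || input.environment;
    const region = REGION_ABBREVIATIONS[input.region] || input.region;
    const component = input.component.replace(/Service$/, "").toLowerCase();
    prefix = ["cs", env, region, component].join("-");
    corrections.push({ action: "AutoCorrectedMissingPrefix", field: "resource_prefix", value_applied: prefix });
  }
  if (!tags["trace_id"]) {
    tags["trace_id"] = input.trace_id;
    corrections.push({ action: "AutoInjectedTag", field: "tags.trace_id", value_applied: input.trace_id });
  }
  if (!tags["env"]) {
    tags["env"] = input.environment;
    corrections.push({ action: "AutoInjectedTag", field: "tags.env", value_applied: input.environment });
  }
  // The original input is left untouched so the audit trail can show before and after.
  return { corrected: { ...input, resource_prefix: prefix, tags: tags }, corrections: corrections };
}
```

Each Correction entry maps directly onto the log and span metadata that the audit trail requires.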
🔁 Retry Flow Logic¶
flowchart TD
A[Start Provisioning] --> B[Run Validation + Preview]
B --> C{Preview OK?}
C -- No --> D{Retryable?}
D -- Yes --> B
D -- No --> E[Emit ProvisioningFailed]
C -- Yes --> F[Run pulumi up]
F --> G{Success?}
G -- No --> D
G -- Yes --> H[Emit ResourcesProvisioned]
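The flowchart can be read as a bounded loop. The sketch below injects the preview and up steps as plain functions so the control flow is visible without any Pulumi dependency; the maxAttempts bound is borrowed from the retry policy rather than the diagram, and all type and function names are hypothetical.

```typescript
type StepResult =
  | { status: "ok" }
  | { status: "error"; retryable: boolean; reason: string };

type Outcome =
  | { event: "ResourcesProvisioned" }
  | { event: "ProvisioningFailed"; reason: string };

function runProvisioning(
  preview: () => StepResult, // validation + pulumi preview
  up: () => StepResult,      // pulumi up
  maxAttempts: number
): Outcome {
  let lastReason = "retry limit reached";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const previewResult = preview();
    // Only run `up` when the preview passed; otherwise evaluate the preview failure.
    const result = previewResult.status === "ok" ? up() : previewResult;
    if (result.status === "ok") {
      return { event: "ResourcesProvisioned" };
    }
    lastReason = result.reason;
    if (!result.retryable) {
      break; // non-retryable errors go straight to ProvisioningFailed
    }
  }
  return { event: "ProvisioningFailed", reason: lastReason };
}
```

As in the diagram, a retryable failure of pulumi up loops back through validation and preview instead of re-running up directly.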
📘 Retry Policy¶
retry:
  max_attempts: 3
  retry_interval_sec: 15
  backoff_strategy: exponential
  retryable_errors:
    - azure_api_throttle
    - stack_lock
    - vault_timeout
    - dns_unavailable
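To make the policy concrete, the sketch below computes the wait before each retry and decides whether an error qualifies. Doubling the interval on every attempt is one common reading of "exponential"; the document does not state the factor, and the "fixed" strategy is a hypothetical alternative added for contrast.

```typescript
interface RetryPolicy {
  max_attempts: number;
  retry_interval_sec: number;
  backoff_strategy: "exponential" | "fixed";
  retryable_errors: string[];
}

// Values copied from the policy above.
const samplePolicy: RetryPolicy = {
  max_attempts: 3,
  retry_interval_sec: 15,
  backoff_strategy: "exponential",
  retryable_errors: ["azure_api_throttle", "stack_lock", "vault_timeout", "dns_unavailable"],
};

// Delay before retry number `attempt` (1-based). The doubling factor is an assumption.
function retryDelaySec(policy: RetryPolicy, attempt: number): number {
  if (policy.backoff_strategy === "fixed") {
    return policy.retry_interval_sec;
  }
  return policy.retry_interval_sec * Math.pow(2, attempt - 1);
}

// An error is retried only if it is listed and the attempt budget is not exhausted.
function shouldRetry(policy: RetryPolicy, errorCode: string, attemptsSoFar: number): boolean {
  return attemptsSoFar < policy.max_attempts && policy.retryable_errors.indexOf(errorCode) !== -1;
}
```

With the sample policy, the three retries would wait 15, 30, and 60 seconds.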
🧩 Sample Correction Log (JSON)¶
{
"trace_id": "trace-invoice-501",
"action": "AutoCorrectedMissingPrefix",
"field": "resource_prefix",
"value_applied": "cs-prod-frc-invoice",
"retry_attempt": 1,
"status": "provisioning_resumed"
}
🔔 Human Escalation Triggers¶
| Scenario | Action |
|---|---|
| ProvisioningFailed after 3 retries | Notify HumanOpsAgent and halt |
| Secrets mismatch or conflict | Raise to Security Engineer Agent |
| Stack exists but differs significantly | Emit ProvisioningDriftDetected (future) |
| Region unavailable | Emit RegionBlocked, suggest fallback_region to coordinator |
✅ Safe Idempotency Rules¶
- No resource is provisioned twice under the same trace_id + stack_name
- pulumi preview must match pulumi up unless override approved
- Stack diffs retained for drift comparison (hash-based memory key)
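The first rule amounts to claiming a (trace_id, stack_name) key before running pulumi up. Here is a minimal in-memory sketch; ProvisioningLedger is a hypothetical name, and a real agent would presumably keep this record in its long-term memory store so the guard survives restarts.

```typescript
class ProvisioningLedger {
  private claimed: { [key: string]: true } = {};

  // Returns true the first time a (trace_id, stack_name) pair is seen, false afterwards.
  tryClaim(traceId: string, stackName: string): boolean {
    const key = traceId + "::" + stackName;
    if (Object.prototype.hasOwnProperty.call(this.claimed, key)) {
      return false;
    }
    this.claimed[key] = true;
    return true;
  }
}
```

A run that receives false from tryClaim should skip provisioning and reuse the stored outputs instead of applying the stack again.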
📡 Telemetry During Retry¶
All retries emit spans:
- cloud.provision.retry.start
- cloud.provision.retry.success
- cloud.provision.retry.failed
Each includes retry_attempt, reason, and agent_origin.
🧠 Summary¶
The Cloud Provisioner Agent’s retry and correction flow ensures:
- 🔁 Safe auto-recovery from transient issues
- 🛑 Strict boundaries for policy-violating inputs
- 🧠 Intelligent corrections and default injection
- 📎 Full traceability and telemetry per retry step
- 👤 Escalation hooks when human input is required
This makes the provisioning lifecycle resilient, safe, and fully observable — critical for infrastructure integrity at scale.
🔗 Collaboration Interfaces¶
The Cloud Provisioner Agent acts as a mid-pipeline executor within ConnectSoft’s orchestrated cloud lifecycle. It does not operate in isolation — it:
- 🧱 Implements plans from architectural agents
- 🔐 Integrates security overlays (vaults, secrets)
- 📡 Emits outputs to DevOps, Observability, and Coordination layers
- 🛰️ Triggers downstream agents once infrastructure is ready
🤝 Directly Collaborating Agents¶
| Agent | Purpose |
|---|---|
| Cloud Architect Agent | Supplies cloud region maps, replication strategy, zone constraints |
| Infrastructure Architect Agent | Provides infra_plan, overlays, and environment resource models |
| Security Engineer Agent | Delivers secrets, vault references, RBAC overlays, and mount strategies |
| IaCCoordinator (or similar orchestrator) | Triggers agent, monitors result, receives ResourcesProvisioned |
| DevOps Engineer Agent | Consumes provisioned outputs (e.g., URIs, secrets, cluster names) for pipeline generation |
| HumanOps Agent | Reviews provisioning failures, applies overrides or approves previews |
| Observability Agent | Receives telemetry and output mappings for environment registration |
| API Gateway Agent (optional) | Consumes dns_record outputs for subdomain registration and gateway bindings |
🔁 Event-Based Collaboration¶
Emitted Events¶
| Event | Consumed By |
|---|---|
| ResourcesProvisioned | DevOps Agent, Observability Agent, Orchestrator |
| ProvisioningFailed | HumanOpsAgent, Orchestrator |
| ProvisioningPreviewReady (if manual approval required) | HumanOpsAgent |
| ProvisioningDriftDetected (future) | Orchestrator, Audit Agent |
🔀 Input Dependencies¶
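From Cloud Architect Agent¶
- cloud-region-map.yaml (region maps and zone constraints)
- replication-strategy.yaml (replication and failover strategy)
From Security Engineer Agent¶
- secrets-metadata.yaml (vault references, RBAC overlays, and mount strategies)
From Infrastructure Architect Agent¶
- infra_plan.yaml (infra plan, overlays, and environment resource models)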
🔄 Output Recipients¶
→ DevOps Engineer Agent¶
- aks_cluster_name
- kubeconfig_vault_ref
- vault_uri
- storage_endpoint
- Secrets mount bindings (JSON)
→ Observability Agent¶
- Cluster URI
- Log Analytics / Monitoring endpoints
- DNS + region tags for telemetry routing
→ IaCCoordinator¶
- ResourcesProvisioned event
- stack_path
- status: success | failed
- Deployment duration + span metadata
🧭 Collaboration Flow Diagram¶
flowchart TD
A[Cloud Architect Agent]
B[Security Engineer Agent]
C[Infrastructure Architect Agent]
D[IaCCoordinator]
E[Cloud Provisioner Agent]
F[DevOps Engineer Agent]
G[Observability Agent]
A --> E
B --> E
C --> E
D --> E
E --> F
E --> G
E --> D
💬 Interface Protocols¶
| Interface | Mode |
|---|---|
| Agent-to-Agent | File-based overlays or in-memory via orchestrator |
| Events | JSON payload over orchestrator event bus or webhook |
| Output Sharing | Git commit / Blob upload + event pointer |
| Secrets | Injected via secure vault overlay, never hardcoded |
🧠 Summary¶
The Cloud Provisioner Agent collaborates with:
- ☁️ Architects to receive plan and topology
- 🔐 Security to embed compliant secret flows
- 🧪 DevOps & QA agents to inject cloud runtime data
- 🛰️ Orchestration to emit status and trigger downstream automation
- 📊 Observability to register environments, regions, and telemetry contexts
It serves as the bridge between blueprint planning and actual cloud activation — in a highly modular, secure, event-driven way.
📡 Observability Hooks¶
The Cloud Provisioner Agent is a critical executor in the ConnectSoft platform. It must:
- 📊 Emit real-time provisioning status
- 🧾 Enable audit trail for infrastructure changes
- 🛰️ Feed orchestration and dashboard systems
- 🧠 Record metadata for security and compliance
🧭 OpenTelemetry Spans (Mandatory)¶
✅ Emitted Spans¶
| Span Name | Description |
|---|---|
| cloud.provision.start | When the provisioning run begins |
| cloud.provision.preview | When pulumi preview is executed |
| cloud.provision.up | When actual resource provisioning starts |
| cloud.provision.success | Emitted when provisioning is complete |
| cloud.provision.failed | Emitted if any stage fails or aborts |
📌 Span Tags¶
trace_id: trace-auth-789
execution_id: exec-auth-789
agent: cloud-provisioner-agent
stack: staging-westeurope
component: AuthService
environment: staging
region: westeurope
status: success | failed | skipped | drifted
resource_count: 6
📘 Structured Logs¶
Logged and optionally forwarded to:
- Azure Monitor
- Loki
- Centralized audit blob
Example JSON Log¶
{
"timestamp": "2025-05-08T12:59:20Z",
"trace_id": "trace-auth-789",
"agent": "cloud-provisioner-agent",
"component": "AuthService",
"event": "ProvisioningStarted",
"stack": "staging-westeurope",
"resource_plan": ["AKS", "KeyVault", "BlobStorage"]
}
📊 Metrics for Dashboards¶
| Metric | Description |
|---|---|
| provision_duration_ms | Total time from preview to success/fail |
| provision_retry_count | Retries per run |
| provision_success_rate | Rolling success percentage by region/environment |
| resources_provisioned_total | Number of resources successfully deployed |
| stack_drift_detected_total | (Future) Detected mismatches during preview |
📣 Lifecycle Events¶
✅ ResourcesProvisioned¶
{
"event": "ResourcesProvisioned",
"trace_id": "trace-auth-789",
"stack": "staging-westeurope",
"resource_count": 6,
"outputs": ["aks_cluster", "dns", "vault_uri"],
"status": "success"
}
❌ ProvisioningFailed¶
{
"event": "ProvisioningFailed",
"trace_id": "trace-auth-789",
"reason": "Vault reference missing",
"stage": "validation",
"status": "failed"
}
📂 Output Snapshot for Monitoring¶
| File | Description |
|---|---|
| provisioning.log | CLI + internal validation results |
| outputs.json | Contains URIs, IDs, and cloud handles |
| stack-diff.json (optional) | Preview vs. previous plan (for drift detection) |
Snapshots are versioned per run and tagged with trace_id.
📈 Grafana Dashboard Modules (Example)¶
- Provisioning summary by environment
- Error rate by region
- Time to provision per component
- Daily stack count and status
- Retry frequency trends
🧩 Integration Points¶
| Target | Hook |
|---|---|
| Observability Agent | Receives events, spans, outputs, and metrics |
| HumanOps Agent | Subscribed to failure + preview-only events |
| Audit Layer | Reads provisioning logs and output hash |
| Orchestrator | Correlates execution result to coordination FSM or pipeline flow |
🔐 Compliance Metadata¶
All observability outputs must include:
agent_origin: cloud-provisioner-agent
trace_id: required
execution_id: required
environment: required
provisioning_type: automated
🧠 Summary¶
The Cloud Provisioner Agent’s observability hooks provide:
- 🛰️ Full lifecycle visibility from plan to provisioning
- 🧾 Structured logs and spans for real-time and historical audit
- 📊 Dashboard-friendly metrics for success/failure trends
- 📡 Event-based triggers for downstream automation and human review
It ensures cloud provisioning is traceable, secure, transparent, and analytics-ready — across all environments and tenants.
🎯 Human Intervention Hooks¶
While the Cloud Provisioner Agent is designed to operate autonomously, certain scenarios require manual oversight, including:
- 🔐 Security-sensitive resources
- 🌍 Production or multi-region environments
- 🛑 Preview failures or unexpected diffs
- 💸 High-cost provisioning operations
- 🧾 Compliance-driven approval checkpoints
These hooks ensure safe intervention, while preserving traceability and audit logs.
✅ Intervention Scenarios¶
| Scenario | Action Required |
|---|---|
| auto_proceed = false in input | Manual approval required after preview |
| Stack preview includes high-impact changes | Review and confirmation |
| Vault reference missing | Requires Security Engineer or HumanOps override |
| Region blocked or quota exceeded | Manual reassignment or delay |
| Retry limit reached | Escalate to HumanOpsAgent |
| Explicit manual_approval_required: true in blueprint | Always paused for approval |
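Several of these scenarios reduce to a yes/no gate evaluated after preview. A minimal sketch, assuming the agent already knows whether the preview was flagged as high-impact and whether the retry budget is spent (ApprovalContext and both function names are hypothetical):

```typescript
interface ApprovalContext {
  auto_proceed?: boolean;             // defaults to false when omitted
  manual_approval_required?: boolean; // explicit blueprint flag
  highImpactChanges: boolean;         // assumed to be set by preview analysis
  retryLimitReached: boolean;
}

// Collects every reason a human must be involved; an empty list means the run may continue.
function approvalReasons(ctx: ApprovalContext): string[] {
  const reasons: string[] = [];
  if (ctx.manual_approval_required === true) {
    reasons.push("manual_approval_required is set");
  }
  if (ctx.auto_proceed !== true) {
    reasons.push("auto_proceed is not enabled");
  }
  if (ctx.highImpactChanges) {
    reasons.push("preview includes high-impact changes");
  }
  if (ctx.retryLimitReached) {
    reasons.push("retry limit reached");
  }
  return reasons;
}

function mayProceedAutomatically(ctx: ApprovalContext): boolean {
  return approvalReasons(ctx).length === 0;
}
```

The collected reasons could populate the reason field of the approval gate, so the approver sees why the run is paused.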
👤 Supported Human Actions¶
| Action | Interface |
|---|---|
| Approve provisioning | Orchestrator UI, CLI (approve-stack --trace) |
| Reject provisioning | Same as above — emits ProvisioningRejected (planned) |
| Apply override to vault mount | Through HumanOpsAgent or Vault UI |
| Retry manually with override | CLI or UI-based re-invocation with override flag |
| Review preview and stack diff | Presented via dashboard or audit UI |
📝 Approval Gate Representation (YAML)¶
manual_approval:
  required: true
  approver_group: PlatformOps
  reason: "New AKS cluster in production"
  contact: "ops@connectsoft.io"
Used in environments or components flagged as high-impact or sensitive.
📣 Event-Driven Escalation¶
When approval is required:
{
"event": "ProvisioningPreviewReady",
"trace_id": "trace-auth-789",
"stack": "staging-westeurope",
"resource_plan": ["AKS", "KeyVault", "Blob"],
"preview_diff": "3 create, 1 replace",
"status": "awaiting_approval"
}
Received by:
- HumanOps Agent
- Orchestrator Dashboard
- Notifications Bot (Teams, Slack, Email)
💬 UI Elements & CLI Hooks¶
| Interface | Feature |
|---|---|
| Dashboard UI | Approve / Reject button, preview viewer, retry |
| CLI | cs-stack approve --trace trace-789 |
| Email | Link to preview diff and action buttons |
| Chatbot (planned) | Inline response to ProvisioningPreviewReady event |
🧾 Logged Interventions¶
Every human interaction is stored in:
{
"trace_id": "trace-auth-789",
"action": "manual_approval",
"approver": "alice.platformops",
"reason": "Approved new staging AKS stack",
"timestamp": "2025-05-08T13:05:21Z"
}
Audited by:
- Compliance engine
- Observability dashboards
- Risk review snapshots
🧠 Summary¶
The Cloud Provisioner Agent includes secure, auditable human intervention hooks to:
- 👤 Pause for approval when needed
- 🔐 Escalate policy conflicts (vault, region, secrets)
- 🔁 Allow override and retry flows
- 📎 Ensure all human actions are trace-bound and logged
This empowers ConnectSoft teams to balance autonomy with governance, especially in high-sensitivity environments.
✅ Summary¶
The Cloud Provisioner Agent is a core executor in ConnectSoft’s AI-driven software factory. It turns cloud architectural intent into real, secured, traceable cloud infrastructure, delivering environments ready for CI/CD, observability, and production-grade workloads.
🎯 Core Functions¶
- 📦 Render Pulumi stack files from orchestrated infrastructure plans
- ☁️ Provision Azure cloud resources (AKS, Key Vault, DNS, Blob, etc.)
- 🔐 Inject and map secrets securely across environments
- 🧾 Emit outputs including URIs, stack metadata, and provisioning logs
- 📡 Emit telemetry, events, and spans for full traceability
- 👤 Support human approval, intervention, and retry workflows
🧭 Supported Resources (Phase 1 - Azure)¶
| Resource | Examples |
|---|---|
| Compute | AKS Clusters |
| Storage | Azure Blob |
| Secrets | Azure Key Vault |
| DNS | Azure DNS zones and records |
| Monitoring | Azure Monitor / Log Analytics |
| Identity (future) | App Registrations, Managed Identity |
📚 Input Summary¶
- trace_id, execution_id, blueprint_id
- infra_plan.yaml
- cloud-region-map.yaml
- replication-strategy.yaml
- secrets-metadata.yaml
- environment overlays
📤 Output Summary¶
- Pulumi project and stack files
- outputs.json, secret-bindings.json, provisioning.log
- Event: ResourcesProvisioned or ProvisioningFailed
- OpenTelemetry spans
- GitOps-compatible folder structure
🧠 Integration Summary¶
| Collaborator | Purpose |
|---|---|
| Cloud Architect Agent | Region, replication, topology plans |
| Infrastructure Architect Agent | Component-level infra plan |
| Security Engineer Agent | Vaults, RBAC, mount strategy |
| IaCCoordinator | Trigger and monitor execution |
| DevOps Engineer Agent | Uses emitted URIs and secrets |
| HumanOps Agent | Approves or overrides sensitive actions |
| Observability Agent | Ingests infra metadata for monitoring dashboards |
📈 Execution Flow Diagram¶
flowchart TD
subgraph Orchestration Layer
A[IaCCoordinator]
end
subgraph Architecture Inputs
B[Cloud Architect Agent]
C[Infrastructure Architect Agent]
D[Security Engineer Agent]
end
subgraph Agent
E[Cloud Provisioner Agent]
end
subgraph Outputs
F[Pulumi Stack Files]
G[Provisioning Events]
H[Output Metadata]
I[Provisioning Logs]
end
subgraph Downstream Consumers
J[DevOps Engineer Agent]
K[Observability Agent]
L[HumanOps Agent]
end
A --> E
B --> E
C --> E
D --> E
E --> F
E --> G
E --> H
E --> I
G --> J
H --> J
G --> K
G --> L
🧾 Final Takeaways¶
The Cloud Provisioner Agent enables:
- 🔁 Idempotent, versioned infrastructure provisioning
- ☁️ Region- and tenant-aware environment setup
- 🔐 Secure, policy-compliant secrets injection
- 📡 Audit-friendly logs, metrics, and spans
- 👤 Safe human oversight and governance
It is a cornerstone agent in ConnectSoft’s DevOps and infrastructure layer — ensuring the platform can scale autonomously, securely, and observably across cloud environments.