
🧠 Cloud Provisioner Agent Specification

☁️ Core Purpose

The Cloud Provisioner Agent is the execution-layer enforcer of cloud resource provisioning within the ConnectSoft AI Software Factory. Its purpose is to translate cloud architecture plans into actual cloud infrastructure using Pulumi-based Infrastructure-as-Code (IaC).

It operates under orchestration (e.g., triggered by IaCCoordinator or EnvironmentSetupCoordinator) and receives high-level deployment topology, region selections, and environment overlays from the Cloud Architect Agent, along with secrets and compliance overlays from the Security Engineer Agent.

It ensures that all generated SaaS services, environments, and tenants have their required cloud foundation deployed, versioned, and trace-linked — across all supported regions and clouds.


🛠️ Role in the Platform

| Layer | Role |
| --- | --- |
| Architecture | Implements cloud-region-map, topology, replication, and failover plans |
| Infrastructure | Converts blueprints into deployed, running cloud resources |
| DevOps | Enables pipelines and services to target real endpoints (AKS, Key Vault, etc.) |
| Security | Respects and injects policy-enforced vaults, identity scopes, DNS constraints |
| Environment Execution | Brings up cloud environments per tenant, region, edition, or component need |

🌍 Phase Scope (Azure-First)

| Provisioned Resource | Azure Equivalent |
| --- | --- |
| Compute Cluster | Azure Kubernetes Service (AKS) |
| Secret Store | Azure Key Vault |
| Object Storage | Azure Blob Storage |
| DNS Zones / Entries | Azure DNS |
| Observability | Azure Monitor / Log Analytics |
| Identity (future) | Azure AD App Registrations, Managed Identities |
| Database Layer | Azure Database for PostgreSQL / Azure SQL (optional in Phase 1) |

Each resource is:

  • Provisioned via Pulumi
  • Annotated with trace_id, blueprint_id, and provisioned_by: cloud-provisioner-agent
  • Tagged for environment, region, and compliance scope

📌 Strategic Alignment with Cloud Architect Agent

| Cloud Architect Agent Defines | Cloud Provisioner Agent Does |
| --- | --- |
| cloud-region-map.yaml | Deploys to primary/secondary regions |
| replication-strategy.yaml | Provisions redundant or geo-replicated resources |
| resource-compliance-tags | Applies tags, identity scopes, and access limits |
| zone-mapping | Provisions zonal or multi-zone clusters if required |
| dns-domain-map.yaml | Allocates zone and creates records |

🔗 Execution Trigger (Orchestrated)

Triggered by:

  • IaCCoordinator (e.g., on environment creation, blueprint activation, tenant onboarding)
  • Orchestration events like:
    • CreateCloudResourcesForTenant
    • ProvisionEditionLevelInfra
    • UpdateRegionCapacity

🧭 Platform Flow Placement

flowchart TD
    A[Cloud Architect Agent]
    B[Infrastructure Architect Agent]
    C[Security Engineer Agent]
    D[Cloud Provisioner Agent]
    E[DevOps Engineer Agent]
    F[Azure / Kubernetes Resources]

    A --> D
    B --> D
    C --> D
    D --> F
    D --> E

🧩 Example Scenario

A new service is being deployed for Edition: EU-MultiTenant, requiring:

  • AKS cluster in westeurope
  • Azure DNS with *.edition.connectsoft.io
  • Azure Key Vault with injected secrets for runtime
  • Azure Blob for distributed file storage
  • Resource tags: env=staging, region=westeurope, edition=EU

Cloud Architect defines it → Cloud Provisioner Agent renders Pulumi → provisions resources → emits resource map → DevOps Agent consumes outputs.


✅ Summary

The Cloud Provisioner Agent:

  • ☁️ Turns infrastructure blueprints into real cloud infrastructure
  • 🔁 Ensures region- and tenant-specific environments are provisioned
  • 🔐 Integrates with secrets, DNS, identity, and observability layers
  • 🧭 Respects architectural directives, trace constraints, and cloud policy overlays
  • 📊 Emits trace-linked metadata for CI/CD, security, and observability agents

It is a foundational executor — bringing ConnectSoft’s autonomous SaaS deployment vision into physical cloud reality.


📌 Core Responsibilities Overview

The Cloud Provisioner Agent is responsible for materializing infrastructure blueprints into provisioned cloud resources across environments and tenants, starting with Azure.

It receives cloud architecture plans, security overlays, and service bindings, and converts them into IaC-managed cloud infrastructure using Pulumi.

Its deliverables are traceable, version-controlled, validated, and aligned with orchestration plans.


🔧 Detailed Responsibilities

✅ 1. Pulumi Stack Generation

  • Render complete stack files per environment, tenant, or blueprint scope:
    • Pulumi.yaml (project definition)
    • Pulumi.{stack}.yaml (stack config)
    • index.ts / main.cs (actual resource logic)
  • Structure output for GitOps and CI integration

✅ 2. Infrastructure Provisioning

  • Execute pulumi up to provision resources for:
    • AKS clusters
    • Azure Key Vault instances
    • Azure Blob Storage
    • Azure DNS zones and records
    • Azure Monitor / App Insights
    • Resource groups, virtual networks (if required)
  • Tag all resources with:
trace_id: trace-789
provisioned_by: cloud-provisioner-agent
environment: staging
edition: EU
region: westeurope

✅ 3. Multi-Region & Zonal Strategy Execution

  • Use cloud-region-map.yaml and replication-strategy.yaml to:
    • Deploy primary and failover resource groups
    • Configure geo-redundant storage or DNS
    • Provision zonal AKS clusters with SLA-specific constraints

✅ 4. Secrets and Vault Binding

  • Inject secrets provided by Security Engineer Agent into:
    • Key Vault creation and initial seeding
    • Pulumi secretsProvider block
  • Emit structured secret-bindings.json for downstream agents

✅ 5. Output & Endpoint Metadata Generation

  • Produce outputs consumed by:
    • DevOps pipelines (for endpoint injection)
    • API Gateways (for DNS mapping)
    • Observability Agent (for log collection and telemetry hooks)
  • Example:
{
  "aks_cluster_name": "aks-eu-staging-01",
  "key_vault_uri": "https://vault-eu-staging.vault.azure.net/",
  "storage_url": "https://blob-eu-staging.blob.core.windows.net",
  "dns_record": "auth.europe.connectsoft.io"
}

✅ 6. Provisioning Validation & Status Emission

  • After each deployment:
    • Run pulumi preview and compare before/after diffs
    • Emit:
      • ResourcesProvisioned
      • ProvisioningFailed (if any step fails)
      • Full provisioning log and event metadata
    • Store deployment snapshot (provisioning-output.json) for trace audit

✅ 7. Traceability & Versioning

  • All outputs must include:
    • trace_id
    • execution_id
    • blueprint_id
    • stack_id
    • agent_origin: cloud-provisioner-agent
  • Commit stack files to Git (optional) under:
/infrastructure/stacks/{component}/{env}/{region}/

✅ 8. Collaborative Feedback Loop

  • Respond to signals from:
    • Cloud Architect Agent: topology, region, SLA constraints
    • Security Engineer Agent: vaults, policy blocks, RBAC scopes
    • Infrastructure Architect Agent: service/infra overlays
  • Provide infrastructure URIs and secrets back to:
    • DevOps Engineer Agent
    • Observability Agent
    • Platform Coordinator (if applicable)

📦 Summary of Primary Deliverables

| Artifact | Description |
| --- | --- |
| Pulumi.yaml | IaC project manifest |
| Pulumi.dev.yaml | Config file per environment |
| index.ts / main.cs | Stack logic with resources |
| outputs.json | Emitted resource map |
| provisioning.log | CLI summary of provisioning run |
| provisioned-resources.json | Full list of resource URIs, tags, metadata |
| secret-bindings.json | Used by DevOps and security agents |
| ResourcesProvisioned event | Traceable success notification |

✅ Summary

The Cloud Provisioner Agent is the infrastructure enabler of ConnectSoft’s AI Software Factory:

  • 📦 Generates cloud-native, GitOps-ready IaC artifacts
  • ☁️ Provisions Azure infrastructure with trace-linked metadata
  • 🔐 Injects secrets, applies policies, and enforces topology plans
  • 🔁 Feeds downstream agents with reliable, deployable endpoints

It bridges blueprint design with real cloud execution — securely and autonomously.


📥 Inputs

The Cloud Provisioner Agent requires a well-defined set of inputs from upstream agents and orchestration logic. These inputs guide:

  • 📦 What to provision
  • 🌍 Where to provision (region, zone, environment)
  • 🔐 With what policies, secrets, naming conventions, and blueprint scope

All inputs are trace-bound, environment-aware, and cloud-specific.


🔑 Primary Inputs

1️⃣ cloud-region-map.yaml

Provided by: Cloud Architect Agent

primary_region: westeurope
secondary_region: northeurope
zones:
  - 1
  - 2

Used to define target regions and zonal requirements.


2️⃣ replication-strategy.yaml

Provided by: Cloud Architect Agent

replication:
  storage: geo-redundant
  dns: failover
  observability: dual-region

Determines whether to deploy mirrored resources across regions or zones.
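One way to read the region map and the replication strategy together is sketched below in TypeScript. The `RegionMap` and `Replication` shapes and the `resolveTargetRegions` helper are hypothetical names that mirror the YAML above. The rule that any replication entry pulls in the secondary region is an assumption for illustration, not something the specification states.

```typescript
// Hypothetical shapes mirroring cloud-region-map.yaml and replication-strategy.yaml.
interface RegionMap {
  primary_region: string;
  secondary_region?: string;
  zones?: number[];
}

interface Replication {
  storage?: string;
  dns?: string;
  observability?: string;
}

// Returns the regions a run would target.
// Assumption: any replication entry triggers deployment to the secondary region;
// with no replication block the run stays in single-region mode.
function resolveTargetRegions(regionMap: RegionMap, replication?: Replication): string[] {
  const regions = [regionMap.primary_region];
  const replicated =
    replication !== undefined &&
    (replication.storage !== undefined ||
      replication.dns !== undefined ||
      replication.observability !== undefined);
  if (replicated && regionMap.secondary_region) {
    regions.push(regionMap.secondary_region);
  }
  return regions;
}
```

Called with the two example files above, this sketch would return both westeurope and northeurope; without a replication block it returns only the primary region.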


3️⃣ environment-overlay.yaml

Provided by: Infrastructure Architect Agent

environment: staging
resource_prefix: cs-stg
resource_group: cs-stg-infra
tags:
  env: staging
  edition: EU
  trace_id: trace-789

Used to inject naming, tagging, and grouping conventions.


4️⃣ component-scope.yaml

Provided by: Orchestration (e.g., IaCCoordinator)

component: AuthService
execution_id: exec-auth-789
trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi

Connects infrastructure to the service’s lifecycle and audit trail.


5️⃣ infra-plan.yaml

Can be composed from blueprint fragments or pre-assembled.

resources:
  - type: aks
    size: standard
    node_count: 3
    k8s_version: "1.28"
  - type: keyvault
    policy: app-only
  - type: storage
    tier: standard
  - type: dns
    fqdn: auth.europe.connectsoft.io

Describes the desired resource topology.


6️⃣ secrets-metadata.yaml

Provided by: Security Engineer Agent

secrets:
  - name: AUTH_SECRET
    value_from: vault
    vault_ref: authservice-app-secret
    mount_strategy: env

Used to seed Azure Key Vault and inform DevOps pipeline injection.


7️⃣ resource-constraints.yaml (Optional)

Can come from orchestration or platform governance policies.

quotas:
  max_aks_clusters: 3
  max_nodes_per_cluster: 5
require_tags:
  - trace_id
  - blueprint_id

🧠 Internal Contextual Inputs (Resolved by Agent or Environment)

| Field | Description |
| --- | --- |
| cloud_provider | Default: azure (future: aws, gcp) |
| pulumi_project | Derived from component_name + region |
| pulumi_stack_name | e.g., staging-eu-auth |
| resource_prefix | e.g., cs-stg-auth |

🧪 Example Consolidated Input Context

trace_id: trace-auth-789
execution_id: exec-auth-789
component: AuthService
cloud_provider: azure
region: westeurope
environment: staging
resource_prefix: cs-stg-auth
infra_plan:
  aks: true
  keyvault: true
  dns: auth.europe.connectsoft.io
  storage: standard
secrets:
  - vault_ref: authservice-app-secret

📎 Input Validation Checklist

| Input | Validation |
| --- | --- |
| trace_id, execution_id | ✅ Required |
| primary_region, resource_prefix | ✅ Required |
| infra_plan | ✅ Must define at least 1 resource |
| secrets | ✅ Must map to valid vault refs |
| replication | 🟡 Optional, fallback to single-region mode |
| environment | ✅ Used for naming, tags, and stack configs |
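The checklist can be expressed as a small pre-flight check. The sketch below is illustrative only: `ProvisionInput`, `validateInput`, and the `knownVaultRefs` lookup are hypothetical names, and the error messages are not taken from the agent.

```typescript
// Hypothetical subset of the consolidated input context.
interface ProvisionInput {
  trace_id?: string;
  execution_id?: string;
  primary_region?: string;
  resource_prefix?: string;
  environment?: string;
  infra_plan?: { [resource: string]: unknown };
  secrets?: { vault_ref?: string }[];
}

// Applies the checklist rules and returns every violation found.
// knownVaultRefs is an assumed list of valid vault references supplied by the caller.
function validateInput(input: ProvisionInput, knownVaultRefs: string[]): string[] {
  const errors: string[] = [];
  const required: (keyof ProvisionInput)[] = [
    "trace_id",
    "execution_id",
    "primary_region",
    "resource_prefix",
    "environment",
  ];
  for (const field of required) {
    if (!input[field]) {
      errors.push("missing required field: " + field);
    }
  }
  if (!input.infra_plan || Object.keys(input.infra_plan).length === 0) {
    errors.push("infra_plan must define at least 1 resource");
  }
  for (const secret of input.secrets || []) {
    if (!secret.vault_ref || knownVaultRefs.indexOf(secret.vault_ref) === -1) {
      errors.push("unknown vault_ref: " + String(secret.vault_ref));
    }
  }
  return errors;
}
```

An empty result means the run may proceed. The optional replication block is deliberately not checked here, since a missing block only means single-region mode.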

🧠 Summary

The Cloud Provisioner Agent consumes a composite input model, made up of:

  • Architecture inputs (regions, replication, DNS)
  • Security overlays (vaults, mount strategies)
  • Environment config (tags, naming, resource groups)
  • Blueprint links (trace ID, component scope)

This enables the agent to deterministically generate and provision compliant, observable, cloud infrastructure.


📤 Output

The Cloud Provisioner Agent emits structured, traceable outputs in the form of:

  • Pulumi stack files and project artifacts
  • 📁 Provisioning logs and resource outputs
  • 🔐 Secrets injection metadata
  • 📡 Events and telemetry spans for orchestration and observability
  • 💾 Cloud resource metadata for downstream agents (DevOps, Observability, Security)

✅ Primary Output Artifacts

1️⃣ Pulumi Stack Files

| File | Description |
| --- | --- |
| Pulumi.yaml | Pulumi project definition (name, runtime, description) |
| Pulumi.{stack}.yaml | Stack configuration file (region, secrets provider, tags) |
| index.ts / main.cs | Program logic to provision resources |
| stack-output.json | Output map of provisioned endpoints, IDs, URIs |
| provisioning.log | CLI logs from pulumi up or preview |

All files are committed (or staged) in Git at: /infrastructure/stacks/{component}/{env}/{region}/


2️⃣ Cloud Resource Output Map

Emitted after successful provisioning:

{
  "aks_cluster_name": "cs-stg-auth-aks01",
  "aks_kubeconfig_secret": "vault://auth-kubeconfig",
  "dns_record": "auth.europe.connectsoft.io",
  "key_vault_uri": "https://vault-auth-stg.vault.azure.net/",
  "storage_url": "https://csstgautheustorage.blob.core.windows.net"
}

Used by:

  • DevOps Engineer Agent (for pipeline/environment injection)
  • API Gateway Agent (for DNS mapping)
  • Observability Agent (for logs/metrics setup)

3️⃣ Secrets Metadata Output

{
  "secrets":
  [
    {
      "name": "AUTH_SECRET",
      "source": "azure-keyvault",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}

Shared with:

  • DevOps Agent for CI/CD env binding
  • Security Agent for validation
  • Test Agent for runtime test secrets (if allowed)

4️⃣ Deployment Event

ResourcesProvisioned
{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "execution_id": "exec-auth-789",
  "environment": "staging",
  "region": "westeurope",
  "stack_path": "infrastructure/stacks/authservice/staging/westeurope",
  "resource_count": 6,
  "timestamp": "2025-05-08T10:00:22Z"
}

Consumed by:

  • Orchestration layer
  • DevOps Agent
  • Dashboards / Observability Agent
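As a rough illustration of how the ResourcesProvisioned payload could be assembled, here is a minimal TypeScript sketch. `RunContext` and `buildResourcesProvisionedEvent` are hypothetical names; the field names and the stack_path layout follow the example event above.

```typescript
// Hypothetical run context; field names follow the example event.
interface RunContext {
  trace_id: string;
  execution_id: string;
  component: string;
  environment: string;
  region: string;
}

interface ResourcesProvisionedEvent extends RunContext {
  event: "ResourcesProvisioned";
  stack_path: string;
  resource_count: number;
  timestamp: string;
}

function buildResourcesProvisionedEvent(
  ctx: RunContext,
  resourceCount: number,
  now: Date
): ResourcesProvisionedEvent {
  // All outputs must carry trace metadata, so refuse to build an untraceable event.
  if (!ctx.trace_id || !ctx.execution_id) {
    throw new Error("trace_id and execution_id are required");
  }
  return {
    event: "ResourcesProvisioned",
    trace_id: ctx.trace_id,
    component: ctx.component,
    execution_id: ctx.execution_id,
    environment: ctx.environment,
    region: ctx.region,
    stack_path: ["infrastructure/stacks", ctx.component.toLowerCase(), ctx.environment, ctx.region].join("/"),
    resource_count: resourceCount,
    // Drop milliseconds to match the timestamp format used in the example.
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```

Passing the clock in as a parameter keeps the function deterministic, which fits the requirement that outputs be reproducible.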

5️⃣ Pulumi Output Metadata (YAML / JSON)

Saved for audit trail:

{
  "project": "authservice-stack",
  "stack": "staging-westeurope",
  "provisioned_by": "cloud-provisioner-agent",
  "resources": [
    { "type": "azure-native:containerservice:ManagedCluster", "name": "aks-auth-eu" },
    { "type": "azure-native:storage:BlobContainer", "name": "authfiles" }
  ]
}

📦 Optional Outputs (Based on Context)

| Output | Condition |
| --- | --- |
| manual-approval.yaml | If sensitive resource or environment requires pre-deploy approval |
| rollback-plan.json | If preview identifies drift and fallback is enabled |
| dns-map.yaml | If multiple endpoints / subdomains need to be shared with API Gateway Agent |

🧠 Metadata Embedded in All Outputs

All output artifacts include:

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
environment: staging
region: westeurope

📂 File Structure Convention (GitOps-Compatible)

/infrastructure/stacks/
  └── authservice/
      └── staging/
          └── westeurope/
              ├── Pulumi.yaml
              ├── Pulumi.staging-westeurope.yaml
              ├── index.ts
              ├── provisioning.log
              ├── outputs.json
              └── secrets.json

✅ Summary

The Cloud Provisioner Agent outputs:

  • 📦 All required Pulumi IaC assets
  • 📡 Rich cloud output metadata
  • 🔐 Fully structured secrets and mount plans
  • 🧠 Traceable provisioning logs and telemetry
  • 🛰️ Events and files consumed by downstream agents

Its outputs are the bridge between abstract cloud design and concrete deployable infrastructure.


📚 Knowledge Base

The Cloud Provisioner Agent has access to a versioned, cloud-specific knowledge base composed of:

  • Reusable Pulumi module templates
  • Region, naming, and topology policies
  • Compliance and tagging requirements
  • Deployment patterns per resource type
  • Cloud-specific constraints and best practices
  • Secret and identity handling strategies

This knowledge is used to generate secure, consistent, policy-aligned cloud infrastructure.


🧱 Core Knowledge Categories

1️⃣ Pulumi Template Modules

| Resource Type | Pulumi Module |
| --- | --- |
| AKS | modules/azure/aks-cluster.ts |
| Key Vault | modules/azure/keyvault.ts |
| Blob Storage | modules/azure/blob-storage.ts |
| DNS Zones & Records | modules/azure/dns-record.ts |
| Log Analytics | modules/azure/monitor-insights.ts |
| Virtual Network (Optional) | modules/azure/vnet.ts |

Each template:

  • Supports parameter overrides
  • Uses trace- and environment-aware naming
  • Emits outputs to the final Pulumi stack

2️⃣ Naming Convention Rules

naming:
  resource_prefix: cs
  format: "{prefix}-{env}-{region}-{component}"
  allowed_chars: "[a-z0-9-]"
  max_length: 63

Used to auto-resolve cloud resource names (e.g., cs-stg-weu-auth-aks)
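The naming rule can be sketched as a pure function. This is a minimal TypeScript illustration, assuming hypothetical abbreviation tables (`stg`, `weu`, and so on) that would normally come from the naming policy; `resolveResourceName` is not a name from the agent itself.

```typescript
// Hypothetical abbreviation tables; the real mappings would live in the naming policy.
const ENV_ABBREVIATIONS: { [env: string]: string } = { staging: "stg", production: "prd" };
const REGION_ABBREVIATIONS: { [region: string]: string } = { westeurope: "weu", northeurope: "neu" };

// Applies the format {prefix}-{env}-{region}-{component} with the policy limits.
function resolveResourceName(prefix: string, env: string, region: string, component: string): string {
  const parts = [
    prefix,
    ENV_ABBREVIATIONS[env] || env,
    REGION_ABBREVIATIONS[region] || region,
    component,
  ];
  const candidate = parts
    .join("-")
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, "") // drop anything outside allowed_chars
    .replace(/-+/g, "-"); // collapse repeated separators
  if (candidate.length > 63) {
    // Fail loudly instead of truncating, so names stay predictable.
    throw new Error("resource name exceeds max_length of 63: " + candidate);
  }
  return candidate;
}
```

Rejecting over-long names rather than truncating them is a design choice made for this sketch; a real resolver might shorten the component segment instead.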


3️⃣ Environment & Region Topology Definitions

regions:
  staging:
    default: westeurope
    failover: northeurope
    zones: [1,2]
  production:
    default: francecentral
    failover: westeurope

Guides region-aware provisioning and DNS/FQDN mappings.


4️⃣ Cloud Resource Classifiers

| Class | Rule |
| --- | --- |
| stateful | Require backup or replication (e.g., blob storage) |
| sensitive | Must be deployed with key vault + tag compliance=high |
| ephemeral | Skippable during rollback or low-priority teardown |

Informs retry logic, vault strategy, and disaster recovery scope.


5️⃣ Secrets Mounting Strategies

mount_strategies:
  - env
  - volume
  - vault-agent-sidecar

Policy determines which strategy to apply by environment or resource type.


6️⃣ Tagging Policy Rules

All resources must be tagged with:

tags:
  trace_id: <REQUIRED>
  blueprint_id: <REQUIRED>
  environment: <REQUIRED>
  provisioned_by: cloud-provisioner-agent

Optionally:

tenant_id: T-134
edition: EU-MultiTenant
compliance_level: high
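A tag builder that enforces this policy might look like the following TypeScript sketch. `buildResourceTags` and `REQUIRED_TAGS` are hypothetical names; only the tag keys and the fixed provisioned_by value come from the policy above.

```typescript
// Required tag keys from the tagging policy.
const REQUIRED_TAGS = ["trace_id", "blueprint_id", "environment"];

// Merges required and optional tags and stamps the fixed provisioned_by value.
function buildResourceTags(
  base: { [key: string]: string },
  optional?: { [key: string]: string }
): { [key: string]: string } {
  for (const key of REQUIRED_TAGS) {
    if (!base[key]) {
      throw new Error("missing required tag: " + key);
    }
  }
  const extra: { [key: string]: string } = optional || {};
  const tags: { [key: string]: string } = {};
  // Optional tags are copied first so required values always win on conflict.
  for (const source of [extra, base]) {
    for (const key of Object.keys(source)) {
      const value = source[key];
      if (value !== undefined) {
        tags[key] = value;
      }
    }
  }
  tags["provisioned_by"] = "cloud-provisioner-agent";
  return tags;
}
```

The resulting map could then be passed as the tags argument of each Pulumi resource, so that no resource is created untagged.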

7️⃣ Common Output Resolvers

Used to emit resource-specific outputs:

return {
  aks_cluster_name: aks.name,
  dns_record: dnsRecord.fqdn,
  key_vault_uri: keyVault.vaultUri,
  storage_url: storageAccount.primaryBlobEndpoint,
};

8️⃣ Blueprint Resource Map Inference

A pre-trained LLM (or a lookup index) maps blueprint use cases to the infrastructure they are expected to need:

| Blueprint Use Case | Expected Infra |
| --- | --- |
| auth-service | AKS, DNS, KV, Storage |
| report-generator | KV, Blob, LogAnalytics |
| tenant-onboarding | DNS zone, Storage, Key Vault, Managed Identity |

Used to auto-expand from use case → provisioning requirements when infra_plan is partial.
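The lookup-index variant of this expansion can be sketched in a few lines of TypeScript. The resource keys (`aks`, `keyvault`, `storage`, and so on) and the `expandInfraPlan` helper are hypothetical; the use-case rows mirror the table above, and the LLM-based variant is not covered.

```typescript
// Lookup index mirroring the use-case table; the resource keys are hypothetical identifiers.
const INFRA_MAP_INDEX: { [useCase: string]: string[] } = {
  "auth-service": ["aks", "dns", "keyvault", "storage"],
  "report-generator": ["keyvault", "storage", "loganalytics"],
  "tenant-onboarding": ["dns", "storage", "keyvault", "managed-identity"],
};

// Adds the resources expected for a use case that the partial plan omits.
// Entries already present in the plan are kept untouched.
function expandInfraPlan(
  useCase: string,
  partialPlan: { [resource: string]: unknown }
): { [resource: string]: unknown } {
  const expected = INFRA_MAP_INDEX[useCase] || [];
  const expanded: { [resource: string]: unknown } = {};
  for (const key of Object.keys(partialPlan)) {
    expanded[key] = partialPlan[key];
  }
  for (const resource of expected) {
    if (!(resource in expanded)) {
      expanded[resource] = {}; // empty settings: template defaults apply
    }
  }
  return expanded;
}
```

An unknown use case leaves the plan unchanged, which keeps the sketch in line with the rule that the agent must not guess resource plans without a blueprint or default expansion rule.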


📦 Knowledge Source Locations

| Asset Type | Location |
| --- | --- |
| Templates | infrastructure/modules/azure/ |
| Naming/Tagging Rules | infrastructure/policies/naming.yaml |
| Region Topology | cloud-region-map.yaml |
| Secrets Policy | security/overlay-vault.yaml |
| Blueprints to Infra | knowledge/infra-map-index.json |

🧠 Summary

The Cloud Provisioner Agent leverages a rich and versioned knowledge base that includes:

  • 🧱 Pulumi templates
  • 🧭 Region-aware deployment rules
  • 🔐 Secrets and vault strategies
  • 📛 Naming and tagging policies
  • 📦 Use-case to infrastructure mapping

This allows it to provision secure, consistent, and trace-aligned cloud environments — fully autonomously.


🔄 Process Flow

flowchart TD
    A[Receive Input & Trace Metadata] --> B[Load Infra Plan + Region Map + Vault Info]
    B --> C[Resolve Templates + Merge Overlays]
    C --> D[Generate Pulumi Stack Files]
    D --> E["Run pulumi preview (validate)"]
    E --> F{Approved or Auto-Proceed?}
    F -- Yes --> G["Run pulumi up (provision)"]
    F -- No --> H[Emit PreviewOnly + Await Approval]
    G --> I[Verify Resources + Outputs]
    I --> J[Emit Outputs, Logs, and Event: ResourcesProvisioned]

🪜 Detailed Step Breakdown

✅ Step 1: Receive Execution Context

Triggered by orchestration (e.g., IaCCoordinator), receives:

  • trace_id, blueprint_id, execution_id
  • infra-plan.yaml
  • cloud-region-map.yaml
  • secrets-metadata.yaml

Ensures every run is traceable and bounded to a blueprint scope.


✅ Step 2: Load Knowledge & Merge Overlays

  • Load:
    • Region topology and resource plan
    • Naming/tagging policy
    • Secrets strategy
  • Merge:
    • Environment overlays (resource_group, tags, replication)


✅ Step 3: Template Rendering

  • Select Pulumi templates (from library)
  • Generate:
    • Pulumi.yaml (project)
    • Pulumi.{stack}.yaml (stack config)
    • index.ts / main.cs (logic with parameters)

✅ Step 4: Preview Mode (Validation)

  • Run pulumi preview
  • If success:
    • Proceed or emit ProvisioningPreviewReady event
  • If failure:
    • Emit ProvisioningFailed
    • Optionally retry with fallback region or settings

✅ Step 5: Execution (Provisioning)

If approved or auto_proceed = true:

  • Run pulumi up
  • Track:
    • Resource count
    • Changed vs created
    • Outputs emitted
  • Capture full provisioning log

✅ Step 6: Post-Provisioning Output

  • Resolve:
    • Resource URIs (AKS, DNS, Key Vault, Blob, etc.)
    • DNS records
    • Vault URIs and bound secrets
  • Validate:
    • Resource tags
    • Trace metadata
    • Naming format

✅ Step 7: Emit Artifacts

Output to Git or blob:

  • stack-output.json
  • Pulumi.yaml, Pulumi.{stack}.yaml
  • provisioning.log
  • secret-bindings.json

Emit:

  • ResourcesProvisioned event
  • OTEL spans
  • Infra snapshot to DevOps and Observability agents

🧠 Embedded Trace Metadata in All Steps

Every file, log, and emitted event includes:

trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi
execution_id: exec-auth-789
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging

🔁 Alternate Flows

| Condition | Alternate Step |
| --- | --- |
| Preview failed | Retry with fallback region (if allowed) |
| Vault unreachable | Emit soft failure, mark secrets as pending |
| Manual approval required | Emit preview-only event and await signal |
| Output differs from last | Emit ProvisioningDriftDetected (future) |

🧠 Summary

The Cloud Provisioner Agent’s process flow ensures:

  • 🛠️ Deterministic rendering from blueprint + region
  • ☁️ Secure and compliant provisioning via Pulumi
  • 📡 Observable and versioned outputs
  • 🔁 Safe retry and approval paths built into the flow

It is modular, policy-bound, traceable, and ready for multi-cloud scaling.


🧩 Skills and Kernel Functions

The Cloud Provisioner Agent uses a combination of:

  • 📚 Semantic Kernel (SK) Skills — composable functions for planning, transformation, naming, and template expansion
  • ⚙️ Domain-specific Pulumi SDK bindings — for real-time provisioning logic
  • 🔁 Agent-local orchestration logic — to manage flows, validate diffs, and emit outputs

All skills operate with full trace context and are injected into an execution plan derived from blueprint and environment state.


🔧 Core Semantic Kernel Skills

| Skill | Purpose |
| --- | --- |
| ResourcePlanResolverSkill | Merges infra-plan.yaml + cloud-region-map.yaml + overlays |
| PulumiTemplateSelectorSkill | Chooses appropriate stack templates (AKS, KV, Blob, etc.) |
| NamingResolverSkill | Computes compliant resource names (tagged, traceable, cloud-safe) |
| SecretsInjectionSkill | Maps vault references to initial seed values in Pulumi |
| PulumiRendererSkill | Composes and writes Pulumi.yaml, stack files, and index.ts |
| ProvisioningPreviewSkill | Runs pulumi preview and captures diff / output |
| PulumiExecutorSkill | Executes pulumi up (if approved or auto-proceed) |
| OutputFormatterSkill | Extracts key URIs, secrets, and resource identifiers |
| EventEmitterSkill | Emits ResourcesProvisioned, ProvisioningFailed, etc. |
| TelemetryTracerSkill | Injects span metadata, logs, and OTEL context during execution |

🧠 AI-Augmented Kernel Functions

These skills may use LLM reasoning for complex planning or decision assistance:

| Function | Description |
| --- | --- |
| StackNamingPlanner | Suggests short, region-safe resource names (63-char limit, lowercase, etc.) |
| TopologyExpander | Infers additional resources from partial plans (e.g., DNS implied by AKS) |
| RegionFallbackAdvisor | Suggests next best region if primary is unavailable or quota-exceeded |
| PolicyComplianceChecker | Detects drift or missing tags in planned output before provisioning |

🔁 Execution Plan Sample (SK-Compatible)

steps:
  - use ResourcePlanResolverSkill
  - use PulumiTemplateSelectorSkill
  - use NamingResolverSkill
  - use PulumiRendererSkill
  - use ProvisioningPreviewSkill
  - if approved:
      - use PulumiExecutorSkill
      - use OutputFormatterSkill
      - use EventEmitterSkill(ResourcesProvisioned)
  - else:
      - use EventEmitterSkill(ProvisioningPreviewReady)
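The branching in the plan above comes down to two checks: did the preview succeed, and is the run approved or set to auto-proceed. The TypeScript sketch below shows that control flow with stubbed skills; `PlanSkills` and `runExecutionPlan` are hypothetical names, and the real skills would be asynchronous Semantic Kernel functions rather than plain callbacks.

```typescript
type PlanEvent = "ResourcesProvisioned" | "ProvisioningPreviewReady" | "ProvisioningFailed";

// Stubs standing in for ProvisioningPreviewSkill and PulumiExecutorSkill.
interface PlanSkills {
  preview: () => boolean; // true when pulumi preview succeeds
  provision: () => boolean; // true when pulumi up succeeds
}

// Returns the event the agent would emit for this run.
function runExecutionPlan(skills: PlanSkills, approved: boolean, autoProceed: boolean): PlanEvent {
  if (!skills.preview()) {
    return "ProvisioningFailed";
  }
  if (!(approved || autoProceed)) {
    // Preview succeeded but nobody has approved the change yet.
    return "ProvisioningPreviewReady";
  }
  return skills.provision() ? "ResourcesProvisioned" : "ProvisioningFailed";
}
```

Note that provisioning is never attempted after a failed preview, even when auto_proceed is true.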

🔐 Policy + Secret Skill Integrations

  • All secrets must flow through SecretsInjectionSkill
  • If mount_strategy = vault-agent-sidecar, inject sidecar.yaml template into index.ts
  • Role assignments (e.g., to services or pipelines) handled via AccessPolicyComposerSkill (planned)

📦 Skill Library Versions & Strategy

| Source | Versioning Strategy |
| --- | --- |
| templates/pulumi/ | SemVer + cloud-provider scope |
| skills/ | SK plugin folders with unit-tested prompt wrapping |
| common/overlays/ | YAML-driven, updated per environment baseline |

🧠 Summary

The Cloud Provisioner Agent orchestrates its infrastructure logic using:

  • 📚 Semantic Kernel skills to render and validate IaC
  • 🤖 LLM-enhanced functions to reason about naming, region fallback, and policy compliance
  • 🔁 Composable execution plans for traceable, safe, auditable provisioning

It ensures infrastructure is provisioned modularly, predictably, and context-aware, at any scale.


🧰 Core Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Infrastructure as Code | Pulumi (TypeScript/.NET SDK) | Declarative cloud provisioning |
| Cloud Provider (Phase 1) | Azure | Target for AKS, Key Vault, Storage, DNS |
| Agent Execution Runtime | .NET 8 + Semantic Kernel | Agent host + skills engine |
| LLM/AI Reasoning | Azure OpenAI (GPT-4 Turbo) | Stack expansion, naming, fallback logic |
| Observability | OpenTelemetry SDK | Spans: cloud.provision.start, provision.success, provision.failed |
| Orchestration Interface | Internal Orchestrator API / Coordinator | Triggered via IaCCoordinator or env setup FSM |
| CLI Execution | pulumi preview, pulumi up, pulumi destroy | Infrastructure deployment lifecycle |

☁️ Pulumi Configuration Standards

  • Project root: /infrastructure/stacks/{component}/{env}/{region}/
  • Runtime: TypeScript by default (.NET support planned)
  • Secrets provider: Azure Key Vault (pulumi config set --secret ...)
  • Stack naming convention: {env}-{region}-{component} (e.g., staging-westeurope-auth)
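These conventions are simple enough to derive mechanically. The TypeScript sketch below is a hedged illustration: `StackScope`, `stackName`, `projectRoot`, and `stackConfigFile` are hypothetical helpers, while the `Pulumi.<stack>.yaml` file name follows Pulumi's standard per-stack settings file.

```typescript
// Hypothetical scope object; field names follow the placeholders in the conventions.
interface StackScope {
  component: string;
  env: string;
  region: string;
}

// Stack naming convention: {env}-{region}-{component}
function stackName(scope: StackScope): string {
  return [scope.env, scope.region, scope.component].join("-").toLowerCase();
}

// Project root: /infrastructure/stacks/{component}/{env}/{region}/
function projectRoot(scope: StackScope): string {
  return "/infrastructure/stacks/" + [scope.component, scope.env, scope.region].join("/").toLowerCase() + "/";
}

// Pulumi keeps per-stack settings in Pulumi.<stack-name>.yaml next to Pulumi.yaml.
function stackConfigFile(scope: StackScope): string {
  return projectRoot(scope) + "Pulumi." + stackName(scope) + ".yaml";
}
```

For component auth in staging/westeurope this yields the stack name staging-westeurope-auth from the convention above.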

🌐 Azure Services Targeted

| Resource | Pulumi Module Used |
| --- | --- |
| AKS Cluster | @pulumi/azure-native.containerservice.ManagedCluster |
| Key Vault | @pulumi/azure-native.keyvault.Vault |
| DNS Zone & Record | @pulumi/azure-native.network.DnsZone |
| Blob Storage | @pulumi/azure-native.storage.StorageAccount |
| Log Analytics | @pulumi/azure-native.insights.* |
| Virtual Network (optional) | @pulumi/azure-native.network.VirtualNetwork |

All modules used via centralized module templates (modules/azure/*.ts).


📦 GitOps & Storage Integration

| Use | Tool |
| --- | --- |
| IaC Versioning | Git repo under /infrastructure/stacks/... |
| Logs + output files | Azure Blob Storage (trace-tagged) |
| Secret injection | Azure Key Vault (shared or per-service) |
| Snapshot storage | outputs.json, provisioning.log uploaded to blob/archive path |

🔐 Security & Secrets Management

| Use Case | Tech |
| --- | --- |
| Vault Seeding | Pulumi secret config + ARM access policy |
| Sidecar Injection | Agent-side template renders vault-agent manifest (if required) |
| Identity Binding (future) | Managed Identity + App Registration support |

All secrets flow through SecretsInjectionSkill and conform to overlay policy.


📊 Observability Stack

| Tool | Role |
| --- | --- |
| OpenTelemetry | Emitted spans (start, preview, success/fail) with trace_id, execution_id |
| Grafana Dashboards | Metrics visualization (resource count, duration, error rate) |
| Azure Monitor | Logs from provisioning runs, validation failures |
| Structured Logs | Emitted via ILogger → forwarded to blob, Azure Monitor, or Loki |

🔁 Agent Trigger & Integration

| Interface | Trigger |
| --- | --- |
| Orchestrator API | POST /provision/stack |
| Git hook | PR or branch with infra-plan.yaml + trace_id |
| Manual CLI | dotnet run provision --trace trace-auth-789 |

🧪 Validation Tools

| Check | Tool |
| --- | --- |
| Template lint | eslint (with typescript-eslint; tslint is deprecated) |
| Pulumi preview | pulumi preview --diff --stack ... |
| Cloud quota check (planned) | Azure Resource Graph query via SDK |
| Stack diff validation | Hash-based output delta + drift detection |
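A hash-based output delta can be as simple as hashing a canonical form of the output map and comparing it with the hash stored from the previous run. The TypeScript sketch below assumes flat string-valued outputs and uses a dependency-free 32-bit FNV-1a hash as a stand-in; a production check would more likely use a cryptographic digest such as SHA-256. All function names are hypothetical.

```typescript
// Serializes a flat output map with sorted keys so key order never affects the hash.
function canonicalizeOutputs(outputs: { [key: string]: string }): string {
  const pairs: string[] = [];
  for (const key of Object.keys(outputs).sort()) {
    pairs.push(JSON.stringify(key) + ":" + JSON.stringify(outputs[key]));
  }
  return "{" + pairs.join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units (illustrative stand-in for a real digest).
function hashOutputs(outputs: { [key: string]: string }): string {
  const text = canonicalizeOutputs(outputs);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts to avoid floating-point precision loss.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash.toString(16);
}

// True when the current outputs differ from the previous snapshot.
function outputsDrifted(
  previous: { [key: string]: string },
  current: { [key: string]: string }
): boolean {
  return hashOutputs(previous) !== hashOutputs(current);
}
```

Storing only the hash alongside each outputs snapshot keeps the comparison cheap, at the cost of not knowing which key changed; a full diff would still be needed for diagnostics.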

🧠 Summary

The Cloud Provisioner Agent’s tech stack enables:

  • 🔁 Automated, GitOps-friendly infrastructure provisioning
  • ☁️ Azure-native cloud resources delivered via Pulumi
  • 📡 Observable, trace-tagged provisioning lifecycle
  • 🔐 Secure secret and environment injection
  • 🤖 AI-assisted planning, naming, and recovery

This stack ensures modular, reproducible, multi-environment infrastructure delivery with minimal manual ops.


📜 System Prompt

The System Prompt is a persistent LLM instruction that ensures the Cloud Provisioner Agent:

  • Operates securely and deterministically
  • Applies cloud infrastructure best practices
  • Honors blueprint-level traceability and naming rules
  • Executes provisioning within policy and region constraints
  • Emits complete, structured outputs for downstream automation

It is injected on agent startup and used across Semantic Kernel planning, skills, and function chains.


📋 System Prompt (Full Text)

You are the Cloud Provisioner Agent in the ConnectSoft AI Software Factory.

Your responsibility is to generate and provision secure, traceable, environment-specific cloud infrastructure using Infrastructure-as-Code (IaC) — primarily Pulumi for Azure.

You consume structured inputs including:
- Blueprint ID, Trace ID, Execution ID (for traceability)
- Infra plan YAML defining AKS, Key Vault, Storage, DNS, etc.
- Region overlays and zone definitions (primary, failover, SLA)
- Security overlays including secrets and vault mappings
- Environment overlays such as resource groups and tags

You must:
- Select and render the appropriate Pulumi stack templates
- Merge overlays for environment, region, and secrets
- Apply naming conventions and tagging policies
- Validate the output using `pulumi preview`
- Provision cloud infrastructure only if the preview succeeds and the run is either approved or has `auto_proceed` set to true
- Emit outputs including stack files, provisioning log, and a full map of resource URIs and secrets
- Tag all resources with `trace_id`, `blueprint_id`, `environment`, and `provisioned_by: cloud-provisioner-agent`

You must not:
- Provision anything if `trace_id` or `blueprint_id` is missing
- Use wildcard names, untagged resources, or unvalidated secrets
- Guess resource plans if `infra_plan` is missing (unless instructed by blueprint or default expansion rules)

All output must be deterministic, compliant with cloud-specific rules, and versioned for reproducibility.

Emit the `ResourcesProvisioned` event only if the stack completes successfully. Otherwise, emit `ProvisioningFailed` with diagnostics.

✅ Scope Imposed by Prompt

| Category | Constraint |
| --- | --- |
| Provisioning | Requires explicit trace, blueprint, and region |
| Secrets | Must match provided secrets-metadata.yaml or vault overlay |
| Naming | Uses enforced convention: {prefix}-{env}-{region}-{component} |
| Retry | Allowed only after preview or explicit fallback instruction |
| Emission | Event + structured output required for downstream agents |

🔐 Compliance Notes

  • Resource tags enforced at provision time
  • Regions validated against cloud-region-map.yaml
  • Secret mount strategy must be explicit (env, volume, or vault-agent-sidecar)
  • Naming must avoid uppercase, special characters, or disallowed suffixes

📦 Output Obligations per Prompt

  • ✅ Pulumi stack files (YAML + TS)
  • ✅ Resource output map (e.g., outputs.json)
  • ✅ provisioning.log
  • ✅ Secrets binding file (secret-bindings.json)
  • ✅ ResourcesProvisioned or ProvisioningFailed event

🧠 Summary

The System Prompt ensures that the Cloud Provisioner Agent:

  • 🧱 Renders reproducible, secure infrastructure
  • 🔁 Follows environment-specific overlays and naming
  • ☁️ Provisions only after successful preview
  • 📡 Emits traceable events and outputs
  • 🔒 Never bypasses security or compliance gates

It defines the agent as a safe, deterministic, cloud-native executor for all provisioned environments.


📥 Input Prompt Template

This template defines the structured YAML or JSON input passed to the Cloud Provisioner Agent when invoked by:

  • The orchestration layer (e.g., IaCCoordinator)
  • Environment setup workflows
  • Manual invocations during sandbox or testing flows

It encapsulates all context needed to plan, generate, and provision the cloud resources for a specific service, region, and environment.


📋 YAML Input Prompt Template

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
component: AuthService
agent_origin: orchestrator

cloud_provider: azure
environment: staging
region: westeurope
resource_group: cs-stg-rg-auth
resource_prefix: cs-stg-auth
auto_proceed: true

infra_plan:
  aks:
    node_count: 3
    k8s_version: "1.28"
  keyvault:
    policy: app-only
  blob_storage:
    tier: standard
  dns:
    fqdn: auth.europe.connectsoft.io

replication:
  storage: geo-redundant
  dns: failover

secrets:
  - name: AUTH_SECRET
    vault_ref: authservice-app-secret
    mount_strategy: env

tags:
  env: staging
  edition: EU
  provisioned_by: cloud-provisioner-agent

✅ Required Fields

| Field | Purpose |
|---|---|
| trace_id, execution_id, blueprint_id | Ensures full traceability |
| component, environment, region | Determines scope and naming |
| infra_plan | Describes what resources to provision |
| secrets[] | Vault references and mount strategy |
| tags | Mandatory for all provisioned resources |
| auto_proceed | Allows provisioning without human approval after preview; defaults to false when omitted |

🧪 Example JSON (API-Compatible Format)

{
  "trace_id": "trace-invoice-500",
  "execution_id": "exec-invoice-500-b",
  "component": "InvoiceService",
  "blueprint_id": "invoice-platform-v1",
  "environment": "production",
  "region": "francecentral",
  "cloud_provider": "azure",
  "auto_proceed": false,
  "infra_plan": {
    "dns": {
      "fqdn": "invoices.connectsoft.io"
    },
    "keyvault": {
      "policy": "rbac"
    },
    "blob_storage": {
      "tier": "hot"
    }
  },
  "secrets": [
    {
      "name": "INVOICE_SECRET",
      "vault_ref": "invoice-token",
      "mount_strategy": "vault-agent-sidecar"
    }
  ],
  "tags": {
    "env": "production",
    "trace_id": "trace-invoice-500"
  }
}

🔐 Input Validation Checklist

| Field | Required? | Notes |
|---|---|---|
| trace_id | ✅ | Mandatory; blocks execution if missing |
| infra_plan | ✅ | At least one resource must be specified |
| secrets.vault_ref | ✅ | Must match known vault alias or key |
| region | ✅ | Must be present in cloud-region-map.yaml |
| resource_prefix | 🟡 | Auto-resolved if not present |
| auto_proceed | 🟡 | Defaults to false (manual approval) |
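
The checklist above can be sketched as a small pre-flight check. This is a minimal illustration, not the agent's actual validator: the function name `validate_input`, the `known_regions` and `known_vault_refs` arguments (standing in for cloud-region-map.yaml and the vault overlay), and the error strings are all assumptions.

```python
from typing import Any

# Assumption: fields treated as blocking when absent, combining the
# "Required Fields" table with the checklist above.
REQUIRED_FIELDS = (
    "trace_id", "execution_id", "blueprint_id",
    "component", "environment", "region",
)


def validate_input(payload: dict, known_regions: set, known_vault_refs: set) -> tuple:
    """Return (normalized copy of payload, list of blocking errors)."""
    errors = []
    normalized: dict[str, Any] = dict(payload)

    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            errors.append(f"missing required field: {field}")

    # infra_plan must name at least one resource.
    if not payload.get("infra_plan"):
        errors.append("infra_plan must specify at least one resource")

    region = payload.get("region")
    if region and region not in known_regions:
        errors.append(f"region not in cloud-region-map.yaml: {region}")

    for secret in payload.get("secrets", []):
        if secret.get("vault_ref") not in known_vault_refs:
            errors.append(f"unknown vault_ref for secret {secret.get('name')}")

    # Optional field: absent means manual approval is needed.
    normalized.setdefault("auto_proceed", False)
    return normalized, errors
```

An empty error list means the run may continue to stack rendering; any entry would block execution before a preview is attempted.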

📦 Optional Extended Fields

Field Description
dns_map_override Override default DNS entries or inject zone hints
manual_approval_required Forces wait state after preview
cost_estimate_required Requests cost projection before execution
fallback_region Allows automatic retry if region unavailable

🧠 Summary

The Input Prompt Template enables the Cloud Provisioner Agent to:

  • 📦 Receive infrastructure plans in a structured, deterministic way
  • 🌍 Provision based on region, environment, and component scope
  • 🔐 Respect secret policies and vault constraints
  • 📊 Emit traceable outputs and telemetry linked to blueprint lineage

It guarantees that every provisioning run starts with a complete, auditable definition — with no guessing, and full policy alignment.


📤 Output Expectations

Every provisioning operation produces:

  • 🧱 Infrastructure-as-Code artifacts (Pulumi stack files)
  • 📄 Cloud resource output maps
  • 📊 Provisioning logs
  • 📡 Telemetry spans and events
  • 🧩 Secrets binding metadata
  • 💾 Snapshot files for GitOps and audit

These outputs are fully traceable to their blueprint and environment scope.


✅ Output Artifacts

1️⃣ Pulumi Stack Files

| File | Purpose |
|---|---|
| Pulumi.yaml | Defines the project and Pulumi runtime |
| Pulumi.{stack}.yaml | Contains config values, tags, and stack settings |
| index.ts or main.cs | Infrastructure logic (calls to Pulumi SDK) |
| Pulumi.lock.yaml | Pins versions of dependencies |
| outputs.json | Key URIs, resource IDs, endpoints, DNS, storage URLs |

All files are saved in:

/infrastructure/stacks/{component}/{env}/{region}/

2️⃣ Provisioning Log

[pulumi] Preview succeeded: 4 to create, 0 to change, 0 to delete
[pulumi] Updated resources:
 - azure-native:containerservice:ManagedCluster
 - azure-native:storage:BlobContainer
...

Saved as provisioning.log for debugging and audit purposes.
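
A downstream check may want the planned counts from that log. The sketch below parses the summary line in the format shown in the excerpt above. That format is the agent's own log line, not the Pulumi CLI's native output, and the function name `parse_preview_summary` is hypothetical.

```python
import re
from typing import Optional

# Matches the summary line format from the provisioning.log excerpt.
SUMMARY_RE = re.compile(
    r"Preview succeeded: (\d+) to create, (\d+) to change, (\d+) to delete"
)


def parse_preview_summary(log_text: str) -> Optional[dict]:
    """Return planned create/change/delete counts, or None if no summary line exists."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    create, change, delete = (int(group) for group in match.groups())
    return {"create": create, "change": change, "delete": delete}
```

Returning `None` rather than zeros keeps "no summary found" distinguishable from "nothing to change".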


3️⃣ Resource Output Map (Structured JSON)

{
  "aks_cluster_name": "aks-stg-auth",
  "kubeconfig_vault_key": "vault://auth-kubeconfig",
  "dns_record": "auth.stg.connectsoft.io",
  "vault_uri": "https://vault-auth-stg.vault.azure.net",
  "blob_url": "https://csstgauth.blob.core.windows.net"
}

Used by:

  • DevOps Engineer Agent
  • Observability Agent
  • API Gateway Agent (DNS setup)

4️⃣ Secrets Metadata Output

{
  "secrets": [
    {
      "name": "AUTH_SECRET",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}

Sent to:

  • DevOps pipelines (for runtime binding)
  • QA/Test Agent (if permitted)

5️⃣ Provisioning Events

ResourcesProvisioned
{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "stack": "staging-westeurope",
  "resource_count": 6,
  "region": "westeurope",
  "status": "success",
  "timestamp": "2025-05-08T12:35:42Z"
}

Sent to Orchestration, Audit Log, DevOps Agent, and dashboards.

ProvisioningFailed
{
  "event": "ProvisioningFailed",
  "reason": "AKS quota exceeded in region",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope"
}
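
The two event shapes can be produced by one helper, so that ResourcesProvisioned is only ever built for a successful run. This is a sketch under assumptions: the function name, its parameters, and the extra `status` and `timestamp` fields on the failure event are illustrative and not a confirmed schema.

```python
from datetime import datetime, timezone
from typing import Optional


def build_provisioning_event(
    trace_id: str,
    component: str,
    stack: str,
    region: str,
    succeeded: bool,
    resource_count: int = 0,
    reason: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build a ResourcesProvisioned or ProvisioningFailed payload.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    base = {
        "trace_id": trace_id,
        "component": component,
        "stack": stack,
        "region": region,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if succeeded:
        return {"event": "ResourcesProvisioned", **base,
                "resource_count": resource_count, "status": "success"}
    # Failure events carry diagnostics instead of a resource count.
    return {"event": "ProvisioningFailed", **base,
            "reason": reason or "unknown", "status": "failed"}
```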

6️⃣ Trace & Audit Metadata

Included in all outputs:

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging

7️⃣ Optional Outputs

Output When Produced
rollback-plan.json If provisioning partially succeeds
dns-map.yaml If multiple DNS records are generated
manual-approval.yaml If agent requires confirmation before deploy
stack-diff.json When provisioning results deviate from preview or prior state

📂 GitOps-Compatible File Structure

/infrastructure/stacks/
  └── authservice/
      └── staging/
          └── westeurope/
              ├── Pulumi.yaml
              ├── Pulumi.staging-westeurope.yaml
              ├── index.ts
              ├── outputs.json
              ├── provisioning.log
              ├── secret-bindings.json
              └── provisioning-event.json
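
The layout above follows from the component, environment, and region. A minimal sketch of that derivation follows; the helper names are hypothetical, and lowercasing the component is inferred from the `authservice` folder in the tree rather than stated as a rule.

```python
from pathlib import PurePosixPath

# File names taken from the tree above; the stack config file is added per stack.
FIXED_FILES = (
    "Pulumi.yaml", "index.ts", "outputs.json",
    "provisioning.log", "secret-bindings.json", "provisioning-event.json",
)


def stack_name(environment: str, region: str) -> str:
    """Stack names in the examples have the form {env}-{region}."""
    return f"{environment}-{region}"


def stack_directory(component: str, environment: str, region: str,
                    root: str = "/infrastructure/stacks") -> PurePosixPath:
    """Directory that holds all artifacts for one component/env/region."""
    return PurePosixPath(root) / component.lower() / environment / region


def expected_files(component: str, environment: str, region: str) -> list:
    """Sorted list of artifact paths a completed run is expected to leave behind."""
    directory = stack_directory(component, environment, region)
    names = list(FIXED_FILES) + [f"Pulumi.{stack_name(environment, region)}.yaml"]
    return sorted(str(directory / name) for name in names)
```

A GitOps check could compare `expected_files(...)` against the committed tree to spot an incomplete run.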

📊 Telemetry Expectations

Emits OpenTelemetry spans:

Span Name Trigger
cloud.provision.start When agent begins provisioning
cloud.provision.success After successful pulumi up
cloud.provision.failed On failure, with reason + trace_id
cloud.provision.preview When preview is executed (even if not applied)

🧠 Summary

The Cloud Provisioner Agent produces:

  • 📦 Pulumi IaC artifacts (stack, project, config)
  • 🌍 Cloud resource metadata (URIs, IDs, secrets)
  • 📡 Events and telemetry for orchestration and observability
  • 📁 Audit-ready logs and GitOps-compatible outputs
  • 🔐 Secrets bindings for runtime injection

All outputs are trace-labeled, versionable, and consumable by downstream agents.


🧠 Memory

Memory enables the Cloud Provisioner Agent to:

  • 📎 Maintain links between blueprints, environments, and actual provisioned resources
  • 🔁 Avoid unnecessary reprovisioning (idempotency and state caching)
  • 🔍 Detect drift or stack differences during preview
  • 📊 Retain secrets metadata, resource tags, output maps, and history
  • ⏮️ Enable re-entrancy in partially failed operations or rollback scenarios

🧠 Short-Term Memory (Execution Scope)

Stored in the Semantic Kernel context dictionary or in an ephemeral runtime cache.

Key Purpose
trace_id, execution_id Carries traceability across steps
resolved_stack_name Used to link Pulumi CLI actions to current operation
template_plan Set of resolved Pulumi templates for the infra plan
rendered_files In-memory representation of rendered files before disk write
secrets_map Current secrets-to-vault mount plan
resource_counts Expected # of resources before/after preview

Cleared after each execution cycle unless retained in diagnostic mode.


💾 Long-Term Memory (Persistent)

Stored in Blob Storage, Azure Cosmos DB, or Git, depending on environment and tenant.

1️⃣ Stack History

{
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "environment": "staging",
  "region": "westeurope",
  "stack": "staging-westeurope",
  "last_modified": "2025-05-08T12:00:00Z",
  "resource_count": 6,
  "outputs_hash": "9ad7f3c1..."
}

Used to:

  • Skip unchanged re-renders
  • Compare preview diffs with previous state
  • Enable safe re-runs of pulumi up

2️⃣ Output Map History

{
  "stack": "staging-westeurope",
  "outputs": {
    "aks": "aks-stg-auth-01",
    "dns": "auth.stg.connectsoft.io"
  },
  "revision": 3
}

Retained for:

  • Audit logging
  • DevOps injection into pipelines
  • Rollback targeting

3️⃣ Secrets Injection History

{
  "vault_uri": "https://vault-auth-stg.vault.azure.net/",
  "secret_refs": ["AUTH_SECRET", "KUBECONFIG"],
  "mount_strategy": "env"
}

Used to:

  • Detect missing or updated secrets
  • Enforce policy compliance
  • Recreate mount strategies during retry or promotion

4️⃣ Provisioning Logs + Snapshots

File Purpose
provisioning.log Stored with timestamp and trace reference
outputs.json Full resource outputs
stack-diff.json Diff from last run, for drift detection

🗂️ Retention Strategy

| Memory Type | Retention Duration |
|---|---|
| Stack outputs & logs | 90 days (rotated monthly) |
| Secrets metadata | 30–90 days, depending on policy sensitivity |
| Provisioning diffs | 60 days minimum for audit |
| Successful ResourcesProvisioned events | Archived indefinitely in trace store |

🔐 Access Control

  • Write access only by Cloud Provisioner Agent
  • Read access by:
      • DevOps Agent (for output URI injection)
      • Observability Agent (for infra monitoring)
      • HumanOpsAgent (during rollback or review)

All memory objects include:

agent_origin: cloud-provisioner-agent
trace_id: trace-*
execution_id: exec-*

🔁 Replay Support (Future)

Memory structure enables:

  • Provisioning replay with previous parameters
  • Promotion-aware copy-to-environment (e.g., staging → prod)
  • Drift-aware re-run with preview diff and policy check

🧠 Summary

The Cloud Provisioner Agent’s memory system supports:

  • 🔁 Safe, idempotent infrastructure provisioning
  • 📎 Trace-aware state retention for audits
  • 🔍 Preview diff validation and rollback planning
  • 📦 Secret history and vault injection tracking

This memory architecture makes cloud provisioning reproducible, compliant, and auditable by design.


🎯 Validation

Before any infrastructure is provisioned, the Cloud Provisioner Agent performs multi-layer validation to ensure:

  • 🧱 Pulumi stack correctness
  • 🔐 Secrets integrity and policy compliance
  • 🧭 Region constraints and resource quota awareness
  • 📛 Proper naming, tagging, and trace metadata
  • 🛰️ Safe preview of changes with user or orchestrator confirmation

Validation protects cloud environments from drift, misconfiguration, overprovisioning, and policy violations.


✅ Validation Stages

1️⃣ Stack Structure Validation

| Check | Tool/Logic |
|---|---|
| Pulumi file completeness | All required files: Pulumi.yaml, stack file, logic file |
| YAML schema validity | Linter or custom validator (YAML + JSON schemas) |
| Supported resource types | Only whitelisted modules and cloud providers |

2️⃣ Naming & Tagging Enforcement

| Check | Enforcement Rule |
|---|---|
| Resource names | Must conform to naming.yaml pattern (e.g., {prefix}-{env}-{region}-{component}) |
| Max length | Enforced per Azure/GCP/AWS limits (e.g., AKS cluster ≤ 63 chars) |
| Required tags | Must include trace_id, blueprint_id, environment, agent_origin |
| Forbidden patterns | No capital letters, underscores, or reserved suffixes |
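
These naming and tagging rules can be expressed as a few small checks. The sketch below is illustrative only: the regular expression is one possible reading of "no capital letters, underscores, or special characters", the 63-character default is the AKS example from the table, and reserved suffixes are left out because the source does not list them.

```python
import re

REQUIRED_TAGS = ("trace_id", "blueprint_id", "environment", "agent_origin")
# Assumed pattern: lowercase alphanumeric segments joined by single hyphens.
NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def build_name(prefix: str, env: str, region: str, component: str) -> str:
    """Apply the {prefix}-{env}-{region}-{component} convention."""
    return f"{prefix}-{env}-{region}-{component}"


def check_name(name: str, max_length: int = 63) -> list:
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name must be lowercase alphanumeric segments joined by hyphens")
    if len(name) > max_length:
        problems.append(f"name exceeds {max_length} characters")
    return problems


def missing_tags(tags: dict) -> list:
    """Return required tag keys that are absent or empty."""
    return [tag for tag in REQUIRED_TAGS if not tags.get(tag)]
```

Per-resource length limits differ, so a real check would pass `max_length` per resource type rather than rely on the default.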

3️⃣ Secrets Validation

| Check | Enforcement |
|---|---|
| vault_ref exists | Must map to declared vault secret in overlay or policy |
| mount_strategy valid | Must be env, volume, or vault-agent-sidecar |
| No plaintext secrets | All secrets must resolve to secure mount or Pulumi config secret |
| SecretsProvider setup | Pulumi stack must declare a secrets provider if secrets used |
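
The first three secrets checks lend themselves to a simple per-entry validator. This is a sketch, not the SecretsInjectionSkill itself: `validate_secrets` and the `PLAINTEXT_KEYS` set (key names that would suggest an inline secret value) are hypothetical, and the secrets-provider check is omitted because it depends on the stack configuration.

```python
ALLOWED_MOUNT_STRATEGIES = {"env", "volume", "vault-agent-sidecar"}
# Hypothetical key names that would indicate an inline secret value.
PLAINTEXT_KEYS = {"value", "plaintext"}


def validate_secrets(secrets: list, declared_vault_refs: set) -> list:
    """Return a list of violations across all secret entries."""
    errors = []
    for secret in secrets:
        name = secret.get("name", "<unnamed>")
        if secret.get("vault_ref") not in declared_vault_refs:
            errors.append(f"{name}: vault_ref is not declared")
        if secret.get("mount_strategy") not in ALLOWED_MOUNT_STRATEGIES:
            errors.append(f"{name}: unsupported mount_strategy")
        # Any inline value counts as a plaintext secret.
        if PLAINTEXT_KEYS.intersection(secret):
            errors.append(f"{name}: plaintext secret values are not allowed")
    return errors
```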

4️⃣ Region & Resource Constraints

| Constraint | Rule |
|---|---|
| Region availability | Must exist in cloud-region-map.yaml |
| Resource quota (planned) | Soft-check via Azure Resource Graph or Terraform provider |
| Failover support | replication-strategy.yaml must match resource plan (e.g., blob = geo-redundant) |

5️⃣ Pulumi Preview Validation

  • Runs: pulumi preview --stack {stack} --diff
  • Checks:
      • Resource count: how many to create/change/delete
      • Diff output: ensure no drift unless expected
      • Errors: quota exceeded, invalid config, unknown provider

If preview fails, emit ProvisioningFailed with error snapshot and halt.


6️⃣ Trace Context Validation

| Requirement | Behavior |
|---|---|
| trace_id missing | ❌ Block execution and emit validation error |
| execution_id missing | ❌ Halt; required for observability and log correlation |
| blueprint_id missing | ❌ Required for memory + audit linkage |
| component undefined | 🟡 Warn; fall back to the component scope implied by trace metadata, if any |

📄 Example Validation Error (emitted as JSON)

{
  "event": "ProvisioningFailed",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "reason": "Missing secretsProvider in Pulumi config",
  "severity": "high",
  "stage": "validation",
  "timestamp": "2025-05-08T12:48:01Z"
}

🛠️ Validation Summary Table

| Category | Must Pass | Tool Used |
|---|---|---|
| File completeness | ✅ | Internal check |
| YAML/TS syntax | ✅ | yamllint, tslint, or schema validation |
| Stack preview pass | ✅ | pulumi preview |
| Tags present | ✅ | Tag policy engine |
| Vault mappings | ✅ | SecretsInjectionSkill |
| Naming constraints | ✅ | NamingResolverSkill |

🧠 Summary

The Cloud Provisioner Agent's validation system guarantees:

  • 🧱 Well-formed IaC before deployment
  • 🔐 Secure, compliant use of secrets
  • 📛 Proper tagging and naming across cloud resources
  • 📡 Preview visibility and error transparency before execution
  • 🧾 Full trace and audit linkage to each provisioning action

Validation is the final gate before cloud infrastructure is deployed — ensuring that ConnectSoft remains secure, predictable, and compliant.


🔁 Retry & Correction Flow

Provisioning cloud infrastructure is inherently error-prone due to:

  • 🛰️ Cloud-side quota errors or API delays
  • 🔐 Vault/secret misconfigurations
  • 🧱 YAML or resource plan issues
  • 📦 Conflicting or drifted resource states

The Cloud Provisioner Agent must fail safely, retry deterministically, and never provision partial or insecure infrastructure.


✅ Retryable Error Categories

| Error Type | Action |
|---|---|
| Pulumi CLI transient failure | Retry up to 3x with exponential backoff |
| Azure API throttling (429) | Backoff and retry within cooldown window |
| DNS resolution delay (e.g., after zone creation) | Wait, re-query, retry binding |
| Vault unavailability | Retry after delay (up to policy-defined max attempts) |
| Stack lock present | Wait, reattempt pulumi up or notify coordinator |

❌ Non-Retryable / Escalation Errors

| Error | Action |
|---|---|
| Invalid infra_plan | Abort with ProvisioningFailed (schema or structure error) |
| Missing trace_id or blueprint_id | Hard stop (validation failure) |
| Secrets mount strategy unknown | Fail with clear error, await manual fix |
| Quota exceeded (non-transient) | Emit error, suggest fallback region, notify orchestrator |

🧪 Auto-Correction Strategies (Safe Fallbacks)

| Condition | Correction |
|---|---|
| Missing resource_prefix | Derive from component + env + region |
| Undeclared tags | Auto-inject required tags (trace_id, env, etc.) |
| Missing fallback region | Use the secondary region from cloud-region-map.yaml |
| K8s version not provided | Use default LTS version for environment class |
| Empty DNS zone list | Suggest default zone or infer from blueprint + region |

All auto-corrections are tagged in log and span metadata for audit traceability.
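
A sketch of how three of these corrections might be applied while recording each one for the audit trail. Everything here is an assumption made for illustration: the function name, the plain lowercase `component-env-region` prefix (the source does not define abbreviations), the action names other than `AutoCorrectedMissingPrefix`, and the use of the `secondary` key as it appears in the cloud-region-map.yaml example.

```python
def apply_auto_corrections(payload: dict, region_map: dict) -> tuple:
    """Return (corrected copy of payload, list of correction log entries)."""
    corrected = dict(payload)
    corrected["tags"] = dict(payload.get("tags") or {})
    log = []

    def record(action: str, field: str, value: str) -> None:
        log.append({
            "trace_id": payload.get("trace_id"),
            "action": action,
            "field": field,
            "value_applied": value,
        })

    if not corrected.get("resource_prefix"):
        # Assumed derivation: lowercase "component-env-region".
        prefix = "-".join(
            [payload["component"], payload["environment"], payload["region"]]
        ).lower()
        corrected["resource_prefix"] = prefix
        record("AutoCorrectedMissingPrefix", "resource_prefix", prefix)

    # Inject tags that can be copied straight from the request.
    for tag, source_field in (("trace_id", "trace_id"), ("env", "environment")):
        if tag not in corrected["tags"] and payload.get(source_field):
            corrected["tags"][tag] = payload[source_field]
            record("AutoInjectedTag", f"tags.{tag}", payload[source_field])

    if not corrected.get("fallback_region") and region_map.get("secondary"):
        corrected["fallback_region"] = region_map["secondary"]
        record("AutoResolvedFallbackRegion", "fallback_region", region_map["secondary"])

    return corrected, log
```

The input payload is never mutated, so the original request stays available for comparison in the audit log.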


🔁 Retry Flow Logic

flowchart TD
    A[Start Provisioning] --> B[Run Validation + Preview]
    B --> C{Preview OK?}
    C -- No --> D{Retryable?}
    D -- Yes --> B
    D -- No --> E[Emit ProvisioningFailed]
    C -- Yes --> F[Run pulumi up]
    F --> G{Success?}
    G -- No --> D
    G -- Yes --> H[Emit ResourcesProvisioned]

📘 Retry Policy

retry:
  max_attempts: 3
  retry_interval_sec: 15
  backoff_strategy: exponential
  retryable_errors:
    - azure_api_throttle
    - stack_lock
    - vault_timeout
    - dns_unavailable
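
One possible reading of this policy as code follows. It assumes that `max_attempts` counts total attempts (not retries after the first), that the delay doubles from `retry_interval_sec` on each retry, and that failures surface as a hypothetical `ProvisioningError` carrying one of the error codes above. None of these details are confirmed by the policy itself.

```python
import time
from typing import Callable

RETRYABLE_ERRORS = {"azure_api_throttle", "stack_lock", "vault_timeout", "dns_unavailable"}


class ProvisioningError(Exception):
    """Hypothetical error type carrying an error code from the retry policy."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def run_with_retry(
    operation: Callable,
    max_attempts: int = 3,
    retry_interval_sec: float = 15.0,
    sleep: Callable = time.sleep,
):
    """Run `operation`, retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ProvisioningError as err:
            # Non-retryable codes and the final attempt propagate to the caller.
            if err.code not in RETRYABLE_ERRORS or attempt == max_attempts:
                raise
            # Delay doubles each time: 15s, then 30s with the defaults.
            sleep(retry_interval_sec * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting for them.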

🧩 Sample Correction Log (JSON)

{
  "trace_id": "trace-invoice-501",
  "action": "AutoCorrectedMissingPrefix",
  "field": "resource_prefix",
  "value_applied": "cs-prod-frc-invoice",
  "retry_attempt": 1,
  "status": "provisioning_resumed"
}

🔔 Human Escalation Triggers

| Scenario | Action |
|---|---|
| ProvisioningFailed after 3 retries | Notify HumanOpsAgent and halt |
| Secrets mismatch or conflict | Raise to Security Engineer Agent |
| Stack exists but differs significantly | Emit ProvisioningDriftDetected (future) |
| Region unavailable | Emit RegionBlocked, suggest fallback_region to coordinator |

✅ Safe Idempotency Rules

  • No resource is provisioned twice under the same trace_id + stack_name
  • The changes applied by pulumi up must match the preceding pulumi preview unless an override is approved
  • Stack diffs retained for drift comparison (hash-based memory key)

📡 Telemetry During Retry

All retries emit spans:

  • cloud.provision.retry.start
  • cloud.provision.retry.success
  • cloud.provision.retry.failed

Each includes retry_attempt, reason, and agent_origin.


🧠 Summary

The Cloud Provisioner Agent’s retry and correction flow ensures:

  • 🔁 Safe auto-recovery from transient issues
  • 🛑 Strict boundaries for policy-violating inputs
  • 🧠 Intelligent corrections and default injection
  • 📎 Full traceability and telemetry per retry step
  • 👤 Escalation hooks when human input is required

This makes the provisioning lifecycle resilient, safe, and fully observable — critical for infrastructure integrity at scale.


🔗 Collaboration Interfaces

The Cloud Provisioner Agent acts as a mid-pipeline executor within ConnectSoft’s orchestrated cloud lifecycle. It does not operate in isolation — it:

  • 🧱 Implements plans from architectural agents
  • 🔐 Integrates security overlays (vaults, secrets)
  • 📡 Emits outputs to DevOps, Observability, and Coordination layers
  • 🛰️ Triggers downstream agents once infrastructure is ready

🤝 Directly Collaborating Agents

| Agent | Purpose |
|---|---|
| Cloud Architect Agent | Supplies cloud region maps, replication strategy, zone constraints |
| Infrastructure Architect Agent | Provides infra_plan, overlays, and environment resource models |
| Security Engineer Agent | Delivers secrets, vault references, RBAC overlays, and mount strategies |
| IaCCoordinator (or similar orchestrator) | Triggers agent, monitors result, receives ResourcesProvisioned |
| DevOps Engineer Agent | Consumes provisioned outputs (e.g., URIs, secrets, cluster names) for pipeline generation |
| HumanOps Agent | Reviews provisioning failures, applies overrides or approves previews |
| Observability Agent | Receives telemetry and output mappings for environment registration |
| API Gateway Agent (optional) | Consumes dns_record outputs for subdomain registration and gateway bindings |

🔁 Event-Based Collaboration

Emitted Events

| Event | Consumed By |
|---|---|
| ResourcesProvisioned | DevOps Agent, Observability Agent, Orchestrator |
| ProvisioningFailed | HumanOpsAgent, Orchestrator |
| ProvisioningPreviewReady (if manual approval required) | HumanOpsAgent |
| ProvisioningDriftDetected (future) | Orchestrator, Audit Agent |

🔀 Input Dependencies

From Cloud Architect Agent

cloud-region-map.yaml:
  primary: westeurope
  secondary: northeurope
  zones: [1, 2]

From Security Engineer Agent

secrets:
  - name: AUTH_SECRET
    vault_ref: authservice-app-secret
    mount_strategy: env

From Infrastructure Architect Agent

infra_plan:
  aks:
    node_count: 3
  dns:
    fqdn: auth.europe.connectsoft.io

🔄 Output Recipients

DevOps Engineer Agent

  • aks_cluster_name
  • kubeconfig_vault_key
  • vault_uri
  • blob_url
  • Secrets mount bindings (JSON)

Observability Agent

  • Cluster URI
  • Log Analytics / Monitoring endpoints
  • DNS + region tags for telemetry routing

IaCCoordinator

  • ResourcesProvisioned event
  • stack_path
  • status: success | failed
  • Deployment duration + span metadata

🧭 Collaboration Flow Diagram

flowchart TD
    A[Cloud Architect Agent]
    B[Security Engineer Agent]
    C[Infrastructure Architect Agent]
    D[IaCCoordinator]
    E[Cloud Provisioner Agent]
    F[DevOps Engineer Agent]
    G[Observability Agent]

    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    E --> G
    E --> D

💬 Interface Protocols

| Interface | Mode |
|---|---|
| Agent-to-Agent | File-based overlays or in-memory via orchestrator |
| Events | JSON payload over orchestrator event bus or webhook |
| Output Sharing | Git commit / Blob upload + event pointer |
| Secrets | Injected via secure vault overlay, never hardcoded |

🧠 Summary

The Cloud Provisioner Agent collaborates with:

  • ☁️ Architects to receive plan and topology
  • 🔐 Security to embed compliant secret flows
  • 🧪 DevOps & QA agents to inject cloud runtime data
  • 🛰️ Orchestration to emit status and trigger downstream automation
  • 📊 Observability to register environments, regions, and telemetry contexts

It serves as the bridge between blueprint planning and actual cloud activation — in a highly modular, secure, event-driven way.


📡 Observability Hooks

The Cloud Provisioner Agent is a critical executor in the ConnectSoft platform. It must:

  • 📊 Emit real-time provisioning status
  • 🧾 Enable audit trail for infrastructure changes
  • 🛰️ Feed orchestration and dashboard systems
  • 🧠 Record metadata for security and compliance

🧭 OpenTelemetry Spans (Mandatory)

✅ Emitted Spans

Span Name Description
cloud.provision.start When the provisioning run begins
cloud.provision.preview When pulumi preview is executed
cloud.provision.up When actual resource provisioning starts
cloud.provision.success Emitted when provisioning is complete
cloud.provision.failed Emitted if any stage fails or aborts

📌 Span Tags

trace_id: trace-auth-789
execution_id: exec-auth-789
agent: cloud-provisioner-agent
stack: staging-westeurope
component: AuthService
environment: staging
region: westeurope
status: success | failed | skipped | drifted
resource_count: 6

📘 Structured Logs

Logged and optionally forwarded to:

  • Azure Monitor
  • Loki
  • Centralized audit blob

Example JSON Log

{
  "timestamp": "2025-05-08T12:59:20Z",
  "trace_id": "trace-auth-789",
  "agent": "cloud-provisioner-agent",
  "component": "AuthService",
  "event": "ProvisioningStarted",
  "stack": "staging-westeurope",
  "resource_plan": ["AKS", "KeyVault", "BlobStorage"]
}

📊 Metrics for Dashboards

Metric Description
provision_duration_ms Total time from preview to success/fail
provision_retry_count Retries per run
provision_success_rate Rolling success percentage by region/environment
resources_provisioned_total Number of resources successfully deployed
stack_drift_detected_total (Future) Detected mismatches during preview

📣 Lifecycle Events

ResourcesProvisioned

{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "resource_count": 6,
  "outputs": ["aks_cluster", "dns", "vault_uri"],
  "status": "success"
}

ProvisioningFailed

{
  "event": "ProvisioningFailed",
  "trace_id": "trace-auth-789",
  "reason": "Vault reference missing",
  "stage": "validation",
  "status": "failed"
}

📂 Output Snapshot for Monitoring

File Description
provisioning.log CLI + internal validation results
outputs.json Contains URIs, IDs, and cloud handles
stack-diff.json (optional) Preview vs. previous plan (for drift detection)

Snapshots are versioned per run and tagged with trace_id.


📈 Grafana Dashboard Modules (Example)

  • Provisioning summary by environment
  • Error rate by region
  • Time to provision per component
  • Daily stack count and status
  • Retry frequency trends

🧩 Integration Points

Target Hook
Observability Agent Receives events, spans, outputs, and metrics
HumanOps Agent Subscribed to failure + preview-only events
Audit Layer Reads provisioning logs and output hash
Orchestrator Correlates execution result to coordination FSM or pipeline flow

🔐 Compliance Metadata

All observability outputs must include:

agent_origin: cloud-provisioner-agent
trace_id: required
execution_id: required
environment: required
provisioning_type: automated

🧠 Summary

The Cloud Provisioner Agent’s observability hooks provide:

  • 🛰️ Full lifecycle visibility from plan to provisioning
  • 🧾 Structured logs and spans for real-time and historical audit
  • 📊 Dashboard-friendly metrics for success/failure trends
  • 📡 Event-based triggers for downstream automation and human review

It ensures cloud provisioning is traceable, secure, transparent, and analytics-ready — across all environments and tenants.


🎯 Human Intervention Hooks

While the Cloud Provisioner Agent is designed to operate autonomously, certain scenarios require manual oversight, including:

  • 🔐 Security-sensitive resources
  • 🌍 Production or multi-region environments
  • 🛑 Preview failures or unexpected diffs
  • 💸 High-cost provisioning operations
  • 🧾 Compliance-driven approval checkpoints

These hooks ensure safe intervention, while preserving traceability and audit logs.


✅ Intervention Scenarios

| Scenario | Action Required |
|---|---|
| auto_proceed = false in input | Manual approval required after preview |
| Stack preview includes high-impact changes | Review and confirmation |
| Vault reference missing | Requires Security Engineer or HumanOps override |
| Region blocked or quota exceeded | Manual reassignment or delay |
| Retry limit reached | Escalate to HumanOpsAgent |
| Explicit manual_approval_required: true in blueprint | Always paused for approval |
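
The first and last rows reduce to a simple gate after a successful preview. A minimal sketch follows, assuming the request and blueprint are plain dictionaries and that the returned step names are illustrative labels, not real commands or a confirmed state machine.

```python
from typing import Optional


def approval_required(request: dict, blueprint: Optional[dict] = None) -> bool:
    """True when a human must approve the preview before pulumi up runs."""
    blueprint = blueprint or {}
    # An explicit flag on either side always forces the pause.
    if request.get("manual_approval_required") or blueprint.get("manual_approval_required"):
        return True
    # auto_proceed defaults to false, which means manual approval.
    return not request.get("auto_proceed", False)


def next_step(request: dict, preview_ok: bool, blueprint: Optional[dict] = None) -> str:
    """Pick the next action after the preview stage."""
    if not preview_ok:
        return "emit ProvisioningFailed"
    if approval_required(request, blueprint):
        return "emit ProvisioningPreviewReady"
    return "run pulumi up"
```

The remaining rows (high-impact diffs, missing vault references, quota blocks) need context beyond the input payload and are not modeled here.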

👤 Supported Human Actions

| Action | Interface |
|---|---|
| Approve provisioning | Orchestrator UI, CLI (cs-stack approve --trace) |
| Reject provisioning | Same as above; emits ProvisioningRejected (planned) |
| Apply override to vault mount | Through HumanOpsAgent or Vault UI |
| Retry manually with override | CLI or UI-based re-invocation with override flag |
| Review preview and stack diff | Presented via dashboard or audit UI |

📝 Approval Gate Representation (YAML)

manual_approval:
  required: true
  approver_group: PlatformOps
  reason: "New AKS cluster in production"
  contact: "ops@connectsoft.io"

Used in environments or components flagged as high-impact or sensitive.


📣 Event-Driven Escalation

When approval is required:

{
  "event": "ProvisioningPreviewReady",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "resource_plan": ["AKS", "KeyVault", "Blob"],
  "preview_diff": "3 create, 1 replace",
  "status": "awaiting_approval"
}

Received by:

  • HumanOps Agent
  • Orchestrator Dashboard
  • Notifications Bot (Teams, Slack, Email)

💬 UI Elements & CLI Hooks

| Interface | Feature |
|---|---|
| Dashboard UI | Approve / Reject button, preview viewer, retry |
| CLI | cs-stack approve --trace trace-auth-789 |
| Email | Link to preview diff and action buttons |
| Chatbot (planned) | Inline response to ProvisioningPreviewReady event |

🧾 Logged Interventions

Every human interaction is stored in:

{
  "trace_id": "trace-auth-789",
  "action": "manual_approval",
  "approver": "alice.platformops",
  "reason": "Approved new staging AKS stack",
  "timestamp": "2025-05-08T13:05:21Z"
}

Audited by:

  • Compliance engine
  • Observability dashboards
  • Risk review snapshots

🧠 Summary

The Cloud Provisioner Agent includes secure, auditable human intervention hooks to:

  • 👤 Pause for approval when needed
  • 🔐 Escalate policy conflicts (vault, region, secrets)
  • 🔁 Allow override and retry flows
  • 📎 Ensure all human actions are trace-bound and logged

This empowers ConnectSoft teams to balance autonomy with governance, especially in high-sensitivity environments.


✅ Summary

The Cloud Provisioner Agent is a core executor in ConnectSoft’s AI-driven software factory. It turns cloud architectural intent into real, secured, traceable cloud infrastructure, delivering environments ready for CI/CD, observability, and production-grade workloads.


🎯 Core Functions

  • 📦 Render Pulumi stack files from orchestrated infrastructure plans
  • ☁️ Provision Azure cloud resources (AKS, Key Vault, DNS, Blob, etc.)
  • 🔐 Inject and map secrets securely across environments
  • 🧾 Emit outputs including URIs, stack metadata, and provisioning logs
  • 📡 Emit telemetry, events, and spans for full traceability
  • 👤 Support human approval, intervention, and retry workflows

🧭 Supported Resources (Phase 1 - Azure)

Resource Examples
Compute AKS Clusters
Storage Azure Blob
Secrets Azure Key Vault
DNS Azure DNS zones and records
Monitoring Azure Monitor / Log Analytics
Identity (future) App Registrations, Managed Identity

📚 Input Summary

  • trace_id, execution_id, blueprint_id
  • infra_plan.yaml
  • cloud-region-map.yaml
  • replication-strategy.yaml
  • secrets-metadata.yaml
  • environment overlays

📤 Output Summary

  • Pulumi project and stack files
  • outputs.json, secret-bindings.json, provisioning.log
  • Event: ResourcesProvisioned or ProvisioningFailed
  • OpenTelemetry spans
  • GitOps-compatible folder structure

🧠 Integration Summary

Collaborator Purpose
Cloud Architect Agent Region, replication, topology plans
Infrastructure Architect Agent Component-level infra plan
Security Engineer Agent Vaults, RBAC, mount strategy
IaCCoordinator Trigger and monitor execution
DevOps Engineer Agent Uses emitted URIs and secrets
HumanOps Agent Approves or overrides sensitive actions
Observability Agent Ingests infra metadata for monitoring dashboards

📈 Execution Flow Diagram

flowchart TD

  subgraph Orchestration Layer
    A[IaCCoordinator]
  end

  subgraph Architecture Inputs
    B[Cloud Architect Agent]
    C[Infrastructure Architect Agent]
    D[Security Engineer Agent]
  end

  subgraph Agent
    E[Cloud Provisioner Agent]
  end

  subgraph Outputs
    F[Pulumi Stack Files]
    G[Provisioning Events]
    H[Output Metadata]
    I[Provisioning Logs]
  end

  subgraph Downstream Consumers
    J[DevOps Engineer Agent]
    K[Observability Agent]
    L[HumanOps Agent]
  end

  A --> E
  B --> E
  C --> E
  D --> E

  E --> F
  E --> G
  E --> H
  E --> I

  G --> J
  H --> J
  G --> K
  G --> L

🧾 Final Takeaways

The Cloud Provisioner Agent enables:

  • 🔁 Idempotent, versioned infrastructure provisioning
  • ☁️ Region- and tenant-aware environment setup
  • 🔐 Secure, policy-compliant secrets injection
  • 📡 Audit-friendly logs, metrics, and spans
  • 👤 Safe human oversight and governance

It is a cornerstone agent in ConnectSoft’s DevOps and infrastructure layer — ensuring the platform can scale autonomously, securely, and observably across cloud environments.