🧠 Cloud Provisioner Agent Specification¶

☁️ Core Purpose¶

The Cloud Provisioner Agent is the execution-layer agent responsible for cloud resource provisioning within the ConnectSoft AI Software Factory. It translates cloud architecture plans into actual cloud infrastructure using Pulumi-based Infrastructure-as-Code (IaC).

It operates under orchestration (e.g., triggered by IaCCoordinator or EnvironmentSetupCoordinator) and receives high-level deployment topology, region selections, and environment overlays from the Cloud Architect Agent, along with secrets and compliance overlays from the Security Engineer Agent.

It ensures that all generated SaaS services, environments, and tenants have their required cloud foundation deployed, versioned, and trace-linked — across all supported regions and clouds.

🛠️ Role in the Platform¶

Layer	Role
Architecture	Implements `cloud-region-map`, topology, replication, and failover plans
Infrastructure	Converts blueprints into deployed, running cloud resources
DevOps	Enables pipelines and services to target real endpoints (AKS, Key Vault, etc.)
Security	Respects and injects policy-enforced vaults, identity scopes, DNS constraints
Environment Execution	Brings up cloud environments per tenant, region, edition, or component need

🌍 Phase Scope (Azure-First)¶

Provisioned Resource	Azure Equivalent
Compute Cluster	Azure Kubernetes Service (AKS)
Secret Store	Azure Key Vault
Object Storage	Azure Blob Storage
DNS Zones / Entries	Azure DNS
Observability	Azure Monitor / Log Analytics
Identity (future)	Azure AD App Registrations, Managed Identities
Database Layer	Azure Database for PostgreSQL / Azure SQL Database (optional in Phase 1)

Each resource is:

Provisioned via Pulumi
Annotated with trace_id, blueprint_id, and provisioned_by: cloud-provisioner-agent
Tagged for environment, region, and compliance scope

📌 Strategic Alignment with Cloud Architect Agent¶

Cloud Architect Agent Defines	Cloud Provisioner Agent Does
`cloud-region-map.yaml`	Deploys to primary/secondary regions
`replication-strategy.yaml`	Provisions redundant or geo-replicated resources
`resource-compliance-tags`	Applies tags, identity scopes, and access limits
`zone-mapping`	Provisions zonal or multi-zone clusters if required
`dns-domain-map.yaml`	Allocates zone and creates records

🔗 Execution Trigger (Orchestrated)¶

Triggered by:

IaCCoordinator (e.g., on environment creation, blueprint activation, tenant onboarding)
Orchestration events like:
CreateCloudResourcesForTenant
ProvisionEditionLevelInfra
UpdateRegionCapacity

🧭 Platform Flow Placement¶

flowchart TD
    A[Cloud Architect Agent]
    B[Infrastructure Architect Agent]
    C[Security Engineer Agent]
    D[Cloud Provisioner Agent]
    E[DevOps Engineer Agent]
    F[Azure / Kubernetes Resources]

    A --> D
    B --> D
    C --> D
    D --> F
    D --> E

🧩 Example Scenario¶

A new service is being deployed for Edition: EU-MultiTenant, requiring:

AKS cluster in westeurope
Azure DNS with *.edition.connectsoft.io
Azure Key Vault with injected secrets for runtime
Azure Blob for distributed file storage
Resource tags: env=staging, region=westeurope, edition=EU

Cloud Architect defines it → Cloud Provisioner Agent renders Pulumi → provisions resources → emits resource map → DevOps Agent consumes outputs.

✅ Summary¶

The Cloud Provisioner Agent:

☁️ Turns infrastructure blueprints into real cloud infrastructure
🔁 Ensures region- and tenant-specific environments are provisioned
🔐 Integrates with secrets, DNS, identity, and observability layers
🧭 Respects architectural directives, trace constraints, and cloud policy overlays
📊 Emits trace-linked metadata for CI/CD, security, and observability agents

It is a foundational executor — bringing ConnectSoft’s autonomous SaaS deployment vision into physical cloud reality.

📌 Core Responsibilities Overview¶

The Cloud Provisioner Agent is responsible for materializing infrastructure blueprints into provisioned cloud resources across environments and tenants, starting with Azure.

It receives cloud architecture plans, security overlays, and service bindings, and converts them into IaC-managed cloud infrastructure using Pulumi.

Its deliverables are traceable, version-controlled, validated, and aligned with orchestration plans.

🔧 Detailed Responsibilities¶

✅ 1. Pulumi Stack Generation¶

Render complete stack files per environment, tenant, or blueprint scope:
Pulumi.yaml (project definition)
Pulumi.{stack}.yaml (stack config)
index.ts / main.cs (actual resource logic)
Structure output for GitOps and CI integration

✅ 2. Infrastructure Provisioning¶

Execute pulumi up to provision resources for:
AKS clusters
Azure Key Vault instances
Azure Blob Storage
Azure DNS zones and records
Azure Monitor / App Insights
Resource groups, virtual networks (if required)
Tag all resources with:

trace_id: trace-789
provisioned_by: cloud-provisioner-agent
environment: staging
edition: EU
region: westeurope

✅ 3. Multi-Region & Zonal Strategy Execution¶

Use cloud-region-map.yaml and replication-strategy.yaml to:
Deploy primary and failover resource groups
Configure geo-redundant storage or DNS
Provision zonal AKS clusters with SLA-specific constraints

✅ 4. Secrets and Vault Binding¶

Inject secrets provided by Security Engineer Agent into:
Key Vault creation and initial seeding
Pulumi secretsProvider block
Emit structured secret-bindings.json for downstream agents

✅ 5. Output & Endpoint Metadata Generation¶

Produce outputs consumed by:
DevOps pipelines (for endpoint injection)
API Gateways (for DNS mapping)
Observability Agent (for log collection and telemetry hooks)
Example:

{
  "aks_cluster_name": "aks-eu-staging-01",
  "key_vault_uri": "https://vault-eu-staging.vault.azure.net/",
  "storage_url": "https://blob-eu-staging.blob.core.windows.net",
  "dns_record": "auth.europe.connectsoft.io"
}

✅ 6. Provisioning Validation & Status Emission¶

After each deployment:
Run pulumi preview again and compare before/after diffs (a clean run should report no pending changes)
Emit:
ResourcesProvisioned (on success)
ProvisioningFailed (if any step fails)
Full provisioning log and event metadata
Store deployment snapshot (provisioning-output.json) for trace audit

✅ 7. Traceability & Versioning¶

All outputs must include:
trace_id
execution_id
blueprint_id
stack_id
agent_origin: cloud-provisioner-agent
Commit stack files to Git (optional) under:

/infrastructure/stacks/{component}/{env}/{region}/

✅ 8. Collaborative Feedback Loop¶

Respond to signals from:
Cloud Architect Agent — topology, region, SLA constraints
Security Engineer Agent — vaults, policy blocks, RBAC scopes
Infrastructure Architect Agent — service/infra overlays
Provide infrastructure URIs and secrets back to:
DevOps Engineer Agent
Observability Agent
Platform Coordinator (if applicable)

📦 Summary of Primary Deliverables¶

Artifact	Description
`Pulumi.yaml`	IaC project manifest
`Pulumi.{stack}.yaml`	Stack config file per environment (e.g., `Pulumi.dev.yaml`)
`index.ts` / `main.cs`	Stack logic with resources
`outputs.json`	Emitted resource map
`provisioning.log`	CLI summary of provisioning run
`provisioned-resources.json`	Full list of resource URIs, tags, metadata
`secret-bindings.json`	Used by DevOps and security agents
`ResourcesProvisioned` event	Traceable success notification

✅ Summary¶

The Cloud Provisioner Agent is the infrastructure enabler of ConnectSoft’s AI Software Factory:

📦 Generates cloud-native, GitOps-ready IaC artifacts
☁️ Provisions Azure infrastructure with trace-linked metadata
🔐 Injects secrets, applies policies, and enforces topology plans
🔁 Feeds downstream agents with reliable, deployable endpoints

It bridges blueprint design with real cloud execution — securely and autonomously.

📥 Inputs¶

The Cloud Provisioner Agent requires a well-defined set of inputs from upstream agents and orchestration logic. These inputs guide:

📦 What to provision
🌍 Where to provision (region, zone, environment)
🔐 With what policies, secrets, naming conventions, and blueprint scope

All inputs are trace-bound, environment-aware, and cloud-specific.

🔑 Primary Inputs¶

1️⃣ `cloud-region-map.yaml`¶

Provided by: Cloud Architect Agent

primary_region: westeurope
secondary_region: northeurope
zones:
  - 1
  - 2

Used to define target regions and zonal requirements.

2️⃣ `replication-strategy.yaml`¶

Provided by: Cloud Architect Agent

replication:
  storage: geo-redundant
  dns: failover
  observability: dual-region

Determines whether to deploy mirrored resources across regions or zones.
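
As a rough illustration of how the two files above could drive placement, the sketch below resolves the target regions for one resource kind. The `targetRegions` helper and its rule (mirror only when a replication mode is declared and a secondary region exists) are assumptions made for this example, not the agent's confirmed logic.

```typescript
// Hypothetical shapes mirroring cloud-region-map.yaml and replication-strategy.yaml.
interface RegionMap {
  primary_region: string;
  secondary_region?: string;
  zones?: number[];
}

interface ReplicationStrategy {
  storage?: string;       // e.g. "geo-redundant"
  dns?: string;           // e.g. "failover"
  observability?: string; // e.g. "dual-region"
}

// Assumed rule: deploy to both regions only when a replication mode is
// declared for the resource kind and a secondary region is available.
function targetRegions(
  kind: keyof ReplicationStrategy,
  regions: RegionMap,
  replication?: ReplicationStrategy
): string[] {
  const mode = replication ? replication[kind] : undefined;
  if (mode && regions.secondary_region) {
    return [regions.primary_region, regions.secondary_region];
  }
  // No replication declared: fall back to single-region mode.
  return [regions.primary_region];
}

const regionMap: RegionMap = {
  primary_region: "westeurope",
  secondary_region: "northeurope",
  zones: [1, 2],
};
const replicationPlan: ReplicationStrategy = {
  storage: "geo-redundant",
  dns: "failover",
  observability: "dual-region",
};

const observabilityTargets = targetRegions("observability", regionMap, replicationPlan);
const singleRegionTargets = targetRegions("observability", regionMap);
```

Some modes would likely map to a resource setting rather than a second deployment. Geo-redundant storage, for example, is normally expressed as a storage account SKU, so a real resolver would probably branch per resource kind.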

3️⃣ `environment-overlay.yaml`¶

Provided by: Infrastructure Architect Agent

environment: staging
resource_prefix: cs-stg
resource_group: cs-stg-infra
tags:
  env: staging
  edition: EU
  trace_id: trace-789

Used to inject naming, tagging, and grouping conventions.

4️⃣ `component-scope.yaml`¶

Provided by: Orchestration (e.g., IaCCoordinator)

component: AuthService
execution_id: exec-auth-789
trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi

Connects infrastructure to the service’s lifecycle and audit trail.

5️⃣ `infra-plan.yaml`¶

Can be composed from blueprint fragments or pre-assembled.

resources:
  - type: aks
    size: standard
    node_count: 3
    k8s_version: "1.28"  # quoted so YAML does not parse the version as a float
  - type: keyvault
    policy: app-only
  - type: storage
    tier: standard
  - type: dns
    fqdn: auth.europe.connectsoft.io

Describes the desired resource topology.

6️⃣ `secrets-metadata.yaml`¶

Provided by: Security Engineer Agent

secrets:
  - name: AUTH_SECRET
    value_from: vault
    vault_ref: authservice-app-secret
    mount_strategy: env

Used to seed Azure Key Vault and inform DevOps pipeline injection.

7️⃣ `resource-constraints.yaml` (Optional)¶

Can come from orchestration or platform governance policies.

quotas:
  max_aks_clusters: 3
  max_nodes_per_cluster: 5
require_tags:
  - trace_id
  - blueprint_id
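
A minimal sketch of how such constraints might be checked before rendering a stack. The `checkConstraints` function and the `PlannedCluster` shape are hypothetical; only the quota and tag field names come from the YAML above.

```typescript
interface ResourceConstraints {
  quotas: { max_aks_clusters: number; max_nodes_per_cluster: number };
  require_tags: string[];
}

// Hypothetical view of a planned AKS cluster; only node_count matters here.
interface PlannedCluster {
  node_count: number;
}

// Returns a list of human-readable violations; an empty list means the plan passes.
function checkConstraints(
  clusters: PlannedCluster[],
  tags: { [key: string]: string },
  limits: ResourceConstraints
): string[] {
  const violations: string[] = [];
  if (clusters.length > limits.quotas.max_aks_clusters) {
    violations.push(
      `too many AKS clusters: ${clusters.length} > ${limits.quotas.max_aks_clusters}`
    );
  }
  clusters.forEach((cluster, i) => {
    if (cluster.node_count > limits.quotas.max_nodes_per_cluster) {
      violations.push(
        `cluster ${i} exceeds node quota: ${cluster.node_count} > ${limits.quotas.max_nodes_per_cluster}`
      );
    }
  });
  for (const tag of limits.require_tags) {
    if (!tags[tag]) {
      violations.push(`missing required tag: ${tag}`);
    }
  }
  return violations;
}

const platformLimits: ResourceConstraints = {
  quotas: { max_aks_clusters: 3, max_nodes_per_cluster: 5 },
  require_tags: ["trace_id", "blueprint_id"],
};

const passingCheck = checkConstraints(
  [{ node_count: 3 }],
  { trace_id: "trace-auth-789", blueprint_id: "blueprint-auth-multi" },
  platformLimits
);
const failingCheck = checkConstraints(
  [{ node_count: 6 }],
  { trace_id: "trace-auth-789" },
  platformLimits
);
```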

🧠 Internal Contextual Inputs (Resolved by Agent or Environment)¶

Field	Description
`cloud_provider`	Default: `azure` (future: `aws`, `gcp`)
`pulumi_project`	Derived from `component_name` + `region`
`pulumi_stack_name`	e.g., `staging-westeurope-auth`
`resource_prefix`	e.g., `cs-stg-auth`

🧪 Example Consolidated Input Context¶

trace_id: trace-auth-789
execution_id: exec-auth-789
component: AuthService
cloud_provider: azure
region: westeurope
environment: staging
resource_prefix: cs-stg-auth
infra_plan:
  aks: true
  keyvault: true
  dns: auth.europe.connectsoft.io
  storage: standard
secrets:
  - vault_ref: authservice-app-secret

📎 Input Validation Checklist¶

Input	Validation
`trace_id`, `execution_id`	✅ Required
`primary_region`, `resource_prefix`	✅ Required
`infra_plan`	✅ Must define at least 1 resource
`secrets`	✅ Must map to valid vault refs
`replication`	🟡 Optional, fallback to single-region mode
`environment`	✅ Used for naming, tags, and stack configs
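
The checklist above could be enforced with a small validator along these lines. This is a sketch under assumptions: the `ProvisionerInput` shape is inferred from the consolidated context example, and `knownVaultRefs` stands in for whatever lookup the Security Engineer Agent actually provides.

```typescript
// Hypothetical input shape; every field is optional so missing values can be reported.
interface ProvisionerInput {
  trace_id?: string;
  execution_id?: string;
  primary_region?: string;
  resource_prefix?: string;
  environment?: string;
  infra_plan?: { [resource: string]: unknown };
  secrets?: { vault_ref?: string }[];
  replication?: { [kind: string]: string };
}

function validateInput(input: ProvisionerInput, knownVaultRefs: string[]): string[] {
  const errors: string[] = [];
  const required: (keyof ProvisionerInput)[] = [
    "trace_id",
    "execution_id",
    "primary_region",
    "resource_prefix",
    "environment",
  ];
  for (const field of required) {
    if (!input[field]) {
      errors.push(`${field} is required`);
    }
  }
  if (!input.infra_plan || Object.keys(input.infra_plan).length === 0) {
    errors.push("infra_plan must define at least one resource");
  }
  for (const secret of input.secrets || []) {
    if (!secret.vault_ref || knownVaultRefs.indexOf(secret.vault_ref) === -1) {
      errors.push(`secret does not map to a valid vault ref: ${secret.vault_ref || "<missing>"}`);
    }
  }
  // replication is optional: when absent, the agent falls back to single-region mode.
  return errors;
}

const validContextErrors = validateInput(
  {
    trace_id: "trace-auth-789",
    execution_id: "exec-auth-789",
    primary_region: "westeurope",
    resource_prefix: "cs-stg-auth",
    environment: "staging",
    infra_plan: { aks: true, keyvault: true },
    secrets: [{ vault_ref: "authservice-app-secret" }],
  },
  ["authservice-app-secret"]
);
const emptyContextErrors = validateInput({}, []);
```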

🧠 Summary¶

The Cloud Provisioner Agent consumes a composite input model, made up of:

Architecture inputs (regions, replication, DNS)
Security overlays (vaults, mount strategies)
Environment config (tags, naming, resource groups)
Blueprint links (trace ID, component scope)

This enables the agent to deterministically generate and provision compliant, observable, cloud infrastructure.

📤 Output¶

The Cloud Provisioner Agent emits structured, traceable outputs in the form of:

✅ Pulumi stack files and project artifacts
📁 Provisioning logs and resource outputs
🔐 Secrets injection metadata
📡 Events and telemetry spans for orchestration and observability
💾 Cloud resource metadata for downstream agents (DevOps, Observability, Security)

✅ Primary Output Artifacts¶

1️⃣ Pulumi Stack Files¶

File	Description
`Pulumi.yaml`	Pulumi project definition (name, runtime, description)
`Pulumi.{stack}.yaml`	Stack configuration file (region, secrets provider, tags)
`index.ts` / `main.cs`	Program logic to provision resources
`stack-output.json`	Output map of provisioned endpoints, IDs, URIs
`provisioning.log`	CLI logs from `pulumi up` or `preview`

All files are committed (or staged) in Git at: /infrastructure/stacks/{component}/{env}/{region}/

2️⃣ Cloud Resource Output Map¶

Emitted after successful provisioning:

{
  "aks_cluster_name": "cs-stg-auth-aks01",
  "aks_kubeconfig_secret": "vault://auth-kubeconfig",
  "dns_record": "auth.europe.connectsoft.io",
  "key_vault_uri": "https://vault-auth-stg.vault.azure.net/",
  "storage_url": "https://csstgautheustorage.blob.core.windows.net"
}

Used by:

DevOps Engineer Agent (for pipeline/environment injection)
API Gateway Agent (for DNS mapping)
Observability Agent (for logs/metrics setup)

3️⃣ Secrets Metadata Output¶

{
  "secrets":
  [
    {
      "name": "AUTH_SECRET",
      "source": "azure-keyvault",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}

Shared with:

DevOps Agent for CI/CD env binding
Security Agent for validation
Test Agent for runtime test secrets (if allowed)
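
One way to derive this file from `secrets-metadata.yaml` is sketched below. The `toSecretBindings` helper is hypothetical; it assumes the binding carries only names, vault location, and mount strategy, never secret values, which matches the example above.

```typescript
// Entry shape taken from the secrets-metadata.yaml example.
interface SecretMetadata {
  name: string;
  value_from: string;
  vault_ref: string;
  mount_strategy: string;
}

// Binding shape taken from the secrets metadata output example.
interface SecretBinding {
  name: string;
  source: string;
  vault_uri: string;
  mount_strategy: string;
}

// Maps upstream metadata to downstream bindings without copying any secret value.
function toSecretBindings(
  entries: SecretMetadata[],
  vaultUri: string
): { secrets: SecretBinding[] } {
  return {
    secrets: entries.map((entry) => ({
      name: entry.name,
      source: "azure-keyvault",
      vault_uri: vaultUri,
      mount_strategy: entry.mount_strategy,
    })),
  };
}

const authBindings = toSecretBindings(
  [
    {
      name: "AUTH_SECRET",
      value_from: "vault",
      vault_ref: "authservice-app-secret",
      mount_strategy: "env",
    },
  ],
  "https://vault-auth-stg.vault.azure.net/"
);

// Serialized form that would be written to secret-bindings.json.
const authBindingsJson = JSON.stringify(authBindings, null, 2);
```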

4️⃣ Deployment Event¶

`ResourcesProvisioned`¶

{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "execution_id": "exec-auth-789",
  "environment": "staging",
  "region": "westeurope",
  "stack_path": "infrastructure/stacks/authservice/staging/westeurope",
  "resource_count": 6,
  "timestamp": "2025-05-08T10:00:22Z"
}

Consumed by:

Orchestration layer
DevOps Agent
Dashboards / Observability Agent
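
For consumers that want a typed view of this event, a sketch follows. The `ResourcesProvisioned` fields mirror the JSON above; the `ProvisioningFailed` shape and the `buildProvisioningEvent` helper are assumptions, since the source only states that the failure event carries diagnostics.

```typescript
interface ProvisioningScope {
  trace_id: string;
  component: string;
  execution_id: string;
  environment: string;
  region: string;
}

interface ResourcesProvisionedEvent extends ProvisioningScope {
  event: "ResourcesProvisioned";
  stack_path: string;
  resource_count: number;
  timestamp: string;
}

// Assumed shape for the failure case.
interface ProvisioningFailedEvent extends ProvisioningScope {
  event: "ProvisioningFailed";
  diagnostics: string;
  timestamp: string;
}

type RunResult =
  | { ok: true; resourceCount: number }
  | { ok: false; diagnostics: string };

function buildProvisioningEvent(
  scope: ProvisioningScope,
  result: RunResult,
  now: Date
): ResourcesProvisionedEvent | ProvisioningFailedEvent {
  // Drop milliseconds so the timestamp matches the format used in the example.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  if (result.ok) {
    return {
      ...scope,
      event: "ResourcesProvisioned",
      stack_path: `infrastructure/stacks/${scope.component.toLowerCase()}/${scope.environment}/${scope.region}`,
      resource_count: result.resourceCount,
      timestamp,
    };
  }
  return { ...scope, event: "ProvisioningFailed", diagnostics: result.diagnostics, timestamp };
}

const authScope: ProvisioningScope = {
  trace_id: "trace-auth-789",
  component: "AuthService",
  execution_id: "exec-auth-789",
  environment: "staging",
  region: "westeurope",
};
const runTime = new Date("2025-05-08T10:00:22Z");
const successEvent = buildProvisioningEvent(authScope, { ok: true, resourceCount: 6 }, runTime);
const failureEvent = buildProvisioningEvent(authScope, { ok: false, diagnostics: "quota exceeded" }, runTime);
```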

5️⃣ Pulumi Output Metadata (YAML / JSON)¶

Saved for audit trail:

{
  "project": "authservice-stack",
  "stack": "staging-westeurope",
  "provisioned_by": "cloud-provisioner-agent",
  "resources": [
    { "type": "azure-native:containerservice:ManagedCluster", "name": "aks-auth-eu" },
    { "type": "azure-native:storage:BlobContainer", "name": "authfiles" }
  ]
}

📦 Optional Outputs (Based on Context)¶

Output	Condition
`manual-approval.yaml`	If sensitive resource or environment requires pre-deploy approval
`rollback-plan.json`	If preview identifies drift and fallback is enabled
`dns-map.yaml`	If multiple endpoints / subdomains need to be shared with API Gateway Agent

🧠 Metadata Embedded in All Outputs¶

All output artifacts include:

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
environment: staging
region: westeurope

📂 File Structure Convention (GitOps-Compatible)¶

/infrastructure/stacks/
  └── authservice/
      └── staging/
          └── westeurope/
              ├── Pulumi.yaml
              ├── Pulumi.staging-westeurope.yaml
              ├── index.ts
              ├── provisioning.log
              ├── outputs.json
              └── secret-bindings.json
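
The layout above can be computed from the execution context. The helpers below are a sketch, not the agent's implementation; lowercasing the component name is an assumption inferred from `AuthService` appearing as `authservice` in the tree.

```typescript
interface StackLocation {
  component: string;
  environment: string;
  region: string;
}

// Builds the stack directory following /infrastructure/stacks/{component}/{env}/{region}/.
function stackDirectory(loc: StackLocation): string {
  return [
    "infrastructure",
    "stacks",
    loc.component.toLowerCase(), // assumed normalization
    loc.environment,
    loc.region,
  ].join("/");
}

// Builds the per-stack config file name as shown in the tree above.
function stackConfigFile(loc: StackLocation): string {
  return `Pulumi.${loc.environment}-${loc.region}.yaml`;
}

const authStack: StackLocation = {
  component: "AuthService",
  environment: "staging",
  region: "westeurope",
};
const authStackDir = stackDirectory(authStack);
const authStackConfig = stackConfigFile(authStack);
```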

✅ Summary¶

The Cloud Provisioner Agent outputs:

📦 All required Pulumi IaC assets
📡 Rich cloud output metadata
🔐 Fully structured secrets and mount plans
🧠 Traceable provisioning logs and telemetry
🛰️ Events and files consumed by downstream agents

Its outputs are the bridge between abstract cloud design and concrete deployable infrastructure.

📚 Knowledge Base¶

The Cloud Provisioner Agent has access to a versioned, cloud-specific knowledge base composed of:

Reusable Pulumi module templates
Region, naming, and topology policies
Compliance and tagging requirements
Deployment patterns per resource type
Cloud-specific constraints and best practices
Secret and identity handling strategies

This knowledge is used to generate secure, consistent, policy-aligned cloud infrastructure.

🧱 Core Knowledge Categories¶

1️⃣ Pulumi Template Modules¶

Resource Type	Pulumi Module
AKS	`modules/azure/aks-cluster.ts`
Key Vault	`modules/azure/keyvault.ts`
Blob Storage	`modules/azure/blob-storage.ts`
DNS Zones & Records	`modules/azure/dns-record.ts`
Log Analytics	`modules/azure/monitor-insights.ts`
Virtual Network (Optional)	`modules/azure/vnet.ts`

Each template:

Supports parameter overrides
Uses trace- and environment-aware naming
Emits outputs to the final Pulumi stack

2️⃣ Naming Convention Rules¶

naming:
  resource_prefix: cs
  format: "{prefix}-{env}-{region}-{component}"
  allowed_chars: "[a-z0-9-]"
  max_length: 63

Used to auto-resolve cloud resource names (e.g., cs-stg-weu-auth-aks)
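
A minimal sketch of how a name could be resolved from this policy. The `resolveName` function, the short codes `stg` and `weu`, and the choice to strip disallowed characters rather than reject them are all assumptions for illustration.

```typescript
interface NamingPolicy {
  resource_prefix: string;
  format: string;
  max_length: number;
}

interface NameParts {
  env: string;
  region: string;
  component: string;
}

const namingPolicy: NamingPolicy = {
  resource_prefix: "cs",
  format: "{prefix}-{env}-{region}-{component}",
  max_length: 63,
};

function resolveName(policy: NamingPolicy, parts: NameParts): string {
  const raw = policy.format
    .replace("{prefix}", policy.resource_prefix)
    .replace("{env}", parts.env)
    .replace("{region}", parts.region)
    .replace("{component}", parts.component);
  // Lowercase, then drop anything outside the allowed set [a-z0-9-].
  const cleaned = raw.toLowerCase().replace(/[^a-z0-9-]/g, "");
  if (cleaned.length > policy.max_length) {
    throw new Error(`resolved name exceeds ${policy.max_length} characters: ${cleaned}`);
  }
  return cleaned;
}

const aksName = resolveName(namingPolicy, { env: "stg", region: "weu", component: "auth-aks" });
const sanitizedName = resolveName(namingPolicy, { env: "stg", region: "weu", component: "Auth_Service" });
```

Some Azure resources have stricter rules than this generic policy. Storage account names, for instance, allow only lowercase letters and digits and are limited to 24 characters, so a per-resource override would likely be needed.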

3️⃣ Environment & Region Topology Definitions¶

regions:
  staging:
    default: westeurope
    failover: northeurope
    zones: [1,2]
  production:
    default: francecentral
    failover: westeurope

Guides region-aware provisioning and DNS/FQDN mappings.

4️⃣ Cloud Resource Classifiers¶

Class	Rule
`stateful`	Require backup or replication (e.g., blob storage)
`sensitive`	Must be deployed with key vault + tag `compliance=high`
`ephemeral`	Skippable during rollback or low-priority teardown

Informs retry logic, vault strategy, and disaster recovery scope.

5️⃣ Secrets Mounting Strategies¶

mount_strategies:
  - env
  - volume
  - vault-agent-sidecar

Policy determines which strategy to apply by environment or resource type.

6️⃣ Tagging Policy Rules¶

All resources must be tagged with:

tags:
  trace_id: <REQUIRED>
  blueprint_id: <REQUIRED>
  environment: <REQUIRED>
  provisioned_by: cloud-provisioner-agent

Optionally:

tenant_id: T-134
edition: EU-MultiTenant
compliance_level: high
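
The tagging policy can be sketched as a small builder that merges optional tags, enforces the required ones, and pins `provisioned_by`. The `buildTags` name and the decision to let the policy value override a caller-supplied `provisioned_by` are assumptions.

```typescript
type TagMap = { [key: string]: string };

interface RequiredTags {
  trace_id: string;
  blueprint_id: string;
  environment: string;
}

const REQUIRED_TAG_KEYS = ["trace_id", "blueprint_id", "environment"];

function buildTags(required: RequiredTags, optional: TagMap = {}): TagMap {
  // Later spreads win, so required values and provisioned_by cannot be overridden.
  const tags: TagMap = {
    ...optional,
    ...required,
    provisioned_by: "cloud-provisioner-agent",
  };
  for (const key of REQUIRED_TAG_KEYS) {
    if (!tags[key]) {
      throw new Error(`missing required tag: ${key}`);
    }
  }
  return tags;
}

const authTags = buildTags(
  { trace_id: "trace-auth-789", blueprint_id: "blueprint-auth-multi", environment: "staging" },
  { edition: "EU-MultiTenant", compliance_level: "high", provisioned_by: "someone-else" }
);
```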

7️⃣ Common Output Resolvers¶

Used to emit resource-specific outputs:

return {
  aks_cluster_name: aks.name,
  dns_record: dnsRecord.fqdn,
  key_vault_uri: keyVault.vaultUri, // key matches the emitted output map
  storage_url: storageAccount.primaryBlobEndpoint,
};

8️⃣ Blueprint Resource Map Inference¶

A pre-trained LLM (or a lookup index) maps blueprint use cases to expected infrastructure:

Blueprint Use Case	Expected Infra
`auth-service`	AKS, DNS, KV, Storage
`report-generator`	KV, Blob, LogAnalytics
`tenant-onboarding`	DNS zone, Storage, Key Vault, Managed Identity

Used to auto-expand from use case → provisioning requirements when infra_plan is partial.
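
The lookup-index variant of this expansion might look like the sketch below. The index contents follow the table above; the resource keys (`keyvault`, `loganalytics`, `managed-identity`) and the `expandInfraPlan` helper are illustrative names, not confirmed identifiers.

```typescript
// Hypothetical in-memory form of knowledge/infra-map-index.json.
const infraMapIndex: { [useCase: string]: string[] } = {
  "auth-service": ["aks", "dns", "keyvault", "storage"],
  "report-generator": ["keyvault", "storage", "loganalytics"],
  "tenant-onboarding": ["dns", "storage", "keyvault", "managed-identity"],
};

// Keeps what the partial plan already lists and appends any expected resource it lacks.
function expandInfraPlan(useCase: string, partialPlan: string[]): string[] {
  const expected = infraMapIndex[useCase] || [];
  const merged = partialPlan.slice();
  for (const resource of expected) {
    if (merged.indexOf(resource) === -1) {
      merged.push(resource);
    }
  }
  return merged;
}

const expandedAuthPlan = expandInfraPlan("auth-service", ["aks"]);
const unknownUseCasePlan = expandInfraPlan("unknown-use-case", ["storage"]);
```

An unknown use case leaves the plan unchanged, which keeps the expansion conservative when the index has no entry.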

📦 Knowledge Source Locations¶

Asset Type	Location
Templates	`infrastructure/modules/azure/`
Naming/Tagging Rules	`infrastructure/policies/naming.yaml`
Region Topology	`cloud-region-map.yaml`
Secrets Policy	`security/overlay-vault.yaml`
Blueprints to Infra	`knowledge/infra-map-index.json`

🧠 Summary¶

The Cloud Provisioner Agent leverages a rich and versioned knowledge base that includes:

🧱 Pulumi templates
🧭 Region-aware deployment rules
🔐 Secrets and vault strategies
📛 Naming and tagging policies
📦 Use-case to infrastructure mapping

This allows it to provision secure, consistent, and trace-aligned cloud environments — fully autonomously.

🔄 Process Flow¶

flowchart TD
    A[Receive Input & Trace Metadata] --> B[Load Infra Plan + Region Map + Vault Info]
    B --> C[Resolve Templates + Merge Overlays]
    C --> D[Generate Pulumi Stack Files]
    D --> E["Run pulumi preview (validate)"]
    E --> F{Approved or Auto-Proceed?}
    F -- Yes --> G["Run pulumi up (provision)"]
    F -- No --> H[Emit PreviewOnly + Await Approval]
    G --> I[Verify Resources + Outputs]
    I --> J[Emit Outputs, Logs, and Event: ResourcesProvisioned]

🪜 Detailed Step Breakdown¶

✅ Step 1: Receive Execution Context¶

Triggered by orchestration (e.g., IaCCoordinator), receives:

trace_id, blueprint_id, execution_id
infra-plan.yaml
cloud-region-map.yaml
secrets-metadata.yaml

Ensures every run is traceable and bounded to a blueprint scope.

✅ Step 2: Load Knowledge & Merge Overlays¶

Load:
Region topology and resource plan
Naming/tagging policy
Secrets strategy
Merge:
Environment overlays (resource_group, tags, replication)

✅ Step 3: Template Rendering¶

Select Pulumi templates (from library)
Generate:
Pulumi.yaml (project)
Pulumi.{stack}.yaml (stack config)
index.ts / main.cs (logic with parameters)

✅ Step 4: Preview Mode (Validation)¶

Run pulumi preview
If success:
Proceed or emit ProvisioningPreviewReady event
If failure:
Emit ProvisioningFailed
Optionally retry with fallback region or settings

✅ Step 5: Execution (Provisioning)¶

If approved or auto_proceed = true:

Run pulumi up
Track:
Resource count
Changed vs created
Outputs emitted
Capture full provisioning log
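
The gate between preview (Step 4) and provisioning (Step 5) can be summarized as a small decision function. This is a sketch of the rule as described in this section; the `decideNextAction` name and the `NextAction` labels are hypothetical.

```typescript
type NextAction = "provision" | "await-approval" | "fail";

interface PreviewOutcome {
  succeeded: boolean;
}

interface ApprovalGate {
  auto_proceed: boolean;
  approved: boolean;
}

function decideNextAction(preview: PreviewOutcome, gate: ApprovalGate): NextAction {
  if (!preview.succeeded) {
    return "fail"; // emit ProvisioningFailed, optionally retry with fallback settings
  }
  if (gate.auto_proceed || gate.approved) {
    return "provision"; // run pulumi up
  }
  return "await-approval"; // emit ProvisioningPreviewReady and wait for a signal
}

const autoProceedAction = decideNextAction({ succeeded: true }, { auto_proceed: true, approved: false });
const manualGateAction = decideNextAction({ succeeded: true }, { auto_proceed: false, approved: false });
const failedPreviewAction = decideNextAction({ succeeded: false }, { auto_proceed: true, approved: true });
```

Note that a failed preview blocks provisioning even when `auto_proceed` is true; auto-proceed only skips the human approval step.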

✅ Step 6: Post-Provisioning Output¶

Resolve:
Resource URIs (AKS, DNS, Key Vault, Blob, etc.)
DNS records
Vault URIs and bound secrets
Validate:
Resource tags
Trace metadata
Naming format

✅ Step 7: Emit Artifacts¶

Output to Git or blob:

stack-output.json
Pulumi.yaml, Pulumi.{stack}.yaml
provisioning.log
secret-bindings.json

Emit:

ResourcesProvisioned event
OTEL spans
Infra snapshot to DevOps and Observability agents

🧠 Embedded Trace Metadata in All Steps¶

Every file, log, and emitted event includes:

trace_id: trace-auth-789
blueprint_id: blueprint-auth-multi
execution_id: exec-auth-789
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging

🔁 Alternate Flows¶

Condition	Alternate Step
Preview failed	Retry with fallback region (if allowed)
Vault unreachable	Emit soft failure, mark secrets as pending
Manual approval required	Emit preview-only event and await signal
Output differs from last	Emit `ProvisioningDriftDetected` (future)

🧠 Summary¶

The Cloud Provisioner Agent’s process flow ensures:

🛠️ Deterministic rendering from blueprint + region
☁️ Secure and compliant provisioning via Pulumi
📡 Observable and versioned outputs
🔁 Safe retry and approval paths built into the flow

It is modular, policy-bound, traceable, and ready for multi-cloud scaling.

🧩 Skills and Kernel Functions¶

The Cloud Provisioner Agent uses a combination of:

📚 Semantic Kernel (SK) Skills — composable functions for planning, transformation, naming, and template expansion
⚙️ Domain-specific Pulumi SDK bindings — for real-time provisioning logic
🔁 Agent-local orchestration logic — to manage flows, validate diffs, and emit outputs

All skills operate with full trace context and are injected into an execution plan derived from blueprint and environment state.

🔧 Core Semantic Kernel Skills¶

Skill	Purpose
`ResourcePlanResolverSkill`	Merges `infra-plan.yaml` + `cloud-region-map.yaml` + overlays
`PulumiTemplateSelectorSkill`	Chooses appropriate stack templates (AKS, KV, Blob, etc.)
`NamingResolverSkill`	Computes compliant resource names (tagged, traceable, cloud-safe)
`SecretsInjectionSkill`	Maps vault references to initial seed values in Pulumi
`PulumiRendererSkill`	Composes and writes `Pulumi.yaml`, stack files, and index.ts
`ProvisioningPreviewSkill`	Runs `pulumi preview` and captures diff / output
`PulumiExecutorSkill`	Executes `pulumi up` (if approved or auto-proceed)
`OutputFormatterSkill`	Extracts key URIs, secrets, and resource identifiers
`EventEmitterSkill`	Emits `ResourcesProvisioned`, `ProvisioningFailed`, etc.
`TelemetryTracerSkill`	Injects span metadata, logs, and OTEL context during execution

🧠 AI-Augmented Kernel Functions¶

These skills may use LLM reasoning for complex planning or decision assistance:

Function	Description
`StackNamingPlanner`	Suggests short, region-safe resource names (63-char limit, lowercase, etc.)
`TopologyExpander`	Infers additional resources from partial plans (e.g., DNS implied by AKS)
`RegionFallbackAdvisor`	Suggests next best region if primary is unavailable or quota-exceeded
`PolicyComplianceChecker`	Detects drift or missing tags in planned output before provisioning

🔁 Execution Plan Sample (SK-Compatible)¶

steps:
  - use ResourcePlanResolverSkill
  - use PulumiTemplateSelectorSkill
  - use NamingResolverSkill
  - use PulumiRendererSkill
  - use ProvisioningPreviewSkill
  - if approved:
      - use PulumiExecutorSkill
      - use OutputFormatterSkill
      - use EventEmitterSkill(ResourcesProvisioned)
  - else:
      - use EventEmitterSkill(ProvisioningPreviewReady)

🔐 Policy + Secret Skill Integrations¶

All secrets must flow through SecretsInjectionSkill
If mount_strategy = vault-agent-sidecar, inject sidecar.yaml template into index.ts
Role assignments (e.g., to services or pipelines) handled via AccessPolicyComposerSkill (planned)

📦 Skill Library Versions & Strategy¶

Source	Versioning Strategy
`templates/pulumi/`	SemVer + cloud-provider scope
`skills/`	SK plugin folders with unit-tested prompt wrapping
`common/overlays/`	YAML-driven, updated per environment baseline

🧠 Summary¶

The Cloud Provisioner Agent orchestrates its infrastructure logic using:

📚 Semantic Kernel skills to render and validate IaC
🤖 LLM-enhanced functions to reason about naming, region fallback, and policy compliance
🔁 Composable execution plans for traceable, safe, auditable provisioning

It ensures infrastructure is provisioned modularly, predictably, and context-aware, at any scale.

🧰 Core Technology Stack¶

Layer	Technology	Purpose
Infrastructure as Code	Pulumi (TypeScript/.NET SDK)	Declarative cloud provisioning
Cloud Provider (Phase 1)	Azure	Target for AKS, Key Vault, Storage, DNS
Agent Execution Runtime	.NET 8 + Semantic Kernel	Agent host + skills engine
LLM/AI Reasoning	Azure OpenAI (GPT-4 Turbo)	Stack expansion, naming, fallback logic
Observability	OpenTelemetry SDK	Spans: `cloud.provision.start`, `cloud.provision.success`, `cloud.provision.failed`
Orchestration Interface	Internal Orchestrator API / Coordinator	Triggered via `IaCCoordinator` or env setup FSM
CLI Execution	`pulumi preview`, `pulumi up`, `pulumi destroy`	Infrastructure deployment lifecycle

☁️ Pulumi Configuration Standards¶

Project root: /infrastructure/stacks/{component}/{env}/{region}/
Runtime: TypeScript by default (.NET support planned)
Secrets provider: Azure Key Vault (pulumi config set --secret ...)
Stack naming convention: {env}-{region}-{component} (e.g., staging-westeurope-auth)

🌐 Azure Services Targeted¶

Resource	Pulumi Module Used
AKS Cluster	`@pulumi/azure-native.containerservice.ManagedCluster`
Key Vault	`@pulumi/azure-native.keyvault.Vault`
DNS Zone & Record	`@pulumi/azure-native.network.Zone`, `@pulumi/azure-native.network.RecordSet`
Blob Storage	`@pulumi/azure-native.storage.StorageAccount`
Log Analytics	`@pulumi/azure-native.operationalinsights.Workspace`
Virtual Network (optional)	`@pulumi/azure-native.network.VirtualNetwork`

All modules used via centralized module templates (modules/azure/*.ts).

📦 GitOps & Storage Integration¶

Use	Tool
IaC Versioning	Git repo under `/infrastructure/stacks/...`
Logs + output files	Azure Blob Storage (trace-tagged)
Secret injection	Azure Key Vault (shared or per-service)
Snapshot storage	`outputs.json`, `provisioning.log` uploaded to blob/archive path

🔐 Security & Secrets Management¶

Use Case	Tech
Vault Seeding	Pulumi secret config + ARM access policy
Sidecar Injection	Agent-side template renders vault-agent manifest (if required)
Identity Binding (future)	Managed Identity + App Registration support

All secrets flow through SecretsInjectionSkill and conform to overlay policy.

📊 Observability Stack¶

Tool	Role
OpenTelemetry	Emitted spans (start, preview, success/fail) with `trace_id`, `execution_id`
Grafana Dashboards	Metrics visualization (resource count, duration, error rate)
Azure Monitor	Logs from provisioning runs, validation failures
Structured Logs	Emitted via `ILogger` → forwarded to blob, Azure Monitor, or Loki

🔁 Agent Trigger & Integration¶

Interface	Trigger
Orchestrator API	`POST /provision/stack`
Git hook	PR or branch with `infra-plan.yaml` + `trace_id`
Manual CLI	`dotnet run -- provision --trace trace-auth-789`

🧪 Validation Tools¶

Check	Tool
Template lint	`eslint` with `typescript-eslint` (`tslint` is deprecated)
Pulumi preview	`pulumi preview --diff --stack ...`
Cloud quota check	(planned) Azure Resource Graph query via SDK
Stack diff validation	Hash-based output delta + drift detection
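
A hash-based output delta could work roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the polynomial rolling hash is a simple non-cryptographic stand-in (a real implementation would more likely use SHA-256).

```typescript
type OutputMap = { [key: string]: string };

// Serializes outputs with sorted keys so key order does not affect the hash.
function stableSerialize(outputs: OutputMap): string {
  return Object.keys(outputs)
    .sort()
    .map((key) => `${key}=${outputs[key]}`)
    .join("\n");
}

// Simple 32-bit polynomial rolling hash over UTF-16 code units (illustrative only).
function outputHash(outputs: OutputMap): string {
  const text = stableSerialize(outputs);
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

function detectDrift(
  previous: OutputMap,
  current: OutputMap
): { drifted: boolean; changedKeys: string[] } {
  // Fast path: identical hashes are treated as "no drift".
  if (outputHash(previous) === outputHash(current)) {
    return { drifted: false, changedKeys: [] };
  }
  const allKeys = Object.keys(previous).concat(Object.keys(current));
  const changedKeys = allKeys
    .filter((key, i) => allKeys.indexOf(key) === i && previous[key] !== current[key])
    .sort();
  return { drifted: true, changedKeys };
}

const lastRunOutputs: OutputMap = {
  aks_cluster_name: "aks-eu-staging-01",
  dns_record: "auth.europe.connectsoft.io",
};
const reorderedOutputs: OutputMap = {
  dns_record: "auth.europe.connectsoft.io",
  aks_cluster_name: "aks-eu-staging-01",
};
const changedOutputs: OutputMap = {
  aks_cluster_name: "aks-eu-staging-02",
  dns_record: "auth.europe.connectsoft.io",
};

const noDriftResult = detectDrift(lastRunOutputs, reorderedOutputs);
const driftResult = detectDrift(lastRunOutputs, changedOutputs);
```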

🧠 Summary¶

The Cloud Provisioner Agent’s tech stack enables:

🔁 Automated, GitOps-friendly infrastructure provisioning
☁️ Azure-native cloud resources delivered via Pulumi
📡 Observable, trace-tagged provisioning lifecycle
🔐 Secure secret and environment injection
🤖 AI-assisted planning, naming, and recovery

This stack ensures modular, reproducible, multi-environment infrastructure delivery with minimal manual ops.

📜 System Prompt¶

The System Prompt is a persistent LLM instruction that ensures the Cloud Provisioner Agent:

Operates securely and deterministically
Applies cloud infrastructure best practices
Honors blueprint-level traceability and naming rules
Executes provisioning within policy and region constraints
Emits complete, structured outputs for downstream automation

It is injected on agent startup and used across Semantic Kernel planning, skills, and function chains.

📋 System Prompt (Full Text)¶

You are the Cloud Provisioner Agent in the ConnectSoft AI Software Factory.

Your responsibility is to generate and provision secure, traceable, environment-specific cloud infrastructure using Infrastructure-as-Code (IaC) — primarily Pulumi for Azure.

You consume structured inputs including:
- Blueprint ID, Trace ID, Execution ID (for traceability)
- Infra plan YAML defining AKS, Key Vault, Storage, DNS, etc.
- Region overlays and zone definitions (primary, failover, SLA)
- Security overlays including secrets and vault mappings
- Environment overlays such as resource groups and tags

You must:
- Select and render the appropriate Pulumi stack templates
- Merge overlays for environment, region, and secrets
- Apply naming conventions and tagging policies
- Validate the output using `pulumi preview`
- Provision cloud infrastructure only if the preview succeeds and either approval has been granted or `auto_proceed` is true
- Emit outputs including stack files, provisioning log, and a full map of resource URIs and secrets
- Tag all resources with `trace_id`, `blueprint_id`, `environment`, and `provisioned_by: cloud-provisioner-agent`

You must not:
- Provision anything if `trace_id` or `blueprint_id` is missing
- Use wildcard names, untagged resources, or unvalidated secrets
- Guess resource plans if `infra_plan` is missing (unless instructed by blueprint or default expansion rules)

All output must be deterministic, compliant with cloud-specific rules, and versioned for reproducibility.

Emit the `ResourcesProvisioned` event only if the stack completes successfully. Otherwise, emit `ProvisioningFailed` with diagnostics.

✅ Scope Imposed by Prompt¶

Category	Constraint
Provisioning	Requires explicit trace, blueprint, and region
Secrets	Must match provided `secrets-metadata.yaml` or `vault overlay`
Naming	Uses enforced convention: `{prefix}-{env}-{region}-{component}`
Retry	Allowed only after preview or explicit fallback instruction
Emission	Event + structured output required for downstream agents

🔐 Compliance Notes¶

Resource tags enforced at provision time
Regions validated against cloud-region-map.yaml
Secret mount strategy must be explicit (env, volume, or vault-agent-sidecar)
Naming must avoid uppercase, special characters, or disallowed suffixes

📦 Output Obligations per Prompt¶

✅ Pulumi stack files (YAML + TS)
✅ Resource output map (e.g., outputs.json)
✅ provisioning.log
✅ Secrets binding file (secret-bindings.json)
✅ ResourcesProvisioned or ProvisioningFailed event

🧠 Summary¶

The System Prompt ensures that the Cloud Provisioner Agent:

🧱 Renders reproducible, secure infrastructure
🔁 Follows environment-specific overlays and naming
☁️ Provisions only after successful preview
📡 Emits traceable events and outputs
🔒 Never bypasses security or compliance gates

It defines the agent as a safe, deterministic, cloud-native executor for all provisioned environments.

📥 Input Prompt Template¶

This template defines the structured YAML or JSON input passed to the Cloud Provisioner Agent when invoked by:

The orchestration layer (e.g., IaCCoordinator)
Environment setup workflows
Manual invocations during sandbox or testing flows

It encapsulates all context needed to plan, generate, and provision the cloud resources for a specific service, region, and environment.

📋 YAML Input Prompt Template¶

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
component: AuthService
agent_origin: orchestrator

cloud_provider: azure
environment: staging
region: westeurope
resource_group: cs-stg-rg-auth
resource_prefix: cs-stg-auth
auto_proceed: true

infra_plan:
  aks:
    node_count: 3
    k8s_version: "1.28"  # quoted so YAML does not parse the version as a float
  keyvault:
    policy: app-only
  blob_storage:
    tier: standard
  dns:
    fqdn: auth.europe.connectsoft.io

replication:
  storage: geo-redundant
  dns: failover

secrets:
  - name: AUTH_SECRET
    vault_ref: authservice-app-secret
    mount_strategy: env

tags:
  env: staging
  edition: EU
  provisioned_by: cloud-provisioner-agent

✅ Required Fields¶

Field	Purpose
`trace_id`, `execution_id`, `blueprint_id`	Ensures full traceability
`component`, `environment`, `region`	Determines scope and naming
`infra_plan`	Describes what resources to provision
`secrets[]`	Vault references and mount strategy
`tags`	Mandatory for all provisioned resources
`auto_proceed`	Allows provisioning without human approval after preview (defaults to `false` when omitted)

🧪 Example JSON (API-Compatible Format)¶

{
  "trace_id": "trace-invoice-500",
  "execution_id": "exec-invoice-500-b",
  "component": "InvoiceService",
  "blueprint_id": "invoice-platform-v1",
  "environment": "production",
  "region": "francecentral",
  "cloud_provider": "azure",
  "auto_proceed": false,
  "infra_plan": {
    "dns": {
      "fqdn": "invoices.connectsoft.io"
    },
    "keyvault": {
      "policy": "rbac"
    },
    "blob_storage": {
      "tier": "hot"
    }
  },
  "secrets": [
    {
      "name": "INVOICE_SECRET",
      "vault_ref": "invoice-token",
      "mount_strategy": "vault-agent-sidecar"
    }
  ],
  "tags": {
    "env": "production",
    "trace_id": "trace-invoice-500"
  }
}

🔐 Input Validation Checklist¶

Field	Required?	Notes
`trace_id`	✅	Mandatory — blocks execution if missing
`infra_plan`	✅	At least one resource must be specified
`secrets[].vault_ref`	✅	Must match known vault alias or key
`region`	✅	Must be present in `cloud-region-map.yaml`
`resource_prefix`	🟡	Auto-resolved if not present
`auto_proceed`	🟡	Defaults to `false` (manual approval)
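
The checklist above could be enforced along the following lines. This is a minimal sketch, not the agent's actual validator: `ProvisionRequest`, `validateRequest`, `allowedRegions`, and `knownVaultRefs` are hypothetical names, and in practice the allowed regions would be loaded from `cloud-region-map.yaml` and the vault aliases from the security overlay.

```typescript
// Hypothetical request shape covering only the fields named in the checklist.
interface SecretRef {
  name: string;
  vault_ref: string;
  mount_strategy: string;
}

interface ProvisionRequest {
  trace_id?: string;
  region?: string;
  resource_prefix?: string;
  auto_proceed?: boolean;
  infra_plan?: { [resource: string]: unknown };
  secrets?: SecretRef[];
}

interface ValidationResult {
  errors: string[];     // blocking problems; empty means the request may proceed
  autoProceed: boolean; // resolved value after applying the default
}

function validateRequest(
  req: ProvisionRequest,
  allowedRegions: string[],
  knownVaultRefs: string[]
): ValidationResult {
  const errors: string[] = [];

  if (!req.trace_id) {
    errors.push("trace_id is missing");
  }
  if (!req.infra_plan || Object.keys(req.infra_plan).length === 0) {
    errors.push("infra_plan must specify at least one resource");
  }
  if (!req.region || allowedRegions.indexOf(req.region) === -1) {
    errors.push("region is missing or not present in cloud-region-map.yaml");
  }
  (req.secrets || []).forEach(function (secret) {
    if (knownVaultRefs.indexOf(secret.vault_ref) === -1) {
      errors.push("unknown vault_ref: " + secret.vault_ref);
    }
  });

  // resource_prefix is optional (auto-resolved); auto_proceed defaults to false.
  return { errors: errors, autoProceed: req.auto_proceed === true };
}
```

Returning a list of errors rather than throwing on the first one lets the agent report every blocking problem in a single `ProvisioningFailed` payload.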

📦 Optional Extended Fields¶

Field	Description
`dns_map_override`	Override default DNS entries or inject zone hints
`manual_approval_required`	Forces wait state after preview
`cost_estimate_required`	Requests cost projection before execution
`fallback_region`	Allows automatic retry if region unavailable

🧠 Summary¶

The Input Prompt Template enables the Cloud Provisioner Agent to:

📦 Receive infrastructure plans in a structured, deterministic way
🌍 Provision based on region, environment, and component scope
🔐 Respect secret policies and vault constraints
📊 Emit traceable outputs and telemetry linked to blueprint lineage

It guarantees that every provisioning run starts with a complete, auditable definition — with no guessing, and full policy alignment.

📤 Output Expectations¶

Every provisioning operation produces:

🧱 Infrastructure-as-Code artifacts (Pulumi stack files)
📄 Cloud resource output maps
📊 Provisioning logs
📡 Telemetry spans and events
🧩 Secrets binding metadata
💾 Snapshot files for GitOps and audit

These outputs are fully traceable to their blueprint and environment scope.

✅ Output Artifacts¶

1️⃣ Pulumi Stack Files¶

File	Purpose
`Pulumi.yaml`	Defines the project and Pulumi runtime
`Pulumi.{stack}.yaml`	Contains config values, tags, and stack settings
`index.ts` or `main.cs`	Infrastructure logic (calls to Pulumi SDK)
`package-lock.json` (or the runtime's equivalent lockfile)	Pins dependency versions
`outputs.json`	Key URIs, resource IDs, endpoints, DNS, storage URLs

All files are saved in:

/infrastructure/stacks/{component}/{env}/{region}/

2️⃣ Provisioning Log¶

[pulumi] Preview succeeded: 4 to create, 0 to change, 0 to delete
[pulumi] Updated resources:
 - azure-native:containerservice:ManagedCluster
 - azure-native:storage:BlobContainer
...

Saved as provisioning.log for debugging and audit purposes.

3️⃣ Resource Output Map (Structured JSON)¶

{
  "aks_cluster_name": "aks-stg-auth",
  "kubeconfig_vault_key": "vault://auth-kubeconfig",
  "dns_record": "auth.stg.connectsoft.io",
  "vault_uri": "https://vault-auth-stg.vault.azure.net",
  "blob_url": "https://csstgauth.blob.core.windows.net"
}

Used by:

DevOps Engineer Agent
Observability Agent
API Gateway Agent (DNS setup)

4️⃣ Secrets Metadata Output¶

{
  "secrets": [
    {
      "name": "AUTH_SECRET",
      "vault_uri": "https://vault-auth-stg.vault.azure.net/",
      "mount_strategy": "env"
    }
  ]
}

Sent to:

DevOps pipelines (for runtime binding)
QA/Test Agent (if permitted)

5️⃣ Provisioning Events¶

✅ `ResourcesProvisioned`¶

{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "stack": "staging-westeurope",
  "resource_count": 5,
  "region": "westeurope",
  "status": "success",
  "timestamp": "2025-05-08T12:35:42Z"
}

Sent to Orchestration, Audit Log, DevOps Agent, and dashboards.

❌ `ProvisioningFailed`¶

{
  "event": "ProvisioningFailed",
  "reason": "AKS quota exceeded in region",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope"
}
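
A downstream consumer might type the two payloads above as a discriminated union. The sketch below is illustrative only: the field names mirror the JSON examples, while the type names and `summarizeEvent` are hypothetical.

```typescript
// Field names mirror the JSON payloads above; the type names are illustrative.
interface ResourcesProvisionedEvent {
  event: "ResourcesProvisioned";
  trace_id: string;
  component: string;
  stack: string;
  resource_count: number;
  region: string;
  status: "success";
  timestamp: string;
}

interface ProvisioningFailedEvent {
  event: "ProvisioningFailed";
  reason: string;
  trace_id: string;
  stack: string;
}

type ProvisioningEvent = ResourcesProvisionedEvent | ProvisioningFailedEvent;

// Narrow on the `event` discriminant so each branch sees only its own fields.
function summarizeEvent(e: ProvisioningEvent): string {
  if (e.event === "ResourcesProvisioned") {
    return e.stack + ": " + e.resource_count + " resources provisioned in " + e.region;
  }
  return e.stack + ": provisioning failed (" + e.reason + ")";
}
```

Keying the union on `event` means a consumer cannot read `reason` from a success payload or `resource_count` from a failure without the compiler flagging it.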

6️⃣ Trace & Audit Metadata¶

Included in all outputs:

trace_id: trace-auth-789
execution_id: exec-auth-789
blueprint_id: blueprint-auth-multi
agent_origin: cloud-provisioner-agent
region: westeurope
environment: staging

7️⃣ Optional Outputs¶

Output	When Produced
`rollback-plan.json`	If provisioning partially succeeds
`dns-map.yaml`	If multiple DNS records are generated
`manual-approval.yaml`	If agent requires confirmation before deploy
`stack-diff.json`	When provisioning results deviate from preview or prior state

📂 GitOps-Compatible File Structure¶

/infrastructure/stacks/
  └── authservice/
      └── staging/
          └── westeurope/
              ├── Pulumi.yaml
              ├── Pulumi.staging-westeurope.yaml
              ├── index.ts
              ├── outputs.json
              ├── provisioning.log
              ├── secret-bindings.json
              └── provisioning-event.json
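
The layout above can be derived from the component, environment, and region. The helpers below are a small sketch under that assumption; the function names are hypothetical, and the `{env}-{region}` stack name is inferred from the `staging-westeurope` examples rather than stated as a rule.

```typescript
// Stack names in the examples follow {env}-{region}, e.g. "staging-westeurope".
function stackName(env: string, region: string): string {
  return env + "-" + region;
}

// Directory layout: /infrastructure/stacks/{component}/{env}/{region}/
// The component segment is lowercased, as in the tree above (AuthService -> authservice).
function stackDirectory(component: string, env: string, region: string): string {
  return "/infrastructure/stacks/" + component.toLowerCase() + "/" + env + "/" + region + "/";
}

// Full path of the per-stack config file, Pulumi.{stack}.yaml.
function stackConfigPath(component: string, env: string, region: string): string {
  return stackDirectory(component, env, region) + "Pulumi." + stackName(env, region) + ".yaml";
}
```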

📊 Telemetry Expectations¶

Emits OpenTelemetry spans:

Span Name	Trigger
`cloud.provision.start`	When agent begins provisioning
`cloud.provision.success`	After successful `pulumi up`
`cloud.provision.failed`	On failure, with reason + trace_id
`cloud.provision.preview`	When preview is executed (even if not applied)

🧠 Summary¶

The Cloud Provisioner Agent produces:

📦 Pulumi IaC artifacts (stack, project, config)
🌍 Cloud resource metadata (URIs, IDs, secrets)
📡 Events and telemetry for orchestration and observability
📁 Audit-ready logs and GitOps-compatible outputs
🔐 Secrets bindings for runtime injection

All outputs are trace-labeled, versionable, and consumable by downstream agents.

🧠 Memory¶

Memory enables the Cloud Provisioner Agent to:

📎 Maintain links between blueprints, environments, and actual provisioned resources
🔁 Avoid unnecessary reprovisioning (idempotency and state caching)
🔍 Detect drift or stack differences during preview
📊 Retain secrets metadata, resource tags, output maps, and history
⏮️ Enable re-entrancy in partially failed operations or rollback scenarios

🧠 Short-Term Memory (Execution Scope)¶

Stored in the Semantic Kernel context dictionary or in an ephemeral runtime cache.

Key	Purpose
`trace_id`, `execution_id`	Carries traceability across steps
`resolved_stack_name`	Used to link Pulumi CLI actions to current operation
`template_plan`	Set of resolved Pulumi templates for the infra plan
`rendered_files`	In-memory representation of rendered files before disk write
`secrets_map`	Current secrets-to-vault mount plan
`resource_counts`	Expected number of resources before and after preview

Cleared after each execution cycle unless retained in diagnostic mode.

💾 Long-Term Memory (Persistent)¶

Stored in Blob Storage, Azure Cosmos DB, or Git, depending on environment and tenant.

1️⃣ Stack History¶

{
  "trace_id": "trace-auth-789",
  "component": "AuthService",
  "environment": "staging",
  "region": "westeurope",
  "stack": "staging-westeurope",
  "last_modified": "2025-05-08T12:00:00Z",
  "resource_count": 6,
  "outputs_hash": "9ad7f3c1..."
}

Used to:

Skip unchanged re-renders
Compare preview diffs with previous state
Enable safe re-runs of pulumi up
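
One way the `outputs_hash` field could support skipping unchanged re-renders is sketched below. The source does not say how the hash is computed, so everything here is an assumption: the helper names are hypothetical, and the 32-bit FNV-1a function is a non-cryptographic stand-in chosen only to keep the example dependency-free (a real implementation would more likely use SHA-256).

```typescript
type OutputMap = { [key: string]: string };

// Serialize with sorted keys so that key order does not affect the hash.
function canonicalize(outputs: OutputMap): string {
  return Object.keys(outputs)
    .sort()
    .map(function (key) { return key + "=" + outputs[key]; })
    .join("\n");
}

// 32-bit FNV-1a over UTF-16 code units: a non-cryptographic stand-in for illustration only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

function outputsHash(outputs: OutputMap): string {
  return fnv1a(canonicalize(outputs));
}

// True when the stored outputs_hash differs from the hash of the freshly resolved outputs.
function outputsChanged(previousHash: string | undefined, outputs: OutputMap): boolean {
  return previousHash !== outputsHash(outputs);
}
```

Canonicalizing before hashing is the important part of the design: without a stable key order, two identical output maps could hash differently and trigger a needless re-render.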

2️⃣ Output Map History¶

{
  "stack": "staging-westeurope",
  "outputs": {
    "aks": "aks-stg-auth-01",
    "dns": "auth.stg.connectsoft.io"
  },
  "revision": 3
}

Retained for:

Audit logging
DevOps injection into pipelines
Rollback targeting

3️⃣ Secrets Injection History¶

{
  "vault_uri": "https://vault-auth-stg.vault.azure.net/",
  "secret_refs": ["AUTH_SECRET", "KUBECONFIG"],
  "mount_strategy": "env"
}

Used to:

Detect missing or updated secrets
Enforce policy compliance
Recreate mount strategies during retry or promotion

4️⃣ Provisioning Logs + Snapshots¶

File	Purpose
`provisioning.log`	Stored with timestamp and trace reference
`outputs.json`	Full resource outputs
`stack-diff.json`	Diff from last run, for drift detection

🗂️ Retention Strategy¶

Memory Type	Retention Duration
Stack outputs & logs	90 days (rotated monthly)
Secrets metadata	30–90 days, depending on policy sensitivity
Provisioning diffs	60 days minimum for audit
Successful `ResourcesProvisioned` events	Archived indefinitely in trace store

🔐 Access Control¶

Write access only by Cloud Provisioner Agent
Read access by:
DevOps Agent (for output URI injection)
Observability Agent (for infra monitoring)
HumanOpsAgent (during rollback or review)

All memory objects include:

agent_origin: cloud-provisioner-agent
trace_id: trace-*
execution_id: exec-*

🔁 Replay Support (Future)¶

Memory structure enables:

Provisioning replay with previous parameters
Promotion-aware copy-to-environment (e.g., staging → prod)
Drift-aware re-run with preview diff and policy check

🧠 Summary¶

The Cloud Provisioner Agent’s memory system supports:

🔁 Safe, idempotent infrastructure provisioning
📎 Trace-aware state retention for audits
🔍 Preview diff validation and rollback planning
📦 Secret history and vault injection tracking

This memory architecture makes cloud provisioning reproducible, compliant, and auditable by design.

🎯 Validation¶

Before any infrastructure is provisioned, the Cloud Provisioner Agent performs multi-layer validation to ensure:

🧱 Pulumi stack correctness
🔐 Secrets integrity and policy compliance
🧭 Region constraints and resource quota awareness
📛 Proper naming, tagging, and trace metadata
🛰️ Safe preview of changes with user or orchestrator confirmation

Validation protects cloud environments from drift, misconfiguration, overprovisioning, and policy violations.

✅ Validation Stages¶

1️⃣ Stack Structure Validation¶

Check	Tool/Logic
Pulumi file completeness	All required files: `Pulumi.yaml`, stack file, logic file
YAML schema validity	Linter or custom validator (YAML + JSON schemas)
Supported resource types	Only whitelisted modules and cloud providers

2️⃣ Naming & Tagging Enforcement¶

Check	Enforcement Rule
Resource names	Must conform to `naming.yaml` pattern (e.g., `{prefix}-{env}-{region}-{component}`)
Max length	Enforced per Azure/GCP/AWS limits (e.g., AKS cluster ≤ 63 chars)
Required tags	Must include `trace_id`, `blueprint_id`, `environment`, `agent_origin`
Forbidden patterns	No capital letters, underscores, or reserved suffixes
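
The naming rules in this table can be expressed as a small validator. The sketch below is a hedged illustration, not the `NamingResolverSkill` itself: `buildResourceName`, `validateResourceName`, and the reserved suffix list are hypothetical, and the 63-character limit is simply the AKS example from the table.

```typescript
interface NameParts {
  prefix: string;
  env: string;
  region: string;
  component: string;
}

// Assumed values for illustration; real limits and suffixes would come from naming.yaml.
const MAX_NAME_LENGTH = 63; // AKS cluster limit cited in the table above
const RESERVED_SUFFIXES = ["-tmp", "-test"]; // hypothetical

// Compose {prefix}-{env}-{region}-{component}, lowercased.
function buildResourceName(parts: NameParts): string {
  return [parts.prefix, parts.env, parts.region, parts.component].join("-").toLowerCase();
}

// Returns one message per violated rule; an empty list means the name is acceptable.
function validateResourceName(name: string): string[] {
  const errors: string[] = [];
  if (name.length === 0 || name.length > MAX_NAME_LENGTH) {
    errors.push("length must be between 1 and " + MAX_NAME_LENGTH + " characters");
  }
  if (/[A-Z]/.test(name)) errors.push("capital letters are not allowed");
  if (name.indexOf("_") !== -1) errors.push("underscores are not allowed");
  if (/[^A-Za-z0-9_-]/.test(name)) errors.push("special characters are not allowed");
  const segments = name.split("-").filter(function (s) { return s.length > 0; });
  if (segments.length < 4) errors.push("expected prefix, env, region, and component segments");
  RESERVED_SUFFIXES.forEach(function (suffix) {
    if (name.slice(-suffix.length) === suffix) errors.push("reserved suffix: " + suffix);
  });
  return errors;
}
```

Note that some Azure resources have stricter rules than this generic check (storage account names, for example, allow only lowercase letters and digits), so per-resource limits would still need to be layered on top.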

3️⃣ Secrets Validation¶

Check	Enforcement
`vault_ref` exists	Must map to declared vault secret in overlay or policy
`mount_strategy` valid	Must be `env`, `volume`, or `vault-agent-sidecar`
No plaintext secrets	All secrets must resolve to secure mount or Pulumi config secret
SecretsProvider setup	The Pulumi stack must declare a secrets provider if secrets are used

4️⃣ Region & Resource Constraints¶

Constraint	Rule
Region availability	Must exist in `cloud-region-map.yaml`
Resource quota (planned)	Soft-check via Azure Resource Graph or Terraform provider
Failover support	`replication-strategy.yaml` must match resource plan (e.g., blob = geo-redundant)

5️⃣ Pulumi Preview Validation¶

Runs:

pulumi preview --stack {stack} --diff

Checks:
Resource count: how many to create/change/delete
Diff output: ensure no drift unless expected
Errors: quota exceeded, invalid config, unknown provider

If preview fails, emit ProvisioningFailed with error snapshot and halt.

6️⃣ Trace Context Validation¶

Requirement	Behavior
`trace_id` missing	❌ Block execution and emit validation error
`execution_id` missing	❌ Halt — required for observability and log correlation
`blueprint_id` missing	❌ Required for memory + audit linkage
`component` undefined	🟡 Warn; infer the component from trace metadata when it implies one

📄 Example Validation Error (emitted as JSON)¶

{
  "event": "ProvisioningFailed",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "reason": "Missing secretsProvider in Pulumi config",
  "severity": "high",
  "stage": "validation",
  "timestamp": "2025-05-08T12:48:01Z"
}

🛠️ Validation Summary Table¶

Category	Must Pass	Tool Used
File completeness	✅	Internal check
YAML/TS syntax	✅	`yamllint`, `eslint`, or schema validation
Stack preview pass	✅	`pulumi preview`
Tags present	✅	Tag policy engine
Vault mappings	✅	`SecretsInjectionSkill`
Naming constraints	✅	`NamingResolverSkill`

🧠 Summary¶

The Cloud Provisioner Agent's validation system guarantees:

🧱 Well-formed IaC before deployment
🔐 Secure, compliant use of secrets
📛 Proper tagging and naming across cloud resources
📡 Preview visibility and error transparency before execution
🧾 Full trace and audit linkage to each provisioning action

Validation is the final gate before cloud infrastructure is deployed — ensuring that ConnectSoft remains secure, predictable, and compliant.

🔁 Retry & Correction Flow¶

Provisioning cloud infrastructure is inherently error-prone due to:

🛰️ Cloud-side quota errors or API delays
🔐 Vault/secret misconfigurations
🧱 YAML or resource plan issues
📦 Conflicting or drifted resource states

The Cloud Provisioner Agent must fail safely, retry deterministically, and never provision partial or insecure infrastructure.

✅ Retryable Error Categories¶

Error Type	Action
Pulumi CLI transient failure	Retry up to 3x with exponential backoff
Azure API throttling (429)	Backoff and retry within cooldown window
DNS resolution delay (e.g., after zone creation)	Wait, re-query, retry binding
Vault unavailability	Retry after delay (up to policy-defined max attempts)
Stack lock present	Wait, reattempt `pulumi up` or notify coordinator

❌ Non-Retryable / Escalation Errors¶

Error	Action
Invalid `infra_plan`	Abort with `ProvisioningFailed` (schema or structure error)
Missing `trace_id` or `blueprint_id`	Hard stop — validation failure
Secrets mount strategy unknown	Fail with clear error, await manual fix
Quota exceeded (non-transient)	Emit error, suggest fallback region, notify orchestrator

🧪 Auto-Correction Strategies (Safe Fallbacks)¶

Condition	Correction
Missing `resource_prefix`	Derive from `component` + `env` + `region`
Undeclared tags	Auto-inject required tags (`trace_id`, `env`, etc.)
Missing fallback region	Use the `secondary` region from `cloud-region-map.yaml`
K8s version not provided	Use default LTS version for environment class
Empty DNS zone list	Suggest default zone or infer from blueprint + region

All auto-corrections are tagged in log and span metadata for audit traceability.
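
As an illustration of one such correction, the sketch below injects missing required tags and records each injection so it can be written to the log and span metadata. The required tag set is taken from the Naming & Tagging Enforcement table; `injectRequiredTags` and the `AutoInjectedTag` action label are hypothetical.

```typescript
interface TagContext {
  trace_id: string;
  blueprint_id: string;
  environment: string;
}

interface Correction {
  action: string; // hypothetical action label for the correction log
  field: string;
  value_applied: string;
}

type Tags = { [key: string]: string };

// Adds any missing required tag and records each injection for the audit trail.
// The input map is copied, never mutated.
function injectRequiredTags(tags: Tags, ctx: TagContext): { tags: Tags; corrections: Correction[] } {
  const required: Tags = {
    trace_id: ctx.trace_id,
    blueprint_id: ctx.blueprint_id,
    environment: ctx.environment,
    agent_origin: "cloud-provisioner-agent"
  };
  const result: Tags = {};
  Object.keys(tags).forEach(function (key) { result[key] = tags[key]; });

  const corrections: Correction[] = [];
  Object.keys(required).forEach(function (key) {
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      result[key] = required[key];
      corrections.push({ action: "AutoInjectedTag", field: "tags." + key, value_applied: required[key] });
    }
  });
  return { tags: result, corrections: corrections };
}
```

Existing tag values are left untouched; only absent keys are filled in, which keeps the correction safe to apply repeatedly.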

🔁 Retry Flow Logic¶

flowchart TD
    A[Start Provisioning] --> B[Run Validation + Preview]
    B --> C{Preview OK?}
    C -- No --> D{Retryable?}
    D -- Yes --> B
    D -- No --> E[Emit ProvisioningFailed]
    C -- Yes --> F[Run pulumi up]
    F --> G{Success?}
    G -- No --> D
    G -- Yes --> H[Emit ResourcesProvisioned]
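
The flowchart can be read as the loop below. This is a simplified, synchronous sketch: `validateAndPreview` and `pulumiUp` are hypothetical callbacks standing in for the real steps, and the `maxAttempts` bound is an assumption borrowed from the retry policy, since the diagram itself shows no attempt limit.

```typescript
type StepResult =
  | { status: "ok" }
  | { status: "error"; reason: string; retryable: boolean };

type FlowOutcome =
  | { event: "ResourcesProvisioned"; attempts: number }
  | { event: "ProvisioningFailed"; reason: string; attempts: number };

function runProvisioningFlow(
  validateAndPreview: () => StepResult,
  pulumiUp: () => StepResult,
  maxAttempts: number
): FlowOutcome {
  let attempts = 0;
  while (true) {
    attempts += 1;
    // Every attempt starts again from validation + preview, as in the diagram.
    let result = validateAndPreview();
    if (result.status === "ok") {
      result = pulumiUp();
    }
    if (result.status === "ok") {
      return { event: "ResourcesProvisioned", attempts: attempts };
    }
    // Non-retryable errors and exhausted attempts both end in ProvisioningFailed.
    if (!result.retryable || attempts >= maxAttempts) {
      return { event: "ProvisioningFailed", reason: result.reason, attempts: attempts };
    }
  }
}
```

A production version would also wait between attempts and emit the retry spans; those are omitted here to keep the control flow visible.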


📘 Retry Policy¶

retry:
  max_attempts: 3
  retry_interval_sec: 15
  backoff_strategy: exponential
  retryable_errors:
    - azure_api_throttle
    - stack_lock
    - vault_timeout
    - dns_unavailable
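
To make the policy concrete, the sketch below computes the wait before each retry and decides whether an error qualifies. The policy values are copied from the YAML above; the function names, the `fixed` strategy, and the doubling factor for exponential backoff are assumptions, since the source does not specify them.

```typescript
interface RetryPolicy {
  max_attempts: number;
  retry_interval_sec: number;
  backoff_strategy: "exponential" | "fixed"; // "fixed" is a hypothetical alternative
  retryable_errors: string[];
}

// Values copied from the policy above.
const defaultPolicy: RetryPolicy = {
  max_attempts: 3,
  retry_interval_sec: 15,
  backoff_strategy: "exponential",
  retryable_errors: ["azure_api_throttle", "stack_lock", "vault_timeout", "dns_unavailable"]
};

// Delay before retry number `attempt` (1-based). The doubling factor is an assumption.
function retryDelaySec(policy: RetryPolicy, attempt: number): number {
  if (policy.backoff_strategy === "fixed") {
    return policy.retry_interval_sec;
  }
  return policy.retry_interval_sec * Math.pow(2, attempt - 1);
}

// A failure is retried only if its code is listed and attempts remain.
function shouldRetry(policy: RetryPolicy, errorCode: string, attemptsMade: number): boolean {
  return policy.retryable_errors.indexOf(errorCode) !== -1 && attemptsMade < policy.max_attempts;
}
```

Under these assumptions the three retries would wait 15, 30, and 60 seconds respectively.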

🧩 Sample Correction Log (JSON)¶

{
  "trace_id": "trace-invoice-501",
  "action": "AutoCorrectedMissingPrefix",
  "field": "resource_prefix",
  "value_applied": "cs-prod-frc-invoice",
  "retry_attempt": 1,
  "status": "provisioning_resumed"
}

🔔 Human Escalation Triggers¶

Scenario	Action
`ProvisioningFailed` after 3 retries	Notify `HumanOpsAgent` and halt
Secrets mismatch or conflict	Raise to `Security Engineer Agent`
Stack exists but differs significantly	Emit `ProvisioningDriftDetected` (future)
Region unavailable	Emit `RegionBlocked`, suggest `fallback_region` to coordinator

✅ Safe Idempotency Rules¶

No resource is provisioned twice under the same trace_id + stack_name
The changes applied by pulumi up must match the preceding pulumi preview unless an override is approved
Stack diffs retained for drift comparison (hash-based memory key)

📡 Telemetry During Retry¶

All retries emit spans:

cloud.provision.retry.start
cloud.provision.retry.success
cloud.provision.retry.failed

Each includes retry_attempt, reason, and agent_origin.

🧠 Summary¶

The Cloud Provisioner Agent’s retry and correction flow ensures:

🔁 Safe auto-recovery from transient issues
🛑 Strict boundaries for policy-violating inputs
🧠 Intelligent corrections and default injection
📎 Full traceability and telemetry per retry step
👤 Escalation hooks when human input is required

This makes the provisioning lifecycle resilient, safe, and fully observable — critical for infrastructure integrity at scale.

🔗 Collaboration Interfaces¶

The Cloud Provisioner Agent acts as a mid-pipeline executor within ConnectSoft’s orchestrated cloud lifecycle. It does not operate in isolation — it:

🧱 Implements plans from architectural agents
🔐 Integrates security overlays (vaults, secrets)
📡 Emits outputs to DevOps, Observability, and Coordination layers
🛰️ Triggers downstream agents once infrastructure is ready

🤝 Directly Collaborating Agents¶

Agent	Purpose
Cloud Architect Agent	Supplies cloud region maps, replication strategy, zone constraints
Infrastructure Architect Agent	Provides `infra_plan`, overlays, and environment resource models
Security Engineer Agent	Delivers secrets, vault references, RBAC overlays, and mount strategies
IaCCoordinator (or similar orchestrator)	Triggers agent, monitors result, receives `ResourcesProvisioned`
DevOps Engineer Agent	Consumes provisioned outputs (e.g., URIs, secrets, cluster names) for pipeline generation
HumanOps Agent	Reviews provisioning failures, applies overrides or approves previews
Observability Agent	Receives telemetry and output mappings for environment registration
API Gateway Agent (optional)	Consumes `dns_record` outputs for subdomain registration and gateway bindings

🔁 Event-Based Collaboration¶

Emitted Events¶

Event	Consumed By
`ResourcesProvisioned`	DevOps Agent, Observability Agent, Orchestrator
`ProvisioningFailed`	HumanOpsAgent, Orchestrator
`ProvisioningPreviewReady` (if manual approval required)	HumanOpsAgent
`ProvisioningDriftDetected` (future)	Orchestrator, Audit Agent

🔀 Input Dependencies¶

From Cloud Architect Agent¶

cloud-region-map.yaml:
  primary: westeurope
  secondary: northeurope
  zones: [1, 2]

From Security Engineer Agent¶

secrets:
  - name: AUTH_SECRET
    vault_ref: authservice-app-secret
    mount_strategy: env

From Infrastructure Architect Agent¶

infra_plan:
  aks:
    node_count: 3
  dns:
    fqdn: auth.europe.connectsoft.io

🔄 Output Recipients¶

→ DevOps Engineer Agent¶

aks_cluster_name
kubeconfig_vault_key
vault_uri
blob_url
Secrets mount bindings (JSON)

→ Observability Agent¶

Cluster URI
Log Analytics / Monitoring endpoints
DNS + region tags for telemetry routing

→ IaCCoordinator¶

ResourcesProvisioned event
stack_path
status: success | failed
Deployment duration + span metadata

🧭 Collaboration Flow Diagram¶

flowchart TD
    A[Cloud Architect Agent]
    B[Security Engineer Agent]
    C[Infrastructure Architect Agent]
    D[IaCCoordinator]
    E[Cloud Provisioner Agent]
    F[DevOps Engineer Agent]
    G[Observability Agent]

    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    E --> G
    E --> D


💬 Interface Protocols¶

Interface	Mode
Agent-to-Agent	File-based overlays or in-memory via orchestrator
Events	JSON payload over orchestrator event bus or webhook
Output Sharing	Git commit / Blob upload + event pointer
Secrets	Injected via secure vault overlay, never hardcoded

🧠 Summary¶

The Cloud Provisioner Agent collaborates with:

☁️ Architects to receive plan and topology
🔐 Security to embed compliant secret flows
🧪 DevOps & QA agents to inject cloud runtime data
🛰️ Orchestration to emit status and trigger downstream automation
📊 Observability to register environments, regions, and telemetry contexts

It serves as the bridge between blueprint planning and actual cloud activation — in a highly modular, secure, event-driven way.

📡 Observability Hooks¶

The Cloud Provisioner Agent is a critical executor in the ConnectSoft platform. It must:

📊 Emit real-time provisioning status
🧾 Enable audit trail for infrastructure changes
🛰️ Feed orchestration and dashboard systems
🧠 Record metadata for security and compliance

🧭 OpenTelemetry Spans (Mandatory)¶

✅ Emitted Spans¶

Span Name	Description
`cloud.provision.start`	When the provisioning run begins
`cloud.provision.preview`	When `pulumi preview` is executed
`cloud.provision.up`	When actual resource provisioning starts
`cloud.provision.success`	Emitted when provisioning is complete
`cloud.provision.failed`	Emitted if any stage fails or aborts

📌 Span Tags¶

trace_id: trace-auth-789
execution_id: exec-auth-789
agent: cloud-provisioner-agent
stack: staging-westeurope
component: AuthService
environment: staging
region: westeurope
status: success | failed | skipped | drifted
resource_count: 6

📘 Structured Logs¶

Logged and optionally forwarded to:

Azure Monitor
Loki
Centralized audit blob

Example JSON Log¶

{
  "timestamp": "2025-05-08T12:59:20Z",
  "trace_id": "trace-auth-789",
  "agent": "cloud-provisioner-agent",
  "component": "AuthService",
  "event": "ProvisioningStarted",
  "stack": "staging-westeurope",
  "resource_plan": ["AKS", "KeyVault", "BlobStorage"]
}

📊 Metrics for Dashboards¶

Metric	Description
`provision_duration_ms`	Total time from `preview` to `success/fail`
`provision_retry_count`	Retries per run
`provision_success_rate`	Rolling success percentage by region/environment
`resources_provisioned_total`	Number of resources successfully deployed
`stack_drift_detected_total`	(Future) Detected mismatches during preview

📣 Lifecycle Events¶

✅ `ResourcesProvisioned`¶

{
  "event": "ResourcesProvisioned",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "resource_count": 6,
  "outputs": ["aks_cluster", "dns", "vault_uri"],
  "status": "success"
}

❌ `ProvisioningFailed`¶

{
  "event": "ProvisioningFailed",
  "trace_id": "trace-auth-789",
  "reason": "Vault reference missing",
  "stage": "validation",
  "status": "failed"
}

📂 Output Snapshot for Monitoring¶

File	Description
`provisioning.log`	CLI + internal validation results
`outputs.json`	Contains URIs, IDs, and cloud handles
`stack-diff.json` (optional)	Preview vs. previous plan (for drift detection)

Snapshots are versioned per run and tagged with trace_id.

📈 Grafana Dashboard Modules (Example)¶

Provisioning summary by environment
Error rate by region
Time to provision per component
Daily stack count and status
Retry frequency trends

🧩 Integration Points¶

Target	Hook
Observability Agent	Receives events, spans, outputs, and metrics
HumanOps Agent	Subscribed to failure + preview-only events
Audit Layer	Reads provisioning logs and output hash
Orchestrator	Correlates execution result to coordination FSM or pipeline flow

🔐 Compliance Metadata¶

All observability outputs must include:

agent_origin: cloud-provisioner-agent
trace_id: required
execution_id: required
environment: required
provisioning_type: automated

🧠 Summary¶

The Cloud Provisioner Agent’s observability hooks provide:

🛰️ Full lifecycle visibility from plan to provisioning
🧾 Structured logs and spans for real-time and historical audit
📊 Dashboard-friendly metrics for success/failure trends
📡 Event-based triggers for downstream automation and human review

It ensures cloud provisioning is traceable, secure, transparent, and analytics-ready — across all environments and tenants.

🎯 Human Intervention Hooks¶

While the Cloud Provisioner Agent is designed to operate autonomously, certain scenarios require manual oversight, including:

🔐 Security-sensitive resources
🌍 Production or multi-region environments
🛑 Preview failures or unexpected diffs
💸 High-cost provisioning operations
🧾 Compliance-driven approval checkpoints

These hooks ensure safe intervention, while preserving traceability and audit logs.

✅ Intervention Scenarios¶

Scenario	Action Required
`auto_proceed = false` in input	Manual approval required after preview
Stack preview includes high-impact changes	Review and confirmation
Vault reference missing	Requires Security Engineer or HumanOps override
Region blocked or quota exceeded	Manual reassignment or delay
Retry limit reached	Escalate to HumanOpsAgent
Explicit `manual_approval_required: true` in blueprint	Always paused for approval
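
The approval-related rows of this table can be condensed into a single gate decision after preview. The sketch below is illustrative: `decideGate` and `PreviewSummary` are hypothetical, and treating any replacement or deletion as a high-impact change is an assumption, because the source does not define the threshold. Vault, region, and retry-limit escalations are handled elsewhere and are not modeled here.

```typescript
// Hypothetical summary of a preview: counts of planned operations by kind.
interface PreviewSummary {
  create: number;
  update: number;
  replace: number;
  delete: number;
}

interface GateInput {
  auto_proceed?: boolean;
  manual_approval_required?: boolean;
  preview: PreviewSummary;
}

type GateDecision = "provision" | "await_approval";

// Assumption: any replacement or deletion counts as a high-impact change.
function isHighImpact(preview: PreviewSummary): boolean {
  return preview.replace > 0 || preview.delete > 0;
}

function decideGate(input: GateInput): GateDecision {
  if (input.manual_approval_required === true) return "await_approval"; // always pauses
  if (input.auto_proceed !== true) return "await_approval"; // auto_proceed defaults to false
  if (isHighImpact(input.preview)) return "await_approval";
  return "provision";
}
```

Every path that is not explicitly cleared ends in `await_approval`, so an omitted or malformed flag fails toward human review rather than toward provisioning.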

👤 Supported Human Actions¶

Action	Interface
Approve provisioning	Orchestrator UI, CLI (`cs-stack approve --trace {trace_id}`)
Reject provisioning	Same as above — emits `ProvisioningRejected` (planned)
Apply override to vault mount	Through HumanOpsAgent or Vault UI
Retry manually with override	CLI or UI-based re-invocation with override flag
Review preview and stack diff	Presented via dashboard or audit UI

📝 Approval Gate Representation (YAML)¶

manual_approval:
  required: true
  approver_group: PlatformOps
  reason: "New AKS cluster in production"
  contact: "ops@connectsoft.io"

Used in environments or components flagged as high-impact or sensitive.

📣 Event-Driven Escalation¶

When approval is required:

{
  "event": "ProvisioningPreviewReady",
  "trace_id": "trace-auth-789",
  "stack": "staging-westeurope",
  "resource_plan": ["AKS", "KeyVault", "Blob"],
  "preview_diff": "3 create, 1 replace",
  "status": "awaiting_approval"
}

Received by:

HumanOps Agent
Orchestrator Dashboard
Notifications Bot (Teams, Slack, Email)

💬 UI Elements & CLI Hooks¶

Interface	Feature
Dashboard UI	Approve / Reject button, preview viewer, retry
CLI	`cs-stack approve --trace trace-auth-789`
Email	Link to preview diff and action buttons
Chatbot (planned)	Inline response to `ProvisioningPreviewReady` event

🧾 Logged Interventions¶

Every human interaction is stored as an audit record:

{
  "trace_id": "trace-auth-789",
  "action": "manual_approval",
  "approver": "alice.platformops",
  "reason": "Approved new staging AKS stack",
  "timestamp": "2025-05-08T13:05:21Z"
}

Audited by:

Compliance engine
Observability dashboards
Risk review snapshots

🧠 Summary¶

The Cloud Provisioner Agent includes secure, auditable human intervention hooks to:

👤 Pause for approval when needed
🔐 Escalate policy conflicts (vault, region, secrets)
🔁 Allow override and retry flows
📎 Ensure all human actions are trace-bound and logged

This empowers ConnectSoft teams to balance autonomy with governance, especially in high-sensitivity environments.

✅ Summary¶

The Cloud Provisioner Agent is a core executor in ConnectSoft’s AI-driven software factory. It turns cloud architectural intent into real, secured, traceable cloud infrastructure, delivering environments ready for CI/CD, observability, and production-grade workloads.

🎯 Core Functions¶

📦 Render Pulumi stack files from orchestrated infrastructure plans
☁️ Provision Azure cloud resources (AKS, Key Vault, DNS, Blob, etc.)
🔐 Inject and map secrets securely across environments
🧾 Emit outputs including URIs, stack metadata, and provisioning logs
📡 Emit telemetry, events, and spans for full traceability
👤 Support human approval, intervention, and retry workflows

🧭 Supported Resources (Phase 1 - Azure)¶

Resource	Examples
Compute	AKS Clusters
Storage	Azure Blob
Secrets	Azure Key Vault
DNS	Azure DNS zones and records
Monitoring	Azure Monitor / Log Analytics
Identity (future)	App Registrations, Managed Identity

📚 Input Summary¶

trace_id, execution_id, blueprint_id
infra-plan.yaml
cloud-region-map.yaml
replication-strategy.yaml
secrets-metadata.yaml
environment overlays

📤 Output Summary¶

Pulumi project and stack files
outputs.json, secret-bindings.json, provisioning.log
Event: ResourcesProvisioned or ProvisioningFailed
OpenTelemetry spans
GitOps-compatible folder structure

🧠 Integration Summary¶

Collaborator	Purpose
Cloud Architect Agent	Region, replication, topology plans
Infrastructure Architect Agent	Component-level infra plan
Security Engineer Agent	Vaults, RBAC, mount strategy
IaCCoordinator	Trigger and monitor execution
DevOps Engineer Agent	Uses emitted URIs and secrets
HumanOps Agent	Approves or overrides sensitive actions
Observability Agent	Ingests infra metadata for monitoring dashboards

📈 Execution Flow Diagram¶

flowchart TD

  subgraph Orchestration Layer
    A[IaCCoordinator]
  end

  subgraph Architecture Inputs
    B[Cloud Architect Agent]
    C[Infrastructure Architect Agent]
    D[Security Engineer Agent]
  end

  subgraph Agent
    E[Cloud Provisioner Agent]
  end

  subgraph Outputs
    F[Pulumi Stack Files]
    G[Provisioning Events]
    H[Output Metadata]
    I[Provisioning Logs]
  end

  subgraph Downstream Consumers
    J[DevOps Engineer Agent]
    K[Observability Agent]
    L[HumanOps Agent]
  end

  A --> E
  B --> E
  C --> E
  D --> E

  E --> F
  E --> G
  E --> H
  E --> I

  G --> J
  H --> J
  G --> K
  G --> L


🧾 Final Takeaways¶

The Cloud Provisioner Agent enables:

🔁 Idempotent, versioned infrastructure provisioning
☁️ Region- and tenant-aware environment setup
🔐 Secure, policy-compliant secrets injection
📡 Audit-friendly logs, metrics, and spans
👤 Safe human oversight and governance

It is a cornerstone agent in ConnectSoft’s DevOps and infrastructure layer — ensuring the platform can scale autonomously, securely, and observably across cloud environments.
