Infrastructure Architect Agent Specification¶

🎯 Purpose¶

The Infrastructure Architect Agent is responsible for designing and generating the foundational cloud infrastructure required to operate the ConnectSoft AI Software Factory at scale — using multi-cloud, tenant-aware, IaC-first principles.

Its mission is to translate high-level architectural intent (compute needs, data isolation, security boundaries, etc.) into declarative, reproducible infrastructure modules, using tools like Bicep, Terraform, and Pulumi (C#).

💡 Why This Agent Matters¶

Without this agent:

Infrastructure setup would be manual, inconsistent, and error-prone
Multi-tenant environments would suffer from cross-tenant leakage or drift
Developers would waste time managing infrastructure instead of building logic
Observability and identity boundaries wouldn’t be enforced from the ground up
Services couldn’t scale or operate securely across cloud regions/providers

With the Infrastructure Architect Agent:

✅ All infrastructure is code-generated, version-controlled, and composable
✅ Teams get ready-to-deploy, secure environments without manual provisioning
✅ Each tenant and microservice has proper isolation, cost tagging, and access boundaries
✅ Infrastructure definitions are portable across Azure, AWS, GCP, or hybrid setups
✅ Provisioning supports Bicep, Terraform, and now Pulumi (C#) — enabling .NET-native IaC

🧱 What This Agent Enables¶

Capability	Impact
🔌 Cloud-native networking	VNet, subnets, NAT, firewalls, private DNS
🏗️ Compute orchestration	AKS/EKS clusters, node pools, autoscaling
🔐 Identity & access	Managed Identity, IAM roles, service principals
💾 Storage provisioning	Azure Blob, S3, databases, backup vaults
🌍 Environment scaffolding	Per-tenant or per-stage logical groupings
💬 Policy-as-code	RBAC, NSGs, audit logging, telemetry agents
♻️ Reusability & portability	Bicep + Terraform + Pulumi (C#) bundles

🧭 Position in the Factory Pipeline¶

flowchart TD
    SolutionArchitect --> InfrastructureArchitect
    InfrastructureArchitect --> DevOpsArchitect
    InfrastructureArchitect --> CloudArchitectureAgent
    InfrastructureArchitect --> SecurityArchitect
    InfrastructureArchitect --> ObservabilityAgent

🧠 Reusable Infrastructure Scenarios¶

Scenario	Provisioned By Agent
New microservice with DB + Key Vault	✅ VNet, AKS node pool, Azure SQL, KV
Staging environment per tenant	✅ Subnet, node pool, traffic filtering
Multi-cloud deployment (Azure + AWS)	✅ Terraform + Pulumi dual-mode templates
Cluster auto-scaling with cost budget tags	✅ Autoscaler + tagging in IaC

📈 Vision Alignment¶

This agent upholds ConnectSoft’s values:

Cloud-Native
Security-First
Event-Driven & Observable
Multi-Tenant Ready
Automated, Reproducible, and Composable

📡 Scope of Influence¶

The Infrastructure Architect Agent governs the infrastructure control plane across all environments and tenants.
Its outputs shape the physical, logical, and virtual boundaries for every microservice, data flow, identity relationship, and cloud resource used by the ConnectSoft AI Software Factory.

This scope includes multi-region, multi-cloud, and multi-tenant infrastructure definitions.

🔧 System Layers Controlled by the Agent¶

Layer	Components Controlled
Networking	VNet, Subnet, Private DNS, Peering, NAT, NSGs, Public IP, Private Endpoints
Compute	AKS/EKS/GKE clusters, node pools, autoscaling groups, container runtimes
Storage	Azure Blob, S3 Buckets, PostgreSQL/MySQL, File Shares, Disks, Backups
Identity & Access	Managed Identity, IAM Roles, RBAC bindings, service principal outputs
Secrets & Key Management	Azure Key Vault, AWS KMS, GCP Secret Manager, integration paths
Environment Isolation	Per-tenant staging spaces, dedicated compute/storage profiles
Provisioning Mode	Bicep, Terraform, and Pulumi (C#) IaC backends for every artifact
Observability Layer	Log routing (Log Analytics, CloudWatch), OpenTelemetry endpoints
DNS & Traffic Control	Zones, Ingress Controllers, Route Tables, Load Balancers

🌍 Target Environments Provisioned¶

Environment	Resources Created
Development	Shared AKS pool, dev-only DNS, reduced autoscaling
Staging	Cluster with cost tags, tracing on, mTLS across microservices
Production	Hardened AKS/EKS, regional failover, Key Vault with HSM
Per-Tenant	Optional isolated subnet, DNS, key ring, secret namespace
Hybrid (Local + Cloud)	Self-hosted runner + cloud ingress, VPN/peering

📦 Agent Output Mode Coverage¶

Output Type	Description
Bicep	Azure-native IaC with parameterized modules
Terraform	Multi-cloud abstraction supporting Azure, AWS, GCP
Pulumi (C#)	.NET-native, developer-friendly infrastructure declarations
Mermaid Topology Diagram	Visual VNet/cluster/service layout
Provisioning Events	Emits `InfraProvisioningPlanPublished` and optional diff events

🧠 Scope Summary Diagram¶

graph TD
    A[VNet/Subnet/DNS]
    B[AKS Node Pools]
    C[Key Vault + IAM]
    D[Storage Accounts/DBs]
    E[Pulumi + Terraform + Bicep]
    F[OpenTelemetry Collector]
    G[Ingress + Load Balancer]

    A --> E
    B --> E
    C --> E
    D --> E
    F --> E
    G --> E

✅ Key Impact Zones¶

Impacted Layer	Agent Enforcement
❗ Cost-aware autoscaling	Node pool budget tags and usage quotas
🔐 Zero Trust networking	NSGs, private links, enforced mTLS
🔑 Least privilege IAM	MSI roles, scoped secrets, tenant RBAC
📦 Modular IaC	Every component pluggable and reusable
♻️ Reusable patterns	Shared modules across tenants, clouds, services
📈 Observability-on-provision	OpenTelemetry spans built into infra creation

📋 Core Responsibilities¶

The Infrastructure Architect Agent is accountable for designing, provisioning, securing, and documenting the infrastructure foundation that powers every microservice, gateway, and subsystem in the ConnectSoft AI Software Factory.

Its responsibilities ensure that infrastructure is:

Declaratively defined
Multi-cloud and multi-tenant ready
Secure by default
Environment-aware
Observability-aligned
Compatible with DevOps and Cloud Architecture agents

🧱 1. Provisioning Blueprint Generation¶

Task	Output
Generate per-environment infrastructure definitions	`infra.bicep`, `main.tf`, `PulumiInfra.cs`
Structure modules for reusability (e.g., VNet, AKS, KV)	Modularized IaC blocks
Define naming conventions, tags, location strategies	Global infra naming policy module
Emit variant manifests (e.g., Dev, Staging, Prod, TenantX)	`infra.{env}.bicep`, `infra.{tenant}.tf`

🔐 2. Identity and Access Layer Definition¶

Task	Output
Define system-assigned or user-assigned Managed Identities	`identity-map.yaml`
Bind IAM roles to service scopes (e.g., read, contributor, custom roles)	IAM policy definitions per cloud
Map identity outputs to agents for later use (DevOps, Security)	Shared output map with `resource_id`, `client_id`, `secret_ref`

🌐 3. Network Topology Planning¶

Task	Output
Define VNet + Subnet per environment or tenant	`network-topology.yaml`
Configure NSGs, DNS, peering, and egress rules	Enforced through `PulumiInfra.cs` or `network.tf`
Control ingress exposure and service mesh compatibility	Mesh annotation support for Istio/Linkerd/Kuma

💾 4. Storage and Data Service Definition¶

Task	Output
Define Blob/S3 buckets, access tiers, replication, encryption	`storage-definition.yaml`
Provision PostgreSQL, SQL, or CosmosDB	Configurable DB-as-a-service modules
Define backup vaults, data retention, TTL settings	Storage compliance blueprint injected from policy

📈 5. Observability and Monitoring Setup¶

Task	Output
Deploy OpenTelemetry Collector as sidecar or managed infra	Optional toggle via policy
Route logs to Azure Monitor, CloudWatch, or custom endpoints	`otel-agent.yaml` output
Enable service health metrics via K8s	Auto-wired probes into deployment specs
Emit `InfraProvisioningSpans` for all provisioning actions	All agent actions are observable

♻️ 6. Reusability and Policy Propagation¶

Task	Output
Reuse core templates across projects	Parameterized Bicep, Pulumi classes, Terraform modules
Integrate organizational naming/tagging/security standards	Tag injection logic with `owner`, `env`, `billing_code`
Emit global `infra-policy-map.yaml` to downstream agents	Shared policy alignment for Security and DevOps Architect Agents

📢 7. Lifecycle & Compliance Hooks¶

Task	Output
Emit `InfraProvisioningPlanPublished` with trace_id and artifact list	Triggers downstream agent coordination
Emit drift detection events if previous state differs	Optional in audit mode
Output `infra-metadata.json` per run	Includes all provisioned resource IDs and traceability fields
Provide rollback plan (`infra-rollback.yaml`) when enabled	Describes deletions, safe resource tear-down steps

✅ Summary¶

Responsibility Area	Delivered By Agent
IaC output (Bicep, Terraform, Pulumi)	✅
Networking & DNS topology	✅
Identity, RBAC, and IAM config	✅
Storage definition and protection	✅
Observability layer pre-wired	✅
Reusability, modularity, compliance	✅
Traceable provisioning lifecycle	✅

📥 Core Inputs¶

The Infrastructure Architect Agent consumes architectural intent and operational metadata from upstream agents and policy sources to generate IaC-compatible infrastructure blueprints.

These inputs drive the agent’s ability to:

Choose the right cloud services
Structure the network and security model
Enforce cost and compliance constraints
Output per-environment or per-tenant variations
Align with platform observability and deployment expectations

📂 Required Input Artifacts¶

Artifact	Source Agent	Purpose
`solution-architecture.md`	Solution Architect Agent	Defines global cloud targets, service composition, zones
`application-architecture.md`	Application Architect Agent	Describes logical services, access groups, service mesh topology
`deployment-strategy.yaml`	DevOps Architect Agent	Determines target environments and staging separation
`identity-policy.yaml`	Security Architect Agent	Defines IAM/RBAC/MSI rules for cloud resources
`resource-configuration.yaml`	Cloud Architecture Agent	Specifies node pool sizes, DNS rules, ingress exposure
`field-retention-map.yaml`	Data Architect Agent	Drives storage encryption and replication flags
`observability-policy.yaml`	Observability Agent	Ensures metrics, logs, and tracing are pre-integrated
`previous-infra-state.json` (optional)	From repo	Enables drift detection and versioned rollback

📘 Sample Input: `resource-configuration.yaml`¶

resources:
  compute:
    provider: Azure
    cluster_type: AKS
    node_pool:
      size: Standard_D4s_v3
      autoscale: true
      max_nodes: 10
  dns:
    zone: "internal.connectsoft.dev"
    private_dns_enabled: true
  storage:
    account_tier: Standard
    replication: GRS

📘 Sample Input: `identity-policy.yaml`¶

iam:
  enable_managed_identity: true
  role_bindings:
    - role: Contributor
      principal: microservice@connectsoft
    - role: Key Vault Reader
      principal: devops-agent@connectsoft
  tenant_isolation:
    enabled: true
    mode: soft

📘 Sample Input: `observability-policy.yaml`¶

otel:
  collector:
    enabled: true
    type: sidecar
  spans:
    provision_infra: true
    storage_attached: true
  exporters:
    - type: otlp
      endpoint: https://otel.connectsoft.dev

🧩 Input Validation Rules¶

Rule	Enforced
Must specify at least one compute cluster (AKS, EKS, GKE)	✅
IAM roles must reference valid service principals or MSIs	✅
DNS zone must follow platform naming convention	✅
Missing observability policy → fallback to basic collector + OTLP spans	✅
Storage configuration must match retention + compliance map	✅
Identity policies must align with tenant model (shared/isolated)	✅
If no resource config provided → suggest default AKS dev config	✅
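
A few of these rules can be sketched in Python. This is a minimal illustration, not the agent's actual validator: the function name `validate_resource_config`, the DNS pattern, and the contents of `DEFAULT_DEV_CONFIG` are assumptions; only the field names come from the sample `resource-configuration.yaml`.

```python
import re

SUPPORTED_CLUSTERS = {"AKS", "EKS", "GKE"}

# Assumed naming convention: every zone lives under connectsoft.dev.
DNS_ZONE_PATTERN = re.compile(r"^([a-z0-9-]+\.)*connectsoft\.dev$")

# Hypothetical default suggested when no resource config is supplied.
DEFAULT_DEV_CONFIG = {
    "resources": {
        "compute": {"provider": "Azure", "cluster_type": "AKS"},
        "dns": {"zone": "internal.connectsoft.dev", "private_dns_enabled": True},
    }
}


def validate_resource_config(config):
    """Return (config_to_use, problems) for a parsed resource configuration."""
    if not config:
        # Rule: no resource config provided -> suggest the default AKS dev config.
        return DEFAULT_DEV_CONFIG, ["no resource config provided; suggesting default AKS dev config"]

    problems = []
    resources = config.get("resources", {})

    # Rule: at least one compute cluster of a supported type.
    cluster_type = resources.get("compute", {}).get("cluster_type")
    if cluster_type not in SUPPORTED_CLUSTERS:
        problems.append(f"unsupported or missing cluster_type: {cluster_type!r}")

    # Rule: DNS zone must follow the platform naming convention.
    zone = resources.get("dns", {}).get("zone", "")
    if not DNS_ZONE_PATTERN.match(zone):
        problems.append(f"DNS zone {zone!r} violates naming convention")

    return config, problems
```

An empty `problems` list means the checked rules passed; the remaining rules (IAM references, retention alignment, tenant model) would need the other input files and are left out of this sketch.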

🔐 Input Prompt Snippet Example¶

assignment: provision-infrastructure

project:
  trace_id: trace-infra-44188
  service_name: NotificationService
  environment: Staging
  tenant_id: tenant-42

inputs:
  solution_architecture_url: https://.../solution-architecture.md
  resource_configuration_url: https://.../notification/resource-configuration.yaml
  identity_policy_url: https://.../notification/identity-policy.yaml
  observability_policy_url: https://.../notification/observability-policy.yaml
  previous_infra_state_url: https://.../notification/infra-v1.1.0.json

settings:
  output_format: [bicep, terraform, pulumi]
  multi_cloud_enabled: false
  rollback_safe: true
  emit_diagram: true

🧠 Metadata Propagation¶

Metadata from the input prompt propagates into the outputs as follows:

trace_id → present in every output file and event
tenant_id, environment → used in naming, resource grouping, tagging
agent_version → helps downstream validation
cloud_provider → selects the IaC backend: Bicep (Azure), Terraform (multi-cloud), or Pulumi (C#/.NET)

📤 Core Outputs¶

The Infrastructure Architect Agent emits a complete infrastructure-as-code (IaC) blueprint for each service, tenant, and environment — using multi-mode outputs including:

Bicep for Azure-native deployments
Terraform for multi-cloud compatibility
Pulumi (C#) for .NET-native teams

Each output is environment-specific, versioned, and traceable, and includes metadata for rollback, observability, and security review.

📦 Artifact Output Summary¶

Artifact	Format	Purpose
`infra.bicep`	Bicep	Azure-native definition (VNets, AKS, KV, etc.)
`main.tf`	Terraform	Multi-cloud support (AWS, Azure, GCP)
`PulumiInfra.cs`	C# (Pulumi)	.NET-based IaC, same topology as Bicep/TF
`network-topology.yaml`	YAML	Declares subnets, IP ranges, NSGs, DNS
`identity-map.yaml`	YAML	Maps service identities, roles, scopes
`storage-definition.yaml`	YAML	Blob buckets, database plans, backup policies
`infra-metadata.json`	JSON	Traceable outputs: IDs, trace_id, environment, version
`infra-rollback.yaml`	YAML	Describes how to safely undo/replace resources
`infra-policy-map.yaml`	YAML	Injected tags, constraints, quotas, compliance requirements
`infra-topology.mmd`	Mermaid	Visual network and compute diagram
`InfraProvisioningPlanPublished`	Event	Lifecycle event for CI/CD and Cloud Architecture Agent

🧾 Output Example: `infra.bicep`¶

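// Parameters supplied per deployment; location defaults to the resource group's region.
param serviceName string
param environment string
param location string = resourceGroup().location

resource aks 'Microsoft.ContainerService/managedClusters@2022-03-01' = {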
  name: '${serviceName}-aks-${environment}'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    dnsPrefix: '${serviceName}-${environment}'
    agentPoolProfiles: [
      {
        name: 'default'
        count: 2
        vmSize: 'Standard_DS2_v2'
        osType: 'Linux'
        mode: 'System'
      }
    ]
  }
}

🧾 Output Example: `PulumiInfra.cs`¶

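// Fragment from the body of a Pulumi program; `project` is assumed to be a string defined earlier.
var resourceGroup = new ResourceGroup($"{project}-rg");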

var vnet = new VirtualNetwork("vnet", new VirtualNetworkArgs {
    AddressSpaces = { "10.0.0.0/16" },
    ResourceGroupName = resourceGroup.Name,
    Location = resourceGroup.Location
});

var aks = new KubernetesCluster("aks", new KubernetesClusterArgs {
    ResourceGroupName = resourceGroup.Name,
    AgentPoolProfiles = {
        new KubernetesClusterAgentPoolProfileArgs {
            Name = "agentpool",
            Count = 2,
            VmSize = "Standard_D2_v2",
            Mode = "System"
        }
    },
    DnsPrefix = $"{project}-k8s",
    Identity = new KubernetesClusterIdentityArgs {
        Type = "SystemAssigned"
    }
});

🧾 Output Example: `network-topology.yaml`¶

vnet:
  name: vnet-notification-staging
  address_space: 10.1.0.0/16
  subnets:
    - name: public
      address_prefix: 10.1.1.0/24
    - name: private
      address_prefix: 10.1.2.0/24
  dns_zone: internal.connectsoft.dev
  peering:
    enabled: true
    targets:
      - core-services
      - telemetry-network
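
A topology file like this one can be sanity-checked before any IaC is emitted. The sketch below uses Python's standard `ipaddress` module to confirm that each subnet sits inside the VNet address space and that no two subnets overlap. The function name `check_topology` is hypothetical, and the input is assumed to be the `vnet` mapping already parsed from YAML.

```python
import ipaddress


def check_topology(vnet):
    """Return a list of problems found in a parsed `vnet` mapping."""
    space = ipaddress.ip_network(vnet["address_space"])
    subnets = [
        (s["name"], ipaddress.ip_network(s["address_prefix"]))
        for s in vnet.get("subnets", [])
    ]

    problems = []

    # Every subnet must be contained in the VNet address space.
    for name, net in subnets:
        if not net.subnet_of(space):
            problems.append(f"subnet {name} ({net}) is outside {space}")

    # No two subnets may share addresses.
    for i, (name_a, net_a) in enumerate(subnets):
        for name_b, net_b in subnets[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"subnets {name_a} and {name_b} overlap")

    return problems


# The sample topology from this section, as a parsed mapping.
sample_vnet = {
    "name": "vnet-notification-staging",
    "address_space": "10.1.0.0/16",
    "subnets": [
        {"name": "public", "address_prefix": "10.1.1.0/24"},
        {"name": "private", "address_prefix": "10.1.2.0/24"},
    ],
}
```

For the sample above, `check_topology(sample_vnet)` returns an empty list.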

🧾 Output Example: `infra-metadata.json`¶

{
  "trace_id": "trace-infra-44411",
  "service": "NotificationService",
  "environment": "Staging",
  "provisioned_at": "2025-05-02T01:34:00Z",
  "agent_version": "1.3.0",
  "artifacts": {
    "bicep": "infra.bicep",
    "terraform": "main.tf",
    "pulumi": "PulumiInfra.cs",
    "metadata": "infra-metadata.json"
  }
}

✅ Output Completeness Rules¶

Requirement	Status
All outputs must include `trace_id`, `agent_version`, and `environment`	✅
Must emit at least one IaC format (Bicep, TF, Pulumi)	✅
All compute and network resources must be grouped per environment	✅
Storage, identity, and DNS outputs must be traceable via IDs	✅
Mermaid diagram must match declared topology	✅
Lifecycle event `InfraProvisioningPlanPublished` must be emitted	✅

📈 Event Emission Example¶

{
  "event": "InfraProvisioningPlanPublished",
  "trace_id": "trace-infra-44411",
  "service": "NotificationService",
  "environment": "Staging",
  "outputs": {
    "pulumi": "PulumiInfra.cs",
    "terraform": "main.tf",
    "bicep": "infra.bicep",
    "metadata": "infra-metadata.json"
  },
  "timestamp": "2025-05-02T01:34:00Z"
}

📚 Agent Knowledge Base Overview¶

The Infrastructure Architect Agent leverages a curated infrastructure knowledge base that includes:

Reusable IaC templates and modules
Multi-cloud best practices
Resource naming conventions and tagging strategies
Known patterns for tenant isolation, zone redundancy, and cost optimization
Policy-driven configurations for storage, identity, telemetry, and security

This knowledge base is applied uniformly across all IaC output modes: Bicep, Terraform, and Pulumi (C#).

🧱 1. Infrastructure Modules Library¶

Module	Description
`aks_cluster`	Parametric node pool with MSI, DNS, autoscaling
`vnet_base`	VNet + subnets + NSG with peer routing
`keyvault_module`	Vault + access policy bindings + purge protection
`blob_storage`	Secure container with private endpoint, encryption
`dns_zone`	Azure Private DNS or Route53 zone per tenant/environment
`otel_collector`	Injected as deployment or managed agent with routing
`role_assignments`	RBAC/IAM bindings for services and managed identities

🌍 2. Cloud-Specific Knowledge¶

Cloud	Agent Behavior
Azure	Uses Bicep templates + ARM native API ID mappings
AWS	Terraform modules for VPC, IAM, EKS, S3, CloudWatch
GCP	Terraform modules for VPC, GKE, IAM, KMS, Cloud Logging/Monitoring (formerly Stackdriver)
Hybrid/Local	Defaults to self-hosted Kubernetes + DNS stub zones
Pulumi (C#)	Mirrors Bicep/TF topologies using Pulumi SDKs + .NET

🔐 3. Identity and Access Policies¶

Pattern	Rule
Default identity type	`SystemAssigned` MSI or Service Principal
Least privilege	Services get only minimal roles (Reader, KV Reader)
Tenant isolation	Per-tenant resource group or role segregation
Access to secrets	Always via vault reference, never hardcoded
`identity-map.yaml`	Reused across DevOps and Security Agents
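
The least-privilege pattern can be illustrated with a small check over a parsed `identity-policy.yaml`. This is a sketch under assumptions: the set of roles treated as too broad (`BROAD_ROLES`) and the function name `broad_role_bindings` are illustrative choices, not rules stated by the platform.

```python
# Assumption: roles considered broader than "minimal" for a service identity.
BROAD_ROLES = {"Owner", "Contributor"}


def broad_role_bindings(policy):
    """Return role bindings that grant a role listed in BROAD_ROLES."""
    bindings = policy.get("iam", {}).get("role_bindings", [])
    return [b for b in bindings if b.get("role") in BROAD_ROLES]


# The sample identity policy from the inputs section, as a parsed mapping.
sample_policy = {
    "iam": {
        "enable_managed_identity": True,
        "role_bindings": [
            {"role": "Contributor", "principal": "microservice@connectsoft"},
            {"role": "Key Vault Reader", "principal": "devops-agent@connectsoft"},
        ],
    }
}
```

Under these assumptions the sample policy's `Contributor` binding would be flagged for review, while the `Key Vault Reader` binding would pass.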

📦 4. Naming and Tagging Conventions¶

Convention	Example
`env-service-region`	`prod-notification-eus`
Tags: `project`, `owner`, `env`, `trace_id`	Used across all provisioned resources
Vault secrets: `connectsoft/service/env/key`	Always prefixed with `connectsoft`
DNS zones: `internal.connectsoft.dev` + tenant subdomains	Used for service resolution and peering
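
These conventions are easy to centralize in small helpers. The sketch below is illustrative: the helper names and the region abbreviation table are assumptions (the source only shows `eus` in its example), while the name pattern, secret prefix, and tag keys come from the table above.

```python
# Assumed region abbreviations; only "eus" appears in the convention example.
REGION_CODES = {"eastus": "eus", "westeurope": "weu"}


def resource_name(env, service, region):
    """Build a name following the `env-service-region` convention."""
    return f"{env}-{service}-{REGION_CODES[region]}".lower()


def secret_path(service, env, key):
    """Build a vault secret path, always prefixed with `connectsoft`."""
    return f"connectsoft/{service}/{env}/{key}"


def standard_tags(project, owner, env, trace_id):
    """Return the tag set applied to every provisioned resource."""
    return {"project": project, "owner": owner, "env": env, "trace_id": trace_id}
```

For example, `resource_name("prod", "notification", "eastus")` yields `prod-notification-eus`, matching the convention's example.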

♻️ 5. Reuse & Template Inheritance Rules¶

Component	Reuse Scope
Subnet definitions	Shared across services in same VNet
AKS node pool sizes	Reused per environment class (dev/staging/prod)
OpenTelemetry agent config	Injected if `observability-policy.yaml` is missing
Bicep, Terraform, Pulumi modules	Always versioned and imported from platform IaC registry

🧠 Semantic Memory Lookup Example¶

query: "provision AKS for production service with secure vault"
match:
  aks_template: aks_cluster_v3
  storage_encryption: enabled
  key_vault_binding: connectsoft-vault/prod/notification
  observability: otel_collector + log analytics

✅ Benefits of the Knowledge Base¶

Benefit	Outcome
🚀 Accelerated infra generation	Plug-and-play templates in Bicep/TF/Pulumi
🔐 Policy consistency	Same IAM, DNS, security posture across environments
🌍 Cloud abstraction	Terraform & Pulumi ensure portability
💾 Reusable and testable modules	All IaC artifacts inherit tested structure
📈 Infrastructure observability by default	Pre-integrated OTEL, logs, metrics
🧠 Semantic alignment	All infra outputs traceable and driven by shared intent

🔄 End-to-End Process Flow¶

The Infrastructure Architect Agent executes a deterministic, traceable, and reusable process to generate secure, environment-ready infrastructure blueprints for Azure, AWS, and GCP, emitted as Bicep, Terraform, or Pulumi (C#).

It coordinates input parsing, module resolution, IaC generation, observability injection, and lifecycle event emission — ensuring every microservice or environment is cloud-ready.

📋 Step-by-Step Flow¶

Step	Description
1️⃣ Input Collection & Validation	Load the required input files for the infrastructure setup and validate the essential parameters. The files include `solution-architecture.md`, `resource-configuration.yaml`, `identity-policy.yaml`, `field-retention-map.yaml`, `observability-policy.yaml`, and optionally `previous-infra-state.json` for drift detection. Validation ensures correct cloud provider selection, environment tags, and mandatory elements such as compute, DNS, and vault configurations.
2️⃣ Module Resolution	Select the appropriate base templates from the module library based on the input files (e.g., `aks_cluster`, `keyvault_module`). Determine the cloud-specific Infrastructure as Code (IaC) format to generate (e.g., Bicep, Terraform, Pulumi C#). Additionally, check for any custom overrides or tenant-specific constraints that may affect the generated infrastructure.
3️⃣ IaC Generation	Emit IaC artifacts for each component: network (VNet, subnets, NSGs, peering); compute (AKS, node pools, autoscaling groups); storage (Blob/S3, databases, vaults); identity (MSI, roles, bindings); DNS (zones, resolution paths, private endpoints). Output formats: `infra.bicep`, `main.tf`, and `PulumiInfra.cs`.
4️⃣ Policy Injection & Observability Wiring	Apply the naming/tagging strategy; embed trace metadata (`trace_id`, `service_name`, `environment`); create RBAC/IAM bindings from `identity-policy.yaml`; wire the OpenTelemetry collector and spans. Then validate that every resource carries billing and governance tags and that at least one tracing span and log sink is configured.
5️⃣ Topology & Rollback Generation	Emit `network-topology.yaml` (network architecture), `storage-definition.yaml` (storage configuration), `identity-map.yaml` (identity mappings), and `infra-topology.mmd` (Mermaid diagram of the infrastructure). If `rollback_safe: true`, also emit `infra-rollback.yaml`. Optionally diff against `previous-infra-state.json` and emit `infra-drift-detected.yaml` if resources changed or are missing.
6️⃣ Lifecycle Metadata + Event Publishing	Compose `infra-metadata.json` (agent version, build ID, resource count, regions used, trace ID, and an output map linking the Bicep/Terraform/Pulumi artifacts), then emit the `InfraProvisioningPlanPublished` lifecycle event. Optionally emit `InfraDriftDetected`, `InfraRollbackReady`, or `InfraProvisioningFailed`, depending on the outcome.

1️⃣ Input Collection & Validation¶

Load:
- solution-architecture.md
- resource-configuration.yaml
- identity-policy.yaml
- field-retention-map.yaml
- observability-policy.yaml
- previous-infra-state.json (optional for drift detection)
Validate:
- Cloud provider selection
- Environment tags and tenant mappings
- Mandatory elements (e.g., compute + DNS + vault)

2️⃣ Module Resolution¶

Based on inputs:
- Select base templates from module library (aks_cluster, keyvault_module, etc.)
- Determine cloud-specific IaC format(s) to generate (Bicep, Terraform, Pulumi C#)
- Check for custom overrides or tenant-specific constraints
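
The format-selection part of this step might look like the sketch below. It assumes one simple rule, that Bicep is only usable when the target provider is Azure; the function name `resolve_formats` and the error handling are illustrative, not the agent's documented behavior.

```python
ALL_FORMATS = ("bicep", "terraform", "pulumi")


def resolve_formats(cloud_provider, requested=ALL_FORMATS):
    """Filter the requested IaC formats down to those usable for a provider."""
    resolved = []
    for fmt in requested:
        if fmt not in ALL_FORMATS:
            raise ValueError(f"unknown IaC format: {fmt}")
        if fmt == "bicep" and cloud_provider != "Azure":
            continue  # Bicep targets Azure only, so skip it elsewhere.
        resolved.append(fmt)

    if not resolved:
        raise ValueError(f"no usable IaC format for provider {cloud_provider}")
    return resolved
```

With `output_format: [bicep, terraform, pulumi]` and an Azure target, all three formats survive; for an AWS target, only Terraform and Pulumi remain.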

3️⃣ IaC Generation¶

Emit IaC artifacts for:
- Network: VNet, subnets, NSGs, peering
- Compute: AKS, node pools, autoscaling groups
- Storage: Blob/S3, DBs, vaults
- Identity: MSI, roles, bindings
- DNS: zones, resolution paths, private endpoints
Formats:
- infra.bicep
- main.tf
- PulumiInfra.cs

4️⃣ Policy Injection & Observability Wiring¶

Apply:
- Naming/tagging strategy
- Trace ID embedding (trace_id, service_name, environment)
- RBAC/IAM bindings based on identity-policy.yaml
- OpenTelemetry collector & span wiring
Validate:
- All resources tagged with billing and governance labels
- At least one tracing span and log sink configured

5️⃣ Topology & Rollback Generation¶

Emit:
- network-topology.yaml
- storage-definition.yaml
- identity-map.yaml
- infra-topology.mmd (Mermaid)
- infra-rollback.yaml if rollback_safe: true
Optionally:
- Diff from previous-infra-state.json
- Emit infra-drift-detected.yaml if resources changed or missing
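
The optional diff can be sketched as a comparison of two state snapshots. The schema of `previous-infra-state.json` is not shown in this document, so the layout assumed here (a top-level `resources` mapping from resource ID to its properties) and the function name `detect_drift` are hypothetical.

```python
def detect_drift(previous, current):
    """Compare two state snapshots and classify resource-level differences."""
    prev = previous.get("resources", {})
    curr = current.get("resources", {})

    return {
        # Present before, absent now.
        "missing": sorted(prev.keys() - curr.keys()),
        # Absent before, present now.
        "added": sorted(curr.keys() - prev.keys()),
        # Present in both, but with different properties.
        "changed": sorted(k for k in prev.keys() & curr.keys() if prev[k] != curr[k]),
    }


def has_drift(report):
    """True when any category in the drift report is non-empty."""
    return any(report.values())
```

A non-empty report would be the trigger for writing `infra-drift-detected.yaml` in this sketch.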

6️⃣ Lifecycle Metadata + Event Publishing¶

Compose infra-metadata.json with:
- Agent version
- Build ID
- Resource count
- Regions used
- Trace ID
- Output map (bicep/terraform/pulumi links)
Emit:
- InfraProvisioningPlanPublished lifecycle event
- (Optional) InfraDriftDetected, InfraRollbackReady, InfraProvisioningFailed

🧠 Mermaid Process Diagram¶

flowchart TD
    A[Input Parsing] --> B[Template Resolution]
    B --> C[IaC Generation - Bicep, TF, Pulumi]
    C --> D[Policy + Observability Injection]
    D --> E[Topology + Rollback Output]
    E --> F[Emit Metadata + Events]

🔁 Auto-Correction & Retry¶

Condition	Action
Missing cloud type	Default to Azure
DNS zone undefined	Inject fallback `internal.connectsoft.dev`
No OTEL config	Apply default OTLP + collector template
Storage class mismatch	Use default `Standard_LRS` or `gp2`
Secret naming violation	Normalize and log fix in output trace
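
The first three corrections amount to filling in defaults and recording each fix. A minimal sketch follows; the flat request keys (`cloud_provider`, `dns_zone`, `otel_template`) and the template identifier are assumptions made for illustration, while the default values for provider and DNS zone come from the table above.

```python
DEFAULTS = {
    "cloud_provider": "Azure",
    "dns_zone": "internal.connectsoft.dev",
    "otel_template": "default-otlp-collector",  # hypothetical template id
}


def apply_auto_corrections(request):
    """Return (corrected_request, fixes) without mutating the input."""
    corrected = dict(request)
    fixes = []
    for key, default in DEFAULTS.items():
        if not corrected.get(key):
            corrected[key] = default
            # Each fix is recorded so it can be logged in the output trace.
            fixes.append(f"{key} defaulted to {default}")
    return corrected, fixes
```

Returning the list of fixes alongside the corrected request keeps every automatic change visible in the trace instead of applying it silently.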

📈 Observability Trace Spans¶

Span Name	Description
`infra_inputs_validated`	Inputs loaded and validated at agent activation
`iac_generated`	When IaC is written to disk/output directory
`identity_mapped`	When MSI + RBAC bindings are computed
`topology_rendered`	Network + DNS plan finalized
`infra_provisioning_event_published`	Agent lifecycle marker sent

🛠️ Semantic Kernel Skills Overview¶

The Infrastructure Architect Agent is powered by a modular set of Semantic Kernel (.NET) skills.
Each skill handles a focused infrastructure concern — from input parsing and IaC generation to security, observability, and rollback management.

All skills are reusable, composable, and support trace-based observability and drift correction.

🔧 1. Core Infra Composition Skills¶

Skill	Purpose
`IaCScaffolderSkill`	Generates Bicep, Terraform, and Pulumi (C#) files from resolved templates
`CloudTopologyBuilderSkill`	Produces `network-topology.yaml`, `infra-topology.mmd`
`NodePoolPlannerSkill`	Determines AKS/EKS node sizes, autoscale rules, zones
`TaggingPolicyEnforcerSkill`	Injects `owner`, `project`, `env`, `trace_id` into all resources

🔐 2. Identity and Access Control Skills¶

Skill	Purpose
`IdentityMapGeneratorSkill`	Produces `identity-map.yaml` with MSI, SP, roles
`IAMRoleBinderSkill`	Creates IAM bindings in Bicep, TF, or Pulumi
`TenantIsolationPlannerSkill`	Applies per-tenant scoping rules for RBAC, resource groups
`VaultIntegrationSkill`	Validates and links vault access with identity bindings

🌍 3. Multi-Cloud IaC Generator Skills¶

Skill	Purpose
`BicepEmitterSkill`	Generates `infra.bicep` for Azure deployments
`TerraformEmitterSkill`	Outputs `main.tf` with providers, modules, locals
`PulumiCSharpEmitterSkill`	Converts resolved infra map into `PulumiInfra.cs` using .NET SDKs
`IaCDiffCheckerSkill`	Compares new IaC with `previous-infra-state.json` and emits drift report

📈 4. Observability and Topology Skills¶

Skill	Purpose
`OpenTelemetryWiringSkill`	Injects OTLP collector config, exporter URLs, and trace IDs
`TopologyDiagramWriterSkill`	Renders Mermaid diagram for VNet, Subnets, AKS, Vaults
`SpanEmitterSkill`	Emits infra lifecycle spans to OpenTelemetry and DevOps trace
`StorageDefinitionWriterSkill`	Builds `storage-definition.yaml` from retention, schema, and access tier policies

🔁 5. Rollback, Metadata, and Event Publication Skills¶

Skill	Purpose
`RollbackPlanBuilderSkill`	Produces `infra-rollback.yaml` based on diff and rollback flags
`MetadataManifestWriterSkill`	Outputs `infra-metadata.json` with trace and artifact map
`LifecycleEventPublisherSkill`	Emits `InfraProvisioningPlanPublished` and other lifecycle signals
`DriftAlertEmitterSkill`	Triggers `InfraDriftDetected` event and changelog if topology changed unexpectedly

📈 Skill Observability Example¶

{
  "trace_id": "trace-infra-44219",
  "skill": "PulumiCSharpEmitterSkill",
  "output": "PulumiInfra.cs",
  "cloud_provider": "Azure",
  "status": "success",
  "duration_ms": 134
}
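
A record of this shape could be produced by a thin wrapper around each skill invocation. The sketch below is in Python for illustration and is not the Semantic Kernel API: `run_skill` is a hypothetical helper, and the `"failed"` status value is an assumption, since the sample only shows `"success"`.

```python
import time


def run_skill(skill_name, trace_id, cloud_provider, fn):
    """Invoke fn() and return an observability record mirroring the sample fields."""
    started = time.perf_counter()
    try:
        output = fn()
        status = "success"
    except Exception:
        # Assumption: failures are recorded rather than propagated.
        output = None
        status = "failed"

    return {
        "trace_id": trace_id,
        "skill": skill_name,
        "output": output,
        "cloud_provider": cloud_provider,
        "status": status,
        "duration_ms": int((time.perf_counter() - started) * 1000),
    }
```

A call such as `run_skill("PulumiCSharpEmitterSkill", "trace-infra-44219", "Azure", emit)` would return a record with the same keys as the example, where `emit` is whatever callable produces the artifact name.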

🔁 Retry & Auto-Correction Behaviors¶

Skill	Condition	Auto-Correction
`VaultIntegrationSkill`	Secret missing reference	Injects fallback vault path per service/environment
`TenantIsolationPlannerSkill`	No tenant ID provided	Assumes shared infra mode
`OpenTelemetryWiringSkill`	Missing exporter config	Defaults to `http://otel.default:4317` OTLP
`PulumiCSharpEmitterSkill`	Unsupported resource	Falls back to Terraform if an equivalent module is available

📊 Metrics Emitted per Skill¶

Metric	Purpose
`iac_files_generated_total`	Total IaC templates produced
`drift_detected_total`	Number of diffs vs prior infra state
`span_injection_success`	Whether OTEL spans were injected
`rollback_plan_ready`	Boolean indicating rollback plan completeness
`event_publish_success`	Confirmed lifecycle notification sent

🧰 Core Technologies and Platforms¶

The Infrastructure Architect Agent uses a modular, cloud-native, and automation-first tech stack to generate, validate, and export infrastructure blueprints across cloud environments.
It supports multi-IaC, multi-cloud, and developer-native (Pulumi C#) workflows, ensuring seamless integration with ConnectSoft’s broader platform and DevOps agents.

☁️ Supported Cloud Platforms¶

Cloud Provider	Role in Agent
Azure	Primary cloud for production environments — AKS, Key Vault, Storage, Monitor
AWS	Optional deployments via Terraform — EKS, S3, IAM, CloudWatch
GCP	Supported via Terraform: GKE, Cloud Storage, IAM, Cloud Logging/Monitoring (formerly Stackdriver)
Hybrid (Self-hosted)	DNS stub zones, internal IP ranges, K3s, OpenTelemetry only

📦 Infrastructure-as-Code Engines¶

Tool	Purpose
Bicep	Azure-native IaC (AKS, VNets, Key Vault, RBAC)
Terraform	Cloud-agnostic IaC across Azure, AWS, GCP
Pulumi (C#)	.NET-native IaC for developer-centric infrastructure teams
ARM Templates (fallback only)	Legacy support for Azure services where Bicep isn’t available
Cloud SDKs	Used via Pulumi C# for resource orchestration

🛠️ Semantic Kernel (.NET)¶

Use	Details
Agent orchestration	Composes IaC generation flow from skills
Prompt interpretation	Parses and applies intent from YAML-based prompt definitions
Skill injection	Loads IaC emitter skills based on output target (Bicep, TF, Pulumi)
Trace integration	Embeds `trace_id`, `agent_version`, `service_name` into every generated file and span

🔐 Identity and Access¶

Technology	Purpose
Azure Managed Identity (MSI)	Used by AKS, Key Vault, Storage access
Terraform IAM modules	AWS/GCP support for minimal role policies
Pulumi.AzureNative.Authorization	C#-based identity provisioning
Key Vault & KMS	Securely manage tokens, connection strings, keys
RBAC Scopes	Automatically mapped via `identity-map.yaml`

🌍 Networking and DNS¶

Stack	Role
Azure Private DNS	Used for internal `.connectsoft.dev` subdomains
Route53 / Cloud DNS	Used for tenant-bound or public zones
VNet / VPC	Created per environment or tenant
NSGs / Security Groups	Applied per subnet or cluster node pool
Peering & Transit Gateway	Supported where cross-region communication is required

📊 Observability & Logging Stack¶

Component	Purpose
OpenTelemetry Collector	Injected into infra stack to emit spans and metrics
Azure Monitor / Log Analytics	Default observability sink for Azure-based infra
Prometheus + Grafana (optional)	Can be configured via Pulumi or Helm
CloudWatch / Cloud Logging (formerly Stackdriver)	Used on AWS/GCP via Terraform log routing modules
otel-agent-config.yaml	Emitted to standardize span schema per resource type

📁 Template Registry and Reuse¶

Format	Reuse Mechanism
.bicep	Imported from `connectsoft/iac/modules/*.bicep`
.tf	Uses Terraform `module` blocks with `source` ref
.cs (Pulumi)	Generated from `TemplateLibrary/Modules/*.cs`
Shared Tags/Locals	Used across all formats: `env`, `project`, `trace_id`, `region`

🔁 CI/CD and DevOps Integration¶

Tool	Integration Point
Azure DevOps Pipelines	Pulls IaC artifacts from `infra-out` folder
GitOps	Optional: sync from generated output into Flux/ArgoCD
`InfraProvisioningPlanPublished`	Triggers downstream deploy/test flows
Rollback Triggers	Based on `infra-rollback.yaml`, linked via trace and git ref

📜 System Prompt (Bootstrapping Instruction)¶

This system prompt governs how the Infrastructure Architect Agent initializes, interprets architectural intent, and generates IaC artifacts in Bicep, Terraform, and Pulumi (C#) formats.

It enforces ConnectSoft’s standards for security, observability, naming, versioning, and tenant-aware resource allocation.

✅ Full System Prompt (Plain Text)¶

You are the **Infrastructure Architect Agent** in the ConnectSoft AI Software Factory.

Your purpose is to take architectural definitions and environment metadata, and generate a complete, secure, and reusable infrastructure blueprint that can be deployed using Bicep, Terraform, or Pulumi (C#).

---

## Your Responsibilities:

1. Load inputs:
   - solution-architecture.md
   - application-architecture.md
   - identity-policy.yaml
   - resource-configuration.yaml
   - field-retention-map.yaml
   - observability-policy.yaml
   - previous-infra-state.json (optional)

2. Generate infrastructure outputs:
   - Bicep (`infra.bicep`)
   - Terraform (`main.tf`)
   - Pulumi C# (`PulumiInfra.cs`)
   - YAML metadata files: `network-topology.yaml`, `storage-definition.yaml`, `identity-map.yaml`
   - Mermaid diagram: `infra-topology.mmd`
   - Rollback and changelog files: `infra-rollback.yaml`, `infra-drift-detected.yaml`

3. Apply policies:
   - Use ConnectSoft’s naming, tagging, and RBAC conventions
   - Inject traceability metadata (`trace_id`, `agent_version`, `service_name`, `environment`)
   - Wire in OpenTelemetry collector and span templates

4. Emit lifecycle events:
   - `InfraProvisioningPlanPublished`
   - `InfraDriftDetected` (if previous infra state differs)
   - `InfraRollbackReady` (if rollback plan successfully created)

---

## Output Requirements:

- All artifacts must be consistent, versioned, and environment-specific
- At least one IaC format (Bicep, TF, or Pulumi) must be generated
- All infrastructure must include trace metadata and environment tags
- Resources must be deployable via automation pipelines (Azure DevOps, GitOps)
- No hardcoded secrets or identity strings — use vaults and RBAC

---

## Observability:

- Inject OTEL collector or sidecar per cluster or host
- Emit spans for: input parsing, template selection, IaC generation, rollback readiness
- Use `otel-agent-config.yaml` to standardize export targets and trace structure

---

## Fallbacks and Safety:

- If cloud provider not specified → default to Azure
- If observability config missing → inject default OTLP + stdout
- If tenant_id missing → assume shared multi-tenant mode
- If rollback disabled but detected drift → emit warning and skip promotion

🔐 Policy-Driven Agent Behavior¶

All DNS zones must resolve within *.connectsoft.dev
All secrets and config must be sourced via Azure Key Vault or equivalent
All generated outputs must conform to ConnectSoft's environment-tagging, traceability, and modular reuse conventions
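
The first three fallback rules from the system prompt above can be sketched as a small defaulting step. This is only an illustration, not the agent's actual implementation: the function name `apply_fallbacks`, the `default_exporters` key, and the `tenant_mode` value written into `settings` are assumptions made for the example. The fourth rule (drift with rollback disabled) depends on state comparison and is left out.

```python
import copy


def apply_fallbacks(prompt: dict) -> tuple:
    """Return (resolved_prompt, notes) with the prompt's fallback rules applied."""
    resolved = copy.deepcopy(prompt)  # never mutate the caller's prompt
    project = resolved.setdefault("project", {})
    inputs = resolved.setdefault("inputs", {})
    settings = resolved.setdefault("settings", {})
    notes = []

    # Cloud provider not specified -> default to Azure.
    if not settings.get("cloud_provider"):
        settings["cloud_provider"] = "Azure"
        notes.append("cloud_provider defaulted to Azure")

    # Observability config missing -> inject default OTLP + stdout exporters.
    if not inputs.get("observability_policy_url"):
        settings["default_exporters"] = ["otlp", "stdout"]  # hypothetical key
        notes.append("observability policy missing: default OTLP + stdout injected")

    # tenant_id missing -> assume shared multi-tenant mode.
    if not project.get("tenant_id"):
        settings["tenant_mode"] = "shared"  # hypothetical key
        notes.append("tenant_id missing: shared multi-tenant mode assumed")

    return resolved, notes
```

Returning the notes alongside the resolved prompt keeps every defaulted value visible, so it can be logged or attached to the trace rather than applied silently.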

📥 Input Prompt Template¶

The Infrastructure Architect Agent is activated by a structured YAML input prompt provided by the ConnectSoft orchestrator or Solution Architect Agent.
This prompt defines the service context, environment, tenant scope, and configuration sources needed to generate infrastructure artifacts.

✅ Sample Input Prompt (YAML)¶

assignment: provision-infrastructure

project:
  trace_id: trace-infra-99881
  service_name: NotificationService
  environment: Staging
  tenant_id: tenant-42
  agent_version: 1.3.0

inputs:
  solution_architecture_url: https://.../solution-architecture.md
  application_architecture_url: https://.../application-architecture.md
  resource_configuration_url: https://.../notification/resource-config.yaml
  identity_policy_url: https://.../notification/identity-policy.yaml
  field_retention_map_url: https://.../notification/field-retention.yaml
  observability_policy_url: https://.../notification/observability.yaml
  previous_infra_state_url: https://.../notification/infra-v1.1.0.json

settings:
  output_format: [bicep, terraform, pulumi]
  inject_otel: true
  rollback_safe: true
  emit_diagram: true
  cloud_provider: Azure
  enable_tenant_isolation: true

🧩 Required Input Fields¶

🔷 `project`¶

Field	Description
`trace_id`	Unique ID for traceability, reused across all outputs
`service_name`	The logical microservice name this infra supports
`environment`	Deployment stage (Dev, Staging, Production)
`tenant_id`	Tenant scope (optional — if omitted, shared mode assumed)
`agent_version`	Infrastructure Architect Agent version used to generate artifacts

📁 `inputs`¶

Key	Description
`solution_architecture_url`	High-level system blueprint
`application_architecture_url`	Logical service and zone breakdown
`resource_configuration_url`	Resource size, DNS, storage class, compute settings
`identity_policy_url`	RBAC, MSI, tenant roles, access constraints
`field_retention_map_url`	Storage requirements (encryption, backup)
`observability_policy_url`	Span injection and OTEL configuration
`previous_infra_state_url`	Optional — used for drift detection and rollback planning

⚙️ `settings`¶

Field	Description
`output_format`	Which IaC targets to generate (`bicep`, `terraform`, `pulumi`)
`inject_otel`	Whether to inject OTEL collector and trace wiring
`rollback_safe`	If true, `infra-rollback.yaml` will be generated
`emit_diagram`	If true, outputs `infra-topology.mmd` (Mermaid)
`cloud_provider`	Forces IaC resolution to Azure, AWS, GCP
`enable_tenant_isolation`	Enforces subnet, DNS, or RG separation for tenant-specific infra

✅ Validation Rules¶

Rule	Enforced
`service_name`, `trace_id`, and `environment` must be present	✅
At least one of `output_format` must be valid	✅
If `rollback_safe: true`, previous state must be provided	✅
Identity policies must define at least one principal binding	✅
DNS zone and subnet must resolve to valid topology unless `emit_diagram: false`	✅
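
A minimal sketch of the first three validation rules, assuming the YAML prompt has already been parsed into a plain dictionary. The function name `validate_prompt` and the error wording are hypothetical. The identity-binding and topology rules need the referenced documents and are not covered. Note that the drift and rollback checks later in this specification describe a softer fallback (warning plus snapshot-only rollback) when previous state is missing, so a real implementation might downgrade the last check to a warning.

```python
VALID_FORMATS = {"bicep", "terraform", "pulumi"}
REQUIRED_PROJECT_FIELDS = ("service_name", "trace_id", "environment")


def validate_prompt(prompt: dict) -> list:
    """Return a list of validation errors; an empty list means the prompt passed."""
    errors = []
    project = prompt.get("project", {})
    inputs = prompt.get("inputs", {})
    settings = prompt.get("settings", {})

    # Rule 1: service_name, trace_id, and environment must be present.
    for field in REQUIRED_PROJECT_FIELDS:
        if not project.get(field):
            errors.append(f"project.{field} is required")

    # Rule 2: at least one requested output format must be valid.
    formats = settings.get("output_format", [])
    if not any(fmt in VALID_FORMATS for fmt in formats):
        errors.append("settings.output_format needs at least one of: "
                      + ", ".join(sorted(VALID_FORMATS)))

    # Rule 3: rollback_safe requires a previous infrastructure state.
    if settings.get("rollback_safe") and not inputs.get("previous_infra_state_url"):
        errors.append("rollback_safe is true but previous_infra_state_url is missing")

    return errors
```

Collecting all errors instead of stopping at the first one lets the orchestrator report every problem in a single round trip.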

📦 Minimal Prompt Example¶

assignment: provision-infrastructure

project:
  trace_id: trace-infra-44419
  service_name: AuditService
  environment: Dev

inputs:
  resource_configuration_url: https://.../audit/resource-config.yaml
  identity_policy_url: https://.../audit/identity-policy.yaml

settings:
  output_format: [pulumi]
  rollback_safe: false

📤 Output Expectations Overview¶

The Infrastructure Architect Agent emits a complete, environment-aware, and IaC-compatible infrastructure bundle for every service, tenant, and environment combination.

These outputs are used by DevOps, Observability, Security, and Cloud Architecture Agents to provision, validate, and monitor real-world infrastructure across Azure, AWS, GCP, or hybrid environments.

📦 Artifact Summary Table¶

Artifact	Format	Purpose
`infra.bicep`	Bicep	Azure-native infrastructure script
`main.tf`	Terraform	Multi-cloud IaC abstraction
`PulumiInfra.cs`	C# (.NET Pulumi)	Developer-native infrastructure declaration
`network-topology.yaml`	YAML	Defines VNet, subnets, DNS zones, NSGs
`identity-map.yaml`	YAML	Describes service-managed identities, role bindings
`storage-definition.yaml`	YAML	Blob buckets, DB definitions, backup policies
`infra-metadata.json`	JSON	Metadata: trace ID, version, region, cloud provider
`infra-rollback.yaml`	YAML	Reversion and resource safe-destruction map
`infra-policy-map.yaml`	YAML	Resource tags, region policies, identity zones
`infra-topology.mmd`	Mermaid	Visual architecture map of network + services
`otel-agent-config.yaml`	YAML	Telemetry collectors, exporters, OTLP endpoints
`InfraProvisioningPlanPublished`	JSON (Event)	Published lifecycle metadata for downstream use

🧾 Example: `infra-policy-map.yaml`¶

tags:
  owner: infra-team
  project: connectsoft
  trace_id: trace-infra-88288
  billing_code: cc123
resource_group_strategy: per-environment
tenant_isolation: subnet
observability:
  otel_enabled: true
  collector_url: http://otel.default:4317

📊 Example: `infra-metadata.json`¶

{
  "trace_id": "trace-infra-99122",
  "service": "NotificationService",
  "environment": "Production",
  "cloud_provider": "Azure",
  "agent_version": "1.3.0",
  "generated_at": "2025-05-02T01:55:00Z",
  "output_artifacts": {
    "bicep": "infra.bicep",
    "terraform": "main.tf",
    "pulumi": "PulumiInfra.cs"
  },
  "region": "eastus2",
  "rollback_ready": true
}

📈 Example: `InfraProvisioningPlanPublished` (Event)¶

{
  "event": "InfraProvisioningPlanPublished",
  "trace_id": "trace-infra-99122",
  "service": "NotificationService",
  "environment": "Production",
  "outputs": {
    "pulumi": "PulumiInfra.cs",
    "metadata": "infra-metadata.json",
    "topology": "infra-topology.mmd"
  },
  "timestamp": "2025-05-02T01:55:00Z"
}

📌 Output Constraints & Validation Rules¶

Requirement	Enforced
All outputs include `trace_id`, `environment`, `agent_version`	✅
At least one IaC format (`bicep`, `terraform`, or `pulumi`) must be present	✅
DNS + network must be represented in `network-topology.yaml`	✅
Observability outputs must be injected or defaulted	✅
If `rollback_safe: true`, `infra-rollback.yaml` must be emitted	✅
Mermaid diagram emitted if `emit_diagram: true`	✅
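
The file-level constraints in this table can be checked mechanically against the set of emitted file names. The sketch below is an assumption-laden illustration: `check_bundle` and `expected_artifacts` are hypothetical names, and treating `otel-agent-config.yaml` as the observability output that must always exist is an interpretation of "injected or defaulted". It does not inspect file contents, so the `trace_id` / `environment` / `agent_version` rule is out of scope.

```python
FORMAT_FILES = {
    "bicep": "infra.bicep",
    "terraform": "main.tf",
    "pulumi": "PulumiInfra.cs",
}


def expected_artifacts(settings: dict) -> set:
    """Files that must exist for the given settings, excluding the IaC files."""
    expected = {"network-topology.yaml", "otel-agent-config.yaml"}
    if settings.get("rollback_safe"):
        expected.add("infra-rollback.yaml")
    if settings.get("emit_diagram"):
        expected.add("infra-topology.mmd")
    return expected


def check_bundle(settings: dict, emitted: set) -> list:
    """Return a sorted list of problems; an empty list means the bundle is complete."""
    problems = [f"missing {name}"
                for name in sorted(expected_artifacts(settings) - emitted)]
    # At least one IaC file must be present, whichever format was requested.
    if not emitted & set(FORMAT_FILES.values()):
        problems.append("no IaC file present (infra.bicep, main.tf, or PulumiInfra.cs)")
    return problems
```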

📘 Optional: `deployment-docs.md`¶

For downstream use in Developer Portal or Audit Trail.

# Infrastructure Deployment: NotificationService

**Environment:** Production  
**Cloud:** Azure  
**Version:** v1.3.0  
**Trace ID:** trace-infra-99122

## Components
- AKS Cluster with autoscaling
- Private VNet with DNS
- Azure Blob Storage with GRS
- Key Vault with managed identity
- OTEL Collector with OTLP export

## Rollback
Rollback enabled → `infra-rollback.yaml`

🧠 Memory Strategy Overview¶

The Infrastructure Architect Agent maintains a short-term session memory and long-term semantic memory to ensure:

Infrastructure consistency across environments and agents
Reuse of known patterns and resource identifiers
Rollback awareness and safe reversion
Drift detection and proactive remediation
Cross-agent alignment (DevOps, Security, Observability, Cloud)

🕐 Short-Term (Session) Memory¶

Key	Purpose
`trace_id`	Tracks all outputs, spans, and events for current run
`service_name`	Used in naming conventions, tags, identities
`environment`	Affects DNS, tags, scaling strategy, vaults
`output_format[]`	Controls which IaC formats to emit (Bicep, TF, Pulumi)
`generated_artifacts[]`	List of created files for summary + event emission
`drift_detected[]`	Captures differences from `previous-infra-state.json`
`tenant_mode`	Shared vs. isolated infrastructure toggles

🧠 Long-Term Semantic Memory¶

🔁 1. Provisioned Resources History¶

Data	Used For
Resource group names and locations	Prevent collisions, support idempotency
Vault references and secret names	Enforce secure naming + reuse
AKS/EKS cluster names and configurations	Auto-resolve for service scale-outs
DNS zones, subnets, and peering policies	Maintain cross-region resolution and tenancy compliance

🔐 2. Identity Mapping History¶

Tracked	Use Case
Previously bound MSIs, SPs, and RBAC roles	Re-apply correct scopes to new services
Tenant-based role templates	Enforce least privilege per tenant pattern
Vault identity linkage	Validate that all resources have secure vault access only

🌍 3. Observability and OTEL Tracing Memory¶

Retained	Purpose
Previously emitted `otel-agent-config.yaml`	Reuse collector endpoint + exporter rules
Known span naming conventions	Ensure trace compatibility across deployments
Log forwarding sinks	Automatically bind to Azure Monitor, CloudWatch, or Stackdriver per platform

♻️ 4. Template and Module Reuse¶

Element	Behavior
AKS/EKS node pool configs	Match by environment and region
VNet modules	Reuse across environments if `shared_vnet: true`
Vault templates	Loaded per team/project/tenant class
`infra-policy-map.yaml`	Maintains cost, tag, and scaling standards

📑 Memory Snapshot Output Example¶

{
  "trace_id": "trace-infra-55488",
  "version": "1.3.0",
  "rollback_ready": true,
  "generated_artifacts": [
    "PulumiInfra.cs",
    "network-topology.yaml",
    "identity-map.yaml",
    "infra-rollback.yaml"
  ],
  "reuse_policies": {
    "vnet": "shared",
    "dns": "per-tenant",
    "vault": "env-isolated"
  },
  "drift_detected": false
}

✅ Memory Benefits¶

Value	Result
🔁 Prevent redundant redeployment	Avoid resource re-creation when no change is needed
🔒 Secure identity reuse	Roles and bindings consistently applied
📈 Span and telemetry continuity	Standard OTEL tracing maintained over versions
📦 Consistent IaC structure	Modules and naming reused across hundreds of microservices
🧭 Rollback safety	Links each version to previous working state with traceability

✅ Validation and Correction Overview¶

The Infrastructure Architect Agent includes a comprehensive, multi-layered validation pipeline to ensure that all emitted IaC artifacts:

Are syntactically correct
Follow security, naming, and tagging conventions
Are traceable
Support safe provisioning and rollback
Conform to ConnectSoft’s platform policies across clouds and tenants

When validation fails, the agent applies auto-corrections, issues warnings, or blocks output publication.

🔍 Validation Phases¶

1️⃣ Schema & Structure Validation¶

Artifact	Rule
`infra.bicep` / `main.tf` / `PulumiInfra.cs`	Valid IaC syntax, no unresolved references
`network-topology.yaml`	Must include VNet, at least one subnet, DNS zone
`identity-map.yaml`	Must include at least one MSI or principal
`storage-definition.yaml`	Must define replication, encryption for each store
`infra-policy-map.yaml`	Required tags: `owner`, `project`, `env`, `trace_id`
`infra-topology.mmd`	Valid Mermaid syntax and referenced resource IDs exist

2️⃣ Security & Naming Enforcement¶

Rule	Correction
DNS zone must end in `.connectsoft.dev`	Append suffix if missing
Vault secrets must be reference-based (not inline)	Convert inline to `key_ref:` pattern
Service name must be kebab-case	Auto-format and update in IaC files
Tenant resources must include `tenant_id` in name/tag	Append suffix or tag if absent
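
Three of the corrections above are simple string and dictionary transforms. The following sketch shows one way they might look. The helper names are hypothetical, and the kebab-case conversion only handles camel-case, underscore, and whitespace boundaries. Converting inline vault secrets to the `key_ref:` pattern depends on the IaC format and is not shown.

```python
import re
from typing import Optional

DNS_SUFFIX = ".connectsoft.dev"


def to_kebab_case(name: str) -> str:
    """Convert names such as 'NotificationService' to 'notification-service'."""
    # Insert a hyphen where a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Normalize underscores and whitespace to hyphens, then lowercase.
    return re.sub(r"[\s_]+", "-", spaced).strip("-").lower()


def ensure_dns_suffix(zone: str) -> str:
    """Append the required suffix unless the zone already sits under it."""
    zone = zone.rstrip(".")
    if zone == DNS_SUFFIX.lstrip(".") or zone.endswith(DNS_SUFFIX):
        return zone
    return zone + DNS_SUFFIX


def ensure_tenant_tag(tags: dict, tenant_id: Optional[str]) -> dict:
    """Return a copy of tags with tenant_id added when a tenant is in scope."""
    tagged = dict(tags)
    if tenant_id:
        tagged.setdefault("tenant_id", tenant_id)
    return tagged
```

For example, `to_kebab_case("NotificationService")` yields `notification-service`, and `ensure_dns_suffix("notifications")` yields `notifications.connectsoft.dev`.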

3️⃣ Observability & Traceability¶

Rule	Correction
OTEL config must exist if `inject_otel: true`	Inject default `otel-agent-config.yaml`
Each IaC output must include `trace_id`, `agent_version`, and `environment`	Inject automatically if missing
Must emit OpenTelemetry spans for generation lifecycle	Auto-emit if span config is present or fallback allowed

4️⃣ Drift and Rollback Safety Checks¶

Rule	Behavior
If `rollback_safe: true`, but no `previous-infra-state.json` provided	Emit warning, fallback to snapshot-only rollback
If significant drift from previous state detected	Emit `infra-drift-detected.yaml`, block auto-promotion
If no drift and version unchanged	Block emission of duplicate `infra.bicep` or `PulumiInfra.cs`
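
The three checks above reduce to a comparison between the previous and current state followed by a small decision step. The sketch below assumes, purely for illustration, that each state is a flat mapping of resource name to configuration and that any difference counts as "significant" drift. The specification does not define the state schema or a drift threshold, and the function names are hypothetical.

```python
from typing import Optional


def detect_drift(previous: dict, current: dict) -> dict:
    """Compare two resource maps and report added, removed, and changed resources."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(name for name in current.keys() & previous.keys()
                     if current[name] != previous[name])
    return {"added": added, "removed": removed, "changed": changed}


def plan_actions(previous: Optional[dict], current: dict, *,
                 rollback_safe: bool, version_changed: bool) -> list:
    """Translate the drift and rollback table into a list of actions."""
    if previous is None:
        # No previous state: only a snapshot-based rollback is possible.
        return ["warn: snapshot-only rollback"] if rollback_safe else []

    drift = detect_drift(previous, current)
    if any(drift.values()):
        return ["emit infra-drift-detected.yaml", "block auto-promotion"]
    if not version_changed:
        return ["skip duplicate IaC emission"]
    return []
```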

🔁 Auto-Correction Behavior Matrix¶

Input Error	Correction Applied
DNS missing `.connectsoft.dev`	Append suffix, log to changelog
Vault secret defined inline	Convert to `key_ref` and generate binding
AKS region undefined	Default to `eastus2`
Missing `identity-policy.yaml`	Inject system-assigned MSI + Reader role
Missing tags	Inject default tags: `owner`, `project`, `trace_id`

📢 Blocking Conditions¶

Condition	Outcome
Invalid IaC syntax	Block artifact, emit `InfraProvisioningFailed`
Inline secrets with no fallback	Fail validation
Identity map missing required bindings	Fail validation
Drift detected with rollback disabled	Block publish, emit alert
Trace ID missing	Block all outputs until traceability is ensured

📈 Observability Spans Emitted¶

Span Name	Trigger
`iac_validation_passed`	All IaC outputs validated
`corrections_applied`	One or more auto-corrections were made
`rollback_plan_generated`	`infra-rollback.yaml` emitted successfully
`infra_drift_detected`	Infrastructure change from previous version
`publish_blocked_due_to_drift`	Promotion blocked, manual review required

✅ Validation Outcome Summary¶

Outcome	Action
✅ Pass	Emit `InfraProvisioningPlanPublished`
⚠️ Partial Pass (with auto-corrections)	Emit warning, mark trace
❌ Fail	Emit `InfraProvisioningFailed` or block downstream publication
🔁 Drift Detected	Emit changelog + `infra-drift-detected.yaml`, mark rollback required
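
The summary table can be read as a decision function over the validation findings. The sketch below is one possible reading: the precedence (failure first, then drift, then corrections) is an assumption, since the table does not say which outcome wins when several apply, and `validation_outcome` is a hypothetical name.

```python
def validation_outcome(blocking_errors: list, corrections: list,
                       drift_detected: bool) -> tuple:
    """Map validation findings to (outcome, action) following the summary table."""
    if blocking_errors:
        return "fail", "emit InfraProvisioningFailed or block downstream publication"
    if drift_detected:
        return ("drift_detected",
                "emit changelog + infra-drift-detected.yaml, mark rollback required")
    if corrections:
        return "partial_pass", "emit warning, mark trace"
    return "pass", "emit InfraProvisioningPlanPublished"
```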

🤝 Agent Collaboration Overview¶

The Infrastructure Architect Agent plays a central role in the ConnectSoft AI Software Factory by producing foundational infrastructure on which all other services and agents depend.
It acts as both a consumer of architectural intent and a provider of deployable environments, identity scaffolding, and cloud boundaries to downstream agents.

🔼 Upstream Providers (Input Sources)¶

Agent	Input Artifact
Solution Architect Agent	`solution-architecture.md` — defines zones, environments, regions
Application Architect Agent	`application-architecture.md` — lists logical services and zone groupings
Cloud Architecture Agent	`resource-configuration.yaml` — compute type, DNS, region, zones
Security Architect Agent	`identity-policy.yaml` — MSI, role bindings, secret access policies
Observability Agent	`observability-policy.yaml` — OTEL injection and span propagation
Data Architect Agent	`field-retention-map.yaml` — drives encryption, backup vault generation

🔽 Downstream Consumers¶

Agent	Consumes
DevOps Architect Agent	Pulls `identity-map.yaml`, `infra.bicep`, `PulumiInfra.cs` to enable secure pipelines and environments
Cloud Architecture Agent	Uses `network-topology.yaml`, `infra-policy-map.yaml` for region scaling, CDN planning
Security Architect Agent	Reads `infra-policy-map.yaml` and identity definitions for role enforcement
Observability Agent	Reuses `otel-agent-config.yaml`, emits spans from infra provisioning lifecycle
Rollback Executor Agent	Consumes `infra-rollback.yaml` when a downgrade or revert is required
Developer Portal Generator Agent	Displays topology (`infra-topology.mmd`) and status of infra per service/environment

📦 Published Events¶

Event	Trigger
`InfraProvisioningPlanPublished`	Emitted when generation completes successfully
`InfraDriftDetected`	Published when current state differs from `previous-infra-state.json`
`InfraRollbackReady`	Emitted when rollback file is prepared and safe to apply
`InfraProvisioningFailed`	Published if IaC generation or validation fails

🔗 Artifact Linkage to Other Agents¶

Artifact	Consumed By
`infra.bicep`, `main.tf`, `PulumiInfra.cs`	DevOps Agent, CI/CD Orchestrator
`network-topology.yaml`	API Gateway Configurator, Cloud Scaling Agent
`identity-map.yaml`	Security, DevOps, Vault Injector Agents
`storage-definition.yaml`	Backup, Data Migration, and Cost Monitor Agents
`infra-topology.mmd`	Developer Portal, Audit Dashboard

🧠 Traceability and Coordination¶

All outputs contain:

metadata:
  trace_id: trace-infra-11889
  agent_version: 1.3.0
  service_name: NotificationService
  environment: Staging
  generated_on: 2025-05-02T02:10:00Z

These values are used to cross-reference deployments, rollback chains, and audit trails across DevOps, Security, and Monitoring agents.
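
One simple way to carry this metadata in generated IaC files is a comment header at the top of each file. The sketch below assumes that approach. The specification does not say how the values are embedded, and `metadata_header` and `stamp` are hypothetical helpers. The comment prefixes are the standard single-line comment markers for Bicep (`//`), Terraform (`#`), and C# (`//`).

```python
COMMENT_PREFIX = {"bicep": "//", "terraform": "#", "pulumi": "//"}
REQUIRED_KEYS = ("trace_id", "agent_version", "service_name", "environment")


def metadata_header(fmt: str, metadata: dict) -> str:
    """Render the traceability metadata as comment lines for the given IaC format."""
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    prefix = COMMENT_PREFIX[fmt]
    return "\n".join(f"{prefix} {key}: {value}" for key, value in metadata.items())


def stamp(fmt: str, source: str, metadata: dict) -> str:
    """Prepend the metadata header to a generated IaC source file."""
    return metadata_header(fmt, metadata) + "\n\n" + source
```

Raising on missing keys mirrors the blocking condition that no output is published without a trace ID.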

📡 Integration Flow¶

flowchart TD
    SolutionArchitect --> InfrastructureArchitect
    ApplicationArchitect --> InfrastructureArchitect
    CloudArchitect --> InfrastructureArchitect
    SecurityArchitect --> InfrastructureArchitect
    ObservabilityAgent --> InfrastructureArchitect

    InfrastructureArchitect --> DevOpsArchitect
    InfrastructureArchitect --> CloudArchitect
    InfrastructureArchitect --> RollbackExecutor
    InfrastructureArchitect --> DeveloperPortal
    InfrastructureArchitect --> ObservabilityAgent


✅ Collaboration Summary¶

Agent	Relationship
🔼 Architects & Planners	Provide policies, inputs, and intent
🔽 Executors	Consume infra, deploy, monitor, rollback, or visualize
🔁 Cyclical Feedback	Drift → security/governance validation → rollback or regenerate
📈 Spans + Events	Provide observability hooks to DevOps + platform telemetry

📡 Observability & Oversight Strategy¶

The Infrastructure Architect Agent is fully instrumented to support:

Real-time span emission
Lifecycle event publishing
Provisioning rollback awareness
Topology visualization
Audit traceability across clouds, tenants, and environments

These mechanisms ensure that every infrastructure generation cycle is observable, versioned, and governance-ready.

📈 OpenTelemetry Span Emission¶

Span Name	Trigger
`infra_inputs_parsed`	After validating incoming prompt and configs
`iac_generation_completed`	When all IaC formats are successfully written
`topology_rendered`	When `infra-topology.mmd` is generated
`rollback_plan_ready`	When `infra-rollback.yaml` is complete
`infra_lifecycle_event_emitted`	When `InfraProvisioningPlanPublished` is dispatched

Each span includes:

`trace_id`
`agent_version`
`service_name`
`environment`
`duration_ms`
`status`
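
To make the span fields concrete, the sketch below records them with a plain context manager. It is a stand-in for illustration only: it does not use the OpenTelemetry SDK, and `infra_span`, the `sink` list, and the `ok` / `error` status values are assumptions rather than part of the specification.

```python
import time
from contextlib import contextmanager


@contextmanager
def infra_span(name: str, context: dict, sink: list):
    """Record one span with the standard fields, timing the wrapped block."""
    record = {"name": name, **context, "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record["status"] = "error"  # the failure is recorded, then re-raised
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink.append(record)


# Usage: wrap a lifecycle step so its span is captured even if it fails.
spans = []
run_context = {
    "trace_id": "trace-infra-99881",
    "agent_version": "1.3.0",
    "service_name": "NotificationService",
    "environment": "Staging",
}
with infra_span("infra_inputs_parsed", run_context, spans):
    pass  # input parsing would happen here
```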

📢 Lifecycle Event Streams¶

Event	Purpose
`InfraProvisioningPlanPublished`	Main lifecycle signal with full output metadata
`InfraRollbackReady`	Indicates safe rollback plan was generated
`InfraDriftDetected`	Signals divergence from previously known infra state
`InfraProvisioningFailed`	Indicates generation or validation failure (used by DevOps + Observability)

📊 Dashboards & Audit Integration¶

Dashboard Tile	Description
IaC Generation Timeline	Visual trace of infra generation per service
Drift Detection Status	Last time drift was seen and what changed
Resource Count & Cost Tags	Breakdown of provisioned units and billing labels
RBAC & MSI Audit	Which identities were generated or reused
Topology Viewer	Mermaid-rendered overlay of DNS, VNet, zones, vaults, and compute

🧭 Rollback Awareness & Safety¶

If `rollback_safe: true`:

`infra-rollback.yaml` is always generated
Last successful output snapshot is stored
Changes are diffed against `previous-infra-state.json`
Downstream rollback agents and DevOps are notified

Rollback artifacts include:

rollback:
  to_version: 1.2.0
  generated_on: 2025-05-02T02:13:00Z
  trace_id: trace-infra-99321
  files: [infra.bicep, PulumiInfra.cs, identity-map.yaml]
  approved_by: system

📦 Final Output Bundle (Recap)¶

Artifact	Format
`infra.bicep` / `main.tf` / `PulumiInfra.cs`	IaC files
`identity-map.yaml` / `network-topology.yaml` / `storage-definition.yaml`	Infra blueprints
`infra-topology.mmd`	Mermaid diagram
`infra-metadata.json`	Versioned metadata
`infra-rollback.yaml`	Rollback spec
`infra-policy-map.yaml`	Tag + policy map
`otel-agent-config.yaml`	Telemetry routing
`InfraProvisioningPlanPublished`	Lifecycle event

✅ Agent Outcome Summary¶

Capability	Delivered
🔁 Multi-format IaC generation	✅ Bicep, Terraform, Pulumi (C#)
🧱 Modular infra blueprints	✅ Per environment, region, tenant
🔐 Secure RBAC and secrets	✅ MSI, Vault, scoped identity maps
🌐 DNS, VNet, AKS, Storage	✅ All core infra defined
📈 Observability spans	✅ OTEL and span-ready outputs
♻️ Drift + rollback coverage	✅ With lifecycle events + changelogs
