
πŸ›°οΈ Observability Engineer Agent Specification

🎯 Purpose

The Observability Engineer Agent ensures that every artifact produced by the ConnectSoft AI Software Factory β€” including services, agents, orchestrators, and infrastructure modules β€” is fully observable at runtime, traceable across pipelines, and compatible with compliance, debugging, and optimization flows.

This agent operates at design-time, injecting the runtime constructs needed to emit:

  • Structured logs with trace and context metadata
  • OpenTelemetry traces with traceId, agentId, moduleId
  • Prometheus-compatible metrics (latency, execution count, status)
  • Execution metadata artifacts (execution-metadata.json)
  • Health probes and diagnostics endpoints

🧭 Platform Placement

The agent activates immediately after scaffolding, before DevOps orchestration, and acts as a final observability linter and injector in the software factory pipeline.

πŸ“Š Positioning Diagram

```mermaid
flowchart LR
    Vision[Vision Architect Agent]
    Arch[Solution Architect Agent]
    Gen[Microservice Generator Agent]
    Obs[Observability Engineer Agent]
    DevOps[DevOps Engineer Agent]
    QA[Test Generator Agent]
    Release[Release Manager Agent]

    Vision --> Arch
    Arch --> Gen
    Gen --> Obs
    Obs --> DevOps
    Obs --> QA
    DevOps --> Release
```

The Observability Engineer Agent is the critical bridge between generation and deployment, ensuring all downstream agents and Studio modules can trace, score, and visualize what was produced.


🧠 Why It Exists

Without this agent, the factory would suffer from:

  • Opaque outputs β€” code runs but no traces or metrics are emitted
  • Inconsistent diagnostics β€” missing traceId or tenantId in logs
  • Security risks β€” sensitive fields logged improperly
  • Undebuggable failures β€” no insight into where/why agents or services fail
  • Disconnected Studio views β€” missing visual timelines, dashboards, and audit trails

This agent makes observability not optional, but enforced by design.


βœ… Key Outcomes Enabled

| Outcome | Description |
| --- | --- |
| Traceability | All service logs, spans, and metrics are linked to traceId, agentId, and moduleId. |
| Telemetry-first design | Every handler, adapter, and controller emits OTEL spans and metric counters. |
| Policy-aware logging | Logs are structured, redact secrets, and conform to sensitivity rules. |
| Metrics instrumentation | Prometheus metrics are automatically exposed and tagged by context. |
| Execution observability | execution-metadata.json is generated and attached to every trace. |
| Studio visualization support | Dashboards, timelines, and health heatmaps become available out of the box. |

In summary: The Observability Engineer Agent is the sensor layer of the AI Software Factory. It ensures that every action is visible, every failure is diagnosable, and every output is auditable.


πŸ“‹ Responsibilities

The Observability Engineer Agent is responsible for injecting, configuring, and validating all observability layers required to support traceable, measurable, and diagnosable services in the ConnectSoft AI Software Factory.

These responsibilities span across:

  • Code injection (metrics, logging, tracing decorators)
  • Configuration generation (OTEL setup, Serilog config)
  • Validation and linting (policy conformance, log structure)
  • Artifact production (execution-metadata.json, /metrics, /healthz)
  • Studio visibility support (traceId lineage, dashboards, logs)

🧠 Responsibilities Breakdown

| Responsibility | Description |
| --- | --- |
| Inject OpenTelemetry Tracing | Adds span start/stop, propagation logic, and traceId, agentId, moduleId into handlers, services, and API layers. |
| Instrument Runtime Metrics | Generates metric counters, histograms, and gauges using Prometheus.Net, with auto-tagging by tenant and agent. |
| Inject Structured Logging Configuration | Adds Serilog, ILogger, or OTEL-compatible log providers, enriched with standard fields like traceId, executionId, and redaction markers. |
| Emit execution-metadata.json | Produces structured files per skill execution that summarize duration, scope, output, and trace links for later indexing. |
| Add Health Check Probes | Registers AddHealthChecks() endpoints (/healthz, /readyz, /livez) with module-specific checks (e.g., DB, messaging, actors). |
| Validate Observability Readiness | Ensures that each generated artifact conforms to platform observability standards: spans are present, logs are structured, PII is masked. |
| Apply Policy and Redaction Rules | Automatically detects PII leaks or unstructured log output and replaces them with redacted values or throws validation errors. |
| Configure Metrics Exporters | Sets up endpoints (/metrics) and formats compatible with Prometheus scraping, including per-tenant scoping. |
| Emit ObservabilityReady Event | Triggers downstream flows (e.g., DevOps, Studio) once observability compliance is verified. |
| Support Studio Dashboards and Timelines | Supplies the telemetry (traceId, metricType, agentId) required to visualize spans, logs, and pipeline health views. |

πŸš€ Example Responsibilities in Action

```text
[βœ“] BookAppointmentHandler.cs wrapped with OpenTelemetry span
[βœ“] execution-metadata.json generated with traceId and duration
[βœ“] /metrics endpoint registered for Prometheus
[βœ“] Logging uses Serilog with structured output and redaction
[βœ“] HealthCheck endpoint registered with 2 custom probes
[βœ“] ObservabilityReady event emitted
```

βœ… Enables platform to track, debug, visualize, and audit all agent-generated components.
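The first checklist item, wrapping a handler in an OpenTelemetry span, can be sketched as follows. This is a minimal illustration only: the handler shape, tag names, and command type are assumptions, not the factory's actual injection template.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical command type, shown only to make the sketch compile.
public record BookAppointmentCommand;

public class BookAppointmentHandler
{
    // One ActivitySource per service; its name must match the
    // .AddSource(...) registration in the OTEL tracing bootstrap.
    private static readonly ActivitySource Source = new("BookingService");

    public async Task HandleAsync(BookAppointmentCommand command)
    {
        // StartActivity returns null when no listener is attached,
        // so the instrumentation is a no-op without an OTEL pipeline.
        using var activity = Source.StartActivity("BookAppointment");
        activity?.SetTag("moduleId", "booking-service");
        activity?.SetTag("agentId", "observability-engineer");

        // ... original handler logic, unchanged ...
        await Task.CompletedTask;
    }
}
```

The `using var` pattern ensures the span is stopped and exported even if the handler throws.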


In short: The Observability Engineer Agent transforms raw code into runtime-aware systems by embedding telemetry, logs, and policy validation as default behaviors.


πŸ“₯ Inputs

The Observability Engineer Agent consumes blueprints, trace context, and module scaffolds to understand what kind of system it needs to instrument and which observability mechanisms to apply.

These inputs determine:

  • What kind of telemetry to inject (e.g., REST vs Function vs Actor)
  • Which context metadata to include (e.g., tenantId, traceId)
  • Which policy constraints to enforce (e.g., PII redaction, metrics exposure)
  • Where in the codebase to apply changes (e.g., startup files, controllers, handlers)

🧩 Required Inputs

| Input Type | Description |
| --- | --- |
| πŸ“„ Service Blueprint | Defines module type (e.g., REST, Actor, Scheduler), architecture layers, and features to be instrumented. |
| 🧠 Agent Execution Context | Includes traceId, agentId, skillId, moduleId, and tenantId for trace tagging and output labeling. |
| πŸ“¦ Generated Codebase | The scaffolded microservice codebase (with handlers, controllers, DI setup) that will receive instrumentation. |
| πŸ“œ Observability Configuration Profile | Optional overrides for which exporters to use (e.g., OTEL + Prometheus), redaction settings, and health check targets. |
| πŸ“‚ Execution Metadata Inputs | Previous traces and metadata from earlier agents that define known contracts, expected log points, or injected code locations. |
| πŸ” Policy Contract (optional) | A JSON or YAML config defining organizational constraints (e.g., "log nothing from PII fields", "always expose /healthz"). |

πŸ“˜ Example Input Values

```yaml
moduleId: booking-service
traceId: trace-2025-0519-xyz
agentId: observability-engineer
serviceType: RestApi
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
metrics:
  enabled: true
  exporters:
    - Prometheus
redaction:
  sensitivityTags:
    - pii
    - secret
healthChecks:
  endpoints:
    - /healthz
    - /readyz
```

πŸ” Input Resolution Workflow

  1. 🧠 Load trace context from orchestrator
  2. πŸ“„ Parse service blueprint YAML
  3. πŸ“ Scan generated file tree (e.g., Startup.cs, Controllers, Handlers)
  4. πŸ“œ Load observability config (default or project-specific)
  5. πŸ” Detect injectable points (e.g., AddOpenTelemetry, UseMetrics)
  6. πŸ§ͺ Plan injection, emit preview, validate readiness
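Step 5 above (detecting injectable points) can be sketched with a simple text scan over the generated file tree. The marker list and scanner shape are illustrative assumptions; the real agent presumably uses richer code analysis than string matching.

```csharp
using System.Collections.Generic;
using System.IO;

public static class InjectionPointScanner
{
    // Bootstrap calls whose presence means the hook already exists,
    // letting the agent plan idempotent injection (skip what is there).
    private static readonly string[] Markers =
        { "AddOpenTelemetry", "UseMetrics", "AddHealthChecks" };

    public static IEnumerable<(string File, string Marker)> Scan(string root)
    {
        foreach (var file in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            var source = File.ReadAllText(file);
            foreach (var marker in Markers)
            {
                if (source.Contains(marker))
                    yield return (file, marker);
            }
        }
    }
}
```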

βœ… Input Design Principles

  • Inputs are idempotent and composable
  • Blueprint-driven: all behaviors align with ServiceBlueprint.yaml
  • Multi-tenant scoped via tenantId
  • Compatible with agent orchestration, memory system, and Studio dashboards

In short: The Observability Engineer Agent relies on structured blueprints, execution metadata, and config overlays to know where, what, and how to inject telemetry safely and correctly.


πŸ“€ Outputs

The Observability Engineer Agent produces a set of design-time telemetry artifacts, code augmentations, and metadata files that guarantee the generated module will be fully observable at runtime.

These outputs are injected directly into the service folder structure or saved as metadata for DevOps pipelines, Studio dashboards, and compliance validators.


🧩 Output Types

| Output | Description |
| --- | --- |
| execution-metadata.json | Structured record of the trace session, including traceId, agentId, skillId, moduleId, duration, and output hash. |
| Telemetry-Injected Code Files | Modified or generated files such as Startup.cs, Program.cs, controller classes, and service handlers augmented with logging, spans, and metrics decorators. |
| Logging Configuration Files | Files such as logger.json, serilog.json, or appsettings.logging.json with Serilog or OTEL-compatible log enrichers and sinks. |
| Tracing Configuration | OTEL bootstrap entries in Program.cs or DI registrations (AddOpenTelemetry, exporters, resource attributes). |
| Metrics Endpoint Setup | .cs files that expose /metrics, counters, histograms, and gauge exporters for Prometheus. |
| Health Check Configuration | Code in Startup.cs and probes such as /healthz, /readyz, and /livez, with status response contracts. |
| Validation Reports (optional) | Internal .json or .md files that document which observability checks passed or failed during injection. |
| Telemetry Readme Section | Content appended to README.md describing exposed observability endpoints and metrics for human reference. |
| Event Emission: ObservabilityReady | Trigger event indicating successful observability injection, which activates DevOps, QA, or Studio flows. |

πŸ“‚ Output File Examples

πŸ” execution-metadata.json

```json
{
  "traceId": "trace-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "booking-service",
  "durationMs": 1457,
  "status": "Success",
  "outputFiles": [
    "Startup.cs",
    "execution-metadata.json",
    "appsettings.logging.json"
  ],
  "exportedMetrics": ["http_requests_total", "agent_execution_duration_seconds"]
}
```

🧬 Injected Code Snippet (Startup.cs)

```csharp
services.AddOpenTelemetry()
    .WithTracing(builder => builder
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("BookingService")
        .SetResourceBuilder(ResourceBuilder.CreateDefault()
            .AddService("booking-service")
            .AddAttributes(new[]
            {
                new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
                new KeyValuePair<string, object>("traceId", traceId)
            }))
        .AddOtlpExporter());

services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("Database")
    .AddCheck<MessagingHealthCheck>("MassTransitBus");
```
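The `DatabaseHealthCheck` registered above is not defined in this spec. A minimal sketch, assuming the standard `IHealthCheck` interface from `Microsoft.Extensions.Diagnostics.HealthChecks`, might look like this; the probe body is a placeholder:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class DatabaseHealthCheck : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Placeholder: replace with a real connectivity probe
            // (e.g., open a connection and run SELECT 1).
            await Task.Delay(1, cancellationToken);
            return HealthCheckResult.Healthy("Database reachable");
        }
        catch (Exception ex)
        {
            // Surfaced as "Unhealthy" in the /healthz response payload.
            return HealthCheckResult.Unhealthy("Database unreachable", ex);
        }
    }
}
```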

πŸ“ˆ Injected Metrics Endpoint

```csharp
endpoints.MapMetrics(); // exposes /metrics with Prometheus exporters
```
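Alongside the endpoint, the agent registers the instruments themselves. A hedged sketch using the prometheus-net API is below; the metric names follow the conventions listed in this spec, while the label set and static holder class are illustrative assumptions:

```csharp
using Prometheus;

public static class BookingMetrics
{
    // Counter with per-tenant/per-module labels for scoped scraping.
    public static readonly Counter Requests = Metrics.CreateCounter(
        "http_requests_total", "Total HTTP requests handled.",
        new CounterConfiguration { LabelNames = new[] { "tenantId", "moduleId" } });

    // Histogram for skill/handler execution latency.
    public static readonly Histogram ExecutionDuration = Metrics.CreateHistogram(
        "agent_execution_duration_seconds", "Execution duration in seconds.");
}

// Usage inside a handler (illustrative):
// BookingMetrics.Requests.WithLabels("vetclinic-001", "booking-service").Inc();
// using (BookingMetrics.ExecutionDuration.NewTimer()) { /* handler work */ }
```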

πŸ“„ README (Telemetry Section)

### πŸ” Observability

This service exposes:
- `/metrics` for Prometheus scraping
- `/healthz` and `/readyz` for health status
- OTEL spans tagged with `traceId`, `agentId`, and `moduleId`

All logs follow ConnectSoft structured logging standards.

βœ… Output Quality Guarantees

Each output:

  • Includes required trace and module metadata
  • Passes observability validation checks
  • Is compatible with OpenTelemetry pipelines
  • Is machine-readable and Studio-ingestible
  • Enables runtime traceability and diagnostics by default

In short: The Observability Engineer Agent produces the telemetry foundation that makes all modules visible, diagnosable, and trusted β€” from Studio dashboards to production monitoring.


πŸ“š Knowledge Base

The Observability Engineer Agent operates using a deep, structured knowledge base of:

  • Observability standards (OpenTelemetry, Prometheus, Serilog)
  • ConnectSoft trace metadata schema
  • Telemetry placement rules for Clean Architecture services
  • Security and redaction policies
  • Best practices for health checks and structured logs
  • Code injection templates for each supported runtime (REST API, gRPC, Function, Actor)

This knowledge is used to determine where and how to inject observability instrumentation that will be:

  • Valid
  • Secure
  • Compatible with CI/CD
  • Usable by Studio dashboards and feedback loops

🧠 Built-In Concepts and Standards

| Knowledge Area | Contents |
| --- | --- |
| 🧬 Trace Metadata Schema | Required fields: traceId, agentId, skillId, moduleId, tenantId, executionId. Used across logs, spans, and metrics. |
| πŸ“¦ File Injection Rules | Startup bootstrap points (Program.cs, Startup.cs, DI layers), controller decorators, handler wrappers. |
| πŸ“Š Metrics Templates | Metric name conventions (agent_execution_duration_seconds, http_requests_total, etc.), label tagging, histogram/gauge setup. |
| πŸ“œ Serilog/OTEL Conventions | MinimumLevel, Enrich.FromLogContext(), structured format templates, sink routing, and enrichment strategies. |
| πŸ“ˆ Health Check Coverage | Which services require health probes (DB, bus, cache), endpoint naming (/healthz, /readyz), and standard status response formats. |
| πŸ” Redaction Policies | Regex patterns for secrets and PII fields; sensitivity: pii blueprint tags are redacted at log emit time. |
| πŸ” Agent Coordination Hooks | When and how to emit the ObservabilityReady event, and what it must include (traceId, summary, observabilityLevel). |

🧩 Blueprint-to-Telemetry Mapping

The agent understands how to map service structure into observability hooks:

| Service Type | Observability Knowledge Applied |
| --- | --- |
| REST API | Inject spans in controllers, logs in middleware, metrics in UseEndpoints() |
| Actor Service | OTEL instrumentation via activity sources and actor lifecycle spans |
| Azure Function | Use ILogger for structured logs, export OTEL spans via decorators |
| MassTransit | Instrument consumers, message pumps, retry handlers |

πŸ“˜ Sample Prompt Memory Entries

```json
{
  "traceId": "trace-2025-05-19-xyz",
  "skillId": "InjectTelemetry",
  "agentId": "observability-engineer",
  "injectedMetric": "http_requests_total",
  "patternUsed": "AspNetCore.RequestPipeline",
  "blueprint": {
    "serviceType": "RestApi",
    "features": ["UseMassTransit", "UseNHibernate"]
  }
}
```

β†’ Enables reuse of the same instrumentation logic across future flows with similar structure.


🧠 Memory System Integration

The agent queries:

  • πŸ“₯ Past execution-metadata.json files for prior trace context
  • πŸ“š Stored logging and OTEL config examples from prior successful services
  • 🧠 Semantic memory (via vector DB) to reuse optimal injection strategies for similar modules
  • πŸ” Redaction pattern corpus from the Security Engineer Agent

All knowledge is versioned, tagged by skillId, and retrievable by other agents.


In short: The Observability Engineer Agent uses a codified, reusable observability knowledge base to make every service diagnosable, measurable, and safe-by-default β€” across domains, tenants, and agents.


πŸ” Process Flow

The Observability Engineer Agent follows a deterministic and modular design-time flow that enables it to inspect a scaffolded module, inject observability capabilities, validate conformance, and emit traceable outputs.

This flow is both repeatable and adaptable based on the service type (e.g., REST API, Azure Function, Actor Host) and blueprint metadata.


🧭 Standard Agent Flow

```mermaid
flowchart TD
    Start([Start Agent Execution])
    LoadBlueprint[Load ServiceBlueprint.yaml]
    LoadContext[Load traceId, agentId, tenantId, moduleId]
    DetectType[Determine service type & runtime model]
    ScanFiles[Scan generated service codebase]
    PlanInjection["Plan observability hooks (logs, spans, metrics)"]
    InjectTelemetry[Inject logging + tracing + metrics + health checks]
    Validate[Validate observability compliance]
    EmitMetadata[Generate execution-metadata.json]
    EmitEvent[Emit ObservabilityReady event]
    End([Finish & pass control to DevOps/Studio])

    Start --> LoadBlueprint
    LoadBlueprint --> LoadContext
    LoadContext --> DetectType
    DetectType --> ScanFiles
    ScanFiles --> PlanInjection
    PlanInjection --> InjectTelemetry
    InjectTelemetry --> Validate
    Validate --> EmitMetadata
    EmitMetadata --> EmitEvent
    EmitEvent --> End
```

πŸ” Detailed Steps

| Step | Description |
| --- | --- |
| 1. Load Blueprint | Parses the service definition to understand service type, enabled features (e.g., MassTransit, NHibernate), and observability expectations. |
| 2. Load Execution Context | Retrieves traceId, agentId, skillId, tenantId, and moduleId for trace enrichment and Studio integration. |
| 3. Detect Runtime Type | Determines what kind of system is being instrumented (REST, Actor, Function, etc.) to select the proper injection strategy. |
| 4. Scan Files | Walks the scaffolded project files (e.g., Startup.cs, Handlers, Controllers) and identifies injection points. |
| 5. Plan Hook Injection | Maps observability hooks to code locations using predefined templates and previously successful patterns from memory. |
| 6. Inject Telemetry | Adds ILogger usage and config, OTEL spans via decorators or middleware, Prometheus metrics with auto-labeling, and health checks and endpoints. |
| 7. Validate Compliance | Verifies that all required identifiers (traceId, etc.) are injected, metrics endpoints are exposed, and sensitive fields are redacted. |
| 8. Emit Metadata | Writes execution-metadata.json to persist agent run context and injection results for traceability. |
| 9. Emit ObservabilityReady | Publishes an event signaling successful instrumentation so DevOps, QA, or Studio workflows can continue. |
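Step 8 (Emit Metadata) can be sketched as a plain record serialized with System.Text.Json. The field names mirror the execution-metadata.json examples in this spec; the record shape and emitter class are otherwise assumptions:

```csharp
using System.IO;
using System.Text.Json;

// Field names mirror the spec's execution-metadata.json examples.
public record ExecutionMetadata(
    string TraceId, string AgentId, string SkillId, string ModuleId,
    long DurationMs, string Status, string[] OutputFiles);

public static class MetadataEmitter
{
    public static string Emit(ExecutionMetadata metadata, string outputDir)
    {
        var json = JsonSerializer.Serialize(metadata, new JsonSerializerOptions
        {
            WriteIndented = true,
            // camelCase so properties serialize as traceId, agentId, ...
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });
        File.WriteAllText(Path.Combine(outputDir, "execution-metadata.json"), json);
        return json;
    }
}
```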

🧠 Context Awareness Throughout

  • Decisions vary based on:
    • Runtime type (API vs Actor vs Function)
    • Tenant-specific observability policies
    • PII sensitivity in blueprint inputs
    • Memory reuse from prior services

πŸ“˜ Example Process Snapshot

```text
[βœ“] Service type: REST API
[βœ“] Traced methods injected: 5
[βœ“] Metrics endpoint added: /metrics
[βœ“] Health check: /healthz + 2 probes
[βœ“] Logger enriched with traceId, agentId, moduleId
[βœ“] Redaction policy: pii masking enabled
[βœ“] execution-metadata.json generated
[βœ“] Event emitted: ObservabilityReady
```

In summary: The Observability Engineer Agent follows a structured, adaptive flow that transforms scaffolded code into a runtime-observable, traceable, and policy-compliant system module β€” ready for deployment and Studio introspection.


🧠 Kernel Skills

The Observability Engineer Agent relies on a focused set of Semantic Kernel skills to carry out its responsibilities. These skills represent atomic, reusable, and orchestratable capabilities that the agent invokes during service instrumentation.

Each skill maps to a well-defined behavior such as injecting spans, emitting structured logs, validating telemetry, or generating metadata. Together, they form the operational vocabulary of the agent.


🧩 Core Skills

| Skill Name | Purpose |
| --- | --- |
| InjectTraceDecorators | Adds OpenTelemetry span instrumentation to controllers, handlers, consumers, and background workers. |
| GenerateExecutionMetadata | Produces a complete execution-metadata.json file with traceId, agentId, moduleId, durationMs, and injected components. |
| EmitLogConfiguration | Generates or modifies logging setup (e.g., Serilog, ILogger) to include trace-enriched and structured logs. |
| InjectMetricCounters | Adds Prometheus-compatible counters, histograms, or gauges to services and injects /metrics endpoints. |
| AddHealthCheckProbes | Registers and configures /healthz, /readyz, and any blueprint-defined health endpoints. |
| ValidateObservabilityReady | Verifies that spans, logs, metrics, and trace context are present and conform to ConnectSoft standards. |
| EmitObservabilityEvent | Emits the ObservabilityReady event with a summary payload for downstream workflows. |
| ApplyRedactionPolicies | Applies masking or removal logic to PII-tagged fields during structured log emission setup. |
| ScanTelemetryViolations | Detects missing traceId values, unstructured log patterns, and improperly scoped spans. |
| ReuseTelemetryTemplate | Retrieves and applies telemetry injection patterns from previously successful modules via semantic memory. |

πŸ§ͺ Sample Skill Invocation Chain

🧠 Agent Execution Trace:

```text
β†’ InjectTraceDecorators
β†’ InjectMetricCounters
β†’ EmitLogConfiguration
β†’ AddHealthCheckProbes
β†’ ApplyRedactionPolicies
β†’ GenerateExecutionMetadata
β†’ ValidateObservabilityReady
β†’ EmitObservabilityEvent
```

Each skill is traceable via skillId and scoped by moduleId, traceId, and agentId, enabling precise execution replay and telemetry correlation.


πŸ” Skill Composition and Orchestration

  • Skills are:
    • Composable β†’ used individually or bundled into orchestration plans.
    • Configurable β†’ adapt based on blueprint and policy profile.
    • Idempotent β†’ safe to re-run without duplicating effects.
    • Traceable β†’ each invocation emits telemetry of its own (the agent consumes the same observability it injects).

🧠 Reuse via Skill Memory

  • The agent stores:
    • Prior successful metrics injection strategies
    • Effective health check setups for similar service topologies
    • Optimized log configuration samples from sibling modules
  • Skills like ReuseTelemetryTemplate fetch and reuse this knowledge from vector DB + metadata index.

In summary: The Observability Engineer Agent is powered by a precision toolbelt of telemetry-focused kernel skills, each enabling it to transform uninstrumented modules into observable, validated, and Studio-integrated assets β€” autonomously.


βš™οΈ Technology Stack

The Observability Engineer Agent uses a modern, telemetry-driven, .NET-compatible technology stack, tightly integrated into the ConnectSoft AI Software Factory. The stack is selected for its:

  • Compatibility with generated modules (REST, gRPC, Actor, Function)
  • Support for trace enrichment, structured logs, and metrics
  • Alignment with cloud-native, multi-tenant, and OpenTelemetry-first practices
  • Extensibility for agent-based injection, testing, and validation

🧩 Runtime Technologies Targeted

| Area | Technology |
| --- | --- |
| Application Framework | ASP.NET Core (.NET 8) |
| Tracing and Instrumentation | OpenTelemetry SDK (AddOpenTelemetry, ActivitySource) |
| Logging | Serilog (ILogger, structured sinks, enricher support) |
| Metrics | Prometheus.Net, Meter, Histogram, Counter, Gauge |
| Health Checks | Microsoft.Extensions.Diagnostics.HealthChecks, custom probes |
| Containerization | Docker, with /metrics and /healthz exposed as K8s probes |
| Observability Output | execution-metadata.json, logs, OTLP spans, metrics endpoint |
| Studio Integration | Trace Explorer, Metric Dashboards, Policy Violation Viewer |

🧠 Agent Infrastructure Stack

| Component | Stack Element |
| --- | --- |
| Execution Environment | Semantic Kernel + C# planner bindings |
| Memory Layer | Azure AI Search / Qdrant (semantic memory), Blob Storage (execution metadata) |
| Validation Runtime | In-process .NET analyzers and telemetry pattern matchers |
| CI/CD Integration | YAML pipelines emit metrics and validation results, integrated with Azure DevOps and Studio |

πŸ“˜ Example Libraries and Tools Used

- Microsoft.Extensions.Logging
- OpenTelemetry.Instrumentation.AspNetCore
- OpenTelemetry.Exporter.Console / OTLP
- Serilog.Sinks.Console / Serilog.Sinks.ApplicationInsights
- Prometheus.AspNetCore
- App.Metrics (fallback metrics support)
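Using the libraries above, a minimal sketch of the Serilog bootstrap the agent might emit is shown below, assuming the console sink plus context enrichment; the actual sinks and template vary per blueprint and tenant policy:

```csharp
using Serilog;

// Minimal structured-logging bootstrap (illustrative, console sink only).
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()                        // picks up traceId etc. pushed via LogContext
    .Enrich.WithProperty("moduleId", "booking-service")
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}")
    .CreateLogger();

Log.Information("Telemetry bootstrap complete");
Log.CloseAndFlush();
```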

🧱 Injection Compatibility Matrix

| Module Type | Supported Tooling |
| --- | --- |
| REST APIs | OTEL spans via middleware + Serilog |
| gRPC Services | ActivitySource + method instrumentation |
| Azure Functions | Manual ActivitySource, structured ILogger |
| Actor Services | Trace injection via message handlers |
| Background Jobs | Task instrumentation, retry metrics |

πŸ“Š Export Targets

| Signal Type | Exported To |
| --- | --- |
| Logs | Application Insights / OTEL log processor |
| Traces | OTLP collector, Studio Trace Viewer |
| Metrics | Prometheus endpoint (/metrics), Grafana |
| Events | ConnectSoft Event Bus β†’ ObservabilityReady, PolicyViolation |

In summary: The Observability Engineer Agent is built for deep integration into modern .NET telemetry, using OpenTelemetry, Serilog, and Prometheus to enforce traceability across all generated modules β€” ready for cloud-native and agent-aware deployments.


πŸ’¬ System Prompt

The system prompt is the foundational instruction injected into the Observability Engineer Agent when it is initialized within a ConnectSoft orchestration flow. It sets the persona, responsibility scope, and expected behaviors of the agent across all invocations.

This prompt ensures the agent always acts as a design-time observability enforcer β€” not a runtime participant β€” and aligns every action with ConnectSoft’s observability-first, multi-tenant, and traceable-by-default principles.


🧠 Default System Prompt (English – Markdown format)

```markdown
# 🎯 Role: Observability Engineer Agent

You are the Observability Engineer Agent in the ConnectSoft AI Software Factory.

Your job is to ensure that every generated module β€” including REST APIs, gRPC services, Azure Functions, actors, orchestrators, and background workers β€” is **fully observable at runtime**.

You operate at **design time**, analyzing generated code and configuration to inject all required telemetry, including:

- OpenTelemetry tracing spans with `traceId`, `agentId`, and `moduleId`
- Structured logging using Serilog or ILogger with enrichment and redaction
- Prometheus-compatible metrics (`http_requests_total`, `agent_execution_duration_seconds`, etc.)
- Health check endpoints (`/healthz`, `/readyz`, `/livez`) with status validation
- An `execution-metadata.json` file describing the trace context and injection results

You must verify that the generated outputs:

- Are compliant with ConnectSoft observability policy
- Do not leak secrets or PII in logs
- Include all required metadata for Studio trace and dashboard views
- Expose the correct endpoints and export formats

If observability violations are found, document them clearly in metadata and trigger failure or warning responses. If injection is successful, emit an `ObservabilityReady` event with a summary of what was instrumented.

You are responsible for making the software **measurable, debuggable, and trusted** β€” before it is released.
```

πŸ“Ž Prompt Metadata

| Key | Value |
| --- | --- |
| agentId | observability-engineer |
| roleType | design-time instrumentation |
| category | QA, DevOps-Ready, Traceability |
| activatesOnEvent | ServiceScaffolded, AgentOutputReady |
| emitsEvent | ObservabilityReady, ObservabilityPolicyViolated |

🧠 Why the Prompt Matters

  • Clarifies that the agent does not participate at runtime
  • Prevents accidental regeneration of runtime behavior (e.g., health check consumers)
  • Ensures consistent behavior across modules, domains, and teams
  • Establishes traceability patterns expected by downstream tools like Studio, DevOps, and Security Orchestrator

In short: The system prompt defines the Observability Engineer Agent’s identity, mission, and operating discipline β€” ensuring all modules it touches are monitorable, diagnosable, and platform-compliant by design.


πŸ’¬ Input Prompt Template

The input prompt template is the dynamic, structured instruction sent to the agent during execution. It integrates contextual blueprint data, execution metadata, and platform-specific configuration to guide the agent's behavior during a specific observability injection task.

This template is completed by the orchestrator or coordinator agent, combining:

  • The service type and its runtime characteristics
  • The agent’s trace and skill identifiers
  • Optional observability policy overrides
  • Feature toggles and tenant-specific settings

πŸ“‘ Template Format (Markdown + YAML Hybrid)

````markdown
# πŸ›°οΈ Observability Injection Task

You are the Observability Engineer Agent.

## 🧩 Target Module
- Module ID: `{{moduleId}}`
- Tenant ID: `{{tenantId}}`
- Trace ID: `{{traceId}}`
- Agent ID: `observability-engineer`
- Skill ID: `InjectTelemetry`
- Runtime Type: `{{serviceType}}`

## πŸ“¦ Blueprint Features
```yaml
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
serviceType: {{serviceType}}
```

## βš™οΈ Observability Requirements
```yaml
metrics:
  enabled: true
  exporters:
    - Prometheus
tracing:
  enabled: true
  exporter: OTLP
logging:
  structured: true
  redactSensitive: true
  piiTags:
    - "sensitivity: pii"
    - "secret"
healthChecks:
  endpoints:
    - /healthz
    - /readyz
    - /livez
```

## βœ… Expected Outcomes

1. Add OTEL spans to handlers, endpoints, and consumers.
2. Generate a `/metrics` endpoint exposing Prometheus counters.
3. Emit structured logs with enrichment and redaction enabled.
4. Register all requested health check endpoints.
5. Produce `execution-metadata.json` summarizing injection results.
6. Emit the `ObservabilityReady` event with outcome metadata.

Please inject all telemetry as per ConnectSoft observability standards and log any policy violations found.
````

πŸ§ͺ Example Populated Prompt

```markdown
Module ID: invoice-service
Tenant ID: petsure-001
Trace ID: trace-2025-05-19-invoice123
Service Type: RestApi
Blueprint Features: UseMassTransit, UseNHibernate

Expect: OTEL tracing, Prometheus metrics, /metrics + /healthz endpoints, Serilog config with redaction
```

🧠 Why the Input Prompt Template Matters

  • Enables context-specific injection plans (e.g., add health checks only if required)
  • Ensures observability remains configurable, declarative, and predictable
  • Supports tenant-aware policy variations and compliance rules
  • Powers Studio dashboards with correct module tagging and health visibility

In short: The input prompt template is the instructional payload that lets the Observability Engineer Agent act precisely, securely, and traceably β€” tailored to each generated module and tenant.


πŸ“€ Output Expectations

This section defines what the Observability Engineer Agent is expected to produce at the end of its execution. Outputs must be machine-parseable, CI/CD-consumable, and Studio-integrated β€” with strict tagging and compliance to observability-first standards.

All outputs contribute directly to runtime visibility, security, and traceability for agent-generated SaaS modules.


βœ… Output Deliverables

| Output Artifact | Description |
| --- | --- |
| 🧬 execution-metadata.json | JSON file with trace metadata, skillId, duration, injected elements, and status. Used by DevOps, Studio, and audit tooling. |
| 🧠 OTEL Tracing Injection | Modified .cs files with ActivitySource.StartActivity(...) spans around request handlers, consumers, and workflows. |
| πŸ“Š Prometheus Metrics Endpoint | Code added to expose /metrics with per-request counters, latency histograms, and labels (traceId, tenantId, etc.). |
| πŸ“œ Structured Logging Configuration | Serilog or ILogger settings injected into Startup.cs or appsettings.logging.json, with enrichers (traceId, moduleId, agentId) and redaction support. |
| 🩺 Health Check Endpoints | Configuration and controller endpoints for /healthz, /readyz, and /livez, mapped to injected dependency probes. |
| πŸ§ͺ Observability Validation Report (optional) | Diagnostic output listing any failed injections or redaction gaps (e.g., unstructured Console.WriteLine, missing traceId). |
| πŸ“© Event: ObservabilityReady | Event published to the coordination layer once injection is complete and validated. Triggers DevOps, QA, or Studio actions. |
| πŸ“„ README.md Observability Section | Appends or generates a section describing observability behavior (e.g., exposed endpoints, span behavior, metrics). |

🧾 Example execution-metadata.json

```json
{
  "traceId": "trace-2025-05-19-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "invoice-service",
  "status": "Success",
  "durationMs": 1875,
  "exportedMetrics": [
    "http_requests_total",
    "agent_execution_duration_seconds"
  ],
  "injected": {
    "spans": 4,
    "metrics": 3,
    "logEnrichers": 5,
    "healthEndpoints": 2
  },
  "outputFiles": [
    "Startup.cs",
    "execution-metadata.json",
    "MetricsExporter.cs"
  ]
}
```

πŸ” Naming and File Placement Rules

| File Type | Placement Path |
|---|---|
| `execution-metadata.json` | `/modules/{moduleId}/metadata/` |
| Code files (`*.cs`) | `/Application/`, `/Infrastructure/`, or `/API/` layer |
| `logger.json` / `appsettings.*.json` | `/Configuration/Logging/` or root |
| README.md additions | Appended under the "Observability" section |

🧠 Behavior Contracts

All outputs must:

  • Be deterministic (replayable)
  • Be scoped by tenantId, traceId, and moduleId
  • Comply with platform observability rules (e.g., PII redaction, traceId required)
  • Be linkable to agent execution in Studio (dashboards, trace viewer, metadata inspector)

In short: The agent must output everything needed for the service to be debuggable, monitorable, auditable, and compliant β€” starting with instrumentation and ending with trace-ready metadata.


🧠 Memory Model

The Observability Engineer Agent uses a hybrid memory architecture to support:

  • πŸ“₯ Retrieval of prior telemetry strategies
  • πŸ” Reuse of successful injection patterns
  • πŸ“Š Storage of execution outputs for traceability
  • 🧠 Semantic similarity for selecting observability templates

Memory allows the agent to evolve intelligently and inject proven, context-aware telemetry logic into new modules, based on past executions, blueprint patterns, and skill performance.


🧩 Types of Memory Used

| Memory Type | Purpose |
|---|---|
| πŸ“‚ Execution Metadata Store | Stores `execution-metadata.json` for each agent run, including injected spans, metrics, duration, and trace identifiers. |
| πŸ“š Structured Metadata Index | Tracks telemetry components, enrichment rules, redaction violations, and injection outcomes. Indexed by agentId, moduleId, skillId, etc. |
| 🧠 Semantic Memory (Vector DB) | Retrieves similar past modules and injection plans using blueprint embeddings (text-embedding-ada-002, etc.). |
| 🧾 Log Configuration Corpus | Stores reusable logging strategies, Serilog profiles, structured enrichers, and redaction DSLs. |
| πŸ“Š Metrics Dictionary | Canonical list of standard metrics (e.g., `agent_execution_duration_seconds`) and their past usage patterns. |
| πŸ” Telemetry Pattern Library | Templates for injecting OTEL spans, metrics, and health checks, with versioned examples across runtimes. |

πŸ“˜ Example Semantic Memory Chunk

```json
{
  "traceId": "trace-2025-05-15-xyz789",
  "moduleId": "booking-service",
  "agentId": "observability-engineer",
  "skillId": "InjectMetricCounters",
  "text": "Exposed /metrics with http_requests_total, tenant-scoped labels",
  "embedding": [0.13, 0.74, -0.21, ...],
  "tags": ["metrics", "Prometheus", "traceable", "REST"],
  "status": "Success"
}
```
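The retrieval step over this semantic memory can be sketched as a cosine-similarity lookup over the stored embeddings. This is an illustrative stand-in only: the in-memory `chunks` list and the `most_similar` helper are assumptions, not the factory's actual vector DB client.

```python
import math

# Illustrative sketch: pick the most similar past injection plan from
# semantic memory by cosine similarity over blueprint embeddings.
# A plain list stands in for the vector DB; chunk shapes follow the
# example memory record above (truncated to 3 dimensions).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(query_embedding, chunks):
    # Highest cosine score wins; ties resolve to the first chunk.
    return max(chunks, key=lambda c: cosine(query_embedding, c["embedding"]))

chunks = [
    {"moduleId": "booking-service", "embedding": [0.13, 0.74, -0.21]},
    {"moduleId": "billing-service", "embedding": [-0.52, 0.10, 0.88]},
]
print(most_similar([0.10, 0.70, -0.20], chunks)["moduleId"])  # booking-service
```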

πŸ—‚οΈ Memory Scoping Strategy

Memory is scoped per:

  • traceId, projectId, sprintId
  • agentId, skillId, moduleId
  • tenantId, environment, serviceType
  • outputType (metrics, spans, logs, probes)
  • status (success, failure, warning)

This enables precision filtering, similarity search, and trace-based graph linking.


πŸ“Ž Agent Reads from Memory To:

| Use Case | Memory Used |
|---|---|
| Reuse successful injection code | Semantic memory from past modules with similar structure |
| Avoid repeat redaction issues | Redaction validation history (tagged `sensitivity: pii`) |
| Track telemetry conformance | Linter outcomes and prior observability violations |
| Generate baseline metrics | Metrics pattern dictionary (usage frequency + exporters) |

πŸ“€ Agent Writes to Memory:

  • execution-metadata.json with full run summary
  • log-config.json, metrics.json, otel-span-plan.json (optional)
  • ObservabilityReady event with memory pointer
  • Traces tagged in Studio with agentId: observability-engineer

πŸ” Memory Behavior Characteristics

| Property | Value |
|---|---|
| Versioned | βœ… Yes (each telemetry plan and injection is tracked by version + timestamp) |
| Replayable | βœ… Execution history can be replayed to re-inject identical instrumentation |
| Composable | βœ… Multiple memory entries may be merged to generate a new injection strategy |
| Auditable | βœ… All memory updates are logged and linked to traceId and executionId |

In short: The Observability Engineer Agent is memory-empowered β€” capable of reasoning from historical telemetry, reusing known-good observability layouts, and ensuring continuous improvement across all injected services.


βœ… Validation Mechanisms

Validation is a critical phase in the Observability Engineer Agent’s lifecycle. After telemetry components are injected, the agent performs a design-time verification pass to ensure:

  • All observability elements are present and correctly scoped
  • No violations of ConnectSoft traceability or redaction policy exist
  • The generated outputs are production-safe, Studio-visible, and traceable
  • Standards for OpenTelemetry, Serilog, metrics, and health endpoints are respected

πŸ” What Is Validated?

| Component | Validation Criteria |
|---|---|
| Tracing (OTEL) | traceId, agentId, and moduleId must be included in all spans. `ActivitySource` must be correctly configured. |
| Logging | Logs must be structured (JSON or key-value), enriched with context, and not emit `Console.WriteLine`. |
| Metrics (Prometheus) | `/metrics` must be exposed, with standard counters and tenant/module labels present. |
| Health Checks | Required probes (`/healthz`, `/readyz`) must exist and return 200 OK. |
| Execution Metadata | `execution-metadata.json` must include trace context, status, duration, and injected item counts. |
| PII Redaction | Any field marked `sensitivity: pii` or `secret` must not appear in plain logs. |
| Policy Compliance | Validates against the tenant- or environment-specific observability policy (e.g., redaction, logging level, required metrics). |

πŸ§ͺ Validation Workflow

```mermaid
flowchart TD
    Start[Injected Code]
    RunTests[Run Linter Checks]
    CheckSpans[Verify Spans & Trace Enrichment]
    CheckLogs[Check Logging Format & Redaction]
    CheckMetrics[Verify Prometheus Labels & Endpoint]
    CheckHealth[Validate Health Probes & Startup]
    GenerateReport[Create validation summary]
    StatusCheck{All Passed?}
    EmitSuccess[Emit ObservabilityReady]
    EmitViolation[Emit ObservabilityPolicyViolated]

    Start --> RunTests
    RunTests --> CheckSpans --> CheckLogs --> CheckMetrics --> CheckHealth --> GenerateReport --> StatusCheck
    StatusCheck -->|Yes| EmitSuccess
    StatusCheck -->|No| EmitViolation
```

πŸ“„ Example Validation Report (Structured JSON)

```json
{
  "traceId": "trace-2025-05-19-obs123",
  "agentId": "observability-engineer",
  "status": "Success",
  "validated": {
    "structuredLogs": true,
    "otelSpans": 6,
    "metricsEndpoint": "/metrics",
    "labelsPresent": ["tenantId", "moduleId"],
    "healthEndpoints": ["/healthz", "/readyz"]
  },
  "redactionCheck": {
    "sensitiveFieldLeak": false
  },
  "policyVersion": "obs-policy-v2.3"
}
```

🧠 Validation Tools and Heuristics

  • Regex scanners for raw Console.WriteLine, string.Format, and unstructured output
  • Reflection-based checks for OTEL ActivitySource usage and enrichment
  • Static analysis of DI container (AddOpenTelemetry, AddHealthChecks)
  • Redaction enforcement on blueprint-defined fields with sensitivity tags
  • Metrics validator that simulates /metrics scrape and checks required counters

❌ What Causes Validation Failure?

| Violation | Action Taken |
|---|---|
| Missing traceId in spans | Injection re-run or fail |
| Unstructured log output detected | Marked as warning or error |
| `/metrics` endpoint not found | Error β€” cannot emit ObservabilityReady |
| Sensitive field (password, email) unredacted | Critical error β€” blocks release |
| Health check not returning 200 OK | Warning; may continue in test mode |
| `execution-metadata.json` not created | Hard failure |

πŸ“£ Event Emission Based on Outcome

| Outcome | Event Emitted |
|---|---|
| All checks pass | ObservabilityReady |
| Minor warnings | ObservabilityReady + warnings |
| Hard failure | ObservabilityPolicyViolated |

In short: Validation is how the agent earns trust from the platform β€” ensuring that every observability injection is complete, compliant, and safe before it reaches runtime or CI/CD.


πŸ” Retry & Correction Flow

Even though the Observability Engineer Agent operates at design-time with deterministic inputs, errors may occur due to:

  • Incomplete or malformed code scaffolding
  • Policy mismatches or unexpected configuration states
  • Toolchain versioning issues (e.g., outdated OTEL packages)
  • Agent misalignment due to blueprint evolution or conflicting decorators

Therefore, the agent is equipped with a built-in correction strategy, allowing it to retry safely, regenerate selectively, or escalate to human review.


πŸ”§ Retry and Correction Triggers

| Condition | Triggers Retry? |
|---|---|
| `execution-metadata.json` missing | βœ… Yes (generate fallback) |
| `/metrics` endpoint not found | βœ… Yes (re-inject exporter) |
| traceId not propagated in span | βœ… Yes (auto-patch span code) |
| Unstructured logging detected | βœ… Yes (replace with structured pattern) |
| PII redaction failure | 🚨 No retry β€” escalate to policy violation |
| Health check endpoint not working | ⚠ Retry once with basic default probes |
| Logger misconfiguration (missing sink) | βœ… Re-inject log sink |
| Previous attempt exceeded durationBudget | ❌ Do not retry β€” emit failure report |
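The trigger table above can be collapsed into a small decision function. The condition keys and the two-retry budget used here are assumptions for illustration, not the agent's actual identifiers:

```python
# Illustrative sketch: map retry/correction triggers to a yes/no decision.
# Condition names are hypothetical keys standing in for the table rows.
RETRYABLE = {
    "metadataMissing",        # execution-metadata.json missing
    "metricsEndpointAbsent",  # /metrics endpoint not found
    "traceIdNotPropagated",   # span missing trace context
    "unstructuredLogging",    # Console.WriteLine detected
    "loggerSinkMissing",      # logger misconfiguration
}
NON_RETRYABLE = {"piiRedactionFailure", "durationBudgetExceeded"}

def should_retry(condition: str, attempt: int, max_retries: int = 2) -> bool:
    # Escalate immediately on policy-critical failures or exhausted budget.
    if condition in NON_RETRYABLE or attempt >= max_retries:
        return False
    return condition in RETRYABLE

print(should_retry("metricsEndpointAbsent", attempt=0))  # True
print(should_retry("piiRedactionFailure", attempt=0))    # False
```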

πŸ§ͺ Retry & Correction Flow Diagram

```mermaid
flowchart TD
    Start[Initial Injection Attempt]
    Validate[Run Observability Validators]
    Pass{Validation Success?}
    RetryConditions[Check Retryable Errors]
    Correct[Auto-Correct Observability Deficiencies]
    Retry[Re-run Injection Steps]
    EndSuccess[Emit ObservabilityReady]
    EndFail[Emit ObservabilityPolicyViolated]

    Start --> Validate --> Pass
    Pass -->|Yes| EndSuccess
    Pass -->|No| RetryConditions
    RetryConditions -->|Retryable| Correct --> Retry --> Validate
    RetryConditions -->|Not Retryable| EndFail
```

πŸ› οΈ Auto-Correction Strategies

| Issue Detected | Auto-Fix Applied |
|---|---|
| Missing traceId in spans | Add span enrichment from DI context |
| Logger lacks enrichment fields | Re-inject `Enrich.FromLogContext()` |
| `/metrics` not bound | Add `.MapMetrics()` to endpoint config |
| Missing OTEL exporter | Add default OTLP exporter with fallback port |
| Health probe handler absent | Generate `BasicHealthCheck.cs` with 200 OK stub |
| ILogger used incorrectly | Wrap in Serilog with `ForContext()` enrichment |

πŸ’Ύ Execution Metadata on Retry

Each retry is logged with a new executionId under the same traceId:

```json
{
  "traceId": "trace-2025-05-19-obs123",
  "executionId": "exec-retry-002",
  "agentId": "observability-engineer",
  "retryOf": "exec-001",
  "status": "SuccessAfterRetry",
  "issuesResolved": ["otelSpanMissing", "metricsEndpointAbsent"]
}
```

β†’ Enables Studio trace viewer and test history to highlight retried operations.


🚨 Escalation Triggers (Non-Retryable)

If the following are detected, the agent emits a violation event and halts:

  • PII not redacted due to logic gap or bypass
  • Conflicting observability configurations
  • Missing blueprint context (e.g., tenantId undefined)
  • File system permission issues preventing injection
  • Infinite retry loop detected (retry count > 2)

πŸ“£ Events Emitted

| Outcome | Event |
|---|---|
| Retry successful | ObservabilityReady with `retries: 1` |
| Retry failed but partially OK | ObservabilityReadyPartial |
| Retry failed critically | ObservabilityPolicyViolated |

In short: The Observability Engineer Agent is resilient β€” it detects failures, repairs them autonomously, and only escalates when critical policy risks are involved. It ensures no broken or untraceable service is deployed unnoticed.


🀝 Collaboration Interfaces

The Observability Engineer Agent does not work in isolation. It actively collaborates with multiple other agents and platform components, ensuring observability is:

  • Injected at the right time
  • Aligned with other concerns (e.g., DevOps, QA, security)
  • Available for Studio dashboards
  • Used as part of continuous validation and platform scoring

This cycle defines how the agent communicates, responds to events, and integrates with other personas in the ConnectSoft agentic system.


πŸ”— Direct Agent Collaborations

| Collaborating Agent | Interaction Summary |
|---|---|
| 🧱 Microservice Generator Agent | Invokes observability injection after scaffold completion. Shares module path and blueprint trace. |
| 🧠 Backend Developer Agent | The code this agent generates (e.g., handlers, controllers) is instrumented for spans and metrics by the Observability Agent. |
| πŸ§ͺ Test Generator Agent | Consumes observability signals to drive test coverage checks (e.g., "traced path exists?", "log assertions?"). |
| πŸ”§ DevOps Engineer Agent | Reads `execution-metadata.json` and observability configuration to set up monitoring, alerting, and CI pipelines. |
| πŸ” Security Engineer Agent | Defines redaction policies and PII patterns that must be validated by the observability agent. |
| πŸ“¦ Documentation Writer Agent | Appends an "Observability" section to README.md using metadata and exposed endpoints detected by this agent. |
| 🧠 Studio Agent | Consumes the ObservabilityReady event and trace metadata to populate dashboards, graphs, and execution lineage. |

πŸ“¬ Events Emitted & Consumed

| Event Name | Role |
|---|---|
| ServiceScaffolded | πŸ”„ Consumed β†’ triggers observability injection |
| AgentOutputReady | πŸ”„ Consumed β†’ instrumentation of generated source code |
| ObservabilityReady | βœ… Emitted β†’ signals instrumentation complete and verified |
| ObservabilityPolicyViolated | ❌ Emitted β†’ signals agent failed to meet required standards |
| ExecutionMetadataGenerated | πŸ“€ Emitted β†’ trace metadata with injection details |

🧠 Shared Knowledge Contracts

| Interface | Used With | Purpose |
|---|---|---|
| `execution-metadata.json` | DevOps, Studio, QA | Cross-agent trace of what was injected, validated, and exported |
| `RedactionPolicy.yaml` | Security Agent | List of sensitive fields to redact during log injection |
| `MetricRegistry.json` | DevOps + Monitoring | Metrics emitted by the agent for setup in Prometheus / Grafana |
| TraceEventGraph | Studio + QA | OTEL spans and links used for visual timelines and audit trails |

🧭 Coordination Flow Example

```mermaid
sequenceDiagram
    participant Generator as Microservice Generator Agent
    participant Observability as Observability Engineer Agent
    participant DevOps as DevOps Engineer Agent
    participant Studio as Studio Agent

    Generator->>Observability: ServiceScaffolded
    Observability->>Observability: Inject Spans + Logs + Metrics
    Observability->>DevOps: Emit execution-metadata.json
    Observability->>Studio: Emit ObservabilityReady
```

βœ… Enables Studio to show trace-linked telemetry, and DevOps to monitor services out-of-the-box.


🧠 Platform Interfaces

  • Orchestration FSM β†’ The Observability Agent’s steps are registered as required before DevOpsAgent can run.
  • Studio API β†’ Pulls observability reports, status, and linkable spans per agent/module from metadata.
  • CI/CD Hooks β†’ Fails pipelines if ObservabilityPolicyViolated is received.

In short: The Observability Engineer Agent is a connective node in the agent graph β€” ensuring telemetry is not just injected, but consumed and validated by the broader ConnectSoft system.


πŸ“ƒ Agent Contract

The agent contract defines the formal, declarative specification that governs how the Observability Engineer Agent:

  • Is invoked by the orchestration system
  • Accepts and validates its inputs
  • Emits outputs, events, and metadata
  • Interoperates with other agents and pipelines
  • Aligns with ConnectSoft execution protocols

It enables the platform to treat the agent as a pluggable, traceable unit of automation, enforce runtime expectations, and replay or validate behavior during trace analysis.


πŸ“„ Contract Overview

```yaml
agentId: observability-engineer
role: "Design-Time Telemetry Injector"
category: "Observability, QA, Platform Readiness"
description: >
  Ensures every generated module is traceable, measurable, and diagnosable at runtime.
  Injects OTEL spans, structured logs, Prometheus metrics, and execution metadata.

triggers:
  - ServiceScaffolded
  - AgentOutputReady

inputs:
  - ServiceBlueprint.yaml
  - traceContext.json
  - ObservabilityPolicy.yaml (optional)
  - Previously generated source code

outputs:
  - execution-metadata.json
  - Updated source files with spans, metrics, logs
  - Prometheus endpoint (`/metrics`)
  - Health checks (`/healthz`, `/readyz`)
  - README.md telemetry summary (optional)
  - Event: ObservabilityReady
  - Event: ObservabilityPolicyViolated

skills:
  - InjectTraceDecorators
  - EmitLogConfiguration
  - InjectMetricCounters
  - AddHealthCheckProbes
  - GenerateExecutionMetadata
  - ValidateObservabilityReady
  - ApplyRedactionPolicies
  - EmitObservabilityEvent

memory:
  scope: [traceId, moduleId, skillId, tenantId]
  stores:
    - executionMetadataStore
    - telemetryInjectionPatterns (semantic memory)
    - redactionHistory
    - metricsUsageCorpus

validations:
  - Structured logs present and enriched
  - Required OTEL spans exist and propagate traceId
  - `/metrics` and `/healthz` endpoints exposed
  - PII fields redacted
  - execution-metadata.json generated and complete

version: "1.0.0"
status: active
```

βœ… Key Capabilities Declared in Contract

| Capability | Description |
|---|---|
| Declarative inputs/outputs | Ensures the orchestrator knows what must exist before and after agent execution |
| Trace-compliant event structure | All emitted events include traceId, agentId, moduleId, skillId |
| Retry-ready with memory linkage | Failed runs can use past memory to retry injection safely |
| FSM-aware behavior hooks | Used to slot the agent into finite state orchestration flows |
| Audit and security enforcement | Allows CI/CD pipelines to assert: no release without ObservabilityReady |

🧠 Example: Contract Usage by Orchestrator

```yaml
when: ServiceScaffolded
then:
  - agent: observability-engineer
    must:
      - emit: execution-metadata.json
      - trigger: ObservabilityReady
    fallback:
      - if: validationFailed
        emit: ObservabilityPolicyViolated
```

πŸ“¬ Events Declared in Contract

| Event | Description |
|---|---|
| ObservabilityReady | Signals successful injection and validation of telemetry |
| ObservabilityPolicyViolated | Raised when redaction, tracing, or metrics standards are not met |
| ExecutionMetadataGenerated | Emits trace-linked metadata about the current agent operation |

In short: The agent contract defines how the platform interfaces with the Observability Engineer Agent β€” treating it not just as code, but as a governed, orchestrated, and testable automation unit in the software factory.


🧭 Studio View Integration

The ConnectSoft Studio is the visual control center of the AI Software Factory. It shows:

  • Agent activity timelines
  • Execution flows and trace graphs
  • Metrics dashboards
  • Policy violations and retry history
  • Health of generated modules

The Observability Engineer Agent plays a critical role in ensuring that Studio can visualize runtime telemetry, validate execution metadata, and present module readiness with confidence.


🧩 Visual Elements Powered by This Agent

| Studio Feature | Powered by Agent Output |
|---|---|
| Trace Explorer | `execution-metadata.json`, OTEL span map |
| Observability Dashboard | Metrics summary: spans injected, metrics exposed, health endpoints live |
| Module Timeline View | Timestamps from `execution-metadata.json`, durationMs, retries |
| Telemetry Coverage Scorecard | Count of traced operations, metrics present, enrichment fields |
| Policy Compliance Heatmap | Warnings/errors from validation (e.g., missing traceId, unredacted PII) |
| README + Docs Viewer | Injected "Observability" section from README.md |
| Agent Retry History | executionId lineage with retry status per traceId |
| Redaction Violation Tracker | Logs showing detection of unmasked sensitive fields |

πŸ“Š Metrics Visualized in Studio

```text
agent_execution_duration_seconds{agentId="observability-engineer"}
otel_span_count{moduleId="invoice-service"}
log_enrichment_coverage{tenantId="petsure-001"}
metrics_endpoint_exposed{status="true"}
policy_violations_total{type="pii_unmasked"}
```

πŸ“ˆ Example UI Widgets

πŸ” Execution Trace Summary

```text
Module: invoice-service
Agent: observability-engineer
Status: βœ… Ready
Trace ID: trace-2025-05-19-invoice
Execution Duration: 1.84s
Spans Injected: 5
Metrics Exported: 3
Log Enrichers: traceId, agentId, tenantId, moduleId
```

πŸš₯ Observability Compliance Status

```mermaid
graph TD
  A[Startup.cs] -->|βœ“ OTEL Injected| B[Controller.cs]
  B -->|βœ“ Metrics Present| C[MessageConsumer.cs]
  C -->|⚠ PII Redaction Warning| D[Log Validator]
```

🧠 Required Metadata for Studio Hooks

| Field in `execution-metadata.json` | Purpose |
|---|---|
| traceId | Links to global orchestration view |
| agentId | Agent execution lane identification |
| moduleId | Highlights what service/module was affected |
| skillId | Skill-level timing and validation breakdown |
| status, durationMs | Timeline and performance overlays |
| injected.spans, metrics, etc. | Metric overlays in dashboard |
| violations[] | Triggers policy compliance heatmap |

πŸ“¬ Triggered Studio Events

| Event | Studio Effect |
|---|---|
| ObservabilityReady | Unlocks "Ready for Deployment" status on module |
| ObservabilityPolicyViolated | Displays violation markers, blocks CI/CD release |
| ExecutionMetadataGenerated | Enables drill-down timeline and retry inspection |

πŸ’‘ Additional Studio Visual Cues

  • Color-coded observability score badges (e.g., 5/5: spans, metrics, logs, probes, redaction)
  • Tooltips showing which OTEL spans were injected and by whom
  • Click-through to metrics like request latency, trace coverage per endpoint

In short: The Observability Engineer Agent gives Studio the data, structure, and trace metadata to visualize, inspect, and validate how observable every generated service is β€” turning telemetry into a first-class experience.


🧬 Traceability Schema

The traceability schema defines the core identifiers and telemetry fields that the Observability Engineer Agent injects, validates, and propagates across logs, spans, metrics, and metadata.

This schema ensures:

  • Every action is traceable across agents, modules, tenants, and executions
  • Studio, DevOps, and analytics tools can correlate events, metrics, and logs
  • Multi-tenant safety by isolating observability to tenant and module boundaries
  • Autonomous feedback loops are grounded in deterministic trace IDs

🧩 Core Identifiers

| Field | Description |
|---|---|
| traceId | Globally unique identifier for the end-to-end flow of a module or agent plan execution. |
| agentId | The persona executing the skill (e.g., observability-engineer). |
| skillId | The function performed (e.g., InjectTelemetry, GenerateExecutionMetadata). |
| moduleId | Logical service/module under instrumentation (e.g., invoice-service). |
| tenantId | Tenant-specific scoping identifier (e.g., petsure-001). |
| executionId | Unique ID for a single run or retry of the agent, under a trace. |
| environment | Target runtime context (dev, stage, prod). |
| status | Result of the skill run (Success, RetrySuccess, Violation, etc.). |
| durationMs | Time taken to complete the skill. |
| outputChecksum | Hash (SHA-256) of the emitted result set (files, config, metadata). |

πŸ“˜ Example Execution Metadata (Trace-Schema-Aligned)

```json
{
  "traceId": "trace-2025-05-19-xyz123",
  "agentId": "observability-engineer",
  "skillId": "InjectMetricCounters",
  "moduleId": "booking-service",
  "tenantId": "vetclinic-001",
  "executionId": "exec-042",
  "environment": "stage",
  "status": "Success",
  "durationMs": 1567,
  "outputChecksum": "sha256:ab12f8d4..."
}
```
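The `outputChecksum` field can be produced by hashing a canonical summary of the emitted artifacts. How the artifact set is canonicalized is an assumption in this sketch; the factory may equally hash the raw file bytes:

```python
import hashlib
import json

# Illustrative sketch: derive the outputChecksum field as a SHA-256
# over a deterministically ordered JSON summary of emitted artifacts,
# so identical output sets always produce identical checksums.
def output_checksum(artifacts: dict) -> str:
    canonical = json.dumps(artifacts, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

checksum = output_checksum({
    "Startup.cs": "file contents placeholder",
    "MetricsExporter.cs": "file contents placeholder",
})
print(checksum.startswith("sha256:"))  # True
```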

πŸ“‘ Telemetry Field Mapping

| Artifact Type | Fields Injected or Emitted |
|---|---|
| Structured Logs | traceId, agentId, moduleId, tenantId, skillId, status |
| OTEL Spans | traceId, moduleId, agentId (in ResourceAttributes) |
| Prometheus Metrics | traceId, tenantId, moduleId as labels |
| Health Endpoints | Metadata enriched with moduleId, optional traceId in headers |
| `execution-metadata.json` | All core traceability fields, including retry lineage |

πŸ”„ Retry-Aware Extensions

When retries occur:

```json
{
  "retryOf": "exec-041",
  "status": "SuccessAfterRetry",
  "issuesResolved": ["missingMetrics", "untracedSpan"]
}
```

β†’ Enables Studio to show retry lineage and DevOps to track recoverability.


πŸ” Security-Aware Scoping

  • Every observability record is tagged with tenantId and moduleId to ensure:

  • No cross-tenant leakage

  • Proper data partitioning in metrics, logs, dashboards
  • PII fields are explicitly excluded from trace schema unless redacted

πŸ“Š OpenTelemetry Resource Attributes Injected

```csharp
// Applied to the TracerProviderBuilder during OTEL registration.
tracerProviderBuilder.SetResourceBuilder(ResourceBuilder.CreateDefault()
    .AddService("booking-service")
    .AddAttributes(new[]
    {
        new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
        new KeyValuePair<string, object>("moduleId", "booking-service"),
        new KeyValuePair<string, object>("agentId", "observability-engineer")
    }));
```

βœ… Validation Enforcement

The agent runs conformance checks to ensure:

  • Every span/log/metric includes traceId
  • Every emitted event has agentId + skillId
  • All files, logs, and configs are uniquely traceable via executionId
  • Nothing is emitted without tenantId when required

In short: The traceability schema is the backbone of observability governance. It ensures every injected behavior is linked, searchable, accountable, and safe β€” across all modules, tenants, and execution flows.


🧾 Observability DSL / Metrics Profile

The Observability DSL (Domain-Specific Language) is a structured configuration format (typically YAML or JSON) that allows agents, orchestrators, and blueprints to declaratively define:

  • What observability features must be injected
  • Which metrics must be exposed
  • How spans and logs should be enriched
  • What policies should be enforced per tenant or environment

It gives the Observability Engineer Agent a declarative, machine-readable contract to guide and customize its behavior.


πŸ“˜ Example Observability DSL (YAML)

```yaml
observability:
  tracing:
    enabled: true
    exporter: otlp
    spanEnrichment:
      include:
        - traceId
        - agentId
        - moduleId
        - tenantId
  logging:
    type: structured
    provider: serilog
    enrichers:
      - traceId
      - tenantId
      - executionId
    redaction:
      piiFields:
        - email
        - ssn
        - password
  metrics:
    enabled: true
    endpoint: /metrics
    exporters:
      - prometheus
    counters:
      - name: http_requests_total
        labels: [tenantId, moduleId]
      - name: agent_execution_duration_seconds
        type: histogram
  healthChecks:
    enabled: true
    endpoints:
      - /healthz
      - /readyz
```

🧩 Why This DSL Matters

  • Customizability β†’ Blueprint or tenant-specific telemetry behavior
  • Separation of concerns β†’ Orchestrators configure, agent executes
  • Consistency β†’ Shared templates across hundreds of modules
  • Traceability β†’ DSL becomes part of execution-metadata.json
  • Governance β†’ Policy compliance is declared, not inferred

πŸ“Š Supported Metrics Profile

The DSL supports predefined metrics contracts used by the Observability Engineer Agent to:

  • Generate default instrumentation
  • Expose /metrics endpoint in a Prometheus-friendly format
  • Label metrics with scoped identifiers (e.g., tenantId, moduleId)

βœ… Common Metrics Supported

| Metric Name | Type | Description |
|---|---|---|
| `http_requests_total` | Counter | Number of HTTP requests received, labeled by route, status, tenant |
| `agent_execution_duration_seconds` | Histogram | Duration of agent skill execution |
| `trace_span_count_total` | Counter | Count of spans injected per module |
| `log_lines_emitted_total` | Counter | Total logs emitted, labeled by log level |
| `pii_redaction_violations_total` | Counter | Number of failed redaction attempts |
| `metrics_scrape_success_total` | Counter | Number of successful `/metrics` scrapes by Prometheus |
| `health_probe_status` | Gauge | 1 = healthy, 0 = failed, for `/healthz` and `/readyz` |

πŸ” DSL + Policy Integration

The DSL can be merged with, or overridden by, tenant- or environment-specific policies, for example:

```yaml
environments:
  prod:
    logging:
      redaction:
        required: true
    metrics:
      exporters: [prometheus]
      counters:
        - name: sla_violation_total
```

πŸ“₯ Input Sources for DSL

  • Blueprint ServiceBlueprint.yaml
  • Centralized policy registry
  • Observability profiles per tenant or industry (e.g., HIPAA-safe, PCI-ready)
  • Memory-injected recommendations from prior executions

🧠 Runtime Use in Agent

The agent parses the DSL into an execution plan:

```json
{
  "injectTracing": true,
  "injectMetrics": true,
  "spanAttributes": ["traceId", "moduleId"],
  "logEnrichers": ["tenantId", "executionId"],
  "metricDefinitions": [
    { "name": "http_requests_total", "labels": ["tenantId", "moduleId"] },
    { "name": "agent_execution_duration_seconds", "type": "histogram" }
  ]
}
```

β†’ Used internally by kernel skills like InjectMetricCounters and EmitLogConfiguration.
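A minimal sketch of that lowering step, assuming the YAML has already been parsed into a plain dict. The `build_plan` helper and its defaulting behavior are illustrative, not the kernel skill's actual implementation:

```python
# Illustrative sketch: lower the parsed Observability DSL into the
# execution-plan shape shown above. Missing sections default to
# disabled/empty (an assumption for this sketch).
def build_plan(dsl: dict) -> dict:
    obs = dsl.get("observability", {})
    tracing = obs.get("tracing", {})
    metrics = obs.get("metrics", {})
    logging_cfg = obs.get("logging", {})
    return {
        "injectTracing": tracing.get("enabled", False),
        "injectMetrics": metrics.get("enabled", False),
        "spanAttributes": tracing.get("spanEnrichment", {}).get("include", []),
        "logEnrichers": logging_cfg.get("enrichers", []),
        "metricDefinitions": metrics.get("counters", []),
    }

dsl = {"observability": {
    "tracing": {"enabled": True, "spanEnrichment": {"include": ["traceId", "moduleId"]}},
    "metrics": {"enabled": True, "counters": [{"name": "http_requests_total"}]},
    "logging": {"enrichers": ["tenantId", "executionId"]},
}}
print(build_plan(dsl)["injectTracing"])  # True
```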


πŸ“€ Output: DSL-Aware Metadata

Included in execution-metadata.json under observabilityDslSnapshot:

```json
"observabilityDslSnapshot": {
  "metrics": {
    "enabled": true,
    "exporters": ["prometheus"],
    "counters": ["http_requests_total", "agent_execution_duration_seconds"]
  },
  "logging": {
    "redaction": {
      "piiFields": ["email", "ssn"]
    }
  }
}
```

In short: The Observability DSL allows ConnectSoft to standardize, govern, and adapt observability across thousands of generated services β€” with the Observability Engineer Agent acting as its runtime executor and compliance enforcer.


πŸ” Policy & Security Guardrails

The Observability Engineer Agent is not only responsible for injecting telemetry β€” it must also enforce security, compliance, and tenant-safety constraints as part of every injection. This ensures the factory outputs are:

  • Safe by construction
  • Policy-aligned per tenant/environment
  • Compliant with privacy standards (e.g., masking, PII)
  • Auditable across all observability behaviors

βœ… Policy Enforcement Responsibilities

| Area | Guardrail Enforced |
|---|---|
| PII Redaction | Auto-detect fields like email, password, ssn and apply structured masking or redaction logic in logs. |
| Tenant Isolation | All metrics, logs, and spans must include tenantId and be scoped to moduleId β€” no cross-tenant exposure allowed. |
| Trace Enrichment | traceId, agentId, and skillId must be included in all telemetry events (logs, spans, metrics). |
| Log Format Compliance | Only structured logging (e.g., JSON) is permitted; usage of `Console.WriteLine` is flagged. |
| Health Check Safety | Probes must not leak infrastructure state or secrets in error responses. |
| Metric Label Validation | All exposed metrics must include required labels (e.g., tenantId, moduleId, statusCode) to support observability partitioning. |
| Sink Hardening | Logs must not be routed to insecure or public sinks unless explicitly allowed by config (e.g., stdout only in dev). |
| Trace Export Scoping | OTEL exports (OTLP) must be scoped by tenant/environment and routed to secure endpoints only. |

πŸ“„ Example Redaction Policy Config

```yaml
logging:
  redaction:
    enabled: true
    piiFields:
      - email
      - ssn
      - birthdate
    redactionFormat: "[REDACTED]"
    fallbackBehavior: blockInjectionIfViolated
```

β†’ Enforced during EmitLogConfiguration and ApplyRedactionPolicies skills.
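Applied at enforcement time, the policy amounts to walking each structured log event and masking the listed fields. The recursive `redact` helper below is a sketch under that assumption, not the actual skill implementation:

```python
import copy

# Illustrative sketch: mask policy-listed PII fields in a structured
# log event, recursing through nested objects and arrays. The input
# event is left unmodified; a redacted copy is returned.
def redact(event: dict, pii_fields, redaction_format="[REDACTED]") -> dict:
    result = copy.deepcopy(event)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in pii_fields:
                    node[key] = redaction_format
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result

event = {"traceId": "trace-2025-05-19-abc123",
         "user": {"email": "jane@example.com", "name": "Jane"}}
print(redact(event, {"email", "ssn", "birthdate"})["user"]["email"])  # [REDACTED]
```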


🚫 Violations That Block Release

| Violation | Severity | Action |
|---|---|---|
| PII fields not redacted | ❌ Critical | Block CI/CD |
| Missing tenantId in logs | ❌ Major | Fail validation |
| Logs written to public sink | ❌ Major | Emit ObservabilityPolicyViolated |
| Metrics without tenant labels | ❌ Major | Flag retry or error |
| Spans without traceId | ⚠ Warning | Retry once, escalate if persistent |

πŸ“’ Events Emitted on Violation

```json
{
  "event": "ObservabilityPolicyViolated",
  "traceId": "trace-2025-05-19-policyfail123",
  "agentId": "observability-engineer",
  "violations": [
    "pii_unmasked",
    "metrics_tenantLabel_missing"
  ],
  "status": "Blocked"
}
```

🧠 Policy Sources

  • ObservabilityPolicy.yaml from blueprint or tenant profile
  • Factory-wide policy registry (e.g., policies/global/observability/v2.0.yaml)
  • Dynamically loaded rules based on environment (e.g., prod vs dev)

πŸ“‹ Validated Enforcement Matrix

| Enforcement Area | Validated With |
|---|---|
| PII fields | Regex patterns + blueprint tags (`sensitivity: pii`) |
| Logging sinks | `appsettings.logging.json` parser |
| Metric labels | Simulated `/metrics` scrape and schema check |
| Span structure | OTEL trace analysis from a memory snapshot |
| Health endpoint safety | Static + runtime signature checks on the response body |
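The "simulated `/metrics` scrape" check can be sketched in Python as a pass over Prometheus exposition-format lines; the required label set and the simplified parsing are assumptions, not the agent's actual validator:

```python
import re

REQUIRED_LABELS = {"tenantId", "moduleId", "statusCode"}

def missing_labels(exposition: str) -> dict:
    """Map each metric in a simulated /metrics scrape to its missing required labels."""
    problems = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = re.match(r"(\w+)\{([^}]*)\}", line)
        if match:
            name = match.group(1)
            labels = {pair.split("=")[0] for pair in match.group(2).split(",") if pair}
        else:
            name = line.split()[0]  # bare metric with no labels at all
            labels = set()
        absent = REQUIRED_LABELS - labels
        if absent:
            problems[name] = absent
    return problems

scrape = '''# TYPE http_requests_total counter
http_requests_total{tenantId="petsure-001",moduleId="invoice-service",statusCode="200"} 42
agent_execution_duration_seconds 1.7
'''
print(missing_labels(scrape))
```

A metric missing any required label would be flagged for retry (re-injection of labels) or reported as a `Metrics without tenant labels` violation.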

πŸ”„ Policy-Aware Retry Behavior

  • Safe retry allowed for:
    • Missing metric labels
    • Log format mismatch
  • Hard fail (no retry) for:
    • Unredacted secrets or PII
    • Missing tenantId in logs or spans

In short: The Observability Engineer Agent is a security-first observability enforcer β€” applying tenant-safe, privacy-aware, and compliance-driven guardrails to every telemetry behavior in the generated system.


πŸ§ͺ Scenario: Instrumenting a Generated REST API Service

Let’s walk through a real-world example of how the Observability Engineer Agent fits into a full orchestration flow, from project initialization to deployment readiness, and the traceable, modular, and secure telemetry it injects along the way.


πŸ“˜ Scenario: "InvoiceService" Generation

🧭 Starting Point:

  • Tenant: petsure-001
  • Module: invoice-service
  • Trigger: ServiceScaffolded
  • Runtime Type: REST API (.NET 8)
  • Target Environment: staging
  • DSL Policy: Prometheus + OTEL + Serilog + Redaction required

πŸ”„ Agent Execution Flow

```mermaid
sequenceDiagram
    participant Vision as Vision Architect Agent
    participant Generator as Microservice Generator Agent
    participant Observability as Observability Engineer Agent
    participant DevOps as DevOps Engineer Agent
    participant Studio as Studio Agent

    Vision->>Generator: Emit ServiceBlueprint.yaml
    Generator->>Observability: Emit ServiceScaffolded
    Observability->>Observability: Inject Spans, Metrics, Logs, Health Checks
    Observability->>Observability: Validate Redaction, Trace Enrichment
    Observability->>Observability: Generate execution-metadata.json
    Observability->>Studio: Emit ObservabilityReady Event
    Observability->>DevOps: Provide Observability Metadata
```

βœ… Outputs Produced

| Artifact | Description |
|---|---|
| `Startup.cs` | Injected with `AddOpenTelemetry()`, `AddHealthChecks()`, `UseSerilog()` |
| `MetricsExporter.cs` | Exposes `/metrics` for Prometheus |
| `execution-metadata.json` | Includes `traceId`, `agentId`, status, injected items, and duration |
| `README.md` | New section titled "Observability" listing the available endpoints |
| `ObservabilityReady` event | Emitted and consumed by Studio, DevOps, and QA |
| OTEL + Prometheus logs and metrics | Tagged with `tenantId`, `traceId`, `moduleId`, `agentId` |
| Redaction validator report | Confirms all PII fields are masked in logs and config |

🧠 Snapshot of execution-metadata.json

```json
{
  "traceId": "trace-invoice-2025-05-19-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "invoice-service",
  "executionId": "exec-007",
  "tenantId": "petsure-001",
  "status": "Success",
  "durationMs": 1734,
  "outputFiles": ["Startup.cs", "execution-metadata.json", "MetricsExporter.cs"],
  "spansInjected": 5,
  "metricsExposed": ["http_requests_total", "agent_execution_duration_seconds"],
  "healthChecks": ["/healthz", "/readyz"],
  "logEnrichers": ["traceId", "agentId", "tenantId", "moduleId"],
  "violations": []
}
```
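Downstream consumers (Studio, DevOps, QA) can sanity-check such a snapshot before acting on it. A minimal sketch, assuming a hypothetical required-key set rather than the factory's actual schema:

```python
REQUIRED_KEYS = {"traceId", "agentId", "moduleId", "tenantId", "status", "durationMs"}

def validate_metadata(metadata: dict) -> list:
    """Return a list of problems found in an execution-metadata.json payload."""
    problems = ["missing key: " + key for key in sorted(REQUIRED_KEYS - metadata.keys())]
    if metadata.get("status") == "Success" and metadata.get("violations"):
        problems.append("status is Success but violations are present")
    return problems

snapshot = {
    "traceId": "trace-invoice-2025-05-19-abc123",
    "agentId": "observability-engineer",
    "moduleId": "invoice-service",
    "tenantId": "petsure-001",
    "status": "Success",
    "durationMs": 1734,
    "violations": [],
}
assert validate_metadata(snapshot) == []  # a clean snapshot yields no problems
```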

πŸ“Š Studio Dashboard View

- Module: `invoice-service`
- Status: βœ… ObservabilityReady
- Trace Coverage: 5 spans
- Metrics: Prometheus βœ…
- Log Format: Structured JSON with full enrichment
- Redaction: Passed
- Duration: 1.7 s
- Execution ID: `exec-007`
- Retry History: None


πŸš€ Outcome

  • DevOps receives metadata and continues deployment to staging
  • QA Agent starts validating with telemetry visibility
  • Studio shows full trace, metrics graph, and observability heatmap
  • PII risk audit confirms compliance
  • Release Manager Agent approves promotion to production

In summary: This end-to-end flow demonstrates how the Observability Engineer Agent activates post-scaffolding, injects secure and traceable telemetry, validates all outputs, and signals readiness for downstream automation β€” all with zero manual intervention.