
πŸ›°οΈ Observability Engineer Agent Specification

🎯 Purpose

The Observability Engineer Agent ensures that every artifact produced by the ConnectSoft AI Software Factory β€” including services, agents, orchestrators, and infrastructure modules β€” is fully observable at runtime, traceable across pipelines, and compatible with compliance, debugging, and optimization flows.

This agent operates at design-time, injecting the runtime constructs needed to emit:

  • Structured logs with trace and context metadata
  • OpenTelemetry traces with traceId, agentId, moduleId
  • Prometheus-compatible metrics (latency, execution count, status)
  • Execution metadata artifacts (execution-metadata.json)
  • Health probes and diagnostics endpoints

🧭 Platform Placement

The agent activates immediately after scaffolding, before DevOps orchestration, and acts as a final observability linter and injector in the software factory pipeline.

πŸ“Š Positioning Diagram

```mermaid
flowchart LR
    Vision[Vision Architect Agent]
    Arch[Solution Architect Agent]
    Gen[Microservice Generator Agent]
    Obs[Observability Engineer Agent]
    DevOps[DevOps Engineer Agent]
    QA[Test Generator Agent]
    Release[Release Manager Agent]

    Vision --> Arch
    Arch --> Gen
    Gen --> Obs
    Obs --> DevOps
    Obs --> QA
    DevOps --> Release
```

The Observability Engineer Agent is the critical bridge between generation and deployment, ensuring all downstream agents and Studio modules can trace, score, and visualize what was produced.


🧠 Why It Exists

Without this agent, the factory would suffer from:

  • Opaque outputs β€” code runs but no traces or metrics are emitted
  • Inconsistent diagnostics β€” missing traceId or tenantId in logs
  • Security risks β€” sensitive fields logged improperly
  • Undebuggable failures β€” no insight into where/why agents or services fail
  • Disconnected Studio views β€” missing visual timelines, dashboards, and audit trails

This agent makes observability not optional, but enforced by design.


βœ… Key Outcomes Enabled

| Outcome | Description |
| --- | --- |
| Traceability | All service logs, spans, and metrics are linked to traceId, agentId, and moduleId. |
| Telemetry-first design | Every handler, adapter, and controller emits OTEL spans and metric counters. |
| Policy-aware logging | Logs are structured, redact secrets, and conform to sensitivity rules. |
| Metrics instrumentation | Prometheus metrics are automatically exposed and tagged by context. |
| Execution observability | execution-metadata.json is generated and attached to every trace. |
| Studio visualization support | Dashboards, timelines, and health heatmaps become available out of the box. |

In summary: The Observability Engineer Agent is the sensor layer of the AI Software Factory. It ensures that every action is visible, every failure is diagnosable, and every output is auditable.


πŸ“‹ Responsibilities

The Observability Engineer Agent is responsible for injecting, configuring, and validating all observability layers required to support traceable, measurable, and diagnosable services in the ConnectSoft AI Software Factory.

These responsibilities span across:

  • Code injection (metrics, logging, tracing decorators)
  • Configuration generation (OTEL setup, Serilog config)
  • Validation and linting (policy conformance, log structure)
  • Artifact production (execution-metadata.json, /metrics, /healthz)
  • Studio visibility support (traceId lineage, dashboards, logs)

🧠 Responsibilities Breakdown

| Responsibility | Description |
| --- | --- |
| Inject OpenTelemetry Tracing | Adds span start/stop, propagation logic, and traceId, agentId, moduleId into handlers, services, and API layers. |
| Instrument Runtime Metrics | Generates metric counters, histograms, and gauges using Prometheus.Net, with auto-tagging by tenant and agent. |
| Inject Structured Logging Configuration | Adds Serilog, ILogger, or OTEL-compatible log providers, enriched with standard fields like traceId, executionId, and redaction markers. |
| Emit execution-metadata.json | Produces structured files per skill execution that summarize duration, scope, output, and trace links for later indexing. |
| Add Health Check Probes | Registers AddHealthChecks() endpoints (/healthz, /readyz, /livez) with module-specific checks (e.g., DB, messaging, actors). |
| Validate Observability Readiness | Ensures that each generated artifact conforms to platform observability standards: spans are present, logs are structured, PII is masked. |
| Apply Policy and Redaction Rules | Automatically detects PII leaks or unstructured log output and replaces them with redacted values or throws validation errors. |
| Configure Metrics Exporters | Sets up endpoints (/metrics) and formats compatible with Prometheus scraping, including per-tenant scoping. |
| Emit ObservabilityReady Event | Triggers downstream flows (e.g., DevOps, Studio) once observability compliance is verified. |
| Support Studio Dashboards and Timelines | Supplies the telemetry (traceId, metricType, agentId) required to visualize spans, logs, and pipeline health views. |

πŸš€ Example Responsibilities in Action

```text
[βœ“] BookAppointmentHandler.cs wrapped with OpenTelemetry span
[βœ“] execution-metadata.json generated with traceId and duration
[βœ“] /metrics endpoint registered for Prometheus
[βœ“] Logging uses Serilog with structured output and redaction
[βœ“] HealthCheck endpoint registered with 2 custom probes
[βœ“] ObservabilityReady event emitted
```

βœ… Enables platform to track, debug, visualize, and audit all agent-generated components.
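The first checklist item, wrapping a handler in an OpenTelemetry span, can be sketched as follows. This is a minimal illustration only: the handler shape, tag names, and command type are assumptions, not the factory's actual injection template.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical command type, shown only to make the sketch compile.
public record BookAppointmentCommand;

public class BookAppointmentHandler
{
    // One ActivitySource per service; its name must match the
    // .AddSource(...) registration in the OTEL tracing bootstrap.
    private static readonly ActivitySource Source = new("BookingService");

    public async Task HandleAsync(BookAppointmentCommand command)
    {
        // StartActivity returns null when no listener is attached,
        // so the instrumentation is a no-op without an OTEL pipeline.
        using var activity = Source.StartActivity("BookAppointment");
        activity?.SetTag("moduleId", "booking-service");
        activity?.SetTag("agentId", "observability-engineer");

        // ... original handler logic, unchanged ...
        await Task.CompletedTask;
    }
}
```

The `using var` pattern ensures the span is stopped and exported even if the handler throws.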


In short: The Observability Engineer Agent transforms raw code into runtime-aware systems by embedding telemetry, logs, and policy validation as default behaviors.


πŸ“₯ Inputs

The Observability Engineer Agent consumes blueprints, trace context, and module scaffolds to understand what kind of system it needs to instrument and which observability mechanisms to apply.

These inputs determine:

  • What kind of telemetry to inject (e.g., REST vs Function vs Actor)
  • Which context metadata to include (e.g., tenantId, traceId)
  • Which policy constraints to enforce (e.g., PII redaction, metrics exposure)
  • Where in the codebase to apply changes (e.g., startup files, controllers, handlers)

🧩 Required Inputs

| Input Type | Description |
| --- | --- |
| πŸ“„ Service Blueprint | Defines module type (e.g., REST, Actor, Scheduler), architecture layers, and features to be instrumented. |
| 🧠 Agent Execution Context | Includes traceId, agentId, skillId, moduleId, and tenantId for trace tagging and output labeling. |
| πŸ“¦ Generated Codebase | The scaffolded microservice codebase (with handlers, controllers, DI setup) that will receive instrumentation. |
| πŸ“œ Observability Configuration Profile | Optional overrides for which exporters to use (e.g., OTEL + Prometheus), redaction settings, and health check targets. |
| πŸ“‚ Execution Metadata Inputs | Previous traces and metadata from earlier agents that define known contracts, expected log points, or injected code locations. |
| πŸ” Policy Contract (optional) | A JSON or YAML config defining organizational constraints (e.g., "log nothing from PII fields", "always expose /healthz"). |

πŸ“˜ Example Input Values

```yaml
moduleId: booking-service
traceId: trace-2025-0519-xyz
agentId: observability-engineer
serviceType: RestApi
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
metrics:
  enabled: true
  exporters:
    - Prometheus
redaction:
  sensitivityTags:
    - pii
    - secret
healthChecks:
  endpoints:
    - /healthz
    - /readyz
```

πŸ” Input Resolution Workflow

  1. 🧠 Load trace context from orchestrator
  2. πŸ“„ Parse service blueprint YAML
  3. πŸ“ Scan generated file tree (e.g., Startup.cs, Controllers, Handlers)
  4. πŸ“œ Load observability config (default or project-specific)
  5. πŸ” Detect injectable points (e.g., AddOpenTelemetry, UseMetrics)
  6. πŸ§ͺ Plan injection, emit preview, validate readiness
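Step 5 above (detecting injectable points) can be sketched with a simple text scan over the generated file tree. The marker list and scanner shape are illustrative assumptions; the real agent presumably uses richer code analysis than string matching.

```csharp
using System.Collections.Generic;
using System.IO;

public static class InjectionPointScanner
{
    // Bootstrap calls whose presence means the hook already exists,
    // letting the agent plan idempotent injection (skip what is there).
    private static readonly string[] Markers =
        { "AddOpenTelemetry", "UseMetrics", "AddHealthChecks" };

    public static IEnumerable<(string File, string Marker)> Scan(string root)
    {
        foreach (var file in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            var source = File.ReadAllText(file);
            foreach (var marker in Markers)
            {
                if (source.Contains(marker))
                    yield return (file, marker);
            }
        }
    }
}
```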

βœ… Input Design Principles

  • Inputs are idempotent and composable
  • Blueprint-driven: all behaviors align with ServiceBlueprint.yaml
  • Multi-tenant scoped via tenantId
  • Compatible with agent orchestration, memory system, and Studio dashboards

In short: The Observability Engineer Agent relies on structured blueprints, execution metadata, and config overlays to know where, what, and how to inject telemetry safely and correctly.


πŸ“€ Outputs

The Observability Engineer Agent produces a set of design-time telemetry artifacts, code augmentations, and metadata files that guarantee the generated module will be fully observable at runtime.

These outputs are injected directly into the service folder structure or saved as metadata for DevOps pipelines, Studio dashboards, and compliance validators.


🧩 Output Types

| Output | Description |
| --- | --- |
| execution-metadata.json | Structured record of the trace session, including traceId, agentId, skillId, moduleId, duration, and output hash. |
| Telemetry-Injected Code Files | Modified or generated files such as Startup.cs, Program.cs, controller classes, and service handlers augmented with logging, spans, and metrics decorators. |
| Logging Configuration Files | Files such as logger.json, serilog.json, or appsettings.logging.json with Serilog or OTEL-compatible log enrichers and sinks. |
| Tracing Configuration | OTEL bootstrap entries in Program.cs or DI registrations (AddOpenTelemetry, exporters, resource attributes). |
| Metrics Endpoint Setup | .cs files that expose /metrics, counters, histograms, and gauge exporters for Prometheus. |
| Health Check Configuration | Code in Startup.cs and probes such as /healthz, /readyz, and /livez, with status response contracts. |
| Validation Reports (optional) | Internal .json or .md files that document which observability checks passed or failed during injection. |
| Telemetry Readme Section | Content appended to README.md describing exposed observability endpoints and metrics for human reference. |
| Event Emission: ObservabilityReady | Trigger event indicating successful observability injection, which activates DevOps, QA, or Studio flows. |

πŸ“‚ Output File Examples

πŸ” execution-metadata.json

```json
{
  "traceId": "trace-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "booking-service",
  "durationMs": 1457,
  "status": "Success",
  "outputFiles": [
    "Startup.cs",
    "execution-metadata.json",
    "appsettings.logging.json"
  ],
  "exportedMetrics": ["http_requests_total", "agent_execution_duration_seconds"]
}
```

🧬 Injected Code Snippet (Startup.cs)

```csharp
services.AddOpenTelemetry()
    .WithTracing(builder => builder
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("BookingService")
        .SetResourceBuilder(ResourceBuilder.CreateDefault()
            .AddService("booking-service")
            .AddAttributes(new[]
            {
                new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
                new KeyValuePair<string, object>("traceId", traceId)
            }))
        .AddOtlpExporter());

services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("Database")
    .AddCheck<MessagingHealthCheck>("MassTransitBus");
```
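The `DatabaseHealthCheck` registered above is not defined in this spec. A minimal sketch, assuming the standard `IHealthCheck` interface from `Microsoft.Extensions.Diagnostics.HealthChecks`, might look like this; the probe body is a placeholder:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class DatabaseHealthCheck : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Placeholder: replace with a real connectivity probe
            // (e.g., open a connection and run SELECT 1).
            await Task.Delay(1, cancellationToken);
            return HealthCheckResult.Healthy("Database reachable");
        }
        catch (Exception ex)
        {
            // Surfaced as "Unhealthy" in the /healthz response payload.
            return HealthCheckResult.Unhealthy("Database unreachable", ex);
        }
    }
}
```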

πŸ“ˆ Injected Metrics Endpoint

```csharp
endpoints.MapMetrics(); // exposes /metrics with Prometheus exporters
```
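Alongside the endpoint, the agent registers the instruments themselves. A hedged sketch using the prometheus-net API is below; the metric names follow the conventions listed in this spec, while the label set and static holder class are illustrative assumptions:

```csharp
using Prometheus;

public static class BookingMetrics
{
    // Counter with per-tenant/per-module labels for scoped scraping.
    public static readonly Counter Requests = Metrics.CreateCounter(
        "http_requests_total", "Total HTTP requests handled.",
        new CounterConfiguration { LabelNames = new[] { "tenantId", "moduleId" } });

    // Histogram for skill/handler execution latency.
    public static readonly Histogram ExecutionDuration = Metrics.CreateHistogram(
        "agent_execution_duration_seconds", "Execution duration in seconds.");
}

// Usage inside a handler (illustrative):
// BookingMetrics.Requests.WithLabels("vetclinic-001", "booking-service").Inc();
// using (BookingMetrics.ExecutionDuration.NewTimer()) { /* handler work */ }
```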

πŸ“„ README (Telemetry Section)

### πŸ” Observability

This service exposes:
- `/metrics` for Prometheus scraping
- `/healthz` and `/readyz` for health status
- OTEL spans tagged with `traceId`, `agentId`, and `moduleId`

All logs follow ConnectSoft structured logging standards.

βœ… Output Quality Guarantees

Each output:

  • Includes required trace and module metadata
  • Passes observability validation checks
  • Is compatible with OpenTelemetry pipelines
  • Is machine-readable and Studio-ingestible
  • Enables runtime traceability and diagnostics by default

In short: The Observability Engineer Agent produces the telemetry foundation that makes all modules visible, diagnosable, and trusted β€” from Studio dashboards to production monitoring.


πŸ“š Knowledge Base

The Observability Engineer Agent operates using a deep, structured knowledge base of:

  • Observability standards (OpenTelemetry, Prometheus, Serilog)
  • ConnectSoft trace metadata schema
  • Telemetry placement rules for Clean Architecture services
  • Security and redaction policies
  • Best practices for health checks and structured logs
  • Code injection templates for each supported runtime (REST API, gRPC, Function, Actor)

This knowledge is used to determine where and how to inject observability instrumentation that will be:

  • Valid
  • Secure
  • Compatible with CI/CD
  • Usable by Studio dashboards and feedback loops

🧠 Built-In Concepts and Standards

| Knowledge Area | Contents |
| --- | --- |
| 🧬 Trace Metadata Schema | Required fields: traceId, agentId, skillId, moduleId, tenantId, executionId. Used across logs, spans, and metrics. |
| πŸ“¦ File Injection Rules | Startup bootstrap points (Program.cs, Startup.cs, DI layers), controller decorators, handler wrappers. |
| πŸ“Š Metrics Templates | Metric name conventions (agent_execution_duration_seconds, http_requests_total, etc.), label tagging, histogram/gauge setup. |
| πŸ“œ Serilog/OTEL Conventions | MinimumLevel, Enrich.FromLogContext(), structured format templates, sink routing, and enrichment strategies. |
| πŸ“ˆ Health Check Coverage | Which services require health probes (DB, bus, cache), endpoint naming (/healthz, /readyz), and standard status response formats. |
| πŸ” Redaction Policies | Regex patterns for secrets and PII fields; sensitivity: pii blueprint tags are redacted at log emit time. |
| πŸ” Agent Coordination Hooks | When and how to emit the ObservabilityReady event, and what it must include (traceId, summary, observabilityLevel). |

🧩 Blueprint-to-Telemetry Mapping

The agent understands how to map service structure into observability hooks:

| Service Type | Observability Knowledge Applied |
| --- | --- |
| REST API | Inject spans in controllers, logs in middleware, metrics in UseEndpoints() |
| Actor Service | OTEL instrumentation via activity sources and actor lifecycle spans |
| Azure Function | Use ILogger for structured logs, export OTEL spans via decorators |
| MassTransit | Instrument consumers, message pumps, retry handlers |

πŸ“˜ Sample Prompt Memory Entries

```json
{
  "traceId": "trace-2025-05-19-xyz",
  "skillId": "InjectTelemetry",
  "agentId": "observability-engineer",
  "injectedMetric": "http_requests_total",
  "patternUsed": "AspNetCore.RequestPipeline",
  "blueprint": {
    "serviceType": "RestApi",
    "features": ["UseMassTransit", "UseNHibernate"]
  }
}
```

β†’ Enables reuse of the same instrumentation logic across future flows with similar structure.


🧠 Memory System Integration

The agent queries:

  • πŸ“₯ Past execution-metadata.json files for prior trace context
  • πŸ“š Stored logging and OTEL config examples from prior successful services
  • 🧠 Semantic memory (via vector DB) to reuse optimal injection strategies for similar modules
  • πŸ” Redaction pattern corpus from the Security Engineer Agent

All knowledge is versioned, tagged by skillId, and retrievable by other agents.


In short: The Observability Engineer Agent uses a codified, reusable observability knowledge base to make every service diagnosable, measurable, and safe-by-default β€” across domains, tenants, and agents.


πŸ” Process Flow

The Observability Engineer Agent follows a deterministic and modular design-time flow that enables it to inspect a scaffolded module, inject observability capabilities, validate conformance, and emit traceable outputs.

This flow is both repeatable and adaptable based on the service type (e.g., REST API, Azure Function, Actor Host) and blueprint metadata.


🧭 Standard Agent Flow

```mermaid
flowchart TD
    Start([Start Agent Execution])
    LoadBlueprint[Load ServiceBlueprint.yaml]
    LoadContext[Load traceId, agentId, tenantId, moduleId]
    DetectType[Determine service type & runtime model]
    ScanFiles[Scan generated service codebase]
    PlanInjection["Plan observability hooks (logs, spans, metrics)"]
    InjectTelemetry[Inject logging + tracing + metrics + health checks]
    Validate[Validate observability compliance]
    EmitMetadata[Generate execution-metadata.json]
    EmitEvent[Emit ObservabilityReady event]
    End([Finish & pass control to DevOps/Studio])

    Start --> LoadBlueprint
    LoadBlueprint --> LoadContext
    LoadContext --> DetectType
    DetectType --> ScanFiles
    ScanFiles --> PlanInjection
    PlanInjection --> InjectTelemetry
    InjectTelemetry --> Validate
    Validate --> EmitMetadata
    EmitMetadata --> EmitEvent
    EmitEvent --> End
```

πŸ” Detailed Steps

| Step | Description |
| --- | --- |
| 1. Load Blueprint | Parses the service definition to understand service type, enabled features (e.g., MassTransit, NHibernate), and observability expectations. |
| 2. Load Execution Context | Retrieves traceId, agentId, skillId, tenantId, and moduleId for trace enrichment and Studio integration. |
| 3. Detect Runtime Type | Determines what kind of system is being instrumented (REST, Actor, Function, etc.) to select the proper injection strategy. |
| 4. Scan Files | Walks the scaffolded project files (e.g., Startup.cs, Handlers, Controllers) and identifies injection points. |
| 5. Plan Hook Injection | Maps observability hooks to code locations using predefined templates and previously successful patterns from memory. |
| 6. Inject Telemetry | Adds ILogger usage and config, OTEL spans via decorators or middleware, Prometheus metrics with auto-labeling, and health checks and endpoints. |
| 7. Validate Compliance | Verifies that all required identifiers (traceId, etc.) are injected, metrics endpoints are exposed, and sensitive fields are redacted. |
| 8. Emit Metadata | Writes execution-metadata.json to persist agent run context and injection results for traceability. |
| 9. Emit ObservabilityReady | Publishes an event signaling successful instrumentation so DevOps, QA, or Studio workflows can continue. |
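Step 8 (Emit Metadata) can be sketched as a plain record serialized with System.Text.Json. The field names mirror the execution-metadata.json examples in this spec; the record shape and emitter class are otherwise assumptions:

```csharp
using System.IO;
using System.Text.Json;

// Field names mirror the spec's execution-metadata.json examples.
public record ExecutionMetadata(
    string TraceId, string AgentId, string SkillId, string ModuleId,
    long DurationMs, string Status, string[] OutputFiles);

public static class MetadataEmitter
{
    public static string Emit(ExecutionMetadata metadata, string outputDir)
    {
        var json = JsonSerializer.Serialize(metadata, new JsonSerializerOptions
        {
            WriteIndented = true,
            // camelCase so properties serialize as traceId, agentId, ...
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });
        File.WriteAllText(Path.Combine(outputDir, "execution-metadata.json"), json);
        return json;
    }
}
```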

🧠 Context Awareness Throughout

  • Decisions vary based on:
    • Runtime type (API vs Actor vs Function)
    • Tenant-specific observability policies
    • PII sensitivity in blueprint inputs
    • Memory reuse from prior services

πŸ“˜ Example Process Snapshot

```text
[βœ“] Service type: REST API
[βœ“] Traced methods injected: 5
[βœ“] Metrics endpoint added: /metrics
[βœ“] Health check: /healthz + 2 probes
[βœ“] Logger enriched with traceId, agentId, moduleId
[βœ“] Redaction policy: pii masking enabled
[βœ“] execution-metadata.json generated
[βœ“] Event emitted: ObservabilityReady
```

In summary: The Observability Engineer Agent follows a structured, adaptive flow that transforms scaffolded code into a runtime-observable, traceable, and policy-compliant system module β€” ready for deployment and Studio introspection.


🧠 Kernel Skills

The Observability Engineer Agent relies on a focused set of Semantic Kernel skills to carry out its responsibilities. These skills represent atomic, reusable, and orchestratable capabilities that the agent invokes during service instrumentation.

Each skill maps to a well-defined behavior such as injecting spans, emitting structured logs, validating telemetry, or generating metadata. Together, they form the operational vocabulary of the agent.


🧩 Core Skills

| Skill Name | Purpose |
| --- | --- |
| InjectTraceDecorators | Adds OpenTelemetry span instrumentation to controllers, handlers, consumers, and background workers. |
| GenerateExecutionMetadata | Produces a complete execution-metadata.json file with traceId, agentId, moduleId, durationMs, and injected components. |
| EmitLogConfiguration | Generates or modifies logging setup (e.g., Serilog, ILogger) to include trace-enriched and structured logs. |
| InjectMetricCounters | Adds Prometheus-compatible counters, histograms, or gauges to services and injects /metrics endpoints. |
| AddHealthCheckProbes | Registers and configures /healthz, /readyz, and any blueprint-defined health endpoints. |
| ValidateObservabilityReady | Verifies that spans, logs, metrics, and trace context are present and conform to ConnectSoft standards. |
| EmitObservabilityEvent | Emits the ObservabilityReady event with a summary payload for downstream workflows. |
| ApplyRedactionPolicies | Applies masking or removal logic to PII-tagged fields during structured log emission setup. |
| ScanTelemetryViolations | Detects missing traceId values, unstructured log patterns, and improperly scoped spans. |
| ReuseTelemetryTemplate | Retrieves and applies telemetry injection patterns from previously successful modules via semantic memory. |

πŸ§ͺ Sample Skill Invocation Chain

🧠 Agent Execution Trace:

```text
β†’ InjectTraceDecorators
β†’ InjectMetricCounters
β†’ EmitLogConfiguration
β†’ AddHealthCheckProbes
β†’ ApplyRedactionPolicies
β†’ GenerateExecutionMetadata
β†’ ValidateObservabilityReady
β†’ EmitObservabilityEvent
```

Each skill is traceable via skillId and scoped by moduleId, traceId, and agentId, enabling precise execution replay and telemetry correlation.


πŸ” Skill Composition and Orchestration

  • Skills are:
    • Composable β†’ used individually or bundled into orchestration plans.
    • Configurable β†’ adapt based on blueprint and policy profile.
    • Idempotent β†’ safe to re-run without duplicating effects.
    • Traceable β†’ each invocation emits telemetry of its own (the agent consumes the same observability it injects).

🧠 Reuse via Skill Memory

  • The agent stores:
    • Prior successful metrics injection strategies
    • Effective health check setups for similar service topologies
    • Optimized log configuration samples from sibling modules
  • Skills like ReuseTelemetryTemplate fetch and reuse this knowledge from vector DB + metadata index.

In summary: The Observability Engineer Agent is powered by a precision toolbelt of telemetry-focused kernel skills, each enabling it to transform uninstrumented modules into observable, validated, and Studio-integrated assets β€” autonomously.


βš™οΈ Technology Stack

The Observability Engineer Agent uses a modern, telemetry-driven, .NET-compatible technology stack, tightly integrated into the ConnectSoft AI Software Factory. The stack is selected for its:

  • Compatibility with generated modules (REST, gRPC, Actor, Function)
  • Support for trace enrichment, structured logs, and metrics
  • Alignment with cloud-native, multi-tenant, and OpenTelemetry-first practices
  • Extensibility for agent-based injection, testing, and validation

🧩 Runtime Technologies Targeted

| Area | Technology |
| --- | --- |
| Application Framework | ASP.NET Core (.NET 8) |
| Tracing and Instrumentation | OpenTelemetry SDK (AddOpenTelemetry, ActivitySource) |
| Logging | Serilog (ILogger, structured sinks, enricher support) |
| Metrics | Prometheus.Net, Meter, Histogram, Counter, Gauge |
| Health Checks | Microsoft.Extensions.Diagnostics.HealthChecks, custom probes |
| Containerization | Docker, with /metrics and /healthz exposed as K8s probes |
| Observability Output | execution-metadata.json, logs, OTLP spans, metrics endpoint |
| Studio Integration | Trace Explorer, Metric Dashboards, Policy Violation Viewer |

🧠 Agent Infrastructure Stack

| Component | Stack Element |
| --- | --- |
| Execution Environment | Semantic Kernel + C# planner bindings |
| Memory Layer | Azure AI Search / Qdrant (semantic memory), Blob Storage (execution metadata) |
| Validation Runtime | In-process .NET analyzers and telemetry pattern matchers |
| CI/CD Integration | YAML pipelines emit metrics and validation results, integrated with Azure DevOps and Studio |

πŸ“˜ Example Libraries and Tools Used

- Microsoft.Extensions.Logging
- OpenTelemetry.Instrumentation.AspNetCore
- OpenTelemetry.Exporter.Console / OTLP
- Serilog.Sinks.Console / Serilog.Sinks.ApplicationInsights
- Prometheus.AspNetCore
- App.Metrics (fallback metrics support)
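Using the libraries above, a minimal sketch of the Serilog bootstrap the agent might emit is shown below, assuming the console sink plus context enrichment; the actual sinks and template vary per blueprint and tenant policy:

```csharp
using Serilog;

// Minimal structured-logging bootstrap (illustrative, console sink only).
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()                        // picks up traceId etc. pushed via LogContext
    .Enrich.WithProperty("moduleId", "booking-service")
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}")
    .CreateLogger();

Log.Information("Telemetry bootstrap complete");
Log.CloseAndFlush();
```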

🧱 Injection Compatibility Matrix

| Module Type | Supported Tooling |
| --- | --- |
| REST APIs | OTEL spans via middleware + Serilog |
| gRPC Services | ActivitySource + method instrumentation |
| Azure Functions | Manual ActivitySource, structured ILogger |
| Actor Services | Trace injection via message handlers |
| Background Jobs | Task instrumentation, retry metrics |

πŸ“Š Export Targets

| Signal Type | Exported To |
| --- | --- |
| Logs | Application Insights / OTEL log processor |
| Traces | OTLP collector, Studio Trace Viewer |
| Metrics | Prometheus endpoint (/metrics), Grafana |
| Events | ConnectSoft Event Bus β†’ ObservabilityReady, PolicyViolation |

In summary: The Observability Engineer Agent is built for deep integration into modern .NET telemetry, using OpenTelemetry, Serilog, and Prometheus to enforce traceability across all generated modules β€” ready for cloud-native and agent-aware deployments.


πŸ’¬ System Prompt

The system prompt is the foundational instruction injected into the Observability Engineer Agent when it is initialized within a ConnectSoft orchestration flow. It sets the persona, responsibility scope, and expected behaviors of the agent across all invocations.

This prompt ensures the agent always acts as a design-time observability enforcer β€” not a runtime participant β€” and aligns every action with ConnectSoft’s observability-first, multi-tenant, and traceable-by-default principles.


🧠 Default System Prompt (English – Markdown format)

```markdown
# 🎯 Role: Observability Engineer Agent

You are the Observability Engineer Agent in the ConnectSoft AI Software Factory.

Your job is to ensure that every generated module β€” including REST APIs, gRPC services, Azure Functions, actors, orchestrators, and background workers β€” is **fully observable at runtime**.

You operate at **design time**, analyzing generated code and configuration to inject all required telemetry, including:

- OpenTelemetry tracing spans with `traceId`, `agentId`, and `moduleId`
- Structured logging using Serilog or ILogger with enrichment and redaction
- Prometheus-compatible metrics (`http_requests_total`, `agent_execution_duration_seconds`, etc.)
- Health check endpoints (`/healthz`, `/readyz`, `/livez`) with status validation
- An `execution-metadata.json` file describing the trace context and injection results

You must verify that the generated outputs:

- Are compliant with ConnectSoft observability policy
- Do not leak secrets or PII in logs
- Include all required metadata for Studio trace and dashboard views
- Expose the correct endpoints and export formats

If observability violations are found, document them clearly in metadata and trigger failure or warning responses. If injection is successful, emit an `ObservabilityReady` event with a summary of what was instrumented.

You are responsible for making the software **measurable, debuggable, and trusted** β€” before it is released.
```

πŸ“Ž Prompt Metadata

| Key | Value |
| --- | --- |
| agentId | observability-engineer |
| roleType | design-time instrumentation |
| category | QA, DevOps-Ready, Traceability |
| activatesOnEvent | ServiceScaffolded, AgentOutputReady |
| emitsEvent | ObservabilityReady, ObservabilityPolicyViolated |

🧠 Why the Prompt Matters

  • Clarifies that the agent does not participate at runtime
  • Prevents accidental regeneration of runtime behavior (e.g., health check consumers)
  • Ensures consistent behavior across modules, domains, and teams
  • Establishes traceability patterns expected by downstream tools like Studio, DevOps, and Security Orchestrator

In short: The system prompt defines the Observability Engineer Agent’s identity, mission, and operating discipline β€” ensuring all modules it touches are monitorable, diagnosable, and platform-compliant by design.


πŸ’¬ Input Prompt Template

The input prompt template is the dynamic, structured instruction sent to the agent during execution. It integrates contextual blueprint data, execution metadata, and platform-specific configuration to guide the agent's behavior during a specific observability injection task.

This template is completed by the orchestrator or coordinator agent, combining:

  • The service type and its runtime characteristics
  • The agent’s trace and skill identifiers
  • Optional observability policy overrides
  • Feature toggles and tenant-specific settings

πŸ“‘ Template Format (Markdown + YAML Hybrid)

````markdown
# πŸ›°οΈ Observability Injection Task

You are the Observability Engineer Agent.

## 🧩 Target Module
- Module ID: `{{moduleId}}`
- Tenant ID: `{{tenantId}}`
- Trace ID: `{{traceId}}`
- Agent ID: `observability-engineer`
- Skill ID: `InjectTelemetry`
- Runtime Type: `{{serviceType}}`

## πŸ“¦ Blueprint Features
```yaml
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
serviceType: {{serviceType}}
```

## βš™οΈ Observability Requirements
```yaml
metrics:
  enabled: true
  exporters:
    - Prometheus
tracing:
  enabled: true
  exporter: OTLP
logging:
  structured: true
  redactSensitive: true
  piiTags:
    - "sensitivity: pii"
    - "secret"
healthChecks:
  endpoints:
    - /healthz
    - /readyz
    - /livez
```

## βœ… Expected Outcomes

1. Add OTEL spans to handlers, endpoints, and consumers.
2. Generate a `/metrics` endpoint exposing Prometheus counters.
3. Emit structured logs with enrichment and redaction enabled.
4. Register all requested health check endpoints.
5. Produce `execution-metadata.json` summarizing injection results.
6. Emit the `ObservabilityReady` event with outcome metadata.

Please inject all telemetry as per ConnectSoft observability standards and log any policy violations found.
````

πŸ§ͺ Example Populated Prompt

```markdown
Module ID: invoice-service
Tenant ID: petsure-001
Trace ID: trace-2025-05-19-invoice123
Service Type: RestApi
Blueprint Features: UseMassTransit, UseNHibernate

Expect: OTEL tracing, Prometheus metrics, /metrics + /healthz endpoints, Serilog config with redaction
```

🧠 Why the Input Prompt Template Matters

  • Enables context-specific injection plans (e.g., add health checks only if required)
  • Ensures observability remains configurable, declarative, and predictable
  • Supports tenant-aware policy variations and compliance rules
  • Powers Studio dashboards with correct module tagging and health visibility

In short: The input prompt template is the instructional payload that lets the Observability Engineer Agent act precisely, securely, and traceably β€” tailored to each generated module and tenant.


πŸ“€ Output Expectations

This section defines what the Observability Engineer Agent is expected to produce at the end of its execution. Outputs must be machine-parseable, CI/CD-consumable, and Studio-integrated β€” with strict tagging and compliance to observability-first standards.

All outputs contribute directly to runtime visibility, security, and traceability for agent-generated SaaS modules.


βœ… Output Deliverables

| Output Artifact | Description |
| --- | --- |
| 🧬 execution-metadata.json | JSON file with trace metadata, skillId, duration, injected elements, and status. Used by DevOps, Studio, and audit tooling. |
| 🧠 OTEL Tracing Injection | Modified .cs files with ActivitySource.StartActivity(...) spans around request handlers, consumers, and workflows. |
| πŸ“Š Prometheus Metrics Endpoint | Code added to expose /metrics with per-request counters, latency histograms, and labels (traceId, tenantId, etc.). |
| πŸ“œ Structured Logging Configuration | Serilog or ILogger settings injected into Startup.cs or appsettings.logging.json, with enrichers (traceId, moduleId, agentId) and redaction support. |
| 🩺 Health Check Endpoints | Configuration and controller endpoints for /healthz, /readyz, and /livez, mapped to injected dependency probes. |
| πŸ§ͺ Observability Validation Report (optional) | Diagnostic output listing any failed injections or redaction gaps (e.g., unstructured Console.WriteLine, missing traceId). |
| πŸ“© Event: ObservabilityReady | Event published to the coordination layer once injection is complete and validated. Triggers DevOps, QA, or Studio actions. |
| πŸ“„ README.md Observability Section | Appends or generates a section describing observability behavior (e.g., exposed endpoints, span behavior, metrics). |

🧾 Example execution-metadata.json

```json
{
  "traceId": "trace-2025-05-19-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "invoice-service",
  "status": "Success",
  "durationMs": 1875,
  "exportedMetrics": [
    "http_requests_total",
    "agent_execution_duration_seconds"
  ],
  "injected": {
    "spans": 4,
    "metrics": 3,
    "logEnrichers": 5,
    "healthEndpoints": 2
  },
  "outputFiles": [
    "Startup.cs",
    "execution-metadata.json",
    "MetricsExporter.cs"
  ]
}
```

πŸ” Naming and File Placement Rules

| File Type | Placement Path |
|---|---|
| `execution-metadata.json` | `/modules/{moduleId}/metadata/` |
| Code files (`*.cs`) | `/Application/`, `/Infrastructure/`, or `/API/` layer |
| `logger.json` / `appsettings.*.json` | `/Configuration/Logging/` or root |
| README.md additions | Appended under the "Observability" section |

🧠 Behavior Contracts

All outputs must:

  • Be deterministic (replayable)
  • Be scoped by tenantId, traceId, and moduleId
  • Comply with platform observability rules (e.g., PII redaction, traceId required)
  • Be linkable to agent execution in Studio (dashboards, trace viewer, metadata inspector)

In short: The agent must output everything needed for the service to be debuggable, monitorable, auditable, and compliant β€” starting with instrumentation and ending with trace-ready metadata.


🧠 Memory Model

The Observability Engineer Agent uses a hybrid memory architecture to support:

  • πŸ“₯ Retrieval of prior telemetry strategies
  • πŸ” Reuse of successful injection patterns
  • πŸ“Š Storage of execution outputs for traceability
  • 🧠 Semantic similarity for selecting observability templates

Memory allows the agent to evolve intelligently and inject proven, context-aware telemetry logic into new modules, based on past executions, blueprint patterns, and skill performance.


🧩 Types of Memory Used

| Memory Type | Purpose |
|---|---|
| πŸ“‚ Execution Metadata Store | Stores `execution-metadata.json` for each agent run, including injected spans, metrics, duration, and trace identifiers. |
| πŸ“š Structured Metadata Index | Tracks telemetry components, enrichment rules, redaction violations, and injection outcomes. Indexed by agentId, moduleId, skillId, etc. |
| 🧠 Semantic Memory (Vector DB) | Retrieves similar past modules and injection plans using blueprint embeddings (text-embedding-ada-002, etc.). |
| 🧾 Log Configuration Corpus | Stores reusable logging strategies, Serilog profiles, structured enrichers, and redaction DSLs. |
| πŸ“Š Metrics Dictionary | Canonical list of standard metrics (e.g., `agent_execution_duration_seconds`) and their past usage patterns. |
| πŸ” Telemetry Pattern Library | Templates for injecting OTEL spans, metrics, and health checks, with versioned examples across runtimes. |

πŸ“˜ Example Semantic Memory Chunk

```json
{
  "traceId": "trace-2025-05-15-xyz789",
  "moduleId": "booking-service",
  "agentId": "observability-engineer",
  "skillId": "InjectMetricCounters",
  "text": "Exposed /metrics with http_requests_total, tenant-scoped labels",
  "embedding": [0.13, 0.74, -0.21, ...],
  "tags": ["metrics", "Prometheus", "traceable", "REST"],
  "status": "Success"
}
```
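The retrieval step over this semantic memory can be sketched as a cosine-similarity lookup over the stored embeddings. This is an illustrative stand-in only: the in-memory `chunks` list and the `most_similar` helper are assumptions, not the factory's actual vector DB client.

```python
import math

# Illustrative sketch: pick the most similar past injection plan from
# semantic memory by cosine similarity over blueprint embeddings.
# A plain list stands in for the vector DB; chunk shapes follow the
# example memory record above (truncated to 3 dimensions).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(query_embedding, chunks):
    # Highest cosine score wins; ties resolve to the first chunk.
    return max(chunks, key=lambda c: cosine(query_embedding, c["embedding"]))

chunks = [
    {"moduleId": "booking-service", "embedding": [0.13, 0.74, -0.21]},
    {"moduleId": "billing-service", "embedding": [-0.52, 0.10, 0.88]},
]
print(most_similar([0.10, 0.70, -0.20], chunks)["moduleId"])  # booking-service
```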

πŸ—‚οΈ Memory Scoping Strategy

Memory is scoped per:

  • traceId, projectId, sprintId
  • agentId, skillId, moduleId
  • tenantId, environment, serviceType
  • outputType (metrics, spans, logs, probes)
  • status (success, failure, warning)

This enables precision filtering, similarity search, and trace-based graph linking.


πŸ“Ž Agent Reads from Memory To:

| Use Case | Memory Used |
|---|---|
| Reuse successful injection code | Semantic memory from past modules with similar structure |
| Avoid repeat redaction issues | Redaction validation history (tagged `sensitivity: pii`) |
| Track telemetry conformance | Linter outcomes and prior observability violations |
| Generate baseline metrics | Metrics pattern dictionary (usage frequency + exporters) |

πŸ“€ Agent Writes to Memory:

  • execution-metadata.json with full run summary
  • log-config.json, metrics.json, otel-span-plan.json (optional)
  • ObservabilityReady event with memory pointer
  • Traces tagged in Studio with agentId: observability-engineer

πŸ” Memory Behavior Characteristics

| Property | Value |
|---|---|
| Versioned | βœ… Yes (each telemetry plan and injection is tracked by version + timestamp) |
| Replayable | βœ… Execution history can be replayed to re-inject identical instrumentation |
| Composable | βœ… Multiple memory entries may be merged to generate a new injection strategy |
| Auditable | βœ… All memory updates are logged and linked to traceId and executionId |

In short: The Observability Engineer Agent is memory-empowered β€” capable of reasoning from historical telemetry, reusing known-good observability layouts, and ensuring continuous improvement across all injected services.


βœ… Validation Mechanisms

Validation is a critical phase in the Observability Engineer Agent’s lifecycle. After telemetry components are injected, the agent performs a design-time verification pass to ensure:

  • All observability elements are present and correctly scoped
  • No violations of ConnectSoft traceability or redaction policy exist
  • The generated outputs are production-safe, Studio-visible, and traceable
  • Standards for OpenTelemetry, Serilog, metrics, and health endpoints are respected

πŸ” What Is Validated?

| Component | Validation Criteria |
|---|---|
| Tracing (OTEL) | traceId, agentId, and moduleId must be included in all spans. `ActivitySource` must be correctly configured. |
| Logging | Logs must be structured (JSON or key-value), enriched with context, and not emit `Console.WriteLine`. |
| Metrics (Prometheus) | `/metrics` must be exposed, with standard counters and tenant/module labels present. |
| Health Checks | Required probes (`/healthz`, `/readyz`) must exist and return 200 OK. |
| Execution Metadata | `execution-metadata.json` must include trace context, status, duration, and injected item counts. |
| PII Redaction | Any field marked `sensitivity: pii` or `secret` must not appear in plain logs. |
| Policy Compliance | Validates against the tenant- or environment-specific observability policy (e.g., redaction, logging level, required metrics). |

πŸ§ͺ Validation Workflow

```mermaid
flowchart TD
    Start[Injected Code]
    RunTests[Run Linter Checks]
    CheckSpans[Verify Spans & Trace Enrichment]
    CheckLogs[Check Logging Format & Redaction]
    CheckMetrics[Verify Prometheus Labels & Endpoint]
    CheckHealth[Validate Health Probes & Startup]
    GenerateReport[Create validation summary]
    StatusCheck{All Passed?}
    EmitSuccess[Emit ObservabilityReady]
    EmitViolation[Emit ObservabilityPolicyViolated]

    Start --> RunTests
    RunTests --> CheckSpans --> CheckLogs --> CheckMetrics --> CheckHealth --> GenerateReport --> StatusCheck
    StatusCheck -->|Yes| EmitSuccess
    StatusCheck -->|No| EmitViolation
```

πŸ“„ Example Validation Report (Structured JSON)

```json
{
  "traceId": "trace-2025-05-19-obs123",
  "agentId": "observability-engineer",
  "status": "Success",
  "validated": {
    "structuredLogs": true,
    "otelSpans": 6,
    "metricsEndpoint": "/metrics",
    "labelsPresent": ["tenantId", "moduleId"],
    "healthEndpoints": ["/healthz", "/readyz"]
  },
  "redactionCheck": {
    "sensitiveFieldLeak": false
  },
  "policyVersion": "obs-policy-v2.3"
}
```

🧠 Validation Tools and Heuristics

  • Regex scanners for raw Console.WriteLine, string.Format, and unstructured output
  • Reflection-based checks for OTEL ActivitySource usage and enrichment
  • Static analysis of DI container (AddOpenTelemetry, AddHealthChecks)
  • Redaction enforcement on blueprint-defined fields with sensitivity tags
  • Metrics validator that simulates /metrics scrape and checks required counters

❌ What Causes Validation Failure?

| Violation | Action Taken |
|---|---|
| Missing traceId in spans | Injection re-run or fail |
| Unstructured log output detected | Marked as warning or error |
| `/metrics` endpoint not found | Error β€” cannot emit ObservabilityReady |
| Sensitive field (password, email) unredacted | Critical error β€” blocks release |
| Health check not returning 200 OK | Warning; may continue in test mode |
| `execution-metadata.json` not created | Hard failure |

πŸ“£ Event Emission Based on Outcome

| Outcome | Event Emitted |
|---|---|
| All checks pass | ObservabilityReady |
| Minor warnings | ObservabilityReady + warnings |
| Hard failure | ObservabilityPolicyViolated |

In short: Validation is how the agent earns trust from the platform β€” ensuring that every observability injection is complete, compliant, and safe before it reaches runtime or CI/CD.


πŸ” Retry & Correction Flow

Even though the Observability Engineer Agent operates at design-time with deterministic inputs, errors may occur due to:

  • Incomplete or malformed code scaffolding
  • Policy mismatches or unexpected configuration states
  • Toolchain versioning issues (e.g., outdated OTEL packages)
  • Agent misalignment due to blueprint evolution or conflicting decorators

Therefore, the agent is equipped with a built-in correction strategy, allowing it to retry safely, regenerate selectively, or escalate to human review.


πŸ”§ Retry and Correction Triggers

| Condition | Triggers Retry? |
|---|---|
| `execution-metadata.json` missing | βœ… Yes (generate fallback) |
| `/metrics` endpoint not found | βœ… Yes (re-inject exporter) |
| traceId not propagated in span | βœ… Yes (auto-patch span code) |
| Unstructured logging detected | βœ… Yes (replace with structured pattern) |
| PII redaction failure | 🚨 No retry β€” escalate to policy violation |
| Health check endpoint not working | ⚠ Retry once with basic default probes |
| Logger misconfiguration (missing sink) | βœ… Re-inject log sink |
| Previous attempt exceeded durationBudget | ❌ Do not retry β€” emit failure report |
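The trigger table above can be collapsed into a small decision function. The condition keys and the two-retry budget used here are assumptions for illustration, not the agent's actual identifiers:

```python
# Illustrative sketch: map retry/correction triggers to a yes/no decision.
# Condition names are hypothetical keys standing in for the table rows.
RETRYABLE = {
    "metadataMissing",        # execution-metadata.json missing
    "metricsEndpointAbsent",  # /metrics endpoint not found
    "traceIdNotPropagated",   # span missing trace context
    "unstructuredLogging",    # Console.WriteLine detected
    "loggerSinkMissing",      # logger misconfiguration
}
NON_RETRYABLE = {"piiRedactionFailure", "durationBudgetExceeded"}

def should_retry(condition: str, attempt: int, max_retries: int = 2) -> bool:
    # Escalate immediately on policy-critical failures or exhausted budget.
    if condition in NON_RETRYABLE or attempt >= max_retries:
        return False
    return condition in RETRYABLE

print(should_retry("metricsEndpointAbsent", attempt=0))  # True
print(should_retry("piiRedactionFailure", attempt=0))    # False
```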

πŸ§ͺ Retry & Correction Flow Diagram

```mermaid
flowchart TD
    Start[Initial Injection Attempt]
    Validate[Run Observability Validators]
    Pass{Validation Success?}
    RetryConditions[Check Retryable Errors]
    Correct[Auto-Correct Observability Deficiencies]
    Retry[Re-run Injection Steps]
    EndSuccess[Emit ObservabilityReady]
    EndFail[Emit ObservabilityPolicyViolated]

    Start --> Validate --> Pass
    Pass -->|Yes| EndSuccess
    Pass -->|No| RetryConditions
    RetryConditions -->|Retryable| Correct --> Retry --> Validate
    RetryConditions -->|Not Retryable| EndFail
```

πŸ› οΈ Auto-Correction Strategies

| Issue Detected | Auto-Fix Applied |
|---|---|
| Missing traceId in spans | Add span enrichment from DI context |
| Logger lacks enrichment fields | Re-inject `Enrich.FromLogContext()` |
| `/metrics` not bound | Add `.MapMetrics()` to endpoint config |
| Missing OTEL exporter | Add default OTLP exporter with fallback port |
| Health probe handler absent | Generate `BasicHealthCheck.cs` with 200 OK stub |
| ILogger used incorrectly | Wrap in Serilog with `ForContext()` enrichment |

πŸ’Ύ Execution Metadata on Retry

Each retry is logged with a new executionId under the same traceId:

```json
{
  "traceId": "trace-2025-05-19-obs123",
  "executionId": "exec-retry-002",
  "agentId": "observability-engineer",
  "retryOf": "exec-001",
  "status": "SuccessAfterRetry",
  "issuesResolved": ["otelSpanMissing", "metricsEndpointAbsent"]
}
```

β†’ Enables Studio trace viewer and test history to highlight retried operations.


🚨 Escalation Triggers (Non-Retryable)

If the following are detected, the agent emits a violation event and halts:

  • PII not redacted due to logic gap or bypass
  • Conflicting observability configurations
  • Missing blueprint context (e.g., tenantId undefined)
  • File system permission issues preventing injection
  • Infinite retry loop detected (retry count > 2)

πŸ“£ Events Emitted

| Outcome | Event |
|---|---|
| Retry successful | ObservabilityReady with `retries: 1` |
| Retry failed but partially OK | ObservabilityReadyPartial |
| Retry failed critically | ObservabilityPolicyViolated |

In short: The Observability Engineer Agent is resilient β€” it detects failures, repairs them autonomously, and only escalates when critical policy risks are involved. It ensures no broken or untraceable service is deployed unnoticed.


🀝 Collaboration Interfaces

The Observability Engineer Agent does not work in isolation. It actively collaborates with multiple other agents and platform components, ensuring observability is:

  • Injected at the right time
  • Aligned with other concerns (e.g., DevOps, QA, security)
  • Available for Studio dashboards
  • Used as part of continuous validation and platform scoring

This cycle defines how the agent communicates, responds to events, and integrates with other personas in the ConnectSoft agentic system.


πŸ”— Direct Agent Collaborations

| Collaborating Agent | Interaction Summary |
|---|---|
| 🧱 Microservice Generator Agent | Invokes observability injection after scaffold completion. Shares module path and blueprint trace. |
| 🧠 Backend Developer Agent | The code this agent generates (e.g., handlers, controllers) is instrumented for spans and metrics by the Observability Agent. |
| πŸ§ͺ Test Generator Agent | Consumes observability signals to drive test coverage checks (e.g., "traced path exists?", "log assertions?"). |
| πŸ”§ DevOps Engineer Agent | Reads `execution-metadata.json` and observability configuration to set up monitoring, alerting, and CI pipelines. |
| πŸ” Security Engineer Agent | Defines redaction policies and PII patterns that must be validated by the observability agent. |
| πŸ“¦ Documentation Writer Agent | Appends an "Observability" section to README.md using metadata and exposed endpoints detected by this agent. |
| 🧠 Studio Agent | Consumes the ObservabilityReady event and trace metadata to populate dashboards, graphs, and execution lineage. |

πŸ“¬ Events Emitted & Consumed

| Event Name | Role |
|---|---|
| ServiceScaffolded | πŸ”„ Consumed β†’ triggers observability injection |
| AgentOutputReady | πŸ”„ Consumed β†’ instrumentation of generated source code |
| ObservabilityReady | βœ… Emitted β†’ signals instrumentation complete and verified |
| ObservabilityPolicyViolated | ❌ Emitted β†’ signals agent failed to meet required standards |
| ExecutionMetadataGenerated | πŸ“€ Emitted β†’ trace metadata with injection details |

🧠 Shared Knowledge Contracts

| Interface | Used With | Purpose |
|---|---|---|
| `execution-metadata.json` | DevOps, Studio, QA | Cross-agent trace of what was injected, validated, and exported |
| `RedactionPolicy.yaml` | Security Agent | List of sensitive fields to redact during log injection |
| `MetricRegistry.json` | DevOps + Monitoring | Metrics emitted by the agent for setup in Prometheus / Grafana |
| TraceEventGraph | Studio + QA | OTEL spans and links used for visual timelines and audit trails |

🧭 Coordination Flow Example

```mermaid
sequenceDiagram
    participant Generator as Microservice Generator Agent
    participant Observability as Observability Engineer Agent
    participant DevOps as DevOps Engineer Agent
    participant Studio as Studio Agent

    Generator->>Observability: ServiceScaffolded
    Observability->>Observability: Inject Spans + Logs + Metrics
    Observability->>DevOps: Emit execution-metadata.json
    Observability->>Studio: Emit ObservabilityReady
```

βœ… Enables Studio to show trace-linked telemetry, and DevOps to monitor services out-of-the-box.


🧠 Platform Interfaces

  • Orchestration FSM β†’ The Observability Agent’s steps are registered as required before DevOpsAgent can run.
  • Studio API β†’ Pulls observability reports, status, and linkable spans per agent/module from metadata.
  • CI/CD Hooks β†’ Fails pipelines if ObservabilityPolicyViolated is received.

In short: The Observability Engineer Agent is a connective node in the agent graph β€” ensuring telemetry is not just injected, but consumed and validated by the broader ConnectSoft system.


πŸ“ƒ Agent Contract

The agent contract defines the formal, declarative specification that governs how the Observability Engineer Agent:

  • Is invoked by the orchestration system
  • Accepts and validates its inputs
  • Emits outputs, events, and metadata
  • Interoperates with other agents and pipelines
  • Aligns with ConnectSoft execution protocols

It enables the platform to treat the agent as a pluggable, traceable unit of automation, enforce runtime expectations, and replay or validate behavior during trace analysis.


πŸ“„ Contract Overview

```yaml
agentId: observability-engineer
role: "Design-Time Telemetry Injector"
category: "Observability, QA, Platform Readiness"
description: >
  Ensures every generated module is traceable, measurable, and diagnosable at runtime.
  Injects OTEL spans, structured logs, Prometheus metrics, and execution metadata.

triggers:
  - ServiceScaffolded
  - AgentOutputReady

inputs:
  - ServiceBlueprint.yaml
  - traceContext.json
  - ObservabilityPolicy.yaml (optional)
  - Previously generated source code

outputs:
  - execution-metadata.json
  - Updated source files with spans, metrics, logs
  - Prometheus endpoint (`/metrics`)
  - Health checks (`/healthz`, `/readyz`)
  - README.md telemetry summary (optional)
  - Event: ObservabilityReady
  - Event: ObservabilityPolicyViolated

skills:
  - InjectTraceDecorators
  - EmitLogConfiguration
  - InjectMetricCounters
  - AddHealthCheckProbes
  - GenerateExecutionMetadata
  - ValidateObservabilityReady
  - ApplyRedactionPolicies
  - EmitObservabilityEvent

memory:
  scope: [traceId, moduleId, skillId, tenantId]
  stores:
    - executionMetadataStore
    - telemetryInjectionPatterns (semantic memory)
    - redactionHistory
    - metricsUsageCorpus

validations:
  - Structured logs present and enriched
  - Required OTEL spans exist and propagate traceId
  - `/metrics` and `/healthz` endpoints exposed
  - PII fields redacted
  - execution-metadata.json generated and complete

version: "1.0.0"
status: active
```

βœ… Key Capabilities Declared in Contract

| Capability | Description |
|---|---|
| Declarative inputs/outputs | Ensures the orchestrator knows what must exist before and after agent execution |
| Trace-compliant event structure | All emitted events include traceId, agentId, moduleId, skillId |
| Retry-ready with memory linkage | Failed runs can use past memory to retry injection safely |
| FSM-aware behavior hooks | Used to slot the agent into finite state orchestration flows |
| Audit and security enforcement | Allows CI/CD pipelines to assert: no release without ObservabilityReady |

🧠 Example: Contract Usage by Orchestrator

```yaml
when: ServiceScaffolded
then:
  - agent: observability-engineer
    must:
      - emit: execution-metadata.json
      - trigger: ObservabilityReady
    fallback:
      - if: validationFailed
        emit: ObservabilityPolicyViolated
```

πŸ“¬ Events Declared in Contract

| Event | Description |
|---|---|
| ObservabilityReady | Signals successful injection and validation of telemetry |
| ObservabilityPolicyViolated | Raised when redaction, tracing, or metrics standards are not met |
| ExecutionMetadataGenerated | Emits trace-linked metadata about the current agent operation |

In short: The agent contract defines how the platform interfaces with the Observability Engineer Agent β€” treating it not just as code, but as a governed, orchestrated, and testable automation unit in the software factory.


🧭 Studio View Integration

The ConnectSoft Studio is the visual control center of the AI Software Factory. It shows:

  • Agent activity timelines
  • Execution flows and trace graphs
  • Metrics dashboards
  • Policy violations and retry history
  • Health of generated modules

The Observability Engineer Agent plays a critical role in ensuring that Studio can visualize runtime telemetry, validate execution metadata, and present module readiness with confidence.


🧩 Visual Elements Powered by This Agent

| Studio Feature | Powered by Agent Output |
|---|---|
| Trace Explorer | `execution-metadata.json`, OTEL span map |
| Observability Dashboard | Metrics summary: spans injected, metrics exposed, health endpoints live |
| Module Timeline View | Timestamps from `execution-metadata.json`, durationMs, retries |
| Telemetry Coverage Scorecard | Count of traced operations, metrics present, enrichment fields |
| Policy Compliance Heatmap | Warnings/errors from validation (e.g., missing traceId, unredacted PII) |
| README + Docs Viewer | Injected "Observability" section from README.md |
| Agent Retry History | executionId lineage with retry status per traceId |
| Redaction Violation Tracker | Logs showing detection of unmasked sensitive fields |

πŸ“Š Metrics Visualized in Studio

```text
agent_execution_duration_seconds{agentId="observability-engineer"}
otel_span_count{moduleId="invoice-service"}
log_enrichment_coverage{tenantId="petsure-001"}
metrics_endpoint_exposed{status="true"}
policy_violations_total{type="pii_unmasked"}
```

πŸ“ˆ Example UI Widgets

πŸ” Execution Trace Summary

```text
Module: invoice-service
Agent: observability-engineer
Status: βœ… Ready
Trace ID: trace-2025-05-19-invoice
Execution Duration: 1.84s
Spans Injected: 5
Metrics Exported: 3
Log Enrichers: traceId, agentId, tenantId, moduleId
```

πŸš₯ Observability Compliance Status

```mermaid
graph TD
  A[Startup.cs] -->|βœ“ OTEL Injected| B[Controller.cs]
  B -->|βœ“ Metrics Present| C[MessageConsumer.cs]
  C -->|⚠ PII Redaction Warning| D[Log Validator]
```

🧠 Required Metadata for Studio Hooks

| Field in `execution-metadata.json` | Purpose |
|---|---|
| traceId | Links to global orchestration view |
| agentId | Agent execution lane identification |
| moduleId | Highlights what service/module was affected |
| skillId | Skill-level timing and validation breakdown |
| status, durationMs | Timeline and performance overlays |
| injected.spans, metrics, etc. | Metric overlays in dashboard |
| violations[] | Triggers policy compliance heatmap |

πŸ“¬ Triggered Studio Events

| Event | Studio Effect |
|---|---|
| ObservabilityReady | Unlocks "Ready for Deployment" status on module |
| ObservabilityPolicyViolated | Displays violation markers, blocks CI/CD release |
| ExecutionMetadataGenerated | Enables drill-down timeline and retry inspection |

πŸ’‘ Additional Studio Visual Cues

  • Color-coded observability score badges (e.g., 5/5: spans, metrics, logs, probes, redaction)
  • Tooltips showing which OTEL spans were injected and by whom
  • Click-through to metrics like request latency, trace coverage per endpoint

In short: The Observability Engineer Agent gives Studio the data, structure, and trace metadata to visualize, inspect, and validate how observable every generated service is β€” turning telemetry into a first-class experience.


🧬 Traceability Schema

The traceability schema defines the core identifiers and telemetry fields that the Observability Engineer Agent injects, validates, and propagates across logs, spans, metrics, and metadata.

This schema ensures:

  • Every action is traceable across agents, modules, tenants, and executions
  • Studio, DevOps, and analytics tools can correlate events, metrics, and logs
  • Multi-tenant safety by isolating observability to tenant and module boundaries
  • Autonomous feedback loops are grounded in deterministic trace IDs

🧩 Core Identifiers

| Field | Description |
|---|---|
| traceId | Globally unique identifier for the end-to-end flow of a module or agent plan execution. |
| agentId | The persona executing the skill (e.g., observability-engineer). |
| skillId | The function performed (e.g., InjectTelemetry, GenerateExecutionMetadata). |
| moduleId | Logical service/module under instrumentation (e.g., invoice-service). |
| tenantId | Tenant-specific scoping identifier (e.g., petsure-001). |
| executionId | Unique ID for a single run or retry of the agent, under a trace. |
| environment | Target runtime context (dev, stage, prod). |
| status | Result of the skill run (Success, RetrySuccess, Violation, etc.). |
| durationMs | Time taken to complete the skill. |
| outputChecksum | Hash (SHA-256) of the emitted result set (files, config, metadata). |

πŸ“˜ Example Execution Metadata (Trace-Schema-Aligned)

```json
{
  "traceId": "trace-2025-05-19-xyz123",
  "agentId": "observability-engineer",
  "skillId": "InjectMetricCounters",
  "moduleId": "booking-service",
  "tenantId": "vetclinic-001",
  "executionId": "exec-042",
  "environment": "stage",
  "status": "Success",
  "durationMs": 1567,
  "outputChecksum": "sha256:ab12f8d4..."
}
```
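The `outputChecksum` field can be produced by hashing a canonical summary of the emitted artifacts. How the artifact set is canonicalized is an assumption in this sketch; the factory may equally hash the raw file bytes:

```python
import hashlib
import json

# Illustrative sketch: derive the outputChecksum field as a SHA-256
# over a deterministically ordered JSON summary of emitted artifacts,
# so identical output sets always produce identical checksums.
def output_checksum(artifacts: dict) -> str:
    canonical = json.dumps(artifacts, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

checksum = output_checksum({
    "Startup.cs": "file contents placeholder",
    "MetricsExporter.cs": "file contents placeholder",
})
print(checksum.startswith("sha256:"))  # True
```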

πŸ“‘ Telemetry Field Mapping

| Artifact Type | Fields Injected or Emitted |
|---|---|
| Structured Logs | traceId, agentId, moduleId, tenantId, skillId, status |
| OTEL Spans | traceId, moduleId, agentId (in ResourceAttributes) |
| Prometheus Metrics | traceId, tenantId, moduleId as labels |
| Health Endpoints | Metadata enriched with moduleId, optional traceId in headers |
| `execution-metadata.json` | All core traceability fields, including retry lineage |

πŸ”„ Retry-Aware Extensions

When retries occur:

```json
{
  "retryOf": "exec-041",
  "status": "SuccessAfterRetry",
  "issuesResolved": ["missingMetrics", "untracedSpan"]
}
```

β†’ Enables Studio to show retry lineage and DevOps to track recoverability.


πŸ” Security-Aware Scoping

  • Every observability record is tagged with tenantId and moduleId to ensure:

  • No cross-tenant leakage

  • Proper data partitioning in metrics, logs, dashboards
  • PII fields are explicitly excluded from trace schema unless redacted

πŸ“Š OpenTelemetry Resource Attributes Injected

```csharp
// Applied to the TracerProviderBuilder during OTEL registration.
tracerProviderBuilder.SetResourceBuilder(ResourceBuilder.CreateDefault()
    .AddService("booking-service")
    .AddAttributes(new[]
    {
        new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
        new KeyValuePair<string, object>("moduleId", "booking-service"),
        new KeyValuePair<string, object>("agentId", "observability-engineer")
    }));
```

βœ… Validation Enforcement

The agent runs conformance checks to ensure:

  • Every span/log/metric includes traceId
  • Every emitted event has agentId + skillId
  • All files, logs, and configs are uniquely traceable via executionId
  • Nothing is emitted without tenantId when required

In short: The traceability schema is the backbone of observability governance. It ensures every injected behavior is linked, searchable, accountable, and safe β€” across all modules, tenants, and execution flows.


🧾 Observability DSL / Metrics Profile

The Observability DSL (Domain-Specific Language) is a structured configuration format (typically YAML or JSON) that allows agents, orchestrators, and blueprints to declaratively define:

  • What observability features must be injected
  • Which metrics must be exposed
  • How spans and logs should be enriched
  • What policies should be enforced per tenant or environment

It gives the Observability Engineer Agent a declarative, machine-readable contract to guide and customize its behavior.


πŸ“˜ Example Observability DSL (YAML)

```yaml
observability:
  tracing:
    enabled: true
    exporter: otlp
    spanEnrichment:
      include:
        - traceId
        - agentId
        - moduleId
        - tenantId
  logging:
    type: structured
    provider: serilog
    enrichers:
      - traceId
      - tenantId
      - executionId
    redaction:
      piiFields:
        - email
        - ssn
        - password
  metrics:
    enabled: true
    endpoint: /metrics
    exporters:
      - prometheus
    counters:
      - name: http_requests_total
        labels: [tenantId, moduleId]
      - name: agent_execution_duration_seconds
        type: histogram
  healthChecks:
    enabled: true
    endpoints:
      - /healthz
      - /readyz
```

🧩 Why This DSL Matters

  • Customizability β†’ Blueprint or tenant-specific telemetry behavior
  • Separation of concerns β†’ Orchestrators configure, agent executes
  • Consistency β†’ Shared templates across hundreds of modules
  • Traceability β†’ DSL becomes part of execution-metadata.json
  • Governance β†’ Policy compliance is declared, not inferred

πŸ“Š Supported Metrics Profile

The DSL supports predefined metrics contracts used by the Observability Engineer Agent to:

  • Generate default instrumentation
  • Expose /metrics endpoint in a Prometheus-friendly format
  • Label metrics with scoped identifiers (e.g., tenantId, moduleId)

βœ… Common Metrics Supported

| Metric Name | Type | Description |
|---|---|---|
| `http_requests_total` | Counter | Number of HTTP requests received, labeled by route, status, tenant |
| `agent_execution_duration_seconds` | Histogram | Duration of agent skill execution |
| `trace_span_count_total` | Counter | Count of spans injected per module |
| `log_lines_emitted_total` | Counter | Total logs emitted, labeled by log level |
| `pii_redaction_violations_total` | Counter | Number of failed redaction attempts |
| `metrics_scrape_success_total` | Counter | Number of successful `/metrics` scrapes by Prometheus |
| `health_probe_status` | Gauge | 1 = healthy, 0 = failed, for `/healthz` and `/readyz` |

πŸ” DSL + Policy Integration

The DSL can be merged with, or overridden by, tenant- or environment-specific policies, for example:

```yaml
environments:
  prod:
    logging:
      redaction:
        required: true
    metrics:
      exporters: [prometheus]
      counters:
        - name: sla_violation_total
```

πŸ“₯ Input Sources for DSL

  • Blueprint ServiceBlueprint.yaml
  • Centralized policy registry
  • Observability profiles per tenant or industry (e.g., HIPAA-safe, PCI-ready)
  • Memory-injected recommendations from prior executions

🧠 Runtime Use in Agent

The agent parses the DSL into an execution plan:

```json
{
  "injectTracing": true,
  "injectMetrics": true,
  "spanAttributes": ["traceId", "moduleId"],
  "logEnrichers": ["tenantId", "executionId"],
  "metricDefinitions": [
    { "name": "http_requests_total", "labels": ["tenantId", "moduleId"] },
    { "name": "agent_execution_duration_seconds", "type": "histogram" }
  ]
}
```

β†’ Used internally by kernel skills like InjectMetricCounters and EmitLogConfiguration.
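A minimal sketch of that lowering step, assuming the YAML has already been parsed into a plain dict. The `build_plan` helper and its defaulting behavior are illustrative, not the kernel skill's actual implementation:

```python
# Illustrative sketch: lower the parsed Observability DSL into the
# execution-plan shape shown above. Missing sections default to
# disabled/empty (an assumption for this sketch).
def build_plan(dsl: dict) -> dict:
    obs = dsl.get("observability", {})
    tracing = obs.get("tracing", {})
    metrics = obs.get("metrics", {})
    logging_cfg = obs.get("logging", {})
    return {
        "injectTracing": tracing.get("enabled", False),
        "injectMetrics": metrics.get("enabled", False),
        "spanAttributes": tracing.get("spanEnrichment", {}).get("include", []),
        "logEnrichers": logging_cfg.get("enrichers", []),
        "metricDefinitions": metrics.get("counters", []),
    }

dsl = {"observability": {
    "tracing": {"enabled": True, "spanEnrichment": {"include": ["traceId", "moduleId"]}},
    "metrics": {"enabled": True, "counters": [{"name": "http_requests_total"}]},
    "logging": {"enrichers": ["tenantId", "executionId"]},
}}
print(build_plan(dsl)["injectTracing"])  # True
```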


πŸ“€ Output: DSL-Aware Metadata

Included in execution-metadata.json under observabilityDslSnapshot:

```json
"observabilityDslSnapshot": {
  "metrics": {
    "enabled": true,
    "exporters": ["prometheus"],
    "counters": ["http_requests_total", "agent_execution_duration_seconds"]
  },
  "logging": {
    "redaction": {
      "piiFields": ["email", "ssn"]
    }
  }
}
```

In short: The Observability DSL allows ConnectSoft to standardize, govern, and adapt observability across thousands of generated services β€” with the Observability Engineer Agent acting as its runtime executor and compliance enforcer.


πŸ” Policy & Security Guardrails

The Observability Engineer Agent is not only responsible for injecting telemetry β€” it must also enforce security, compliance, and tenant-safety constraints as part of every injection. This ensures the factory outputs are:

  • Safe by construction
  • Policy-aligned per tenant/environment
  • Compliant with privacy standards (e.g., masking, PII)
  • Auditable across all observability behaviors

βœ… Policy Enforcement Responsibilities

| Area | Guardrail Enforced |
|---|---|
| PII Redaction | Auto-detect fields like email, password, ssn and apply structured masking or redaction logic in logs. |
| Tenant Isolation | All metrics, logs, and spans must include tenantId and be scoped to moduleId β€” no cross-tenant exposure allowed. |
| Trace Enrichment | traceId, agentId, and skillId must be included in all telemetry events (logs, spans, metrics). |
| Log Format Compliance | Only structured logging (e.g., JSON) is permitted; usage of `Console.WriteLine` is flagged. |
| Health Check Safety | Probes must not leak infrastructure state or secrets in error responses. |
| Metric Label Validation | All exposed metrics must include required labels (e.g., tenantId, moduleId, statusCode) to support observability partitioning. |
| Sink Hardening | Logs must not be routed to insecure or public sinks unless explicitly allowed by config (e.g., stdout only in dev). |
| Trace Export Scoping | OTEL exports (OTLP) must be scoped by tenant/environment and routed to secure endpoints only. |

πŸ“„ Example Redaction Policy Config

```yaml
logging:
  redaction:
    enabled: true
    piiFields:
      - email
      - ssn
      - birthdate
    redactionFormat: "[REDACTED]"
    fallbackBehavior: blockInjectionIfViolated
```

β†’ Enforced during EmitLogConfiguration and ApplyRedactionPolicies skills.
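Applied at enforcement time, the policy amounts to walking each structured log event and masking the listed fields. The recursive `redact` helper below is a sketch under that assumption, not the actual skill implementation:

```python
import copy

# Illustrative sketch: mask policy-listed PII fields in a structured
# log event, recursing through nested objects and arrays. The input
# event is left unmodified; a redacted copy is returned.
def redact(event: dict, pii_fields, redaction_format="[REDACTED]") -> dict:
    result = copy.deepcopy(event)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in pii_fields:
                    node[key] = redaction_format
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result

event = {"traceId": "trace-2025-05-19-abc123",
         "user": {"email": "jane@example.com", "name": "Jane"}}
print(redact(event, {"email", "ssn", "birthdate"})["user"]["email"])  # [REDACTED]
```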


🚫 Violations That Block Release

| Violation | Severity | Action |
|---|---|---|
| PII fields not redacted | ❌ Critical | Block CI/CD |
| Missing tenantId in logs | ❌ Major | Fail validation |
| Logs written to public sink | ❌ Major | Emit ObservabilityPolicyViolated |
| Metrics without tenant labels | ❌ Major | Flag retry or error |
| Spans without traceId | ⚠ Warning | Retry once, escalate if persistent |

πŸ“’ Events Emitted on Violation

```json
{
  "event": "ObservabilityPolicyViolated",
  "traceId": "trace-2025-05-19-policyfail123",
  "agentId": "observability-engineer",
  "violations": [
    "pii_unmasked",
    "metrics_tenantLabel_missing"
  ],
  "status": "Blocked"
}
```

🧠 Policy Sources

  • ObservabilityPolicy.yaml from blueprint or tenant profile
  • Factory-wide policy registry (e.g., policies/global/observability/v2.0.yaml)
  • Dynamically loaded rules based on environment (e.g., prod vs dev)

πŸ“‹ Validated Enforcement Matrix

| Enforcement Area | Validated With |
|---|---|
| PII fields | Regex patterns + blueprint tags (`sensitivity: pii`) |
| Logging sinks | `appsettings.logging.json` parser |
| Metric labels | Simulated `/metrics` scrape and schema check |
| Span structure | OTEL trace analysis from a memory snapshot |
| Health endpoint safety | Static + runtime signature checks on the response body |
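The "simulated `/metrics` scrape" check can be sketched in Python as a pass over Prometheus exposition-format lines; the required label set and the simplified parsing are assumptions, not the agent's actual validator:

```python
import re

REQUIRED_LABELS = {"tenantId", "moduleId", "statusCode"}

def missing_labels(exposition: str) -> dict:
    """Map each metric in a simulated /metrics scrape to its missing required labels."""
    problems = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = re.match(r"(\w+)\{([^}]*)\}", line)
        if match:
            name = match.group(1)
            labels = {pair.split("=")[0] for pair in match.group(2).split(",") if pair}
        else:
            name = line.split()[0]  # bare metric with no labels at all
            labels = set()
        absent = REQUIRED_LABELS - labels
        if absent:
            problems[name] = absent
    return problems

scrape = '''# TYPE http_requests_total counter
http_requests_total{tenantId="petsure-001",moduleId="invoice-service",statusCode="200"} 42
agent_execution_duration_seconds 1.7
'''
print(missing_labels(scrape))
```

A metric missing any required label would be flagged for retry (re-injection of labels) or reported as a `Metrics without tenant labels` violation.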

πŸ”„ Policy-Aware Retry Behavior

  • Safe retry allowed for:
    • Missing metric labels
    • Log format mismatch
  • Hard fail (no retry) for:
    • Unredacted secrets or PII
    • Missing tenantId in logs or spans

In short: The Observability Engineer Agent is a security-first observability enforcer β€” applying tenant-safe, privacy-aware, and compliance-driven guardrails to every telemetry behavior in the generated system.


πŸ§ͺ Scenario: Instrumenting a Generated REST API Service

Let’s walk through a real-world example of how the Observability Engineer Agent fits into a full orchestration flow, from project initialization to deployment readiness, and the traceable, modular, and secure telemetry it injects along the way.


πŸ“˜ Scenario: "InvoiceService" Generation

🧭 Starting Point:

  • Tenant: petsure-001
  • Module: invoice-service
  • Trigger: ServiceScaffolded
  • Runtime Type: REST API (.NET 8)
  • Target Environment: staging
  • DSL Policy: Prometheus + OTEL + Serilog + Redaction required

πŸ”„ Agent Execution Flow

```mermaid
sequenceDiagram
    participant Vision as Vision Architect Agent
    participant Generator as Microservice Generator Agent
    participant Observability as Observability Engineer Agent
    participant DevOps as DevOps Engineer Agent
    participant Studio as Studio Agent

    Vision->>Generator: Emit ServiceBlueprint.yaml
    Generator->>Observability: Emit ServiceScaffolded
    Observability->>Observability: Inject Spans, Metrics, Logs, Health Checks
    Observability->>Observability: Validate Redaction, Trace Enrichment
    Observability->>Observability: Generate execution-metadata.json
    Observability->>Studio: Emit ObservabilityReady Event
    Observability->>DevOps: Provide Observability Metadata
```

βœ… Outputs Produced

| Artifact | Description |
|---|---|
| `Startup.cs` | Injected with `AddOpenTelemetry()`, `AddHealthChecks()`, `UseSerilog()` |
| `MetricsExporter.cs` | Exposes `/metrics` for Prometheus |
| `execution-metadata.json` | Includes `traceId`, `agentId`, status, injected items, and duration |
| `README.md` | New section titled "Observability" listing the available endpoints |
| `ObservabilityReady` event | Emitted and consumed by Studio, DevOps, and QA |
| OTEL + Prometheus logs and metrics | Tagged with `tenantId`, `traceId`, `moduleId`, `agentId` |
| Redaction validator report | Confirms all PII fields are masked in logs and config |

🧠 Snapshot of execution-metadata.json

```json
{
  "traceId": "trace-invoice-2025-05-19-abc123",
  "agentId": "observability-engineer",
  "skillId": "InjectTelemetry",
  "moduleId": "invoice-service",
  "executionId": "exec-007",
  "tenantId": "petsure-001",
  "status": "Success",
  "durationMs": 1734,
  "outputFiles": ["Startup.cs", "execution-metadata.json", "MetricsExporter.cs"],
  "spansInjected": 5,
  "metricsExposed": ["http_requests_total", "agent_execution_duration_seconds"],
  "healthChecks": ["/healthz", "/readyz"],
  "logEnrichers": ["traceId", "agentId", "tenantId", "moduleId"],
  "violations": []
}
```
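Downstream consumers (Studio, DevOps, QA) can sanity-check such a snapshot before acting on it. A minimal sketch, assuming a hypothetical required-key set rather than the factory's actual schema:

```python
REQUIRED_KEYS = {"traceId", "agentId", "moduleId", "tenantId", "status", "durationMs"}

def validate_metadata(metadata: dict) -> list:
    """Return a list of problems found in an execution-metadata.json payload."""
    problems = ["missing key: " + key for key in sorted(REQUIRED_KEYS - metadata.keys())]
    if metadata.get("status") == "Success" and metadata.get("violations"):
        problems.append("status is Success but violations are present")
    return problems

snapshot = {
    "traceId": "trace-invoice-2025-05-19-abc123",
    "agentId": "observability-engineer",
    "moduleId": "invoice-service",
    "tenantId": "petsure-001",
    "status": "Success",
    "durationMs": 1734,
    "violations": [],
}
assert validate_metadata(snapshot) == []  # a clean snapshot yields no problems
```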

πŸ“Š Studio Dashboard View

- Module: `invoice-service`
- Status: βœ… ObservabilityReady
- Trace Coverage: 5 spans
- Metrics: Prometheus βœ…
- Log Format: Structured JSON with full enrichment
- Redaction: Passed
- Duration: 1.7 s
- Execution ID: `exec-007`
- Retry History: None


πŸš€ Outcome

  • DevOps receives metadata and continues deployment to staging
  • QA Agent starts validating with telemetry visibility
  • Studio shows full trace, metrics graph, and observability heatmap
  • PII risk audit confirms compliance
  • Release Manager Agent approves promotion to production

In summary: This end-to-end flow demonstrates how the Observability Engineer Agent activates post-scaffolding, injects secure and traceable telemetry, validates all outputs, and signals readiness for downstream automation β€” all with zero manual intervention.