Observability Engineer Agent Specification
Purpose
The Observability Engineer Agent ensures that every artifact produced by the ConnectSoft AI Software Factory (including services, agents, orchestrators, and infrastructure modules) is fully observable at runtime, traceable across pipelines, and compatible with compliance, debugging, and optimization flows.
This agent operates at design-time, injecting the runtime constructs needed to emit:
- Structured logs with trace and context metadata
- OpenTelemetry traces with `traceId`, `agentId`, `moduleId`
- Prometheus-compatible metrics (latency, execution count, status)
- Execution metadata artifacts (`execution-metadata.json`)
- Health probes and diagnostics endpoints
Platform Placement
The agent activates immediately after scaffolding, before DevOps orchestration, and acts as a final observability linter and injector in the software factory pipeline.
Positioning Diagram
flowchart LR
Vision[Vision Architect Agent]
Arch[Solution Architect Agent]
Gen[Microservice Generator Agent]
Obs[Observability Engineer Agent]
DevOps[DevOps Engineer Agent]
QA[Test Generator Agent]
Release[Release Manager Agent]
Vision --> Arch
Arch --> Gen
Gen --> Obs
Obs --> DevOps
Obs --> QA
DevOps --> Release
The Observability Engineer Agent is the critical bridge between generation and deployment, ensuring all downstream agents and Studio modules can trace, score, and visualize what was produced.
Why It Exists
Without this agent, the factory would suffer from:
- Opaque outputs: code runs but no traces or metrics are emitted
- Inconsistent diagnostics: missing `traceId` or `tenantId` in logs
- Security risks: sensitive fields logged improperly
- Undebuggable failures: no insight into where or why agents or services fail
- Disconnected Studio views: missing visual timelines, dashboards, and audit trails
This agent makes observability not optional, but enforced by design.
Key Outcomes Enabled
| Outcome | Description |
|---|---|
| Traceability | All service logs, spans, and metrics are linked to traceId, agentId, and moduleId. |
| Telemetry-first design | Every handler, adapter, and controller emits OTEL spans and metric counters. |
| Policy-aware logging | Logs are structured, redact secrets, and conform to sensitivity rules. |
| Metrics instrumentation | Prometheus metrics are automatically exposed and tagged by context. |
| Execution observability | execution-metadata.json is generated and attached to every trace. |
| Studio visualization support | Dashboards, timelines, and health heatmaps become available out-of-the-box. |
In summary: The Observability Engineer Agent is the sensor layer of the AI Software Factory. It ensures that every action is visible, every failure is diagnosable, and every output is auditable.
Responsibilities
The Observability Engineer Agent is responsible for injecting, configuring, and validating all observability layers required to support traceable, measurable, and diagnosable services in the ConnectSoft AI Software Factory.
These responsibilities span:
- Code injection (metrics, logging, tracing decorators)
- Configuration generation (OTEL setup, Serilog config)
- Validation and linting (policy conformance, log structure)
- Artifact production (`execution-metadata.json`, `/metrics`, `/healthz`)
- Studio visibility support (traceId lineage, dashboards, logs)
Responsibilities Breakdown
| Responsibility | Description |
|---|---|
| Inject OpenTelemetry Tracing | Adds span start/stop, propagation logic, and traceId, agentId, moduleId into handlers, services, and API layers. |
| Instrument Runtime Metrics | Generates metric counters, histograms, and gauges using Prometheus.Net, with auto-tagging by tenant and agent. |
| Inject Structured Logging Configuration | Adds Serilog, ILogger, or OTEL-compatible log providers, enriched with standard fields like traceId, executionId, and redaction markers. |
| Emit execution-metadata.json | Produces structured files per skill execution that summarize duration, scope, output, and trace links for later indexing. |
| Add Health Check Probes | Registers AddHealthChecks() endpoints (/healthz, /readyz, /livez) with module-specific checks (e.g., DB, messaging, actors). |
| Validate Observability Readiness | Ensures that each generated artifact conforms to platform observability standards: spans are present, logs are structured, PII is masked. |
| Apply Policy and Redaction Rules | Automatically detects PII leaks or unstructured log output and replaces them with redacted values or throws validation errors. |
| Configure Metrics Exporters | Sets up endpoints (/metrics) and formats compatible with Prometheus scraping, including per-tenant scoping. |
| Emit ObservabilityReady Event | Triggers downstream flows (e.g., DevOps, Studio) once observability compliance is verified. |
| Support Studio Dashboards and Timelines | Supplies required telemetry (traceId, metricType, agentId) for visualization of spans, logs, and pipeline health views. |
Example Responsibilities in Action
[✓] BookAppointmentHandler.cs wrapped with OpenTelemetry span
[✓] execution-metadata.json generated with traceId and duration
[✓] /metrics endpoint registered for Prometheus
[✓] Logging uses Serilog with structured output and redaction
[✓] HealthCheck endpoint registered with 2 custom probes
[✓] ObservabilityReady event emitted
This enables the platform to track, debug, visualize, and audit all agent-generated components.
In short: The Observability Engineer Agent transforms raw code into runtime-aware systems by embedding telemetry, logs, and policy validation as default behaviors.
Inputs
The Observability Engineer Agent consumes blueprints, trace context, and module scaffolds to understand what kind of system it needs to instrument and which observability mechanisms to apply.
These inputs determine:
- What kind of telemetry to inject (e.g., REST vs Function vs Actor)
- Which context metadata to include (e.g., tenantId, traceId)
- Which policy constraints to enforce (e.g., PII redaction, metrics exposure)
- Where in the codebase to apply changes (e.g., startup files, controllers, handlers)
Required Inputs
| Input Type | Description |
|---|---|
| Service Blueprint | Defines module type (e.g., REST, Actor, Scheduler), architecture layers, and features to be instrumented. |
| Agent Execution Context | Includes traceId, agentId, skillId, moduleId, and tenantId for trace tagging and output labeling. |
| Generated Codebase | The scaffolded microservice codebase (with handlers, controllers, DI setup) that will receive instrumentation. |
| Observability Configuration Profile | Optional overrides for which exporters to use (e.g., OTEL + Prometheus), redaction settings, and health check targets. |
| Execution Metadata Inputs | Previous traces and metadata from earlier agents that define known contracts, expected log points, or injected code locations. |
| Policy Contract (optional) | A JSON or YAML config defining organizational constraints (e.g., "log nothing from PII fields", "always expose /healthz"). |
Example Input Values
moduleId: booking-service
traceId: trace-2025-0519-xyz
agentId: observability-engineer
serviceType: RestApi
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
metrics:
  enabled: true
  exporters:
    - Prometheus
redaction:
  sensitivityTags:
    - pii
    - secret
healthChecks:
  endpoints:
    - /healthz
    - /readyz
Input Resolution Workflow
- Load trace context from orchestrator
- Parse service blueprint YAML
- Scan generated file tree (e.g., `Startup.cs`, `Controllers`, `Handlers`)
- Load observability config (default or project-specific)
- Detect injectable points (e.g., AddOpenTelemetry, UseMetrics)
- Plan injection, emit preview, validate readiness
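The scan and detection steps above can be sketched as a small rule lookup over the generated file tree. This is only an illustration of the idea: the rule tables, hook names, and the `plan_injection` function are hypothetical and not part of the ConnectSoft platform.

```python
from pathlib import PurePosixPath

# Hypothetical rules: which hooks to plan for well-known bootstrap files.
FILE_RULES = {
    "Startup.cs": ["tracing-bootstrap", "health-checks", "metrics-endpoint"],
    "Program.cs": ["tracing-bootstrap", "logging-config"],
}
# Hypothetical rules: which hooks to plan for C# files under these folders.
FOLDER_RULES = {
    "Controllers": ["span-decorator"],
    "Handlers": ["span-decorator", "metric-counter"],
}

def plan_injection(file_tree):
    """Return {path: [hooks]} for every file that matches at least one rule."""
    plan = {}
    for path in file_tree:
        p = PurePosixPath(path)
        hooks = list(FILE_RULES.get(p.name, []))
        for folder, folder_hooks in FOLDER_RULES.items():
            # Match on parent directories only, never on the file name itself.
            if folder in p.parts[:-1] and p.suffix == ".cs":
                hooks.extend(folder_hooks)
        if hooks:
            plan[path] = hooks
    return plan
```

Because the plan is a plain dictionary, it can be emitted as a preview before any file is modified.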
Input Design Principles
- Inputs are idempotent and composable
- Blueprint-driven: all behaviors align with `ServiceBlueprint.yaml`
- Multi-tenant scoped via `tenantId`
- Compatible with agent orchestration, memory system, and Studio dashboards
In short: The Observability Engineer Agent relies on structured blueprints, execution metadata, and config overlays to know where, what, and how to inject telemetry safely and correctly.
Outputs
The Observability Engineer Agent produces a set of design-time telemetry artifacts, code augmentations, and metadata files that guarantee the generated module will be fully observable at runtime.
These outputs are injected directly into the service folder structure or saved as metadata for DevOps pipelines, Studio dashboards, and compliance validators.
Output Types
| Output | Description |
|---|---|
| execution-metadata.json | Structured record of the trace session, including traceId, agentId, skillId, moduleId, duration, and output hash. |
| Telemetry-Injected Code Files | Modified or generated files like Startup.cs, Program.cs, controller classes, and service handlers augmented with logging, spans, and metrics decorators. |
| Logging Configuration Files | Files such as logger.json, serilog.json, or appsettings.logging.json with Serilog or OTEL-compatible log enrichers and sinks. |
| Tracing Configuration | OTEL bootstrap entries in Program.cs or DI registrations (AddOpenTelemetry, exporters, resource attributes). |
| Metrics Endpoint Setup | .cs files that expose /metrics, counters, histograms, and gauge exporters for Prometheus. |
| Health Check Configuration | Code in Startup.cs and probes like /healthz, /readyz, and /livez, with status response contracts. |
| Validation Reports (optional) | Internal .json or .md files that document what observability checks passed or failed during injection. |
| Telemetry Readme Section | Appended content to README.md describing exposed observability endpoints and metrics for human reference. |
| Event Emission: ObservabilityReady | Trigger event indicating successful observability injection, which activates DevOps, QA, or Studio flows. |
Output File Examples
execution-metadata.json
{
"traceId": "trace-abc123",
"agentId": "observability-engineer",
"skillId": "InjectTelemetry",
"moduleId": "booking-service",
"durationMs": 1457,
"status": "Success",
"outputFiles": [
"Startup.cs",
"execution-metadata.json",
"appsettings.logging.json"
],
"exportedMetrics": ["http_requests_total", "agent_execution_duration_seconds"]
}
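A minimal sketch of how a record like the one above might be assembled, assuming the agent has the execution context and wall-clock timings at hand. The function name, its parameters, and the `outputHash` field (standing in for the "output hash" mentioned in the Output Types table) are assumptions, not a documented ConnectSoft schema.

```python
import hashlib
import json

def build_execution_metadata(context, output_files, exported_metrics,
                             started_at, finished_at, status="Success"):
    """Assemble the metadata record; timestamps are seconds (e.g. time.monotonic())."""
    record = {
        "traceId": context["traceId"],
        "agentId": context["agentId"],
        "skillId": context["skillId"],
        "moduleId": context["moduleId"],
        "durationMs": round((finished_at - started_at) * 1000),
        "status": status,
        "outputFiles": sorted(output_files),
        "exportedMetrics": sorted(exported_metrics),
    }
    # Assumption: the output hash is a SHA-256 over the canonical JSON of the record,
    # so identical runs produce identical hashes.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["outputHash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record
```

Sorting the lists and hashing a canonical serialization keeps the output replayable, which matches the determinism requirement stated for agent outputs.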
Injected Code Snippet (Startup.cs)
services.AddOpenTelemetry()
    .WithTracing(builder => builder
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("BookingService")
        .SetResourceBuilder(ResourceBuilder.CreateDefault()
            .AddService("booking-service")
            .AddAttributes(new[]
            {
                new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
                new KeyValuePair<string, object>("traceId", traceId)
            }))
        .AddOtlpExporter());
services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("Database")
    .AddCheck<MessagingHealthCheck>("MassTransitBus");
Injected Metrics Endpoint
README (Telemetry Section)
### Observability
This service exposes:
- `/metrics` for Prometheus scraping
- `/healthz` and `/readyz` for health status
- OTEL spans tagged with `traceId`, `agentId`, and `moduleId`
All logs follow ConnectSoft structured logging standards.
Output Quality Guarantees
Each output:
- Includes required trace and module metadata
- Passes observability validation checks
- Is compatible with OpenTelemetry pipelines
- Is machine-readable and Studio-ingestible
- Enables runtime traceability and diagnostics by default
In short: The Observability Engineer Agent produces the telemetry foundation that makes all modules visible, diagnosable, and trusted, from Studio dashboards to production monitoring.
Knowledge Base
The Observability Engineer Agent operates using a deep, structured knowledge base of:
- Observability standards (OpenTelemetry, Prometheus, Serilog)
- ConnectSoft trace metadata schema
- Telemetry placement rules for Clean Architecture services
- Security and redaction policies
- Best practices for health checks and structured logs
- Code injection templates for each supported runtime (REST API, gRPC, Function, Actor)
This knowledge is used to determine where and how to inject observability instrumentation that will be:
- Valid
- Secure
- Compatible with CI/CD
- Usable by Studio dashboards and feedback loops
Built-In Concepts and Standards
| Knowledge Area | Contents |
|---|---|
| Trace Metadata Schema | Required fields: traceId, agentId, skillId, moduleId, tenantId, executionId. Used across logs, spans, metrics. |
| File Injection Rules | Startup bootstrap points (Program.cs, Startup.cs, DI layers), controller decorators, handler wrappers. |
| Metrics Templates | Metric name conventions (agent_execution_duration_seconds, http_requests_total, etc.), label tagging, histogram/gauge setup. |
| Serilog/OTEL Conventions | MinimumLevel, Enrich.FromLogContext(), structured format templates, sink routing, and enrichment strategies. |
| Health Check Coverage | Which services require health probes (DB, bus, cache), endpoint naming (/healthz, /readyz), and standard status response formats. |
| Redaction Policies | Regex patterns for secrets, PII fields, and sensitivity: pii blueprint tags, all redacted at log emit time. |
| Agent Coordination Hooks | When and how to emit ObservabilityReady event, what it must include (traceId, summary, observabilityLevel). |
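To make the redaction policy row concrete, here is a rough sketch of field-level and pattern-level masking applied to a structured log record before it is emitted. The sensitive key set, the bearer-token pattern, and the `redact` helper are illustrative assumptions; the platform's actual redaction corpus is not shown in this specification.

```python
import re

# Hypothetical set of field names tagged as pii or secret in a blueprint.
SENSITIVE_KEYS = {"email", "ssn", "password"}
# Hypothetical secret pattern: mask the token that follows "Bearer ".
SECRET_PATTERN = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._\-]+")
MASK = "***REDACTED***"

def redact(record):
    """Return a copy of a structured log record with sensitive values masked."""
    clean = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = MASK                       # whole field is sensitive
        elif isinstance(value, str):
            clean[key] = SECRET_PATTERN.sub(r"\1" + MASK, value)
        else:
            clean[key] = value
    return clean
```

Trace identifiers pass through untouched, so redacted records stay correlatable.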
Blueprint-to-Telemetry Mapping
The agent understands how to map service structure into observability hooks:
| Service Type | Observability Knowledge Applied |
|---|---|
| REST API | Inject spans in controllers, logs in middleware, metrics in UseEndpoints() |
| Actor Service | OTEL instrumentation via activity sources and actor lifecycle spans |
| Azure Function | Use ILogger for structured logs, export OTEL spans via decorators |
| MassTransit | Instrument consumers, message pumps, retry handlers |
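The mapping above can be read as a lookup from service type to a base set of hooks, extended by feature flags. The sketch below assumes that reading; `RestApi` and `UseMassTransit` come from the example inputs, while the `Actor` and `Function` keys, the hook labels, and `select_hooks` are hypothetical.

```python
# Base hooks per service type, paraphrasing the mapping table.
BASE_HOOKS = {
    "RestApi": ["controller spans", "middleware logs", "endpoint metrics"],
    "Actor": ["activity sources", "actor lifecycle spans"],
    "Function": ["ILogger structured logs", "decorator spans"],
}
# Extra hooks contributed by blueprint features.
FEATURE_HOOKS = {
    "UseMassTransit": ["consumer instrumentation", "retry handler instrumentation"],
}

def select_hooks(blueprint):
    """Combine service-type hooks with feature-specific hooks."""
    service_type = blueprint["serviceType"]
    if service_type not in BASE_HOOKS:
        raise ValueError(f"unsupported serviceType: {service_type}")
    hooks = list(BASE_HOOKS[service_type])
    for feature in blueprint.get("features", []):
        hooks.extend(FEATURE_HOOKS.get(feature, []))  # unknown features add nothing
    return hooks
```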
Sample Prompt Memory Entries
{
"traceId": "trace-2025-05-19-xyz",
"skillId": "InjectTelemetry",
"agentId": "observability-engineer",
"injectedMetric": "http_requests_total",
"patternUsed": "AspNetCore.RequestPipeline",
"blueprint": {
"serviceType": "RestApi",
"features": ["UseMassTransit", "UseNHibernate"]
}
}
This enables reuse of the same instrumentation logic across future flows with similar structure.
Memory System Integration
The agent queries:
- Past `execution-metadata.json` files for prior trace context
- Stored logging and OTEL config examples from prior successful services
- Semantic memory (via vector DB) to reuse optimal injection strategies for similar modules
- Redaction pattern corpus from the Security Engineer Agent
All knowledge is versioned, tagged by skillId, and retrievable by other agents.
In short: The Observability Engineer Agent uses a codified, reusable observability knowledge base to make every service diagnosable, measurable, and safe-by-default, across domains, tenants, and agents.
Process Flow
The Observability Engineer Agent follows a deterministic and modular design-time flow that enables it to inspect a scaffolded module, inject observability capabilities, validate conformance, and emit traceable outputs.
This flow is both repeatable and adaptable based on the service type (e.g., REST API, Azure Function, Actor Host) and blueprint metadata.
Standard Agent Flow
flowchart TD
Start([Start Agent Execution])
LoadBlueprint[Load ServiceBlueprint.yaml]
LoadContext[Load traceId, agentId, tenantId, moduleId]
DetectType[Determine service type & runtime model]
ScanFiles[Scan generated service codebase]
PlanInjection["Plan observability hooks (logs, spans, metrics)"]
InjectTelemetry[Inject logging + tracing + metrics + health checks]
Validate[Validate observability compliance]
EmitMetadata[Generate execution-metadata.json]
EmitEvent[Emit ObservabilityReady event]
End([Finish & Pass control to DevOps/Studio])
Start --> LoadBlueprint
LoadBlueprint --> LoadContext
LoadContext --> DetectType
DetectType --> ScanFiles
ScanFiles --> PlanInjection
PlanInjection --> InjectTelemetry
InjectTelemetry --> Validate
Validate --> EmitMetadata
EmitMetadata --> EmitEvent
EmitEvent --> End
Detailed Steps
| Step | Description |
|---|---|
| 1. Load Blueprint | Parses the service definition to understand service type, enabled features (e.g., MassTransit, NHibernate), and observability expectations. |
| 2. Load Execution Context | Retrieves traceId, agentId, skillId, tenantId, and moduleId for trace enrichment and Studio integration. |
| 3. Detect Runtime Type | Determines what kind of system is being instrumented (REST, Actor, Function, etc.) to select the proper injection strategy. |
| 4. Scan Files | Walks the scaffolded project files (e.g., Startup.cs, Handlers, Controllers) and identifies injection points. |
| 5. Plan Hook Injection | Maps observability hooks to code locations using predefined templates and previously successful patterns from memory. |
| 6. Inject Telemetry | Adds ILogger usage and config; OTEL spans via decorators or middleware; Prometheus metrics with auto-labeling; health checks and endpoints. |
| 7. Validate Compliance | Verifies all required identifiers (traceId, etc.) are injected, metrics endpoints are exposed, and sensitive fields are redacted. |
| 8. Emit Metadata | Writes execution-metadata.json to persist agent run context and injection results for traceability. |
| 9. Emit ObservabilityReady | Publishes an event signaling successful instrumentation so DevOps, QA, or Studio workflows can continue. |
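One way to picture the deterministic flow is as an ordered list of named steps that each transform a shared state, halting at the first failure so that later steps (such as event emission) never run on an invalid module. This is a sketch under that assumption; `run_flow` and the result shape are hypothetical.

```python
def run_flow(steps, state):
    """Run (name, function) steps in order; stop at the first step that raises."""
    completed = []
    for name, step in steps:
        try:
            state = step(state)
        except Exception as exc:
            # Report which step failed and what had already completed.
            return {"status": "Failed", "failedStep": name,
                    "error": str(exc), "completed": completed}
        completed.append(name)
    return {"status": "Success", "completed": completed, "state": state}
```

A usage example with two toy steps:

```python
steps = [
    ("LoadBlueprint", lambda s: {**s, "serviceType": "RestApi"}),
    ("DetectType", lambda s: {**s, "strategy": s["serviceType"].lower()}),
]
result = run_flow(steps, {"moduleId": "booking-service"})
```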
Context Awareness Throughout
Decisions vary based on:
- Runtime type (API vs Actor vs Function)
- Tenant-specific observability policies
- PII sensitivity in blueprint inputs
- Memory reuse from prior services
Example Process Snapshot
[✓] Service type: REST API
[✓] Traced methods injected: 5
[✓] Metrics endpoint added: /metrics
[✓] Health check: /healthz + 2 probes
[✓] Logger enriched with traceId, agentId, moduleId
[✓] Redaction policy: pii masking enabled
[✓] execution-metadata.json generated
[✓] Event emitted: ObservabilityReady
In summary: The Observability Engineer Agent follows a structured, adaptive flow that transforms scaffolded code into a runtime-observable, traceable, and policy-compliant system module, ready for deployment and Studio introspection.
Kernel Skills
The Observability Engineer Agent relies on a focused set of Semantic Kernel skills to carry out its responsibilities. These skills represent atomic, reusable, and orchestratable capabilities that the agent invokes during service instrumentation.
Each skill maps to a well-defined behavior such as injecting spans, emitting structured logs, validating telemetry, or generating metadata. Together, they form the operational vocabulary of the agent.
Core Skills
| Skill Name | Purpose |
|---|---|
| InjectTraceDecorators | Adds OpenTelemetry span instrumentation to controllers, handlers, consumers, and background workers. |
| GenerateExecutionMetadata | Produces a complete execution-metadata.json file with traceId, agentId, moduleId, durationMs, and injected components. |
| EmitLogConfiguration | Generates or modifies logging setup (e.g., Serilog, ILogger) to include trace-enriched and structured logs. |
| InjectMetricCounters | Adds Prometheus-compatible counters, histograms, or gauges to services and injects /metrics endpoints. |
| AddHealthCheckProbes | Registers and configures /healthz, /readyz, and any blueprint-defined health endpoints. |
| ValidateObservabilityReady | Verifies that spans, logs, metrics, and trace context are present and conform to ConnectSoft standards. |
| EmitObservabilityEvent | Emits the ObservabilityReady event with summary payload for downstream workflows. |
| ApplyRedactionPolicies | Applies masking or removal logic to PII-tagged fields during structured log emission setup. |
| ScanTelemetryViolations | Detects missing traceId, unstructured log patterns, and improperly scoped spans. |
| ReuseTelemetryTemplate | Retrieves and applies telemetry injection patterns from previously successful modules via semantic memory. |
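As an illustration of what a skill such as ScanTelemetryViolations could check, the sketch below flags unstructured `Console.WriteLine` calls in generated sources and reports required enrichment fields that are absent from the logging setup. The rule names, the required-field tuple, and `scan_violations` are assumptions made for the example; scope checks on spans are not modeled.

```python
import re

# Unstructured console logging is treated as a violation in this sketch.
UNSTRUCTURED_LOG = re.compile(r"\bConsole\.WriteLine\s*\(")
# Assumed minimum set of enrichers every log configuration must carry.
REQUIRED_FIELDS = ("traceId", "agentId", "moduleId")

def scan_violations(sources, enrichers):
    """sources: {path: C# source text}; enrichers: field names configured for logs."""
    violations = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if UNSTRUCTURED_LOG.search(line):
                violations.append({"file": path, "line": lineno, "rule": "unstructured-log"})
    for field in REQUIRED_FIELDS:
        if field not in enrichers:
            violations.append({"file": None, "line": None, "rule": f"missing-enricher:{field}"})
    return violations
```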
Sample Skill Invocation Chain
Agent Execution Trace:
→ InjectTraceDecorators
→ InjectMetricCounters
→ EmitLogConfiguration
→ AddHealthCheckProbes
→ ApplyRedactionPolicies
→ GenerateExecutionMetadata
→ ValidateObservabilityReady
→ EmitObservabilityEvent
Each skill is traceable via skillId and scoped by moduleId, traceId, and agentId, enabling precise execution replay and telemetry correlation.
Skill Composition and Orchestration
Skills are:
- Composable: used individually or bundled into orchestration plans.
- Configurable: adapt based on blueprint and policy profile.
- Idempotent: safe to re-run without duplicating effects.
- Traceable: each invocation emits its own telemetry, so the agent instruments itself.
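Idempotency is the property most easily broken by code injection, so a short sketch may help. One common approach, assumed here and not confirmed by this specification, is to write a marker comment alongside each injected snippet and skip the injection when the marker is already present. `MARKER` and `inject_once` are hypothetical names.

```python
MARKER = "// <observability:health-checks>"

def inject_once(source, anchor, snippet, marker=MARKER):
    """Insert snippet after the first occurrence of anchor unless marker already exists."""
    if marker in source:
        return source                      # already injected: re-running is a no-op
    if anchor not in source:
        raise ValueError(f"anchor not found: {anchor!r}")
    return source.replace(anchor, f"{anchor}\n{marker}\n{snippet}", 1)
```

Running the function twice over the same file yields the same text as running it once, which is what makes a skill safe to replay.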
Reuse via Skill Memory
The agent stores:
- Prior successful metrics injection strategies
- Effective health check setups for similar service topologies
- Optimized log configuration samples from sibling modules
Skills like `ReuseTelemetryTemplate` fetch and reuse this knowledge from the vector DB and metadata index.
In summary: The Observability Engineer Agent is powered by a precision toolbelt of telemetry-focused kernel skills, each enabling it to transform uninstrumented modules into observable, validated, and Studio-integrated assets, autonomously.
Technology Stack
The Observability Engineer Agent uses a modern, telemetry-driven, .NET-compatible technology stack, tightly integrated into the ConnectSoft AI Software Factory. The stack is selected for its:
- Compatibility with generated modules (REST, gRPC, Actor, Function)
- Support for trace enrichment, structured logs, and metrics
- Alignment with cloud-native, multi-tenant, and OpenTelemetry-first practices
- Extensibility for agent-based injection, testing, and validation
Runtime Technologies Targeted
| Area | Technology |
|---|---|
| Application Framework | ASP.NET Core (.NET 8) |
| Tracing and Instrumentation | OpenTelemetry SDK (AddOpenTelemetry, ActivitySource) |
| Logging | Serilog (ILogger, structured sinks, enricher support) |
| Metrics | Prometheus.Net, Meter, Histogram, Counter, Gauge |
| Health Checks | Microsoft.Extensions.Diagnostics.HealthChecks, custom probes |
| Containerization | Docker, /metrics and /healthz exposed as K8s probes |
| Observability Output | execution-metadata.json, logs, OTLP spans, metrics endpoint |
| Studio Integration | Trace Explorer, Metric Dashboards, Policy Violation Viewer |
Agent Infrastructure Stack
| Component | Stack Element |
|---|---|
| Execution Environment | Semantic Kernel + C# planner bindings |
| Memory Layer | Azure AI Search / Qdrant (semantic memory), Blob Storage (execution-metadata) |
| Validation Runtime | In-process .NET analyzers and telemetry pattern matchers |
| CI/CD Integration | YAML pipelines emit metrics and validation results, integrated with Azure DevOps and Studio |
Example Libraries and Tools Used
- Microsoft.Extensions.Logging
- OpenTelemetry.Instrumentation.AspNetCore
- OpenTelemetry.Exporter.Console / OTLP
- Serilog.Sinks.Console / Serilog.Sinks.ApplicationInsights
- prometheus-net.AspNetCore
- App.Metrics (fallback metrics support)
Injection Compatibility Matrix
| Module Type | Supported Tooling |
|---|---|
| REST APIs | OTEL spans via middleware + Serilog |
| gRPC Services | ActivitySource + method instrumentation |
| Azure Functions | Manual ActivitySource, structured ILogger |
| Actor Services | Trace injection via message handlers |
| Background Jobs | Task instrumentation, retry metrics |
Export Targets
| Signal Type | Exported To |
|---|---|
| Logs | Application Insights / OTEL log processor |
| Traces | OTLP collector, Studio Trace Viewer |
| Metrics | Prometheus endpoint (/metrics), Grafana |
| Events | ConnectSoft Event Bus (ObservabilityReady, PolicyViolation) |
In summary: The Observability Engineer Agent is built for deep integration into modern .NET telemetry, using OpenTelemetry, Serilog, and Prometheus to enforce traceability across all generated modules, ready for cloud-native and agent-aware deployments.
System Prompt
The system prompt is the foundational instruction injected into the Observability Engineer Agent when it is initialized within a ConnectSoft orchestration flow. It sets the persona, responsibility scope, and expected behaviors of the agent across all invocations.
This prompt ensures the agent always acts as a design-time observability enforcer (not a runtime participant) and aligns every action with ConnectSoft's observability-first, multi-tenant, and traceable-by-default principles.
Default System Prompt (English, Markdown format)
# Role: Observability Engineer Agent
You are the Observability Engineer Agent in the ConnectSoft AI Software Factory.
Your job is to ensure that every generated module (including REST APIs, gRPC services, Azure Functions, actors, orchestrators, and background workers) is **fully observable at runtime**.
You operate at **design time**, analyzing generated code and configuration to inject all required telemetry, including:
- OpenTelemetry tracing spans with `traceId`, `agentId`, and `moduleId`
- Structured logging using Serilog or ILogger with enrichment and redaction
- Prometheus-compatible metrics (`http_requests_total`, `agent_execution_duration_seconds`, etc.)
- Health check endpoints (`/healthz`, `/readyz`, `/livez`) with status validation
- An `execution-metadata.json` file describing the trace context and injection results
You must verify that the generated outputs:
- Are compliant with ConnectSoft observability policy
- Do not leak secrets or PII in logs
- Include all required metadata for Studio trace and dashboard views
- Expose the correct endpoints and export formats
If observability violations are found, document them clearly in metadata and trigger failure or warning responses. If injection is successful, emit an `ObservabilityReady` event with a summary of what was instrumented.
You are responsible for making the software **measurable, debuggable, and trusted**, before it is released.
Prompt Metadata
| Key | Value |
|---|---|
| agentId | observability-engineer |
| roleType | design-time instrumentation |
| category | QA, DevOps-Ready, Traceability |
| activatesOnEvent | ServiceScaffolded, AgentOutputReady |
| emitsEvent | ObservabilityReady, ObservabilityPolicyViolated |
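The two emitted events suggest a simple decision at the end of a run: publish `ObservabilityReady` when validation finds nothing, otherwise publish `ObservabilityPolicyViolated`. The sketch below assumes that rule and an illustrative payload shape; the real event contract (including the `observabilityLevel` field mentioned earlier) is not specified here, and `build_event` is a hypothetical helper.

```python
def build_event(context, injected, violations):
    """Choose the event type from the validation result and attach a summary."""
    event_type = "ObservabilityPolicyViolated" if violations else "ObservabilityReady"
    return {
        "type": event_type,
        "traceId": context["traceId"],
        "agentId": context["agentId"],
        "moduleId": context["moduleId"],
        "summary": dict(injected),          # e.g. counts of spans, metrics, probes
        "violations": list(violations),     # empty for ObservabilityReady
    }
```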
Why the Prompt Matters
- Clarifies that the agent does not participate at runtime
- Prevents accidental regeneration of runtime behavior (e.g., health check consumers)
- Ensures consistent behavior across modules, domains, and teams
- Establishes traceability patterns expected by downstream tools like Studio, DevOps, and Security Orchestrator
In short: The system prompt defines the Observability Engineer Agent's identity, mission, and operating discipline, ensuring all modules it touches are monitorable, diagnosable, and platform-compliant by design.
Input Prompt Template
The input prompt template is the dynamic, structured instruction sent to the agent during execution. It integrates contextual blueprint data, execution metadata, and platform-specific configuration to guide the agent's behavior during a specific observability injection task.
This template is completed by the orchestrator or coordinator agent, combining:
- The service type and its runtime characteristics
- The agent's trace and skill identifiers
- Optional observability policy overrides
- Feature toggles and tenant-specific settings
Template Format (Markdown + YAML Hybrid)
# Observability Injection Task
You are the Observability Engineer Agent.
## Target Module
- Module ID: `{{moduleId}}`
- Tenant ID: `{{tenantId}}`
- Trace ID: `{{traceId}}`
- Agent ID: `observability-engineer`
- Skill ID: `InjectTelemetry`
- Runtime Type: `{{serviceType}}`
## Blueprint Features
```yaml
features:
  - UseMassTransit
  - UseNHibernate
  - UseSemanticKernel
serviceType: {{serviceType}}
```
## Observability Requirements
metrics:
  enabled: true
  exporters:
    - Prometheus
tracing:
  enabled: true
  exporter: OTLP
logging:
  structured: true
  redactSensitive: true
  piiTags:
    - "sensitivity: pii"
    - "secret"
healthChecks:
  endpoints:
    - /healthz
    - /readyz
    - /livez
## Expected Outcomes
- Add OTEL spans to handlers, endpoints, and consumers.
- Generate a `/metrics` endpoint exposing Prometheus counters.
- Emit structured logs with enrichment and redaction enabled.
- Register all requested health check endpoints.
- Produce `execution-metadata.json` summarizing injection results.
- Emit the `ObservabilityReady` event with outcome metadata.
Please inject all telemetry as per ConnectSoft observability standards and log any policy violations found.
---
## Example Populated Prompt
```markdown
Module ID: invoice-service
Tenant ID: petsure-001
Trace ID: trace-2025-05-19-invoice123
Service Type: RestApi
Blueprint Features: UseMassTransit, UseNHibernate
Expect: OTEL tracing, Prometheus metrics, /metrics + /healthz endpoints, Serilog config with redaction
```
Why the Input Prompt Template Matters
- Enables context-specific injection plans (e.g., add health checks only if required)
- Ensures observability remains configurable, declarative, and predictable
- Supports tenant-aware policy variations and compliance rules
- Powers Studio dashboards with correct module tagging and health visibility
In short: The input prompt template is the instructional payload that lets the Observability Engineer Agent act precisely, securely, and traceably, tailored to each generated module and tenant.
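Filling the `{{moduleId}}`-style placeholders of the template can be sketched with a small substitution helper. This assumes the orchestrator does plain string substitution and should fail loudly when a value is missing; `render_prompt` is a hypothetical name, and a real orchestrator may well use a templating engine instead.

```python
import re

# Matches {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_prompt(template, values):
    """Replace every {{name}} with values[name]; refuse to render with gaps."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"missing template values: {missing}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
```

Rejecting incomplete input up front avoids sending the agent a prompt that still contains literal `{{tenantId}}` text.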
Output Expectations
This cycle defines what the Observability Engineer Agent is expected to produce at the end of its execution. Outputs must be machine-parseable, CI/CD-consumable, and Studio-integrated, with strict tagging and compliance to observability-first standards.
All outputs contribute directly to runtime visibility, security, and traceability for agent-generated SaaS modules.
Output Deliverables
| Output Artifact | Description |
|---|---|
| execution-metadata.json | JSON file with trace metadata, skillId, duration, injected elements, and status. Used by DevOps, Studio, and audit tooling. |
| OTEL Tracing Injection | Modified .cs files with ActivitySource.StartActivity(...) spans around request handlers, consumers, and workflows. |
| Prometheus Metrics Endpoint | Code added to expose /metrics with per-request counters, latency histograms, and labels (traceId, tenantId, etc.). |
| Structured Logging Configuration | Serilog or ILogger settings injected into Startup.cs or appsettings.logging.json, with enrichers (traceId, moduleId, agentId) and redaction support. |
| Health Check Endpoints | Configuration and controller endpoints for /healthz, /readyz, and /livez, mapped to injected dependency probes. |
| Observability Validation Report (optional) | Diagnostic output listing any failed injections or redaction gaps (e.g., unstructured Console.WriteLine, missing traceId). |
| Event: ObservabilityReady | Event published to the coordination layer once injection is complete and validated. Triggers DevOps, QA, or Studio actions. |
| README.md Observability Section | Append or generate a section describing observability behavior (e.g., exposed endpoints, span behavior, metrics). |
Example execution-metadata.json
{
"traceId": "trace-2025-05-19-abc123",
"agentId": "observability-engineer",
"skillId": "InjectTelemetry",
"moduleId": "invoice-service",
"status": "Success",
"durationMs": 1875,
"exportedMetrics": [
"http_requests_total",
"agent_execution_duration_seconds"
],
"injected": {
"spans": 4,
"metrics": 3,
"logEnrichers": 5,
"healthEndpoints": 2
},
"outputFiles": [
"Startup.cs",
"execution-metadata.json",
"MetricsExporter.cs"
]
}
Naming and File Placement Rules
| File Type | Placement Path |
|---|---|
| execution-metadata.json | /modules/{moduleId}/metadata/ |
| Code files (*.cs) | /Application/, /Infrastructure/, or /API/ layer |
| logger.json / appsettings.*.json | /Configuration/Logging/ or root |
| README.md additions | Appended under "Observability" section |
Behavior Contracts
All outputs must:
- Be deterministic (replayable)
- Be scoped by tenantId, traceId, and moduleId
- Comply with platform observability rules (e.g., PII redaction, traceId required)
- Be linkable to agent execution in Studio (dashboards, trace viewer, metadata inspector)
In short: The agent must output everything needed for the service to be debuggable, monitorable, auditable, and compliant, starting with instrumentation and ending with trace-ready metadata.
Memory Model
The Observability Engineer Agent uses a hybrid memory architecture to support:
- Retrieval of prior telemetry strategies
- Reuse of successful injection patterns
- Storage of execution outputs for traceability
- Semantic similarity for selecting observability templates
Memory allows the agent to evolve intelligently and inject proven, context-aware telemetry logic into new modules, based on past executions, blueprint patterns, and skill performance.
Types of Memory Used
| Memory Type | Purpose |
|---|---|
| Execution Metadata Store | Stores execution-metadata.json for each agent run, including injected spans, metrics, duration, and trace identifiers. |
| Structured Metadata Index | Tracks telemetry components, enrichment rules, redaction violations, injection outcomes. Indexed by agentId, moduleId, skillId, etc. |
| Semantic Memory (Vector DB) | Retrieves similar past modules and injection plans using blueprint embeddings (text-embedding-ada-002, etc.). |
| Log Configuration Corpus | Stores reusable logging strategies, Serilog profiles, structured enrichers, and redaction DSLs. |
| Metrics Dictionary | Canonical list of standard metrics (e.g., agent_execution_duration_seconds) and their past usage patterns. |
| Telemetry Pattern Library | Templates for injecting OTEL spans, metrics, health checks, with versioned examples across runtimes. |
Example Semantic Memory Chunk
{
"traceId": "trace-2025-05-15-xyz789",
"moduleId": "booking-service",
"agentId": "observability-engineer",
"skillId": "InjectMetricCounters",
"text": "Exposed /metrics with http_requests_total, tenant-scoped labels",
"embedding": [0.13, 0.74, -0.21, ...],
"tags": ["metrics", "Prometheus", "traceable", "REST"],
"status": "Success"
}
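Retrieval from semantic memory can be pictured as a nearest-neighbour lookup over stored chunks like the one above. The Python sketch below is illustrative only: `cosine_similarity`, `top_match`, and the three-dimensional toy embeddings are assumptions made for the example, not the platform's vector store API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_match(query_embedding, chunks, required_tag=None):
    """Return the stored chunk most similar to the query, optionally filtered by tag."""
    candidates = [c for c in chunks if required_tag is None or required_tag in c["tags"]]
    if not candidates:
        return None
    return max(candidates, key=lambda c: cosine_similarity(query_embedding, c["embedding"]))

# Toy memory entries with hypothetical 3-dimensional embeddings.
chunks = [
    {"moduleId": "booking-service", "skillId": "InjectMetricCounters",
     "embedding": [0.13, 0.74, -0.21], "tags": ["metrics", "Prometheus"]},
    {"moduleId": "invoice-service", "skillId": "EmitLogConfiguration",
     "embedding": [-0.60, 0.10, 0.55], "tags": ["logging", "Serilog"]},
]
best = top_match([0.10, 0.80, -0.15], chunks, required_tag="metrics")
print(best["moduleId"])
```

Filtering by tag before ranking mirrors the idea of combining structured scoping with similarity search; a production store would typically push both steps into the vector database query.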
Memory Scoping Strategy
Memory is scoped per:
- traceId, projectId, sprintId
- agentId, skillId, moduleId
- tenantId, environment, serviceType
- outputType (metrics, spans, logs, probes)
- status (success, failure, warning)
This enables precision filtering, similarity search, and trace-based graph linking.
Agent Reads from Memory To:
| Use Case | Memory Used |
|---|---|
| Reuse successful injection code | Semantic memory from past modules with similar structure |
| Avoid repeat redaction issues | Redaction validation history (tagged sensitivity: pii) |
| Track telemetry conformance | Linter outcomes and prior observability violations |
| Generate baseline metrics | Metrics pattern dictionary (usage frequency + exporters) |
Agent Writes to Memory:
- execution-metadata.json with full run summary
- log-config.json, metrics.json, otel-span-plan.json (optional)
- ObservabilityReady event with memory pointer
- Traces tagged in Studio with agentId: observability-engineer
Memory Behavior Characteristics
| Property | Value |
|---|---|
| Versioned | Yes (each telemetry plan and injection is tracked by version + timestamp) |
| Replayable | Yes. Execution history can be replayed to re-inject identical instrumentation |
| Composable | Yes. Multiple memory entries may be merged to generate a new injection strategy |
| Auditable | Yes. All memory updates are logged and linked to traceId and executionId |
In short: The Observability Engineer Agent is memory-empowered: capable of reasoning from historical telemetry, reusing known-good observability layouts, and ensuring continuous improvement across all injected services.
Validation Mechanisms
Validation is a critical phase in the Observability Engineer Agent's lifecycle. After telemetry components are injected, the agent performs a design-time verification pass to ensure:
- All observability elements are present and correctly scoped
- No violations of ConnectSoft traceability or redaction policy exist
- The generated outputs are production-safe, Studio-visible, and traceable
- Standards for OpenTelemetry, Serilog, metrics, and health endpoints are respected
What Is Validated?
| Component | Validation Criteria |
|---|---|
| Tracing (OTEL) | traceId, agentId, and moduleId must be included in all spans. ActivitySource must be correctly configured. |
| Logging | Logs must be structured (JSON or key-value), enriched with context, and not emit Console.WriteLine. |
| Metrics (Prometheus) | /metrics must be exposed, with standard counters and tenant/module labels present. |
| Health Checks | Required probes (/healthz, /readyz) must exist and return 200 OK. |
| Execution Metadata | execution-metadata.json must include trace context, status, duration, and injected item counts. |
| PII Redaction | Any field marked sensitivity: pii or secret must not appear in plain logs. |
| Policy Compliance | Validates against the tenant- or environment-specific observability policy (e.g., redaction, logging level, required metrics). |
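As one concrete instance of these criteria, the Execution Metadata row can be checked mechanically. The following Python sketch is an assumption-laden illustration: the function name `validate_execution_metadata` and the exact field list are inferred from the example metadata shown earlier, not taken from the agent's implementation.

```python
REQUIRED_FIELDS = ("traceId", "agentId", "moduleId", "status", "durationMs", "injected")
INJECTED_COUNTS = ("spans", "metrics", "logEnrichers", "healthEndpoints")

def validate_execution_metadata(metadata):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in metadata]
    duration = metadata.get("durationMs")
    if duration is not None and (not isinstance(duration, (int, float)) or duration < 0):
        problems.append("durationMs must be a non-negative number")
    injected = metadata.get("injected")
    if isinstance(injected, dict):
        # Each injected item count must be present as an integer.
        for kind in INJECTED_COUNTS:
            if not isinstance(injected.get(kind), int):
                problems.append(f"injected.{kind} must be an integer count")
    return problems

sample = {
    "traceId": "trace-2025-05-19-abc123",
    "agentId": "observability-engineer",
    "moduleId": "invoice-service",
    "status": "Success",
    "durationMs": 1875,
    "injected": {"spans": 4, "metrics": 3, "logEnrichers": 5, "healthEndpoints": 2},
}
print(validate_execution_metadata(sample))
```

Returning a list of findings instead of raising on the first problem lets a validation report include every gap in one pass.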
Validation Workflow
flowchart TD
Start[Injected Code]
RunTests[Run Linter Checks]
CheckSpans[Verify Spans & Trace Enrichment]
CheckLogs[Check Logging Format & Redaction]
CheckMetrics[Verify Prometheus Labels & Endpoint]
CheckHealth[Validate Health Probes & Startup]
GenerateReport[Create validation summary]
StatusCheck{All Passed?}
EmitSuccess[Emit ObservabilityReady]
EmitViolation[Emit ObservabilityPolicyViolated]
Start --> RunTests
RunTests --> CheckSpans --> CheckLogs --> CheckMetrics --> CheckHealth --> GenerateReport --> StatusCheck
StatusCheck -->|Yes| EmitSuccess
StatusCheck -->|No| EmitViolation
Example Validation Report (Structured JSON)
{
"traceId": "trace-2025-05-19-obs123",
"agentId": "observability-engineer",
"status": "Success",
"validated": {
"structuredLogs": true,
"otelSpans": 6,
"metricsEndpoint": "/metrics",
"labelsPresent": ["tenantId", "moduleId"],
"healthEndpoints": ["/healthz", "/readyz"]
},
"redactionCheck": {
"sensitiveFieldLeak": false
},
"policyVersion": "obs-policy-v2.3"
}
Validation Tools and Heuristics
- Regex scanners for raw Console.WriteLine, string.Format, and unstructured output
- Reflection-based checks for OTEL ActivitySource usage and enrichment
- Static analysis of DI container (AddOpenTelemetry, AddHealthChecks)
- Redaction enforcement on blueprint-defined fields with sensitivity tags
- Metrics validator that simulates /metrics scrape and checks required counters
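The regex scanner in the first bullet could look roughly like the Python sketch below. The patterns and the `scan_source` helper are hypothetical and deliberately simple: they skip only single-line `//` comments and would need hardening (block comments, string literals) before real use.

```python
import re

# Patterns for output calls that bypass structured logging (illustrative, not exhaustive).
UNSTRUCTURED_PATTERNS = {
    "console_write": re.compile(r"\bConsole\.Write(Line)?\s*\("),
    "string_format": re.compile(r"\bstring\.Format\s*\(", re.IGNORECASE),
}

def scan_source(source_text):
    """Return (line_number, rule_name) pairs for each suspicious line of C# source."""
    findings = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if line.lstrip().startswith("//"):
            continue  # ignore single-line comments
        for rule, pattern in UNSTRUCTURED_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings

sample = '''public void Handle(Order order)
{
    Console.WriteLine("got order " + order.Id);
    // Console.WriteLine("old debug line");
    _logger.LogInformation("Order {OrderId} received", order.Id);
    var text = String.Format("{0}", order.Id);
}'''
print(scan_source(sample))
```

Line numbers in the findings make it straightforward to point a validation report, or an auto-fix, at the exact statement.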
What Causes Validation Failure?
| Violation | Action Taken |
|---|---|
| Missing traceId in spans | Injection re-run or fail |
| Unstructured log output detected | Marked as warning or error |
| /metrics endpoint not found | Error: cannot emit ObservabilityReady |
| Sensitive field (password, email) unredacted | Critical error: release blocked |
| Health check not returning 200 OK | Warning, may continue in test mode |
| execution-metadata.json not created | Hard failure |
Event Emission Based on Outcome
| Outcome | Event Emitted |
|---|---|
| All checks pass | ObservabilityReady |
| Minor warnings | ObservabilityReady + warnings |
| Hard failure | ObservabilityPolicyViolated |
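The mapping in this table is small enough to sketch directly. In the Python example below, the finding shape (`code` plus `severity`) and the `select_event` name are assumptions for illustration; only the event names come from this specification.

```python
def select_event(findings):
    """Map validation findings to the event to emit.

    Each finding is a dict with a 'code' and a 'severity' of 'warning' or 'error'.
    """
    errors = [f for f in findings if f["severity"] == "error"]
    warnings = [f for f in findings if f["severity"] == "warning"]
    if errors:
        # Any hard failure blocks the ready signal.
        return {"event": "ObservabilityPolicyViolated",
                "violations": [f["code"] for f in errors]}
    event = {"event": "ObservabilityReady"}
    if warnings:
        event["warnings"] = [f["code"] for f in warnings]
    return event

print(select_event([{"code": "health_probe_non_200", "severity": "warning"}]))
```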
In short: Validation is how the agent earns trust from the platform, ensuring that every observability injection is complete, compliant, and safe before it reaches runtime or CI/CD.
Retry & Correction Flow
Even though the Observability Engineer Agent operates at design-time with deterministic inputs, errors may occur due to:
- Incomplete or malformed code scaffolding
- Policy mismatches or unexpected configuration states
- Toolchain versioning issues (e.g., outdated OTEL packages)
- Agent misalignment due to blueprint evolution or conflicting decorators
Therefore, the agent is equipped with a built-in correction strategy, allowing it to retry safely, regenerate selectively, or escalate to human review.
Retry and Correction Triggers
| Condition | Triggers Retry? |
|---|---|
| execution-metadata.json missing | Yes (generate fallback) |
| /metrics endpoint not found | Yes (re-inject exporter) |
| traceId not propagated in span | Yes (auto-patch span code) |
| Unstructured logging detected | Yes (replace with structured pattern) |
| PII redaction failure | No retry: escalate to policy violation |
| Health check endpoint not working | Yes (retry once with basic default probes) |
| Logger misconfiguration (missing sink) | Yes (re-inject log sink) |
| Previous attempt exceeded durationBudget | No retry: emit failure report |
Retry & Correction Flow Diagram
flowchart TD
Start[Initial Injection Attempt]
Validate[Run Observability Validators]
Pass{Validation Success?}
RetryConditions[Check Retryable Errors]
Correct[Auto-Correct Observability Deficiencies]
Retry[Re-run Injection Steps]
EndSuccess[Emit ObservabilityReady]
EndFail[Emit ObservabilityPolicyViolated]
Start --> Validate --> Pass
Pass -->|Yes| EndSuccess
Pass -->|No| RetryConditions
RetryConditions -->|Retryable| Correct --> Retry --> Validate
RetryConditions -->|Not Retryable| EndFail
Auto-Correction Strategies
| Issue Detected | Auto-Fix Applied |
|---|---|
| Missing traceId in spans | Add span enrichment from DI context |
| Logger lacks enrichment fields | Re-inject Enrich.FromLogContext() |
| /metrics not bound | Add .MapMetrics() to endpoint config |
| Missing OTEL exporter | Add default OTLP exporter with fallback port |
| Health probe handler absent | Generate BasicHealthCheck.cs with 200 OK stub |
| ILogger used incorrectly | Wrap in Serilog with ForContext() enrichment |
Execution Metadata on Retry
Each retry is logged with a new executionId under the same traceId:
{
"traceId": "trace-2025-05-19-obs123",
"executionId": "exec-retry-002",
"agentId": "observability-engineer",
"retryOf": "exec-001",
"status": "SuccessAfterRetry",
"issuesResolved": ["otelSpanMissing", "metricsEndpointAbsent"]
}
This enables the Studio trace viewer and test history to highlight retried operations.
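A retry record of this shape can be derived from the previous attempt's metadata. The helper below is a hypothetical Python sketch (`build_retry_record` is not a documented function); it only shows how the trace stays fixed while the execution identifier changes.

```python
def build_retry_record(previous, execution_id, issues_resolved):
    """Create the metadata record for a successful retry of `previous`."""
    return {
        "traceId": previous["traceId"],      # retries stay under the same trace
        "executionId": execution_id,         # each attempt gets its own id
        "agentId": previous["agentId"],
        "retryOf": previous["executionId"],  # lineage pointer to the failed attempt
        "status": "SuccessAfterRetry",
        "issuesResolved": list(issues_resolved),
    }

first = {"traceId": "trace-2025-05-19-obs123", "executionId": "exec-001",
         "agentId": "observability-engineer", "status": "Failed"}
print(build_retry_record(first, "exec-retry-002", ["otelSpanMissing", "metricsEndpointAbsent"]))
```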
Escalation Triggers (Non-Retryable)
If the following are detected, the agent emits a violation event and halts:
- PII not redacted due to logic gap or bypass
- Conflicting observability configurations
- Missing blueprint context (e.g., tenantId undefined)
- File system permission issues preventing injection
- Infinite retry loop detected (retry count > 2)
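Taken together, the retry triggers and escalation rules describe a bounded correction loop. The Python sketch below is one possible reading of that loop, under stated assumptions: the issue codes, the `run_with_correction` name, and the three caller-supplied callables are invented for illustration, and the retry cap of 2 follows the "retry count > 2" rule above.

```python
MAX_RETRIES = 2  # the agent halts instead of attempting a third retry
NON_RETRYABLE = {"pii_unredacted", "conflicting_config",
                 "missing_blueprint_context", "filesystem_permission"}

def run_with_correction(inject, validate, correct):
    """Inject, validate, then auto-correct and retry while issues remain retryable.

    inject() performs injection, validate() returns a set of issue codes,
    and correct(issues) applies auto-fixes. All three are supplied by the caller.
    """
    inject()
    retries = 0
    while True:
        issues = validate()
        if not issues:
            return {"event": "ObservabilityReady", "retries": retries}
        if issues & NON_RETRYABLE or retries >= MAX_RETRIES:
            return {"event": "ObservabilityPolicyViolated", "retries": retries,
                    "violations": sorted(issues)}
        correct(issues)
        inject()
        retries += 1

# Simulated run: one retryable issue that the first correction resolves.
state = {"issues": {"metrics_endpoint_absent"}}
result = run_with_correction(
    inject=lambda: None,
    validate=lambda: set(state["issues"]),
    correct=lambda issues: state["issues"].clear(),
)
print(result)
```

Checking non-retryable issues before the retry budget means a PII leak escalates immediately, even on the first attempt.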
Events Emitted
| Outcome | Event |
|---|---|
| Retry successful | ObservabilityReady with retries: 1 |
| Retry failed but partial OK | ObservabilityReadyPartial |
| Retry failed critically | ObservabilityPolicyViolated |
In short: The Observability Engineer Agent is resilient: it detects failures, repairs them autonomously, and only escalates when critical policy risks are involved. It ensures no broken or untraceable service is deployed unnoticed.
Collaboration Interfaces
The Observability Engineer Agent does not work in isolation. It actively collaborates with multiple other agents and platform components, ensuring observability is:
- Injected at the right time
- Aligned with other concerns (e.g., DevOps, QA, security)
- Available for Studio dashboards
- Used as part of continuous validation and platform scoring
This section defines how the agent communicates, responds to events, and integrates with other personas in the ConnectSoft agentic system.
Direct Agent Collaborations
| Collaborating Agent | Interaction Summary |
|---|---|
| Microservice Generator Agent | Invokes observability injection after scaffold completion. Shares module path and blueprint trace. |
| Backend Developer Agent | The code this agent generates (e.g., handlers, controllers) is instrumented for spans and metrics by the Observability Agent. |
| Test Generator Agent | Consumes observability signals to drive test coverage checks (e.g., "traced path exists?", "log assertions?"). |
| DevOps Engineer Agent | Reads execution-metadata.json and observability configuration to set up monitoring, alerting, and CI pipelines. |
| Security Engineer Agent | Defines redaction policies and PII patterns that must be validated by the observability agent. |
| Documentation Writer Agent | Appends an "Observability" section to README.md using metadata and exposed endpoints detected by this agent. |
| Studio Agent | Consumes ObservabilityReady event and trace metadata to populate dashboards, graphs, and execution lineage. |
Events Emitted & Consumed
| Event Name | Role |
|---|---|
| ServiceScaffolded | Consumed: triggers observability injection |
| AgentOutputReady | Consumed: instrumentation of generated source code |
| ObservabilityReady | Emitted: signals instrumentation complete and verified |
| ObservabilityPolicyViolated | Emitted: signals agent failed to meet required standards |
| ExecutionMetadataGenerated | Emitted: trace metadata with injection details |
Shared Knowledge Contracts
| Interface | Used With | Purpose |
|---|---|---|
| execution-metadata.json | DevOps, Studio, QA | Cross-agent trace of what was injected, validated, and exported |
| RedactionPolicy.yaml | Security Agent | List of sensitive fields to redact during log injection |
| MetricRegistry.json | DevOps + Monitoring | Metrics emitted by the agent for setup in Prometheus / Grafana |
| TraceEventGraph | Studio + QA | OTEL spans and links used for visual timelines and audit trails |
Coordination Flow Example
sequenceDiagram
participant Generator as Microservice Generator Agent
participant Observability as Observability Engineer Agent
participant DevOps as DevOps Engineer Agent
participant Studio as Studio Agent
Generator->>Observability: ServiceScaffolded
Observability->>Observability: Inject Spans + Logs + Metrics
Observability->>DevOps: Emit execution-metadata.json
Observability->>Studio: Emit ObservabilityReady
This enables Studio to show trace-linked telemetry, and DevOps to monitor services out-of-the-box.
Platform Interfaces
- Orchestration FSM: The Observability Agent's steps are registered as required before DevOpsAgent can run.
- Studio API: Pulls observability reports, status, and linkable spans per agent/module from metadata.
- CI/CD Hooks: Fails pipelines if ObservabilityPolicyViolated is received.
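The CI/CD hook can be thought of as a gate over the event stream. The Python sketch below assumes a hypothetical, chronologically ordered list of event dicts carrying `event` and `moduleId`; `release_allowed` is an illustrative name, not a platform API.

```python
GATE_EVENTS = ("ObservabilityReady", "ObservabilityPolicyViolated")

def release_allowed(events, module_id):
    """Allow release only if the module's latest observability event is ObservabilityReady."""
    relevant = [e["event"] for e in events
                if e.get("moduleId") == module_id and e["event"] in GATE_EVENTS]
    # No observability event at all also blocks the release.
    return bool(relevant) and relevant[-1] == "ObservabilityReady"

events = [
    {"event": "ServiceScaffolded", "moduleId": "invoice-service"},
    {"event": "ObservabilityPolicyViolated", "moduleId": "invoice-service"},
    {"event": "ObservabilityReady", "moduleId": "invoice-service"},
    {"event": "ObservabilityPolicyViolated", "moduleId": "booking-service"},
]
print(release_allowed(events, "invoice-service"), release_allowed(events, "booking-service"))
```

Using the latest event, not the mere presence of a ready event, keeps a later violation from being masked by an earlier success.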
In short: The Observability Engineer Agent is a connective node in the agent graph, ensuring telemetry is not just injected, but consumed and validated by the broader ConnectSoft system.
Agent Contract
The agent contract defines the formal, declarative specification that governs how the Observability Engineer Agent:
- Is invoked by the orchestration system
- Accepts and validates its inputs
- Emits outputs, events, and metadata
- Interoperates with other agents and pipelines
- Aligns with ConnectSoft execution protocols
It enables the platform to treat the agent as a pluggable, traceable unit of automation, enforce runtime expectations, and replay or validate behavior during trace analysis.
Contract Overview
agentId: observability-engineer
role: "Design-Time Telemetry Injector"
category: "Observability, QA, Platform Readiness"
description: >
  Ensures every generated module is traceable, measurable, and diagnosable at runtime.
  Injects OTEL spans, structured logs, Prometheus metrics, and execution metadata.
triggers:
  - ServiceScaffolded
  - AgentOutputReady
inputs:
  - ServiceBlueprint.yaml
  - traceContext.json
  - ObservabilityPolicy.yaml (optional)
  - Previously generated source code
outputs:
  - execution-metadata.json
  - Updated source files with spans, metrics, logs
  - Prometheus endpoint (`/metrics`)
  - Health checks (`/healthz`, `/readyz`)
  - README.md telemetry summary (optional)
  - "Event: ObservabilityReady"
  - "Event: ObservabilityPolicyViolated"
skills:
  - InjectTraceDecorators
  - EmitLogConfiguration
  - InjectMetricCounters
  - AddHealthCheckProbes
  - GenerateExecutionMetadata
  - ValidateObservabilityReady
  - ApplyRedactionPolicies
  - EmitObservabilityEvent
memory:
  scope: [traceId, moduleId, skillId, tenantId]
  stores:
    - executionMetadataStore
    - telemetryInjectionPatterns (semantic memory)
    - redactionHistory
    - metricsUsageCorpus
validations:
  - Structured logs present and enriched
  - Required OTEL spans exist and propagate traceId
  # Quoted because a plain YAML scalar cannot begin with a backtick.
  - "`/metrics` and `/healthz` endpoints exposed"
  - PII fields redacted
  - execution-metadata.json generated and complete
version: "1.0.0"
status: active
Key Capabilities Declared in Contract
| Capability | Description |
|---|---|
| Declarative inputs/outputs | Ensures the orchestrator knows what must exist before and after agent execution |
| Trace-compliant event structure | All emitted events include traceId, agentId, moduleId, skillId |
| Retry-ready with memory linkage | Failed runs can use past memory to retry injection safely |
| FSM-aware behavior hooks | Used to slot the agent into finite state orchestration flows |
| Audit and security enforcement | Allows CI/CD pipelines to assert: no release without ObservabilityReady |
Example: Contract Usage by Orchestrator
when: ServiceScaffolded
then:
  - agent: observability-engineer
    must:
      - emit: execution-metadata.json
      - trigger: ObservabilityReady
    fallback:
      - if: validationFailed
        emit: ObservabilityPolicyViolated
Events Declared in Contract
| Event | Description |
|---|---|
| ObservabilityReady | Signals successful injection and validation of telemetry |
| ObservabilityPolicyViolated | Raised when redaction, tracing, or metrics standards are not met |
| ExecutionMetadataGenerated | Emits trace-linked metadata about the current agent operation |
In short: The agent contract defines how the platform interfaces with the Observability Engineer Agent, treating it not just as code, but as a governed, orchestrated, and testable automation unit in the software factory.
Studio View Integration
The ConnectSoft Studio is the visual control center of the AI Software Factory. It shows:
- Agent activity timelines
- Execution flows and trace graphs
- Metrics dashboards
- Policy violations and retry history
- Health of generated modules
The Observability Engineer Agent plays a critical role in ensuring that Studio can visualize runtime telemetry, validate execution metadata, and present module readiness with confidence.
Visual Elements Powered by This Agent
| Studio Feature | Powered by Agent Output |
|---|---|
| Trace Explorer | execution-metadata.json, OTEL span map |
| Observability Dashboard | Metrics summary: spans injected, metrics exposed, health endpoints live |
| Module Timeline View | Timestamps from execution-metadata.json, durationMs, retries |
| Telemetry Coverage Scorecard | Count of traced operations, metrics present, enrichment fields |
| Policy Compliance Heatmap | Warnings/errors from validation (e.g., missing traceId, unredacted PII) |
| README + Docs Viewer | Injected "Observability" section from README.md |
| Agent Retry History | executionId lineage with retry status per traceId |
| Redaction Violation Tracker | Logs showing detection of unmasked sensitive fields |
Metrics Visualized in Studio
agent_execution_duration_seconds{agentId="observability-engineer"}
otel_span_count{moduleId="invoice-service"}
log_enrichment_coverage{tenantId="petsure-001"}
metrics_endpoint_exposed{status="true"}
policy_violations_total{type="pii_unmasked"}
Example UI Widgets
Execution Trace Summary
Module: invoice-service
Agent: observability-engineer
Status: Ready
Trace ID: trace-2025-05-19-invoice
Execution Duration: 1.84s
Spans Injected: 5
Metrics Exported: 3
Log Enrichers: traceId, agentId, tenantId, moduleId
Observability Compliance Status
graph TD
A[Startup.cs] -->|OTEL Injected| B[Controller.cs]
B -->|Metrics Present| C[MessageConsumer.cs]
C -->|PII Redaction Warning| D[Log Validator]
Required Metadata for Studio Hooks
| Field in execution-metadata.json | Purpose |
|---|---|
| traceId | Links to global orchestration view |
| agentId | Agent execution lane identification |
| moduleId | Highlights what service/module was affected |
| skillId | Skill-level timing and validation breakdown |
| status, durationMs | Timeline and performance overlays |
| injected.spans, metrics, etc. | Metric overlays in dashboard |
| violations[] | Trigger policy compliance heatmap |
Triggered Studio Events
| Event | Studio Effect |
|---|---|
| ObservabilityReady | Unlocks "Ready for Deployment" status on module |
| ObservabilityPolicyViolated | Displays violation markers, blocks CI/CD release |
| ExecutionMetadataGenerated | Enables drill-down timeline and retry inspection |
Additional Studio Visual Cues
- Color-coded observability score badges (e.g., 5/5: spans, metrics, logs, probes, redaction)
- Tooltips showing which OTEL spans were injected and by whom
- Click-through to metrics like request latency, trace coverage per endpoint
In short: The Observability Engineer Agent gives Studio the data, structure, and trace metadata to visualize, inspect, and validate how observable every generated service is, turning telemetry into a first-class experience.
Traceability Schema
The traceability schema defines the core identifiers and telemetry fields that the Observability Engineer Agent injects, validates, and propagates across logs, spans, metrics, and metadata.
This schema ensures:
- Every action is traceable across agents, modules, tenants, and executions
- Studio, DevOps, and analytics tools can correlate events, metrics, and logs
- Multi-tenant safety by isolating observability to tenant and module boundaries
- Autonomous feedback loops are grounded in deterministic trace IDs
Core Identifiers
| Field | Description |
|---|---|
| traceId | Globally unique identifier for the end-to-end flow of a module or agent plan execution. |
| agentId | The persona executing the skill (e.g., observability-engineer). |
| skillId | The function performed (e.g., InjectTelemetry, GenerateExecutionMetadata). |
| moduleId | Logical service/module under instrumentation (e.g., invoice-service). |
| tenantId | Tenant-specific scoping identifier (e.g., petsure-001). |
| executionId | Unique ID for a single run or retry of the agent, under a trace. |
| environment | Target runtime context (dev, stage, prod). |
| status | Result of the skill run (Success, SuccessAfterRetry, Violation, etc.). |
| durationMs | Time taken to complete the skill, in milliseconds. |
| outputChecksum | Hash (SHA256) of the emitted result set (files, config, metadata). |
Example Execution Metadata (Trace-Schema-Aligned)
{
"traceId": "trace-2025-05-19-xyz123",
"agentId": "observability-engineer",
"skillId": "InjectMetricCounters",
"moduleId": "booking-service",
"tenantId": "vetclinic-001",
"executionId": "exec-042",
"environment": "stage",
"status": "Success",
"durationMs": 1567,
"outputChecksum": "sha256:ab12f8d4..."
}
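The specification does not say how outputChecksum is computed, so the Python sketch below is only one plausible scheme: a SHA-256 digest over the emitted files in sorted path order, with length prefixes so that file boundaries are unambiguous. The `output_checksum` name and the input shape (path mapped to bytes) are assumptions.

```python
import hashlib

def output_checksum(files):
    """Compute a 'sha256:<hex>' digest over a mapping of relative path -> content bytes.

    Paths are processed in sorted order so the result does not depend on dict ordering.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        for part in (path.encode("utf-8"), files[path]):
            # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return "sha256:" + digest.hexdigest()

checksum = output_checksum({
    "Startup.cs": b"// instrumented startup",
    "MetricsExporter.cs": b"// metrics endpoint",
})
print(checksum[:7], len(checksum))
```

A deterministic checksum like this supports the "deterministic (replayable)" requirement: re-running the same injection should reproduce the same digest.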
Telemetry Field Mapping
| Artifact Type | Fields Injected or Emitted |
|---|---|
| Structured Logs | traceId, agentId, moduleId, tenantId, skillId, status |
| OTEL Spans | traceId, moduleId, agentId (in ResourceAttributes) |
| Prometheus Metrics | traceId, tenantId, moduleId as labels |
| Health Endpoints | Metadata enriched with moduleId, optional traceId in headers |
| execution-metadata.json | All core traceability fields, including retry lineage |
Retry-Aware Extensions
When retries occur:
{
"retryOf": "exec-041",
"status": "SuccessAfterRetry",
"issuesResolved": ["missingMetrics", "untracedSpan"]
}
This enables Studio to show retry lineage and DevOps to track recoverability.
Security-Aware Scoping
- Every observability record is tagged with tenantId and moduleId to ensure:
  - No cross-tenant leakage
  - Proper data partitioning in metrics, logs, dashboards
- PII fields are explicitly excluded from trace schema unless redacted
OpenTelemetry Resource Attributes Injected
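// Attach tenant, module, and agent identity as resource attributes on exported telemetry.
// Requires: using OpenTelemetry.Resources; using System.Collections.Generic;
.SetResourceBuilder(ResourceBuilder.CreateDefault()
    .AddService("booking-service")
    .AddAttributes(new[]
    {
        new KeyValuePair<string, object>("tenantId", "vetclinic-001"),
        new KeyValuePair<string, object>("moduleId", "booking-service"),
        new KeyValuePair<string, object>("agentId", "observability-engineer")
    }))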
Validation Enforcement
The agent runs conformance checks to ensure:
- Every span/log/metric includes traceId
- Every emitted event has agentId + skillId
- All files, logs, and configs are uniquely traceable via executionId
- Nothing is emitted without tenantId when required
In short: The traceability schema is the backbone of observability governance. It ensures every injected behavior is linked, searchable, accountable, and safe across all modules, tenants, and execution flows.
Observability DSL / Metrics Profile
The Observability DSL (Domain-Specific Language) is a structured configuration format (typically YAML or JSON) that allows agents, orchestrators, and blueprints to declaratively define:
- What observability features must be injected
- Which metrics must be exposed
- How spans and logs should be enriched
- What policies should be enforced per tenant or environment
It gives the Observability Engineer Agent a declarative, machine-readable contract to guide and customize its behavior.
Example Observability DSL (YAML)
observability:
  tracing:
    enabled: true
    exporter: otlp
    spanEnrichment:
      include:
        - traceId
        - agentId
        - moduleId
        - tenantId
  logging:
    type: structured
    provider: serilog
    enrichers:
      - traceId
      - tenantId
      - executionId
    redaction:
      piiFields:
        - email
        - ssn
        - password
  metrics:
    enabled: true
    endpoint: /metrics
    exporters:
      - prometheus
    counters:
      - name: http_requests_total
        labels: [tenantId, moduleId]
      - name: agent_execution_duration_seconds
        type: histogram
  healthChecks:
    enabled: true
    endpoints:
      - /healthz
      - /readyz
Why This DSL Matters
- Customizability: Blueprint or tenant-specific telemetry behavior
- Separation of concerns: Orchestrators configure, agent executes
- Consistency: Shared templates across hundreds of modules
- Traceability: DSL becomes part of execution-metadata.json
- Governance: Policy compliance is declared, not inferred
Supported Metrics Profile
The DSL supports predefined metrics contracts used by the Observability Engineer Agent to:
- Generate default instrumentation
- Expose /metrics endpoint in a Prometheus-friendly format
- Label metrics with scoped identifiers (e.g., tenantId, moduleId)
Common Metrics Supported
| Metric Name | Type | Description |
|---|---|---|
| http_requests_total | Counter | Number of HTTP requests received, labeled by route, status, tenant |
| agent_execution_duration_seconds | Histogram | Duration of agent skill execution |
| trace_span_count_total | Counter | Count of spans injected per module |
| log_lines_emitted_total | Counter | Total logs emitted, labeled by log level |
| pii_redaction_violations_total | Counter | Number of failed redaction attempts |
| metrics_scrape_success_total | Counter | Number of successful /metrics scrapes by Prometheus |
| health_probe_status | Gauge | 1 = healthy, 0 = failed, for /healthz, /readyz |
DSL + Policy Integration
DSL can be merged or overridden with tenant or environment-specific policies like:
environments:
  prod:
    logging:
      redaction:
        required: true
    metrics:
      exporters: [prometheus]
      counters:
        - name: sla_violation_total
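The specification does not define the merge semantics, so the Python sketch below picks one common convention as an assumption: nested mappings merge key by key, while scalars and lists from the override replace the base value. `merge_policy` is a hypothetical helper name.

```python
def merge_policy(base, override):
    """Recursively merge `override` into `base`; override wins, lists are replaced whole."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {
    "logging": {"type": "structured", "redaction": {"piiFields": ["email", "ssn", "password"]}},
    "metrics": {"enabled": True, "counters": [{"name": "http_requests_total"}]},
}
prod = {
    "logging": {"redaction": {"required": True}},
    "metrics": {"exporters": ["prometheus"], "counters": [{"name": "sla_violation_total"}]},
}
effective = merge_policy(base, prod)
print(effective["logging"]["redaction"])
```

Under this convention the prod override replaces the counter list instead of appending to it; a policy registry that wants additive counters would need an explicit list-merge rule.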
Input Sources for DSL
- Blueprint ServiceBlueprint.yaml
- Centralized policy registry
- Observability profiles per tenant or industry (e.g., HIPAA-safe, PCI-ready)
- Memory-injected recommendations from prior executions
Runtime Use in Agent
The agent parses the DSL into an execution plan:
{
"injectTracing": true,
"injectMetrics": true,
"spanAttributes": ["traceId", "moduleId"],
"logEnrichers": ["tenantId", "executionId"],
"metricDefinitions": [
{ "name": "http_requests_total", "labels": ["tenantId", "moduleId"] },
{ "name": "agent_execution_duration_seconds", "type": "histogram" }
]
}
Used internally by kernel skills like InjectMetricCounters and EmitLogConfiguration.
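The translation from DSL to execution plan can be sketched as a pure function over the parsed document. In the Python example below the DSL is given as an already-parsed dict (so no YAML library is needed), and `build_execution_plan` is an illustrative name, not the agent's actual parser.

```python
def build_execution_plan(dsl):
    """Flatten a parsed observability DSL document into the plan shape shown above."""
    obs = dsl["observability"]
    tracing = obs.get("tracing", {})
    logging_cfg = obs.get("logging", {})
    metrics = obs.get("metrics", {})
    return {
        "injectTracing": bool(tracing.get("enabled", False)),
        "injectMetrics": bool(metrics.get("enabled", False)),
        "spanAttributes": list(tracing.get("spanEnrichment", {}).get("include", [])),
        "logEnrichers": list(logging_cfg.get("enrichers", [])),
        "metricDefinitions": [dict(c) for c in metrics.get("counters", [])],
    }

dsl = {"observability": {
    "tracing": {"enabled": True, "spanEnrichment": {"include": ["traceId", "moduleId"]}},
    "logging": {"enrichers": ["tenantId", "executionId"]},
    "metrics": {"enabled": True, "counters": [
        {"name": "http_requests_total", "labels": ["tenantId", "moduleId"]},
        {"name": "agent_execution_duration_seconds", "type": "histogram"},
    ]},
}}
plan = build_execution_plan(dsl)
print(plan["injectTracing"], plan["logEnrichers"])
```

Missing sections default to disabled or empty, so a blueprint that omits tracing or metrics yields a plan that simply skips those skills.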
Output: DSL-Aware Metadata
Included in execution-metadata.json under observabilityDslSnapshot:
"observabilityDslSnapshot": {
"metrics": {
"enabled": true,
"exporters": ["prometheus"],
"counters": ["http_requests_total", "agent_execution_duration_seconds"]
},
"logging": {
"redaction": {
"piiFields": ["email", "ssn"]
}
}
}
In short: The Observability DSL allows ConnectSoft to standardize, govern, and adapt observability across thousands of generated services, with the Observability Engineer Agent acting as its runtime executor and compliance enforcer.
Policy & Security Guardrails
The Observability Engineer Agent is not only responsible for injecting telemetry; it must also enforce security, compliance, and tenant-safety constraints as part of every injection. This ensures the factory outputs are:
- Safe by construction
- Policy-aligned per tenant/environment
- Compliant with privacy standards (e.g., masking, PII)
- Auditable across all observability behaviors
Policy Enforcement Responsibilities
| Area | Guardrail Enforced |
|---|---|
| PII Redaction | Auto-detect fields like email, password, ssn and apply structured masking or redaction logic in logs. |
| Tenant Isolation | All metrics, logs, and spans must include tenantId and be scoped to moduleId β no cross-tenant exposure allowed. |
| Trace Enrichment | traceId, agentId, and skillId must be included in all telemetry events (logs, spans, metrics). |
| Log Format Compliance | Only structured logging (e.g., JSON) is permitted; usage of Console.WriteLine is flagged. |
| Health Check Safety | All probes must not leak infrastructure state or secrets in error responses. |
| Metric Label Validation | All exposed metrics must include required labels (e.g., tenantId, moduleId, statusCode) to support observability partitioning. |
| Sink Hardening | Logs must not be routed to insecure or public sinks unless explicitly allowed by config (e.g., stdout only in dev). |
| Trace Export Scoping | OTEL exports (OTLP) must be scoped by tenant/environment and routed to secure endpoints only. |
Example Redaction Policy Config
logging:
  redaction:
    enabled: true
    piiFields:
      - email
      - ssn
      - birthdate
    redactionFormat: "[REDACTED]"
    fallbackBehavior: blockInjectionIfViolated
Enforced during the EmitLogConfiguration and ApplyRedactionPolicies skills.
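What such a policy means for a single structured log event can be shown with a small masking function. The Python sketch below is an assumption-based illustration: `redact` is a hypothetical helper, field matching is by case-insensitive key name only, and value-based detection (for example, an email address inside a message string) is out of scope.

```python
def redact(record, pii_fields, placeholder="[REDACTED]"):
    """Return a copy of a log record with PII fields masked, recursing into dicts and lists."""
    pii = {name.lower() for name in pii_fields}

    def _walk(value):
        if isinstance(value, dict):
            return {k: placeholder if str(k).lower() in pii else _walk(v)
                    for k, v in value.items()}
        if isinstance(value, list):
            return [_walk(item) for item in value]
        return value

    return _walk(record)

event = {
    "message": "user created",
    "email": "a@example.com",
    "user": {"SSN": "123-45-6789", "name": "Ada"},
    "items": [{"birthdate": "1990-01-01"}],
}
clean = redact(event, ["email", "ssn", "birthdate"])
print(clean)
```

The function builds a new structure instead of mutating its input, so the original event stays available to code that is allowed to see it.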
Violations That Block Release
| Violation | Severity | Action |
|---|---|---|
| PII fields not redacted | Critical | Block CI/CD |
| Missing tenantId in logs | Major | Fail validation |
| Logs written to public sink | Major | Emit ObservabilityPolicyViolated |
| Metrics without tenant labels | Major | Flag retry or error |
| Spans without traceId | Warning | Retry once, escalate if persistent |
Events Emitted on Violation
{
"event": "ObservabilityPolicyViolated",
"traceId": "trace-2025-05-19-policyfail123",
"agentId": "observability-engineer",
"violations": [
"pii_unmasked",
"metrics_tenantLabel_missing"
],
"status": "Blocked"
}
Policy Sources
- ObservabilityPolicy.yaml from blueprint or tenant profile
- Factory-wide policy registry (e.g., policies/global/observability/v2.0.yaml)
- Dynamically loaded rules based on environment (e.g., prod vs dev)
Validated Enforcement Matrix
| Enforcement Area | Validated With |
|---|---|
| PII fields | Regex patterns + blueprint tags (sensitivity: pii) |
| Logging sinks | appsettings.logging.json parser |
| Metric labels | Simulated /metrics scrape and schema check |
| Span structure | OTEL trace analysis from memory snapshot |
| Health endpoint safety | Static + runtime signature checks for response body |
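The "simulated /metrics scrape and schema check" for metric labels might be approximated as below. This Python sketch parses Prometheus text exposition lines with simplified regular expressions (label values containing `}` or `=` are not handled), and `missing_labels` is a hypothetical name; it is not a full exposition-format parser.

```python
import re

SAMPLE_LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(\{(?P<labels>[^}]*)\})?\s+\S+')
LABEL_NAME = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)=')

def missing_labels(exposition_text, required=("tenantId", "moduleId")):
    """Return {metric_name: [missing labels]} for samples lacking required labels."""
    problems = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        present = set(LABEL_NAME.findall(match.group("labels") or ""))
        absent = [label for label in required if label not in present]
        if absent:
            problems[match.group("name")] = absent
    return problems

scrape = '''# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{tenantId="petsure-001",moduleId="invoice-service",status="200"} 42
agent_execution_duration_seconds_count{moduleId="invoice-service"} 3
process_start_time_seconds 1716100000
'''
print(missing_labels(scrape))
```

In practice runtime-provided metrics such as process_start_time_seconds would probably be allow-listed, since they are not emitted by injected code.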
π Policy-Aware Retry Behavior¶
- Safe retry allowed for:
- Missing metric labels
- Log format mismatch
- Hard fail (no retry) for:
- Unredacted secrets or PII
- Missing
tenantIdin logs or spans
In short: The Observability Engineer Agent is a security-first observability enforcer, applying tenant-safe, privacy-aware, and compliance-driven guardrails to every telemetry behavior in the generated system.
Scenario: Instrumenting a Generated REST API Service
Let's walk through an example of how the Observability Engineer Agent fits into a full orchestration flow, from project initialization to deployment readiness, and how it keeps each step traceable, modular, and secure.
Scenario: "InvoiceService" Generation
Starting Point:
- Tenant: `petsure-001`
- Module: `invoice-service`
- Trigger: `ServiceScaffolded`
- Runtime Type: `REST API (.NET 8)`
- Target Environment: `staging`
- DSL Policy: Prometheus + OTEL + Serilog + Redaction required
Agent Execution Flow
sequenceDiagram
participant Vision as Vision Architect Agent
participant Generator as Microservice Generator Agent
participant Observability as Observability Engineer Agent
participant DevOps as DevOps Engineer Agent
participant Studio as Studio Agent
Vision->>Generator: Emit ServiceBlueprint.yaml
Generator->>Observability: Emit ServiceScaffolded
Observability->>Observability: Inject Spans, Metrics, Logs, Health Checks
Observability->>Observability: Validate Redaction, Trace Enrichment
Observability->>Observability: Generate execution-metadata.json
Observability->>Studio: Emit ObservabilityReady Event
Observability->>DevOps: Provide Observability Metadata
Outputs Produced
| Artifact | Description |
|---|---|
| `Startup.cs` | Injected with `AddOpenTelemetry()`, `AddHealthChecks()`, `UseSerilog()` |
| `MetricsExporter.cs` | Exposes `/metrics` for Prometheus |
| `execution-metadata.json` | Includes traceId, agentId, status, injected items, duration |
| `README.md` | New section titled Observability, listing available endpoints |
| Event: `ObservabilityReady` | Emitted and consumed by Studio, DevOps, QA |
| Logs and Metrics: OTEL + Prometheus | Tagged with tenantId, traceId, moduleId, agentId |
| Redaction Validator Report | Confirms all PII fields masked in logs and config |
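
The generated service in this scenario uses Serilog on .NET, but the enrichment idea is language-neutral. The Python sketch below shows the same pattern with the standard `logging` module: a filter stamps every record with the four context fields, and a formatter emits structured JSON. The class and function names are hypothetical.

```python
import io
import json
import logging


class ContextEnricher(logging.Filter):
    """Attach fixed context fields (traceId, tenantId, ...) to every record."""

    def __init__(self, **context):
        super().__init__()
        self.context = context

    def filter(self, record):
        for key, value in self.context.items():
            setattr(record, key, value)
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    FIELDS = ("traceId", "agentId", "tenantId", "moduleId")

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update({name: getattr(record, name, None) for name in self.FIELDS})
        return json.dumps(payload)


def make_logger(stream, **context):
    logger = logging.getLogger("invoice-service-demo")
    logger.handlers.clear()
    logger.filters.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.addFilter(ContextEnricher(**context))
    return logger


stream = io.StringIO()
log = make_logger(
    stream,
    traceId="trace-invoice-2025-05-19-abc123",
    agentId="observability-engineer",
    tenantId="petsure-001",
    moduleId="invoice-service",
)
log.info("invoice created")
print(stream.getvalue())
```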
Snapshot of execution-metadata.json
{
"traceId": "trace-invoice-2025-05-19-abc123",
"agentId": "observability-engineer",
"skillId": "InjectTelemetry",
"moduleId": "invoice-service",
"executionId": "exec-007",
"tenantId": "petsure-001",
"status": "Success",
"durationMs": 1734,
"outputFiles": ["Startup.cs", "execution-metadata.json", "MetricsExporter.cs"],
"spansInjected": 5,
"metricsExposed": ["http_requests_total", "agent_execution_duration_seconds"],
"healthChecks": ["/healthz", "/readyz"],
"logEnrichers": ["traceId", "agentId", "tenantId", "moduleId"],
"violations": []
}
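A downstream consumer might sanity-check this artifact before trusting it. The Python sketch below validates a subset of the fields shown in the snapshot. Which fields are mandatory is an assumption, as is the rule that a `Success` status implies an empty `violations` list.

```python
# Assumed required fields and types, taken from the snapshot above.
REQUIRED_FIELDS = {
    "traceId": str,
    "agentId": str,
    "moduleId": str,
    "tenantId": str,
    "status": str,
    "durationMs": int,
    "violations": list,
}


def validate_metadata(doc):
    """Return a list of problems found in an execution-metadata document."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    if doc.get("status") == "Success" and doc.get("violations"):
        problems.append("status is Success but violations are listed")
    return problems


metadata = {
    "traceId": "trace-invoice-2025-05-19-abc123",
    "agentId": "observability-engineer",
    "moduleId": "invoice-service",
    "tenantId": "petsure-001",
    "status": "Success",
    "durationMs": 1734,
    "violations": [],
}
print(validate_metadata(metadata))
```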
Studio Dashboard View
Module: invoice-service
Status: ObservabilityReady
Trace Coverage: 5 spans
Metrics: Prometheus
Log Format: Structured JSON with full enrichment
Redaction: Passed
Duration: 1.7s
Execution ID: exec-007
Retry History: None
Outcome
- DevOps receives metadata and continues deployment to `staging`
- QA Agent starts validating with telemetry visibility
- Studio shows full trace, metrics graph, and observability heatmap
- PII risk audit confirms compliance
- Release Manager Agent approves promotion to production
In summary: This end-to-end flow demonstrates how the Observability Engineer Agent activates post-scaffolding, injects secure and traceable telemetry, validates all outputs, and signals readiness for downstream automation, all with zero manual intervention.