APIs

Target Architecture — Final-State Design

This page describes the final-state public API surface of the Observability & Feedback Platform. All endpoints are tenant-scoped, OpenIddict-authenticated, and emit the canonical telemetry dimensions. Request and response shapes are versioned per the versioning policy.

The platform exposes nine public REST APIs through the Factory Studio BFF and the platform gateway. Endpoints are grouped by purpose: read APIs for telemetry (traces, logs, metrics, quality, cost) and write APIs for definitions and signals (dashboards, alert rules, incidents, feedback items). Internal services (SloService, TelemetryCorrelationService) expose no public API and are driven by events and workers.

API Catalog

Method & Path | Service | Kind | Purpose
GET /traces/{traceId} | TraceService | Public (read) | Fetch a correlated end-to-end trace.
POST /logs/search | LogQueryService | Public (read) | Search structured logs, tenant-scoped.
POST /metrics/query | MetricAggregationService | Public (read) | Query aggregated metric series.
POST /dashboards | DashboardService | Public (write) | Create / update a dashboard definition.
POST /alerts/rules | AlertRuleService | Public (write) | Create / update an alert rule.
POST /incidents | IncidentService | Public (write) | Open an incident.
POST /feedback-items | FeedbackService | Public (write) | Create a feedback item.
GET /quality/projects/{projectId} | QualityScoreService | Public (read) | Get quality scores for a project.
GET /cost/projects/{projectId} | CostTelemetryService | Public (read) | Get cost telemetry for a project.

Authorization

All endpoints require an OpenIddict-issued bearer token (see Security). Authorization is enforced on three axes:

  • Tenant scope: the token's tenantId claim must match the resource's tenant; cross-tenant access is rejected before any store is touched.
  • Scope claims: observability.read for read endpoints, observability.write for definition/signal endpoints, and observability.incidents for incident operations.
  • Role mapping: Factory Studio surfaces (Runtime Center, Cost Center, QA Center) map to the roles runtime.operator, cost.analyst, and quality.reviewer respectively.
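
The three axes above can be sketched as a single check that runs in order: tenant first, then scope, then role. This is a minimal illustration, not the platform's implementation. The TokenClaims type, the authorize function, the endpoint kinds used as dictionary keys, and the reason strings are all hypothetical names introduced here; only the scope and role identifiers come from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical endpoint kinds mapped to the scope claims named above.
REQUIRED_SCOPE = {
    "read": "observability.read",
    "write": "observability.write",
    "incidents": "observability.incidents",
}

# Factory Studio surface -> role, as listed in the role-mapping bullet.
SURFACE_ROLE = {
    "Runtime Center": "runtime.operator",
    "Cost Center": "cost.analyst",
    "QA Center": "quality.reviewer",
}


@dataclass(frozen=True)
class TokenClaims:
    """Illustrative subset of the claims carried by a bearer token."""
    tenant_id: str
    scopes: frozenset = field(default_factory=frozenset)
    roles: frozenset = field(default_factory=frozenset)


def authorize(claims, resource_tenant_id, endpoint_kind, surface=None):
    """Return (allowed, reason). The tenant check runs before anything else."""
    if claims.tenant_id != resource_tenant_id:
        return False, "tenant_mismatch"
    if REQUIRED_SCOPE[endpoint_kind] not in claims.scopes:
        return False, "missing_scope"
    # The role is only checked when the call comes from a known Studio surface.
    if surface is not None and SURFACE_ROLE[surface] not in claims.roles:
        return False, "missing_role"
    return True, "ok"
```

Checking the tenant first mirrors the rule that cross-tenant access is rejected before any store is touched; the later checks never run for a foreign tenant.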

Every request is stamped with the required telemetry dimensions (traceId, executionId, tenantId, projectId, moduleId, agentId, skillId, artifactId, workflowId, environment, version) propagated from the caller or generated at the edge.
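
As a rough sketch of the stamping described above, the function below merges caller-supplied dimensions with edge defaults. It assumes that only traceId is minted when missing and that other absent dimensions are left as null (the trace example on this page shows agentId as null); the function name, the edge_defaults parameter, and the generated traceId format are hypothetical.

```python
import uuid

# The required telemetry dimensions, in the order listed above.
REQUIRED_DIMENSIONS = (
    "traceId", "executionId", "tenantId", "projectId", "moduleId", "agentId",
    "skillId", "artifactId", "workflowId", "environment", "version",
)


def stamp_dimensions(incoming, edge_defaults):
    """Build the full dimension set for one request.

    Values propagated from the caller win; otherwise edge defaults are used.
    Dimensions known to neither side are kept as None.
    """
    stamped = {}
    for name in REQUIRED_DIMENSIONS:
        value = incoming.get(name)
        if value is None:
            value = edge_defaults.get(name)
        stamped[name] = value
    if stamped["traceId"] is None:
        # Assumption: the edge generates a traceId when the caller sent none.
        stamped["traceId"] = "trace-" + uuid.uuid4().hex[:8]
    return stamped
```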

Versioning

  • APIs are versioned by URL prefix (/v1/...) at the gateway; the table above omits the prefix for brevity.
  • Response payloads are forward-compatible: clients must tolerate unknown fields.
  • Breaking changes ship as a new version path; the prior version is supported for one deprecation window.
  • API versioning is independent of the eventType versioning used for events (see Events).
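
On the client side, the first two rules might look like the sketch below: a helper that adds the version prefix, and a response type that drops fields it does not recognize. The IncidentOpened class and versioned_path helper are hypothetical; the field names are taken from the POST /incidents response shown later on this page.

```python
from dataclasses import dataclass, fields

API_VERSION = "v1"  # URL prefix applied at the gateway


def versioned_path(path, version=API_VERSION):
    """Prefix a catalog path such as '/incidents' with the version segment."""
    return "/" + version + "/" + path.lstrip("/")


@dataclass
class IncidentOpened:
    """Client-side view of the POST /incidents response."""
    incidentId: str
    status: str
    openedAt: str
    severity: str

    @classmethod
    def from_payload(cls, payload):
        """Keep only the fields this client knows; ignore anything newer."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

A client written this way keeps working when the server adds a field to the response, which is what forward compatibility asks of it.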

Public REST APIs

GET /traces/{traceId}

Returns the correlated trace — spans across factory services, agents, and generated SaaS — anchored by traceId.

{
  "traceId": "trace-9f1c2b7d",
  "tenantId": "connectsoft",
  "projectId": "proj-booking-saas",
  "environment": "production",
  "rootSpan": "POST /reservations",
  "startedAt": "2026-06-11T09:00:00Z",
  "durationMs": 412,
  "status": "ok",
  "spans": [
    {
      "spanId": "span-01",
      "parentSpanId": null,
      "name": "POST /reservations",
      "moduleId": "module-reservations-api",
      "durationMs": 412,
      "dimensions": { "agentId": null, "version": "1.4.2" }
    },
    {
      "spanId": "span-02",
      "parentSpanId": "span-01",
      "name": "ReservationsRepository.Save",
      "moduleId": "module-reservations-api",
      "durationMs": 88
    }
  ],
  "links": { "logs": "/logs/search?traceId=trace-9f1c2b7d", "incidents": [] }
}
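
The spans array is flat; the hierarchy is carried by parentSpanId. One way a client could rebuild the tree is sketched below. The helper names are hypothetical, and the code assumes a well-formed trace with a single root whose parentSpanId is null and no cycles.

```python
from collections import defaultdict


def children_by_parent(spans):
    """Group spans by parentSpanId; the root span is filed under None."""
    index = defaultdict(list)
    for span in spans:
        index[span.get("parentSpanId")].append(span)
    return index


def render_tree(spans):
    """Return indented 'name (durationMs ms)' lines, depth-first from the root."""
    index = children_by_parent(spans)
    lines = []

    def walk(parent_id, depth):
        for span in index.get(parent_id, []):
            lines.append("  " * depth + f"{span['name']} ({span['durationMs']} ms)")
            walk(span["spanId"], depth + 1)

    walk(None, 0)
    return lines
```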

POST /logs/search

Searches structured Serilog logs in Log Analytics, scoped to the caller's tenant.

// Request
{
  "tenantId": "connectsoft",
  "projectId": "proj-booking-saas",
  "filters": { "level": "Error", "moduleId": "module-reservations-api" },
  "traceId": "trace-9f1c2b7d",
  "timeRange": { "from": "2026-06-11T08:00:00Z", "to": "2026-06-11T10:00:00Z" },
  "limit": 100
}

// Response
{
  "total": 1,
  "results": [
    {
      "timestamp": "2026-06-11T09:00:00Z",
      "level": "Error",
      "message": "Reservation save failed: optimistic concurrency",
      "traceId": "trace-9f1c2b7d",
      "executionId": "exec-771",
      "moduleId": "module-reservations-api",
      "exception": "ConcurrencyException"
    }
  ]
}

POST /metrics/query

Queries aggregated metric series with grouping by required dimensions.

// Request
{
  "tenantId": "connectsoft",
  "metric": "request.latency.p95",
  "groupBy": ["moduleId", "environment"],
  "timeRange": { "from": "2026-06-11T00:00:00Z", "to": "2026-06-11T12:00:00Z" },
  "step": "5m"
}

// Response
{
  "metric": "request.latency.p95",
  "unit": "ms",
  "series": [
    {
      "key": { "moduleId": "module-reservations-api", "environment": "production" },
      "points": [
        { "t": "2026-06-11T09:00:00Z", "v": 412 },
        { "t": "2026-06-11T09:05:00Z", "v": 388 }
      ]
    }
  ]
}
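
A consumer of this response typically reduces each series to a summary value. The sketch below picks the highest point per series; peak_per_series is a hypothetical helper, and it relies only on the key, points, t, and v fields shown in the response above.

```python
def peak_per_series(response):
    """Map each series key to its highest point as (timestamp, value)."""
    peaks = {}
    for series in response["series"]:
        if not series["points"]:
            continue  # nothing to summarize for an empty series
        # Sort the key items so the tuple is stable regardless of dict order.
        key = tuple(sorted(series["key"].items()))
        top = max(series["points"], key=lambda p: p["v"])
        peaks[key] = (top["t"], top["v"])
    return peaks
```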

POST /dashboards

Creates or updates a reusable, multi-tenant dashboard definition.

// Request
{
  "tenantId": "connectsoft",
  "name": "Reservations Runtime Health",
  "scope": { "projectId": "proj-booking-saas" },
  "panels": [
    { "type": "timeseries", "metric": "request.latency.p95", "groupBy": ["moduleId"] },
    { "type": "stat", "metric": "slo.error_budget.remaining", "sloId": "slo-reservations-availability" }
  ]
}

// Response
{ "dashboardId": "dash-3f21", "version": 1, "createdAt": "2026-06-11T09:00:00Z" }

POST /alerts/rules

Creates or updates an alert rule evaluated by the AlertEvaluationWorker.

// Request
{
  "tenantId": "connectsoft",
  "name": "Reservations p95 latency high",
  "scope": { "projectId": "proj-booking-saas", "moduleId": "module-reservations-api" },
  "condition": { "metric": "request.latency.p95", "operator": "gt", "threshold": 800, "forMinutes": 5 },
  "severity": "warning",
  "actions": ["open-incident", "notify-runtime-operator"]
}

// Response
{ "alertRuleId": "alert-9a02", "enabled": true, "version": 1 }
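
The page does not spell out how the AlertEvaluationWorker interprets forMinutes. The sketch below assumes one plausible reading: the condition is met when every metric point in the trailing forMinutes window breaches the threshold, and an empty window never fires. Only the gt operator appears in the example; the other operator names, the function names, and the now parameter are assumptions.

```python
import operator
from datetime import datetime, timedelta

# Only "gt" appears in the example request; the other names are assumptions.
OPERATORS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def parse_ts(value):
    """Parse the ISO-8601 'Z' timestamps used in the examples."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def condition_met(condition, points, now):
    """True when every point in the trailing forMinutes window breaches the threshold."""
    compare = OPERATORS[condition["operator"]]
    window_start = now - timedelta(minutes=condition["forMinutes"])
    window = [p for p in points if window_start <= parse_ts(p["t"]) <= now]
    if not window:
        return False  # no data in the window: do not fire
    return all(compare(p["v"], condition["threshold"]) for p in window)
```

Requiring the whole window to breach avoids firing on a single spike, which is the usual motivation for a duration clause in an alert rule.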

POST /incidents

Opens an incident, optionally linked to an alert, SLO breach, or trace.

// Request
{
  "tenantId": "connectsoft",
  "projectId": "proj-booking-saas",
  "title": "Reservations API elevated error rate",
  "severity": "high",
  "source": { "type": "alert", "alertRuleId": "alert-9a02" },
  "traceId": "trace-9f1c2b7d"
}

// Response
{
  "incidentId": "inc-5521",
  "status": "open",
  "openedAt": "2026-06-11T09:01:00Z",
  "severity": "high"
}

POST /feedback-items

Creates a durable feedback item from a runtime signal, human, or agent.

// Request
{
  "tenantId": "connectsoft",
  "projectId": "proj-booking-saas",
  "artifactId": "artifact-reservations-repo",
  "source": "incident",
  "sourceId": "inc-5521",
  "category": "reliability",
  "sentiment": "negative",
  "summary": "Generated repository lacked retry on optimistic concurrency.",
  "traceId": "trace-9f1c2b7d"
}

// Response
{
  "feedbackItemId": "fb-7781",
  "status": "captured",
  "createdAt": "2026-06-11T09:30:00Z"
}

GET /quality/projects/{projectId}

Returns the computed quality scores for a project, sliced by dimension.

{
  "projectId": "proj-booking-saas",
  "tenantId": "connectsoft",
  "computedAt": "2026-06-11T10:00:00Z",
  "overall": 0.86,
  "dimensions": {
    "reliability": 0.79,
    "performance": 0.91,
    "cost_efficiency": 0.88,
    "maintainability": 0.85
  },
  "topArtifacts": [
    { "artifactId": "artifact-reservations-repo", "score": 0.72, "openFeedback": 1 }
  ]
}
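
A reviewer reading this payload usually wants the weakest dimension and the artifacts dragging the score down. A minimal sketch, with hypothetical helper names and a caller-chosen threshold that is not part of the API:

```python
def weakest_dimension(report):
    """Return (name, score) for the lowest-scoring quality dimension."""
    dims = report["dimensions"]
    name = min(dims, key=dims.get)
    return name, dims[name]


def artifacts_below(report, threshold):
    """List artifactIds in topArtifacts whose score is under the threshold."""
    return [a["artifactId"] for a in report["topArtifacts"] if a["score"] < threshold]
```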

GET /cost/projects/{projectId}

Returns cost telemetry attributed to a project, with anomaly flags.

{
  "projectId": "proj-booking-saas",
  "tenantId": "connectsoft",
  "period": "2026-06",
  "currency": "USD",
  "total": 1284.50,
  "breakdown": {
    "model_inference": 740.10,
    "compute": 410.40,
    "storage": 134.00
  },
  "anomalies": [
    { "category": "model_inference", "detectedAt": "2026-06-09T00:00:00Z", "deltaPct": 142.0 }
  ]
}
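
In the example above the breakdown categories sum to the total (740.10 + 410.40 + 134.00 = 1284.50). A client that wants to verify this, and to filter anomalies by size, could do something like the sketch below. Both helpers and the min_delta_pct parameter are hypothetical, and the page does not state that the breakdown is guaranteed to reconcile with the total.

```python
from decimal import Decimal


def breakdown_gap(report):
    """Return total minus the sum of breakdown categories, as a Decimal."""
    # Decimal(str(x)) avoids binary float drift when summing currency amounts.
    total = Decimal(str(report["total"]))
    parts = sum((Decimal(str(v)) for v in report["breakdown"].values()), Decimal("0"))
    return total - parts


def flagged_categories(report, min_delta_pct):
    """Categories with an anomaly whose deltaPct is at or above the threshold."""
    return sorted({a["category"] for a in report["anomalies"] if a["deltaPct"] >= min_delta_pct})
```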

Internal & gRPC APIs

  • gRPC: high-volume span and metric ingestion from the TraceIngestionWorker and runtime SDKs uses the OTLP/gRPC protocol directly into the App Insights collector; the platform does not re-expose a custom ingestion gRPC surface.
  • Internal contracts: SloService and TelemetryCorrelationService are reached only via events and internal cluster-local calls; they are not on the public gateway.
  • All internal calls carry the same telemetry dimensions and traceId propagation as public calls.