Observability
Target Architecture — Final-State Design
This page describes the final-state observability of the Marketplace Platform. All services emit structured logs via ConnectSoft.Extensions.Logging.Serilog, metrics and traces via ConnectSoft.Extensions.Observability/Telemetry (OpenTelemetry), and feed the Observability & Feedback platform. Every signal carries the canonical event envelope identity dimensions.
The Marketplace is instrumented so that any publish, search, compatibility evaluation, or installation can be traced end to end and correlated with the originating agent task or user action. Telemetry is the feedback loop that keeps the asset ecosystem healthy and governable at multi-tenant scale.
Required dimensions
Every log line, metric, and span carries these dimensions so signals correlate across the factory:
| Dimension | Source | Purpose |
|---|---|---|
| `tenantId` | envelope / token | Tenant isolation and per-tenant SLOs |
| `traceId` | envelope | End-to-end trace stitching |
| `correlationId` | envelope | Workflow/saga correlation |
| `assetId`, `assetType` | domain | Per-asset and per-type analytics |
| `version` | domain | Per-version health |
| `publisherId` | domain | Publisher quality and reputation |
| `installationId` | domain | Installation lifecycle tracing |
| `requestedBy` (task-/user) | envelope | Attribution of autonomous vs human actions |
| `service`, `environment` | runtime | Service- and environment-scoped views |
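As a rough illustration of how the table above could be enforced, the sketch below checks a signal's attribute bag for the required dimensions before it is emitted. The function name `missing_dimensions` and the dictionary shape are hypothetical, and the sketch assumes every dimension in the table is mandatory on every signal, as the text states; a real implementation might relax the domain dimensions for signals that do not concern an asset.

```python
# Dimension names are taken from the table above; everything else is illustrative.
REQUIRED_DIMENSIONS = (
    "tenantId", "traceId", "correlationId", "assetId", "assetType", "version",
    "publisherId", "installationId", "requestedBy", "service", "environment",
)


def missing_dimensions(signal):
    """Return the required dimension names that are absent or empty in `signal`."""
    return [name for name in REQUIRED_DIMENSIONS if not signal.get(name)]
```

A caller could reject or flag any signal for which the returned list is non-empty, which keeps incomplete telemetry from silently breaking cross-service correlation.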
Logs
- Structured Serilog with the envelope identity enriched into the log context; no secret material is ever logged (see Security).
- Key logged events: submission received, quality verdict, signing, version released, compatibility verdict, dependency resolution result, license grant/deny, install applied/rolled back, dead-letter.
- Logs are shipped to the central log store and correlated with traces by `traceId`.
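The enrichment and no-secrets rules above can be pictured with a small sketch. It is not the Serilog configuration the services use; it is a language-neutral illustration in Python. The function `build_log_event` and the `SENSITIVE_KEYS` deny-list are assumptions made for the example, and the actual rules for secret material are defined on the Security page.

```python
import json

# Assumed deny-list for illustration only; the authoritative list lives elsewhere.
SENSITIVE_KEYS = {"password", "token", "secret", "apiKey", "connectionString"}


def build_log_event(message, envelope, properties):
    """Build one structured log line as JSON.

    Sensitive properties are dropped, and envelope identity is applied last
    so that caller-supplied properties cannot overwrite it.
    """
    safe = {k: v for k, v in properties.items() if k not in SENSITIVE_KEYS}
    event = {"message": message, **safe, **envelope}
    return json.dumps(event, sort_keys=True)
```

For example, `build_log_event("install applied", {"tenantId": "t1", "traceId": "abc"}, {"result": "installed", "token": "..."})` would yield a line that contains `tenantId`, `traceId`, and `result` but no `token` field.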
Metrics
| Metric | Type | Dimensions | Use |
|---|---|---|---|
| `marketplace_catalog_search_latency_ms` | histogram | tenantId, assetType | Search responsiveness |
| `marketplace_publish_duration_ms` | histogram | tenantId, publisherId, assetType | Publishing pipeline health |
| `marketplace_quality_scan_result_total` | counter | result(pass/fail), assetType | Quality gate effectiveness |
| `marketplace_compatibility_eval_total` | counter | verdict | Compatibility outcomes |
| `marketplace_install_total` | counter | result(installed/failed/rolledback), assetType | Installation success rate |
| `marketplace_install_duration_ms` | histogram | tenantId, assetType | Install latency |
| `marketplace_dependency_conflicts_total` | counter | assetType | Dependency health |
| `marketplace_license_grant_total` | counter | result(granted/denied), pricingModel | Commerce health |
| `marketplace_active_licenses` | gauge | tenantId, assetId | Entitlement footprint |
| `marketplace_dead_letter_total` | counter | topic, eventType | Poison-message detection |
| `marketplace_review_submitted_total` | counter | rating | Feedback volume |
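Each metric in the table declares a fixed set of dimensions. One way to picture that contract is a counter that rejects any label set other than the declared one, which also guards against accidental cardinality growth. The `LabeledCounter` class below is a hypothetical in-memory sketch, not the OpenTelemetry API the services actually use, and the `dashboard-pack` asset type value is an assumed spelling.

```python
from collections import Counter


class LabeledCounter:
    """In-memory counter that only accepts its declared label names."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = Counter()

    def _key(self, labels):
        # Reject missing or unexpected labels so the dimension set stays fixed.
        if set(labels) != set(self.label_names):
            raise ValueError(
                f"{self.name} expects labels {self.label_names}, got {sorted(labels)}"
            )
        return tuple(labels[n] for n in self.label_names)

    def inc(self, amount=1, **labels):
        self._values[self._key(labels)] += amount

    def value(self, **labels):
        return self._values[self._key(labels)]


# Usage: the label names mirror the marketplace_install_total row.
installs = LabeledCounter("marketplace_install_total", ["result", "assetType"])
installs.inc(result="installed", assetType="dashboard-pack")
```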
Traces
- OpenTelemetry spans cover the full publishing and installation sagas: a single trace links submit → quality scan → policy gate → sign → release → index, and request → compatibility → resolve → license → apply → installed.
- gRPC and Service Bus hops propagate `traceparent` and `cs-correlation-id` so cross-service spans stitch into one trace.
- Traces link to the emitted events by `eventId`/`causationId`, enabling navigation from a span to the canonical event and back.
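To make the propagation step concrete, the sketch below parses a W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`, lowercase hex) and builds the headers for the next hop: the trace id is preserved, the parent id becomes the current span, and `cs-correlation-id` is passed through unchanged. The function names are hypothetical, and in practice the OpenTelemetry propagators handle this rather than hand-written code.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return the parsed fields, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def outgoing_headers(incoming, current_span_id):
    """Build headers for the next gRPC or Service Bus hop."""
    ctx = parse_traceparent(incoming.get("traceparent", ""))
    if ctx is None:
        raise ValueError("missing or invalid traceparent")
    headers = {
        "traceparent": f"00-{ctx['trace_id']}-{current_span_id}-{ctx['flags']}"
    }
    if "cs-correlation-id" in incoming:
        headers["cs-correlation-id"] = incoming["cs-correlation-id"]
    return headers
```

Because every hop keeps the same trace id and only swaps the parent id, spans emitted by different services attach to one trace tree.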
Dashboards
- Marketplace Health — search latency, publish duration, install success rate, dead-letter trend.
- Publishing Pipeline — submissions, quality pass/fail by publisher and asset type, time-to-release.
- Installation Funnel — Requested → Evaluating → Resolving → LicenseCheck → Applying → Installed conversion, with failure/rollback breakdown.
- Commerce — license grants/denials, active licenses, revenue-relevant pricing-model mix (via billing).
- Publisher Quality — per-publisher quality, ratings, reputation, incident history.
- Per-Tenant — install volume, active assets, and SLO compliance per tenant.
Dashboards ship as Dashboard Packs (one of the nine asset types), so the Marketplace dogfoods its own distribution model.
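The stage-to-stage conversion shown on the Installation Funnel dashboard can be derived from counts of installations that reached each stage. The sketch below assumes those counts are already available as a dictionary keyed by stage name; the function `stage_conversion` and that input shape are illustrative, not part of the platform.

```python
# Stage names follow the Installation Funnel dashboard description.
FUNNEL_STAGES = (
    "Requested", "Evaluating", "Resolving", "LicenseCheck", "Applying", "Installed",
)


def stage_conversion(reached):
    """Return the conversion rate for each consecutive pair of funnel stages.

    `reached` maps a stage name to the number of installations that got there.
    A stage with no entries yields 0.0 rather than dividing by zero.
    """
    rates = {}
    for prev, cur in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        entered = reached.get(prev, 0)
        rates[f"{prev}->{cur}"] = reached.get(cur, 0) / entered if entered else 0.0
    return rates
```

A sharp drop at one step, for example `Applying->Installed`, points to the saga stage to inspect first.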
Alerts
| Alert | Condition | Action |
|---|---|---|
| Install success rate drop | success rate < SLO over 15m | Page on-call; inspect installation saga |
| Dead-letter growth | `marketplace_dead_letter_total` rising | Investigate poison messages; replay from envelope |
| Quality scan failure spike | fail ratio > threshold per publisher | Review publisher submissions; consider trust-tier review |
| Compatibility incompatible spike | `verdict=Incompatible` surge for an asset | Flag version; notify publisher |
| Search latency SLO breach | p95 latency > target | Scale search; check index health |
| License denial spike | denials surge | Check billing integration; entitlement issues |
| Signature/hash verification failure | any verification failure | Security incident; quarantine package and publisher |
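The first alert in the table, install success rate below SLO over 15 minutes, can be sketched as a sliding-window calculation. In production this would normally be an alert rule over `marketplace_install_total` in the monitoring backend; the functions below, the event tuple shape, and the 0.99 default SLO are assumptions chosen for illustration, since the source does not state the SLO value.

```python
from datetime import datetime, timedelta


def install_success_rate(events, now, window=timedelta(minutes=15)):
    """Return the share of 'installed' results within the window, or None if empty.

    `events` is an iterable of (timestamp, result) pairs, where result is one of
    'installed', 'failed', or 'rolledback'.
    """
    recent = [result for ts, result in events if now - window <= ts <= now]
    if not recent:
        return None
    return sum(1 for result in recent if result == "installed") / len(recent)


def should_page(events, now, slo=0.99):
    """Page on-call only when there is data and the rate is below the SLO."""
    rate = install_success_rate(events, now)
    return rate is not None and rate < slo
```

Returning `None` for an empty window keeps a quiet period from being mistaken for an outage.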
Feedback loop
All marketplace events are ingested by the Observability & Feedback and Knowledge platforms. Adoption, quality, and incident signals feed catalog ranking, publisher trust tiers, and governance decisions — closing the loop from "what was published/installed" back to "what should be promoted, deprecated, or suspended."
Continue to Deployment, Workflows, and Security.