Security
Target Architecture — Final-State Design
Because the Agent Mesh runs autonomous agents that read knowledge, call models, invoke tools, and produce code, it is a high-value security surface. Its model is least-privilege, tenant-isolated, fully audited, and policy-enforced — every agent acts only within an explicit permission scope, every model call obeys a tenant model policy, and every action is an auditable event. The mesh defers cross-cutting policy to the Governance, Security & Compliance Platform.
Authentication
- Workload identity — every mesh service and every caller authenticates with an Azure AD / Entra managed identity; there are no shared static credentials between services.
- Service-to-service — internal calls (runtime → registry, runtime → router) use mutual workload identity over the internal network; internal endpoints are not exposed through the public gateway.
- External callers — Control Plane, Factory Studio, and operators reach public endpoints through the factory API gateway with token-based authentication.
- Provider auth — model provider credentials (Azure OpenAI / OpenAI / Ollama) are held by the Integration Platform, never by individual agents.
Authorization
Authorization is layered and re-evaluated on every operation — registration never grants standing access.
| Layer | Enforced where | Checks |
|---|---|---|
| Tenant scope | All services | tenantId claim matches resource tenant. |
| Operation scope | API + handlers | Caller role (registrar, orchestrator, runtime, operator) permits the action. |
| Agent permission scope | ToolAdapterService, ModelRouterService, Knowledge | The acting AgentVersion's scope permits the tool, model policy, and classification. |
| Model policy | ModelRouterService | Provider/model selection complies with the task modelPolicyId and tenant routing. |
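The layering in the table can be illustrated with a minimal sketch that evaluates the first three layers in order and reports which one denied the call. The role-to-action map, the action names, and the `Caller`, `Operation`, `AgentScope`, and `authorize` names are all assumptions made for illustration; the source names the roles but does not define their actions. The model policy layer is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical role-to-action map: the roles come from the table above,
# the action names are invented for this sketch.
ROLE_ACTIONS = {
    "registrar": {"agent.register"},
    "orchestrator": {"task.dispatch"},
    "runtime": {"tool.invoke", "model.invoke"},
    "operator": {"task.cancel"},
}


@dataclass(frozen=True)
class Caller:
    tenant_id: str
    role: str


@dataclass(frozen=True)
class Operation:
    action: str
    resource_tenant_id: str
    tool: Optional[str] = None


@dataclass(frozen=True)
class AgentScope:
    tools: FrozenSet[str] = frozenset()


def authorize(caller: Caller, op: Operation, scope: AgentScope) -> Tuple[bool, str]:
    """Evaluate the layers in order; return (allowed, name of deciding layer)."""
    # Layer 1: tenant scope. The caller's tenant must match the resource tenant.
    if caller.tenant_id != op.resource_tenant_id:
        return False, "tenant_scope"
    # Layer 2: operation scope. The caller's role must permit the action.
    if op.action not in ROLE_ACTIONS.get(caller.role, set()):
        return False, "operation_scope"
    # Layer 3: agent permission scope. The acting agent must be granted the tool.
    if op.tool is not None and op.tool not in scope.tools:
        return False, "agent_permission_scope"
    return True, "allowed"
```

Because `authorize` takes the caller, operation, and scope as arguments on every call and keeps no state, it mirrors the rule that registration never grants standing access.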
Agent permission scoping
Every AgentVersion carries a least-privilege PermissionScope (tools, model policy, knowledge classification ceiling, resource actions). The mesh re-checks scope at each tool and model invocation, so a compromised or misbehaving agent cannot exceed its grant. Scopes are authorized by Governance at registration and audited on use. See Agent Registry.
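As a rough illustration of per-invocation scope checks, the sketch below models a scope with the four dimensions listed above. The field names, the classification labels and their ordering, and the `ScopeViolation` exception are assumptions; the source only states that a scope covers tools, model policy, a classification ceiling, and resource actions.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical classification ordering; the source only says a ceiling exists.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


class ScopeViolation(Exception):
    """Raised when an agent attempts something outside its granted scope."""


@dataclass(frozen=True)
class PermissionScope:
    tools: FrozenSet[str]
    model_policy_id: str
    classification_ceiling: str
    resource_actions: FrozenSet[str]

    def check_tool(self, tool: str) -> None:
        if tool not in self.tools:
            raise ScopeViolation(f"tool not in scope: {tool}")

    def check_model_policy(self, model_policy_id: str) -> None:
        if model_policy_id != self.model_policy_id:
            raise ScopeViolation(f"model policy not in scope: {model_policy_id}")

    def check_classification(self, classification: str) -> None:
        # Anything ranked above the ceiling is out of scope for this agent.
        ceiling = CLASSIFICATION_RANK[self.classification_ceiling]
        if CLASSIFICATION_RANK[classification] > ceiling:
            raise ScopeViolation(f"classification above ceiling: {classification}")

    def check_resource_action(self, action: str) -> None:
        if action not in self.resource_actions:
            raise ScopeViolation(f"resource action not in scope: {action}")
```

Making the dataclass frozen means a running agent cannot widen its own grant; a new scope would require a new, separately authorized AgentVersion.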
Model policy enforcement
A modelPolicyId binds a task to allowed providers, models, regions, and cost ceilings. The ModelRouterService rejects any selection outside the policy, supports per-tenant model routing (e.g. a tenant restricted to in-region Azure OpenAI or to self-hosted Ollama for data-residency), and records the policy decision on every ModelInvocation.
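A minimal sketch of this check, assuming a policy is a set of allowed providers, models, and regions plus a single cost ceiling. The `ModelPolicy`, `Selection`, `PolicyDecision`, and `evaluate` names, the reason strings, and the per-call cost estimate are hypothetical and not the ModelRouterService API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ModelPolicy:
    model_policy_id: str
    providers: FrozenSet[str]
    models: FrozenSet[str]
    regions: FrozenSet[str]
    max_cost_usd: float  # assumed to be a per-invocation ceiling


@dataclass(frozen=True)
class Selection:
    provider: str
    model: str
    region: str
    estimated_cost_usd: float


@dataclass(frozen=True)
class PolicyDecision:
    model_policy_id: str
    allowed: bool
    reason: str


def evaluate(policy: ModelPolicy, sel: Selection) -> PolicyDecision:
    """Return a decision record suitable for attaching to a ModelInvocation."""
    if sel.provider not in policy.providers:
        reason = "provider_not_allowed"
    elif sel.model not in policy.models:
        reason = "model_not_allowed"
    elif sel.region not in policy.regions:
        reason = "region_not_allowed"
    elif sel.estimated_cost_usd > policy.max_cost_usd:
        reason = "cost_ceiling_exceeded"
    else:
        reason = "allowed"
    return PolicyDecision(policy.model_policy_id, reason == "allowed", reason)
```

Returning a decision object instead of raising keeps both allowed and rejected selections recordable, which matches the requirement that the policy decision is stored on every invocation.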
Tenant Isolation
- tenantId is part of every primary/secondary index used for tenant-scoped queries; cross-tenant reads are rejected at the handler.
- Agent pools, context caches (Redis), and Blob payload containers are partitioned per tenant.
- Events carry tenantId, and broker subscriptions are tenant-filtered via the cs-tenant-id application property.
- Context packages are access-checked per tenant by Knowledge governance before delivery.
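The first and third points can be sketched with an in-memory stand-in: a store whose keys always include the tenant, a handler-level guard that rejects cross-tenant reads, and a subscription predicate on the cs-tenant-id application property. The `TenantScopedStore`, `CrossTenantAccess`, and `tenant_filter` names are hypothetical, and a real deployment would rely on the database's and broker's own indexing and filtering features.

```python
from typing import Any, Callable, Dict, Tuple


class CrossTenantAccess(Exception):
    """Raised when a caller tries to read another tenant's resource."""


class TenantScopedStore:
    """In-memory stand-in for a store whose keys always start with tenantId."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], Any] = {}

    def put(self, tenant_id: str, item_id: str, value: Any) -> None:
        self._items[(tenant_id, item_id)] = value

    def get(self, caller_tenant_id: str, resource_tenant_id: str, item_id: str) -> Any:
        # Handler-level guard: reject before touching the store at all.
        if caller_tenant_id != resource_tenant_id:
            raise CrossTenantAccess(resource_tenant_id)
        return self._items[(resource_tenant_id, item_id)]


def tenant_filter(tenant_id: str) -> Callable[[Dict[str, str]], bool]:
    """Build a predicate over a message's application properties."""

    def accepts(application_properties: Dict[str, str]) -> bool:
        return application_properties.get("cs-tenant-id") == tenant_id

    return accepts
```

A message with no cs-tenant-id property is rejected by the predicate, so an unlabelled event is dropped by default and never delivered to a tenant's subscription.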
Secret Handling
- Secrets and provider credentials live in a managed secret store (Azure Key Vault), referenced by identity — never embedded in agent definitions, skills, prompts, or logs.
- Configuration wiring is handled by the configuration/secret pipeline; the mesh resolves secret references at runtime via managed identity.
- Prompt and response payloads are classified before storage; secrets detected in payloads are redacted (see threat considerations).
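Payload redaction before storage might look like the sketch below. The two patterns are illustrative assumptions, not the mesh's detection rules; a production detector would need a far broader rule set and likely entropy-based checks as well.

```python
import re
from typing import Tuple

# Hypothetical detection rules: key/value style secrets and bearer tokens.
SECRET_PATTERNS = [
    (re.compile(r"(?i)\b(api[_-]?key|password|secret)\b\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
    (re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [REDACTED]"),
]


def redact(payload: str) -> Tuple[str, int]:
    """Return the redacted payload and the number of replacements made."""
    total = 0
    for pattern, replacement in SECRET_PATTERNS:
        payload, count = pattern.subn(replacement, payload)
        total += count
    return payload, total
```

The replacement count lets the caller emit an audit event whenever a secret was found, without logging the secret itself.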
Audit
- Every state transition and every model/tool invocation is an event in the canonical envelope, giving a complete, replayable audit trail keyed on traceId.
- ModelInvocation and ToolInvocation records (with policy decisions) provide a per-call audit of what an agent did, with which model/tool, under which policy.
- Audit events flow to the Observability & Feedback Platform and Governance for compliance reporting.
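To make the traceId-keyed trail concrete, the following sketch appends events to an in-memory log and replays them per trace. The envelope fields shown here (`event_type`, `data`, `occurred_at`) and the `AuditTrail` class are assumptions; the canonical envelope itself is not defined in this section.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEvent:
    event_type: str
    tenant_id: str
    trace_id: str
    data: Dict[str, Any]
    # Generated defaults; a real envelope would carry these from the producer.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditTrail:
    """Append-only, in-memory stand-in for the audit event stream."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def append(self, event: AuditEvent) -> None:
        self._events.append(event)

    def replay(self, trace_id: str) -> List[AuditEvent]:
        # Events are returned in the order they were appended.
        return [e for e in self._events if e.trace_id == trace_id]
```

A ModelInvocation record could then be carried as an event whose `data` holds the model, the modelPolicyId, and the policy decision.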
Threat Considerations
| Threat | Mitigation |
|---|---|
| Prompt injection via retrieved context or tool output | Context is governed and classification-filtered by Knowledge; tool outputs are treated as untrusted and validated; skills constrain how prompts are constructed. |
| Unsafe prompt or output content | Inputs and outputs pass safety screening; validation includes policy rules; unsafe content fails validation and escalates. |
| Excessive agency (agent over-reaching) | Least-privilege PermissionScope re-checked per call; bounded correction attempts; non-idempotent tools run once. |
| Data exfiltration | Classification ceilings per agent; per-tenant model routing and data residency; redaction of sensitive payloads before storage. |
| Cost / token abuse | Token budgets in context packages; cost ceilings in model policy; telemetry-driven anomaly detection. |
| Cross-tenant leakage | tenantId guards on every store and subscription; partitioned caches and payload containers. |
| Poison / replay | Idempotency on eventId; dead-letter quarantine via the PoisonTaskWorker with full envelope preservation. |
| Model provider compromise | Provider credentials isolated in the Integration Platform; failover and policy-constrained provider selection. |
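The poison / replay row combines two mechanisms: deduplication on eventId and quarantine of messages that keep failing. The sketch below shows one way they could fit together. It is not the PoisonTaskWorker; the `IdempotentConsumer` class, the in-memory sets, and the `max_attempts` threshold are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Set


class IdempotentConsumer:
    """Processes each eventId at most once and quarantines repeated failures."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None], max_attempts: int = 3) -> None:
        self._handler = handler
        self._max_attempts = max_attempts  # hypothetical retry threshold
        self._seen: Set[str] = set()
        self._attempts: Dict[str, int] = {}
        self.dead_letter: List[Dict[str, Any]] = []

    def deliver(self, envelope: Dict[str, Any]) -> str:
        event_id = envelope["eventId"]
        # Replay protection: an already-handled eventId is never re-run.
        if event_id in self._seen:
            return "duplicate"
        try:
            self._handler(envelope)
        except Exception:
            attempts = self._attempts.get(event_id, 0) + 1
            self._attempts[event_id] = attempts
            if attempts >= self._max_attempts:
                # Preserve the full envelope for later inspection.
                self.dead_letter.append(dict(envelope))
                self._seen.add(event_id)
                return "quarantined"
            return "retry"
        self._seen.add(event_id)
        return "processed"
```

Marking a quarantined eventId as seen keeps a poison message from re-entering the retry loop if the broker redelivers it.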