Knowledge Management Agent Specification
Purpose
The Knowledge Management Agent is the semantic memory system of the ConnectSoft AI Software Factory.
Its primary goal is to:
Ingest, embed, enrich, and index all semantically important project knowledge, so that agents and humans can access context-rich, traceable, and reusable information across microservices, features, workflows, and conversations.
Strategic Position in the Platform
The Knowledge Management Agent sits at the center of ConnectSoft's semantic intelligence layer, enabling:
- Cross-agent memory reuse: any agent (e.g., Developer, Generator, Test) can retrieve contextual artifacts
- Memory persistence: structured, versioned knowledge across builds and sprints
- Knowledge graph construction: links between agents, outputs, decisions, templates, and user documentation
- Retrieval-augmented generation (RAG): powering context-aware completions and autonomous reasoning
- Meta-coordination: memory structure acts as the backbone of planning, traceability, and reuse
Where the Agent Operates in the Factory
flowchart TD
subgraph Artifact Producers
Docs[Documentation Agents]
Dev[Developer Agents]
Arch[Architecture Agents]
QA[QA/Test Agents]
end
subgraph Artifact Consumers
Planner[Vision/Planning Agents]
Generator[Generator Agents]
Reviewer[Reviewer Agents]
end
Docs --> KM[Knowledge Management Agent]
Dev --> KM
Arch --> KM
QA --> KM
KM --> Planner
KM --> Generator
KM --> Reviewer
Real-World Examples of Its Use
| Use Case | Description |
|---|---|
| Template Reuse | Embeds and indexes all *.template.cs files for future AI scaffolding |
| Documentation Memory | Extracts knowledge from *.md files and user-facing guides |
| Feature Traceability | Maps feature specs → code outputs → generated tests |
| Prompt Enrichment | Helps other agents inject contextual snippets from past runs or documents |
| Test Coverage Memory | Links QA agents to past scenarios, failed cases, or test descriptions |
Anchored by ConnectSoft Principles
| Principle | Relevance |
|---|---|
| Modularization | Each knowledge item is semantically scoped to a module, domain, or agent cluster |
| Observability-First | Every ingestion emits MemoryEntryCreated with traceId, agentId, artifactId |
| AI-First Development | Knowledge is actively indexed for reuse by Semantic Kernel agents |
| DDD + Clean Architecture | Embeds concepts, entities, and bounded contexts as retrievable memory units |
Philosophy
"Knowledge not stored, linked, and retrievable is wasted effort."
The Knowledge Management Agent ensures no insight, artifact, or instruction is lost, enabling autonomous agents to reason across time, projects, and modular boundaries.
Summary
The Knowledge Management Agent:
- Acts as the semantic memory core of the entire platform
- Ingests and indexes all agent outputs
- Enables AI agents to reuse and reason with past context
- Structures knowledge into retrievable, traceable units
- Powers memory-aware planning, generation, and validation across 3000+ modules
Responsibilities
The Knowledge Management Agent is responsible for transforming transient agent output into long-term, queryable, and semantically organized memory, spanning all stages of the ConnectSoft AI Software Factory.
Core Responsibilities
| Responsibility | Description |
|---|---|
| Ingest Knowledge Artifacts | Accept files, messages, logs, and structured data from any agent or service |
| Embed Semantically Relevant Content | Generate vector representations (embeddings) for memory recall and similarity search |
| Tag & Classify Artifacts | Extract metadata (e.g., domain, type, related agent, output purpose, module) |
| Link Knowledge to Trace Context | Record trace ID, agent ID, build ID, and edition ID for each memory unit |
| Organize by Knowledge Domain | Structure content across templates, features, flows, code snippets, test plans, prompts |
| Version & Track Knowledge Units | Store changes across builds and provide deltas/patches if needed |
| Support Retrieval & RAG Queries | Respond to retrieval requests with similarity-ranked results or filtered metadata matches |
| Validate and Deduplicate Memory | Ensure quality and avoid noisy, redundant, or malformed records |
| Emit Memory Events | Emit events like MemoryEntryCreated, MemoryUpdated, KnowledgeGraphExtended |
| Collaborate with Memory Consumers | Expose APIs, SK skills, and prompt templates to agents that consume knowledge (e.g., Generator, Reviewer, Vision Architect) |
Extended Responsibilities
| Area | Description |
|---|---|
| Document Ingestion | Parse *.md, *.spec.yaml, *.feature, and design docs |
| Template Archiving | Ingest and tag all generated or reusable ConnectSoft templates |
| Prompt History Management | Record and link prompts + completions across runs for auditability |
| Knowledge Coverage Reporting | Provide Studio dashboards with insight into knowledge coverage by agent, module, or domain |
| Change Monitoring | Detect and flag when new knowledge conflicts with or overrides previous memory entries |
| Knowledge Graph Expansion | Support advanced linking across memory: who generated what, why, when, for which tenant/module/feature |
Knowledge Domains Tracked
| Domain | Examples |
|---|---|
| Templates | .cs, .md, .json, .sql, .http, etc. |
| Features | Prompt plans, user stories, epics, decisions |
| Architecture | Diagrams, DDD bounded contexts, clean architecture layouts |
| Documentation | Guides, READMEs, test instructions, contract definitions |
| Test Coverage | Test cases, regressions, scenario matrices |
| Agent Intelligence | Prompt templates, execution flows, skills used |
| Trace Context | traceId, agentId, buildId, editionId |
Summary
The Knowledge Management Agent:
- Accepts all outputs across the software lifecycle
- Extracts meaningful metadata and semantic embeddings
- Links artifacts to modular, traceable memory
- Powers downstream retrieval, reuse, and reasoning
- Emits and maintains structured, versioned knowledge across agents
This ensures every AI agent in ConnectSoft's ecosystem can access, contribute to, and benefit from shared intelligence at scale.
Inputs Consumed
This section outlines what types of inputs the Knowledge Management Agent accepts, how they're structured, and what metadata or semantic content it extracts during ingestion.
The agent supports a modality-agnostic, format-flexible ingestion pipeline across all ConnectSoft modules, microservices, agents, and environments.
Accepted Input Types
| Type | Description |
|---|---|
| `*.md` | Documentation files: READMEs, architecture docs, test guides, design decisions |
| `*.cs` | Code artifacts (especially templates, generators, orchestrators, domain entities) |
| `*.feature` | SpecFlow / BDD test specifications |
| `*.json`, `*.yaml` | Configuration, prompt plans, API contracts, memory schemas |
| `*.http`, `.sql`, `.sh` | API test files, query templates, scripts |
| `prompt.log.jsonl` | Prompt + completion logs from previous agent executions |
| `execution-trace.json` | End-to-end trace outputs from the orchestration layer |
| `trace-logs.json`, `memory-metrics.json` | Observability logs and metrics related to knowledge usage |
| `agent-output.*` | Outputs from other agents (Architect, QA, TestGen, Developer) including plans, specs, fixes, metrics |
Semantic Metadata Extracted
| Metadata | Purpose |
|---|---|
| `agentId` | Who created the artifact |
| `traceId` | Which execution it belongs to |
| `buildId` / `moduleId` | Which feature or service it is linked to |
| `artifactType` | Template, test, prompt, document, plan, script, etc. |
| `domainContext` | Architecture layer, DDD context, edition-specific scope |
| `language` | Code (C#, SQL, YAML, Markdown) or prompt language |
| `dependencies` | Files/modules it references or imports |
| `embeddingVector` | Semantic SK/OpenAI vector for similarity search |
| `versionId` | Version hash or build number from source control or factory run |
Sample Input Artifact (Simplified)
File: BookingService.template.cs
Tags Extracted:
{
"traceId": "proj-882-v3",
"agentId": "MicroserviceGeneratorAgent",
"moduleId": "BookingService",
"artifactType": "template",
"language": "C#",
"domainContext": "Appointments::ApplicationLayer",
"versionId": "v5.2.0",
"edition": "vetclinic-blue"
}
Derived Inputs (via SK plugins or Orchestration)
| Derived Input | How It's Used |
|---|---|
| File-to-prompt conversion | Converts code or docs into embedding-ready chunks |
| Prompt memory index | Extracts reusable prompt tokens + completions |
| Interlinked dependency graphs | Establishes context → source → output traceability |
| Artifact lineage history | Tracks source → transform → generator mapping chain |
Ingestion Modes
| Mode | Trigger |
|---|---|
| Real-time | Triggered by agent execution events (e.g., AgentCompletedExecution) |
| Batch | Periodic sweep of project directory or blob storage |
| Manual | Human upload of new docs, architecture, or test plans |
| Retrospective | Bootstrapping from historical repositories or GitHub commits |
Summary
The Knowledge Management Agent ingests:
- Files (`.cs`, `.md`, `.yaml`, `.feature`, `.json`)
- Agent output traces, prompt logs, execution metadata
- Edition-, module-, and feature-scoped knowledge artifacts
- All tagged with trace IDs, agent IDs, build/version IDs, and domain context
It performs semantically rich ingestion across ConnectSoft's modular AI ecosystem.
Outputs Produced
This section defines the structured outputs emitted by the Knowledge Management Agent after processing inputs. These outputs are consumed by downstream agents for retrieval, generation, planning, traceability, and auditing.
The outputs ensure that every artifact, from code to prompts to test plans, becomes a queryable, versioned, semantically linked knowledge unit.
Primary Output Artifacts
| Output File | Description |
|---|---|
| `memory-entry.json` | Canonical metadata representation of the ingested artifact |
| `embedding-vector.json` | OpenAI/SK vector embedding for similarity-based retrieval |
| `knowledge-index.yaml` | Summary index of all knowledge units by module, agent, edition |
| `trace-link-map.json` | Links between artifact, its generating agent, traceId, and domain context |
| `memory-metrics.json` | Telemetry of ingestion (e.g., number of tokens embedded, duplication checks passed) |
| `memory-events.log` | Structured log with MemoryEntryCreated, MemoryEntryUpdated, MemoryTagged |
| `memory-validation-report.yaml` | Any warnings, errors, or fix suggestions from the ingestion pipeline |
| `studio.knowledge.status.json` | Feed for Studio dashboard (knowledge coverage per module, agent, edition) |
Example: memory-entry.json
{
"artifactId": "template-booking-service-2025-05-15",
"traceId": "proj-888-v1",
"agentId": "MicroserviceGeneratorAgent",
"moduleId": "BookingService",
"artifactType": "template",
"language": "C#",
"domainContext": "Appointments::ApplicationLayer",
"tags": ["template", "booking", "appointments", "microservice"],
"edition": "vetclinic-premium",
"embeddingId": "vec-8f3b72ac",
"version": "v5.3.0",
"ingestedAt": "2025-05-15T17:08:00Z"
}
Example: trace-link-map.json
{
"traceId": "proj-888-v1",
"artifactId": "test-cancel-appointment.feature",
"generatedBy": "TestCaseGeneratorAgent",
"linkedInputs": ["feature-plan.yaml", "booking-service.cs"],
"relatedModules": ["Appointments", "Notifications"],
"edition": "vetclinic-lite"
}
memory-metrics.json Fields
| Field | Description |
|---|---|
| `tokensProcessed` | Total tokens embedded from input file |
| `embeddingSize` | Length of resulting vector |
| `storageLocation` | Where the knowledge artifact is persisted |
| `deduplicationResult` | Pass / warning / collision |
| `tagQualityScore` | Heuristic on tag accuracy / completeness (0 to 1) |
| `validationErrors` | List of schema or metadata warnings (if any) |
Outputs for Downstream Agents
| Output | Used By | Purpose |
|---|---|---|
| `embedding-vector.json` | Generator Agents, Vision Architect | Contextual code/text retrieval |
| `memory-entry.json` | Reviewer Agent | Reasoning about origin, trace, and structure |
| `trace-link-map.json` | Orchestrator | Validate artifact lineage and agent attribution |
| `studio.knowledge.status.json` | Studio Dashboard | Visualize memory coverage, quality, and domain links |
Summary
The Knowledge Management Agent produces:
- Canonical memory entry files per ingested artifact
- Vector embeddings for semantic search
- Knowledge coverage reports and metrics
- Trace-link graphs for full AI artifact lineage
- Live dashboards and logs for observability and governance
These outputs transform static artifacts into semantic knowledge units, reusable across every phase of the AI Software Factory.
Knowledge Base
This section describes the pre-existing memory and embedded knowledge available to the Knowledge Management Agent before any new ingestion occurs, ensuring it starts with a rich understanding of the ConnectSoft platform, its structure, and factory-wide patterns.
Pre-Embedded Core Knowledge Domains
| Domain | Description |
|---|---|
| Templates Library | Semantic representation of all base project templates (ConnectSoft.MicroserviceTemplate, *.template.cs) |
| Modular Architecture Guide | Vectorized understanding of bounded contexts, domain layers, and modularization strategy |
| Documentation Corpus | Embedded project-wide *.md files from /docs/, including architecture, DDD, and principles |
| Test Specification Language | Known grammar and patterns for BDD .feature files, test cases, and scenario tagging |
| Agent Execution Flows | Historical traces from agent-execution-flow.md, pre-labeled by cluster and role |
| Prompt Libraries | Pre-ingested prompt templates, macros, and completions from agents like ProductManagerAgent, VisionArchitectAgent, etc. |
| Technology Stack Specification | Structured understanding of the ConnectSoft platform stack (.NET 8, Azure, NHibernate, MassTransit, SK) |
| Knowledge System Metadata | All definitions and schemas from knowledge-and-memory-system.md, including storage patterns, tags, embeddings, and memory events |
Example: Template Knowledge Entry (Preloaded)
{
"artifactId": "template-orchestration-layer",
"type": "template",
"domainContext": "Orchestration::StartupPipeline",
"tags": ["orchestrator", "di", "middleware", "hostBuilder"],
"embeddingId": "vec-template-001",
"description": "Standard orchestration entry point used by all generated microservices",
"sourceFile": "orchestration-host.template.cs"
}
Built-In Conceptual Models
| Concept | Description |
|---|---|
| `AgentCluster` | Maps all agents by role: Architect, Developer, QA, Generator |
| `TraceLinkModel` | Schema to relate trace → artifact → agent → module |
| `EmbeddingChunker` | Strategy for tokenizing long files while preserving semantic boundaries |
| `EditionScopeModel` | Rules to index knowledge differently based on edition-level customizations |
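The `EmbeddingChunker` idea can be illustrated with a minimal Python sketch. It is not the agent's actual chunker: the function name `chunk_text`, the use of blank lines as semantic boundaries, and the word-count approximation of tokens are all assumptions made for illustration. The default of 512 mirrors the `chunkWindowSize` default listed later in this specification.

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_tokens tokens.

    Assumption: tokens are approximated by whitespace-separated words;
    a real tokenizer would count differently.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        words = paragraph.split()
        if len(words) > max_tokens:
            # Only an oversized paragraph is split inside its boundary.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and current_len + len(words) > max_tokens:
            # Close the current chunk at a paragraph boundary.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(paragraph)
        current_len += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs keeps related sentences in the same embedding, at the cost of chunks that are sometimes shorter than the window.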
Inherited Context from Other Agents
| Agent | What It Shares |
|---|---|
| Vision Architect Agent | Prompt plan structure, strategy maps, requirement blueprints |
| Test Generator Agent | BDD structure patterns, test plan flows, test-to-trace mappings |
| Microservice Generator Agent | Templates, architecture assembly logic, skeleton project metadata |
| Documentation Agent | Markdown flow and documentation frame types |
Prebuilt Memory Structures
| Name | Purpose |
|---|---|
| `core-memory-index` | Preloaded memory entries keyed by artifactType + domainContext |
| `core-embedding-cache` | Base vector DB for fast retrieval before first ingestion cycle |
| `agent-execution-schema.json` | Known schema of agent inputs, outputs, traceIds, and lineage paths |
| `memory-event-types.json` | Types of memory lifecycle events (create, update, invalidate, promote) |
| `studio.knowledge.index.json` | Initial dashboard tiles mapped to core artifacts and trace clusters |
Summary
Before ingestion begins, the Knowledge Management Agent:
- Already understands the structure, vocabulary, and semantic patterns of ConnectSoft's platform
- Possesses prebuilt template knowledge, documentation embeddings, and agent role maps
- Maintains internal schemas and memory models to anchor new data
- Can bootstrap other agents with intelligent context, even before a full knowledge ingestion pass
This makes the agent fast, smart, and reusable from the very first trace, powering a truly context-rich and autonomous factory ecosystem.
Process Flow
This section defines the end-to-end lifecycle of the Knowledge Management Agent's execution, from input detection to memory enrichment and event emission.
Each step is modular, observable, and aligned with ConnectSoft's AI-First, Traceable, and Memory-Centric principles.
High-Level Execution Flow
flowchart TD
START[Input Artifact Received]
PARSE[Parse & Analyze Structure]
TAG[Extract Metadata & Domain Context]
EMBED[Generate Semantic Embedding Vector]
VALIDATE[Validate Schema & Deduplication]
STORE[Persist Memory Entry + Vector + Metadata]
INDEX[Update Knowledge Index & Trace Links]
EMIT[Emit MemoryEntryCreated Event]
STUDIO[Push Knowledge Status to Studio]
END[Agent Completes]
START --> PARSE --> TAG --> EMBED --> VALIDATE --> STORE --> INDEX --> EMIT --> STUDIO --> END
Phase-by-Phase Breakdown
| Step | Description |
|---|---|
| 1. Parse | Structure-specific parsing (.cs, .md, .yaml, .json) to normalize content |
| 2. Tag | Extracts metadata: traceId, agentId, domainContext, editionId, artifactType, etc. |
| 3. Embed | Calls Semantic Kernel / Azure OpenAI embedding skill to generate vector |
| 4. Validate | Ensures uniqueness, metadata schema compliance, and semantic density (non-empty chunks) |
| 5. Store | Persists structured memory in long-term storage (JSON, Azure Search, blob index) |
| 6. Index | Updates internal YAML/graph-based memory maps (knowledge-index.yaml, trace-link-map.json) |
| 7. Emit | Sends MemoryEntryCreated event with metadata, tags, and vectorId |
| 8. Studio Sync | Updates studio.knowledge.status.json to visualize coverage and memory depth |
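The phases above can be sketched as one function in Python. This is an illustrative outline under stated assumptions, not the agent's implementation: `run_ingestion`, the in-memory `store` dictionary, and the injected `embed` callable are hypothetical stand-ins for the Semantic Kernel skills and Azure storage named in this specification, and the Studio sync step is omitted.

```python
import hashlib
from typing import Callable


def run_ingestion(artifact: dict, embed: Callable[[str], list[float]],
                  store: dict, events: list[dict]) -> bool:
    """Run Parse, Tag, Embed, Validate, Store, Index, Emit for one artifact."""
    # 1. Parse: normalise line endings and trim surrounding whitespace.
    content = artifact["content"].replace("\r\n", "\n").strip()
    # 2. Tag: copy the trace metadata supplied by the producing agent.
    entry = {key: artifact.get(key)
             for key in ("artifactId", "traceId", "agentId", "artifactType")}
    # 3. Embed: delegate to whatever embedding function was injected.
    vector = embed(content)
    # 4. Validate: metadata complete, content non-empty, not a duplicate.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    is_valid = bool(content) and all(entry.values()) and digest not in store["hashes"]
    if not is_valid:
        return False  # no event is emitted for a failed ingestion
    # 5. Store: persist the entry, its vector, and the content hash.
    entry["embeddingId"] = "vec-" + digest[:8]
    store["entries"][entry["artifactId"]] = entry
    store["vectors"][entry["embeddingId"]] = vector
    store["hashes"].add(digest)
    # 6. Index: group artifact ids by artifact type.
    store["index"].setdefault(entry["artifactType"], []).append(entry["artifactId"])
    # 7. Emit: publish the event only after everything above succeeded.
    events.append({"eventType": "MemoryEntryCreated",
                   "artifactId": entry["artifactId"], "traceId": entry["traceId"],
                   "agentId": entry["agentId"], "embeddingId": entry["embeddingId"]})
    return True


# Usage with a fake embedder standing in for the real embedding skill.
store = {"entries": {}, "vectors": {}, "hashes": set(), "index": {}}
events: list[dict] = []
artifact = {"artifactId": "doc-clean-architecture-v1", "traceId": "proj-811-v4",
            "agentId": "DocumentationAgent", "artifactType": "documentation",
            "content": "Clean architecture guidelines"}
first = run_ingestion(artifact, lambda text: [float(len(text))], store, events)
second = run_ingestion(artifact, lambda text: [float(len(text))], store, events)
```

Emitting the event last matches the rule that memory events are sent only after successful ingestion and validation; the second call is rejected as a duplicate and emits nothing.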
Example MemoryEntryCreated Event
{
"eventType": "MemoryEntryCreated",
"artifactId": "doc-clean-architecture-v1",
"traceId": "proj-811-v4",
"agentId": "DocumentationAgent",
"moduleId": "PlatformArchitecture",
"embeddingId": "vec-cb39f2c1",
"timestamp": "2025-05-15T17:34:21Z"
}
Re-Entry Triggers
| Trigger | Behavior |
|---|---|
| New artifact from trace | Execute full flow |
| Artifact already exists with version delta | Execute diff-based enrichment flow (MemoryEntryUpdated) |
| Conflicting artifact ID | Execute deduplication + retry flow |
| Re-ingestion by human prompt | Execute enrichment mode (add metadata or annotations) |
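One way to read the trigger table is as a small decision function. The Python sketch below is an interpretation, not the agent's documented logic: `select_flow`, the `version` and `contentHash` fields, the reading of "conflicting artifact ID" as same version with different content, and the extra `skip` outcome for identical artifacts are all assumptions.

```python
from typing import Optional


def select_flow(incoming: dict, existing: Optional[dict],
                human_requested: bool = False) -> str:
    """Pick an ingestion flow for an artifact, following the trigger table."""
    if human_requested:
        return "enrichment"             # re-ingestion by human prompt
    if existing is None:
        return "full"                   # new artifact from trace
    if incoming["version"] != existing["version"]:
        return "diff-enrichment"        # version delta, MemoryEntryUpdated
    if incoming["contentHash"] != existing["contentHash"]:
        return "deduplicate-and-retry"  # same id and version, different content
    return "skip"                       # assumption: identical artifact, nothing to do
```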
Side Processes
- Embedding retry with fallback model (e.g., if Azure OpenAI fails)
- Metrics collector updates `memory-metrics.json`
- Validation failures logged to `memory-validation-report.yaml`
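The embedding retry with a fallback model could look roughly like the following Python sketch. The helper `embed_with_fallback` and its parameters are hypothetical; each embedder is assumed to be a plain callable, with the first standing in for the primary model and later ones for fallbacks.

```python
import time
from typing import Callable, Sequence

Embedder = Callable[[str], list[float]]


def embed_with_fallback(text: str, embedders: Sequence[Embedder],
                        attempts: int = 2, delay_s: float = 0.0) -> list[float]:
    """Try each embedder in order, retrying each one before moving on."""
    last_error = None
    for embedder in embedders:
        for attempt in range(attempts):
            try:
                return embedder(text)
            except Exception as error:  # a real client would catch narrower errors
                last_error = error
                time.sleep(delay_s * (attempt + 1))  # linear back-off between tries
    raise RuntimeError("all embedding models failed") from last_error
```

Retrying the primary model before switching keeps embeddings from a single model where possible, which matters because vectors from different models are generally not comparable.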
Intermediate Artifacts
| File | Purpose |
|---|---|
| `parsed-structure.json` | Intermediate representation used for embedding |
| `chunked-artifact.json` | Tokenized segments of long files or docs |
| `tag-map.yaml` | Applied tags by position or section of file |
| `memory-ingestion-log.jsonl` | Step-by-step debug-friendly audit trail per file |
Summary
The Knowledge Management Agent executes a structured, event-driven pipeline:
- Parses → Tags → Embeds → Validates → Stores → Indexes
- Emits memory events and updates all downstream consumers (agents, dashboards, planners)
- Guarantees traceable, versioned, and semantically enriched knowledge ingestion at scale
This ensures no agent output is wasted: every artifact becomes a retrievable, queryable memory unit for autonomous reuse.
Skills and Kernel Functions
This section details the Semantic Kernel (SK) skills used by the Knowledge Management Agent to perform semantic enrichment, metadata tagging, vector embedding, validation, and trace linking.
These skills make the agent composable, observable, and programmable, allowing it to operate autonomously or as part of a larger orchestration.
Core Skills List
| Skill | Purpose |
|---|---|
| `EmbedArtifactSkill` | Generates vector embedding from code, text, or prompt input |
| `TagArtifactSkill` | Extracts domain context, module, agent, edition, and tags |
| `ChunkArtifactSkill` | Tokenizes and chunks large inputs for embedding (context-aware windowing) |
| `ValidateArtifactSkill` | Ensures semantic + schema correctness, deduplication, and trace completeness |
| `StoreMemoryEntrySkill` | Persists structured memory into file, DB, or blob-based storage layer |
| `GenerateTraceLinkSkill` | Links artifact to trace, agent, and originating inputs (for lineage reconstruction) |
| `EmitMemoryEventSkill` | Emits events like MemoryEntryCreated, MemoryEntryUpdated, MemoryTagged |
| `UpdateKnowledgeIndexSkill` | Refreshes summary index and Studio memory dashboards |
| `ClassifyArtifactSkill` | Uses prompt completion to assign type labels: prompt, plan, doc, test, etc. |
| `SimilaritySearchSkill` | Retrieves semantically related memory entries by embedding distance |
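Retrieval by embedding distance, as `SimilaritySearchSkill` is described, can be illustrated with cosine similarity over an in-memory dictionary. This Python sketch assumes a brute-force scan and hypothetical function names; the specification itself names Azure AI Search as the vector backend, which would replace this loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: list[float], vectors: dict[str, list[float]],
                      top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (embeddingId, score) pairs, most similar first."""
    scored = [(vector_id, cosine_similarity(query, vector))
              for vector_id, vector in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```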
Example: TagArtifactSkill Output
{
"artifactId": "doc-event-driven-architecture",
"tags": ["architecture", "events", "services", "asynchronous"],
"domainContext": "PlatformArchitecture::Messaging",
"agentId": "EnterpriseArchitectAgent",
"traceId": "proj-900-v2",
"edition": "core"
}
Example Prompt Template (used by ClassifyArtifactSkill)
You are a classifier for ConnectSoft artifacts. Given the content below, label the artifact:
- Artifact type (one of: template, test, plan, prompt, architecture, documentation)
- Relevant domain or layer (e.g., DomainLayer, ApplicationLayer, Messaging)
--- Begin Content ---
<content_chunk>
--- End Content ---
Skill Composition Flow
flowchart LR
A[Receive Artifact] --> B[TagArtifactSkill]
B --> C[ClassifyArtifactSkill]
C --> D[ChunkArtifactSkill]
D --> E[EmbedArtifactSkill]
E --> F[ValidateArtifactSkill]
F --> G[StoreMemoryEntrySkill]
G --> H[GenerateTraceLinkSkill]
H --> I[EmitMemoryEventSkill]
I --> J[UpdateKnowledgeIndexSkill]
Shared/Exported Skills for Other Agents
| Skill | Consumer | Use |
|---|---|---|
| `SimilaritySearchSkill` | Generator, Planner, Reviewer Agents | Memory-based RAG |
| `GenerateTraceLinkSkill` | Orchestrator, Vision Agent | Blueprint and trace planning |
| `EmbedArtifactSkill` | Prompt Engineering Agent | Enrich prompt components |
| `TagArtifactSkill` | Test Generator Agent | Classify test specs and their targets |
Skill Observability Metadata
Each skill emits:
- `executionId`, `traceId`, `artifactId`
- `durationMs`, `tokenCount`, `embeddingSize`
- `skillName`, `status`, `validationResult`
These fields are logged to `memory-ingestion-log.jsonl`.
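A per-skill log record of this kind could be produced with a decorator. The Python sketch below is illustrative only: `observed_skill` is a hypothetical helper, an in-memory stream stands in for `memory-ingestion-log.jsonl`, and only a subset of the listed fields is recorded.

```python
import functools
import io
import json
import time
import uuid


def observed_skill(skill_name: str, log_stream):
    """Wrap a skill so that every call appends one JSON line to log_stream."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(artifact: dict, *args, **kwargs):
            record = {"executionId": str(uuid.uuid4()),
                      "traceId": artifact.get("traceId"),
                      "artifactId": artifact.get("artifactId"),
                      "skillName": skill_name}
            started = time.perf_counter()
            try:
                result = func(artifact, *args, **kwargs)
                record["status"] = "succeeded"
                return result
            except Exception:
                record["status"] = "failed"
                raise
            finally:
                # Written on success and failure alike, so no call goes unlogged.
                record["durationMs"] = round((time.perf_counter() - started) * 1000, 3)
                log_stream.write(json.dumps(record) + "\n")
        return wrapper
    return decorator


log = io.StringIO()  # stands in for memory-ingestion-log.jsonl


@observed_skill("TagArtifactSkill", log)
def tag_artifact(artifact: dict) -> dict:
    """Toy skill: return fixed tags for the given artifact."""
    return {"artifactId": artifact["artifactId"], "tags": ["architecture", "events"]}


tagged = tag_artifact({"artifactId": "doc-event-driven-architecture",
                       "traceId": "proj-900-v2"})
```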
Summary
The Knowledge Management Agent leverages a modular, reusable set of Semantic Kernel skills to:
- Embed, tag, classify, and store artifacts
- Link artifacts to traceable memory
- Serve downstream agents via similarity search and metadata queries
This skill structure enables agent-level autonomy, traceability, and precise integration across the entire AI Software Factory.
Technologies Used
This section documents the technology stack powering the Knowledge Management Agent, aligned with ConnectSoft's core principles: AI-first, cloud-native, modular, and observable.
The stack supports embedding, indexing, querying, traceability, and long-term semantic memory persistence.
Core AI & Embedding Infrastructure
| Component | Description |
|---|---|
| Semantic Kernel (SK) | Agent orchestration and skill execution engine (C#) |
| Azure OpenAI | Embedding model provider (text-embedding-ada-002 or custom SK-compatible models) |
| SK Plugins | For EmbedArtifact, TagArtifact, SimilaritySearch, TraceLinking |
| Prompt Templates | YAML/JSON or .prompt files used for tagging/classification logic |
| Model Context Protocol (MCP) | Shared trace, prompt, and metadata schema; enables deterministic state exchange |
| Memory Middleware | Connects agents to knowledge store and emits observability events (MemoryEntryCreated, etc.) |
Memory Storage & Retrieval Layer
| Component | Use |
|---|---|
| Azure AI Search | Vector store and semantic search backend |
| Blob Storage (Azure Storage) | Stores raw artifacts, embedding metadata, and memory-entry.json files |
| CosmosDB / Table Storage | Indexing memory metadata and version history (knowledge-index.yaml, trace-link-map.json) |
| SK MemoryStore (in-memory/dev) | In-memory memory layer used for testing, stubbing, or pre-ingestion caching |
Event & Observability Infrastructure
| Component | Purpose |
|---|---|
| Azure Event Grid / Service Bus | Emits MemoryEntryCreated, MemoryUpdated, MemoryTagged events |
| Application Insights / OpenTelemetry | Logs skillName, executionId, token count, ingestion failures |
| Memory Metrics Emitter | Publishes metrics like embedding size, deduplication rate, tag quality |
| Trace ID Tracker (via MCP) | Ensures all knowledge events are tied to traceId and agentId lineage |
Platform & Runtime
| Layer | Technology |
|---|---|
| Runtime | .NET 8, ASP.NET Core, C# |
| SDKs | Azure.AI.OpenAI, Azure.Search.Documents, Microsoft.SemanticKernel |
| Testing | MSTest, xUnit (embedding test plans), SpecFlow (feature-driven ingestion validation) |
| CI/CD | Azure Pipelines or GitHub Actions for memory syncs and batch re-indexing jobs |
Supporting Tooling
| Tool | Use |
|---|---|
| `dotnet-memory-tools` | CLI for local vector DB interaction, memory entry inspection |
| `embedding-debug-viewer` | Internal tool for visualizing memory vector similarity in Studio |
| `studio-memory-status.json` | Artifact used by Studio Dashboard to visualize memory coverage and trace density |
| `memory-canonicalizer.cs` | Library that normalizes file content before embedding (strips comments, dedents, etc.) |
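To make the canonicalization step concrete, here is a minimal Python sketch of normalizing content and hashing it for duplicate detection. It is not the `memory-canonicalizer.cs` library: the functions `canonicalize` and `content_hash` are hypothetical, only whole-line `//` comments are stripped, and the use of SHA-256 for deduplication is an assumption.

```python
import hashlib
import textwrap


def canonicalize(source: str) -> str:
    """Normalize line endings, dedent, and drop blank and whole-line // comments."""
    dedented = textwrap.dedent(source.replace("\r\n", "\n"))
    kept = []
    for line in dedented.split("\n"):
        stripped = line.rstrip()
        # Trailing comments are kept: removing them safely needs a real parser.
        if not stripped or stripped.lstrip().startswith("//"):
            continue
        kept.append(stripped)
    return "\n".join(kept)


def content_hash(source: str) -> str:
    """Stable fingerprint of the canonical form, usable as a deduplication key."""
    return hashlib.sha256(canonicalize(source).encode("utf-8")).hexdigest()
```

Hashing the canonical form means that two copies of a file differing only in indentation, line endings, or comment lines produce the same key.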
Security, Access, and Edition Isolation
| Mechanism | Purpose |
|---|---|
| `editionId` scoping in blob keys and vector filters | Ensures tenants/editions don't leak memory across boundaries |
| `agentId` + `buildId` signing in memory metadata | Ensures memory lineage traceability and override protection |
| RBAC over Azure Search + Storage | Restricts who/what can read or write knowledge entries |
| `MemoryValidationPipeline.cs` | Static validation and schema enforcement for ingested entries |
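Edition scoping in blob keys and vector filters might be expressed as follows. This Python sketch is a guess at the shape of the mechanism: the key layout `edition/module/artifact/memory-entry.json` and both function names are hypothetical, and the filter uses an OData-style `eq` expression of the kind Azure AI Search accepts, with the field name `editionId` assumed to be filterable in the index.

```python
def blob_key(edition_id: str, module_id: str, artifact_id: str) -> str:
    """Build a storage key whose first segment is always the edition."""
    for part in (edition_id, module_id, artifact_id):
        if not part or "/" in part:
            # Reject separators so a value cannot escape its edition prefix.
            raise ValueError(f"invalid key segment: {part!r}")
    return f"{edition_id}/{module_id}/{artifact_id}/memory-entry.json"


def edition_filter(edition_id: str) -> str:
    """OData-style filter restricting a vector query to one edition."""
    escaped = edition_id.replace("'", "''")  # OData escapes quotes by doubling
    return f"editionId eq '{escaped}'"
```

Applying the filter on every query, rather than trusting callers to add it, is what keeps one edition's memory from surfacing in another edition's results.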
Summary
The Knowledge Management Agent uses:
- Semantic Kernel + Azure OpenAI for semantic enrichment
- Azure-native storage and indexing for long-term traceable memory
- Vector stores, trace maps, and skill plugins to power memory retrieval
- Observability-first instrumentation for diagnostics and Studio visibility
- Edition-aware, secure memory structures across 3000+ modules and agents
This creates a robust, modular, and extensible infrastructure for autonomous, context-aware knowledge reuse in ConnectSoft's AI Software Factory.
System Prompt
This section defines the system prompt used to initialize the Knowledge Management Agent. The system prompt sets the agent's identity, mission, operational scope, and constraints, ensuring consistency, traceability, and alignment with ConnectSoft's memory-first architecture.
System Prompt Template
You are the Knowledge Management Agent for the ConnectSoft AI Software Factory.
Your role is to ingest, embed, tag, classify, and persist semantically valuable information from all agent outputs, project files, templates, prompts, test specifications, architecture documents, trace logs, and plans.
You must:
- Parse input artifacts and extract relevant metadata (traceId, agentId, domain context, editionId, moduleId)
- Generate vector embeddings using Semantic Kernel or Azure OpenAI models
- Tag each artifact with useful keywords and domain classification
- Validate memory entries for schema correctness and duplication
- Store structured entries in long-term memory (files, vector DBs, indexes)
- Emit `MemoryEntryCreated` or `MemoryUpdated` events with traceable metadata
- Maintain trace-link mappings and enrich the project knowledge index
- Enable semantic memory retrieval for downstream agents across all project modules
You operate using Clean Architecture and DDD principles.
Your knowledge output must be deterministic, reproducible, versioned, and observable.
Only emit memory events after successful ingestion and validation.
Every artifact must be anchored in:
- A traceId
- An agentId
- A domain context or bounded context
- A declared artifact type (template, doc, prompt, test, etc.)
You support Studio dashboard visibility by updating memory status files.
You do not hallucinate new content; you only enrich existing input.
Knowledge is your product. Context is your constraint. Traceability is your duty.
Purpose of the System Prompt
| Goal | Mechanism |
|---|---|
| Set agent boundaries | Restricts behavior to enrichment, not generation |
| Enforce traceability | Requires traceId, agentId, editionId, etc. on every entry |
| Promote deterministic output | Requires schema validation and reproducible embeddings |
| Maintain modular separation | Operates per artifact, per edition, per context |
| Align with ConnectSoft factory principles | Clean Architecture, Event-Driven, Observability-First |
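The traceability requirement can be checked mechanically before an entry is stored. The Python sketch below assumes the four anchors named in the system prompt map to the fields `traceId`, `agentId`, `domainContext`, and `artifactType`; the helper `missing_anchors` is hypothetical.

```python
REQUIRED_ANCHORS = ("traceId", "agentId", "domainContext", "artifactType")


def missing_anchors(entry: dict) -> list[str]:
    """Return the anchor fields that are absent, empty, or whitespace-only."""
    return [field for field in REQUIRED_ANCHORS
            if not str(entry.get(field) or "").strip()]


entry = {"traceId": "proj-811-v4", "agentId": "DocumentationAgent",
         "domainContext": "PlatformArchitecture::ApplicationLayer",
         "artifactType": "documentation"}
```

An empty result means the entry is anchored; any returned field name is a reason to reject the entry and withhold its memory event.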
Personality Traits Encoded
| Trait | Purpose |
|---|---|
| Semantic guardian | Protects and enriches memory across time |
| Knowledge-first | Everything is indexed, nothing is lost |
| Interconnected | Builds a knowledge graph from modular components |
| Trace-safe | No untagged or unverifiable output is allowed |
| Observability-driven | Outputs feed dashboards, trace audits, and agent backplanes |
Summary
The system prompt of the Knowledge Management Agent:
- Frames its identity as ConnectSoft's memory engine
- Enforces semantic enrichment + traceability as non-negotiables
- Defines a bounded, observable scope of operations
- Powers consistent execution across all modules, editions, and agent clusters
This enables the agent to act with clarity, consistency, and confidence, embedding institutional memory into every build.
Input Prompt Template
This section defines the input prompt template used by the Knowledge Management Agent when it needs to classify, tag, or summarize incoming artifacts through a prompt-completion flow (e.g., via Semantic Kernel + OpenAI).
The prompt is designed to be deterministic, context-aware, and aligned with ConnectSoft's modular architecture and DDD boundaries.
Input Prompt Template: Artifact Classification & Metadata Extraction
You are a classification and metadata extraction assistant for the ConnectSoft AI Software Factory.
Your task is to analyze the content of the following artifact and return a structured JSON object containing:
- `artifactType`: What kind of artifact is this? (e.g., template, prompt, test-case, plan, documentation, api-contract)
- `domainContext`: Which architectural or domain area does it belong to? (e.g., Identity::ApplicationLayer, Messaging::InfrastructureLayer)
- `tags`: List of meaningful tags (max 10) that describe the content, purpose, and intent
- `language`: Source language (e.g., C#, Markdown, YAML, JSON, Gherkin)
- `targetAgents` (optional): If this artifact is primarily used by specific agent types (e.g., DeveloperAgent, DocumentationAgent), list them
Respond in valid JSON only.
--- Begin Artifact ---
{{artifact_content_chunk}}
--- End Artifact ---
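Filling the template and checking the model's reply can be sketched in Python as below. The template text is abbreviated, and `render_prompt`, `parse_completion`, and the set of required keys are assumptions drawn from the field list above rather than a documented API; no model call is made here.

```python
import json

# Abbreviated form of the template above; only the placeholder matters here.
PROMPT_TEMPLATE = (
    "You are a classification and metadata extraction assistant.\n"
    "Respond in valid JSON only.\n"
    "--- Begin Artifact ---\n"
    "{{artifact_content_chunk}}\n"
    "--- End Artifact ---"
)
REQUIRED_KEYS = {"artifactType", "domainContext", "tags", "language"}


def render_prompt(chunk: str) -> str:
    """Substitute one artifact chunk into the template placeholder."""
    return PROMPT_TEMPLATE.replace("{{artifact_content_chunk}}", chunk)


def parse_completion(raw: str) -> dict:
    """Parse the model reply and enforce the fields the template asks for."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("completion must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["tags"], list) or len(data["tags"]) > 10:
        raise ValueError("tags must be a list of at most 10 items")
    data.setdefault("targetAgents", [])  # optional field defaults to empty
    return data
```

Rejecting a malformed reply at this point keeps incomplete metadata out of `memory-entry.json`, at the cost of an occasional retry.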
Example Completion Result
{
"artifactType": "template",
"domainContext": "Appointments::ApplicationLayer",
"tags": ["booking", "appointments", "service", "template", "async", "cancellationToken"],
"language": "C#",
"targetAgents": ["MicroserviceGeneratorAgent", "TestGeneratorAgent"]
}
Supported Completion Modes
| Mode | Purpose |
|---|---|
| `classification` | Determine type and domain of unknown artifact |
| `tagging` | Generate keyword-level semantic labels |
| `prompt summarization` | Reduce long prompts into concise descriptions |
| `metadata reinforcement` | Fill missing fields in `memory-entry.json` |
Prompt Parameters Controlled via Orchestration
| Parameter | Example |
|---|---|
| `artifactTypeHint` | `"template"`, `"test"`, `"doc"` (optional override) |
| `chunkWindowSize` | 512 tokens (default) |
| `temperature` | 0.0 for deterministic metadata |
| `forceLanguage` | Override for ambiguous formats (e.g., `.txt` with YAML inside) |
Prompt Usage Scenarios
| Trigger | Usage |
|---|---|
| Unknown `.md` file from `docs/` | Determine whether it is architecture, business, or test documentation |
| YAML plan with embedded SK | Extract domain context and target agents |
| Prompt plan from ProductManagerAgent | Tag with topic, edition, and reusable block info |
| Raw `.cs` file | Identify layer (domain, application), target agent, and tags |
Summary
The Knowledge Management Agent uses structured prompt templates to:
- Extract artifact type, domain context, tags, and language
- Ensure metadata completeness during ingestion
- Power semantic classification even when file naming is ambiguous
- Support consistent schema-based outputs for every knowledge entry
This enables accurate memory indexing across thousands of modular artifacts, ensuring clarity, context, and cross-agent reusability.
Output Expectations
This section defines the expected structure, format, and quality of outputs produced by the Knowledge Management Agent.
Every output must be machine-readable, traceable, semantically tagged, and conform to the ConnectSoft knowledge ingestion schema.
Primary Output: memory-entry.json
Each artifact ingested results in a structured knowledge unit that includes:
{
"artifactId": "doc-clean-architecture-v1",
"traceId": "proj-811-v4",
"agentId": "DocumentationAgent",
"moduleId": "PlatformArchitecture",
"artifactType": "documentation",
"language": "Markdown",
"tags": ["clean architecture", "ddd", "layers", "guidelines"],
"domainContext": "PlatformArchitecture::ApplicationLayer",
"editionId": "core",
"embeddingId": "vec-934f5b87",
"version": "v5.3.0",
"ingestedAt": "2025-05-15T18:00:00Z"
}
Output Format Standards
| Field | Format |
|---|---|
| `artifactId` | Snake/kebab-cased ID, unique per file/version (e.g., `template-booking-service-v5_3`) |
| `traceId`, `agentId`, `moduleId` | Mandatory; ensure full lineage |
| `tags` | Array of lowercase strings, max 10 per entry |
| `domainContext` | Must be namespaced as `Feature::Layer` (e.g., `Messaging::DomainLayer`) |
| `embeddingId` | UUID or hashed ID of vector entry in Azure AI Search |
| `language` | Inferred from file extension or prompt analysis |
| `ingestedAt` | UTC timestamp (ISO 8601) |
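The format rules in this table can be expressed as a handful of mechanical checks. The following Python sketch is illustrative only: `format_problems` and both regular expressions are assumptions inferred from the examples in the table, not the agent's actual schema.

```python
import re
from datetime import datetime

# Patterns inferred from the examples in the table (assumptions).
ARTIFACT_ID = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*$")
DOMAIN_CONTEXT = re.compile(r"^[A-Za-z][A-Za-z0-9]*::[A-Za-z][A-Za-z0-9]*$")


def format_problems(entry: dict) -> list:
    """Return a list of format violations; an empty list means the entry passes."""
    problems = []
    if not ARTIFACT_ID.match(str(entry.get("artifactId") or "")):
        problems.append("artifactId is not snake/kebab-cased")
    for name in ("traceId", "agentId", "moduleId"):
        if not entry.get(name):
            problems.append(f"{name} is mandatory")
    tags = entry.get("tags") or []
    if len(tags) > 10 or any(tag != tag.lower() for tag in tags):
        problems.append("tags must be at most 10 lowercase strings")
    if not DOMAIN_CONTEXT.match(str(entry.get("domainContext") or "")):
        problems.append("domainContext must look like Feature::Layer")
    try:
        datetime.strptime(str(entry.get("ingestedAt") or ""), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append("ingestedAt must be a UTC ISO 8601 timestamp")
    return problems


good = {
    "artifactId": "doc-clean-architecture-v1",
    "traceId": "proj-811-v4",
    "agentId": "DocumentationAgent",
    "moduleId": "PlatformArchitecture",
    "tags": ["clean architecture", "ddd", "layers", "guidelines"],
    "domainContext": "PlatformArchitecture::ApplicationLayer",
    "ingestedAt": "2025-05-15T18:00:00Z",
}
bad = dict(good, domainContext="Domain", tags=["Code"])
```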
Additional Outputs
| File | Description |
|---|---|
| `embedding-vector.json` | Vector format depends on provider (SK, Azure OpenAI) |
| `trace-link-map.json` | One per trace; maps artifacts to upstream agents/decisions |
| `studio.knowledge.status.json` | Summary for Studio dashboard (coverage % per agent/module) |
| `memory-validation-report.yaml` | Warnings and fix suggestions for malformed/missing fields |
| `memory-events.log` | Stream of emitted ingestion events (e.g., `MemoryEntryCreated`) |
Output Quality Requirements
| Quality Rule | Enforcement |
|---|---|
| Traceable | Must contain `traceId`, `agentId`, `domainContext` |
| Semantically tagged | Minimum of 3 tags; must reflect content, not just filename |
| Deterministic | Identical input must produce same fingerprint and classification |
| Non-duplicated | Re-ingestion should reuse or diff existing entry via `artifactId` |
| Version-aware | Different build versions of same artifact tracked independently |
| Schema-compliant | Validated before emitting events or storing in index |
Examples of Output Failures (Rejected Entries)
| Issue | Fix |
|---|---|
| Missing `traceId` | Rejected, logged in validation report |
| Tags are empty or too generic (`["code", "test"]`) | Prompt reclassification |
| `domainContext` not namespaced | Inferred via fallback skill |
| Artifact exceeds max embedding window | Chunked via `ChunkArtifactSkill` |
Summary
All outputs from the Knowledge Management Agent must be:
- Structured (`memory-entry.json`)
- Semantically enriched (tags, context, domain)
- Trace-linked (traceId, agentId, editionId)
- Observable (emits ingestion logs, memory metrics)
- Reusable across modules, editions, and agents
This guarantees high-quality, AI-ready memory that powers semantic retrieval, traceability, and contextual reasoning at scale.
Memory: Short-Term and Long-Term
This section outlines the memory architecture of the Knowledge Management Agent, distinguishing between short-term (ephemeral) and long-term (persistent) memory layers, and how they support semantic enrichment, traceability, and cross-agent context reuse.
Memory Types
| Type | Description |
|---|---|
| Short-Term Memory (STM) | Ephemeral, in-context memory for current execution: used for chaining SK skills and batching artifacts |
| Long-Term Memory (LTM) | Persistent, retrievable memory: stores structured, tagged, embedded artifacts for retrieval by other agents |
Short-Term Memory (STM)
| Scope | Lifetime |
|---|---|
| One ingestion flow or agent session | Exists only during execution |
| In-memory chunk map, token logs, context stack | Cleared post-ingestion or on flush trigger |
| Used by `ChunkArtifactSkill`, `EmbedArtifactSkill`, `SimilaritySearchSkill` | |
| Implemented via `MemoryContext.cs`, `SKContext`, or DI container session-scoped services | |
STM Example
{
"currentTraceId": "proj-812-v2",
"chunkWindow": 512,
"activeArtifactId": "doc-observability-principles",
"recentTags": ["observability", "otel", "logging"],
"agentRole": "DocumentationAgent"
}
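To make the "exists only during execution, cleared on flush" lifetime concrete, here is a small Python sketch of a per-run context holding the fields from the STM example. The class `ShortTermMemory` and its methods are hypothetical; the document places the real implementation in `MemoryContext.cs` or session-scoped services, which this does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ShortTermMemory:
    """Per-run context mirroring the STM example fields (hypothetical class)."""
    current_trace_id: str
    agent_role: str
    chunk_window: int = 512
    active_artifact_id: Optional[str] = None
    recent_tags: List[str] = field(default_factory=list)
    chunk_map: Dict[str, List[str]] = field(default_factory=dict)

    def begin_artifact(self, artifact_id: str, chunks: List[str]) -> None:
        """Register the artifact currently being ingested and its chunks."""
        self.active_artifact_id = artifact_id
        self.chunk_map[artifact_id] = list(chunks)

    def flush(self) -> None:
        """Drop everything tied to the finished ingestion; keep trace and role."""
        self.active_artifact_id = None
        self.recent_tags.clear()
        self.chunk_map.clear()


stm = ShortTermMemory(current_trace_id="proj-812-v2", agent_role="DocumentationAgent")
stm.begin_artifact("doc-observability-principles", ["chunk one", "chunk two"])
stm.recent_tags.extend(["observability", "otel", "logging"])
chunks_before_flush = len(stm.chunk_map["doc-observability-principles"])
stm.flush()
```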
Long-Term Memory (LTM)
| Layer | Purpose |
|---|---|
| `memory-entry.json` (per artifact) | Canonical metadata + classification |
| `embedding-vector.json` | Persisted vector stored in Azure AI Search |
| `trace-link-map.json` | Artifact → trace → agent graph |
| `flaky-tests-index.yaml` (if relevant) | Carries test memory for QA clusters |
| `knowledge-index.yaml` | Global listing of all indexed knowledge units |
| `studio.knowledge.status.json` | Aggregated view for dashboard metrics |
| `cosmosdb.table(artifactId)` | Optional key-value store for tag history or version chaining |
LTM Queryability
| Method | Description |
|---|---|
| Vector similarity search | Top-k recall by embedding distance (semantic match) |
| Metadata filter | E.g., "all artifacts from TestGeneratorAgent in BookingService" |
| Edition-contextual retrieval | Only memory scoped to `editionId: vetclinic-lite` |
| Time-anchored range | Show artifacts from last 7 days or build v5.3.0 only |
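The first three query methods combine naturally: filter on metadata, then rank the survivors by vector similarity. The sketch below shows that idea in plain Python with toy 3-dimensional vectors. It is an in-memory stand-in, not the Azure AI Search query path; `search`, `cosine_similarity`, and the `vector` field are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entries, query_vector, top_k=3, **filters):
    """Keep entries whose metadata matches every filter, then rank by similarity."""
    candidates = [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(
        key=lambda entry: cosine_similarity(entry["vector"], query_vector),
        reverse=True,
    )
    return candidates[:top_k]


entries = [
    {"artifactId": "a", "editionId": "vetclinic-lite", "vector": [1.0, 0.0, 0.0]},
    {"artifactId": "b", "editionId": "vetclinic-lite", "vector": [0.6, 0.8, 0.0]},
    {"artifactId": "c", "editionId": "core", "vector": [1.0, 0.0, 0.0]},
]
hits = search(entries, [1.0, 0.0, 0.0], top_k=2, editionId="vetclinic-lite")
```

Filtering before ranking means an out-of-scope entry never appears in results, however similar its embedding is.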
STM → LTM Lifecycle
flowchart LR
STM[Short-Term Context] --> CHUNK[ChunkArtifactSkill]
CHUNK --> EMBED[EmbedArtifactSkill]
EMBED --> META[TagArtifactSkill]
META --> LTM[StoreMemoryEntrySkill]
Studio Dashboards & Memory Metrics
| Metric | Source |
|---|---|
| `memoryCoverageByModule` | Count of artifacts tagged per `domainContext` |
| `averageEmbeddingSize` | From vector DB ingestion stats |
| `retrievalRecallRate` | Used by downstream Generator agents |
| `redundancyRatio` | Duplicate memory rate during re-ingestion |
Summary
The Knowledge Management Agent supports two levels of memory:
- Short-Term: Used for skill chaining, execution scope, and token-aware processing
- Long-Term: Structured, retrievable, and trace-linked memory that powers semantic reuse across the entire platform
This dual-layer memory system enables semantic persistence, agent collaboration, and autonomous recall across builds, editions, and microservices.
Validation Logic
This section defines how the Knowledge Management Agent performs semantic, structural, and traceability validation on each knowledge unit before persisting it into long-term memory or emitting events.
Validation ensures memory is always accurate, non-redundant, and safe for downstream use across agents and pipelines.
Validation Lifecycle
flowchart TD
PARSE[Parse Artifact] --> TAG[Tag + Metadata]
TAG --> EMBED[Embed Vector]
EMBED --> VALIDATE[ValidateArtifactSkill]
VALIDATE -->|Pass| STORE[Store in Memory]
VALIDATE -->|Fail| REPORT[Write to memory-validation-report.yaml]
Validation Categories
| Category | Checks Performed |
|---|---|
| Traceability | Must have `traceId`, `agentId`, `artifactId`, and `domainContext` |
| Schema Compliance | Must conform to `memory-entry.schema.json` |
| Embedding Health | Vector is non-null, has expected dimensionality (e.g., 1536 for OpenAI) |
| Token Thresholds | Chunk sizes must not exceed configured limits (e.g., 1000 tokens) |
| Tag Completeness | Must contain 3 to 10 meaningful tags |
| Edition Scoping | `editionId` must match known edition keyspace if present |
| Duplicate Detection | Check whether `artifactId` with same content exists; resolve as update or skip |
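Several of these categories reduce to simple predicates over the entry and its vector. The Python sketch below covers traceability, namespacing, tag count, embedding health, and edition scoping; schema, token, and duplicate checks are left out. `validate_entry`, the error wording, and the `known_editions` default are assumptions, not the behavior of `ValidateArtifactSkill`.

```python
TRACE_FIELDS = ("traceId", "agentId", "artifactId", "domainContext")


def validate_entry(entry, vector, expected_dim=1536, known_editions=("core",)):
    """Return a report dict with an errors list and a valid/rejected status."""
    errors = []
    for name in TRACE_FIELDS:
        if not entry.get(name):
            errors.append(f"missing {name}")
    context = entry.get("domainContext")
    if context and "::" not in context:
        errors.append(f'domainContext not namespaced (value = "{context}")')
    tags = entry.get("tags") or []
    if not 3 <= len(tags) <= 10:
        errors.append(f"tag count out of range ({len(tags)} detected)")
    if vector is None or len(vector) != expected_dim:
        errors.append("embedding missing or wrong dimensionality")
    edition = entry.get("editionId")
    if edition is not None and edition not in known_editions:
        errors.append(f"unknown editionId {edition}")
    return {
        "artifactId": entry.get("artifactId"),
        "errors": errors,
        "status": "rejected" if errors else "valid",
    }


entry = {
    "artifactId": "test-case-cancel-booking",
    "agentId": "TestGeneratorAgent",
    "domainContext": "Domain",
    "tags": ["booking"],
}
report = validate_entry(entry, vector=[0.0] * 1536)
```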
Sample Validation Report Entry
artifactId: test-case-cancel-booking
errors:
- missing traceId
- tag count too low (1 tag detected)
- domainContext not namespaced (value = "Domain")
status: rejected
timestamp: 2025-05-15T18:21:00Z
Deduplication Logic
| Check | Action |
|---|---|
| Exact hash match (content + traceId) | Skip storage, log as known |
| Same `artifactId` + different version | Store as versioned update (`MemoryEntryUpdated`) |
| Overlapping tag set + different module | Validate semantic distance, then suggest merge or skip |
| Duplicate across editions | Separate if `editionId` differs; else link to shared entry |
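The first two rows of this table can be sketched as a fingerprint comparison followed by a version comparison. In the Python below, SHA-256 is an assumed hash choice (the document does not name one), `fingerprint` and `dedup_action` are hypothetical helpers, and the tag-overlap and cross-edition rows are deliberately omitted.

```python
import hashlib


def fingerprint(content: str, trace_id: str) -> str:
    """Hash of content + traceId, as in the 'exact hash match' row."""
    return hashlib.sha256((trace_id + "\n" + content).encode("utf-8")).hexdigest()


def dedup_action(incoming: dict, stored: list) -> str:
    """Decide between 'skip', 'update', and 'create' for an incoming entry."""
    for entry in stored:
        if entry["fingerprint"] == incoming["fingerprint"]:
            return "skip"  # already known; log and move on
    for entry in stored:
        if (entry["artifactId"] == incoming["artifactId"]
                and entry["version"] != incoming["version"]):
            return "update"  # would emit MemoryEntryUpdated
    return "create"


stored = [{
    "artifactId": "template-booking-service",
    "version": "v5.3.0",
    "fingerprint": fingerprint("class BookingService {}", "proj-811-v4"),
}]
same = dict(stored[0])
newer = {
    "artifactId": "template-booking-service",
    "version": "v5.4.0",
    "fingerprint": fingerprint("class BookingService { /* v2 */ }", "proj-811-v4"),
}
other = {
    "artifactId": "doc-overview",
    "version": "v1",
    "fingerprint": fingerprint("# Overview", "proj-811-v4"),
}
```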
Output: memory-validation-report.yaml
Every batch run or ingestion process outputs this summary file.
| Field | Purpose |
|---|---|
| `artifactId` | The artifact being evaluated |
| `validationErrors[]` | List of failed rules |
| `resolvedAction` | Skip, retry, mark for human review |
| `confidence` | (Optional) score from classification if ambiguous |
| `suggestedFixes[]` | Optional remediation hints (e.g., add tag, reclassify) |
Validation Skill: ValidateArtifactSkill
Runs last in the pipeline. Emits status (`valid`, `warning`, `invalid`), associated logs, and `validationResultId` for traceability.
Also updates `memory-metrics.json` with `validationStatus: pass|fail|warn`.
Fix Forward Patterns
| Issue | Fix |
|---|---|
| Missing domain context | Use fallback prompt to reclassify |
| Tags too generic | Trigger tag rerun with zero-temperature prompt |
| Missing `editionId` | Default to `core` if none applicable |
| Non-namespaced `artifactType` | Rewrite to lower-kebab-cased type (e.g., `prompt-plan`) |
Summary
The Knowledge Management Agent validates each knowledge unit across:
- Trace and schema conformance
- Semantic tag richness and uniqueness
- Embedding and dimensionality correctness
- Duplication and version tracking
This ensures every output is reliable, retrievable, and trusted, enabling safe reuse across the entire ConnectSoft AI Software Factory.
Retry / Correction Flow
This section defines how the Knowledge Management Agent handles ingestion failures, invalid outputs, semantic mismatches, and retriable operations during its pipeline execution. It supports automated correction where possible and emits structured reports when human input is required.
Retry Triggers and Conditions
| Trigger | Description |
|---|---|
| Embedding Failure | Azure OpenAI or SK embedding service fails (timeout, model unavailability) |
| Validation Error | Required fields missing (e.g., `traceId`, `domainContext`, `tags`) |
| Duplicate Artifact Detected | Artifact exists with same hash; needs merge or skip decision |
| Chunking Failed | Tokenization exceeded limit or returned empty chunks |
| Classification Ambiguous | Prompt failed to classify artifact type or domain context |
| Prompt Completion Timeout | Tag generation or summarization LLM call timed out or returned an incomplete result |
Retry Logic Flow
flowchart TD
INGEST[Artifact Received] --> PROCESS[Skill Pipeline Run]
PROCESS --> VALIDATE
VALIDATE -->|Fail| RETRY[RetryHandler]
RETRY --> FIX1[Try Reclassify]
RETRY --> FIX2[Re-chunk Smaller]
RETRY --> FIX3[Retry Embedding]
FIX1 --> REVALIDATE[Re-run Validation]
FIX2 --> REVALIDATE
FIX3 --> REVALIDATE
REVALIDATE -->|Pass| STORE
REVALIDATE -->|Fail| ESCALATE[Mark as Requires Review]
Auto-Correction Steps
| Step | Action |
|---|---|
| Retry embedding | Uses fallback model or delay before retry |
| Re-chunking | Reduces chunk size to avoid token overflow |
| Tag regeneration | Re-prompts with adjusted classification parameters (e.g., temp=0, max_tokens=256) |
| Schema patching | Auto-fills editionId = "core" or applies default domain context if known |
| Hash rebasing | Changes version hash to avoid overwrite in cross-edition ingestion |
Correction Metadata in memory-validation-report.yaml
artifactId: test-scenario-missing-login
status: retried
retriesAttempted: 2
corrections:
- embedding retried
- classification tag regenerated
validationResult: passed
originalFailure: missing embedding + invalid traceId
Escalation Path (if retry fails)
| Condition | Escalation |
|---|---|
| Retries exhausted (3 attempts) | Logged with requires-human-review: true |
| Ambiguous or contradictory metadata | Output added to manual-review-needed.md |
| Missing core identifiers | Skipped entirely; flagged in validation report |
| Conflicting domain assignments | Added to conflict-resolution-queue.yaml |
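Taken together, the correction steps and the escalation path form a bounded loop: validate, apply corrections, re-validate, and escalate after three attempts. The Python sketch below illustrates that loop under simplifying assumptions. `ingest_with_retry`, `fill_default_edition`, and `require_fields` are hypothetical, and only the schema-patching correction is shown.

```python
MAX_RETRIES = 3  # matches the "Retries exhausted (3 attempts)" row


def fill_default_edition(entry):
    """Schema patching: default editionId to "core" when it is missing."""
    if entry.get("editionId"):
        return entry, None
    return {**entry, "editionId": "core"}, "editionId defaulted to core"


def require_fields(entry):
    """Toy validator: report missing traceId or editionId."""
    return [f"missing {name}" for name in ("traceId", "editionId") if not entry.get(name)]


def ingest_with_retry(entry, validate, corrections, max_retries=MAX_RETRIES):
    """Validate, apply corrections, and re-validate until clean or retries run out."""
    applied = []
    attempts = 0
    errors = validate(entry)
    while errors and attempts < max_retries:
        attempts += 1
        for correct in corrections:
            entry, note = correct(entry)
            if note:
                applied.append(note)
        errors = validate(entry)
    if errors:
        status = "requires-human-review"
    elif attempts:
        status = "retried"
    else:
        status = "valid"
    return {"entry": entry, "status": status, "retriesAttempted": attempts,
            "corrections": applied, "errors": errors}


fixable = ingest_with_retry({"traceId": "proj-850-v1"}, require_fields, [fill_default_edition])
hopeless = ingest_with_retry({}, require_fields, [fill_default_edition])
```

A missing `traceId` cannot be invented by any correction here, so that entry exhausts its retries and is escalated, consistent with the "missing core identifiers" row.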
Output Signals and Events
| Signal | Emitted When |
|---|---|
| `MemoryEntryRetrying` | First failure detected, attempting correction |
| `MemoryEntryCorrected` | Retry succeeded, now passed validation |
| `MemoryEntryRejected` | Correction failed, entry skipped |
| `MemoryEntryEscalated` | Requires human triage, included in review dashboard |
Retry Metrics (logged to memory-metrics.json)
| Metric | Description |
|---|---|
| `retrySuccessRate` | % of retries that passed |
| `maxRetriesReached` | Count of artifacts with retry cap hit |
| `retryAverageDurationMs` | Time taken to resolve a retry case |
| `auto-corrected-fields` | Count of missing tags, traceIds, or metadata filled during correction |
Summary
The Knowledge Management Agent includes a resilient retry and correction flow that:
- Detects and retries recoverable ingestion failures
- Automatically corrects classification, embedding, and metadata gaps
- Escalates only truly ambiguous or unsolvable cases
- Logs every correction path and emits observability events
This ensures semantic memory remains clean, complete, and reusable, even in the face of partial or malformed inputs.
Collaboration Interfaces
This section defines how the Knowledge Management Agent interacts with other agents, services, and orchestration layers in the ConnectSoft AI Software Factory.
It enables cross-agent memory ingestion, retrieval, trace enrichment, and feedback sharing, forming the foundation of shared knowledge across the entire system.
Agent Collaboration Map
flowchart TD
subgraph Producers
Arch[Architecture Agents]
Dev[Developer Agents]
Doc[Documentation Agent]
Gen[Generator Agents]
QA[QA & Test Agents]
end
subgraph Consumers
Plan[Vision & Planning Agents]
Rev[Reviewer Agents]
Orchestrator[Orchestrator]
end
Arch --> KM[Knowledge Management Agent]
Dev --> KM
Doc --> KM
Gen --> KM
QA --> KM
KM --> Plan
KM --> Rev
KM --> Orchestrator
Types of Collaborations
| Role | Description |
|---|---|
| Artifact Producers | Agents that create structured outputs: templates, test plans, docs, specs |
| Memory Consumers | Agents that retrieve or reference stored knowledge for generation or reasoning |
| Orchestration Layer | Coordinates execution, triggers ingestion, and validates memory events |
Collaboration Interfaces by Agent
| Agent | Collaboration Details |
|---|---|
| VisionArchitectAgent | Retrieves past vision plans, strategic goal maps, blueprint fragments |
| TestGeneratorAgent | Pushes BDD scenarios and test metadata; KM stores them as `memory-entry.json` |
| ProductManagerAgent | Embeds prompt plans and decision logs for trace-based reuse |
| DocumentationAgent | Stores .md documents, indexes for retrieval in Studio |
| Generator Agents (Code) | Push generated templates, retrieve semantic memory via SimilaritySearchSkill |
| QAEngineerAgent | Stores qa-summary.json and regression metadata, links to trace |
| HumanOps Agent | May inspect memory for context in debug-handoff workflows |
| Studio Agent | Queries memory to build visual dashboards and trace graphs |
Interface Types
| Interface | Mechanism |
|---|---|
| `SemanticKernelSkill` | `StoreMemoryEntrySkill`, `SimilaritySearchSkill`, `TraceLinkSkill` |
| HTTP API (internal) | `/memory/entry/{artifactId}` for agent-to-agent lookups |
| Event Bus | Emits `MemoryEntryCreated`, `MemoryEntryUpdated`, `MemoryTagged` for consumption |
| Blob Index/Vector Search | Queryable from orchestrator or consumers using OpenAI/Azure AI Search SDK |
| Studio Memory Status Export | JSON feed (`studio.knowledge.status.json`) consumed by dashboard UI |
Example: Memory Entry Ingestion from Generator Agent
{
"agentId": "MicroserviceGeneratorAgent",
"traceId": "proj-850-v1",
"artifactType": "template",
"moduleId": "NotificationService",
"tags": ["template", "notifications", "service", "async"],
"embeddingId": "vec-a8fba99f"
}
→ Available for retrieval by ReviewerAgent, VisionArchitectAgent, or TestGeneratorAgent.
Collaboration Rules
| Rule | Purpose |
|---|---|
| Trace Required | Every artifact must be linked to trace and agent |
| Read-Write Roles | Producers write, Consumers query only |
| RBAC Optional | Edition-aware filtering can restrict visibility for some agents |
| Retrieval Optimized | Embeddings + metadata filters for fast queries |
| Feedback Loop | Consumers can push tags or annotations back into memory (`MemoryTagged` event) |
Summary
The Knowledge Management Agent:
- Interfaces with every agent to store, index, and expose contextual memory
- Enables trace-based collaboration across planning, generation, testing, and validation
- Supports bi-directional trace enrichment and query workflows
- Powers Studio dashboards and AI planning agents with embedded institutional memory
This enables a modular, agent-driven knowledge mesh, where all decisions and outputs are contextual, reusable, and interconnected.
Observability Hooks
This section defines the observability model for the Knowledge Management Agent, covering emitted events, logs, metrics, dashboards, and diagnostic metadata. These hooks ensure the agent's behavior is traceable, auditable, and integrable with Studio, CI/CD, and other agents.
Observability Events
| Event Name | Trigger | Payload Fields |
|---|---|---|
| `MemoryEntryCreated` | After valid ingestion | `artifactId`, `traceId`, `agentId`, `tags`, `embeddingId` |
| `MemoryEntryUpdated` | Artifact re-ingested with version delta | `artifactId`, `versionFrom`, `versionTo`, `changeSummary` |
| `MemoryTagged` | Manual or auto-tagging applied | `artifactId`, `tagsAdded`, `sourceAgentId` |
| `MemoryEntryRejected` | Validation failed after retries | `artifactId`, `reason`, `traceId`, `validationResultId` |
These events are published to Azure Event Grid, Service Bus, or internal EventStore, depending on environment.
Sample: MemoryEntryCreated Event
{
"eventType": "MemoryEntryCreated",
"artifactId": "doc-event-driven-mindset",
"traceId": "proj-872-v2",
"agentId": "EnterpriseArchitectAgent",
"embeddingId": "vec-cdb83ae1",
"tags": ["architecture", "events", "messaging", "ddd"],
"timestamp": "2025-05-15T18:37:00Z"
}
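An event of this shape can be derived directly from a stored memory entry. The Python sketch below builds the payload only; publishing to Event Grid or Service Bus is out of scope, and `memory_entry_created_event` is a hypothetical helper rather than part of the agent.

```python
from datetime import datetime, timezone
from typing import Optional


def memory_entry_created_event(entry: dict, now: Optional[datetime] = None) -> dict:
    """Build a MemoryEntryCreated payload; `now` must be timezone-aware if given."""
    now = now or datetime.now(timezone.utc)
    return {
        "eventType": "MemoryEntryCreated",
        "artifactId": entry["artifactId"],
        "traceId": entry["traceId"],
        "agentId": entry["agentId"],
        "embeddingId": entry["embeddingId"],
        "tags": list(entry["tags"]),
        # Normalize to UTC and render as ISO 8601 with a trailing "Z".
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


entry = {
    "artifactId": "doc-event-driven-mindset",
    "traceId": "proj-872-v2",
    "agentId": "EnterpriseArchitectAgent",
    "embeddingId": "vec-cdb83ae1",
    "tags": ["architecture", "events", "messaging", "ddd"],
}
event = memory_entry_created_event(
    entry, now=datetime(2025, 5, 15, 18, 37, tzinfo=timezone.utc)
)
```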
Metrics Collected
| Metric | Description |
|---|---|
| `memoryEntriesIngested` | Total artifacts processed and stored |
| `embeddingAverageSize` | Vector length (e.g., 1536 for OpenAI) |
| `validationPassRate` | Ratio of successfully validated artifacts |
| `retrySuccessRate` | How often retry flow succeeded |
| `tagDensityScore` | Average number of meaningful tags per artifact |
| `traceCoverageRatio` | % of project traces with linked memory |
| `artifactTypeDistribution` | Breakdown of ingested artifacts by type |
| `duplicateSuppressionRate` | % of entries skipped due to deduplication |
Studio Dashboard Hooks
| Dashboard Tile | Data Source |
|---|---|
| Knowledge Coverage by Module | Aggregated `studio.knowledge.status.json` |
| Memory Update Activity | Count of `MemoryEntryUpdated` events per sprint |
| Tag Heatmap | Visual tag cloud built from most common tags by domain |
| Search Quality Preview | Top search results from recent queries with relevancy metrics |
| Validation Error Panel | Outputs from `memory-validation-report.yaml` with fixes suggested |
Log Files
| File | Description |
|---|---|
| `memory-ingestion-log.jsonl` | Line-by-line log of each ingestion step: parse, tag, embed, validate |
| `memory-metrics.json` | Exported counters, histograms, validation stats |
| `memory-validation-report.yaml` | Full list of failed validations with context |
| `studio.knowledge.status.json` | Summary of coverage, edition impact, agent participation |
OpenTelemetry Instrumentation
| Span Name | Description |
|---|---|
| `MemoryAgent.IngestArtifact` | Main ingestion span (traced by traceId + artifactId) |
| `MemoryAgent.EmbedArtifactSkill` | Embedding vector creation sub-span |
| `MemoryAgent.ValidateArtifactSkill` | Validation span (logs error if fails) |
| `MemoryAgent.EmitEvent` | Event publication latency and confirmation |
Integration Targets
| Consumer | Usage |
|---|---|
| Orchestrator | Confirms memory entry emission before continuing agent cascade |
| Studio | Displays coverage, validation errors, memory lineage maps |
| HumanOps Agent | Reads logs for escalated debug-handled artifacts |
| CI/CD Pipelines | Optional: warn if memory delta is unexpectedly low (possible regression) |
Summary
The Knowledge Management Agent:
- Emits rich observability signals (events, metrics, logs, OpenTelemetry spans)
- Powers dashboards, pipelines, and audits through trace-linked semantic metadata
- Supports live feedback, QA memory monitoring, and Studio visualization
- Enables end-to-end trust in memory-based generation, validation, and planning
This ensures memory is not just accurate but also transparent, explainable, and measurable.
Human Intervention Hooks
This section outlines how human operators (such as architects, quality leads, or HumanOps agents) can interact with or override the behavior of the Knowledge Management Agent when automatic ingestion fails, classification is ambiguous, or manual tagging and curation is desired.
When Human Intervention Is Needed
| Scenario | Trigger |
|---|---|
| Artifact fails validation after max retries | Listed in `memory-validation-report.yaml` |
| Classification ambiguity | `ClassifyArtifactSkill` returns low confidence or null type |
| Domain or tags are misapplied | Semantic mismatch detected by consumer agent or reviewer |
| Overwritten or conflicting versions | `artifactId` appears in conflicting modules or editions |
| Re-ingestion produces duplicate embeddings with inconsistent metadata | Requires merge decision |
| Developer or architect manually submits undocumented artifact | Needs human classification and tagging |
HumanOps-Driven Inputs
| Input | Description |
|---|---|
| `manual-review-needed.md` | Markdown-based summary of memory items flagged for manual triage |
| `studio.knowledge.annotations.json` | Allows architects to inject tags, fix domain mappings, reclassify |
| `artifact-manual-ingestion.yaml` | Curated knowledge units uploaded manually with full metadata |
| `knowledge-conflict-resolution.yaml` | Resolved overrides for edition/multi-agent artifacts |
| `trace-enrichment.json` | Humans add traceId/agentId to "orphaned" artifacts post-facto |
Example: manual-review-needed.md
## Manual Review: Memory Ingestion Issues
1. **Artifact:** test-scenario-retry-appointment
- **Issue:** Unclassified test type; conflicting domain context
- **Suggested Fix:** Add domain: `Appointments::DomainLayer`; Type: `test-case`
- **Path:** /tests/scenarios/booking-retry.feature
- **traceId:** (missing)
2. **Artifact:** doc-legacy-workflow.md
- **Issue:** No traceId or agentId; manually uploaded
- **Action:** Tag as `PlatformHistory::Documentation`
Studio Hooks
| Feature | Description |
|---|---|
| "Needs Review" tag on tile | Appears on memory unit without clear classification |
| Inline Tag Editor | Allows adding/removing tags in Studio UI |
| Domain Reclassifier | Dropdown to select correct bounded context |
| "Mark as Reviewed" button | Emits `MemoryEntryValidatedByHuman` event |
| Annotation Panel | View and add `studio.knowledge.annotations.json` entries directly |
Feedback Flow
flowchart TD
REJECTED[Artifact Fails Validation]
REJECTED --> MANUAL[Added to Review Queue]
MANUAL --> HUMAN[HumanOps Annotates]
HUMAN --> ANNOTATIONS[Updates Annotations File]
ANNOTATIONS --> REINGEST[Agent Re-ingests with Human Hints]
REINGEST --> MemoryEntryCorrected
HumanOps Actions Supported
| Action | Result |
|---|---|
| Add `traceId`, `agentId`, `editionId` | Enables retry and linkage |
| Reclassify artifact type | Updates `artifactType` and re-indexes |
| Adjust domain context | Moves entry to proper bounded context |
| Inject manual tags | Overwrites or appends to auto-generated tags |
| Submit fix for validation error | Clears from validation report and proceeds to memory entry creation |
Outputs from Human Edits
| File | Effect |
|---|---|
| `studio.knowledge.annotations.json` | Source of manual tags and corrections |
| `memory-entry.json` | Updated with merged metadata from annotations |
| `MemoryEntryCorrected` | Event emitted upon successful re-ingestion after human input |
| `conflict-resolution.yaml` | Used in multi-edition or agent artifact re-alignment |
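Merging a human annotation into a memory entry could look roughly like the Python sketch below. The annotation shape is an assumption (the document does not define the schema of `studio.knowledge.annotations.json`), and `apply_annotation` is a hypothetical helper. It supports both tag behaviors named in the actions table: append by default, overwrite on request.

```python
OVERRIDABLE = ("traceId", "agentId", "editionId", "artifactType", "domainContext")


def apply_annotation(entry: dict, annotation: dict, overwrite_tags: bool = False) -> dict:
    """Return a new entry with human-supplied fields and tags merged in."""
    merged = dict(entry)
    for name in OVERRIDABLE:
        if annotation.get(name):
            merged[name] = annotation[name]
    tags = [] if overwrite_tags else list(entry.get("tags", []))
    for tag in annotation.get("tags", []):
        tag = tag.lower()  # tags are stored lowercase
        if tag not in tags:
            tags.append(tag)
    merged["tags"] = tags[:10]  # entries carry at most 10 tags
    return merged


entry = {"artifactId": "doc-legacy-workflow", "tags": ["legacy"]}
annotation = {
    "traceId": "proj-811-v4",
    "domainContext": "PlatformHistory::Documentation",
    "tags": ["Workflow", "legacy"],
}
merged = apply_annotation(entry, annotation)
replaced = apply_annotation(entry, annotation, overwrite_tags=True)
```

Returning a new dictionary instead of mutating the original keeps the pre-annotation entry available for diffing or audit.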
Summary
The Knowledge Management Agent:
- Supports structured human input when automatic ingestion fails
- Provides tooling for architects and HumanOps to correct memory metadata
- Allows manual tagging, classification, and domain realignment
- Resumes ingestion after intervention, preserving traceability
This creates a human-AI collaboration loop that ensures even edge-case or legacy artifacts are captured in the ConnectSoft knowledge graph, with auditability and context preserved.
Traceability & Governance
This section defines how the Knowledge Management Agent ensures full traceability, accountability, and governance for every memory action, from ingestion to update, in line with ConnectSoft's principles of observability-first design, auditability, and multi-tenant safety.
Traceability Requirements for Every Memory Entry
Each `memory-entry.json` must include:
| Field | Required |
|---|---|
| `artifactId` | Yes. Unique identifier for the artifact |
| `traceId` | Yes. Factory-wide execution trace linking to source run |
| `agentId` | Yes. Which agent created or submitted the artifact |
| `editionId` | Yes. Which tenant/edition the knowledge applies to |
| `moduleId` | Yes. Which microservice/module this memory belongs to |
| `version` | Yes. Build or semantic version of the artifact |
| `embeddingId` | Yes. ID linking to the vector representation |
| `ingestedAt` | Yes. UTC timestamp of ingestion or update |
| `artifactType`, `domainContext`, `tags` | Yes. Required classification metadata |
Sample: Full Traceable Entry
{
"artifactId": "template-notification-service-v5_3_0",
"traceId": "proj-888-v4",
"agentId": "MicroserviceGeneratorAgent",
"moduleId": "NotificationService",
"domainContext": "Messaging::ApplicationLayer",
"artifactType": "template",
"editionId": "vetclinic-premium",
"embeddingId": "vec-789abc45",
"version": "v5.3.0",
"tags": ["notifications", "service", "template"],
"ingestedAt": "2025-05-15T18:52:00Z"
}
Governance Rules Enforced
| Policy | Enforcement |
|---|---|
| No orphaned memory | Reject entries without `traceId`, `agentId`, or `moduleId` |
| Edition-aware indexing | Memory is stored in edition-specific collections or partitions |
| Signed memory updates | Every update includes previous version ID and diff summary |
| Immutable history | Once stored, a memory version cannot be deleted, only superseded |
| Audit trails available | All ingestion events are timestamped and logged with diff metadata |
Multi-Tenant and Edition Governance
| Strategy | Description |
|---|---|
| `editionId` namespacing | Stored in blob keys, search filters, and index documents |
| RBAC + scoped queries | Consumers may only retrieve memory for allowed editions |
| Isolated update workflows | Edition-specific annotations and overrides do not affect others |
| Memory overlays | Same artifact across editions stored as separate entries with linkage metadata (`memory-overlay-map.yaml`) |
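The "RBAC + scoped queries" strategy amounts to intersecting every query with the set of editions a consumer is allowed to see. A minimal Python sketch, assuming the allowed editions arrive as a plain list (how they are actually resolved is not specified here); `visible_entries` is a hypothetical helper.

```python
def visible_entries(entries, allowed_editions, edition_id=None):
    """Return only entries in editions the consumer may read.

    If a specific edition is requested, it must itself be allowed.
    """
    allowed = set(allowed_editions)
    if edition_id is not None:
        if edition_id not in allowed:
            raise PermissionError(f"edition {edition_id!r} is not allowed for this consumer")
        allowed = {edition_id}
    return [entry for entry in entries if entry.get("editionId") in allowed]


entries = [
    {"artifactId": "a", "editionId": "core"},
    {"artifactId": "b", "editionId": "vetclinic-lite"},
    {"artifactId": "c", "editionId": "vetclinic-premium"},
]
lite_consumer = ["core", "vetclinic-lite"]
everything_allowed = visible_entries(entries, lite_consumer)
only_lite = visible_entries(entries, lite_consumer, edition_id="vetclinic-lite")
```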
Update and Diff Tracking
| Scenario | Governance Behavior |
|---|---|
| `artifactId` exists, version differs | `MemoryEntryUpdated` emitted, prior entry archived |
| Re-tagging occurs | Manual or auto-tagging triggers signed `MemoryTagged` event |
| Version rollback requested | Studio or Orchestrator may flag entry for rollback display (not deletion) |
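The "superseded, never deleted" rule and the update event can be illustrated with an append-only store. The Python sketch below keeps every version per `artifactId` and records which event each write would emit. `VersionedMemoryStore` and the `previousVersion` field are assumptions for illustration; signing and diff summaries are not modeled.

```python
class VersionedMemoryStore:
    """Append-only store: new versions supersede old ones, nothing is deleted."""

    def __init__(self):
        self._history = {}  # artifactId -> list of entries, oldest first
        self.events = []    # events that would be published, in order

    def put(self, entry: dict) -> dict:
        versions = self._history.setdefault(entry["artifactId"], [])
        if versions and versions[-1]["version"] == entry["version"]:
            return versions[-1]  # same version again: nothing new to record
        stored = dict(entry)
        if versions:
            previous = versions[-1]["version"]
            stored["previousVersion"] = previous
            self.events.append({
                "eventType": "MemoryEntryUpdated",
                "artifactId": entry["artifactId"],
                "versionFrom": previous,
                "versionTo": entry["version"],
            })
        else:
            self.events.append({
                "eventType": "MemoryEntryCreated",
                "artifactId": entry["artifactId"],
            })
        versions.append(stored)
        return stored

    def current(self, artifact_id: str) -> dict:
        return self._history[artifact_id][-1]

    def history(self, artifact_id: str) -> list:
        return list(self._history[artifact_id])


store = VersionedMemoryStore()
store.put({"artifactId": "template-notification-service", "version": "v5.3.0"})
store.put({"artifactId": "template-notification-service", "version": "v5.4.0"})
```

A rollback in this model would be a display choice over `history()`, not a removal, which matches the rollback row above.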
Audit & Review Access
| Tool | Capability |
|---|---|
| `memory-ingestion-log.jsonl` | Step-by-step audit of ingestion, tagging, embedding, event emission |
| `memory-validation-report.yaml` | Captures any rejected entries and why |
| `studio.knowledge.status.json` | Shows coverage by trace, agent, module, edition |
| `artifact-diff-tracker.yaml` | Optional: shows structural delta between versions (for visual review) |
Governance Event Timeline
timeline
Ingestion: 2025-05-15T18:52Z : MemoryEntryCreated
Update: 2025-05-16T08:31Z : MemoryEntryUpdated
Tag Add: 2025-05-16T09:00Z : MemoryTagged
Studio View Refreshed: 2025-05-16T09:01Z
Summary
The Knowledge Management Agent:
- Ensures every memory unit is trace-linked, version-controlled, and edition-aware
- Protects knowledge integrity through immutable versioning and audit logging
- Enables Studio, Orchestrator, and downstream agents to trust every retrieved artifact
- Supports cross-edition overlays and governed multi-agent updates
- Provides a verifiable memory trail across factory runs, sprints, and modules
This makes ConnectSoft's knowledge layer accountable, transparent, and production-grade.
Overview Diagram: Memory Flow
This section presents a high-level diagram showing the Knowledge Management Agent's position in the semantic memory ecosystem, tracing how artifacts move from agent outputs into validated, traceable, and reusable long-term memory, and how other agents consume this knowledge for autonomous generation, validation, and reasoning.
Memory Flow Diagram
flowchart TD
subgraph Agent Producers
A1[Architecture Agent]
A2[Developer Agent]
A3[Documentation Agent]
A4[QA/Test Agent]
A5[Generator Agent]
end
subgraph Knowledge Management Agent
K1[Artifact Ingestion]
K2[Tag + Classify]
K3[Embed Vector]
K4[Validate]
K5[Store Entry + Metadata]
K6[Emit Events + Logs]
end
subgraph Long-Term Memory
M1[memory-entry.json]
M2[embedding-vector.json]
M3[trace-link-map.json]
M4[studio.knowledge.status.json]
end
subgraph Consumers
C1[Orchestrator]
C2[Generator Agents]
C3[Studio Dashboard]
C4[Reviewer Agent]
C5[HumanOps Agent]
end
A1 --> K1
A2 --> K1
A3 --> K1
A4 --> K1
A5 --> K1
K1 --> K2 --> K3 --> K4 --> K5 --> K6
K5 --> M1
K5 --> M2
K5 --> M3
K6 --> M4
M1 --> C2
M2 --> C2
M3 --> C4
M4 --> C3
M1 --> C1
M1 --> C5
Flow Summary
- Artifact Producers generate:
  - Code templates, test plans, architecture specs, documentation, prompts
- KM Agent performs:
  - Tagging, classification, vectorization, validation
- Memory is stored as:
  - Embeddings + metadata + trace-linked records
- Consumers retrieve memory to:
  - Generate new features, validate coverage, populate dashboards, and close the trace loop
Role in Factory Flow
| Phase | KM Agent Role |
|---|---|
| Vision & Planning | Supplies prior goals, features, architecture |
| Architecture Design | Retains reusable patterns and specs |
| Generation | Enables prompt/context enrichment for test/code |
| QA/Validation | Tracks regressions, test memory, edition coverage |
| Documentation | Links all outputs into retrievable explainers |
| Observability | Feeds Studio knowledge graphs and dashboards |
Benefits of the Memory Flow
- Reusable intelligence across 3000+ services and editions
- Traceable lineage of all agent outputs
- Contextual prompt grounding for generation agents
- Cross-agent understanding of architecture, test, and plan decisions
- Auditable memory trail for production-grade SaaS automation
โ Summary¶
This diagram illustrates the Knowledge Management Agent's role as:
- ๐ The hub of semantic ingestion
- ๐พ The gatekeeper of reusable memory
- ๐ก The emitter of traceable knowledge events
- ๐ The foundation for AI-driven decision reuse, validation, and planning
It visually maps the heart of ConnectSoft's Memory-First Software Factory.
๐ Summary & Final Blueprint¶
This final section consolidates the Knowledge Management Agent's design, capabilities, trace integration, and strategic role across the ConnectSoft AI Software Factory, and outlines future extensions to evolve it into an autonomous knowledge steward.
๐ง Final Blueprint Summary¶
๐ Core Mission¶
"Turn all agent output into structured, semantic, traceable, and reusable knowledge."
The Knowledge Management Agent is not just a logger. It's a semantic infrastructure that ensures:
- No knowledge is lost
- Every artifact is context-aware
- Memory becomes the foundation for reasoning and reuse
๐งฑ Agent Lifecycle (Summary)¶
flowchart TD
Ingest[๐ฅ Artifact Ingested] --> Tag[๐ท๏ธ Classify + Tag]
Tag --> Embed[๐ง Vector Embedding]
Embed --> Validate[✅ Validate & Deduplicate]
Validate --> Store[๐พ Store & Index]
Store --> Emit[๐ก Emit Events + Update Studio]
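The Validate & Deduplicate step in the lifecycle above could be approached as in the sketch below. It assumes a minimum set of required fields and uses exact-match deduplication on a normalized content hash; the actual agent may use different rules, such as vector similarity thresholds. The function names and the `REQUIRED_FIELDS` tuple are hypothetical.

```python
import hashlib

# Assumed minimum fields for an ingested artifact (illustrative only).
REQUIRED_FIELDS = ("traceId", "agentId", "content")


def fingerprint(content: str) -> str:
    """Hash whitespace-normalized, lowercased content so trivial reformatting still matches."""
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def validate_and_deduplicate(artifact: dict, seen: set[str]) -> tuple[bool, str]:
    """Return (accepted, reason). `seen` holds fingerprints of stored artifacts."""
    missing = [name for name in REQUIRED_FIELDS if not artifact.get(name)]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    digest = fingerprint(artifact["content"])
    if digest in seen:
        return False, "duplicate"
    seen.add(digest)
    return True, "accepted"
```

Hashing normalized text only catches near-identical copies. It is cheap enough to run before embedding, which avoids spending vectorization effort on artifacts that will be rejected anyway.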
๐ Core Capabilities Recap¶
| Area | Description |
|---|---|
| ๐ฅ Ingestion | Accepts artifacts from any agent: code, test, plan, doc, prompt |
| ๐ง Semantic Enrichment | Tags, classifies, embeds, chunks, and versions each artifact |
| ๐ Traceability | Links memory to traceId, agentId, moduleId, editionId |
| ๐พ Long-Term Storage | Vector store + structured metadata + trace-link graphs |
| ๐ค Event Emission | Emits creation, update, tagging, and rejection events |
| ๐ Retrieval | Enables semantic search, edition-aware filtering, and prompt grounding |
| ๐ Observability | Powers Studio dashboards and CI/CD validation metrics |
| ๐งโ๐ป Human Collaboration | Supports annotations, overrides, and manual ingestion paths |
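To make the Retrieval capability concrete, the sketch below ranks stored entries by cosine similarity after filtering by edition. It is an illustration under assumptions: entries are plain dictionaries with `editionId` and `vector` keys, and the search is a brute-force scan rather than a call to any particular vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: list[float], entries: list[dict], edition_id: str, top_k: int = 3) -> list[dict]:
    """Return up to top_k entries for one edition, most similar first."""
    # Edition-aware filtering happens before ranking, so other editions never leak in.
    candidates = [e for e in entries if e["editionId"] == edition_id]
    ranked = sorted(
        candidates,
        key=lambda e: cosine_similarity(query_vector, e["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

The returned entries could then be injected into a generator agent's prompt as grounding context.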
๐ Memory Artifact System¶
| Artifact | Purpose |
|---|---|
| `memory-entry.json` | Canonical metadata and trace for each artifact |
| `embedding-vector.json` | Semantic vector for retrieval and reasoning |
| `trace-link-map.json` | Lineage graph between agent, trace, and output |
| `memory-validation-report.yaml` | Tracks validation issues and correction outcomes |
| `studio.knowledge.status.json` | Displays coverage, gaps, and edition insights |
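As a rough illustration of how two of these artifacts might relate, the sketch below builds memory entries and derives a lineage map from them. The identifiers `traceId`, `agentId`, `moduleId`, and `editionId` come from this specification; the `artifact` key, the nesting of the map, and all function names are assumptions, not the actual file schemas.

```python
import json
from collections import defaultdict


def build_memory_entry(trace_id: str, agent_id: str, module_id: str,
                       edition_id: str, artifact_path: str) -> dict:
    # Layout is assumed; only the identifier names come from the specification.
    return {
        "traceId": trace_id,
        "agentId": agent_id,
        "moduleId": module_id,
        "editionId": edition_id,
        "artifact": artifact_path,
    }


def build_trace_link_map(entries: list[dict]) -> dict:
    """Group artifacts by traceId, then by agentId: a minimal lineage view."""
    links = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        links[entry["traceId"]][entry["agentId"]].append(entry["artifact"])
    # Convert nested defaultdicts to plain dicts so the result serializes cleanly.
    return {trace: dict(agents) for trace, agents in links.items()}


def to_json(obj: dict) -> str:
    """Serialize with stable key order, as one might before writing a .json artifact."""
    return json.dumps(obj, indent=2, sort_keys=True)
```

Grouping by trace first makes it straightforward to answer which agents contributed outputs to a given trace, the kind of question a reviewer would ask of the lineage graph.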
๐ Factory-Wide Impact¶
| Factory Stage | KM Agent Role |
|---|---|
| ๐งญ Planning | Retrieves strategic memory for alignment |
| ๐๏ธ Architecture | Reuses existing blueprints and domain layers |
| ๐ ๏ธ Generation | Provides prompt grounding and reusable patterns |
| ๐งช QA & Testing | Links regressions, test coverage, and flakiness memory |
| ๐ Documentation | Stores reusable explainers, release notes, test guides |
| ๐ Observability | Tracks knowledge coverage, growth, and resolution trends |
๐ฎ Future Expansion¶
| Feature | Description |
|---|---|
| ๐ง Knowledge Graph API | Structured querying of memory as interconnected domain graph |
| ๐งฌ Memory Diff Engine | Git-like diff view of knowledge changes across sprints |
| ๐งพ Prompt Patch Log | Detects when prompt completions or decisions evolve over time |
| ๐ Memory Explorer UI | Human-facing browser to navigate memory entries by edition, module, or tag |
| ๐ค Autonomous Knowledge Curator | AI agent that audits, prunes, and optimizes the knowledge graph proactively |
โ Final Statement¶
The Knowledge Management Agent transforms ConnectSoft's factory from a code generator into a self-aware, memory-driven software intelligence system.
It is the backbone of continuity, the reasoning layer over trace data, and the semantic source of truth across all modular, agentic automation flows.
Without it, agents forget. With it, they evolve.