🧠 Knowledge Management Agent Specification

🎯 Purpose

The Knowledge Management Agent is the semantic memory system of the ConnectSoft AI Software Factory.

Its primary goal is to:

Ingest, embed, enrich, and index all semantically important project knowledge — ensuring agents and humans can access context-rich, traceable, and reusable information across microservices, features, workflows, and conversations.

📌 Strategic Position in the Platform

The Knowledge Management Agent sits at the center of ConnectSoft’s semantic intelligence layer, enabling:

🔁 Cross-agent memory reuse — any agent (e.g., Developer, Generator, Test) can retrieve contextual artifacts
🧠 Memory persistence — structured, versioned knowledge across builds and sprints
📚 Knowledge graph construction — links between agents, outputs, decisions, templates, and user documentation
🔍 Retrieval-augmented generation (RAG) — powering context-aware completions and autonomous reasoning
🧩 Meta-coordination — memory structure acts as the backbone of planning, traceability, and reuse

🗺️ Where the Agent Operates in the Factory

flowchart TD
    subgraph Artifact Producers
      Docs[📄 Documentation Agents]
      Dev[👨‍💻 Developer Agents]
      Arch[🏗️ Architecture Agents]
      QA[🧪 QA/Test Agents]
    end

    subgraph Artifact Consumers
      Planner[🧭 Vision/Planning Agents]
      Generator[🛠️ Generator Agents]
      Reviewer[🔍 Reviewer Agents]
    end

    Docs --> KM[🧠 Knowledge Management Agent]
    Dev --> KM
    Arch --> KM
    QA --> KM
    KM --> Planner
    KM --> Generator
    KM --> Reviewer


📘 Real-World Examples of Its Use

Use Case	Description
🧱 Template Reuse	Embeds and indexes all `*.template.cs` files for future AI scaffolding
📜 Documentation Memory	Extracts knowledge from `*.md` files and user-facing guides
⚙️ Feature Traceability	Maps feature specs → code outputs → generated tests
💬 Prompt Enrichment	Helps other agents inject contextual snippets from past runs or documents
🧪 Test Coverage Memory	Links QA agents to past scenarios, failed cases, or test descriptions

🔗 Anchored by ConnectSoft Principles

Principle	Relevance
Modularization	Each knowledge item is semantically scoped to a module, domain, or agent cluster
Observability-First	Every ingestion emits `MemoryEntryCreated` with `traceId`, `agentId`, `artifactId`
AI-First Development	Knowledge is actively indexed for reuse by Semantic Kernel agents
DDD + Clean Architecture	Embeds concepts, entities, and bounded contexts as retrievable memory units

💡 Philosophy

“Knowledge not stored, linked, and retrievable is wasted effort.”

The Knowledge Management Agent ensures no insight, artifact, or instruction is lost — enabling autonomous agents to reason across time, projects, and modular boundaries.

✅ Summary

The Knowledge Management Agent:

🧠 Acts as the semantic memory core of the entire platform
🔁 Ingests and indexes all agent outputs
🧩 Enables AI agents to reuse and reason with past context
📚 Structures knowledge into retrievable, traceable units
🧭 Powers memory-aware planning, generation, and validation across 3000+ modules

📋 Responsibilities

The Knowledge Management Agent is responsible for transforming transient agent output into long-term, queryable, and semantically organized memory — spanning all stages of the ConnectSoft AI Software Factory.

📦 Core Responsibilities

Responsibility	Description
🧩 Ingest Knowledge Artifacts	Accept files, messages, logs, and structured data from any agent or service
🧠 Embed Semantically Relevant Content	Generate vector representations (embeddings) for memory recall and similarity search
🏷️ Tag & Classify Artifacts	Extract metadata (e.g., domain, type, related agent, output purpose, module)
🔗 Link Knowledge to Trace Context	Record trace ID, agent ID, build ID, and edition ID for each memory unit
🗂️ Organize by Knowledge Domain	Structure content across templates, features, flows, code snippets, test plans, prompts
🧾 Version & Track Knowledge Units	Store changes across builds and provide deltas/patches if needed
🔍 Support Retrieval & RAG Queries	Respond to retrieval requests with similarity-ranked results or filtered metadata matches
🧪 Validate and Deduplicate Memory	Ensure quality and avoid noisy, redundant, or malformed records
📤 Emit Memory Events	Emit events like `MemoryEntryCreated`, `MemoryEntryUpdated`, `KnowledgeGraphExtended`
🔁 Collaborate with Memory Consumers	Expose APIs, SK skills, and prompt templates to agents that consume knowledge (e.g., Generator, Reviewer, Vision Architect)

🧾 Extended Responsibilities

Area	Description
📄 Document Ingestion	Parse `.md`, `.spec.yaml`, `*.feature`, and design docs
🧱 Template Archiving	Ingest and tag all generated or reusable ConnectSoft templates
💬 Prompt History Management	Record and link prompts + completions across runs for auditability
📊 Knowledge Coverage Reporting	Provide Studio dashboards with insight into knowledge coverage by agent, module, or domain
🔄 Change Monitoring	Detect and flag when new knowledge conflicts or overrides previous memory entries
🧩 Knowledge Graph Expansion	Support advanced linking across memory: who generated what, why, when, for which tenant/module/feature

🧠 Knowledge Domains Tracked

Domain	Examples
📦 Templates	`.cs`, `.md`, `.json`, `.sql`, `.http`, etc.
🧬 Features	Prompt plans, user stories, epics, decisions
🏗️ Architecture	Diagrams, DDD bounded contexts, clean architecture layouts
📜 Documentation	Guides, READMEs, test instructions, contract definitions
🧪 Test Coverage	Test cases, regressions, scenario matrices
🤖 Agent Intelligence	Prompt templates, execution flows, skills used
🔗 Trace Context	`traceId`, `agentId`, `buildId`, `editionId`

✅ Summary

The Knowledge Management Agent:

Accepts all outputs across the software lifecycle
Extracts meaningful metadata and semantic embeddings
Links artifacts to modular, traceable memory
Powers downstream retrieval, reuse, and reasoning
Emits and maintains structured, versioned knowledge across agents

This ensures every AI agent in ConnectSoft’s ecosystem can access, contribute to, and benefit from shared intelligence at scale.

📥 Inputs Consumed

This section outlines what types of inputs the Knowledge Management Agent accepts, how they’re structured, and what metadata or semantic content it extracts during ingestion.

The agent supports a modality-agnostic, format-flexible ingestion pipeline across all ConnectSoft modules, microservices, agents, and environments.

📂 Accepted Input Types

Type	Description
`*.md`	Documentation files: READMEs, architecture docs, test guides, design decisions
`*.cs`	Code artifacts (especially templates, generators, orchestrators, domain entities)
`*.feature`	SpecFlow / BDD test specifications
`*.json`, `*.yaml`	Configuration, prompt plans, API contracts, memory schemas
`*.http`, `*.sql`, `*.sh`	API test files, query templates, scripts
`prompt.log.jsonl`	Prompt + completion logs from previous agent executions
`execution-trace.json`	End-to-end trace outputs from the orchestration layer
`trace-logs.json`, `memory-metrics.json`	Observability logs and metrics related to knowledge usage
`agent-output.*`	Outputs from other agents (Architect, QA, TestGen, Developer) including plans, specs, fixes, metrics

🧠 Semantic Metadata Extracted

Metadata	Purpose
`agentId`	Who created the artifact
`traceId`	Which execution it belongs to
`buildId` / `moduleId`	Which build produced it (`buildId`) and which module or service it belongs to (`moduleId`)
`artifactType`	Template, test, prompt, document, plan, script, etc.
`domainContext`	Architecture layer, DDD context, edition-specific scope
`language`	Code (`C#`, `SQL`, `YAML`, `Markdown`) or prompt language
`dependencies`	Files/modules it references or imports
`embeddingVector`	Semantic SK/OpenAI vector for similarity search
`versionId`	Version hash or build number from source control or factory run
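The `language` field, and a first guess at `artifactType`, can often be derived from the file name before any model call. Below is a minimal Python sketch. It assumes a hypothetical extension map and hypothetical helpers `infer_language` and `infer_artifact_type_hint`. The production agent runs on .NET, and its actual mapping is not specified here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension map; the real ConnectSoft mapping is not defined in this spec.
EXTENSION_LANGUAGE = {
    ".cs": "C#",
    ".md": "Markdown",
    ".yaml": "YAML",
    ".yml": "YAML",
    ".json": "JSON",
    ".sql": "SQL",
    ".feature": "Gherkin",
    ".sh": "Shell",
    ".http": "HTTP",
}

def infer_language(file_name: str) -> Optional[str]:
    """Language from the last extension, or None when the extension is unknown."""
    suffix = PurePosixPath(file_name).suffix.lower()
    return EXTENSION_LANGUAGE.get(suffix)

def infer_artifact_type_hint(file_name: str) -> Optional[str]:
    """Cheap filename-based hint, e.g. '*.template.cs' -> 'template'."""
    name = file_name.lower()
    if ".template." in name:
        return "template"
    if name.endswith(".feature"):
        return "test"
    if name.endswith(".md"):
        return "documentation"
    return None
```

For unknown extensions, the functions return `None`. That leaves the decision to the `forceLanguage` override or the classification prompt rather than guessing.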

📘 Sample Input Artifact (Simplified)

File: `BookingService.template.cs`
Tags extracted:

{
  "traceId": "proj-882-v3",
  "agentId": "MicroserviceGeneratorAgent",
  "moduleId": "BookingService",
  "artifactType": "template",
  "language": "C#",
  "domainContext": "Appointments::ApplicationLayer",
  "versionId": "v5.2.0",
  "edition": "vetclinic-blue"
}

🧠 Derived Inputs (via SK plugins or Orchestration)

Derived Input	How It’s Used
File-to-prompt conversion	Converts code or docs into embedding-ready chunks
Prompt memory index	Extracts reusable prompt tokens + completions
Interlinked dependency graphs	Establishes context → source → output traceability
Artifact lineage history	Tracks source → transform → generator mapping chain

🔄 Ingestion Modes

Mode	Trigger
Real-time	Triggered by agent execution events (e.g., `AgentCompletedExecution`)
Batch	Periodic sweep of project directory or blob storage
Manual	Human upload of new docs, architecture, or test plans
Retrospective	Bootstrapping from historical repositories or GitHub commits

✅ Summary

The Knowledge Management Agent ingests:

Files (.cs, .md, .yaml, .feature, .json)
Agent output traces, prompt logs, execution metadata
Edition-, module-, and feature-scoped knowledge artifacts
All tagged with trace IDs, agent IDs, build/version IDs, and domain context

It performs semantically rich ingestion across ConnectSoft’s modular AI ecosystem.

📤 Outputs Produced

This section defines the structured outputs emitted by the Knowledge Management Agent after processing inputs. These outputs are consumed by downstream agents for retrieval, generation, planning, traceability, and auditing.

The outputs ensure that every artifact — from code to prompts to test plans — becomes a queryable, versioned, semantically linked knowledge unit.

📦 Primary Output Artifacts

Output File	Description
`memory-entry.json`	Canonical metadata representation of the ingested artifact
`embedding-vector.json`	OpenAI/SK vector embedding for similarity-based retrieval
`knowledge-index.yaml`	Summary index of all knowledge units by module, agent, edition
`trace-link-map.json`	Links between artifact, its generating agent, traceId, and domain context
`memory-metrics.json`	Telemetry of ingestion (e.g., number of tokens embedded, duplication checks passed)
`memory-events.log`	Structured log with `MemoryEntryCreated`, `MemoryEntryUpdated`, `MemoryTagged`
`memory-validation-report.yaml`	Any warnings, errors, or fix suggestions from ingestion pipeline
`studio.knowledge.status.json`	Feed for Studio dashboard (knowledge coverage per module, agent, edition)

📘 Example: `memory-entry.json`

{
  "artifactId": "template-booking-service-2025-05-15",
  "traceId": "proj-888-v1",
  "agentId": "MicroserviceGeneratorAgent",
  "moduleId": "BookingService",
  "artifactType": "template",
  "language": "C#",
  "domainContext": "Appointments::ApplicationLayer",
  "tags": ["template", "booking", "appointments", "microservice"],
  "edition": "vetclinic-premium",
  "embeddingId": "vec-8f3b72ac",
  "version": "v5.3.0",
  "ingestedAt": "2025-05-15T17:08:00Z"
}

📘 Example: `trace-link-map.json`

{
  "traceId": "proj-888-v1",
  "artifactId": "test-cancel-appointment.feature",
  "generatedBy": "TestCaseGeneratorAgent",
  "linkedInputs": ["feature-plan.yaml", "booking-service.cs"],
  "relatedModules": ["Appointments", "Notifications"],
  "edition": "vetclinic-lite"
}
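Each `trace-link-map.json` entry lists its `linkedInputs`, so lineage can be reconstructed by walking those links upstream. Below is a Python sketch over in-memory dictionaries. `build_lineage` and `upstream` are hypothetical names, and the second link in `links` is invented sample data.

```python
from typing import Dict, List, Set

def build_lineage(links: List[dict]) -> Dict[str, List[str]]:
    """Map each artifactId to the inputs it was derived from."""
    graph: Dict[str, List[str]] = {}
    for link in links:
        graph.setdefault(link["artifactId"], []).extend(link.get("linkedInputs", []))
    return graph

def upstream(graph: Dict[str, List[str]], artifact_id: str) -> Set[str]:
    """Every artifact reachable through linkedInputs; `seen` guards against cycles."""
    seen: Set[str] = set()
    stack = list(graph.get(artifact_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

links = [
    {"artifactId": "test-cancel-appointment.feature",
     "linkedInputs": ["feature-plan.yaml", "booking-service.cs"]},
    # Hypothetical second link, for illustration only
    {"artifactId": "booking-service.cs",
     "linkedInputs": ["BookingService.template.cs"]},
]
```

With these links, `upstream(build_lineage(links), "test-cancel-appointment.feature")` yields three upstream artifacts: the plan, the service code, and the template it came from.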

📈 `memory-metrics.json` Fields

Field	Description
`tokensProcessed`	Total tokens embedded from input file
`embeddingSize`	Dimensionality of the resulting vector (e.g., 1536 for `text-embedding-ada-002`)
`storageLocation`	Where the knowledge artifact is persisted
`deduplicationResult`	Pass / warning / collision
`tagQualityScore`	Heuristic on tag accuracy / completeness (0–1)
`validationErrors`	List of schema or metadata warnings (if any)

🧩 Outputs for Downstream Agents

Output	Used By	Purpose
`embedding-vector.json`	Generator Agents, Vision Architect	Contextual code/text retrieval
`memory-entry.json`	Reviewer Agent	Reasoning about origin, trace, and structure
`trace-link-map.json`	Orchestrator	Validate artifact lineage and agent attribution
`studio.knowledge.status.json`	Studio Dashboard	Visualize memory coverage, quality, and domain links

✅ Summary

The Knowledge Management Agent produces:

📁 Canonical memory entry files per ingested artifact
🔍 Vector embeddings for semantic search
📊 Knowledge coverage reports and metrics
🔗 Trace-link graphs for full AI artifact lineage
📤 Live dashboards and logs for observability and governance

These outputs transform static artifacts into semantic knowledge units — reusable across every phase of the AI Software Factory.

🧠 Knowledge Base

This section describes the pre-existing memory and embedded knowledge available to the Knowledge Management Agent before any new ingestion occurs — ensuring it starts with a rich understanding of the ConnectSoft platform, its structure, and factory-wide patterns.

📚 Pre-Embedded Core Knowledge Domains

Domain	Description
🧱 Templates Library	Semantic representation of all base project templates (`ConnectSoft.MicroserviceTemplate`, `*.template.cs`)
📦 Modular Architecture Guide	Vectorized understanding of bounded contexts, domain layers, and modularization strategy
📜 Documentation Corpus	Embedded project-wide `*.md` files from `/docs/`, including architecture, DDD, and principles
🧪 Test Specification Language	Known grammar and patterns for BDD `.feature` files, test cases, and scenario tagging
🔁 Agent Execution Flows	Historical traces from `agent-execution-flow.md`, pre-labeled by cluster and role
💬 Prompt Libraries	Pre-ingested prompt templates, macros, and completions from agents like `ProductManagerAgent`, `VisionArchitectAgent`, etc.
⚙️ Technology Stack Specification	Structured understanding of the ConnectSoft platform stack (`.NET 8`, `Azure`, `NHibernate`, `MassTransit`, `SK`)
🧠 Knowledge System Metadata	All definitions and schemas from `knowledge-and-memory-system.md`, including storage patterns, tags, embeddings, and memory events

📘 Example: Template Knowledge Entry (Preloaded)

{
  "artifactId": "template-orchestration-layer",
  "type": "template",
  "domainContext": "Orchestration::StartupPipeline",
  "tags": ["orchestrator", "di", "middleware", "hostBuilder"],
  "embeddingId": "vec-template-001",
  "description": "Standard orchestration entry point used by all generated microservices",
  "sourceFile": "orchestration-host.template.cs"
}

🧠 Built-In Conceptual Models

Concept	Description
`AgentCluster`	Maps all agents by role: Architect, Developer, QA, Generator
`TraceLinkModel`	Schema to relate trace → artifact → agent → module
`EmbeddingChunker`	Strategy for tokenizing long files while preserving semantic boundaries
`EditionScopeModel`	Rules to index knowledge differently based on edition-level customizations
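`EmbeddingChunker` is described only as a strategy. One plausible version for Markdown splits at heading lines and greedily packs whole sections into windows. It falls back to a hard split only for sections that are too large on their own. This Python sketch makes three assumptions:

- Whitespace-separated words stand in for model tokens; a real implementation would use the embedding model's tokenizer.
- The 512 default mirrors the `chunkWindowSize` default listed for the classification prompt.
- The function names are hypothetical.

```python
from typing import List

def split_sections(text: str) -> List[str]:
    """Split Markdown into sections that each start at a '#' heading line."""
    sections: List[List[str]] = [[]]
    for line in text.splitlines():
        if line.startswith("#") and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return ["\n".join(s) for s in sections if any(l.strip() for l in s)]

def chunk(text: str, max_tokens: int = 512) -> List[str]:
    """Greedily pack whole sections into chunks of at most max_tokens words."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for section in split_sections(text):
        words = section.split()
        if len(words) > max_tokens:
            # Oversized section: flush what we have, then hard-split by words.
            if current:
                chunks.append("\n\n".join(current))
                current, size = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and size + len(words) > max_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(section)
        size += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping sections whole means a retrieved chunk usually carries its own heading, which helps downstream agents attribute the snippet. The heading rule is Markdown-specific; C# files would need a different boundary rule, such as type or member declarations.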

🧩 Inherited Context from Other Agents

Agent	What It Shares
`Vision Architect Agent`	Prompt plan structure, strategy maps, requirement blueprints
`Test Generator Agent`	BDD structure patterns, test plan flows, test-to-trace mappings
`Microservice Generator Agent`	Templates, architecture assembly logic, skeleton project metadata
`Documentation Agent`	Markdown flow and documentation frame types

🧾 Prebuilt Memory Structures

Name	Purpose
`core-memory-index`	Preloaded memory entries keyed by `artifactType` + `domainContext`
`core-embedding-cache`	Base vector DB for fast retrieval before first ingestion cycle
`agent-execution-schema.json`	Known schema of agent inputs, outputs, traceIds, and lineage paths
`memory-event-types.json`	Types of memory lifecycle events (create, update, invalidate, promote)
`studio.knowledge.index.json`	Initial dashboard tiles mapped to core artifacts and trace clusters
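`core-memory-index` is keyed by `artifactType` + `domainContext`. Below is a Python sketch of that composite key, plus a prefix lookup that relies on the `Context::Layer` naming. The function names are hypothetical. The sketch accepts both `type`, used by the preloaded example, and `artifactType`, used by ingested entries.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

IndexKey = Tuple[str, str]  # (artifactType, domainContext)

def build_core_index(entries: List[Dict[str, str]]) -> DefaultDict[IndexKey, List[str]]:
    """Group entries by the composite key (artifactType, domainContext)."""
    index: DefaultDict[IndexKey, List[str]] = defaultdict(list)
    for entry in entries:
        # The preloaded example uses "type"; ingested entries use "artifactType".
        artifact_type = entry.get("artifactType") or entry["type"]
        index[(artifact_type, entry["domainContext"])].append(entry["artifactId"])
    return index

def lookup_context(index: Dict[IndexKey, List[str]], artifact_type: str,
                   bounded_context: str) -> List[str]:
    """All artifacts of one type anywhere under 'BoundedContext::'."""
    prefix = bounded_context + "::"
    return sorted(
        artifact_id
        for (kind, context), ids in index.items()
        if kind == artifact_type and context.startswith(prefix)
        for artifact_id in ids
    )
```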

✅ Summary

Before ingestion begins, the Knowledge Management Agent:

Already understands the structure, vocabulary, and semantic patterns of ConnectSoft’s platform
Possesses prebuilt template knowledge, documentation embeddings, and agent role maps
Maintains internal schemas and memory models to anchor new data
Can bootstrap other agents with intelligent context, even before a full knowledge ingestion pass

As a result, the agent can serve useful retrieval results from the very first trace, before any project-specific knowledge has been ingested, which gives every other agent in the factory context from the start.

🔄 Process Flow

This section defines the end-to-end lifecycle of the Knowledge Management Agent’s execution — from input detection to memory enrichment and event emission.

Each step is modular, observable, and aligned with ConnectSoft’s AI-First, Traceable, and Memory-Centric principles.

🔁 High-Level Execution Flow

flowchart TD
    START[📥 Input Artifact Received]
    PARSE[🔍 Parse & Analyze Structure]
    TAG[🏷️ Extract Metadata & Domain Context]
    EMBED[🧠 Generate Semantic Embedding Vector]
    VALIDATE[✅ Validate Schema & Deduplication]
    STORE[💾 Persist Memory Entry + Vector + Metadata]
    INDEX[📂 Update Knowledge Index & Trace Links]
    EMIT[📤 Emit MemoryEntryCreated Event]
    STUDIO[🖥️ Push Knowledge Status to Studio]
    END[🏁 Agent Completes]

    START --> PARSE --> TAG --> EMBED --> VALIDATE --> STORE --> INDEX --> EMIT --> STUDIO --> END


🧩 Phase-by-Phase Breakdown

Step	Description
1. Parse	Structure-specific parsing (`.cs`, `.md`, `.yaml`, `.json`) to normalize content
2. Tag	Extracts metadata: `traceId`, `agentId`, `domainContext`, `editionId`, `artifactType`, etc.
3. Embed	Calls Semantic Kernel / Azure OpenAI embedding skill to generate vector
4. Validate	Ensures uniqueness, metadata schema compliance, and semantic density (non-empty chunks)
5. Store	Persists structured memory in long-term storage (JSON, Azure Search, blob index)
6. Index	Updates internal YAML/graph-based memory maps (`knowledge-index.yaml`, `trace-link-map.json`)
7. Emit	Sends `MemoryEntryCreated` event with metadata, tags, and `embeddingId`
8. Studio Sync	Updates `studio.knowledge.status.json` to visualize coverage and memory depth
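The phases can be sketched as a single function that produces the memory event only after validation passes. This is a simplified Python illustration, not the agent's implementation:

- Parsing is reduced to trimming whitespace.
- Storage and indexing are a plain dictionary.
- The embedder is injected as a callable.
- Studio sync is omitted.
- Deriving `embeddingId` from a content hash is an assumption, chosen to match the `vec-xxxxxxxx` shape in the examples.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

REQUIRED = ("traceId", "agentId", "moduleId", "artifactType", "domainContext")

@dataclass
class IngestionResult:
    entry: Optional[dict]
    events: List[dict] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

def ingest(artifact_id: str, content: str, metadata: Dict[str, str],
           embed: Callable[[str], List[float]], store: Dict[str, dict]) -> IngestionResult:
    parsed = content.strip()                               # 1. Parse (simplified)
    entry = {"artifactId": artifact_id, **metadata}        # 2. Tag
    vector = embed(parsed) if parsed else []               # 3. Embed
    entry["embeddingId"] = "vec-" + hashlib.sha256(parsed.encode("utf-8")).hexdigest()[:8]
    errors = [f"missing {key}" for key in REQUIRED if not entry.get(key)]  # 4. Validate
    if not parsed:
        errors.append("empty content")
    if errors:
        return IngestionResult(entry=None, errors=errors)  # nothing stored, no event
    store[artifact_id] = {"entry": entry, "vector": vector}  # 5-6. Store + index
    event = {"eventType": "MemoryEntryCreated",              # 7. Emit after validation
             **{k: entry[k] for k in ("artifactId", "traceId", "agentId", "moduleId", "embeddingId")}}
    return IngestionResult(entry=entry, events=[event])
```

Returning the errors instead of raising them lets the caller write them to `memory-validation-report.yaml` while the rest of a batch continues.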

📘 Example MemoryEntryCreated Event

{
  "eventType": "MemoryEntryCreated",
  "artifactId": "doc-clean-architecture-v1",
  "traceId": "proj-811-v4",
  "agentId": "DocumentationAgent",
  "moduleId": "PlatformArchitecture",
  "embeddingId": "vec-cb39f2c1",
  "timestamp": "2025-05-15T17:34:21Z"
}

🔄 Re-Entry Triggers

Trigger	Behavior
New artifact from trace	Execute full flow
Artifact already exists with version delta	Execute diff-based enrichment flow (`MemoryEntryUpdated`)
Conflicting artifact ID	Execute deduplication + retry flow
Re-ingestion by human prompt	Execute enrichment mode (add metadata or annotations)
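The trigger table maps naturally to a small routing function. Below is a Python sketch. It assumes each stored entry records a `version` and a `contentHash`, which are hypothetical field names. The `SKIP` outcome for an unchanged re-submission is an added assumption that does not appear in the table.

```python
from enum import Enum
from typing import Dict

class Flow(Enum):
    FULL = "full ingestion"
    DIFF_ENRICHMENT = "diff-based enrichment (MemoryEntryUpdated)"
    DEDUP_AND_RETRY = "deduplication + retry"
    ENRICHMENT = "enrichment mode"
    SKIP = "unchanged; no-op (assumption, not in the table)"

def select_flow(artifact_id: str, version: str, content_hash: str,
                existing: Dict[str, dict], human_request: bool = False) -> Flow:
    if human_request:                   # re-ingestion requested by a person
        return Flow.ENRICHMENT
    current = existing.get(artifact_id)
    if current is None:                 # new artifact from a trace
        return Flow.FULL
    if current["version"] != version:   # same artifact, new version
        return Flow.DIFF_ENRICHMENT
    if current["contentHash"] != content_hash:
        # Same ID and version but different content: a conflicting artifact ID
        return Flow.DEDUP_AND_RETRY
    return Flow.SKIP
```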

🧠 Side Processes

🔁 Embedding retry with fallback model (e.g., if Azure OpenAI fails)
📊 Metrics collector updates `memory-metrics.json`
🧪 Validation failures logged to `memory-validation-report.yaml`
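Below is a minimal Python sketch of the embedding retry side process. It assumes injected `primary` and `fallback` embedding callables; these are hypothetical, and no specific SDK is used. The sketch retries the primary model with exponential backoff and then calls the fallback once.

```python
import time
from typing import Callable, List, Tuple

Embedder = Callable[[str], List[float]]

def embed_with_fallback(text: str, primary: Embedder, fallback: Embedder,
                        retries: int = 2, base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> Tuple[str, List[float]]:
    """Retry the primary model with exponential backoff, then try the fallback once."""
    for attempt in range(retries + 1):
        try:
            return "primary", primary(text)
        except Exception:  # in real code, catch only transient/service errors
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return "fallback", fallback(text)
```

The function reports which model produced the vector. Embeddings from different models are not comparable and may differ in dimensionality, so the model identity should be stored alongside the vector.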

📦 Intermediate Artifacts

File	Purpose
`parsed-structure.json`	Intermediate representation used for embedding
`chunked-artifact.json`	Tokenized segments of long files or docs
`tag-map.yaml`	Applied tags by position or section of file
`memory-ingestion-log.jsonl`	Step-by-step debug-friendly audit trail per file

✅ Summary

The Knowledge Management Agent executes a structured, event-driven pipeline:

Parses → Tags → Embeds → Validates → Stores → Indexes
Emits memory events and updates all downstream consumers (agents, dashboards, planners)
Guarantees traceable, versioned, and semantically enriched knowledge ingestion at scale

This ensures no agent output is wasted — every artifact becomes a retrievable, queryable memory unit for autonomous reuse.

🧩 Skills and Kernel Functions

This section details the Semantic Kernel (SK) skills used by the Knowledge Management Agent to perform semantic enrichment, metadata tagging, vector embedding, validation, and trace linking.

These skills make the agent composable, observable, and programmable — allowing it to operate autonomously or as part of a larger orchestration.

🧠 Core Skills List

Skill	Purpose
`EmbedArtifactSkill`	Generates vector embedding from code, text, or prompt input
`TagArtifactSkill`	Extracts domain context, module, agent, edition, and tags
`ChunkArtifactSkill`	Tokenizes and chunks large inputs for embedding (context-aware windowing)
`ValidateArtifactSkill`	Ensures semantic + schema correctness, deduplication, and trace completeness
`StoreMemoryEntrySkill`	Persists structured memory into file, DB, or blob-based storage layer
`GenerateTraceLinkSkill`	Links artifact to trace, agent, and originating inputs (for lineage reconstruction)
`EmitMemoryEventSkill`	Emits events like `MemoryEntryCreated`, `MemoryEntryUpdated`, `MemoryTagged`
`UpdateKnowledgeIndexSkill`	Refreshes summary index and Studio memory dashboards
`ClassifyArtifactSkill`	Uses prompt completion to assign type labels: prompt, plan, doc, test, etc.
`SimilaritySearchSkill`	Retrieves semantically related memory entries by embedding distance
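`SimilaritySearchSkill` ranks memory entries by embedding distance. In production this would be delegated to the vector store. The brute-force Python sketch below only illustrates the ranking. It uses cosine similarity, which is an assumption because the spec does not name the metric, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vector lengths differ; were they produced by different models?")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_search(query: Sequence[float], entries: Dict[str, Sequence[float]],
                      top_k: int = 3, min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return up to top_k (artifactId, score) pairs, best first."""
    scored = [(artifact_id, cosine_similarity(query, vector))
              for artifact_id, vector in entries.items()]
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```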

📘 Example: `TagArtifactSkill` Output

{
  "artifactId": "doc-event-driven-architecture",
  "tags": ["architecture", "events", "services", "asynchronous"],
  "domainContext": "PlatformArchitecture::Messaging",
  "agentId": "EnterpriseArchitectAgent",
  "traceId": "proj-900-v2",
  "edition": "core"
}

🧪 Example Prompt Template (used by `ClassifyArtifactSkill`)

You are a classifier for ConnectSoft artifacts. Given the content below, label the artifact:
- Artifact type (one of: template, test, plan, prompt, architecture, documentation)
- Relevant domain or layer (e.g., DomainLayer, ApplicationLayer, Messaging)

--- Begin Content ---
<content_chunk>
--- End Content ---

🔁 Skill Composition Flow

flowchart LR
    A[Receive Artifact] --> B[TagArtifactSkill]
    B --> C[ClassifyArtifactSkill]
    C --> D[ChunkArtifactSkill]
    D --> E[EmbedArtifactSkill]
    E --> F[ValidateArtifactSkill]
    F --> G[StoreMemoryEntrySkill]
    G --> H[GenerateTraceLinkSkill]
    H --> I[EmitMemoryEventSkill]
    I --> J[UpdateKnowledgeIndexSkill]


🔗 Shared/Exported Skills for Other Agents

Skill	Consumer	Use
`SimilaritySearchSkill`	Generator, Planner, Reviewer Agents	Memory-based RAG
`GenerateTraceLinkSkill`	Orchestrator, Vision Agent	Blueprint and trace planning
`EmbedArtifactSkill`	Prompt Engineering Agent	Enrich prompt components
`TagArtifactSkill`	Test Generator Agent	Classify test specs and their targets

🧠 Skill Observability Metadata

Each skill emits:

`executionId`, `traceId`, `artifactId`
`durationMs`, `tokenCount`, `embeddingSize`
`skillName`, `status`, `validationResult`

→ Logged into `memory-ingestion-log.jsonl`
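One way to produce these fields is to wrap each skill invocation in a context manager that times it and writes one JSON line per execution. Below is a Python sketch. `skill_span` and the `sink` callable are hypothetical. The skill itself is expected to fill `tokenCount`, `embeddingSize`, and `validationResult` on the yielded record.

```python
import json
import time
import uuid
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

@contextmanager
def skill_span(skill_name: str, trace_id: str, artifact_id: str,
               sink: Callable[[str], Any]) -> Iterator[Dict[str, Any]]:
    record: Dict[str, Any] = {
        "executionId": str(uuid.uuid4()), "traceId": trace_id,
        "artifactId": artifact_id, "skillName": skill_name, "status": "succeeded",
    }
    start = time.perf_counter()
    try:
        yield record  # the skill adds tokenCount, embeddingSize, validationResult
    except Exception:
        record["status"] = "failed"
        raise
    finally:
        record["durationMs"] = round((time.perf_counter() - start) * 1000, 2)
        sink(json.dumps(record) + "\n")  # one JSON object per line (JSONL)
```

Passing the `write` method of an open `memory-ingestion-log.jsonl` file as `sink` would append to the log. The record is written in a `finally` block, so failed executions are logged as well.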

✅ Summary

The Knowledge Management Agent leverages a modular, reusable set of Semantic Kernel skills to:

Embed, tag, classify, and store artifacts
Link artifacts to traceable memory
Serve downstream agents via similarity search and metadata queries

This skill structure enables agent-level autonomy, traceability, and precise integration across the entire AI Software Factory.

🛠️ Technologies Used

This section documents the technology stack powering the Knowledge Management Agent, aligned with ConnectSoft’s core principles: AI-first, cloud-native, modular, and observable.

The stack supports embedding, indexing, querying, traceability, and long-term semantic memory persistence.

🧠 Core AI & Embedding Infrastructure

Component	Description
Semantic Kernel (SK)	Agent orchestration and skill execution engine (C#)
Azure OpenAI	Embedding model provider (`text-embedding-ada-002` or custom SK-compatible models)
SK Plugins	For `EmbedArtifact`, `TagArtifact`, `SimilaritySearch`, `TraceLinking`
Prompt Templates	YAML/JSON or `.prompt` files used for tagging/classification logic
Model Context Protocol (MCP)	Shared trace, prompt, and metadata schema; enables deterministic state exchange
Memory Middleware	Connects agents to knowledge store and emits observability events (`MemoryEntryCreated`, etc.)

🗂️ Memory Storage & Retrieval Layer

Component	Use
Azure AI Search	Vector store and semantic search backend
Blob Storage (Azure Storage)	Stores raw artifacts, embedding metadata, and `memory-entry.json` files
Azure Cosmos DB / Table Storage	Indexing memory metadata and version history (`knowledge-index.yaml`, `trace-link-map.json`)
SK MemoryStore (in-memory/dev)	Volatile store used for testing, stubbing, or pre-ingestion caching

📡 Event & Observability Infrastructure

Component	Purpose
Azure Event Grid / Service Bus	Emits `MemoryEntryCreated`, `MemoryEntryUpdated`, `MemoryTagged` events
Application Insights / OpenTelemetry	Logs `skillName`, `executionId`, token count, ingestion failures
Memory Metrics Emitter	Publishes metrics like embedding size, deduplication rate, tag quality
Trace ID Tracker (via MCP)	Ensures all knowledge events are tied to `traceId` and `agentId` lineage

🧱 Platform & Runtime

Layer	Technology
🖥️ Runtime	.NET 8, ASP.NET Core, C#
🔧 SDKs	`Azure.AI.OpenAI`, `Azure.Search.Documents`, `Microsoft.SemanticKernel`
🧪 Testing	MSTest, xUnit (embedding test plans), SpecFlow (feature-driven ingestion validation)
🔁 CI/CD	Azure Pipelines or GitHub Actions for memory syncs and batch re-indexing jobs

🧰 Supporting Tooling

Tool	Use
`dotnet-memory-tools`	CLI for local vector DB interaction, memory entry inspection
`embedding-debug-viewer`	Internal tool for visualizing memory vector similarity in Studio
`studio-memory-status.json`	Artifact used by Studio Dashboard to visualize memory coverage and trace density
`memory-canonicalizer.cs`	Library that normalizes file content before embedding (strips comments, dedents, etc.)
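`memory-canonicalizer.cs` is only summarized above. As a rough Python illustration of the idea, not a port of that library, the sketch below does the following:

- Drops whole-line `//` comments.
- Dedents the text.
- Strips trailing whitespace and blank lines.
- Hashes the result, so cosmetically different copies of the same file compare equal.

Function names are hypothetical. Comments that share a line with code, and `/* */` blocks, are deliberately left alone.

```python
import hashlib
import re
import textwrap

_WHOLE_LINE_COMMENT = re.compile(r"^[ \t]*//.*$", re.MULTILINE)

def canonicalize(source: str) -> str:
    """Drop whole-line // comments, dedent, remove trailing whitespace and blank lines."""
    without_comments = _WHOLE_LINE_COMMENT.sub("", source)
    dedented = textwrap.dedent(without_comments)
    lines = (line.rstrip() for line in dedented.splitlines())
    return "\n".join(line for line in lines if line)

def content_hash(source: str) -> str:
    """Stable fingerprint of the canonical form, e.g. for duplicate detection."""
    return hashlib.sha256(canonicalize(source).encode("utf-8")).hexdigest()
```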

🔐 Security, Access, and Edition Isolation

Mechanism	Purpose
`editionId` scoping in blob keys and vector filters	Ensures tenants/editions don’t leak memory across boundaries
`agentId + buildId` signing in memory metadata	Ensures memory lineage traceability and override protection
`RBAC over Azure Search + Storage`	Restricts who/what can read or write knowledge entries
`MemoryValidationPipeline.cs`	Static validation and schema enforcement for ingested entries
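Edition scoping only works if every query carries the edition constraint. The Python sketch below builds an OData filter expression of the kind Azure AI Search accepts in a query's `filter` parameter. It assumes the index exposes filterable `editionId` and `moduleId` fields, which are hypothetical field names, and it refuses to build an unscoped filter.

```python
from typing import Optional

def odata_literal(value: str) -> str:
    """Quote a string for an OData filter; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"

def edition_filter(edition_id: str, module_id: Optional[str] = None) -> str:
    """Filter expression confining a query to one edition (and optionally one module)."""
    if not edition_id:
        raise ValueError("editionId is required; unscoped memory queries are rejected")
    clauses = [f"editionId eq {odata_literal(edition_id)}"]
    if module_id:
        clauses.append(f"moduleId eq {odata_literal(module_id)}")
    return " and ".join(clauses)
```

Building the filter in one place, rather than concatenating strings at each call site, makes it harder for a caller to skip the edition clause or to inject an unescaped value.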

✅ Summary

The Knowledge Management Agent uses:

🤖 Semantic Kernel + Azure OpenAI for semantic enrichment
📦 Azure-native storage and indexing for long-term traceable memory
🧠 Vector stores, trace maps, and skill plugins to power memory retrieval
📊 Observability-first instrumentation for diagnostics and Studio visibility
🌐 Edition-aware, secure memory structures across 3000+ modules and agents

This creates a robust, modular, and extensible infrastructure for autonomous, context-aware knowledge reuse in ConnectSoft’s AI Software Factory.

🧾 System Prompt

This section defines the system prompt used to initialize the Knowledge Management Agent. The system prompt sets the agent’s identity, mission, operational scope, and constraints — ensuring consistency, traceability, and alignment with ConnectSoft’s memory-first architecture.

🧠 System Prompt Template

You are the Knowledge Management Agent for the ConnectSoft AI Software Factory.

Your role is to ingest, embed, tag, classify, and persist semantically valuable information from all agent outputs, project files, templates, prompts, test specifications, architecture documents, trace logs, and plans.

You must:
- Parse input artifacts and extract relevant metadata (traceId, agentId, domain context, editionId, moduleId)
- Generate vector embeddings using Semantic Kernel or Azure OpenAI models
- Tag each artifact with useful keywords and domain classification
- Validate memory entries for schema correctness and duplication
- Store structured entries in long-term memory (files, vector DBs, indexes)
- Emit `MemoryEntryCreated` or `MemoryEntryUpdated` events with traceable metadata
- Maintain trace-link mappings and enrich the project knowledge index
- Enable semantic memory retrieval for downstream agents across all project modules

You operate using Clean Architecture and DDD principles.
Your knowledge output must be deterministic, reproducible, versioned, and observable.
Only emit memory events after successful ingestion and validation.

Every artifact must be anchored in:
- A traceId
- An agentId
- A domain context or bounded context
- A declared artifact type (template, doc, prompt, test, etc.)

You support Studio dashboard visibility by updating memory status files.
You do not hallucinate new content; you only enrich existing input.

Knowledge is your product. Context is your constraint. Traceability is your duty.

🔐 Purpose of the System Prompt

Goal	Mechanism
Set agent boundaries	Restricts behavior to enrichment, not generation
Enforce traceability	Requires `traceId`, `agentId`, `editionId`, etc. on every entry
Promote deterministic output	Requires schema validation and reproducible embeddings
Maintain modular separation	Operates per artifact, per edition, per context
Align with ConnectSoft factory principles	Clean Architecture, Event-Driven, Observability-First

🧭 Personality Traits Encoded

Trait	Purpose
📚 Semantic guardian	Protects and enriches memory across time
🧠 Knowledge-first	Everything is indexed, nothing is lost
🧩 Interconnected	Builds a knowledge graph from modular components
🔒 Trace-safe	No untagged or unverifiable output is allowed
🔍 Observability-driven	Outputs feed dashboards, trace audits, and agent backplanes

✅ Summary

The system prompt of the Knowledge Management Agent:

Frames its identity as ConnectSoft’s memory engine
Enforces semantic enrichment + traceability as non-negotiables
Defines a bounded, observable scope of operations
Powers consistent execution across all modules, editions, and agent clusters

This enables the agent to act with clarity, consistency, and confidence, embedding institutional memory into every build.

🧾 Input Prompt Template

This section defines the input prompt template used by the Knowledge Management Agent when it needs to classify, tag, or summarize incoming artifacts through a prompt-completion flow (e.g., via Semantic Kernel + OpenAI).

The prompt is designed to be deterministic, context-aware, and aligned with ConnectSoft’s modular architecture and DDD boundaries.

📘 Input Prompt Template – Artifact Classification & Metadata Extraction

You are a classification and metadata extraction assistant for the ConnectSoft AI Software Factory.

Your task is to analyze the content of the following artifact and return a structured JSON object containing:
- `artifactType`: What kind of artifact is this? (e.g., template, prompt, test-case, plan, documentation, api-contract)
- `domainContext`: Which architectural or domain area does it belong to? (e.g., Identity::ApplicationLayer, Messaging::InfrastructureLayer)
- `tags`: List of meaningful tags (max 10) that describe the content, purpose, and intent
- `language`: Source language (e.g., C#, Markdown, YAML, JSON, Gherkin)
- `targetAgents` (optional): If this artifact is primarily used by specific agent types (e.g., DeveloperAgent, DocumentationAgent), list them

Respond in valid JSON only.

--- Begin Artifact ---
{{artifact_content_chunk}}
--- End Artifact ---
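Below is a Python sketch of the two ends of this prompt flow. It substitutes the chunk into the `{{artifact_content_chunk}}` placeholder, then checks that the completion is a JSON object with the required fields and the 10-tag limit. Function names are hypothetical. Sending the rendered prompt to the model through Semantic Kernel or Azure OpenAI is out of scope here.

```python
import json
from typing import Any, Dict

PLACEHOLDER = "{{artifact_content_chunk}}"
REQUIRED_KEYS = ("artifactType", "domainContext", "tags", "language")

def render_prompt(template: str, chunk: str) -> str:
    if PLACEHOLDER not in template:
        raise ValueError("template is missing the artifact placeholder")
    return template.replace(PLACEHOLDER, chunk)

def parse_classification(completion: str) -> Dict[str, Any]:
    result = json.loads(completion)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(result, dict):
        raise ValueError("completion must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError(f"completion is missing {missing}")
    if not isinstance(result["tags"], list) or len(result["tags"]) > 10:
        raise ValueError("tags must be a list of at most 10 items")
    result.setdefault("targetAgents", [])  # optional in the prompt contract
    return result
```

`json.JSONDecodeError` is a subclass of `ValueError`. Callers can therefore handle malformed JSON and missing fields the same way, for example by retrying or by recording the artifact in `memory-validation-report.yaml`.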

🔍 Example Completion Result

{
  "artifactType": "template",
  "domainContext": "Appointments::ApplicationLayer",
  "tags": ["booking", "appointments", "service", "template", "async", "cancellation-token"],
  "language": "C#",
  "targetAgents": ["MicroserviceGeneratorAgent", "TestGeneratorAgent"]
}

🧠 Supported Completion Modes

Mode	Purpose
`classification`	Determine type and domain of unknown artifact
`tagging`	Generate keyword-level semantic labels
`prompt summarization`	Reduce long prompts into concise descriptions
`metadata reinforcement`	Fill missing fields in `memory-entry.json`

🧪 Prompt Parameters Controlled via Orchestration

Parameter	Example
`artifactTypeHint`	"template", "test", "doc" (optional override)
`chunkWindowSize`	512 tokens default
`temperature`	0.0 for deterministic metadata
`forceLanguage`	Override for ambiguous formats (`.txt` with YAML inside)

📂 Prompt Usage Scenarios

Trigger	Usage
Unknown `.md` file from `docs/`	Determine if it's architecture, business, or test
YAML plan with embedded SK	Extract domain context and target agents
Prompt plan from ProductManagerAgent	Tag with topic, edition, and reusable block info
Raw `.cs` file	Identify layer (domain, application), target agent, and tags

✅ Summary

The Knowledge Management Agent uses structured prompt templates to:

Extract artifact type, domain context, tags, and language
Ensure metadata completeness during ingestion
Power semantic classification even when file naming is ambiguous
Support consistent schema-based outputs for every knowledge entry

This enables accurate memory indexing across thousands of modular artifacts — ensuring clarity, context, and cross-agent reusability.

📤 Output Expectations

This section defines the expected structure, format, and quality of outputs produced by the Knowledge Management Agent.

Every output must be machine-readable, traceable, semantically tagged, and conform to the ConnectSoft knowledge ingestion schema.

📦 Primary Output: `memory-entry.json`

Each artifact ingested results in a structured knowledge unit that includes:

{
  "artifactId": "doc-clean-architecture-v1",
  "traceId": "proj-811-v4",
  "agentId": "DocumentationAgent",
  "moduleId": "PlatformArchitecture",
  "artifactType": "documentation",
  "language": "Markdown",
  "tags": ["clean architecture", "ddd", "layers", "guidelines"],
  "domainContext": "PlatformArchitecture::ApplicationLayer",
  "editionId": "core",
  "embeddingId": "vec-934f5b87",
  "version": "v5.3.0",
  "ingestedAt": "2025-05-15T18:00:00Z"
}

📘 Output Format Standards

Field	Format
`artifactId`	Snake- or kebab-case ID, unique per file and version (e.g., `template-booking-service-v5_3`)
`traceId`, `agentId`, `moduleId`	Mandatory — ensure full lineage
`tags`	Array of lowercase strings, max 10 per entry
`domainContext`	Must be namespaced: `Feature::Layer` (e.g., `Messaging::DomainLayer`)
`embeddingId`	UUID or hashed ID of vector entry in Azure AI Search
`language`	Inferred from file extension or prompt analysis
`ingestedAt`	UTC timestamp (ISO 8601)
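These format rules can be checked mechanically before an entry is stored. The sketch below is one way to do that. It assumes particular regular expressions: lowercase snake/kebab-case IDs, and a `Feature::Layer` namespace built from PascalCase segments. The real schema may be stricter or looser, and `check_format` is a hypothetical name.

```python
import re
from datetime import datetime

# Assumed patterns for the rules in the table; the real schema may differ.
ARTIFACT_ID = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*$")              # snake/kebab-case
DOMAIN_CONTEXT = re.compile(r"^[A-Z][A-Za-z0-9]*::[A-Z][A-Za-z0-9]*$")  # Feature::Layer


def check_format(entry: dict) -> list:
    """Return a list of format violations for one memory entry (empty list means OK)."""
    problems = []
    if not ARTIFACT_ID.match(entry.get("artifactId", "")):
        problems.append("artifactId must be snake- or kebab-case")
    for name in ("traceId", "agentId", "moduleId"):
        if not entry.get(name):
            problems.append(f"{name} is mandatory")
    tags = entry.get("tags", [])
    if len(tags) > 10 or any(tag != tag.lower() for tag in tags):
        problems.append("tags must be lowercase, max 10")
    if not DOMAIN_CONTEXT.match(entry.get("domainContext", "")):
        problems.append("domainContext must be Feature::Layer")
    try:
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
        stamp = datetime.fromisoformat(entry.get("ingestedAt", "").replace("Z", "+00:00"))
        offset = stamp.utcoffset()
        if offset is None or offset.total_seconds() != 0:
            problems.append("ingestedAt must be UTC")
    except ValueError:
        problems.append("ingestedAt must be ISO 8601")
    return problems
```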

📂 Additional Outputs

File	Description
`embedding-vector.json`	Vector format depends on provider (SK, Azure OpenAI)
`trace-link-map.json`	One per trace; maps artifacts to upstream agents/decisions
`studio.knowledge.status.json`	Summary for Studio dashboard (coverage % per agent/module)
`memory-validation-report.yaml`	Warnings, fix suggestions for malformed/missing fields
`memory-events.log`	Stream of emitted ingestion events (e.g., `MemoryEntryCreated`)

🧪 Output Quality Requirements

Quality Rule	Enforcement
❗ Traceable	Must contain `traceId`, `agentId`, `domainContext`
🧠 Semantically tagged	Minimum of 3 tags (maximum 10); tags must reflect the content, not just the filename
🧾 Deterministic	Identical input must produce same fingerprint and classification
🧩 Non-duplicated	Re-ingestion should reuse or diff existing entry via `artifactId`
🔒 Version-aware	Different build versions of same artifact tracked independently
✅ Schema-compliant	Validated before emitting events or storing in index

🧰 Examples of Output Failures (Rejected Entries)

Issue	Fix
Missing `traceId`	Rejected, logged in validation report
Tags are empty or too generic (`["code", "test"]`)	Prompt reclassification
`domainContext` not namespaced	Inferred via fallback skill
Artifact exceeds max embedding window	Chunked via `ChunkArtifactSkill`

✅ Summary

All outputs from the Knowledge Management Agent must be:

📄 Structured (memory-entry.json)
🧠 Semantically enriched (tags, context, domain)
🔗 Trace-linked (traceId, agentId, editionId)
📊 Observable (emits ingestion logs, memory metrics)
🧾 Reusable across modules, editions, and agents

This guarantees high-quality, AI-ready memory that powers semantic retrieval, traceability, and contextual reasoning at scale.

🧠 Memory: Short-Term and Long-Term

This section outlines the memory architecture of the Knowledge Management Agent — distinguishing between short-term (ephemeral) and long-term (persistent) memory layers, and how they support semantic enrichment, traceability, and cross-agent context reuse.

🧠 Memory Types

Type	Description
Short-Term Memory (STM)	Ephemeral, in-context memory for current execution: used for chaining SK skills and batching artifacts
Long-Term Memory (LTM)	Persistent, retrievable memory: stores structured, tagged, embedded artifacts for retrieval by other agents

📦 Short-Term Memory (STM)

Aspect	Details
Scope	One ingestion flow or agent session
Contents	In-memory chunk map, token logs, context stack
Lifetime	Exists only during execution; cleared post-ingestion or on a flush trigger
Used by	`ChunkArtifactSkill`, `EmbedArtifactSkill`, `SimilaritySearchSkill`
Implemented via	`MemoryContext.cs`, `SKContext`, or session-scoped services in the DI container
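As a rough illustration, the STM could be a session-scoped object that holds the fields from the STM example, plus a `flush()` method that clears ephemeral state. The source names `MemoryContext.cs` and `SKContext` but does not define their shape. `ShortTermMemory` and its bounded 10-item `recent_tags` window are therefore assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ShortTermMemory:
    """Hypothetical session-scoped STM mirroring the fields of the STM example."""
    current_trace_id: str
    agent_role: str
    chunk_window: int = 512
    active_artifact_id: Optional[str] = None
    # Bounded window of recently seen tags (size 10 is an assumption).
    recent_tags: deque = field(default_factory=lambda: deque(maxlen=10))
    # artifactId -> list of chunk strings produced during this session.
    chunk_map: Dict[str, List[str]] = field(default_factory=dict)

    def remember_tags(self, tags):
        # Oldest tags fall off automatically once maxlen is reached.
        self.recent_tags.extend(tags)

    def flush(self):
        """Clear ephemeral state after ingestion or on a flush trigger."""
        self.active_artifact_id = None
        self.recent_tags.clear()
        self.chunk_map.clear()
```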

📂 STM Example

{
  "currentTraceId": "proj-812-v2",
  "chunkWindow": 512,
  "activeArtifactId": "doc-observability-principles",
  "recentTags": ["observability", "otel", "logging"],
  "agentRole": "DocumentationAgent"
}

🧱 Long-Term Memory (LTM)

Layer	Purpose
`memory-entry.json` (per artifact)	Canonical metadata + classification
`embedding-vector.json`	Persisted vector stored in Azure AI Search
`trace-link-map.json`	Artifact ↔ trace ↔ agent graph
`flaky-tests-index.yaml` (if relevant)	Carries test memory for QA clusters
`knowledge-index.yaml`	Global listing of all indexed knowledge units
`studio.knowledge.status.json`	Aggregated view for dashboard metrics
`cosmosdb.table(artifactId)`	Optional key-value store for tag history or version chaining

🧠 LTM Queryability

Method	Description
Vector similarity search	Top-k recall by embedding distance (semantic match)
Metadata filter	E.g., “all artifacts from `TestGeneratorAgent` in `BookingService`”
Edition-contextual retrieval	Only memory scoped to `editionId: vetclinic-lite`
Time-anchored range	Show artifacts from last 7 days or build `v5.3.0` only
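The sketch below imitates the first two query methods in memory. It applies an exact-match metadata filter (for example on `editionId`) and then ranks the remaining entries by brute-force cosine similarity. It only stands in for what Azure AI Search does with combined vector and filter queries. `StoredEntry` and `search` are hypothetical names, and a production index would use approximate nearest-neighbour search rather than scanning every entry.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredEntry:
    """Hypothetical in-memory stand-in for one indexed memory entry."""
    artifact_id: str
    vector: list
    metadata: dict  # e.g. agentId, moduleId, editionId, version


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entries, query_vector, k=3, **filters):
    """Exact-match metadata filter, then brute-force top-k by cosine similarity."""
    candidates = [
        e for e in entries
        if all(e.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda e: cosine(e.vector, query_vector), reverse=True)
    return [e.artifact_id for e in candidates[:k]]
```

For example, `search(entries, query, k=5, editionId="vetclinic-lite")` corresponds to the edition-contextual retrieval row.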

🔁 STM ↔ LTM Lifecycle

flowchart LR
    STM[Short-Term Context] --> CHUNK[ChunkArtifactSkill]
    CHUNK --> EMBED[EmbedArtifactSkill]
    EMBED --> META[TagArtifactSkill]
    META --> LTM[StoreMemoryEntrySkill]

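The skill chain in the diagram can be sketched as plain functions that pass results forward. Only the ordering (chunk, embed, tag, store) comes from the diagram; every component is a placeholder. Whitespace splitting stands in for a tokenizer, a hashed bag-of-words vector stands in for a real embedding model, and vocabulary matching stands in for the LLM tagging prompt. All function names are hypothetical.

```python
import hashlib


def chunk_artifact(text, window=512):
    """Split whitespace tokens into fixed windows (stand-in for a real tokenizer)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + window]) for i in range(0, len(tokens), window)]


def embed_chunk(chunk, dims=8):
    """Toy deterministic 'embedding' via token hashing; a real system calls a model."""
    vector = [0.0] * dims
    for token in chunk.split():
        digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    return vector


def tag_chunks(chunks, vocabulary):
    """Keep vocabulary terms found in the text (placeholder for the tagging prompt)."""
    words = {w.lower().strip(".,") for c in chunks for w in c.split()}
    return sorted(term for term in vocabulary if term in words)


def run_pipeline(artifact_id, text, vocabulary, store, window=512):
    """Chunk -> embed -> tag -> store, in the order shown in the lifecycle diagram."""
    chunks = chunk_artifact(text, window)
    vectors = [embed_chunk(c) for c in chunks]
    tags = tag_chunks(chunks, vocabulary)
    store[artifact_id] = {"chunks": chunks, "vectors": vectors, "tags": tags}
    return store[artifact_id]
```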

📊 Studio Dashboards & Memory Metrics

Metric	Source
`memoryCoverageByModule`	Count of artifacts tagged per domainContext
`averageEmbeddingSize`	From vector DB ingestion stats
`retrievalRecallRate`	Used by downstream Generator agents
`redundancyRatio`	Duplicate memory rate during re-ingestion

✅ Summary

The Knowledge Management Agent supports two levels of memory:

🧠 Short-Term: Used for skill chaining, execution scope, and token-aware processing
🧠 Long-Term: Structured, retrievable, and trace-linked memory that powers semantic reuse across the entire platform

This dual-layer memory system enables semantic persistence, agent collaboration, and autonomous recall across builds, editions, and microservices.

✅ Validation Logic

This section defines how the Knowledge Management Agent performs semantic, structural, and traceability validation on each knowledge unit before persisting it into long-term memory or emitting events.

Validation ensures memory is always accurate, non-redundant, and safe for downstream use across agents and pipelines.

✅ Validation Lifecycle

flowchart TD
    PARSE[🔍 Parse Artifact] --> TAG[🏷️ Tag + Metadata]
    TAG --> EMBED[🧠 Embed Vector]
    EMBED --> VALIDATE[✅ ValidateArtifactSkill]
    VALIDATE -->|Pass| STORE[💾 Store in Memory]
    VALIDATE -->|Fail| REPORT[🛑 Write to memory-validation-report.yaml]


🧪 Validation Categories

Category	Checks Performed
Traceability	Must have `traceId`, `agentId`, `artifactId`, and `domainContext`
Schema Compliance	Must conform to `memory-entry.schema.json`
Embedding Health	Vector is non-null and has the expected dimensionality (e.g., 1536 for OpenAI `text-embedding-ada-002`)
Token Thresholds	Chunk sizes must not exceed configured limits (e.g., 1000 tokens)
Tag Completeness	Must contain 3–10 meaningful tags
Edition Scoping	`editionId` must match known edition keyspace if present
Duplicate Detection	Check if `artifactId` with same content exists → resolve as update or skip
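The sketch below implements the per-entry checks from the table. It assumes 1536-dimensional embeddings, a 1000-token chunk limit, and a hypothetical edition keyspace. Two checks are left out to keep the example short: schema compliance (typically a JSON Schema validator run against `memory-entry.schema.json`) and duplicate detection. The error strings loosely mirror the sample report entry, and `validate_entry` is a hypothetical name.

```python
EXPECTED_DIMENSIONS = 1536   # assumption: e.g. OpenAI text-embedding-ada-002
MAX_CHUNK_TOKENS = 1000      # example limit from the table
KNOWN_EDITIONS = {"core", "vetclinic-lite", "vetclinic-premium"}  # hypothetical keyspace
TRACE_FIELDS = ("traceId", "agentId", "artifactId", "domainContext")


def validate_entry(entry, vector, chunk_token_counts):
    """Run the table's checks on one entry and return a report-style dict (hypothetical)."""
    errors = []
    # Traceability: all lineage fields must be present and non-empty.
    for name in TRACE_FIELDS:
        if not entry.get(name):
            errors.append(f"missing {name}")
    context = entry.get("domainContext")
    if context and "::" not in context:
        errors.append(f'domainContext not namespaced (value = "{context}")')
    # Embedding health: non-empty vector with the expected dimensionality.
    if not vector or len(vector) != EXPECTED_DIMENSIONS:
        errors.append("embedding missing or wrong dimensionality")
    # Token thresholds: every chunk must fit the configured limit.
    if any(count > MAX_CHUNK_TOKENS for count in chunk_token_counts):
        errors.append("chunk exceeds token threshold")
    # Tag completeness: 3-10 tags.
    tags = entry.get("tags", [])
    if not 3 <= len(tags) <= 10:
        errors.append(f"tag count out of range ({len(tags)} detected, expected 3-10)")
    # Edition scoping: only checked when editionId is present.
    edition = entry.get("editionId")
    if edition is not None and edition not in KNOWN_EDITIONS:
        errors.append(f"unknown editionId {edition!r}")
    return {"artifactId": entry.get("artifactId"), "errors": errors,
            "status": "rejected" if errors else "valid"}
```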

📘 Sample Validation Report Entry

artifactId: test-case-cancel-booking
errors:
  - missing traceId
  - tag count too low (1 tag detected)
  - domainContext not namespaced (value = "Domain")
status: rejected
timestamp: 2025-05-15T18:21:00Z

🔁 Deduplication Logic

Check	Action
Exact hash match (content + traceId)	Skip storage, log as known
Same `artifactId` + different version	Store as versioned update (`MemoryEntryUpdated`)
Overlapping tag set + different module	Validate semantic distance → suggest merge or skip
Duplicate across editions	Separate if `editionId` differs; else link to shared entry
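The deduplication table can be expressed as a small decision function. This sketch makes three assumptions: existing entries are keyed by `(artifactId, editionId)`, `core` is the default edition, and the hash covers content plus `traceId`, as the first row describes. The semantic-distance check for overlapping tag sets is omitted. The `review` outcome for a same-version, different-content collision is also an assumption; the table does not state it.

```python
import hashlib


def content_hash(content: str, trace_id: str) -> str:
    """Deterministic fingerprint over content + traceId (first row of the table)."""
    return hashlib.sha256(f"{trace_id}\n{content}".encode("utf-8")).hexdigest()


def dedup_action(new: dict, existing: dict) -> str:
    """Decide how to handle an incoming entry.

    `existing` maps (artifactId, editionId) -> {"hash": ..., "version": ...}; hypothetical layout.
    """
    key = (new["artifactId"], new.get("editionId", "core"))
    prior = existing.get(key)
    if prior is None:
        return "store-new"      # unseen id, or same id in another edition (kept separate)
    if prior["hash"] == new["hash"]:
        return "skip"           # exact match: log as known
    if prior["version"] != new["version"]:
        return "store-version"  # versioned update, emits MemoryEntryUpdated
    return "review"             # same id and version but different content (assumption)
```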

📂 Output: `memory-validation-report.yaml`

Every batch run or ingestion process outputs this summary file.

Field	Purpose
`artifactId`	The artifact being evaluated
`validationErrors[]`	List of failed rules
`resolvedAction`	Skip, retry, mark for human review
`confidence`	(Optional) score from classification if ambiguous
`suggestedFixes[]`	Optional remediation hints (e.g., add tag, reclassify)

🧪 Validation Skill: `ValidateArtifactSkill`

Runs as the final pipeline step before storage. Emits a status (`valid`, `warning`, or `invalid`), associated logs, and a `validationResultId` for traceability.

Also updates `memory-metrics.json` with `validationStatus: pass|fail|warn`.

🧩 Fix Forward Patterns

Issue	Fix
Missing domain context	Use fallback prompt to reclassify
Tags too generic	Trigger tag rerun with zero-temperature prompt
Missing `editionId`	Default to `core` if none applicable
`artifactType` not in lower-kebab-case	Rewrite to a lower-kebab-case type (e.g., `prompt-plan`)
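Two fixes in the table are deterministic: defaulting `editionId` and normalizing `artifactType`. They need no model call, so they can run before any re-prompting. The sketch below shows a minimal version. `fix_forward` and `to_kebab` are hypothetical helpers. The reclassification and tag-rerun fixes are left to the LLM prompts described earlier.

```python
import re


def to_kebab(value: str) -> str:
    """'PromptPlan', 'prompt_plan' or 'Prompt Plan' -> 'prompt-plan'."""
    value = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", value)  # split camel/Pascal case
    return re.sub(r"[\s_]+", "-", value).lower()           # spaces/underscores -> hyphens


def fix_forward(entry: dict):
    """Apply the deterministic fixes; return (fixed_entry, list_of_applied_fixes)."""
    fixed = dict(entry)
    applied = []
    if not fixed.get("editionId"):
        fixed["editionId"] = "core"
        applied.append("editionId defaulted to core")
    original_type = fixed.get("artifactType")
    if original_type:
        kebab = to_kebab(original_type)
        if kebab != original_type:
            fixed["artifactType"] = kebab
            applied.append(f"artifactType rewritten to {kebab}")
    return fixed, applied
```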

✅ Summary

The Knowledge Management Agent validates each knowledge unit across:

🔒 Trace and schema conformance
🧠 Semantic tag richness and uniqueness
🧾 Embedding and dimensionality correctness
🔁 Duplication and version tracking

This ensures every output is reliable, retrievable, and trusted — enabling safe reuse across the entire ConnectSoft AI Software Factory.

🔁 Retry / Correction Flow

This section defines how the Knowledge Management Agent handles ingestion failures, invalid outputs, semantic mismatches, and retriable operations during its pipeline execution. It supports automated correction where possible and emits structured reports when human input is required.

🔄 Retry Triggers and Conditions

Trigger	Description
❌ Embedding Failure	Azure OpenAI or SK embedding service fails (timeout, model unavailability)
⚠️ Validation Error	Required fields missing (e.g., traceId, domainContext, tags)
🚫 Duplicate Artifact Detected	Artifact exists with same hash → needs merge or skip decision
⛔ Chunking Failed	Tokenization exceeded limit or returned empty chunks
🤖 Classification Ambiguous	Prompt failed to classify artifact type or domain context
💬 Prompt Completion Timeout	Tag generation or summarization LLM timed out or incomplete

🔁 Retry Logic Flow

flowchart TD
    INGEST[Artifact Received] --> PROCESS[Skill Pipeline Run]
    PROCESS --> VALIDATE
    VALIDATE -->|Pass| STORE[Store Memory Entry]
    VALIDATE -->|Fail| RETRY[RetryHandler]
    RETRY --> FIX1[Try Reclassify]
    RETRY --> FIX2[Re-chunk Smaller]
    RETRY --> FIX3[Retry Embedding]
    FIX1 --> REVALIDATE[Re-run Validation]
    FIX2 --> REVALIDATE
    FIX3 --> REVALIDATE
    REVALIDATE -->|Pass| STORE
    REVALIDATE -->|Fail| ESCALATE[Mark as Requires Review]


🧩 Auto-Correction Steps

Step	Action
Retry embedding	Uses fallback model or delay before retry
Re-chunking	Reduces chunk size to avoid token overflow
Tag regeneration	Re-prompts with adjusted classification parameters (e.g., temp=0, max_tokens=256)
Schema patching	Auto-fills `editionId` with `"core"` or applies a default domain context if one is known
Hash rebasing	Changes version hash to avoid overwrite in cross-edition ingestion

📘 Correction Metadata in `memory-validation-report.yaml`

artifactId: test-scenario-missing-login
status: retried
retriesAttempted: 2
corrections:
  - embedding retried
  - classification tag regenerated
validationResult: passed
originalFailure: missing embedding + tags too generic

🚫 Escalation Path (if retry fails)

Condition	Escalation
Retries exhausted (3 attempts)	Logged with `requires-human-review: true`
Ambiguous or contradictory metadata	Output added to `manual-review-needed.md`
Missing core identifiers	Skipped entirely; flagged in validation report
Conflicting domain assignments	Added to `conflict-resolution-queue.yaml`

📦 Output Signals and Events

Signal	Emitted When
`MemoryEntryRetrying`	First failure detected, attempting correction
`MemoryEntryCorrected`	Retry succeeded, now passed validation
`MemoryEntryRejected`	Correction failed, entry skipped
`MemoryEntryEscalated`	Requires human triage, included in review dashboard
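The retry cap, escalation path, and signals can be combined into one loop, sketched below. The sketch assumes exponential backoff (0.5 s, then 1 s) and uses a hypothetical `TransientError` to mark recoverable failures. The attempt number is passed to the step so that it can adapt, for example by re-chunking smaller on later attempts. `sleep` is injectable so the function can be tested without waiting, and `ingest_with_retries` is a hypothetical name.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for recoverable failures (e.g. embedding timeouts)."""


def ingest_with_retries(step, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `step(attempt)` with exponential backoff; return (result, emitted_events)."""
    events = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = step(attempt)  # the step may adapt, e.g. re-chunk smaller on later attempts
        except TransientError:
            if attempt == 1:
                events.append("MemoryEntryRetrying")   # first failure detected
            if attempt == max_attempts:
                events.append("MemoryEntryEscalated")  # retry cap hit: requires-human-review
                return None, events
            sleep(base_delay * 2 ** (attempt - 1))      # 0.5 s, 1 s, ...
            continue
        events.append("MemoryEntryCorrected" if attempt > 1 else "MemoryEntryCreated")
        return result, events
    return None, events  # only reached when max_attempts < 1
```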

🧠 Retry Metrics (logged to `memory-metrics.json`)

Metric	Description
`retrySuccessRate`	% of retries that passed
`maxRetriesReached`	Count of artifacts with retry cap hit
`retryAverageDurationMs`	Time taken to resolve a retry case
`auto-corrected-fields`	Count of missing tags, traceIds, or metadata filled during correction

✅ Summary

The Knowledge Management Agent includes a resilient retry and correction flow that:

🧠 Detects and retries recoverable ingestion failures
🔧 Automatically corrects classification, embedding, and metadata gaps
🚫 Escalates only truly ambiguous or unsolvable cases
🧾 Logs every correction path and emits observability events

This ensures semantic memory remains clean, complete, and reusable — even in the face of partial or malformed inputs.

🤝 Collaboration Interfaces

This section defines how the Knowledge Management Agent interacts with other agents, services, and orchestration layers in the ConnectSoft AI Software Factory.

It enables cross-agent memory ingestion, retrieval, trace enrichment, and feedback sharing — forming the foundation of shared knowledge across the entire system.

🧩 Agent Collaboration Map

flowchart TD
    subgraph Producers
      Arch[📐 Architecture Agents]
      Dev[💻 Developer Agents]
      Doc[📄 Documentation Agent]
      Gen[🧠 Generator Agents]
      QA[🧪 QA & Test Agents]
    end

    subgraph Consumers
      Plan[📊 Vision & Planning Agents]
      Rev[🔍 Reviewer Agents]
      Orchestrator[🧭 Orchestrator]
    end

    Arch --> KM[🧠 Knowledge Management Agent]
    Dev --> KM
    Doc --> KM
    Gen --> KM
    QA --> KM

    KM --> Plan
    KM --> Rev
    KM --> Orchestrator


🔁 Types of Collaborations

Role	Description
Artifact Producers	Agents that create structured outputs: templates, test plans, docs, specs
Memory Consumers	Agents that retrieve or reference stored knowledge for generation or reasoning
Orchestration Layer	Coordinates execution, triggers ingestion, and validates memory events

📘 Collaboration Interfaces by Agent

Agent	Collaboration Details
VisionArchitectAgent	Retrieves past vision plans, strategic goal maps, blueprint fragments
TestGeneratorAgent	Pushes BDD scenarios and test metadata → KM stores as `memory-entry.json`
ProductManagerAgent	Embeds prompt plans and decision logs for trace-based reuse
DocumentationAgent	Stores `.md` documents, indexes for retrieval in Studio
Generator Agents (Code)	Push generated templates, retrieve semantic memory via `SimilaritySearchSkill`
QAEngineerAgent	Stores `qa-summary.json` and regression metadata, links to trace
HumanOps Agent	May inspect memory for context in debug-handoff workflows
Studio Agent	Queries memory to build visual dashboards and trace graphs

🔗 Interface Types

Interface	Mechanism
`SemanticKernelSkill`	`StoreMemoryEntrySkill`, `SimilaritySearchSkill`, `TraceLinkSkill`
`HTTP API (internal)`	`/memory/entry/{artifactId}` for agent-to-agent lookups
`Event Bus`	Emits `MemoryEntryCreated`, `MemoryEntryUpdated`, `MemoryTagged` for consumption
`Blob Index/Vector Search`	Queryable from the orchestrator or consumers via the Azure AI Search SDK, with query embeddings from (Azure) OpenAI
`Studio Memory Status Export`	JSON feed (`studio.knowledge.status.json`) consumed by dashboard UI

📎 Example: Memory Entry Ingestion from Generator Agent

{
  "agentId": "MicroserviceGeneratorAgent",
  "traceId": "proj-850-v1",
  "artifactType": "template",
  "moduleId": "NotificationService",
  "tags": ["template", "notifications", "service", "async"],
  "embeddingId": "vec-a8fba99f"
}

→ Available for retrieval by ReviewerAgent, VisionArchitectAgent, or TestGeneratorAgent.

✅ Collaboration Rules

Rule	Purpose
⛓️ Trace Required	Every artifact must be linked to trace and agent
🔄 Read-Write Roles	Producers write; consumers query only, apart from tag and annotation feedback (see Feedback Loop)
🔐 RBAC Optional	Edition-aware filtering can restrict visibility for some agents
🔍 Retrieval Optimized	Embeddings + metadata filters for fast queries
🧠 Feedback Loop	Consumers can push tags or annotations back into memory (`MemoryTagged` event)
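The read-write and RBAC rules can be reduced to a single authorization check, sketched below under several assumptions. Roles are the strings `producer` and `consumer`. `tag` and `annotate` are the feedback operations that emit `MemoryTagged`. Passing `None` for `allowed_editions` models the case where edition filtering is switched off. `is_allowed` is a hypothetical name.

```python
from typing import Optional, Set

FEEDBACK_OPERATIONS = {"tag", "annotate"}  # hypothetical operations that emit MemoryTagged


def is_allowed(role: str, operation: str, edition: str,
               allowed_editions: Optional[Set[str]] = None) -> bool:
    """Return True if `role` may perform `operation` on memory in `edition` (hypothetical policy)."""
    # Optional RBAC: when an edition allow-list is given, everything outside it is hidden.
    if allowed_editions is not None and edition not in allowed_editions:
        return False
    if operation == "query":
        return True                  # producers and consumers may both read
    if operation == "write":
        return role == "producer"    # only producers create or update entries
    if operation in FEEDBACK_OPERATIONS:
        return True                  # consumers may push tags/annotations back
    return False                     # unknown operations are denied
```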

✅ Summary

The Knowledge Management Agent:

🤝 Interfaces with every agent to store, index, and expose contextual memory
📤 Enables trace-based collaboration across planning, generation, testing, and validation
🔗 Supports bi-directional trace enrichment and query workflows
📊 Powers Studio dashboards and AI planning agents with embedded institutional memory

This enables a modular, agent-driven knowledge mesh, where all decisions and outputs are contextual, reusable, and interconnected.

📊 Observability Hooks

This section defines the observability model for the Knowledge Management Agent — covering emitted events, logs, metrics, dashboards, and diagnostic metadata. These hooks ensure the agent’s behavior is traceable, auditable, and integrable with Studio, CI/CD, and other agents.

📡 Observability Events

Event Name	Trigger	Payload Fields
`MemoryEntryCreated`	After valid ingestion	`artifactId`, `traceId`, `agentId`, `tags`, `embeddingId`
`MemoryEntryUpdated`	Artifact re-ingested with version delta	`artifactId`, `versionFrom`, `versionTo`, `changeSummary`
`MemoryTagged`	Manual or auto-tagging applied	`artifactId`, `tagsAdded`, `sourceAgentId`
`MemoryEntryRejected`	Validation failed after retries	`artifactId`, `reason`, `traceId`, `validationResultId`

These events are published to Azure Event Grid, Azure Service Bus, or an internal event store, depending on the environment.
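The sketch below builds event payloads from the table above and does not depend on any transport; publishing to Event Grid or Service Bus is out of scope. `EVENT_FIELDS` is transcribed from the table, and `build_event` is a hypothetical helper. The `timestamp` field follows the `MemoryEntryCreated` sample. The `now` argument must be a UTC datetime, because the code formats it with a literal `Z`.

```python
import json
from datetime import datetime, timezone

# Required payload fields per event type, transcribed from the events table.
EVENT_FIELDS = {
    "MemoryEntryCreated": ("artifactId", "traceId", "agentId", "tags", "embeddingId"),
    "MemoryEntryUpdated": ("artifactId", "versionFrom", "versionTo", "changeSummary"),
    "MemoryTagged": ("artifactId", "tagsAdded", "sourceAgentId"),
    "MemoryEntryRejected": ("artifactId", "reason", "traceId", "validationResultId"),
}


def build_event(event_type, data, now=None):
    """Return a JSON event body; `now` must be a UTC datetime (hypothetical helper)."""
    fields = EVENT_FIELDS[event_type]  # KeyError for unknown event types
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"{event_type} is missing {missing}")
    payload = {"eventType": event_type}
    payload.update({name: data[name] for name in fields})
    stamp = now if now is not None else datetime.now(timezone.utc)
    payload["timestamp"] = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps(payload)
```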

📘 Sample: `MemoryEntryCreated` Event

{
  "eventType": "MemoryEntryCreated",
  "artifactId": "doc-event-driven-mindset",
  "traceId": "proj-872-v2",
  "agentId": "EnterpriseArchitectAgent",
  "embeddingId": "vec-cdb83ae1",
  "tags": ["architecture", "events", "messaging", "ddd"],
  "timestamp": "2025-05-15T18:37:00Z"
}

📊 Metrics Collected

Metric	Description
`memoryEntriesIngested`	Total artifacts processed and stored
`embeddingAverageSize`	Vector dimensionality (e.g., 1536 for OpenAI `text-embedding-ada-002`)
`validationPassRate`	Ratio of successfully validated artifacts
`retrySuccessRate`	How often retry flow succeeded
`tagDensityScore`	Average number of meaningful tags per artifact
`traceCoverageRatio`	% of project traces with linked memory
`artifactTypeDistribution`	Breakdown of ingested artifacts by type
`duplicateSuppressionRate`	% of entries skipped due to deduplication
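Several of these metrics are simple ratios over per-artifact ingestion records, as the sketch below shows. The record shape is an assumption for illustration: a `status` of `stored`, `rejected`, or `duplicate`, plus optional `retried`/`retryPassed` flags. Rates are returned as fractions between 0 and 1; multiply by 100 to get the percentages the table describes. `compute_metrics` is a hypothetical name.

```python
from collections import Counter


def compute_metrics(records):
    """Aggregate ingestion records into dashboard metrics (hypothetical record shape)."""
    total = len(records)
    # Duplicates are suppressed before validation, so they are excluded from the pass rate.
    validated = [r for r in records if r["status"] != "duplicate"]
    stored = [r for r in validated if r["status"] == "stored"]
    retried = [r for r in records if r.get("retried")]
    retry_passed = sum(1 for r in retried if r.get("retryPassed"))
    return {
        "memoryEntriesIngested": len(stored),
        "validationPassRate": len(stored) / len(validated) if validated else 0.0,
        "retrySuccessRate": retry_passed / len(retried) if retried else 0.0,
        "tagDensityScore": sum(len(r["tags"]) for r in stored) / len(stored) if stored else 0.0,
        "duplicateSuppressionRate": (total - len(validated)) / total if total else 0.0,
        "artifactTypeDistribution": dict(Counter(r["artifactType"] for r in records)),
    }
```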

🖥️ Studio Dashboard Hooks

Dashboard Tile	Data Source
🧠 Knowledge Coverage by Module	Aggregated `studio.knowledge.status.json`
🔁 Memory Update Activity	Count of `MemoryEntryUpdated` events per sprint
🧩 Tag Heatmap	Visual tag cloud built from most common tags by domain
🔍 Search Quality Preview	Top search results from recent queries with relevancy metrics
🧾 Validation Error Panel	Outputs from `memory-validation-report.yaml` with fixes suggested

📂 Log Files

File	Description
`memory-ingestion-log.jsonl`	Line-by-line log of each ingestion step: parse, tag, embed, validate
`memory-metrics.json`	Exported counters, histograms, validation stats
`memory-validation-report.yaml`	Full list of failed validations with context
`studio.knowledge.status.json`	Summary of coverage, edition impact, agent participation

🧩 OpenTelemetry Instrumentation

Span Name	Description
`MemoryAgent.IngestArtifact`	Main ingestion span (traced by traceId + artifactId)
`MemoryAgent.EmbedArtifactSkill`	Embedding vector creation sub-span
`MemoryAgent.ValidateArtifactSkill`	Validation span (logs error if fails)
`MemoryAgent.EmitEvent`	Event publication latency and confirmation

📦 Integration Targets

Consumer	Usage
Orchestrator	Confirms memory entry emission before continuing agent cascade
Studio	Displays coverage, validation errors, memory lineage maps
HumanOps Agent	Reads logs for escalated artifacts in debug-handoff workflows
CI/CD Pipelines	Optional: warn if memory delta is unexpectedly low (possible regression)

✅ Summary

The Knowledge Management Agent:

Emits rich observability signals (events, metrics, logs, OpenTelemetry spans)
Powers dashboards, pipelines, and audits through trace-linked semantic metadata
Supports live feedback, QA memory monitoring, and Studio visualization
Enables end-to-end trust in memory-based generation, validation, and planning

This ensures memory is not just accurate — it's transparent, explainable, and measurable.

🧑‍💻 Human Intervention Hooks

This section outlines how human operators — such as architects, quality leads, or HumanOps agents — can interact with or override the behavior of the Knowledge Management Agent when automatic ingestion fails, classification is ambiguous, or manual tagging and curation is desired.

🎯 When Human Intervention Is Needed

Scenario	Trigger
❌ Artifact fails validation after max retries	Listed in `memory-validation-report.yaml`
❓ Classification ambiguity	`ClassifyArtifactSkill` returns low confidence or null type
🧩 Domain or tags are misapplied	Semantic mismatch detected by consumer agent or reviewer
⛔ Overwritten or conflicting versions	`artifactId` appears in conflicting modules or editions
🔄 Re-ingestion produces duplicate embeddings with inconsistent metadata	Requires merge decision
🔍 Developer or architect manually submits undocumented artifact	Needs human classification and tagging

🛠️ HumanOps-Driven Inputs

Input	Description
`manual-review-needed.md`	Markdown-based summary of memory items flagged for manual triage
`studio.knowledge.annotations.json`	Allows architects to inject tags, fix domain mappings, reclassify
`artifact-manual-ingestion.yaml`	Curated knowledge units uploaded manually with full metadata
`knowledge-conflict-resolution.yaml`	Resolved overrides for edition/multi-agent artifacts
`trace-enrichment.json`	Humans add traceId/agentId to “orphaned” artifacts post-facto

📘 Example: `manual-review-needed.md`

## 🧠 Manual Review – Memory Ingestion Issues

1. **Artifact:** test-scenario-retry-appointment
   - **Issue:** Unclassified test type; conflicting domain context
   - **Suggested Fix:** Add domain: `Appointments::DomainLayer`; Type: `test-case`
   - **Path:** /tests/scenarios/booking-retry.feature
   - **traceId:** (missing)

2. **Artifact:** doc-legacy-workflow.md
   - **Issue:** No traceId or agentId; manually uploaded
   - **Action:** Tag as `PlatformHistory::Documentation`

🖥️ Studio Hooks

Feature	Description
🟡 “Needs Review” tag on tile	Appears on memory unit without clear classification
🔍 Inline Tag Editor	Allows adding/removing tags in Studio UI
🧭 Domain Reclassifier	Dropdown to select correct bounded context
✅ “Mark as Reviewed” button	Emits a `MemoryEntryValidatedByHuman` event
🧾 Annotation Panel	View and add `studio.knowledge.annotations.json` entries directly

🔁 Feedback Flow

flowchart TD
    REJECTED[❌ Artifact Fails Validation]
    REJECTED --> MANUAL[📋 Added to Review Queue]
    MANUAL --> HUMAN[🧑‍💻 HumanOps Annotates]
    HUMAN --> ANNOTATIONS[📥 Updates Annotations File]
    ANNOTATIONS --> REINGEST[🔁 Agent Re-ingests with Human Hints]
    REINGEST --> MemoryEntryCorrected


🧠 HumanOps Actions Supported

Action	Result
Add `traceId`, `agentId`, `editionId`	Enables retry and linkage
Reclassify artifact type	Updates `artifactType` and re-indexes
Adjust domain context	Moves entry to proper bounded context
Inject manual tags	Overwrites or appends to auto-generated tags
Submit fix for validation error	Clears from validation report and proceeds to memory entry creation
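When HumanOps submits corrections, the agent has to merge them into the existing entry. The minimal sketch below assumes an annotation dict whose optional `tagMode` chooses between appending tags (the default) and overwriting them. `apply_annotation` and the `requiresReindex` flag are hypothetical. The flag signals that the entry should be re-indexed after reclassification or re-tagging.

```python
def apply_annotation(entry, annotation):
    """Merge a HumanOps annotation into a memory entry, returning a new dict (hypothetical)."""
    updated = dict(entry)  # shallow copy; the original entry is left untouched
    # Identity and classification fields are overwritten when the annotation supplies them.
    for name in ("traceId", "agentId", "editionId", "artifactType", "domainContext"):
        if annotation.get(name):
            updated[name] = annotation[name]
    new_tags = [tag.lower() for tag in annotation.get("tags", [])]
    if annotation.get("tagMode") == "overwrite":
        updated["tags"] = new_tags
    elif new_tags:
        # Append while de-duplicating and preserving order.
        updated["tags"] = list(dict.fromkeys(list(entry.get("tags", [])) + new_tags))
    # Reclassification or re-tagging means the entry must be re-indexed.
    updated["requiresReindex"] = any(
        updated.get(name) != entry.get(name)
        for name in ("artifactType", "domainContext", "tags")
    )
    return updated
```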

📎 Outputs from Human Edits

Output	Effect
`studio.knowledge.annotations.json`	Source of manual tags and corrections
`memory-entry.json`	Updated with merged metadata from annotations
`MemoryEntryCorrected`	Event emitted upon successful re-ingestion after human input
`knowledge-conflict-resolution.yaml`	Used in multi-edition or agent artifact re-alignment

✅ Summary

The Knowledge Management Agent:

🧑‍💻 Supports structured human input when automatic ingestion fails
🧾 Provides tooling for architects and HumanOps to correct memory metadata
🧩 Allows manual tagging, classification, and domain realignment
📤 Resumes ingestion after intervention, preserving traceability

This creates a human-AI collaboration loop that ensures even edge-case or legacy artifacts are captured in the ConnectSoft knowledge graph — with auditability and context preserved.

🧾 Traceability & Governance

This section defines how the Knowledge Management Agent ensures full traceability, accountability, and governance for every memory action — from ingestion to update — aligning with ConnectSoft’s principles of observability-first, auditability, and multi-tenant safety.

🔐 Traceability Requirements for Every Memory Entry

Each `memory-entry.json` must include:

Field	Requirement
`artifactId`	✅ Unique identifier for the artifact
`traceId`	✅ Factory-wide execution trace linking to source run
`agentId`	✅ Which agent created or submitted the artifact
`editionId`	✅ Which tenant/edition the knowledge applies to
`moduleId`	✅ Which microservice/module this memory belongs to
`version`	✅ Build or semantic version of the artifact
`embeddingId`	✅ ID linking to the vector representation
`ingestedAt`	✅ UTC timestamp of ingestion or update
`artifactType`, `domainContext`, `tags`	✅ Required classification metadata

📘 Sample: Full Traceable Entry

{
  "artifactId": "template-notification-service-v5_3_0",
  "traceId": "proj-888-v4",
  "agentId": "MicroserviceGeneratorAgent",
  "moduleId": "NotificationService",
  "domainContext": "Messaging::ApplicationLayer",
  "artifactType": "template",
  "editionId": "vetclinic-premium",
  "embeddingId": "vec-789abc45",
  "version": "v5.3.0",
  "tags": ["notifications", "service", "template"],
  "ingestedAt": "2025-05-15T18:52:00Z"
}

🗂️ Governance Rules Enforced

Policy	Enforcement
❗ No orphaned memory	Reject entries without `traceId`, `agentId`, or `moduleId`
🔐 Edition-aware indexing	Memory is stored in edition-specific collections or partitions
🧾 Signed memory updates	Every update includes previous version ID and diff summary
🧑‍⚖️ Immutable history	Once stored, a memory version cannot be deleted — only superseded
📊 Audit trails available	All ingestion events are timestamped and logged with diff metadata
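The immutable-history rule implies an append-only store, in which updates supersede earlier versions instead of replacing them. The in-memory sketch below records the previous version ID and a diff summary on each write and refuses deletes. `VersionedMemoryStore` is hypothetical, and cryptographic signing of updates is not shown.

```python
import copy


class VersionedMemoryStore:
    """Hypothetical append-only store: versions are superseded, never deleted."""

    def __init__(self):
        self._history = {}  # artifactId -> list of version records, oldest first

    def put(self, entry, diff_summary=""):
        versions = self._history.setdefault(entry["artifactId"], [])
        record = copy.deepcopy(entry)
        # Each write links to the version it supersedes and carries a diff summary.
        record["previousVersion"] = versions[-1]["version"] if versions else None
        record["diffSummary"] = diff_summary
        versions.append(record)
        return "MemoryEntryCreated" if record["previousVersion"] is None else "MemoryEntryUpdated"

    def current(self, artifact_id):
        return copy.deepcopy(self._history[artifact_id][-1])

    def history(self, artifact_id):
        return copy.deepcopy(self._history.get(artifact_id, []))

    def delete(self, artifact_id):
        raise PermissionError("memory history is immutable; store a superseding version instead")
```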

🧩 Multi-Tenant and Edition Governance

Strategy	Description
`editionId` namespacing	Stored in blob keys, search filters, and index documents
RBAC + scoped queries	Consumers may only retrieve memory for allowed editions
Isolated update workflows	Edition-specific annotations and overrides do not affect others
Memory overlays	Same artifact across editions stored as separate entries with linkage metadata (`memory-overlay-map.yaml`)

🔄 Update and Diff Tracking

Scenario	Governance Behavior
`artifactId` exists, version differs	`MemoryEntryUpdated` emitted, prior entry archived
Re-tagging occurs	Manual or auto-tagging triggers signed `MemoryTagged` event
Version rollback requested	Studio or Orchestrator may flag entry for rollback display (not deletion)

🖥️ Audit & Review Access

Tool	Capability
`memory-ingestion-log.jsonl`	Step-by-step audit of ingestion, tagging, embedding, event emission
`memory-validation-report.yaml`	Captures any rejected entries and why
`studio.knowledge.status.json`	Shows coverage by trace, agent, module, edition
`artifact-diff-tracker.yaml`	Optional: shows structural delta between versions (for visual review)

🔁 Governance Event Timeline

timeline
    title Governance Event Timeline (UTC)
    2025-05-15 18h52 : MemoryEntryCreated (ingestion)
    2025-05-16 08h31 : MemoryEntryUpdated (update)
    2025-05-16 09h00 : MemoryTagged (tag added)
    2025-05-16 09h01 : Studio view refreshed


✅ Summary

The Knowledge Management Agent:

✅ Ensures every memory unit is trace-linked, version-controlled, and edition-aware
🔐 Protects knowledge integrity through immutable versioning and audit logging
📤 Enables Studio, Orchestrator, and downstream agents to trust every retrieved artifact
🧠 Supports cross-edition overlays and governed multi-agent updates
🧾 Provides a verifiable memory trail across factory runs, sprints, and modules

This makes ConnectSoft’s knowledge layer accountable, transparent, and production-grade.

🖼️ Overview Diagram: Memory Flow

This section presents a high-level diagram showing the Knowledge Management Agent’s position in the semantic memory ecosystem, tracing how artifacts move from agent outputs into validated, traceable, and reusable long-term memory — and how other agents consume this knowledge for autonomous generation, validation, and reasoning.

📊 Memory Flow Diagram

flowchart TD
    subgraph Agent Producers
        A1[🧱 Architecture Agent]
        A2[💻 Developer Agent]
        A3[📄 Documentation Agent]
        A4[🧪 QA/Test Agent]
        A5[🧠 Generator Agent]
    end

    subgraph Knowledge Management Agent
        K1[📥 Artifact Ingestion]
        K2[🏷️ Tag + Classify]
        K3[🧠 Embed Vector]
        K4[✅ Validate]
        K5[💾 Store Entry + Metadata]
        K6[📡 Emit Events + Logs]
    end

    subgraph Long-Term Memory
        M1[📂 memory-entry.json]
        M2[📎 embedding-vector.json]
        M3[📜 trace-link-map.json]
        M4[📊 studio.knowledge.status.json]
    end

    subgraph Consumers
        C1[🧭 Orchestrator]
        C2[🧠 Generator Agents]
        C3[📊 Studio Dashboard]
        C4[🔍 Reviewer Agent]
        C5[🧑‍💻 HumanOps Agent]
    end

    A1 --> K1
    A2 --> K1
    A3 --> K1
    A4 --> K1
    A5 --> K1

    K1 --> K2 --> K3 --> K4 --> K5 --> K6

    K5 --> M1
    K5 --> M2
    K5 --> M3
    K6 --> M4

    M1 --> C2
    M2 --> C2
    M3 --> C4
    M4 --> C3
    M1 --> C1
    M1 --> C5


🧠 Flow Summary

Artifact producers generate code templates, test plans, architecture specs, documentation, and prompts.
The KM Agent performs tagging, classification, vectorization, and validation.
Memory is stored as embeddings, metadata, and trace-linked records.
Consumers retrieve memory to generate new features, validate coverage, populate dashboards, and close the trace loop.

🧩 Role in Factory Flow

Phase	KM Agent Role
🧭 Vision & Planning	Supplies prior goals, features, architecture
🧱 Architecture Design	Retains reusable patterns and specs
🛠️ Generation	Enables prompt/context enrichment for test/code
🧪 QA/Validation	Tracks regressions, test memory, edition coverage
📜 Documentation	Links all outputs into retrievable explainers
📊 Observability	Feeds Studio knowledge graphs and dashboards

🎯 Benefits of the Memory Flow

📚 Reusable intelligence across 3000+ services and editions
🔗 Traceable lineage of all agent outputs
🧠 Contextual prompt grounding for generation agents
🔍 Cross-agent understanding of architecture, test, and plan decisions
✅ Auditable memory trail for production-grade SaaS automation

✅ Summary

This diagram illustrates the Knowledge Management Agent’s role as:

🔄 The hub of semantic ingestion
💾 The gatekeeper of reusable memory
📡 The emitter of traceable knowledge events
🔍 The foundation for AI-driven decision reuse, validation, and planning

It visually maps the heart of ConnectSoft’s Memory-First Software Factory.

📘 Summary & Final Blueprint

This final section consolidates the Knowledge Management Agent’s design, capabilities, trace integration, and strategic role across the ConnectSoft AI Software Factory — and outlines future extensions to evolve it as an autonomous knowledge steward.

🧠 Final Blueprint Summary

🔍 Core Mission

“Turn all agent output into structured, semantic, traceable, and reusable knowledge.”

The Knowledge Management Agent is not just a logger. It’s a semantic infrastructure that ensures:

No knowledge is lost
Every artifact is context-aware
Memory becomes the foundation for reasoning and reuse

🧱 Agent Lifecycle (Summary)

flowchart TD
    Ingest[📥 Artifact Ingested] --> Tag[🏷️ Classify + Tag]
    Tag --> Embed[🧠 Vector Embedding]
    Embed --> Validate[✅ Validate & Deduplicate]
    Validate --> Store[💾 Store & Index]
    Store --> Emit[📡 Emit Events + Update Studio]
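The lifecycle above can be traced step by step in a small in-memory sketch. It is illustrative only and rests on several assumptions: `classify` is a hypothetical rule-based tagger, `toy_embed` is a hash-based stand-in for a real embedding model, deduplication uses an exact content hash (near-duplicate detection by vector similarity is not shown), and `MemoryEntryRejected` / `MemoryEntryDuplicate` are hypothetical event names alongside the `MemoryEntryCreated` event named elsewhere in this spec. Studio updates are out of scope.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Optional


def classify(path: str) -> list:
    """Hypothetical rule-based tagger keyed on file-naming conventions."""
    if path.endswith(".template.cs"):
        return ["code-template"]
    if path.endswith(".md"):
        return ["documentation"]
    return ["unclassified"]


def toy_embed(text: str, dims: int = 8) -> list:
    """Stand-in for a real embedding model: hashes tokens into an L2-normalized vector."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[hashlib.sha256(token.encode("utf-8")).digest()[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class KnowledgeStore:
    """Hypothetical in-memory store; a real agent would use a vector database."""

    entries: dict = field(default_factory=dict)  # content hash -> stored entry
    events: list = field(default_factory=list)   # (event name, path) in emit order

    def ingest(self, path: str, content: str, trace_id: str) -> Optional[str]:
        # 1. Artifact ingested; 2. classify + tag.
        tags = classify(path)
        # 3. Vector embedding.
        vector = toy_embed(content)
        # 4. Validate & deduplicate: reject empty content, skip exact duplicates.
        if not any(vector):
            self.events.append(("MemoryEntryRejected", path))
            return None
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key in self.entries:
            self.events.append(("MemoryEntryDuplicate", path))
            return key
        # 5. Store & index.
        self.entries[key] = {"path": path, "traceId": trace_id, "tags": tags, "vector": vector}
        # 6. Emit event.
        self.events.append(("MemoryEntryCreated", path))
        return key


store = KnowledgeStore()
first = store.ingest("Order.template.cs", "public class Order {}", "trace-1")
duplicate = store.ingest("legacy/Order.template.cs", "public class Order {}", "trace-2")
empty = store.ingest("notes.md", "   ", "trace-3")
print(store.events)
```

Placing validation after embedding, as the diagram does, means a degenerate vector (here, an all-zero one from empty content) can itself be a rejection signal.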


📘 Core Capabilities Recap

| Area | Description |
|---|---|
| 📥 Ingestion | Accepts artifacts from any agent: code, test, plan, doc, prompt |
| 🧠 Semantic Enrichment | Tags, classifies, embeds, chunks, and versions each artifact |
| 🔗 Traceability | Links memory to `traceId`, `agentId`, `moduleId`, `editionId` |
| 💾 Long-Term Storage | Vector store + structured metadata + trace-link graphs |
| 📤 Event Emission | Emits creation, update, tagging, and rejection events |
| 🔍 Retrieval | Enables semantic search, edition-aware filtering, and prompt grounding |
| 📊 Observability | Powers Studio dashboards and CI/CD validation metrics |
| 🧑‍💻 Human Collaboration | Supports annotations, overrides, and manual ingestion paths |
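Edition-aware retrieval combines two of the capabilities above: a metadata filter on `editionId` and semantic ranking. The sketch below is a minimal brute-force version using cosine similarity; `Entry` and `search` are hypothetical names, and a production vector store would use an approximate nearest-neighbor index rather than a linear scan.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical indexed memory entry: ID, edition, and embedding vector."""

    entry_id: str
    edition_id: str
    vector: tuple


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(entries, query_vector, edition_id: str, top_k: int = 3) -> list:
    """Edition-aware semantic search: filter by edition first, then rank by similarity."""
    candidates = [e for e in entries if e.edition_id == edition_id]
    candidates.sort(key=lambda e: cosine(e.vector, query_vector), reverse=True)
    return [e.entry_id for e in candidates[:top_k]]


index = [
    Entry("tpl-order", "enterprise", (1.0, 0.0, 0.0)),
    Entry("tpl-invoice", "enterprise", (0.7, 0.7, 0.0)),
    Entry("tpl-order-lite", "lite", (1.0, 0.0, 0.0)),
]
print(search(index, (1.0, 0.1, 0.0), edition_id="enterprise"))  # ['tpl-order', 'tpl-invoice']
```

Filtering before ranking guarantees that another edition's entries never appear in the results, even when they are the closest vectors (as `tpl-order-lite` is here).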

📂 Memory Artifact System

| Artifact | Purpose |
|---|---|
| `memory-entry.json` | Canonical metadata and trace for each artifact |
| `embedding-vector.json` | Semantic vector for retrieval and reasoning |
| `trace-link-map.json` | Lineage graph between agent, trace, and output |
| `memory-validation-report.yaml` | Tracks validation issues and correction outcomes |
| `studio.knowledge.status.json` | Displays coverage, gaps, and edition insights |
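Because `trace-link-map.json` is described as a lineage graph, answering "what was produced from this feature?" or "where did this test come from?" reduces to a graph walk. The sketch below assumes a simple edge list; the `from` / `to` / `relation` field names and the node IDs are hypothetical, not the file's documented schema.

```python
from collections import defaultdict, deque

# Hypothetical edges in the spirit of trace-link-map.json (field names are assumptions).
TRACE_LINKS = [
    {"from": "feature:checkout", "to": "code:CheckoutService.cs", "relation": "implementedBy"},
    {"from": "code:CheckoutService.cs", "to": "test:CheckoutServiceTests.cs", "relation": "testedBy"},
    {"from": "feature:checkout", "to": "doc:checkout-guide.md", "relation": "documentedBy"},
    {"from": "feature:refunds", "to": "code:RefundService.cs", "relation": "implementedBy"},
]


def walk(links, start: str, upstream: bool = False) -> list:
    """Breadth-first lineage walk; upstream=True follows edges in reverse."""
    graph = defaultdict(list)
    for link in links:
        src, dst = (link["to"], link["from"]) if upstream else (link["from"], link["to"])
        graph[src].append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:  # guards against cycles in the link map
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


print(walk(TRACE_LINKS, "feature:checkout"))
print(walk(TRACE_LINKS, "test:CheckoutServiceTests.cs", upstream=True))
```

Walking downstream from a feature yields its code, docs, and tests; walking upstream from a test recovers the feature it validates.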

📊 Factory-Wide Impact

| Factory Stage | KM Agent Role |
|---|---|
| 🧭 Planning | Retrieves strategic memory for alignment |
| 🏗️ Architecture | Reuses existing blueprints and domain layers |
| 🛠️ Generation | Provides prompt grounding and reusable patterns |
| 🧪 QA & Testing | Links regressions, test coverage, and flakiness memory |
| 📜 Documentation | Stores reusable explainers, release notes, and test guides |
| 📈 Observability | Tracks knowledge coverage, growth, and resolution trends |
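The spec says `studio.knowledge.status.json` displays coverage and gaps per edition, but it does not define "coverage". The sketch below assumes one plausible definition: the share of an edition's modules that have at least one memory entry. `knowledge_status` and its output shape are hypothetical.

```python
def knowledge_status(modules_by_edition: dict, entries: list) -> dict:
    """Hypothetical coverage metric: share of each edition's modules with >= 1 memory entry."""
    covered = {(e["editionId"], e["moduleId"]) for e in entries}
    status = {}
    for edition, modules in modules_by_edition.items():
        unique_modules = set(modules)
        gaps = sorted(m for m in unique_modules if (edition, m) not in covered)
        total = len(unique_modules)
        status[edition] = {
            # An edition with no modules is treated as fully covered.
            "coverage": round((total - len(gaps)) / total, 2) if total else 1.0,
            "gaps": gaps,
        }
    return status


modules = {"enterprise": ["billing", "orders", "refunds"], "lite": ["orders"]}
entries = [
    {"editionId": "enterprise", "moduleId": "billing"},
    {"editionId": "enterprise", "moduleId": "orders"},
    {"editionId": "lite", "moduleId": "orders"},
]
print(knowledge_status(modules, entries))
```

Reporting the gap list alongside the ratio gives dashboards something actionable: which modules still have no memory at all.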

🔮 Future Expansion

| Feature | Description |
|---|---|
| 🧠 Knowledge Graph API | Structured querying of memory as an interconnected domain graph |
| 🧬 Memory Diff Engine | Git-like diff view of knowledge changes across sprints |
| 🧾 Prompt Patch Log | Detects when prompt completions or decisions evolve over time |
| 📚 Memory Explorer UI | Human-facing browser to navigate memory entries by edition, module, or tag |
| 🤖 Autonomous Knowledge Curator | AI agent that proactively audits, prunes, and optimizes the knowledge graph |
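The proposed Memory Diff Engine is a future feature, so nothing here reflects a real design. As a hedged sketch of the core idea, two memory snapshots keyed by entry ID can be compared much like a Git tree diff: entries added, removed, and changed, plus a field-level view of each change. `diff_snapshots`, `field_changes`, and the snapshot shape are all hypothetical.

```python
# Hypothetical helpers illustrating the proposed Memory Diff Engine.
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two memory snapshots keyed by entry ID (e.g. end of sprint N vs N+1)."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


def field_changes(old: dict, new: dict) -> dict:
    """Field-level view of one changed entry: field -> (old value, new value)."""
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(old.keys() | new.keys())
        if old.get(k) != new.get(k)
    }


sprint_1 = {"mem-1": {"tags": ["template"], "version": 1}, "mem-2": {"tags": ["doc"], "version": 1}}
sprint_2 = {"mem-1": {"tags": ["template", "billing"], "version": 2}, "mem-3": {"tags": ["test"], "version": 1}}
print(diff_snapshots(sprint_1, sprint_2))
print(field_changes(sprint_1["mem-1"], sprint_2["mem-1"]))
```

Diffing on stable entry IDs (rather than content) is what allows a changed entry to be reported as "changed" instead of as one removal plus one addition.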

✅ Final Statement

The Knowledge Management Agent transforms ConnectSoft’s factory from a code generator into a self-aware, memory-driven software intelligence system.

It is the backbone of continuity, the reasoner over traces, and the semantic source of truth across all modular, agentic automation flows.

Without it, agents forget. With it, they evolve.
