Skip to content

🧠 Data Architect Agent Specification

🎯 Purpose

The Data Architect Agent is responsible for designing, evolving, and governing the data storage layer of every service in the ConnectSoft AI Software Factory β€” while ensuring:

  • Scalability (partitioned & cloud-native storage)
  • Integrity (strong constraints, referential safety)
  • Compliance (PII classification, retention rules, auditability)
  • Extensibility (schema versioning, event rehydration, projections)
  • Zero-downtime evolution (migration automation, backward compatibility)

It produces platform-ready, validated, and observable database blueprints and migration artifacts, aligned with Clean Architecture, DDD, and ConnectSoft's security-first and observability-first principles.


πŸ’‘ Why This Agent Exists

Without a dedicated Data Architect Agent:

  • Tables and schemas drift from the domain model
  • Indexes, constraints, and nullability rules are inconsistently applied
  • Data becomes fragmented across incompatible formats
  • Migrations are unsafe, untraceable, and break CI/CD flows
  • No unified view of multitenancy, soft-deletion, or retention timelines
  • Event-sourced services lack structured projections for querying

With this agent:

βœ… Schemas are derived directly from the domain model
βœ… Migrations are tracked and environment-specific
βœ… Data is partitioned and scoped per tenant
βœ… PII fields and retention rules are codified
βœ… All artifacts are versioned, diffable, and CI/CD-integrated


🧱 Design Scope

  • βœ… DDL generation (SQL, YAML, FluentMigrator)
  • βœ… Migrations (create, alter, drop, index, rename, seed)
  • βœ… NoSQL collections & document design (MongoDB, Cosmos)
  • βœ… Projection modeling for event-driven views
  • βœ… Multitenant partitioning rules
  • βœ… TTL-based storage (for logs, sessions, events)
  • βœ… Data retention & redaction policy tagging
  • βœ… Observability: schema span emissions, table access tracing
  • βœ… Migration versioning via FluentMigrator or SQL tags

πŸ” Multi-Migration Strategy Support

The agent supports three primary types of migrations per service:

Migration Type Format Purpose
Declarative schema.sql, schema.yaml For schema-first services; output = full source of truth
Imperative *.cs using FluentMigrator For .NET-based microservices using versioned migrations
Composable migration-*.sql scripts For raw SQL runners, cross-dialect compatibility, rollback-friendly

βœ… The agent can emit all three types depending on settings.migration_strategy in the input prompt.


πŸ—‚οΈ Example Output Bundle

/schema
  schema.yaml
  schema.sql
  migration-20250422-initialize-db.sql
  migration-20250423-add-index-on-email.sql
  fluent/20250423_AddAuditColumns.cs
/metadata
  field-retention-map.yaml
  er-diagram.mmd
  schema-observability.yaml
DatabaseSchemaPublished.json

🧠 Role in the Agentic Lifecycle

flowchart TD
    DomainModeler --> DataArchitect
    ApplicationArchitect --> DataArchitect
    DataArchitect --> MigrationEngine
    DataArchitect --> TestGenerator
    DataArchitect --> ObservabilityAgent
    DataArchitect --> DevOpsAgent
Hold "Alt" / "Option" to enable pan & zoom

🧭 Summary

Feature Value
Domain-aligned schemas βœ… DDD mapping to storage
Migration strategy βœ… FluentMigrator, SQL, or YAML
Audit & retention awareness βœ… PII tagging, TTL injection
Multitenant design βœ… Partition, filter, and access rules per tenant
CI/CD readiness βœ… Outputs versioned, published, and validated
Observability βœ… Tables and fields mapped to telemetry templates

πŸ›οΈ Scope of Influence

The Data Architect Agent governs all aspects of a service’s persistent storage architecture, ensuring every decision is:

  • Domain-aligned
  • Cloud-native
  • Multi-tenant ready
  • Compliant
  • Migration-safe

It acts as the authoritative designer of relational, NoSQL, event-sourced, and cache-backed storage layers for microservices, modules, and shared infrastructure components.


πŸ“˜ Logical Scope

Area Included
βœ… Table/collection design CREATE TABLE / collection.definition
βœ… Field types, nullability, defaults e.g., amount NUMERIC(10,2) NOT NULL DEFAULT 0.0
βœ… Indexes, keys, constraints Primary, unique, foreign keys, composite indexes
βœ… Value object flattening Embed VOs inside parent entities (e.g., Address, Money)
βœ… Enum mapping Stored as integer or string (configurable per enum)
βœ… Tenant scoping Partition key, row filters, access policies
βœ… History/versioning fields CreatedAt, UpdatedAt, DeletedAt, Version, etc.
βœ… Observability tagging Span templates for table operations, audit log hooks
βœ… TTL & retention Policies for deletion, redaction, expiration

🧱 Physical Scope

Storage Type Responsibilities
Relational DB (Postgres, SQL Server) Normalized schema, constraints, indexes, foreign keys
NoSQL DB (MongoDB, Cosmos DB) Document design, nested field constraints, TTL indexing
Event Store / Projection DB Denormalized tables/views from event replay
Blob Storage Data classification (e.g., x-sensitive, file naming, metadata tagging)
Cache/Read Replicas Declare what fields can be projected or cached
Embedded DB (LiteDB, SQLite for mobile/edge) Strip down schema for constrained environments

πŸ”„ Temporal Scope

Aspect Controlled
Schema versioning schema.yaml.version, migrationId, fluentMigrator version
Historical data rules Audit logs, soft-deletion, row snapshots
Retention policies Time-based (e.g., 30d logs), legal holds
TTL logic Auto-purge of sessions, messages, etc.

πŸ” Compliance and Security Scope

Feature Role
PII classification Tags fields in field-retention-map.yaml
Redactable fields Emits policy flags (x-redactable: true)
Encrypted at rest Identifies fields requiring AES/GCM or external KMS
Row-level security (Postgres) Maps tenant access conditions to RLS rules
Data masking Emits masking metadata for tools like SQL Server Dynamic Data Masking or Cosmos DB Confidential LSM

🚫 Out of Scope

Not Owned Delegated To
Migration execution (runtime) βœ… DevOps Architect Agent
ORM entity generation (e.g., NHibernate) βœ… Adapter Generator Agent
Data analytics or BI schema ❌ (handled by future Analytics Architect Agent)
Application-layer queries (LINQ, etc.) βœ… Developer responsibility
Full-text search design βœ… Search Architect Agent (future role)

πŸ“¦ Output Influence Summary

Artifact Owned
schema.sql βœ…
schema.yaml βœ…
migration-*.sql βœ…
migration/*.cs (FluentMigrator) βœ…
er-diagram.mmd βœ…
field-retention-map.yaml βœ…
DatabaseSchemaPublished.json βœ…
Runtime index management ❌ (influences DevOps but not runtime schema sync)

🧭 ConnectSoft Scope Matrix

Layer Data Architect Agent
Domain Reads (VOs, Entities, Enums)
Application Reads use case flows for read/write frequency
Infrastructure Owns physical schema + migrations
Observability Emits table-span templates
Security Flags PII, tenant, RLS, retention fields
Governance Outputs versioning + changelogs

πŸ“‹ Core Responsibilities

The Data Architect Agent owns the definition and evolution of persistent storage blueprints
for ConnectSoft microservices, ensuring they are:

  • Domain-driven
  • Observable
  • Secure
  • Versioned
  • Composable

It produces schemas, migrations, diagrams, and compliance maps ready for CI/CD, deployment, and governance pipelines.


πŸ“˜ 1. Schema Design Responsibilities

Task Output
Translate domain aggregates and value objects into tables/collections schema.yaml, schema.sql, or schema.mongo.json
Flatten embedded value objects intelligently E.g., Address β†’ address_line1, city, zip
Handle enums as int, string, or lookup table (based on config) enum-strategy.yaml
Define primary keys, foreign keys, unique indexes, nullable/required flags schema.sql, migration-*.sql
Generate observability fields created_at, updated_at, trace_id, version, tenant_id

πŸ” 2. Migration and Evolution Responsibilities

Task Output
Generate forward-only SQL migrations migration-<timestamp>-*.sql
Generate FluentMigrator C# classes Migrations/20250501_AddFieldToClient.cs
Track schema version and emit diffs schema.version, migration.log
Suggest rollback plan or up/down separation Optional down.sql or Revert... migration
Handle safe field changes: nullable β†’ non-null, type widening, enum expansion Auto-patch and document

πŸ§ͺ 3. Validation and Consistency Enforcement

Task Validated
Naming conventions snake_case for SQL, camelCase for NoSQL
Type consistency with domain model Match DTOs and VOs (or explain mismatch)
Field constraints From validation-map.yaml: minLength, max, enum, etc.
Nullability rules Optionality mapped to nullable column/type
Index redundancy Avoid overlapping composite/unique/index sets
RLS & tenant fields Ensure tenant_id included where required

πŸ“‘ 4. Retention, Compliance, and Security

Task Output
PII/retention annotation field-retention-map.yaml
Masking/redaction strategy x-redactable, x-sensitive tags
TTL declaration for short-lived entities ttl-policy.yaml or inline schema flags
Data encryption flagging Fields marked x-encrypted: true
Compliance scope linking Maps to GDPR/SOC2 classification tags

🧭 5. Observability and Metrics Integration

Task Output
Emit schema-span templates schema-observability.yaml
Mark fields as observable (e.g., invoice.total, client.status) Used by Observability Agent
Inject audit trail structure created_by, modified_by, trace_id, correlation_id
Emit DatabaseSchemaPublished event with trace_id, schema version, and changelog URL

πŸ“„ 6. Documentation and Diagrams

Task Output
Generate ER diagram from schema er-diagram.mmd
Describe relationships in Markdown data-relationships.md
Auto-link schema objects to glossary terms Enables rich developer portal linking

βœ… Summary of Responsibilities

Area Description
🧱 Physical schema design βœ… Tables, fields, keys, indexes
🧩 Domain-aligned translation βœ… Aggregates, VOs, enums β†’ storage
πŸ“¦ Migration authoring βœ… FluentMigrator, SQL, DDL diffs
πŸ” Retention and compliance tagging βœ… PII, TTL, masking, audit trail
πŸ“ˆ Observability tagging βœ… Span templates, access fields
πŸ“Š Documentation and traceability βœ… ERD, Markdown, lifecycle event

πŸ“₯ Core Inputs

The Data Architect Agent consumes structured, traceable, and semantically meaningful artifacts
from upstream agents to model and evolve the persistent storage layer.

These inputs define:

  • The entities, value objects, and enums to persist
  • The field-level constraints and validations
  • The retention, PII, and security policies
  • The observability and multitenancy boundaries
  • Any previously generated schema snapshots for diffing and migration generation

πŸ“˜ Required Input Artifacts

Artifact Source Agent Purpose
domain-model.yaml Domain Modeler Agent Defines aggregates, value objects, identifiers
validation-map.yaml Domain Modeler Agent Field-level constraints (e.g., not null, length, enum)
use-cases.yaml Application Architect Agent Determines access patterns (write-heavy, read projections)
field-retention-policy.yaml Compliance Architect Agent Flags fields as pii, ttl, audit_required, redactable
application-architecture.md Application Architect Agent Maps services to tenancy rules and cross-entity flows
previous-schema.yaml (optional) CI/CD or prior agent run Used for diffing and migration detection
trace_id, bounded_context, service_name Orchestrator Enables traceability and governance audit linkage

πŸ“„ Input Examples

πŸ”Ά domain-model.yaml

aggregates:
  - name: Client
    fields:
      - name: clientId
        type: Guid
        required: true
      - name: email
        type: Email
      - name: address
        type: Address
      - name: status
        type: ClientStatus
    value_objects:
      - Address
      - Email
    enums:
      - ClientStatus: [Active, Inactive, Suspended]

πŸ”Ά validation-map.yaml

Client:
  email:
    format: email
    maxLength: 255
  address.zip:
    minLength: 5
    maxLength: 10
  status:
    enum: [Active, Inactive, Suspended]

πŸ”Ά field-retention-policy.yaml

fields:
  Client.email:
    pii: true
    redactable: true
    encrypted: true
  Client.status:
    retention: 5y
  Client.createdAt:
    audit_required: true

πŸ”Ά application-architecture.md (Excerpt)

  • This service is multi-tenant. All persistent objects must include a tenant_id column.
  • The Client aggregate supports soft deletion (i.e., deleted_at should be modeled).
  • The Address VO is reused across four services β€” prefer embedded rather than normalized form.

⚠️ Input Validation Rules

Rule Enforced
domain-model.yaml and validation-map.yaml must be present βœ…
Each field must resolve to a scalar or embeddable VO βœ…
Enum values must be matched to definition (for type safety) βœ…
If multi-tenancy is enabled, tenant_id must be generated and included βœ…
Previous schema required if migration_strategy = diff βœ…
Traceability fields must be resolved βœ… (trace_id, service_name, bounded_context)

🧠 Semantic Memory Integration (Optional)

Query Result
"Invoice.CreatedAt" Suggest timestamp with time zone, index on created_at, add to audit trail
"Status" enum reuse Suggest shared enum or foreign key to status_types table
"Address" VO reuse Suggest embedded pattern with naming template address_city, address_zip, etc.

πŸ“¦ Input Prompt Template (YAML)

assignment: "generate-schema"
project:
  service_name: ClientService
  trace_id: trace-data-112938
  bounded_context: CRM

inputs:
  domain_model_url: https://.../domain-model.yaml
  validation_rules_url: https://.../validation-map.yaml
  retention_policy_url: https://.../field-retention-policy.yaml
  app_architecture_url: https://.../application-architecture.md
  previous_schema_url: https://.../schema-v1.yaml

settings:
  migration_strategy: fluentmigrator
  output_format: [sql, yaml, erd]

πŸ“€ Core Outputs

The Data Architect Agent produces a complete, versioned, and traceable data schema bundle,
ready for:

  • CI/CD deployment
  • FluentMigrator-based evolution
  • Event-sourced projection building
  • Observability, compliance, and retention auditing
  • Developer portal documentation

πŸ“¦ Primary Output Artifacts

Artifact Format Purpose
schema.sql SQL DDL Declarative schema for relational DBs
schema.yaml YAML DSL Platform-native representation of entity/table structure
migration-*.sql SQL Forward-only or reversible migrations
Migrations/*.cs C# FluentMigrator classes Versioned imperative migrations
field-retention-map.yaml YAML Field-level metadata: PII, TTL, encryption
er-diagram.mmd Mermaid Entity-relationship diagram
data-schema-observability.yaml YAML Span templates for DB-level tracing
DatabaseSchemaPublished JSON Event that notifies schema is ready for use

πŸ“˜ 1. schema.sql

| Format | PostgreSQL or SQL Server | | Example |

CREATE TABLE client (
  client_id UUID PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  status VARCHAR(20) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now(),
  tenant_id UUID NOT NULL,
  trace_id UUID,
  deleted_at TIMESTAMPTZ
);

βœ… Used in schema-first environments or as a reference baseline.


πŸ“˜ 2. schema.yaml

| Format | ConnectSoft DSL | | Purpose | Declarative schema for agents, dev portals, and governance |

tables:
  - name: client
    fields:
      - name: client_id
        type: uuid
        primary_key: true
      - name: email
        type: varchar(255)
        required: true
        pii: true
        encrypted: true
      - name: status
        type: varchar(20)
        enum: ClientStatus
      - name: tenant_id
        type: uuid
        partition_key: true
      - name: deleted_at
        type: timestamp
        soft_delete: true
    observability:
      traceable: true
      span_name: db.crm.client.insert

πŸ“¦ 3. Migrations

πŸ”Ή migration-<timestamp>-add-column.sql

ALTER TABLE client ADD COLUMN marketing_opt_in BOOLEAN DEFAULT false;

πŸ”Ή Migrations/20250428_AddMarketingOptIn.cs

[Migration(2025042801)]
public class AddMarketingOptIn : Migration
{
    public override void Up() =>
        Alter.Table("client").AddColumn("marketing_opt_in").AsBoolean().WithDefaultValue(false);

    public override void Down() =>
        Delete.Column("marketing_opt_in").FromTable("client");
}

βœ… Used by FluentMigrator runners or SQL CI scripts
βœ… Tags each migration with trace_id, service_name, version


πŸ“‘ 4. field-retention-map.yaml

| Format | YAML | | Purpose | Compliance annotation for field-level audit, redaction, encryption

client:
  email:
    pii: true
    redactable: true
    encrypted: true
  status:
    retention: 5y
  created_at:
    audit_required: true
  marketing_opt_in:
    retention: 1y

πŸ—ΊοΈ 5. er-diagram.mmd

erDiagram
    client {
        UUID client_id PK
        STRING email
        STRING status
        UUID tenant_id
        TIMESTAMP created_at
        TIMESTAMP deleted_at
    }

    invoice {
        UUID invoice_id PK
        UUID client_id FK
        NUMERIC amount
    }

    client ||--o{ invoice : has
Hold "Alt" / "Option" to enable pan & zoom

βœ… Used for developer portal embedding and visual CI diffs


πŸ“ˆ 6. data-schema-observability.yaml

client:
  operations:
    - insert:
        span_name: db.crm.client.insert
        trace_id_field: trace_id
    - delete:
        soft_delete: true
        span_name: db.crm.client.delete
        retention_policy: 7y

βœ… Enables Observability Agent to link spans to table operations


πŸ“’ 7. DatabaseSchemaPublished (Event)

{
  "event": "DatabaseSchemaPublished",
  "service": "ClientService",
  "boundedContext": "CRM",
  "schemaVersion": "v2.0",
  "traceId": "trace-data-112938",
  "artifactUrls": {
    "ddl": "https://schemas.connectsoft.dev/crm/schema-v2.sql",
    "migrations": "https://schemas.connectsoft.dev/crm/migrations/",
    "observability": "https://schemas.connectsoft.dev/crm/data-schema-observability.yaml"
  },
  "timestamp": "2025-05-01T23:00:00Z"
}

βœ… Consumed by CI, Test Generator, DevOps Agent, Dev Portal


βœ… Output Checklist

Output Required
Declarative schema (schema.sql, schema.yaml) βœ…
Migration strategy (SQL or FluentMigrator) βœ…
Retention/compliance metadata βœ…
Observability fields + spans βœ…
Event emission βœ… with links and trace_id
Diagram output (optional but default) βœ… Mermaid preferred

πŸ“š Knowledge Base Overview

The Data Architect Agent relies on a curated knowledge base of:

  • Schema design best practices
  • Domain β†’ storage translation patterns
  • Column naming conventions
  • Migration strategies
  • Retention policies
  • Polyglot persistence heuristics

This enables the agent to make domain-informed, platform-aligned decisions while producing predictable, reusable data designs.


🧱 1. Naming Conventions & Translation Rules

Pattern Output
Address VO β†’ flatten address_line1, address_city, address_zip
Enums stored as string Column type β†’ varchar(20), constraint on values
Timestamps Always use TIMESTAMPTZ with DEFAULT now()
Boolean fields Use is_* or _enabled suffix: is_active, email_verified
UUIDs Use uuid type, primary keys are snake_case_id (e.g., client_id)
Soft deletion Field = deleted_at TIMESTAMPTZ NULL, queried via RLS or adapter policy
Multi-tenancy Field = tenant_id UUID NOT NULL, partitioning enabled

πŸ“¦ 2. Schema Normalization & Denormalization Rules

Pattern Applied When
Flatten VOs If reused only once or shallow (e.g., Address)
Reference VOs If reused across >3 aggregates or contains sub-VOs
Lookup tables for enums If enum is user-extensible or over 10 values
Denormalized projections For CQRS read models and Materialized Views
Array-of-VOs Use JSONB or link-table (depending on access/read pattern)

πŸ” 3. Migration Strategy Heuristics

Migration Type Rule
SQL Script Preferred when schema is stable and cross-language
FluentMigrator Required for .NET-first services using NHibernate or Dapper
Diff-based generation Triggered when previous-schema.yaml is provided
Rollback support Enabled only if down.sql or Revert*.cs migration is explicitly requested
Data seeding Separate class or SQL script if requires_seed_data: true in schema.yaml

πŸ” 4. Retention & PII Enforcement Patterns

Field Type Tag
email, phone, ssn, dob x-sensitive: true, encrypted: true
user_id, device_id, ip_address audit_required: true
session, token, log tables ttl: 30d or custom expiration index
tenant_id Always partition key and indexed
deleted_at Required for soft-deletable entities

🧠 5. Semantic Memory β€” Schema Reuse Suggestions

Input Suggested Output
"Client aggregate with Address VO" Flatten Address inline unless reused by 3+ other aggregates
"Status = [Active, Suspended, Inactive]" Enum β†’ status VARCHAR(20) NOT NULL + constraint check
"Invoice.Amount" Suggest NUMERIC(10,2) with NOT NULL DEFAULT 0.00
"CreateAt" Suggest TIMESTAMPTZ DEFAULT now() and include in audit metadata
"EventStore" Suggest table with stream_id, sequence_number, event_type, payload JSONB

πŸ“ˆ 6. Optimization Patterns

Pattern Used For
Composite Index (email, status) Email-based filtering or dashboard search
Partial Index (deleted_at IS NULL) Faster active-only queries
Covering Index (client_id, created_at, amount) Optimized dashboards
FTS (Full-text search) Only for designated searchable_fields in schema.yaml
GIN Index on JSONB Used for polymorphic settings or tags in event_metadata column

πŸ§ͺ 7. Validation-Aware Templates

Validation Injected Constraint
minLength: 5 VARCHAR(255) CHECK (LENGTH(field) >= 5)
enum CHECK (status IN ('Active', 'Suspended'))
range: 0-100 CHECK (score BETWEEN 0 AND 100)
pattern: email Optional CHECK or rely on app-level + observability tags

πŸ” Changelog Awareness

Change Detected Action
Field added β†’ not required Minor migration with NULL default
Field added β†’ required Migration with backfill or default
Field removed Flag as breaking; require schema bump
Type changed Flag as breaking; recommend DROP ADD or shadow copy
Constraint tightened Suggest ALTER ... CHECK + validation audit

πŸ”„ Process Flow Overview

The Data Architect Agent executes a structured, deterministic flow to translate domain models, validation rules, and compliance metadata into:

  • Schema files (.sql, .yaml)
  • Migrations (.sql, FluentMigrator .cs)
  • Diagrams (.mmd)
  • Metadata maps for observability and retention

The agent supports multiple backends (PostgreSQL, Cosmos, MongoDB, etc.) and migration styles based on input prompt strategy.


πŸ“‹ Step-by-Step Lifecycle

Phase Description

1️⃣ Input Parsing and Semantic Resolution

  • Load and validate:
    • domain-model.yaml
    • validation-map.yaml
    • retention-policy.yaml
    • application-architecture.md
    • (Optional) previous-schema.yaml for diff-based migrations
  • Identify:
    • Aggregates, VOs, Enums
    • Table/entity boundaries
    • Multitenant scoping rules
    • Compliance and observability annotations

2️⃣ Table & Field Mapping

  • Map each aggregate β†’ table
  • Flatten value objects into fields (unless reused widely)
  • Enum handling strategy (varchar, int, lookup table)
  • Resolve field types and nullability
  • Add tenant_id, trace_id, and created_at if required

3️⃣ Index, Constraint, and Retention Policy Generation

  • Generate:
    • Primary and foreign keys
    • Unique constraints
    • Indexes (single, composite, partial)
  • Inject field constraints from validation-map.yaml:

    • minLength, enum, CHECK(...), default
  • Tag PII and TTL fields:

    • x-sensitive: true
    • ttl: 30d
    • encrypted: true
    • audit_required: true

4️⃣ Migration Strategy Resolution

  • If migration_strategy = diff:

    • Load previous-schema.yaml
    • Generate migration script or class for:
      • Added/dropped/renamed fields
      • Modified constraints or types
  • If fluentmigrator:

    • Create Migration/YYYYMMDD_HHMM_AddField.cs
  • If sql:

    • Generate migration-<timestamp>-add-field.sql

πŸ”„ Retry if type mismatch or unsafe drop detected (e.g., backfill or copy strategy required)


5️⃣ Schema File and DSL Output

  • Emit:
    • schema.sql β€” full DDL snapshot
    • schema.yaml β€” normalized data blueprint
    • schema.version (for governance and trace)
    • field-retention-map.yaml β€” used by compliance and governance agents

6️⃣ Observability Injection

  • Auto-inject:
    • trace_id, correlation_id fields
    • created_by, updated_by, deleted_by
    • Span templates per operation: insert, update, delete
    • Output: data-schema-observability.yaml

7️⃣ Documentation and Diagram Generation

  • Generate:
    • er-diagram.mmd β€” Mermaid-based ERD
    • data-relationships.md β€” Plaintext description of keys and joins
    • Optional: export to PlantUML or SVG if output_format requires

8️⃣ Schema Publication and Event Emission

  • Emit:
    • DatabaseSchemaPublished event
      β†’ contains: service, version, trace_id, artifact URLs
  • Archive:
    • Generated artifacts under /schema/v{X.Y.Z}/

πŸ“ˆ Telemetry & Span Injection

Span Description
schema_generation.duration_ms Total schema file generation time
migration_generation.retry_count Retry count on invalid field evolution
observability_injection.status Pass/fail on trace field mapping
compliance_annotation.coverage_percent Percent of fields tagged as PII or retention-bound
schema_diff.breaking_change_detected Boolean flag for version escalator

🧠 High-Level Flow

flowchart TD
    A[Parse Inputs] --> B[Map Aggregates to Tables]
    B --> C[Inject Validation + Constraints]
    C --> D[Generate Migrations]
    D --> E[Emit schema.sql / schema.yaml]
    E --> F[Inject Observability + PII]
    F --> G[Generate Diagram + Metadata]
    G --> H[Publish Event + Archive]
Hold "Alt" / "Option" to enable pan & zoom

πŸ› οΈ Skills and Kernel Functions

The Data Architect Agent is implemented as a modular, skill-driven pipeline using Semantic Kernel (.NET).
Each skill is independently observable, retry-safe, and designed to operate over domain-aligned inputs with structured traceability.


πŸ“‹ Core Skills Used

🧩 1. Input Inference and Normalization

Skill Purpose
DomainModelParserSkill Parses domain-model.yaml, identifies aggregates, VOs, enums
ValidationMapNormalizerSkill Loads and restructures field-level constraints
RetentionPolicyMapperSkill Maps compliance metadata (e.g., pii, ttl) to fields
TenantScoperSkill Ensures tenant_id is injected where needed

🧱 2. Schema and Field Modeling

Skill Purpose
TableBuilderSkill Converts aggregates β†’ tables, maps primary keys, VOs
FieldTypeInfererSkill Resolves types (uuid, varchar, jsonb, timestampz) from domain model
VOFlattenerSkill Embeds VOs inline unless explicitly configured as reference
EnumStrategySelectorSkill Determines how enums are stored (int, varchar, or lookup FK)
ConstraintInjectorSkill Applies CHECK, NOT NULL, DEFAULT, and other column constraints

πŸ” 3. Migration Strategy and Evolution

Skill Purpose
SchemaDiffDetectorSkill Diffs current schema vs. previous-schema.yaml to detect changes
SqlMigrationGeneratorSkill Emits ALTER TABLE, ADD COLUMN, etc. as .sql files
FluentMigratorGeneratorSkill Emits C# migration class templates (Up, Down) with version tags
MigrationSafetyCheckerSkill Flags unsafe ops (e.g., dropping columns, type narrowing) and injects fallback

πŸ”’ 4. Compliance and Retention Enforcement

Skill Purpose
PIITaggerSkill Tags fields with x-sensitive, encrypted, redactable
RetentionPolicyPlannerSkill Generates field TTLs, table-level expirations
RLSPolicyInjectorSkill Emits CREATE POLICY for Postgres-based RLS enforcement (if enabled)
MaskingStrategySkill Suggests dynamic masking for SQL Server / Cosmos DB use cases

πŸ“ˆ 5. Observability Injection

Skill Purpose
TraceFieldEnforcerSkill Ensures trace_id, correlation_id, created_by fields exist
AuditColumnInjectorSkill Adds created_at, updated_at, deleted_at, version fields
SpanTemplateEmitterSkill Outputs data-schema-observability.yaml with span templates
TelemetryTaggerSkill Annotates key business fields (e.g., invoice.amount) for tracing agents

πŸ–ΌοΈ 6. Documentation and Visualization

Skill Purpose
ERDiagramGeneratorSkill Generates Mermaid ER diagram (er-diagram.mmd)
RelationshipMarkdownWriterSkill Produces data-relationships.md for DevPortal
GlossaryLinkerSkill Cross-links fields to known terms in glossary.yaml (optional)

πŸ“€ 7. Publishing and Events

Skill Purpose
SchemaFileWriterSkill Outputs schema.sql, schema.yaml, field-retention-map.yaml
MigrationFileWriterSkill Emits .sql or .cs files based on strategy
DatabaseSchemaPublishedEmitterSkill Emits contract event with trace_id and artifact URLs
SchemaChangelogEmitterSkill (optional) Outputs version diffs if enabled via settings

🧠 Skill Composition Flow (Simplified)

flowchart TD
    A[DomainModelParserSkill] --> B[TableBuilderSkill]
    B --> C[ConstraintInjectorSkill]
    C --> D[MigrationStrategySkill]
    D --> E[Observability & Retention Injection]
    E --> F[SchemaFileWriterSkill]
    F --> G[DatabaseSchemaPublishedEmitterSkill]
Hold "Alt" / "Option" to enable pan & zoom

πŸ” Retry-Safe Behaviors

Skill Retry Trigger Resolution
FieldTypeInfererSkill Unknown or ambiguous type Fallback to text, flag warning
MigrationSafetyCheckerSkill Unsafe change detected Suggest DROP/ADD pair with backfill
PIITaggerSkill Unmapped sensitive field Suggest tag from memory or glossary
FluentMigratorGeneratorSkill Name conflict in class Auto-rename or suffix with timestamp

πŸ“ˆ Metrics Emitted (Per Skill)

Metric Example
skill_execution_duration_ms{name} ConstraintInjectorSkill: 24ms
retry_count{name} FieldTypeInfererSkill: 1 retry
sensitive_fields_detected_total email, phone, ssn
migration_artifact_emitted_total 3 .sql, 1 .cs
er_diagram_generated: true Mermaid flow verified

πŸ› οΈ Technologies Used

The Data Architect Agent leverages a robust, cloud-native, and polyglot-compatible toolchain to generate, validate, and publish data schemas and migrations across multiple database engines and migration systems β€” fully aligned with ConnectSoft’s automation-first strategy.


πŸ”· 1. Core Platforms & Frameworks

Technology Purpose
Semantic Kernel (.NET) Skill orchestration, input coordination, retry logic
FluentMigrator (.NET) C#-based declarative migration generation (Up()/Down() methods)
PostgreSQL Default SQL target for multi-tenant SaaS environments
SQL Server Optional support for legacy and enterprise integrations
Mermaid.js Visual generation of ER diagrams from schema structure
OpenTelemetry SDK Trace/span emission per skill and contract generation step

πŸ“œ 2. Schema Output Formats

Format Purpose
schema.sql Declarative PostgreSQL/SQL Server DDL
schema.yaml Platform-native schema description used across agents
migration-*.sql Forward-only or versioned migrations
*.cs (FluentMigrator) .NET-first imperative migrations
er-diagram.mmd Graphical entity relationships (for DevPortal + docs)
field-retention-map.yaml Compliance metadata for security/governance agents
data-schema-observability.yaml Spans and trace fields for runtime agents

πŸ” 3. Migration Strategies Supported

Strategy Tools Description
Declarative DDL schema.sql, schema.yaml Single-file representation; used as baseline
FluentMigrator .cs C# classes Used in CI/CD pipeline for .NET microservices
Diff-based SQL migration-*.sql Generated from previous schema snapshots (ALTER TABLE, etc.)
Rollback support FluentMigrator or optional down.sql Optional if configured in prompt or CI policy

☁️ 4. Cloud-Native Support Targets

Provider Component
Azure SQL Database Target for production relational workloads
Azure Cosmos DB (Mongo API) NoSQL collections with TTL/index rules
Azure Storage Blobs Used for uploading schema bundles + changelogs
Azure Event Grid Publishes DatabaseSchemaPublished event
Azure DevOps CI pipeline triggers + artifact deployment

🧠 5. Semantic Memory & Validation Helpers

Tool Purpose
ConnectSoft Schema Registry Stores historical schema snapshots and changelogs
SQLFluff / pgFormatter Optional formatting and linting of generated .sql files
DiffEngine Computes field-level diffs and detects breaking schema changes
Governance Rule Engine Ensures compliance with ConnectSoft standards (naming, retention, versioning)

πŸ” 6. Security & PII Tagging Compatibility

System Integration
Microsoft Purview Tag fields with pii, sensitive for catalog sync
Dynamic Data Masking (SQL Server) Emit mask metadata (MASKED WITH FUNCTION)
PostgreSQL RLS Inject CREATE POLICY and ALTER TABLE ENABLE RLS
Field-level encryption hints Optional x-encrypted: true metadata for KMS-aware adapters

πŸ“ˆ 7. Observability + Telemetry Infrastructure

Tool Output
OpenTelemetry Span for each step of schema generation and migration writing
Prometheus (via Observability Agent) Emits metrics: number of fields, retry count, PII coverage
Application Insights Track errors, latency, and field-level validation failures

βœ… Output Delivery Channels

Artifact Destination
schema.sql, migration.sql Artifact Storage API
.cs (FluentMigrator) .NET Service Repositories
DatabaseSchemaPublished Event Grid + CI/CD trigger
Diagrams Developer Portal UI
Observability Map Observability Agent and tracing systems

πŸ“œ System Prompt (Bootstrapping Instruction)

The System Prompt defines the agent’s core instructions, design rules, and execution logic when activated by the ConnectSoft AI Factory.
It ensures consistency, governance compliance, and repeatable schema generation across all services and domains.


βœ… Full System Prompt (Plain Text)

You are the **Data Architect Agent** in the ConnectSoft AI Software Factory.

Your responsibility is to translate domain models, value objects, enums, validation rules,  
and retention/compliance policies into versioned, cloud-ready **data schemas and migrations**.

---

## Your Core Goals:

1. Produce:
   - `schema.sql` (relational DDL)
   - `schema.yaml` (ConnectSoft platform DSL)
   - `migration-*.sql` or `*.cs` (FluentMigrator classes)
   - `er-diagram.mmd` (visual documentation)
   - `field-retention-map.yaml` (compliance tags)
   - `data-schema-observability.yaml` (telemetry)
   - `DatabaseSchemaPublished` (contract event)

2. Map:
   - Aggregates β†’ Tables or Collections
   - Value Objects β†’ Embedded or Normalized fields
   - Enums β†’ Varchar, int, or lookup tables (based on reuse)
   - Field constraints (`required`, `minLength`, etc.) β†’ DB constraints
   - Retention/PII fields β†’ masking/encryption/TTL metadata

3. Ensure:
   - Multitenancy (`tenant_id` field is included and indexed)
   - Traceability (`trace_id`, `correlation_id`, `created_at`)
   - Referential integrity (primary, foreign, unique constraints)
   - Observability span mapping per table operation
   - Compatibility with FluentMigrator or SQL-first CI/CD pipelines

---

## Validation Rules:

- Every table must have a primary key
- Indexes must not overlap or duplicate constraints
- If RLS is enabled, inject `CREATE POLICY`
- If migration strategy = diff, compare `previous-schema.yaml`
- Soft-deletable entities must include `deleted_at`
- Sensitive fields (email, ssn, dob) must be tagged with `x-sensitive: true`
- Enum fields must have value validation (`CHECK (status IN (...))`)
- All schema artifacts must include `trace_id`, `service_name`, `version`

---

## Migration Strategy Modes:

- `fluentmigrator` β†’ emit `.cs` files using `Migration(YyyyMMdd_HHmm)` format
- `sql` β†’ emit timestamped forward-only `.sql` files
- `diff` β†’ compare against previous schema and generate additive diffs
- All strategies must support rollback or changelog tagging (if configured)

---

## Publication:

When schema generation is complete:
- Emit a `DatabaseSchemaPublished` event with artifact links, version, and `trace_id`
- Archive schema bundle for CI/CD and downstream test/migration runners

---

## Output Format and Structure:

- Use snake_case for table and column names
- Prefer `uuid`, `timestampz`, `varchar`, `jsonb` types
- Enforce DDD boundaries β€” no DTOs, only domain-derived structures
- Avoid circular foreign keys unless explicitly declared
- Each field in `schema.yaml` must contain `type`, `required`, and `observability` attributes

---

## Versioning:

- Changes to field type, constraints, or deletion = **breaking** β†’ bump major version
- New nullable fields or optional indexes = **non-breaking** β†’ bump minor version
- Deprecations must be marked and included in changelog

🧠 Summary of Behavior

The system prompt ensures the Data Architect Agent:

  • Builds clean, domain-aligned relational schemas
  • Emits machine- and human-readable contracts
  • Complies with observability, retention, and security standards
  • Publishes results for immediate downstream use in DevOps and governance

πŸ“₯ Input Prompt Template

The Data Architect Agent receives a structured YAML prompt from the ConnectSoft orchestration layer.
This defines:

  • The service context
  • Source artifacts
  • Desired migration strategy
  • Compliance and observability enforcement options
  • Output formatting rules

βœ… Full Input Prompt Example (YAML)

assignment: "generate-schema"

project:
  trace_id: "trace-data-55623"
  project_id: "connectsoft-saas"
  bounded_context: "Billing"
  service_name: "InvoiceService"
  agent_version: "1.3.0"

inputs:
  domain_model_url: "https://artifacts.connectsoft.dev/billing/domain-model.yaml"
  validation_rules_url: "https://artifacts.connectsoft.dev/billing/validation-map.yaml"
  retention_policy_url: "https://artifacts.connectsoft.dev/billing/field-retention-policy.yaml"
  app_architecture_url: "https://artifacts.connectsoft.dev/billing/application-architecture.md"
  glossary_url: "https://artifacts.connectsoft.dev/glossary/billing.yaml"
  previous_schema_url: "https://artifacts.connectsoft.dev/billing/schema-v1.yaml"  # optional

settings:
  migration_strategy: "fluentmigrator"     # options: fluentmigrator, sql, diff
  output_format: ["sql", "yaml", "erd"]
  include_observability_map: true
  publish_contract: true
  generate_down_migrations: false

🧾 Field Descriptions

🧠 project

Field Description
trace_id Correlation trace for full execution lineage
project_id Root-level SaaS/project group
bounded_context Domain boundary this service belongs to
service_name Name of the microservice
agent_version Which version of the Data Architect Agent is executing this spec

πŸ“‚ inputs

Artifact Description
domain_model_url Aggregates, value objects, and enums
validation_rules_url Field-level rules: min/max, format, nullable
retention_policy_url PII, TTL, encryption, audit settings
app_architecture_url Multitenancy, soft-deletion, RLS enforcement
glossary_url Optional β€” aligns naming and reused concepts
previous_schema_url Optional β€” enables diffing, changelog, and migration strategy derivation

βš™οΈ settings

Field Purpose
migration_strategy Determines output: .sql, .cs, or auto-diff
output_format Declares desired formats (can include diagrams, schema DSL)
include_observability_map Emits data-schema-observability.yaml
publish_contract Emits DatabaseSchemaPublished event
generate_down_migrations Whether to emit Down() in FluentMigrator (default false)

🧠 Minimal Example Prompt

assignment: generate-schema
project:
  service_name: ClientService
  trace_id: trace-client-774

inputs:
  domain_model_url: https://.../client/domain-model.yaml
  validation_rules_url: https://.../client/validation.yaml

settings:
  migration_strategy: sql
  output_format: [yaml]

βœ… Validation Rules

Rule Description
Must include trace_id, service_name, and at least domain_model_url βœ…
validation_rules_url and retention_policy_url recommended for compliance coverage βœ…
If migration_strategy = diff, previous_schema_url is required βœ…
If publish_contract: true, all outputs must pass validation before event emission βœ…

πŸ” Compatibility Notes

  • All inputs are trace-linked and version-aware
  • Fully compatible with:
    • Vision Architect Agent
    • Application Architect Agent
    • Domain Modeler Agent
    • Compliance Agent
    • Governance Dashboards
    • DevOps Pipelines

πŸ“€ Output Expectations

The Data Architect Agent produces a complete, traceable, and deployable data definition package.
These outputs are consumed by:

  • 🧱 Adapter Generator Agent (for ORM/entity generation)
  • πŸš€ DevOps Architect Agent (for pipeline execution)
  • πŸ§ͺ Test Generator Agent (for fixture bootstrapping)
  • 🧭 Governance Dashboards (for audit and compliance validation)

All artifacts include traceability, versioning, retention mapping, and observability hooks.


πŸ“¦ Output Artifact Summary

Artifact Format Description
schema.sql SQL Declarative schema DDL for relational DBs
schema.yaml YAML Agent-readable DSL version of schema
migration-*.sql SQL Forward-only or reversible migrations
Migrations/*.cs C# FluentMigrator migrations (imperative)
field-retention-map.yaml YAML Field-level retention/PII classification
data-schema-observability.yaml YAML Field-span mapping, trace injection, audit coverage
er-diagram.mmd Mermaid ER diagram for dev portal visualization
DatabaseSchemaPublished JSON CI/CD-triggered contract event

🧾 Detailed Artifact Descriptions

🟦 1. schema.sql

A canonical DDL export aligned with domain structure and constraints

  • Includes:
    • Table definitions
    • Primary/foreign keys
    • Indexes, CHECK constraints
    • Nullable/required rules
    • Default values
  • Compatible with PostgreSQL and SQL Server dialects

🟨 2. schema.yaml

Abstract schema model for platform-wide reuse

tables:
  - name: invoice
    fields:
      - name: invoice_id
        type: uuid
        primary_key: true
        required: true
      - name: status
        type: varchar(20)
        enum: InvoiceStatus
        pii: false
      - name: tenant_id
        type: uuid
        required: true
        indexed: true
    retention:
      status: 5y
    observability:
      trace_id: true
      created_at: true

πŸŸͺ 3. Migrations

Type Output
SQL migration-20250429-add-table-client.sql
FluentMigrator Migrations/20250429_AddTableClient.cs

Includes Up() (always), Down() (if enabled), with trace annotations


πŸŸ₯ 4. field-retention-map.yaml

Compliance-oriented metadata extracted from policy inputs

invoice.status:
  pii: false
  retention: 7y
  encrypted: false

client.email:
  pii: true
  encrypted: true
  redactable: true

🟩 5. data-schema-observability.yaml

Tells the Observability Agent what to trace at the DB level

tables:
  client:
    spans:
      - insert:
          span_name: db.crm.client.insert
          trace_fields: [trace_id, correlation_id]
      - update:
          span_name: db.crm.client.update

🟫 6. er-diagram.mmd

erDiagram
    invoice {
        UUID invoice_id PK
        UUID client_id FK
        NUMERIC amount
        TIMESTAMP created_at
    }
    client {
        UUID client_id PK
        STRING email
        TIMESTAMP deleted_at
    }
    client ||--o{ invoice : has
Hold "Alt" / "Option" to enable pan & zoom

πŸ“© 7. DatabaseSchemaPublished Event

Field Description
event DatabaseSchemaPublished
service e.g., InvoiceService
schemaVersion e.g., v2.0.1
traceId Correlation ID
artifactUrls DDL, DSL, diagrams, retention, observability
timestamp UTC time of publication

Enables downstream jobs and dashboards to sync and validate schema artifacts


βœ… Output Compliance Criteria

Rule Enforced
All tables have primary_key, created_at, tenant_id βœ…
Sensitive fields include x-sensitive, pii, encrypted tags βœ…
Span templates exist for insert, update, delete ops βœ…
Version bump triggered on any breaking change βœ…
Traceability fields present in all outputs βœ… trace_id, project_id, service_name

🧠 Memory Strategy Overview

The Data Architect Agent uses structured and semantic memory to:

  • Reuse known field structures, enum mappings, and constraints
  • Apply naming, compliance, and index patterns consistently
  • Detect historical schema versions and generate intelligent diffs
  • Prevent unsafe or duplicated migrations
  • Assist in automated versioning and changelog generation

This memory system operates in two scopes:
πŸ• Short-term (session memory) and πŸ“š Long-term (knowledge base).


πŸ• Short-Term Memory

Memory Area Purpose
session.trace_id Tracks lineage of current generation run
field_definitions[] Captures all resolved fields + inferred types/constraints
aggregates[] List of processed aggregates and their mappings
value_object_usage[] Tracks flattened vs referenced VOs
generated_migrations[] Avoids duplication and helps deduplicate FluentMigrator class names
schema_diff_cache Temporarily holds change list between old and new schema for rollback/scoping logic

πŸ“š Long-Term Semantic Memory

1. πŸ“¦ Reusable Schema Fragments

Pattern Description
Money { amount NUMERIC(10,2), currency VARCHAR(3) }
Email { email VARCHAR(255) NOT NULL CHECK (email LIKE '%@%') }
AuditTrailFields created_at, updated_at, deleted_at, created_by, trace_id
Multitenancy tenant_id UUID NOT NULL, indexed, included in all queries

2. πŸ” Compliance & PII Tagging

Trigger Output
Field name matches "email", "ssn", "dob" Auto-tag with x-sensitive, encrypted, audit_required
Field flagged as TTL Inject x-ttl-days and mark for storage pruning or indexing
Service is in a regulated domain (e.g., Insurance, Healthcare) Enforce retention mapping and trace coverage

3. πŸ“‘ Version History Ledger

Tracked Use
schema@v1, schema@v2 Enables diff-based migration generation
field_changes[] Auto-generates changelogs: added, removed, modified
breaking_changes[] Triggers version bump logic and rollback plan (if allowed)
last_seen_field_states[] Prevents downgrades or regressions in schema type evolution

4. 🧱 Migration Pattern Matching

Detection Memory Action
Field type widening (e.g., varchar(50) β†’ varchar(255)) Mark as safe
Enum extension Log, suggest adding CHECK with extended IN clause
Field drop attempt Flag as unsafe unless wrapped in reversible strategy
New VO used Suggest reuse pattern based on prior flatten/reference ratio

5. 🧠 Glossary-Linked Metadata

Concept Used For
"Address" Suggest common VO flattening structure (address_line1, city, zip)
"Status" Enum field with shared value set (Active, Suspended, Pending)
"AuditTrail" Standard field group injection + x-audit-required: true

πŸ§ͺ Memory-Driven Changelog Generation

When previous_schema.yaml is present, memory diff engine emits:

changelog:
  version: v2.0.0
  breaking_changes:
    - field_removed: client.middle_name
    - constraint_changed: invoice.amount now NOT NULL
  additions:
    - table: payment_method
    - field_added: invoice.due_date

🧠 Semantic Memory Match Example

query: "field: invoice.amount"
match:
  reuse_pattern: Money
  type: NUMERIC(10,2)
  constraint: NOT NULL DEFAULT 0.00
  traceability: true
  audit_required: true
  last_used_by: InvoiceService

βœ… Benefits of Memory Strategy

Capability Value
Schema reuse Faster, consistent generation across services
Compliance tagging Reduced errors, governance integration
Safe migration evolution Intelligent diffs and rollback logic
Version diffing Enables changelog and CI/CD auditability
Observability propagation Preserves trace span coverage across evolutions

βœ… Validation & Correction Framework

The Data Architect Agent executes multi-phase schema validation, correction, and recovery logic across all artifacts it generates.
It ensures data designs are:

  • Structurally valid
  • Compliant with domain and validation rules
  • Backward-compatible (when possible)
  • Safe for CI/CD deployment and migration

πŸ” Validation Phases

1️⃣ Schema Structure & Syntax Validation

Target Check
schema.sql Valid syntax per SQL dialect (PostgreSQL, SQL Server)
schema.yaml Valid against ConnectSoft schema DSL schema
migration.sql Executes in dry-run parser or mock DB
fluentmigrator.cs Compiles with no semantic errors
enum definitions Ensure values match usage and constraints

2️⃣ Domain ↔ Storage Consistency Validation

Rule Enforced
Every aggregate maps to one table or collection βœ…
Every required field must be NOT NULL βœ…
Field constraints match validation-map.yaml βœ… (e.g., length, min/max, format)
Enum fields have correct constraint (CHECK or FK) βœ…
Field types are semantically consistent (Email β†’ varchar) βœ…

3️⃣ Migration Safety Validation

Rule Check
Dropping a column triggers warning + audit event ⚠️
Changing nullable β†’ non-null requires default or backfill βœ…
Field renames must include mapping βœ…
Duplicated migrations by name β†’ flagged βœ…
New table already exists β†’ abort or suggest suffixing

4️⃣ Compliance & Observability Validation

Rule Action
Missing trace_id, created_at, tenant_id β†’ inject πŸ”
PII field detected but untagged β†’ tag as x-sensitive πŸ”
No observability span for insert/update/delete β†’ warn and generate defaults πŸ”
Retention policy undefined for ttl fields β†’ assign fallback or escalate ⚠️

πŸ” Retry & Auto-Correction Flow

Skill Error Retry Action
FieldTypeInfererSkill Unknown domain type Fallback to text or jsonb with comment
MigrationSafetyCheckerSkill Unsafe drop Remove drop, inject deprecation comment
PIITaggerSkill Detected untagged field Search glossary or pattern match
EnumStrategySelectorSkill Enum mismatch Convert to varchar(20) or FK lookup table
TraceFieldEnforcerSkill trace_id missing Add field + observability span

πŸ“œ Correction Example

{
  "trace_id": "data-trace-44192",
  "skill": "MigrationSafetyCheckerSkill",
  "retry": 1,
  "issue": "Unsafe type change: amount NUMERIC β†’ INTEGER",
  "resolution": "Suggested add new column 'amount_int' and backfill from 'amount'"
}

πŸ“ˆ Metrics Collected

Metric Description
schema_validation_passed_total Count of successful full schema validations
migration_validation_failed_total Count of rejected or unsafe migration attempts
pii_fields_tagged_total Number of x-sensitive fields detected
observability_span_injected_total Operations auto-covered
retry_count_by_skill Diagnostic metric for agent behavior tuning

🚨 Escalation Conditions

Condition Action
3+ retries in same skill Emit SchemaGenerationWarning
Unsafe column deletion without --SAFE_DROP tag Emit blocking error, abort pipeline
Trace fields missing from multiple tables Block observability publication
Version mismatch with schema.version Warn CI and halt DatabaseSchemaPublished event

βœ… Recovery Guarantee

Most failures are resolved via:

  • Pattern-based correction
  • Semantic fallback injection
  • Memory-guided type resolution
  • Retry with normalized constraint variants

Final output is only published if:

βœ… All artifacts pass validation
βœ… Changelog is emitted (if breaking)
βœ… Observability and compliance coverage β‰₯ 90%


🀝 Collaboration Interfaces

The Data Architect Agent integrates seamlessly with both upstream semantic agents and downstream operational agents to maintain data-design integrity across the ConnectSoft ecosystem.

It consumes domain and application structure from upstream agents and produces schema, migration, and metadata artifacts for others to consume in deployment, validation, and visualization flows.


πŸ”Ό Upstream Inputs (Dependencies)

Agent Artifact
Domain Modeler Agent domain-model.yaml, validation-map.yaml, value-objects.yaml
Application Architect Agent use-cases.yaml, application-architecture.md (multi-tenancy, soft delete, access patterns)
Compliance Architect Agent field-retention-policy.yaml, PII rules
Solution Architect Agent Overall platform schema strategy (shared DB, RLS, etc.)
Glossary Agent (optional) Term-to-field mappings and reusable VO field sets

πŸ”½ Downstream Consumers

Agent Artifacts Consumed
Adapter Generator Agent schema.yaml, schema.sql, field-retention-map.yaml
β†’ Generates ORM models, query layer, value mappers

| DevOps Architect Agent | migration-*.sql, Migrations/*.cs, schema.version, DatabaseSchemaPublished
β†’ Integrates schema deployment into CI/CD pipelines |

| Test Generator Agent | schema.yaml, schema.sql
β†’ Creates fixtures, seed data, integration DB tests |

| Observability Agent | data-schema-observability.yaml
β†’ Generates span mapping and access telemetry rules |

| Compliance Dashboard Agent | field-retention-map.yaml, schema lineage
β†’ Ensures traceable PII/redaction coverage |

| Developer Portal Generator | er-diagram.mmd, schema.yaml, data-relationships.md
β†’ Builds visual API-to-DB relationship diagrams and schema browser |


πŸ“’ Event-Driven Interface

Event Description
DatabaseSchemaPublished Published once all validation and artifact generation passes
β†’ Contains artifact links, trace_id, schemaVersion, service, timestamp
SchemaGenerationWarning (optional) Emitted when unsafe migrations, missing PII tags, or trace fields are detected but auto-corrected

🧠 Inter-Agent Traceability

All generated artifacts include trace metadata:

trace_id: data-trace-000112
agent_version: 1.3.0
service_name: InvoiceService
bounded_context: Billing
schema_version: v2.1.0
generated_on: 2025-05-01T23:15:00Z

βœ… Enables end-to-end tracing across modeling, schema, tests, and gateway configuration


πŸ”„ Collaboration Flow Diagram

flowchart TD
  DomainModeler -->|domain-model.yaml| DataArchitect
  ApplicationArchitect -->|use-cases.yaml| DataArchitect
  ComplianceArchitect -->|retention-policies.yaml| DataArchitect
  DataArchitect --> AdapterGenerator
  DataArchitect --> DevOpsArchitect
  DataArchitect --> TestGenerator
  DataArchitect --> ObservabilityAgent
  DataArchitect --> DeveloperPortal
Hold "Alt" / "Option" to enable pan & zoom

πŸ” Contract Compliance for Consumers

Constraint Applied
Migration consumers must use provided version tag (schema.version) βœ…
ORM codegen must respect nullable and encrypted flags βœ…
PII fields must be consumed only via compliant field mappers βœ…
Tests must use fixture-safe schema versions (schema.yaml@vN) βœ…
Observability spans must align with table/action trace templates βœ…

βœ… Outcomes of Agent Collaboration

Capability Benefit
Unified source of truth for DB schema βœ… Every agent reads from schema.yaml
Safe migration rollout via CI/CD βœ… DevOps Agent applies only validated .sql / .cs
Traceable PII and retention policies βœ… Compliance & Security Dashboards stay aligned
Full-stack visibility βœ… API calls trace down to fields and rows in DB

πŸ“‘ Observability Hooks

The Data Architect Agent is fully observable and auditable via:

  • OpenTelemetry spans emitted during each skill's execution
  • Lifecycle events for schema generation and migration publication
  • Semantic trace metadata embedded in all output files

These ensure complete transparency into what was generated, why, and how it evolved.


πŸ“ˆ Key Telemetry Metrics

Metric Description
schema_generation_duration_ms Total time to parse, generate, and validate all artifacts
pii_fields_tagged_total Number of fields identified as sensitive/PII
migration_retry_count_total Count of fallback/correction attempts during migration creation
observability_field_injection_rate % of tables with required span fields (trace_id, created_at, etc.)
breaking_change_detection_count How often the agent identifies and manages schema-breaking diffs

🧠 Span Emission Model

Each skill emits a span:

{
  "span_name": "SchemaBuilderSkill",
  "trace_id": "data-trace-11722",
  "duration_ms": 42,
  "agent_version": "1.3.0",
  "service_name": "InvoiceService",
  "status": "success"
}

These spans are collected by the Observability Agent and visible in ConnectSoft’s trace dashboards.


🧍 Human Oversight Triggers

Although autonomous, the agent flags and escalates edge cases such as:

Trigger Escalation
PII fields without tagging β†’ 3+ retries SchemaGenerationWarning emitted
Unsafe DROP COLUMN without rollback support Migration halted, escalated to Governance Officer
Traceability fields (trace_id, tenant_id) missing on multiple entities Event suppressed until corrected
Schema version mismatch or illegal downgrade Deployment blocked, audit note added

πŸ” Compliance and Governance Integration

  • All output files are tagged with:
    • agent_version, trace_id, schema_version, bounded_context, service_name
  • field-retention-map.yaml and data-schema-observability.yaml are synced with:
    • Compliance Dashboards
    • Data Loss Prevention Auditors
    • Governance Agents

πŸ“’ Lifecycle Event: DatabaseSchemaPublished

Field Description
event DatabaseSchemaPublished
schemaVersion Auto-computed via semver diff engine
artifactUrls Links to schema, migrations, diagram, retention map
generated_by Data Architect Agent v1.3.0
trace_id Traceable execution lineage
timestamp UTC timestamp

βœ… Enables downstream CI/CD pipelines, test agents, and artifact promotion workflows


πŸ“˜ Audit Trail Support

Each generated migration or schema file includes:

metadata:
  trace_id: "data-trace-11233"
  schema_version: "v2.1.0"
  service: "InvoiceService"
  generated_on: "2025-05-01T23:42:00Z"
  generated_by: "Data Architect Agent v1.3.0"

βœ… Enables long-term schema evolution lineage and rollback validation


βœ… Final Outcomes

The Data Architect Agent enables ConnectSoft to:

  • Operate with governed, reusable, traceable schema designs
  • Automate migration generation with confidence and auditability
  • Align field-level data design with domain, compliance, and observability
  • Support multi-agent cooperation across platform, infrastructure, and business domains

πŸ“¦ Output Summary

Artifact Purpose
schema.sql, schema.yaml Declarative schema
migration.sql, .cs Forward-compatible evolutions
field-retention-map.yaml PII, TTL, compliance tracking
observability.yaml Span coverage and trace field injection
er-diagram.mmd Visual schema overview
DatabaseSchemaPublished Event trigger for all downstream agents