🧠 Data Architect Agent Specification¶

🎯 Purpose¶

The Data Architect Agent is responsible for designing, evolving, and governing the data storage layer of every service in the ConnectSoft AI Software Factory — while ensuring:

Scalability (partitioned & cloud-native storage)
Integrity (strong constraints, referential safety)
Compliance (PII classification, retention rules, auditability)
Extensibility (schema versioning, event rehydration, projections)
Zero-downtime evolution (migration automation, backward compatibility)

It produces platform-ready, validated, and observable database blueprints and migration artifacts, aligned with Clean Architecture, DDD, and ConnectSoft's security-first and observability-first principles.

💡 Why This Agent Exists¶

Without a dedicated Data Architect Agent:

Tables and schemas drift from the domain model
Indexes, constraints, and nullability rules are inconsistently applied
Data becomes fragmented across incompatible formats
Migrations are unsafe, untraceable, and break CI/CD flows
No unified view of multitenancy, soft-deletion, or retention timelines
Event-sourced services lack structured projections for querying

With this agent:

✅ Schemas are derived directly from the domain model
✅ Migrations are tracked and environment-specific
✅ Data is partitioned and scoped per tenant
✅ PII fields and retention rules are codified
✅ All artifacts are versioned, diffable, and CI/CD-integrated

🧱 Design Scope¶

✅ DDL generation (SQL, YAML, FluentMigrator)
✅ Migrations (create, alter, drop, index, rename, seed)
✅ NoSQL collections & document design (MongoDB, Cosmos)
✅ Projection modeling for event-driven views
✅ Multitenant partitioning rules
✅ TTL-based storage (for logs, sessions, events)
✅ Data retention & redaction policy tagging
✅ Observability: schema span emissions, table access tracing
✅ Migration versioning via FluentMigrator or SQL tags

🔁 Multi-Migration Strategy Support¶

The agent supports three primary types of migrations per service:

Migration Type	Format	Purpose
Declarative	`schema.sql`, `schema.yaml`	For schema-first services; output = full source of truth
Imperative	`.cs` using FluentMigrator*	For .NET-based microservices using versioned migrations
Composable	`migration-*.sql` scripts	For raw SQL runners, cross-dialect compatibility, rollback-friendly

✅ The agent can emit all three types depending on settings.migration_strategy in the input prompt.

🗂️ Example Output Bundle¶

/schema
  schema.yaml
  schema.sql
  migration-20250422-initialize-db.sql
  migration-20250423-add-index-on-email.sql
  fluent/20250423_AddAuditColumns.cs
/metadata
  field-retention-map.yaml
  er-diagram.mmd
  schema-observability.yaml
DatabaseSchemaPublished.json

🧠 Role in the Agentic Lifecycle¶

flowchart TD
    DomainModeler --> DataArchitect
    ApplicationArchitect --> DataArchitect
    DataArchitect --> MigrationEngine
    DataArchitect --> TestGenerator
    DataArchitect --> ObservabilityAgent
    DataArchitect --> DevOpsAgent

Hold "Alt" / "Option" to enable pan & zoom

🧭 Summary¶

Feature	Value
Domain-aligned schemas	✅ DDD mapping to storage
Migration strategy	✅ FluentMigrator, SQL, or YAML
Audit & retention awareness	✅ PII tagging, TTL injection
Multitenant design	✅ Partition, filter, and access rules per tenant
CI/CD readiness	✅ Outputs versioned, published, and validated
Observability	✅ Tables and fields mapped to telemetry templates

🏛️ Scope of Influence¶

The Data Architect Agent governs all aspects of a service’s persistent storage architecture, ensuring every decision is:

Domain-aligned
Cloud-native
Multi-tenant ready
Compliant
Migration-safe

It acts as the authoritative designer of relational, NoSQL, event-sourced, and cache-backed storage layers for microservices, modules, and shared infrastructure components.

📘 Logical Scope¶

Area	Included
✅ Table/collection design	`CREATE TABLE` / `collection.definition`
✅ Field types, nullability, defaults	e.g., `amount NUMERIC(10,2) NOT NULL DEFAULT 0.0`
✅ Indexes, keys, constraints	Primary, unique, foreign keys, composite indexes
✅ Value object flattening	Embed VOs inside parent entities (e.g., Address, Money)
✅ Enum mapping	Stored as integer or string (configurable per enum)
✅ Tenant scoping	Partition key, row filters, access policies
✅ History/versioning fields	CreatedAt, UpdatedAt, DeletedAt, Version, etc.
✅ Observability tagging	Span templates for table operations, audit log hooks
✅ TTL & retention	Policies for deletion, redaction, expiration

🧱 Physical Scope¶

Storage Type	Responsibilities
Relational DB (Postgres, SQL Server)	Normalized schema, constraints, indexes, foreign keys
NoSQL DB (MongoDB, Cosmos DB)	Document design, nested field constraints, TTL indexing
Event Store / Projection DB	Denormalized tables/views from event replay
Blob Storage	Data classification (e.g., `x-sensitive`, file naming, metadata tagging)
Cache/Read Replicas	Declare what fields can be projected or cached
Embedded DB (LiteDB, SQLite for mobile/edge)	Strip down schema for constrained environments

🔄 Temporal Scope¶

Aspect	Controlled
Schema versioning	`schema.yaml.version`, `migrationId`, `fluentMigrator version`
Historical data rules	Audit logs, soft-deletion, row snapshots
Retention policies	Time-based (e.g., 30d logs), legal holds
TTL logic	Auto-purge of sessions, messages, etc.

🔐 Compliance and Security Scope¶

Feature	Role
PII classification	Tags fields in `field-retention-map.yaml`
Redactable fields	Emits policy flags (`x-redactable: true`)
Encrypted at rest	Identifies fields requiring AES/GCM or external KMS
Row-level security (Postgres)	Maps tenant access conditions to RLS rules
Data masking	Emits masking metadata for tools like SQL Server Dynamic Data Masking or Cosmos DB Confidential LSM

🚫 Out of Scope¶

Not Owned	Delegated To
Migration execution (runtime)	✅ DevOps Architect Agent
ORM entity generation (e.g., NHibernate)	✅ Adapter Generator Agent
Data analytics or BI schema	❌ (handled by future Analytics Architect Agent)
Application-layer queries (LINQ, etc.)	✅ Developer responsibility
Full-text search design	✅ Search Architect Agent (future role)

📦 Output Influence Summary¶

Artifact	Owned
`schema.sql`	✅
`schema.yaml`	✅
`migration-*.sql`	✅
`migration/*.cs` (FluentMigrator)	✅
`er-diagram.mmd`	✅
`field-retention-map.yaml`	✅
`DatabaseSchemaPublished.json`	✅
Runtime index management	❌ (influences DevOps but not runtime schema sync)

🧭 ConnectSoft Scope Matrix¶

Layer	Data Architect Agent
Domain	Reads (VOs, Entities, Enums)
Application	Reads use case flows for read/write frequency
Infrastructure	Owns physical schema + migrations
Observability	Emits table-span templates
Security	Flags PII, tenant, RLS, retention fields
Governance	Outputs versioning + changelogs

📋 Core Responsibilities¶

The Data Architect Agent owns the definition and evolution of persistent storage blueprints
for ConnectSoft microservices, ensuring they are:

Domain-driven
Observable
Secure
Versioned
Composable

It produces schemas, migrations, diagrams, and compliance maps ready for CI/CD, deployment, and governance pipelines.

📘 1. Schema Design Responsibilities¶

Task	Output
Translate domain aggregates and value objects into tables/collections	`schema.yaml`, `schema.sql`, or `schema.mongo.json`
Flatten embedded value objects intelligently	E.g., `Address` → `address_line1`, `city`, `zip`
Handle enums as `int`, `string`, or lookup table (based on config)	`enum-strategy.yaml`
Define primary keys, foreign keys, unique indexes, nullable/required flags	`schema.sql`, `migration-*.sql`
Generate observability fields	`created_at`, `updated_at`, `trace_id`, `version`, `tenant_id`

🔁 2. Migration and Evolution Responsibilities¶

Task	Output
Generate forward-only SQL migrations	`migration-<timestamp>-*.sql`
Generate FluentMigrator C# classes	`Migrations/20250501_AddFieldToClient.cs`
Track schema version and emit diffs	`schema.version`, `migration.log`
Suggest rollback plan or up/down separation	Optional `down.sql` or `Revert...` migration
Handle safe field changes: `nullable` → `non-null`, type widening, enum expansion	Auto-patch and document

🧪 3. Validation and Consistency Enforcement¶

Task	Validated
Naming conventions	snake_case for SQL, camelCase for NoSQL
Type consistency with domain model	Match DTOs and VOs (or explain mismatch)
Field constraints	From `validation-map.yaml`: `minLength`, `max`, `enum`, etc.
Nullability rules	Optionality mapped to nullable column/type
Index redundancy	Avoid overlapping composite/unique/index sets
RLS & tenant fields	Ensure `tenant_id` included where required

📑 4. Retention, Compliance, and Security¶

Task	Output
PII/retention annotation	`field-retention-map.yaml`
Masking/redaction strategy	`x-redactable`, `x-sensitive` tags
TTL declaration for short-lived entities	`ttl-policy.yaml` or inline schema flags
Data encryption flagging	Fields marked `x-encrypted: true`
Compliance scope linking	Maps to GDPR/SOC2 classification tags

🧭 5. Observability and Metrics Integration¶

Task	Output
Emit schema-span templates	`schema-observability.yaml`
Mark fields as observable (e.g., `invoice.total`, `client.status`)	Used by Observability Agent
Inject audit trail structure	`created_by`, `modified_by`, `trace_id`, `correlation_id`
Emit `DatabaseSchemaPublished` event with `trace_id`, schema version, and changelog URL

📄 6. Documentation and Diagrams¶

Task	Output
Generate ER diagram from schema	`er-diagram.mmd`
Describe relationships in Markdown	`data-relationships.md`
Auto-link schema objects to glossary terms	Enables rich developer portal linking

✅ Summary of Responsibilities¶

Area	Description
🧱 Physical schema design	✅ Tables, fields, keys, indexes
🧩 Domain-aligned translation	✅ Aggregates, VOs, enums → storage
📦 Migration authoring	✅ FluentMigrator, SQL, DDL diffs
🔐 Retention and compliance tagging	✅ PII, TTL, masking, audit trail
📈 Observability tagging	✅ Span templates, access fields
📊 Documentation and traceability	✅ ERD, Markdown, lifecycle event

📥 Core Inputs¶

The Data Architect Agent consumes structured, traceable, and semantically meaningful artifacts
from upstream agents to model and evolve the persistent storage layer.

These inputs define:

The entities, value objects, and enums to persist
The field-level constraints and validations
The retention, PII, and security policies
The observability and multitenancy boundaries
Any previously generated schema snapshots for diffing and migration generation

📘 Required Input Artifacts¶

Artifact	Source Agent	Purpose
`domain-model.yaml`	Domain Modeler Agent	Defines aggregates, value objects, identifiers
`validation-map.yaml`	Domain Modeler Agent	Field-level constraints (e.g., not null, length, enum)
`use-cases.yaml`	Application Architect Agent	Determines access patterns (write-heavy, read projections)
`field-retention-policy.yaml`	Compliance Architect Agent	Flags fields as `pii`, `ttl`, `audit_required`, `redactable`
`application-architecture.md`	Application Architect Agent	Maps services to tenancy rules and cross-entity flows
`previous-schema.yaml` (optional)	CI/CD or prior agent run	Used for diffing and migration detection
`trace_id`, `bounded_context`, `service_name`	Orchestrator	Enables traceability and governance audit linkage

📄 Input Examples¶

🔶 `domain-model.yaml`¶

aggregates:
  - name: Client
    fields:
      - name: clientId
        type: Guid
        required: true
      - name: email
        type: Email
      - name: address
        type: Address
      - name: status
        type: ClientStatus
    value_objects:
      - Address
      - Email
    enums:
      - ClientStatus: [Active, Inactive, Suspended]

🔶 `validation-map.yaml`¶

Client:
  email:
    format: email
    maxLength: 255
  address.zip:
    minLength: 5
    maxLength: 10
  status:
    enum: [Active, Inactive, Suspended]

🔶 `field-retention-policy.yaml`¶

fields:
  Client.email:
    pii: true
    redactable: true
    encrypted: true
  Client.status:
    retention: 5y
  Client.createdAt:
    audit_required: true

🔶 `application-architecture.md` (Excerpt)¶

This service is multi-tenant. All persistent objects must include a tenant_id column.

The Client aggregate supports soft deletion (i.e., deleted_at should be modeled).

The Address VO is reused across four services — prefer embedded rather than normalized form.

⚠️ Input Validation Rules¶

Rule	Enforced
`domain-model.yaml` and `validation-map.yaml` must be present	✅
Each field must resolve to a scalar or embeddable VO	✅
Enum values must be matched to definition (for type safety)	✅
If multi-tenancy is enabled, `tenant_id` must be generated and included	✅
Previous schema required if `migration_strategy = diff`	✅
Traceability fields must be resolved	✅ (`trace_id`, `service_name`, `bounded_context`)

🧠 Semantic Memory Integration (Optional)¶

Query	Result
`"Invoice.CreatedAt"`	Suggest `timestamp with time zone`, index on `created_at`, add to audit trail
`"Status"` enum reuse	Suggest shared enum or foreign key to `status_types` table
`"Address"` VO reuse	Suggest embedded pattern with naming template `address_city`, `address_zip`, etc.

📦 Input Prompt Template (YAML)¶

assignment: "generate-schema"
project:
  service_name: ClientService
  trace_id: trace-data-112938
  bounded_context: CRM

inputs:
  domain_model_url: https://.../domain-model.yaml
  validation_rules_url: https://.../validation-map.yaml
  retention_policy_url: https://.../field-retention-policy.yaml
  app_architecture_url: https://.../application-architecture.md
  previous_schema_url: https://.../schema-v1.yaml

settings:
  migration_strategy: fluentmigrator
  output_format: [sql, yaml, erd]

📤 Core Outputs¶

The Data Architect Agent produces a complete, versioned, and traceable data schema bundle,
ready for:

CI/CD deployment
FluentMigrator-based evolution
Event-sourced projection building
Observability, compliance, and retention auditing
Developer portal documentation

📦 Primary Output Artifacts¶

Artifact	Format	Purpose
`schema.sql`	SQL DDL	Declarative schema for relational DBs
`schema.yaml`	YAML DSL	Platform-native representation of entity/table structure
`migration-*.sql`	SQL	Forward-only or reversible migrations
`Migrations/*.cs`	C# FluentMigrator classes	Versioned imperative migrations
`field-retention-map.yaml`	YAML	Field-level metadata: PII, TTL, encryption
`er-diagram.mmd`	Mermaid	Entity-relationship diagram
`data-schema-observability.yaml`	YAML	Span templates for DB-level tracing
`DatabaseSchemaPublished`	JSON	Event that notifies schema is ready for use

📘 1. `schema.sql`¶

CREATE TABLE client (
  client_id UUID PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  status VARCHAR(20) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now(),
  tenant_id UUID NOT NULL,
  trace_id UUID,
  deleted_at TIMESTAMPTZ
);

✅ Used in schema-first environments or as a reference baseline.

📘 2. `schema.yaml`¶

tables:
  - name: client
    fields:
      - name: client_id
        type: uuid
        primary_key: true
      - name: email
        type: varchar(255)
        required: true
        pii: true
        encrypted: true
      - name: status
        type: varchar(20)
        enum: ClientStatus
      - name: tenant_id
        type: uuid
        partition_key: true
      - name: deleted_at
        type: timestamp
        soft_delete: true
    observability:
      traceable: true
      span_name: db.crm.client.insert

📦 3. Migrations¶

🔹 `migration-<timestamp>-add-column.sql`¶

ALTER TABLE client ADD COLUMN marketing_opt_in BOOLEAN DEFAULT false;

🔹 `Migrations/20250428_AddMarketingOptIn.cs`¶

[Migration(2025042801)]
public class AddMarketingOptIn : Migration
{
    public override void Up() =>
        Alter.Table("client").AddColumn("marketing_opt_in").AsBoolean().WithDefaultValue(false);

    public override void Down() =>
        Delete.Column("marketing_opt_in").FromTable("client");
}

✅ Used by FluentMigrator runners or SQL CI scripts
✅ Tags each migration with trace_id, service_name, version

📑 4. `field-retention-map.yaml`¶

client:
  email:
    pii: true
    redactable: true
    encrypted: true
  status:
    retention: 5y
  created_at:
    audit_required: true
  marketing_opt_in:
    retention: 1y

🗺️ 5. `er-diagram.mmd`¶

erDiagram
    client {
        UUID client_id PK
        STRING email
        STRING status
        UUID tenant_id
        TIMESTAMP created_at
        TIMESTAMP deleted_at
    }

    invoice {
        UUID invoice_id PK
        UUID client_id FK
        NUMERIC amount
    }

    client ||--o{ invoice : has

Hold "Alt" / "Option" to enable pan & zoom

✅ Used for developer portal embedding and visual CI diffs

📈 6. `data-schema-observability.yaml`¶

client:
  operations:
    - insert:
        span_name: db.crm.client.insert
        trace_id_field: trace_id
    - delete:
        soft_delete: true
        span_name: db.crm.client.delete
        retention_policy: 7y

✅ Enables Observability Agent to link spans to table operations

📢 7. `DatabaseSchemaPublished` (Event)¶

{
  "event": "DatabaseSchemaPublished",
  "service": "ClientService",
  "boundedContext": "CRM",
  "schemaVersion": "v2.0",
  "traceId": "trace-data-112938",
  "artifactUrls": {
    "ddl": "https://schemas.connectsoft.dev/crm/schema-v2.sql",
    "migrations": "https://schemas.connectsoft.dev/crm/migrations/",
    "observability": "https://schemas.connectsoft.dev/crm/data-schema-observability.yaml"
  },
  "timestamp": "2025-05-01T23:00:00Z"
}

✅ Consumed by CI, Test Generator, DevOps Agent, Dev Portal

✅ Output Checklist¶

Output	Required
Declarative schema (`schema.sql`, `schema.yaml`)	✅
Migration strategy (SQL or FluentMigrator)	✅
Retention/compliance metadata	✅
Observability fields + spans	✅
Event emission	✅ with links and trace_id
Diagram output (optional but default)	✅ Mermaid preferred

📚 Knowledge Base Overview¶

The Data Architect Agent relies on a curated knowledge base of:

Schema design best practices
Domain → storage translation patterns
Column naming conventions
Migration strategies
Retention policies
Polyglot persistence heuristics

This enables the agent to make domain-informed, platform-aligned decisions while producing predictable, reusable data designs.

🧱 1. Naming Conventions & Translation Rules¶

Pattern	Output
`Address` VO → flatten	`address_line1`, `address_city`, `address_zip`
Enums stored as `string`	Column type → `varchar(20)`, constraint on values
Timestamps	Always use `TIMESTAMPTZ` with `DEFAULT now()`
Boolean fields	Use `is_*` or `_enabled` suffix: `is_active`, `email_verified`
UUIDs	Use `uuid` type, primary keys are `snake_case_id` (e.g., `client_id`)
Soft deletion	Field = `deleted_at TIMESTAMPTZ NULL`, queried via RLS or adapter policy
Multi-tenancy	Field = `tenant_id UUID NOT NULL`, partitioning enabled

📦 2. Schema Normalization & Denormalization Rules¶

Pattern	Applied When
Flatten VOs	If reused only once or shallow (e.g., `Address`)
Reference VOs	If reused across >3 aggregates or contains sub-VOs
Lookup tables for enums	If enum is user-extensible or over 10 values
Denormalized projections	For CQRS read models and Materialized Views
Array-of-VOs	Use JSONB or link-table (depending on access/read pattern)

🔁 3. Migration Strategy Heuristics¶

Migration Type	Rule
SQL Script	Preferred when schema is stable and cross-language
FluentMigrator	Required for .NET-first services using NHibernate or Dapper
Diff-based generation	Triggered when `previous-schema.yaml` is provided
Rollback support	Enabled only if `down.sql` or `Revert*.cs` migration is explicitly requested
Data seeding	Separate class or SQL script if `requires_seed_data: true` in schema.yaml

🔐 4. Retention & PII Enforcement Patterns¶

Field Type	Tag
`email`, `phone`, `ssn`, `dob`	`x-sensitive: true`, `encrypted: true`
`user_id`, `device_id`, `ip_address`	`audit_required: true`
`session`, `token`, `log` tables	`ttl: 30d` or custom expiration index
`tenant_id`	Always partition key and indexed
`deleted_at`	Required for soft-deletable entities

🧠 5. Semantic Memory — Schema Reuse Suggestions¶

Input	Suggested Output
`"Client aggregate with Address VO"`	Flatten `Address` inline unless reused by 3+ other aggregates
`"Status = [Active, Suspended, Inactive]"`	Enum → `status VARCHAR(20) NOT NULL` + constraint check
`"Invoice.Amount"`	Suggest `NUMERIC(10,2)` with `NOT NULL DEFAULT 0.00`
`"CreateAt"`	Suggest `TIMESTAMPTZ DEFAULT now()` and include in audit metadata
`"EventStore"`	Suggest table with `stream_id`, `sequence_number`, `event_type`, `payload JSONB`

📈 6. Optimization Patterns¶

Pattern	Used For
Composite Index (`email`, `status`)	Email-based filtering or dashboard search
Partial Index (`deleted_at IS NULL`)	Faster active-only queries
Covering Index (`client_id`, `created_at`, `amount`)	Optimized dashboards
FTS (Full-text search)	Only for designated `searchable_fields` in schema.yaml
GIN Index on JSONB	Used for polymorphic settings or tags in `event_metadata` column

🧪 7. Validation-Aware Templates¶

Validation	Injected Constraint
`minLength: 5`	`VARCHAR(255) CHECK (LENGTH(field) >= 5)`
`enum`	`CHECK (status IN ('Active', 'Suspended'))`
`range: 0-100`	`CHECK (score BETWEEN 0 AND 100)`
`pattern: email`	Optional `CHECK` or rely on app-level + observability tags

🔁 Changelog Awareness¶

Change Detected	Action
Field added → not required	Minor migration with `NULL` default
Field added → required	Migration with backfill or default
Field removed	Flag as breaking; require schema bump
Type changed	Flag as breaking; recommend `DROP ADD` or shadow copy
Constraint tightened	Suggest `ALTER ... CHECK` + validation audit

🔄 Process Flow Overview¶

The Data Architect Agent executes a structured, deterministic flow to translate domain models, validation rules, and compliance metadata into:

Schema files (.sql, .yaml)
Migrations (.sql, FluentMigrator .cs)
Diagrams (.mmd)
Metadata maps for observability and retention

The agent supports multiple backends (PostgreSQL, Cosmos, MongoDB, etc.) and migration styles based on input prompt strategy.

📋 Step-by-Step Lifecycle¶

Phase	Description

1️⃣ Input Parsing and Semantic Resolution¶

Load and validate:
- domain-model.yaml
- validation-map.yaml
- retention-policy.yaml
- application-architecture.md
- (Optional) previous-schema.yaml for diff-based migrations
Identify:
- Aggregates, VOs, Enums
- Table/entity boundaries
- Multitenant scoping rules
- Compliance and observability annotations

2️⃣ Table & Field Mapping¶

Map each aggregate → table
Flatten value objects into fields (unless reused widely)
Enum handling strategy (varchar, int, lookup table)
Resolve field types and nullability
Add tenant_id, trace_id, and created_at if required

3️⃣ Index, Constraint, and Retention Policy Generation¶

Generate:
- Primary and foreign keys
- Unique constraints
- Indexes (single, composite, partial)
Inject field constraints from validation-map.yaml:
- minLength, enum, CHECK(...), default
Tag PII and TTL fields:
- x-sensitive: true
- ttl: 30d
- encrypted: true
- audit_required: true

4️⃣ Migration Strategy Resolution¶

If migration_strategy = diff:
- Load previous-schema.yaml
- Generate migration script or class for:
  - Added/dropped/renamed fields
  - Modified constraints or types
If fluentmigrator:
- Create Migration/YYYYMMDD_HHMM_AddField.cs
If sql:
- Generate migration-<timestamp>-add-field.sql

🔄 Retry if type mismatch or unsafe drop detected (e.g., backfill or copy strategy required)

5️⃣ Schema File and DSL Output¶

Emit:
- schema.sql — full DDL snapshot
- schema.yaml — normalized data blueprint
- schema.version (for governance and trace)
- field-retention-map.yaml — used by compliance and governance agents

6️⃣ Observability Injection¶

Auto-inject:
- trace_id, correlation_id fields
- created_by, updated_by, deleted_by
- Span templates per operation: insert, update, delete
- Output: data-schema-observability.yaml

7️⃣ Documentation and Diagram Generation¶

Generate:
- er-diagram.mmd — Mermaid-based ERD
- data-relationships.md — Plaintext description of keys and joins
- Optional: export to PlantUML or SVG if output_format requires

8️⃣ Schema Publication and Event Emission¶

Emit:
- DatabaseSchemaPublished event
  → contains: service, version, trace_id, artifact URLs
Archive:
- Generated artifacts under /schema/v{X.Y.Z}/

📈 Telemetry & Span Injection¶

Span	Description
`schema_generation.duration_ms`	Total schema file generation time
`migration_generation.retry_count`	Retry count on invalid field evolution
`observability_injection.status`	Pass/fail on trace field mapping
`compliance_annotation.coverage_percent`	Percent of fields tagged as PII or retention-bound
`schema_diff.breaking_change_detected`	Boolean flag for version escalator

🧠 High-Level Flow¶

flowchart TD
    A[Parse Inputs] --> B[Map Aggregates to Tables]
    B --> C[Inject Validation + Constraints]
    C --> D[Generate Migrations]
    D --> E[Emit schema.sql / schema.yaml]
    E --> F[Inject Observability + PII]
    F --> G[Generate Diagram + Metadata]
    G --> H[Publish Event + Archive]

Hold "Alt" / "Option" to enable pan & zoom

🛠️ Skills and Kernel Functions¶

The Data Architect Agent is implemented as a modular, skill-driven pipeline using Semantic Kernel (.NET).
Each skill is independently observable, retry-safe, and designed to operate over domain-aligned inputs with structured traceability.

📋 Core Skills Used¶

🧩 1. Input Inference and Normalization¶

Skill	Purpose
`DomainModelParserSkill`	Parses `domain-model.yaml`, identifies aggregates, VOs, enums
`ValidationMapNormalizerSkill`	Loads and restructures field-level constraints
`RetentionPolicyMapperSkill`	Maps compliance metadata (e.g., `pii`, `ttl`) to fields
`TenantScoperSkill`	Ensures `tenant_id` is injected where needed

🧱 2. Schema and Field Modeling¶

Skill	Purpose
`TableBuilderSkill`	Converts aggregates → tables, maps primary keys, VOs
`FieldTypeInfererSkill`	Resolves types (`uuid`, `varchar`, `jsonb`, `timestampz`) from domain model
`VOFlattenerSkill`	Embeds VOs inline unless explicitly configured as reference
`EnumStrategySelectorSkill`	Determines how enums are stored (`int`, `varchar`, or lookup FK)
`ConstraintInjectorSkill`	Applies `CHECK`, `NOT NULL`, `DEFAULT`, and other column constraints

🔁 3. Migration Strategy and Evolution¶

Skill	Purpose
`SchemaDiffDetectorSkill`	Diffs current schema vs. `previous-schema.yaml` to detect changes
`SqlMigrationGeneratorSkill`	Emits `ALTER TABLE`, `ADD COLUMN`, etc. as `.sql` files
`FluentMigratorGeneratorSkill`	Emits C# migration class templates (`Up`, `Down`) with version tags
`MigrationSafetyCheckerSkill`	Flags unsafe ops (e.g., dropping columns, type narrowing) and injects fallback

🔒 4. Compliance and Retention Enforcement¶

Skill	Purpose
`PIITaggerSkill`	Tags fields with `x-sensitive`, `encrypted`, `redactable`
`RetentionPolicyPlannerSkill`	Generates field TTLs, table-level expirations
`RLSPolicyInjectorSkill`	Emits `CREATE POLICY` for Postgres-based RLS enforcement (if enabled)
`MaskingStrategySkill`	Suggests dynamic masking for SQL Server / Cosmos DB use cases

📈 5. Observability Injection¶

Skill	Purpose
`TraceFieldEnforcerSkill`	Ensures `trace_id`, `correlation_id`, `created_by` fields exist
`AuditColumnInjectorSkill`	Adds `created_at`, `updated_at`, `deleted_at`, `version` fields
`SpanTemplateEmitterSkill`	Outputs `data-schema-observability.yaml` with span templates
`TelemetryTaggerSkill`	Annotates key business fields (e.g., `invoice.amount`) for tracing agents

🖼️ 6. Documentation and Visualization¶

Skill	Purpose
`ERDiagramGeneratorSkill`	Generates Mermaid ER diagram (`er-diagram.mmd`)
`RelationshipMarkdownWriterSkill`	Produces `data-relationships.md` for DevPortal
`GlossaryLinkerSkill`	Cross-links fields to known terms in glossary.yaml (optional)

📤 7. Publishing and Events¶

Skill	Purpose
`SchemaFileWriterSkill`	Outputs `schema.sql`, `schema.yaml`, `field-retention-map.yaml`
`MigrationFileWriterSkill`	Emits `.sql` or `.cs` files based on strategy
`DatabaseSchemaPublishedEmitterSkill`	Emits contract event with trace_id and artifact URLs
`SchemaChangelogEmitterSkill` (optional)	Outputs version diffs if enabled via settings

🧠 Skill Composition Flow (Simplified)¶

flowchart TD
    A[DomainModelParserSkill] --> B[TableBuilderSkill]
    B --> C[ConstraintInjectorSkill]
    C --> D[MigrationStrategySkill]
    D --> E[Observability & Retention Injection]
    E --> F[SchemaFileWriterSkill]
    F --> G[DatabaseSchemaPublishedEmitterSkill]

Hold "Alt" / "Option" to enable pan & zoom

🔁 Retry-Safe Behaviors¶

Skill	Retry Trigger	Resolution
`FieldTypeInfererSkill`	Unknown or ambiguous type	Fallback to `text`, flag warning
`MigrationSafetyCheckerSkill`	Unsafe change detected	Suggest `DROP/ADD` pair with backfill
`PIITaggerSkill`	Unmapped sensitive field	Suggest tag from memory or glossary
`FluentMigratorGeneratorSkill`	Name conflict in class	Auto-rename or suffix with timestamp

📈 Metrics Emitted (Per Skill)¶

Metric	Example
`skill_execution_duration_ms{name}`	`ConstraintInjectorSkill: 24ms`
`retry_count{name}`	`FieldTypeInfererSkill: 1 retry`
`sensitive_fields_detected_total`	`email`, `phone`, `ssn`
`migration_artifact_emitted_total`	`3 .sql, 1 .cs`
`er_diagram_generated: true`	Mermaid flow verified

🛠️ Technologies Used¶

The Data Architect Agent leverages a robust, cloud-native, and polyglot-compatible toolchain to generate, validate, and publish data schemas and migrations across multiple database engines and migration systems — fully aligned with ConnectSoft’s automation-first strategy.

🔷 1. Core Platforms & Frameworks¶

Technology	Purpose
Semantic Kernel (.NET)	Skill orchestration, input coordination, retry logic
FluentMigrator (.NET)	C#-based declarative migration generation (`Up()`/`Down()` methods)
PostgreSQL	Default SQL target for multi-tenant SaaS environments
SQL Server	Optional support for legacy and enterprise integrations
Mermaid.js	Visual generation of ER diagrams from schema structure
OpenTelemetry SDK	Trace/span emission per skill and contract generation step

📜 2. Schema Output Formats¶

Format	Purpose
`schema.sql`	Declarative PostgreSQL/SQL Server DDL
`schema.yaml`	Platform-native schema description used across agents
`migration-*.sql`	Forward-only or versioned migrations
`*.cs` (FluentMigrator)	.NET-first imperative migrations
`er-diagram.mmd`	Graphical entity relationships (for DevPortal + docs)
`field-retention-map.yaml`	Compliance metadata for security/governance agents
`data-schema-observability.yaml`	Spans and trace fields for runtime agents

🔁 3. Migration Strategies Supported¶

Strategy	Tools	Description
Declarative DDL	`schema.sql`, `schema.yaml`	Single-file representation; used as baseline
FluentMigrator	`.cs` C# classes	Used in CI/CD pipeline for .NET microservices
Diff-based SQL	`migration-*.sql`	Generated from previous schema snapshots (`ALTER TABLE`, etc.)
Rollback support	FluentMigrator or optional `down.sql`	Optional if configured in prompt or CI policy

☁️ 4. Cloud-Native Support Targets¶

Provider	Component
Azure SQL Database	Target for production relational workloads
Azure Cosmos DB (Mongo API)	NoSQL collections with TTL/index rules
Azure Storage Blobs	Used for uploading schema bundles + changelogs
Azure Event Grid	Publishes `DatabaseSchemaPublished` event
Azure DevOps	CI pipeline triggers + artifact deployment

🧠 5. Semantic Memory & Validation Helpers¶

Tool	Purpose
ConnectSoft Schema Registry	Stores historical schema snapshots and changelogs
SQLFluff / pgFormatter	Optional formatting and linting of generated `.sql` files
DiffEngine	Computes field-level diffs and detects breaking schema changes
Governance Rule Engine	Ensures compliance with ConnectSoft standards (naming, retention, versioning)

🔐 6. Security & PII Tagging Compatibility¶

System	Integration
Microsoft Purview	Tag fields with `pii`, `sensitive` for catalog sync
Dynamic Data Masking (SQL Server)	Emit mask metadata (`MASKED WITH FUNCTION`)
PostgreSQL RLS	Inject `CREATE POLICY` and `ALTER TABLE ENABLE RLS`
Field-level encryption hints	Optional `x-encrypted: true` metadata for KMS-aware adapters

📈 7. Observability + Telemetry Infrastructure¶

Tool	Output
OpenTelemetry	Span for each step of schema generation and migration writing
Prometheus (via Observability Agent)	Emits metrics: number of fields, retry count, PII coverage
Application Insights	Track errors, latency, and field-level validation failures

✅ Output Delivery Channels¶

Artifact	Destination
`schema.sql`, `migration.sql`	Artifact Storage API
`.cs` (FluentMigrator)	.NET Service Repositories
`DatabaseSchemaPublished`	Event Grid + CI/CD trigger
Diagrams	Developer Portal UI
Observability Map	Observability Agent and tracing systems

📜 System Prompt (Bootstrapping Instruction)¶

The System Prompt defines the agent’s core instructions, design rules, and execution logic when activated by the ConnectSoft AI Factory.
It ensures consistency, governance compliance, and repeatable schema generation across all services and domains.

✅ Full System Prompt (Plain Text)¶

You are the **Data Architect Agent** in the ConnectSoft AI Software Factory.

Your responsibility is to translate domain models, value objects, enums, validation rules,  
and retention/compliance policies into versioned, cloud-ready **data schemas and migrations**.

---

## Your Core Goals:

1. Produce:
   - `schema.sql` (relational DDL)
   - `schema.yaml` (ConnectSoft platform DSL)
   - `migration-*.sql` or `*.cs` (FluentMigrator classes)
   - `er-diagram.mmd` (visual documentation)
   - `field-retention-map.yaml` (compliance tags)
   - `data-schema-observability.yaml` (telemetry)
   - `DatabaseSchemaPublished` (contract event)

2. Map:
   - Aggregates → Tables or Collections
   - Value Objects → Embedded or Normalized fields
   - Enums → Varchar, int, or lookup tables (based on reuse)
   - Field constraints (`required`, `minLength`, etc.) → DB constraints
   - Retention/PII fields → masking/encryption/TTL metadata

3. Ensure:
   - Multitenancy (`tenant_id` field is included and indexed)
   - Traceability (`trace_id`, `correlation_id`, `created_at`)
   - Referential integrity (primary, foreign, unique constraints)
   - Observability span mapping per table operation
   - Compatibility with FluentMigrator or SQL-first CI/CD pipelines

---

## Validation Rules:

- Every table must have a primary key
- Indexes must not overlap or duplicate constraints
- If RLS is enabled, inject `CREATE POLICY`
- If migration strategy = diff, compare `previous-schema.yaml`
- Soft-deletable entities must include `deleted_at`
- Sensitive fields (email, ssn, dob) must be tagged with `x-sensitive: true`
- Enum fields must have value validation (`CHECK (status IN (...))`)
- All schema artifacts must include `trace_id`, `service_name`, `version`

---

## Migration Strategy Modes:

- `fluentmigrator` → emit `.cs` files using `Migration(YyyyMMdd_HHmm)` format
- `sql` → emit timestamped forward-only `.sql` files
- `diff` → compare against previous schema and generate additive diffs
- All strategies must support rollback or changelog tagging (if configured)

---

## Publication:

When schema generation is complete:
- Emit a `DatabaseSchemaPublished` event with artifact links, version, and `trace_id`
- Archive schema bundle for CI/CD and downstream test/migration runners

---

## Output Format and Structure:

- Use snake_case for table and column names
- Prefer `uuid`, `timestampz`, `varchar`, `jsonb` types
- Enforce DDD boundaries — no DTOs, only domain-derived structures
- Avoid circular foreign keys unless explicitly declared
- Each field in `schema.yaml` must contain `type`, `required`, and `observability` attributes

---

## Versioning:

- Changes to field type, constraints, or deletion = **breaking** → bump major version
- New nullable fields or optional indexes = **non-breaking** → bump minor version
- Deprecations must be marked and included in changelog

🧠 Summary of Behavior¶

The system prompt ensures the Data Architect Agent:

Builds clean, domain-aligned relational schemas
Emits machine- and human-readable contracts
Complies with observability, retention, and security standards
Publishes results for immediate downstream use in DevOps and governance

📥 Input Prompt Template¶

The Data Architect Agent receives a structured YAML prompt from the ConnectSoft orchestration layer.
This defines:

The service context
Source artifacts
Desired migration strategy
Compliance and observability enforcement options
Output formatting rules

✅ Full Input Prompt Example (YAML)¶

assignment: "generate-schema"

project:
  trace_id: "trace-data-55623"
  project_id: "connectsoft-saas"
  bounded_context: "Billing"
  service_name: "InvoiceService"
  agent_version: "1.3.0"

inputs:
  domain_model_url: "https://artifacts.connectsoft.dev/billing/domain-model.yaml"
  validation_rules_url: "https://artifacts.connectsoft.dev/billing/validation-map.yaml"
  retention_policy_url: "https://artifacts.connectsoft.dev/billing/field-retention-policy.yaml"
  app_architecture_url: "https://artifacts.connectsoft.dev/billing/application-architecture.md"
  glossary_url: "https://artifacts.connectsoft.dev/glossary/billing.yaml"
  previous_schema_url: "https://artifacts.connectsoft.dev/billing/schema-v1.yaml"  # optional

settings:
  migration_strategy: "fluentmigrator"     # options: fluentmigrator, sql, diff
  output_format: ["sql", "yaml", "erd"]
  include_observability_map: true
  publish_contract: true
  generate_down_migrations: false

🧾 Field Descriptions¶

🧠 `project`¶

Field	Description
`trace_id`	Correlation trace for full execution lineage
`project_id`	Root-level SaaS/project group
`bounded_context`	Domain boundary this service belongs to
`service_name`	Name of the microservice
`agent_version`	Which version of the Data Architect Agent is executing this spec

📂 `inputs`¶

Artifact	Description
`domain_model_url`	Aggregates, value objects, and enums
`validation_rules_url`	Field-level rules: min/max, format, nullable
`retention_policy_url`	PII, TTL, encryption, audit settings
`app_architecture_url`	Multitenancy, soft-deletion, RLS enforcement
`glossary_url`	Optional — aligns naming and reused concepts
`previous_schema_url`	Optional — enables diffing, changelog, and migration strategy derivation

⚙️ `settings`¶

Field	Purpose
`migration_strategy`	Determines output: `.sql`, `.cs`, or auto-diff
`output_format`	Declares desired formats (can include diagrams, schema DSL)
`include_observability_map`	Emits `data-schema-observability.yaml`
`publish_contract`	Emits `DatabaseSchemaPublished` event
`generate_down_migrations`	Whether to emit `Down()` in FluentMigrator (default false)

🧠 Minimal Example Prompt¶

assignment: generate-schema
project:
  service_name: ClientService
  trace_id: trace-client-774

inputs:
  domain_model_url: https://.../client/domain-model.yaml
  validation_rules_url: https://.../client/validation.yaml

settings:
  migration_strategy: sql
  output_format: [yaml]

✅ Validation Rules¶

Rule	Description
Must include `trace_id`, `service_name`, and at least `domain_model_url`	✅
`validation_rules_url` and `retention_policy_url` recommended for compliance coverage	✅
If `migration_strategy = diff`, `previous_schema_url` is required	✅
If `publish_contract: true`, all outputs must pass validation before event emission	✅

🔁 Compatibility Notes¶

All inputs are trace-linked and version-aware
Fully compatible with:
- Vision Architect Agent
- Application Architect Agent
- Domain Modeler Agent
- Compliance Agent
- Governance Dashboards
- DevOps Pipelines

📤 Output Expectations¶

The Data Architect Agent produces a complete, traceable, and deployable data definition package.
These outputs are consumed by:

🧱 Adapter Generator Agent (for ORM/entity generation)
🚀 DevOps Architect Agent (for pipeline execution)
🧪 Test Generator Agent (for fixture bootstrapping)
🧭 Governance Dashboards (for audit and compliance validation)

All artifacts include traceability, versioning, retention mapping, and observability hooks.

📦 Output Artifact Summary¶

Artifact	Format	Description
`schema.sql`	SQL	Declarative schema DDL for relational DBs
`schema.yaml`	YAML	Agent-readable DSL version of schema
`migration-*.sql`	SQL	Forward-only or reversible migrations
`Migrations/*.cs`	C#	FluentMigrator migrations (imperative)
`field-retention-map.yaml`	YAML	Field-level retention/PII classification
`data-schema-observability.yaml`	YAML	Field-span mapping, trace injection, audit coverage
`er-diagram.mmd`	Mermaid	ER diagram for dev portal visualization
`DatabaseSchemaPublished`	JSON	CI/CD-triggered contract event

🧾 Detailed Artifact Descriptions¶

🟦 1. `schema.sql`¶

A canonical DDL export aligned with domain structure and constraints

Includes:
- Table definitions
- Primary/foreign keys
- Indexes, CHECK constraints
- Nullable/required rules
- Default values
Compatible with PostgreSQL and SQL Server dialects

🟨 2. `schema.yaml`¶

Abstract schema model for platform-wide reuse

tables:
  - name: invoice
    fields:
      - name: invoice_id
        type: uuid
        primary_key: true
        required: true
      - name: status
        type: varchar(20)
        enum: InvoiceStatus
        pii: false
      - name: tenant_id
        type: uuid
        required: true
        indexed: true
    retention:
      status: 5y
    observability:
      trace_id: true
      created_at: true

🟪 3. Migrations¶

Type	Output
SQL	`migration-20250429-add-table-client.sql`
FluentMigrator	`Migrations/20250429_AddTableClient.cs`

Includes Up() (always), Down() (if enabled), with trace annotations

🟥 4. `field-retention-map.yaml`¶

Compliance-oriented metadata extracted from policy inputs

invoice.status:
  pii: false
  retention: 7y
  encrypted: false

client.email:
  pii: true
  encrypted: true
  redactable: true

🟩 5. `data-schema-observability.yaml`¶

Tells the Observability Agent what to trace at the DB level

tables:
  client:
    spans:
      - insert:
          span_name: db.crm.client.insert
          trace_fields: [trace_id, correlation_id]
      - update:
          span_name: db.crm.client.update

🟫 6. `er-diagram.mmd`¶

erDiagram
    invoice {
        UUID invoice_id PK
        UUID client_id FK
        NUMERIC amount
        TIMESTAMP created_at
    }
    client {
        UUID client_id PK
        STRING email
        TIMESTAMP deleted_at
    }
    client ||--o{ invoice : has

Hold "Alt" / "Option" to enable pan & zoom

📩 7. `DatabaseSchemaPublished` Event¶

Field	Description
`event`	`DatabaseSchemaPublished`
`service`	e.g., `InvoiceService`
`schemaVersion`	e.g., `v2.0.1`
`traceId`	Correlation ID
`artifactUrls`	DDL, DSL, diagrams, retention, observability
`timestamp`	UTC time of publication

Enables downstream jobs and dashboards to sync and validate schema artifacts

✅ Output Compliance Criteria¶

Rule	Enforced
All tables have `primary_key`, `created_at`, `tenant_id`	✅
Sensitive fields include `x-sensitive`, `pii`, `encrypted` tags	✅
Span templates exist for `insert`, `update`, `delete` ops	✅
Version bump triggered on any breaking change	✅
Traceability fields present in all outputs	✅ `trace_id`, `project_id`, `service_name`

🧠 Memory Strategy Overview¶

The Data Architect Agent uses structured and semantic memory to:

Reuse known field structures, enum mappings, and constraints
Apply naming, compliance, and index patterns consistently
Detect historical schema versions and generate intelligent diffs
Prevent unsafe or duplicated migrations
Assist in automated versioning and changelog generation

This memory system operates in two scopes:
🕐 Short-term (session memory) and 📚 Long-term (knowledge base).

🕐 Short-Term Memory¶

Memory Area	Purpose
`session.trace_id`	Tracks lineage of current generation run
`field_definitions[]`	Captures all resolved fields + inferred types/constraints
`aggregates[]`	List of processed aggregates and their mappings
`value_object_usage[]`	Tracks flattened vs referenced VOs
`generated_migrations[]`	Avoids duplication and helps deduplicate FluentMigrator class names
`schema_diff_cache`	Temporarily holds change list between old and new schema for rollback/scoping logic

📚 Long-Term Semantic Memory¶

1. 📦 Reusable Schema Fragments¶

Pattern	Description
`Money`	`{ amount NUMERIC(10,2), currency VARCHAR(3) }`
`Email`	`{ email VARCHAR(255) NOT NULL CHECK (email LIKE '%@%') }`
`AuditTrailFields`	`created_at`, `updated_at`, `deleted_at`, `created_by`, `trace_id`
`Multitenancy`	`tenant_id UUID NOT NULL`, indexed, included in all queries

2. 🔐 Compliance & PII Tagging¶

Trigger	Output
Field name matches `"email"`, `"ssn"`, `"dob"`	Auto-tag with `x-sensitive`, `encrypted`, `audit_required`
Field flagged as TTL	Inject `x-ttl-days` and mark for storage pruning or indexing
Service is in a regulated domain (e.g., `Insurance`, `Healthcare`)	Enforce retention mapping and trace coverage

3. 📑 Version History Ledger¶

Tracked	Use
`schema@v1`, `schema@v2`	Enables diff-based migration generation
`field_changes[]`	Auto-generates changelogs: added, removed, modified
`breaking_changes[]`	Triggers version bump logic and rollback plan (if allowed)
`last_seen_field_states[]`	Prevents downgrades or regressions in schema type evolution

4. 🧱 Migration Pattern Matching¶

Detection	Memory Action
Field type widening (e.g., `varchar(50)` → `varchar(255)`)	Mark as safe
Enum extension	Log, suggest adding `CHECK` with extended IN clause
Field drop attempt	Flag as unsafe unless wrapped in reversible strategy
New VO used	Suggest reuse pattern based on prior flatten/reference ratio

5. 🧠 Glossary-Linked Metadata¶

Concept	Used For
`"Address"`	Suggest common VO flattening structure (`address_line1`, `city`, `zip`)
`"Status"`	Enum field with shared value set (`Active`, `Suspended`, `Pending`)
`"AuditTrail"`	Standard field group injection + `x-audit-required: true`

🧪 Memory-Driven Changelog Generation¶

When previous_schema.yaml is present, memory diff engine emits:

changelog:
  version: v2.0.0
  breaking_changes:
    - field_removed: client.middle_name
    - constraint_changed: invoice.amount now NOT NULL
  additions:
    - table: payment_method
    - field_added: invoice.due_date

🧠 Semantic Memory Match Example¶

query: "field: invoice.amount"
match:
  reuse_pattern: Money
  type: NUMERIC(10,2)
  constraint: NOT NULL DEFAULT 0.00
  traceability: true
  audit_required: true
  last_used_by: InvoiceService

✅ Benefits of Memory Strategy¶

Capability	Value
Schema reuse	Faster, consistent generation across services
Compliance tagging	Reduced errors, governance integration
Safe migration evolution	Intelligent diffs and rollback logic
Version diffing	Enables changelog and CI/CD auditability
Observability propagation	Preserves trace span coverage across evolutions

✅ Validation & Correction Framework¶

The Data Architect Agent executes multi-phase schema validation, correction, and recovery logic across all artifacts it generates.
It ensures data designs are:

Structurally valid
Compliant with domain and validation rules
Backward-compatible (when possible)
Safe for CI/CD deployment and migration

🔍 Validation Phases¶

1️⃣ Schema Structure & Syntax Validation¶

Target	Check
`schema.sql`	Valid syntax per SQL dialect (PostgreSQL, SQL Server)
`schema.yaml`	Valid against ConnectSoft schema DSL schema
`migration.sql`	Executes in dry-run parser or mock DB
`fluentmigrator.cs`	Compiles with no semantic errors
`enum definitions`	Ensure values match usage and constraints

2️⃣ Domain ↔ Storage Consistency Validation¶

Rule	Enforced
Every aggregate maps to one table or collection	✅
Every required field must be `NOT NULL`	✅
Field constraints match `validation-map.yaml`	✅ (e.g., length, min/max, format)
Enum fields have correct constraint (`CHECK` or FK)	✅
Field types are semantically consistent (`Email` → `varchar`)	✅

3️⃣ Migration Safety Validation¶

Rule	Check
Dropping a column triggers warning + audit event	⚠️
Changing nullable → non-null requires default or backfill	✅
Field renames must include mapping	✅
Duplicated migrations by name → flagged	✅
New table already exists → abort or suggest suffixing

4️⃣ Compliance & Observability Validation¶

Rule	Action
Missing `trace_id`, `created_at`, `tenant_id` → inject	🔁
PII field detected but untagged → tag as `x-sensitive`	🔁
No observability span for insert/update/delete → warn and generate defaults	🔁
Retention policy undefined for `ttl` fields → assign fallback or escalate	⚠️

🔁 Retry & Auto-Correction Flow¶

Skill	Error	Retry Action
`FieldTypeInfererSkill`	Unknown domain type	Fallback to `text` or `jsonb` with comment
`MigrationSafetyCheckerSkill`	Unsafe drop	Remove drop, inject deprecation comment
`PIITaggerSkill`	Detected untagged field	Search glossary or pattern match
`EnumStrategySelectorSkill`	Enum mismatch	Convert to `varchar(20)` or FK lookup table
`TraceFieldEnforcerSkill`	`trace_id` missing	Add field + observability span

📜 Correction Example¶

{
  "trace_id": "data-trace-44192",
  "skill": "MigrationSafetyCheckerSkill",
  "retry": 1,
  "issue": "Unsafe type change: amount NUMERIC → INTEGER",
  "resolution": "Suggested add new column 'amount_int' and backfill from 'amount'"
}

📈 Metrics Collected¶

Metric	Description
`schema_validation_passed_total`	Count of successful full schema validations
`migration_validation_failed_total`	Count of rejected or unsafe migration attempts
`pii_fields_tagged_total`	Number of `x-sensitive` fields detected
`observability_span_injected_total`	Operations auto-covered
`retry_count_by_skill`	Diagnostic metric for agent behavior tuning

🚨 Escalation Conditions¶

Condition	Action
3+ retries in same skill	Emit `SchemaGenerationWarning`
Unsafe column deletion without `--SAFE_DROP` tag	Emit blocking error, abort pipeline
Trace fields missing from multiple tables	Block observability publication
Version mismatch with `schema.version`	Warn CI and halt `DatabaseSchemaPublished` event

✅ Recovery Guarantee¶

Most failures are resolved via:

Pattern-based correction
Semantic fallback injection
Memory-guided type resolution
Retry with normalized constraint variants

Final output is only published if:

✅ All artifacts pass validation
✅ Changelog is emitted (if breaking)
✅ Observability and compliance coverage ≥ 90%

🤝 Collaboration Interfaces¶

The Data Architect Agent integrates seamlessly with both upstream semantic agents and downstream operational agents to maintain data-design integrity across the ConnectSoft ecosystem.

It consumes domain and application structure from upstream agents and produces schema, migration, and metadata artifacts for others to consume in deployment, validation, and visualization flows.

🔼 Upstream Inputs (Dependencies)¶

Agent	Artifact
Domain Modeler Agent	`domain-model.yaml`, `validation-map.yaml`, `value-objects.yaml`
Application Architect Agent	`use-cases.yaml`, `application-architecture.md` (multi-tenancy, soft delete, access patterns)
Compliance Architect Agent	`field-retention-policy.yaml`, PII rules
Solution Architect Agent	Overall platform schema strategy (shared DB, RLS, etc.)
Glossary Agent (optional)	Term-to-field mappings and reusable VO field sets

🔽 Downstream Consumers¶

Agent	Artifacts Consumed
Adapter Generator Agent	`schema.yaml`, `schema.sql`, `field-retention-map.yaml`
→ Generates ORM models, query layer, value mappers

| DevOps Architect Agent | migration-*.sql, Migrations/*.cs, schema.version, DatabaseSchemaPublished
→ Integrates schema deployment into CI/CD pipelines |

| Test Generator Agent | schema.yaml, schema.sql
→ Creates fixtures, seed data, integration DB tests |

| Observability Agent | data-schema-observability.yaml
→ Generates span mapping and access telemetry rules |

| Compliance Dashboard Agent | field-retention-map.yaml, schema lineage
→ Ensures traceable PII/redaction coverage |

| Developer Portal Generator | er-diagram.mmd, schema.yaml, data-relationships.md
→ Builds visual API-to-DB relationship diagrams and schema browser |

📢 Event-Driven Interface¶

Event	Description
`DatabaseSchemaPublished`	Published once all validation and artifact generation passes
→ Contains artifact links, trace_id, schemaVersion, service, timestamp
`SchemaGenerationWarning` (optional)	Emitted when unsafe migrations, missing PII tags, or trace fields are detected but auto-corrected

🧠 Inter-Agent Traceability¶

All generated artifacts include trace metadata:

trace_id: data-trace-000112
agent_version: 1.3.0
service_name: InvoiceService
bounded_context: Billing
schema_version: v2.1.0
generated_on: 2025-05-01T23:15:00Z

✅ Enables end-to-end tracing across modeling, schema, tests, and gateway configuration

🔄 Collaboration Flow Diagram¶

flowchart TD
  DomainModeler -->|domain-model.yaml| DataArchitect
  ApplicationArchitect -->|use-cases.yaml| DataArchitect
  ComplianceArchitect -->|retention-policies.yaml| DataArchitect
  DataArchitect --> AdapterGenerator
  DataArchitect --> DevOpsArchitect
  DataArchitect --> TestGenerator
  DataArchitect --> ObservabilityAgent
  DataArchitect --> DeveloperPortal

Hold "Alt" / "Option" to enable pan & zoom

🔐 Contract Compliance for Consumers¶

Constraint	Applied
Migration consumers must use provided version tag (`schema.version`)	✅
ORM codegen must respect nullable and encrypted flags	✅
PII fields must be consumed only via compliant field mappers	✅
Tests must use fixture-safe schema versions (`schema.yaml@vN`)	✅
Observability spans must align with table/action trace templates	✅

✅ Outcomes of Agent Collaboration¶

Capability	Benefit
Unified source of truth for DB schema	✅ Every agent reads from `schema.yaml`
Safe migration rollout via CI/CD	✅ DevOps Agent applies only validated `.sql` / `.cs`
Traceable PII and retention policies	✅ Compliance & Security Dashboards stay aligned
Full-stack visibility	✅ API calls trace down to fields and rows in DB

📡 Observability Hooks¶

The Data Architect Agent is fully observable and auditable via:

OpenTelemetry spans emitted during each skill's execution
Lifecycle events for schema generation and migration publication
Semantic trace metadata embedded in all output files

These ensure complete transparency into what was generated, why, and how it evolved.

📈 Key Telemetry Metrics¶

Metric	Description
`schema_generation_duration_ms`	Total time to parse, generate, and validate all artifacts
`pii_fields_tagged_total`	Number of fields identified as sensitive/PII
`migration_retry_count_total`	Count of fallback/correction attempts during migration creation
`observability_field_injection_rate`	% of tables with required span fields (`trace_id`, `created_at`, etc.)
`breaking_change_detection_count`	How often the agent identifies and manages schema-breaking diffs

🧠 Span Emission Model¶

Each skill emits a span:

{
  "span_name": "SchemaBuilderSkill",
  "trace_id": "data-trace-11722",
  "duration_ms": 42,
  "agent_version": "1.3.0",
  "service_name": "InvoiceService",
  "status": "success"
}

These spans are collected by the Observability Agent and visible in ConnectSoft’s trace dashboards.

🧍 Human Oversight Triggers¶

Although autonomous, the agent flags and escalates edge cases such as:

Trigger	Escalation
PII fields without tagging → 3+ retries	`SchemaGenerationWarning` emitted
Unsafe `DROP COLUMN` without rollback support	Migration halted, escalated to Governance Officer
Traceability fields (`trace_id`, `tenant_id`) missing on multiple entities	Event suppressed until corrected
Schema version mismatch or illegal downgrade	Deployment blocked, audit note added

🔐 Compliance and Governance Integration¶

All output files are tagged with:
- agent_version, trace_id, schema_version, bounded_context, service_name
field-retention-map.yaml and data-schema-observability.yaml are synced with:
- Compliance Dashboards
- Data Loss Prevention Auditors
- Governance Agents

📢 Lifecycle Event: `DatabaseSchemaPublished`¶

Field	Description
`event`	`DatabaseSchemaPublished`
`schemaVersion`	Auto-computed via semver diff engine
`artifactUrls`	Links to schema, migrations, diagram, retention map
`generated_by`	`Data Architect Agent v1.3.0`
`trace_id`	Traceable execution lineage
`timestamp`	UTC timestamp

✅ Enables downstream CI/CD pipelines, test agents, and artifact promotion workflows

📘 Audit Trail Support¶

Each generated migration or schema file includes:

metadata:
  trace_id: "data-trace-11233"
  schema_version: "v2.1.0"
  service: "InvoiceService"
  generated_on: "2025-05-01T23:42:00Z"
  generated_by: "Data Architect Agent v1.3.0"

✅ Enables long-term schema evolution lineage and rollback validation

✅ Final Outcomes¶

The Data Architect Agent enables ConnectSoft to:

Operate with governed, reusable, traceable schema designs
Automate migration generation with confidence and auditability
Align field-level data design with domain, compliance, and observability
Support multi-agent cooperation across platform, infrastructure, and business domains

📦 Output Summary¶

Artifact	Purpose
`schema.sql`, `schema.yaml`	Declarative schema
`migration.sql`, `.cs`	Forward-compatible evolutions
`field-retention-map.yaml`	PII, TTL, compliance tracking
`observability.yaml`	Span coverage and trace field injection
`er-diagram.mmd`	Visual schema overview
`DatabaseSchemaPublished`	Event trigger for all downstream agents

🧠 Data Architect Agent Specification¶

🎯 Purpose¶

💡 Why This Agent Exists¶

🧱 Design Scope¶

🔁 Multi-Migration Strategy Support¶

🗂️ Example Output Bundle¶

🧠 Role in the Agentic Lifecycle¶

🧭 Summary¶

🏛️ Scope of Influence¶

📘 Logical Scope¶

🧱 Physical Scope¶

🔄 Temporal Scope¶

🔐 Compliance and Security Scope¶

🚫 Out of Scope¶

📦 Output Influence Summary¶

🧭 ConnectSoft Scope Matrix¶

📋 Core Responsibilities¶

📘 1. Schema Design Responsibilities¶

🔁 2. Migration and Evolution Responsibilities¶

🧪 3. Validation and Consistency Enforcement¶

📑 4. Retention, Compliance, and Security¶

🧭 5. Observability and Metrics Integration¶

📄 6. Documentation and Diagrams¶

✅ Summary of Responsibilities¶

📥 Core Inputs¶

📘 Required Input Artifacts¶

📄 Input Examples¶

🔶 domain-model.yaml¶

🔶 validation-map.yaml¶

🔶 field-retention-policy.yaml¶

🔶 application-architecture.md (Excerpt)¶

⚠️ Input Validation Rules¶

🧠 Semantic Memory Integration (Optional)¶

📦 Input Prompt Template (YAML)¶

📤 Core Outputs¶

📦 Primary Output Artifacts¶

📘 1. schema.sql¶

📘 2. schema.yaml¶

📦 3. Migrations¶

🔹 migration-<timestamp>-add-column.sql¶

🔹 Migrations/20250428_AddMarketingOptIn.cs¶

📑 4. field-retention-map.yaml¶

🗺️ 5. er-diagram.mmd¶

📈 6. data-schema-observability.yaml¶

📢 7. DatabaseSchemaPublished (Event)¶

✅ Output Checklist¶

📚 Knowledge Base Overview¶

🧱 1. Naming Conventions & Translation Rules¶

📦 2. Schema Normalization & Denormalization Rules¶

🔁 3. Migration Strategy Heuristics¶

🔐 4. Retention & PII Enforcement Patterns¶

🧠 5. Semantic Memory — Schema Reuse Suggestions¶

📈 6. Optimization Patterns¶

🧪 7. Validation-Aware Templates¶

🔁 Changelog Awareness¶

🔄 Process Flow Overview¶

📋 Step-by-Step Lifecycle¶

1️⃣ Input Parsing and Semantic Resolution¶

2️⃣ Table & Field Mapping¶

3️⃣ Index, Constraint, and Retention Policy Generation¶

4️⃣ Migration Strategy Resolution¶

5️⃣ Schema File and DSL Output¶

6️⃣ Observability Injection¶

7️⃣ Documentation and Diagram Generation¶

8️⃣ Schema Publication and Event Emission¶

📈 Telemetry & Span Injection¶

🧠 High-Level Flow¶

🛠️ Skills and Kernel Functions¶

📋 Core Skills Used¶

🧩 1. Input Inference and Normalization¶

🧱 2. Schema and Field Modeling¶

🔁 3. Migration Strategy and Evolution¶

🔒 4. Compliance and Retention Enforcement¶

📈 5. Observability Injection¶

🖼️ 6. Documentation and Visualization¶

📤 7. Publishing and Events¶

🧠 Skill Composition Flow (Simplified)¶

🔁 Retry-Safe Behaviors¶

📈 Metrics Emitted (Per Skill)¶

🛠️ Technologies Used¶

🔶 `domain-model.yaml`¶

🔶 `validation-map.yaml`¶

🔶 `field-retention-policy.yaml`¶

🔶 `application-architecture.md` (Excerpt)¶

📘 1. `schema.sql`¶

📘 2. `schema.yaml`¶

🔹 `migration-<timestamp>-add-column.sql`¶

🔹 `Migrations/20250428_AddMarketingOptIn.cs`¶

📑 4. `field-retention-map.yaml`¶

🗺️ 5. `er-diagram.mmd`¶

📈 6. `data-schema-observability.yaml`¶

📢 7. `DatabaseSchemaPublished` (Event)¶

🧠 `project`¶

📂 `inputs`¶

⚙️ `settings`¶

🟦 1. `schema.sql`¶

🟨 2. `schema.yaml`¶

🟥 4. `field-retention-map.yaml`¶

🟩 5. `data-schema-observability.yaml`¶

🟫 6. `er-diagram.mmd`¶

📩 7. `DatabaseSchemaPublished` Event¶

📢 Lifecycle Event: `DatabaseSchemaPublished`¶