
๐Ÿ—„๏ธ Database Blueprint

๐Ÿ“˜ What Is a Database Blueprint?

A Database Blueprint is a structured, agent-generated artifact that defines the database schema design, migration strategies, query optimization plans, and multi-tenant data isolation for a ConnectSoft-generated component โ€” whether it's a microservice, module, bounded context, or shared data platform.

It represents the data persistence contract of record, created during the generation pipeline and continuously evaluated by downstream data engineering agents, DevOps pipelines, and runtime observability systems.

In the AI Software Factory, the Database Blueprint is not just a schema dump โ€” it's a machine-readable, agent-enforceable contract for how data is structured, accessed, protected, migrated, and scaled.


🧠 Blueprint Roles in the Factory

The Database Blueprint plays a pivotal role in making data persistence composable, observable, and resilient:

  • 🗄️ Defines table structures, relationships, constraints, and index strategies
  • 🔄 Manages version-controlled migrations with rollback plans and zero-downtime strategies
  • 🧠 Enables AI agents to enforce schema standards, naming conventions, and performance budgets
  • 🏢 Encodes multi-tenant data isolation patterns: RLS, schema-per-tenant, or database-per-tenant
  • 📊 Drives query performance baselines, index optimization, and execution plan monitoring
  • 💾 Specifies backup schedules, RPO/RTO targets, and disaster recovery procedures

It ensures that data architecture is not an afterthought, but a first-class agent responsibility in the generation pipeline.


🎯 Purpose and Motivation

The Database Blueprint exists to solve one of the most persistent and costly problems in modern software delivery:

"Database schemas are inconsistently designed, migrations are fragile, queries degrade silently, and tenant data boundaries are poorly enforced."

In the ConnectSoft AI Software Factory, data persistence is integrated at the blueprint level, making it:

  • ✅ Deterministic (agent-generated, based on traceable domain models)
  • ✅ Repeatable (diffable migrations validated through CI/CD)
  • ✅ Observable (integrated with query metrics, drift detection, and performance budgets)
  • ✅ Composable (aligned with service, security, and infrastructure blueprints)

🚨 Problems It Solves

| Problem Area | How the Database Blueprint Helps |
| --- | --- |
| 🧩 Inconsistent Schema Designs | Enforces naming conventions, type standards, and constraint patterns |
| 🔄 Unmanaged Migrations | Generates version-controlled, rollback-safe migrations with zero-downtime strategies |
| 📉 Poor Query Performance | Defines performance budgets, index strategies, and execution plan baselines |
| 🏢 Tenant Data Leakage | Enforces RLS policies, schema isolation, or database-per-tenant partitioning |
| 🔍 No Index Optimization Strategy | Specifies covering indexes, composite indexes, filtered indexes, and maintenance |
| 💾 Backup/Recovery Gaps | Defines RPO/RTO targets, backup schedules, and geo-replication strategies |
| 🧬 Schema Drift Across Environments | Tracks blueprint mutations and configuration drift over time |
| 📊 Opaque Data Contracts | Makes data expectations visible to other agents and orchestrators |

🧠 Why Blueprints, Not Just Migrations?

While traditional environments rely on ad hoc migration scripts or ORM-generated schemas, the Factory approach uses blueprints because:

  • Blueprints are memory-linked to every module, entity, and trace ID
  • They are machine-generated and human-readable
  • They support forward/backward analysis across schema versions and data model changes
  • They coordinate multiple agents across Data Architecture, Backend Dev, Infra, Security, and DevOps
  • They encode intent, not just the final DDL, enabling intelligent regeneration and optimization

This allows database design to be treated as code, but also as a living architectural asset.


🧑‍🤝‍🧑 Blueprint Consumers and Usage

The Database Blueprint is not an isolated artifact. It is actively consumed across the ConnectSoft AI Software Factory by agents, humans, and CI systems to design, implement, validate, and operate data persistence layers.

Each consumer interprets the blueprint differently based on its context, but all share a common source of truth.


🤖 Agent Consumers

| Agent | Role in Consumption |
| --- | --- |
| 🗄️ Database Engineer Agent | Designs schemas, generates migrations, optimizes indexes, tunes query performance |
| 🧠 Data Architect Agent | High-level data modeling, entity relationships, bounded context alignment |
| 💻 Backend Developer Agent | Implements data access layers, ORM configuration, repository patterns |
| 📦 Infrastructure Engineer Agent | Provisions database infrastructure, configures replicas and failover |
| 🛡️ Security Architect Agent | Defines row-level security, encryption at rest, data masking policies |
| 🚀 DevOps Engineer Agent | Manages database CI/CD pipelines, migration deployment automation |
| 🧪 Test Automation Engineer Agent | Database integration tests, data seeding, test data generation |

👤 Human Stakeholders

| Role | Value Derived from Database Blueprint |
| --- | --- |
| Data Architect | Validates data models, reviews entity relationships and bounded context boundaries |
| Backend Developer | Understands table structures, relationships, and ORM mapping expectations |
| DBA | Reviews index strategies, query plans, replication, and backup configurations |
| DevOps Engineer | Applies migration pipelines, ensures database provisioning aligns across environments |
| QA Lead | Confirms test data seeding strategies and integration test coverage |
| Security Officer | Verifies encryption, RLS policies, and data masking for compliance |

🧠 Machine Consumers

  • GitOps systems: verify drift in schema definitions and migration state
  • CI/CD pipelines: run migration validation gates and performance regression tests
  • ORM frameworks: consume schema metadata for code generation and mapping
  • Monitoring systems: compare query performance against declared baselines
  • Memory Indexing Layer: links blueprint data assertions to downstream agents and alerts

The Database Blueprint becomes a persistence contract, providing guarantees that must be respected by services, infrastructure, CI/CD, and security processes alike.


🧠 Agent-Created, Trace-Ready Artifact

In the ConnectSoft AI Software Factory, the Database Blueprint is not written manually; it is generated, enriched, and validated by multiple agents, then stored as part of the system's memory graph.

This ensures every data persistence contract is:

  • 📌 Traceable to its origin domain model or product feature
  • 🔁 Regenerable with context-aware mutation
  • 📊 Auditable through observability-first design
  • 🧠 Embedded into the long-term agentic memory system

🤖 Agents Involved in Creation

| Agent | Responsibility |
| --- | --- |
| 🧠 Data Architect Agent | Designs high-level data models, defines aggregate boundaries and relationships |
| 🗄️ Database Engineer Agent | Translates models into schemas, indexes, constraints, and migration scripts |
| 💻 Backend Developer Agent | Aligns ORM mappings and repository patterns with schema definitions |
| 🛡️ Security Architect Agent | Injects RLS policies, encryption specs, and data masking rules |
| 📦 Infrastructure Engineer Agent | Provisions database servers, replicas, and backup infrastructure |
| 🚀 DevOps Engineer Agent | Integrates migration pipelines and schema validation into CI/CD |

Each agent contributes signals, decisions, and enriched metadata to create a complete, executable artifact.


📈 Memory Traceability

Database Blueprints are:

  • 🔗 Linked to the project-wide trace ID
  • 📂 Associated with the microservice, module, or bounded context
  • 🧠 Indexed in vector memory for AI reasoning and enforcement
  • 📜 Versioned and tagged (v1, approved, migrated, drifted, etc.)

This makes the blueprint machine-auditable, AI-searchable, and human-explainable.


๐Ÿ“ Example Storage and Trace Metadata

traceId: trc_82db_OrderService_v3
agentId: db-engineer-001
serviceName: OrderService
dataProfile: relational-multi-tenant
tags:
  - postgresql
  - rls
  - migrations
  - indexes
  - multi-tenant
version: v3
state: approved
createdAt: "2025-07-15T09:30:00Z"
diffFrom: v2
changedFields:
  - schemas.orders.columns
  - indexes.orders_tenant_date_idx
  - migrations.v3_add_order_status
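
The changedFields entries above can be derived mechanically by diffing two blueprint versions. A minimal Python sketch (the changed_fields helper and the dict shapes are illustrative, not part of the Factory API):

```python
def changed_fields(old: dict, new: dict, prefix: str = "") -> list[str]:
    """Return dotted paths whose values differ between two blueprint versions."""
    changed = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}" if prefix else key
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections (schemas, indexes, migrations, ...)
            changed.extend(changed_fields(a, b, path))
        elif a != b:
            changed.append(path)
    return changed

v2 = {"schemas": {"orders": {"columns": ["id", "tenant_id"]}}, "version": "v2"}
v3 = {"schemas": {"orders": {"columns": ["id", "tenant_id", "status"]}}, "version": "v3"}
print(changed_fields(v2, v3))  # ['schemas.orders.columns', 'version']
```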

📦 What It Captures

The Database Blueprint encodes a comprehensive set of data persistence dimensions that affect a service or module throughout its lifecycle, from initial design to production runtime.

It defines what data structures exist, how they relate, how they evolve, and how they perform, making it a living contract between the generated component and its data layer.


๐Ÿ—„๏ธ Core Database Elements Captured

Category Captured Details
Schema Definitions Tables, columns, data types, constraints, defaults, computed columns
Entity-Relationship Models ER diagrams, aggregate boundaries, reference data, foreign key graphs
Migration Strategies Version-controlled migrations, rollback plans, zero-downtime DDL changes
Index Optimization Plans Covering indexes, composite indexes, filtered indexes, index maintenance windows
Multi-Tenant Data Isolation Schema-per-tenant, RLS, shared-schema with tenant ID, database-per-tenant
Query Performance Baselines Expected query patterns, execution plan targets, performance budgets
Backup and Recovery Specs RPO/RTO targets, backup schedules, point-in-time recovery, geo-replication
Replication and Sharding Read replicas, horizontal sharding strategies, data partitioning schemes
Data Seeding Seed data definitions, test data generation rules, data masking for non-prod
Temporal Data Temporal tables, audit trails, soft deletes, history tracking

📎 Blueprint Snippet (Example)

schema:
  name: OrderService
  engine: postgresql
  version: "16"

tables:
  - name: orders
    schema: public
    columns:
      - name: id
        type: uuid
        primaryKey: true
        default: gen_random_uuid()
      - name: tenant_id
        type: uuid
        nullable: false
        index: true
      - name: customer_id
        type: uuid
        nullable: false
        foreignKey:
          table: customers
          column: id
          onDelete: RESTRICT
      - name: order_number
        type: varchar(50)
        nullable: false
        unique: true
      - name: status
        type: varchar(20)
        nullable: false
        default: "'pending'"
        check: "status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')"
      - name: total_amount
        type: decimal(18,2)
        nullable: false
      - name: currency_code
        type: char(3)
        nullable: false
        default: "'USD'"
      - name: created_at
        type: timestamptz
        nullable: false
        default: now()
      - name: updated_at
        type: timestamptz
        nullable: false
        default: now()
      - name: is_deleted
        type: boolean
        nullable: false
        default: "false"
    indexes:
      - name: ix_orders_tenant_id
        columns: [tenant_id]
      - name: ix_orders_customer_id
        columns: [tenant_id, customer_id]
      - name: ix_orders_status_created
        columns: [tenant_id, status, created_at]
        where: "is_deleted = false"
    rowLevelSecurity:
      enabled: true
      policy: "tenant_id = current_setting('app.current_tenant')::uuid"

multiTenant:
  strategy: shared-schema-with-rls
  tenantColumn: tenant_id
  enforcement: row-level-security

migrations:
  tool: flyway
  baselineVersion: "1.0.0"
  currentVersion: "3.2.0"

backupStrategy:
  schedule: "0 2 * * *"
  retention: 30d
  pointInTimeRecovery: true
  rpoMinutes: 15
  rtoMinutes: 60
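
The relationship between backup frequency, point-in-time recovery, and the achievable RPO in a strategy like the one above can be made explicit. A hedged Python sketch (the 5-minute WAL-archive interval is an assumed default, not a Factory setting):

```python
def effective_rpo_minutes(backups_per_day: int, point_in_time_recovery: bool,
                          wal_archive_interval_min: int = 5) -> int:
    """Worst-case data-loss window implied by the backup configuration."""
    if point_in_time_recovery:
        # With continuous WAL archiving, loss is bounded by the archive interval.
        return wal_archive_interval_min
    # Without PITR, you can only restore to the last full backup.
    return (24 * 60) // backups_per_day

# Daily full backups alone imply a 24-hour RPO, far above the declared
# 15-minute target, which is why the blueprint also requires pointInTimeRecovery.
assert effective_rpo_minutes(1, point_in_time_recovery=False) == 1440
assert effective_rpo_minutes(1, point_in_time_recovery=True) <= 15
```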

🧠 Cross-Blueprint Intersections (Overview)

  • Security Blueprint → defines encryption at rest, RLS policies, data masking
  • Infrastructure Blueprint → defines database provisioning, replicas, networking
  • Service Blueprint → defines ORM mappings, connection management, data access patterns
  • Test Blueprint → defines integration tests, seed data, and performance test baselines
  • Observability Blueprint → defines query metrics, slow query alerts, connection pool monitoring

The Database Blueprint aggregates, links, and applies data persistence rules across all of these, ensuring coherence and alignment.


๐Ÿ—‚๏ธ Output Formats

The Database Blueprint is generated and consumed across multiple layers of the AI Software Factory โ€” from human-readable design reviews to machine-enforced CI/CD migration gates.

To support both automation and collaboration, it is produced in multiple coordinated formats, each aligned with a different set of use cases.


๐Ÿ“„ Human-Readable Markdown (.md)

Used in Studio, code reviews, data architecture reviews, and documentation layers.

  • Sectioned by category: schemas, migrations, indexes, multi-tenancy, backup
  • Rich formatting with ER diagrams and Mermaid visuals
  • Includes YAML/SQL code samples and policy excerpts
  • Links to upstream and downstream blueprints

📜 Machine-Readable JSON (.json)

Used by agents, pipelines, and enforcement scripts.

  • Flattened and typed schema definitions
  • Includes metadata and trace headers
  • Validated against a shared schema specification
  • Compatible with ORM code generators and migration tools

Example excerpt:

{
  "traceId": "trc_82db_OrderService_v3",
  "engine": "postgresql",
  "version": "16",
  "schemas": {
    "public": {
      "tables": {
        "orders": {
          "columns": {
            "id": { "type": "uuid", "primaryKey": true },
            "tenant_id": { "type": "uuid", "nullable": false },
            "customer_id": { "type": "uuid", "nullable": false },
            "order_number": { "type": "varchar(50)", "unique": true },
            "status": { "type": "varchar(20)", "default": "pending" },
            "total_amount": { "type": "decimal(18,2)" },
            "created_at": { "type": "timestamptz", "default": "now()" }
          },
          "indexes": [
            { "name": "ix_orders_tenant_id", "columns": ["tenant_id"] },
            { "name": "ix_orders_status_created", "columns": ["tenant_id", "status", "created_at"], "filtered": true }
          ],
          "rls": {
            "enabled": true,
            "tenantColumn": "tenant_id"
          }
        }
      }
    }
  },
  "migrations": {
    "tool": "flyway",
    "currentVersion": "3.2.0"
  }
}
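
An enforcement script consuming this JSON might verify that every table carries the tenant column and has RLS enabled. A minimal Python sketch (the check_tenant_isolation helper is hypothetical):

```python
import json

def check_tenant_isolation(blueprint: dict) -> list[str]:
    """Return violations where a table lacks a tenant column or RLS."""
    problems = []
    for schema_name, schema in blueprint.get("schemas", {}).items():
        for table_name, table in schema.get("tables", {}).items():
            qualified = f"{schema_name}.{table_name}"
            if "tenant_id" not in table.get("columns", {}):
                problems.append(f"{qualified}: missing tenant_id column")
            if not table.get("rls", {}).get("enabled", False):
                problems.append(f"{qualified}: row-level security disabled")
    return problems

doc = json.loads("""{"schemas": {"public": {"tables": {
    "orders": {"columns": {"tenant_id": {"type": "uuid"}}, "rls": {"enabled": true}},
    "audit_log": {"columns": {"id": {"type": "uuid"}}}
}}}}""")
print(check_tenant_isolation(doc))
# ['public.audit_log: missing tenant_id column', 'public.audit_log: row-level security disabled']
```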

📂 SQL Migration Files (.sql)

Direct migration scripts generated from blueprint changes.

  • Version-numbered and idempotent
  • Include both up and down scripts
  • Validated against zero-downtime migration rules

๐Ÿ” CI/CD Compatible YAML (.yaml)

Used to configure database pipelines, backup schedules, and monitoring hooks.

  • Migration deployment steps
  • Schema validation gate configurations
  • Performance regression test definitions

🧠 Embedded Memory Shape (Vectorized)

  • Captured in agent long-term memory
  • Indexed by concept (e.g., rls, migration, index-strategy, backup)
  • Linked to all agent discussions, generations, and validations
  • Enables trace-based enforcement and reuse

๐Ÿ“ Storage Convention

Each Database Blueprint is stored in a consistent, discoverable location within the Factory workspace:

blueprints/database/{component-name}/database-blueprint.md
blueprints/database/{component-name}/database-blueprint.json
blueprints/database/{component-name}/migrations/
blueprints/database/{component-name}/schemas/
blueprints/database/{component-name}/indexes/

📂 Directory Structure Example

blueprints/database/order-service/
├── database-blueprint.md           # Human-readable specification
├── database-blueprint.json         # Machine-readable schema definition
├── migrations/
│   ├── V1.0.0__initial_schema.sql
│   ├── V1.1.0__add_order_status.sql
│   ├── V2.0.0__add_rls_policies.sql
│   ├── V3.0.0__add_audit_columns.sql
│   ├── V3.1.0__add_composite_index.sql
│   └── V3.2.0__add_currency_support.sql
├── schemas/
│   ├── orders.yaml
│   ├── customers.yaml
│   ├── order_items.yaml
│   └── reference_data.yaml
├── indexes/
│   ├── orders_indexes.yaml
│   ├── customers_indexes.yaml
│   └── index_maintenance.yaml
└── seed/
    ├── reference_data.sql
    ├── test_tenants.sql
    └── test_orders.sql

  • Markdown is human-readable and Studio-rendered.
  • JSON is parsed by orchestrators and enforcement agents.
  • SQL migration files are executed by Flyway, Liquibase, or EF Core Migrations.
  • YAML schema/index files are consumed by agents for validation and regeneration.

๐Ÿ—๏ธ Schema Design Principles

The Database Blueprint enforces a rigorous set of schema design principles that ensure consistency, maintainability, and performance across all generated components.

These principles are not guidelines โ€” they are agent-enforced constraints embedded in every schema definition.


๐Ÿ“ Naming Conventions

Element Convention Example
Tables snake_case, plural orders, order_items
Columns snake_case, descriptive created_at, total_amount
Primary Keys id (UUID preferred) id uuid PRIMARY KEY
Foreign Keys {referenced_table}_id customer_id, tenant_id
Indexes ix_{table}_{columns} ix_orders_tenant_id
Unique Constraints uq_{table}_{columns} uq_orders_order_number
Check Constraints ck_{table}_{description} ck_orders_status_valid
Sequences seq_{table}_{column} seq_orders_order_number
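
These conventions are mechanical enough to validate automatically. A Python sketch of such a check (the regexes and the violations helper are illustrative; the real agent rules may be stricter, e.g. a proper plural check):

```python
import re

# One regex per convention row; prefix patterns mirror the ix_/uq_/ck_ conventions.
CONVENTIONS = {
    "table":  re.compile(r"^[a-z][a-z0-9_]*s$"),   # snake_case, plural (naive 's' check)
    "column": re.compile(r"^[a-z][a-z0-9_]*$"),
    "index":  re.compile(r"^ix_[a-z0-9_]+$"),
    "unique": re.compile(r"^uq_[a-z0-9_]+$"),
    "check":  re.compile(r"^ck_[a-z0-9_]+$"),
}

def violations(kind: str, names: list[str]) -> list[str]:
    """Return the names that break the convention for the given element kind."""
    pattern = CONVENTIONS[kind]
    return [n for n in names if not pattern.match(n)]

assert violations("table", ["orders", "order_items"]) == []
assert violations("table", ["Order", "orderItems"]) == ["Order", "orderItems"]
assert violations("index", ["ix_orders_tenant_id", "orders_idx"]) == ["orders_idx"]
```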

🔄 Normalization Strategy

The blueprint prescribes Third Normal Form (3NF) as the default, with explicit, documented exceptions for denormalization:

schemaDesign:
  defaultNormalization: 3NF
  denormalizationPolicy:
    allowed: true
    requiresJustification: true
    commonPatterns:
      - pattern: materialized-view
        useCase: "Read-heavy dashboards and reporting queries"
      - pattern: json-column
        useCase: "Flexible metadata or configuration storage"
      - pattern: computed-column
        useCase: "Derived values that avoid expensive JOINs"
      - pattern: denormalized-aggregate
        useCase: "Pre-computed totals for high-frequency reads"

๐Ÿ• Temporal Tables and Audit Trails

Every table that requires audit history is configured with temporal tracking:

temporalTables:
  enabled: true
  strategy: system-versioned
  tables:
    - name: orders
      historyTable: orders_history
      periodColumns:
        start: valid_from
        end: valid_to
    - name: customers
      historyTable: customers_history
      periodColumns:
        start: valid_from
        end: valid_to

Equivalent SQL (PostgreSQL, using explicit period columns and a history table):

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    customer_id UUID NOT NULL REFERENCES customers(id),
    order_number VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total_amount DECIMAL(18,2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted BOOLEAN NOT NULL DEFAULT false,
    valid_from TIMESTAMPTZ NOT NULL DEFAULT now(),
    valid_to TIMESTAMPTZ NOT NULL DEFAULT 'infinity'
);

CREATE TABLE orders_history (LIKE orders INCLUDING ALL);

๐Ÿ—‘๏ธ Soft Delete Strategy

All entities use soft deletes by default, with hard delete reserved for compliance-driven data purging:

softDelete:
  enabled: true
  column: is_deleted
  type: boolean
  default: false
  globalFilter: true
  hardDeletePolicy:
    allowed: true
    requiresApproval: true
    useCases:
      - gdpr-right-to-erasure
      - data-retention-expiry
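
The soft-delete and hard-delete rules above can be expressed as guard logic in the data access layer. An in-memory Python sketch (class and method names are hypothetical; a real implementation would live in ORM global filters and approval workflows):

```python
APPROVED_HARD_DELETE_USE_CASES = {"gdpr-right-to-erasure", "data-retention-expiry"}

class OrderRepository:
    """In-memory sketch of the soft-delete policy."""

    def __init__(self):
        self.rows = {}  # id -> row dict

    def all_active(self):
        # globalFilter: true -- reads exclude soft-deleted rows by default
        return [r for r in self.rows.values() if not r["is_deleted"]]

    def soft_delete(self, row_id):
        self.rows[row_id]["is_deleted"] = True

    def hard_delete(self, row_id, use_case, approved=False):
        # hardDeletePolicy: requires approval and a compliance use case
        if not approved or use_case not in APPROVED_HARD_DELETE_USE_CASES:
            raise PermissionError("hard delete requires approval and a compliance use case")
        del self.rows[row_id]

repo = OrderRepository()
repo.rows["o1"] = {"id": "o1", "is_deleted": False}
repo.soft_delete("o1")
assert repo.all_active() == []   # filtered out of reads, but still stored
assert "o1" in repo.rows
repo.hard_delete("o1", "gdpr-right-to-erasure", approved=True)
assert "o1" not in repo.rows
```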

📘 Blueprint Schema Validation Rules

schemaValidation:
  rules:
    - rule: all-tables-must-have-primary-key
      severity: error
    - rule: all-foreign-keys-must-have-index
      severity: error
    - rule: no-nullable-foreign-keys
      severity: warning
    - rule: all-tables-must-have-created-at
      severity: error
    - rule: all-tables-must-have-updated-at
      severity: error
    - rule: varchar-must-have-max-length
      severity: error
    - rule: decimal-must-have-precision
      severity: error
    - rule: tenant-column-required-for-business-tables
      severity: error
    - rule: soft-delete-column-required
      severity: warning
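
A validator applying a subset of these rules to a table definition might look like the following Python sketch (the table dict shape and the validate_table helper are illustrative):

```python
def validate_table(table: dict) -> list[tuple[str, str]]:
    """Evaluate a subset of the blueprint rules; returns (severity, rule) pairs."""
    findings = []
    columns = {c["name"]: c for c in table["columns"]}
    if not any(c.get("primaryKey") for c in table["columns"]):
        findings.append(("error", "all-tables-must-have-primary-key"))
    for required in ("created_at", "updated_at"):
        if required not in columns:
            findings.append(("error", f"all-tables-must-have-{required.replace('_', '-')}"))
    if "tenant_id" not in columns:
        findings.append(("error", "tenant-column-required-for-business-tables"))
    if "is_deleted" not in columns:
        findings.append(("warning", "soft-delete-column-required"))
    for c in table["columns"]:
        if c["type"] == "varchar":  # bare varchar, no max length
            findings.append(("error", "varchar-must-have-max-length"))
    return findings

table = {"columns": [{"name": "id", "type": "uuid", "primaryKey": True},
                     {"name": "tenant_id", "type": "uuid"},
                     {"name": "note", "type": "varchar"}]}
for severity, rule in validate_table(table):
    print(severity, rule)
```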

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Data Architect Agent | Defines naming conventions, normalization rules, and domain models |
| Database Engineer Agent | Validates schemas against rules, generates DDL, applies constraints |
| Backend Developer Agent | Ensures ORM entities align with schema definitions |
| Security Architect Agent | Verifies sensitive columns are encrypted or masked |

🔄 Migration Lifecycle

Migrations are the mechanism by which schema changes are applied to database environments. The Database Blueprint treats migrations as first-class, version-controlled, agent-generated artifacts with full traceability and rollback capabilities.


📈 Migration Lifecycle Flow

flowchart TD
    A[Schema Change Request] --> B[Data Architect Agent]
    B --> C[Database Engineer Agent]
    C --> D[Generate Migration Script]
    D --> E[Validate Against Rules]
    E --> F{Passes Validation?}
    F -->|Yes| G[Generate Rollback Script]
    F -->|No| H[Return to Engineer Agent]
    H --> C
    G --> I[Test in Staging]
    I --> J{Performance OK?}
    J -->|Yes| K[Add to Migration Pipeline]
    J -->|No| L[Optimize & Retry]
    L --> D
    K --> M[Deploy to Production]
    M --> N[Verify & Update Blueprint]
    N --> O[Update Memory Graph]

🧩 Migration Types

| Migration Type | Description | Example |
| --- | --- | --- |
| Schema Migration | DDL changes: CREATE, ALTER, DROP | Add new table, modify column type |
| Data Migration | DML changes: INSERT, UPDATE, DELETE | Backfill computed column values |
| Index Migration | CREATE INDEX, DROP INDEX, REINDEX | Add covering index for new query |
| Security Migration | RLS policies, GRANT/REVOKE, encryption | Enable RLS on new table |
| Reference Data | Seed or update lookup/reference tables | Add new order status values |
| Rollback Migration | Reverse a previously applied migration | Undo column rename |

📜 Migration Naming Convention

V{version}__{description}.sql

Examples:

V1.0.0__initial_schema.sql
V1.1.0__add_order_items_table.sql
V1.2.0__add_rls_policies.sql
V2.0.0__add_audit_columns.sql
V2.0.0__rollback.sql
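
Migration tooling must parse and order these filenames deterministically. A Python sketch (assuming Flyway-style dotted versions; the parse_migration helper is illustrative):

```python
import re

MIGRATION_RE = re.compile(r"^V(?P<version>\d+(?:\.\d+)*)__(?P<description>[a-z0-9_]+)\.sql$")

def parse_migration(filename: str) -> tuple[tuple[int, ...], str]:
    """Split V{version}__{description}.sql into a sortable version tuple and description."""
    m = MIGRATION_RE.match(filename)
    if m is None:
        raise ValueError(f"not a valid migration filename: {filename}")
    return tuple(int(p) for p in m.group("version").split(".")), m.group("description")

files = ["V1.1.0__add_order_items_table.sql",
         "V1.0.0__initial_schema.sql",
         "V2.0.0__add_audit_columns.sql"]
# Sort numerically by version tuple, not lexically by string.
ordered = sorted(files, key=lambda f: parse_migration(f)[0])
print(ordered[0])  # V1.0.0__initial_schema.sql
```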


📎 Migration Script Example (Up)

-- V3.2.0__add_currency_support.sql
-- Migration: Add multi-currency support to orders table
-- Agent: db-engineer-001
-- TraceId: trc_82db_OrderService_v3
-- Zero-Downtime: Yes (additive only)

BEGIN;

ALTER TABLE orders
    ADD COLUMN currency_code CHAR(3) NOT NULL DEFAULT 'USD';

ALTER TABLE orders
    ADD COLUMN exchange_rate DECIMAL(12,6) DEFAULT 1.000000;

-- NOT VALID skips the full-table scan inside the transaction;
-- existing rows are validated separately below.
ALTER TABLE orders
    ADD CONSTRAINT ck_orders_currency_valid
    CHECK (currency_code ~ '^[A-Z]{3}$') NOT VALID;

COMMENT ON COLUMN orders.currency_code IS 'ISO 4217 currency code';
COMMENT ON COLUMN orders.exchange_rate IS 'Exchange rate to base currency at time of order';

COMMIT;

-- Validate the constraint and build the index after the transaction:
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
ALTER TABLE orders VALIDATE CONSTRAINT ck_orders_currency_valid;

CREATE INDEX CONCURRENTLY ix_orders_currency
    ON orders (tenant_id, currency_code)
    WHERE is_deleted = false;

📎 Migration Script Example (Down / Rollback)

-- V3.2.0__rollback.sql
-- Rollback: Remove multi-currency support from orders table

-- DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS ix_orders_currency;

BEGIN;

ALTER TABLE orders
    DROP CONSTRAINT IF EXISTS ck_orders_currency_valid;

ALTER TABLE orders
    DROP COLUMN IF EXISTS exchange_rate;

ALTER TABLE orders
    DROP COLUMN IF EXISTS currency_code;

COMMIT;

🚀 Zero-Downtime Migration Patterns

The blueprint enforces zero-downtime migration strategies for production deployments:

| Pattern | Description | When to Use |
| --- | --- | --- |
| Expand-Contract | Add new → migrate data → remove old | Column renames, type changes |
| Additive Only | Only add columns/tables, never remove in same release | New features |
| Concurrent Index | CREATE INDEX CONCURRENTLY to avoid table locks | All index additions |
| Background Data Migration | Async backfill with batched updates | Large data transformations |
| Feature Flag Guard | Code reads old and new columns; flag controls which is authoritative | Breaking schema changes |

📘 Zero-Downtime YAML Configuration

migrationPolicy:
  zeroDowntime: true
  patterns:
    allowedOperations:
      - ADD_COLUMN
      - ADD_INDEX_CONCURRENTLY
      - ADD_CONSTRAINT_NOT_VALID
      - CREATE_TABLE
      - ADD_RLS_POLICY
    prohibitedOperations:
      - DROP_COLUMN_IMMEDIATE
      - ALTER_COLUMN_TYPE_DIRECT
      - DROP_TABLE_WITH_DATA
      - ADD_NOT_NULL_WITHOUT_DEFAULT
    expandContract:
      enabled: true
      maxPhases: 3
      phaseDurationDays: 7
  validation:
    requireRollbackScript: true
    requirePerformanceTest: true
    requireZeroDowntimeProof: true
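
A CI gate can enforce this policy by classifying each operation in a proposed migration against the allowed and prohibited lists. A Python sketch (operation names mirror the YAML above; the gate_migration helper is hypothetical):

```python
ALLOWED = {"ADD_COLUMN", "ADD_INDEX_CONCURRENTLY", "ADD_CONSTRAINT_NOT_VALID",
           "CREATE_TABLE", "ADD_RLS_POLICY"}
PROHIBITED = {"DROP_COLUMN_IMMEDIATE", "ALTER_COLUMN_TYPE_DIRECT",
              "DROP_TABLE_WITH_DATA", "ADD_NOT_NULL_WITHOUT_DEFAULT"}

def gate_migration(operations: list[str]) -> tuple[bool, list[str]]:
    """Reject a migration containing prohibited or unclassified operations."""
    blocked = [op for op in operations if op in PROHIBITED]
    unknown = [op for op in operations if op not in ALLOWED | PROHIBITED]
    # Unknown operations fail closed: they must be classified before deployment.
    return (not blocked and not unknown), blocked + unknown

ok, problems = gate_migration(["ADD_COLUMN", "ADD_INDEX_CONCURRENTLY"])
assert ok and problems == []
ok, problems = gate_migration(["ADD_COLUMN", "DROP_COLUMN_IMMEDIATE"])
assert not ok and problems == ["DROP_COLUMN_IMMEDIATE"]
```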

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Database Engineer Agent | Generates migration scripts, validates zero-downtime compliance |
| Data Architect Agent | Approves schema-level changes that affect domain model boundaries |
| DevOps Engineer Agent | Integrates migrations into deployment pipelines |
| Test Automation Engineer Agent | Runs migration tests against staging databases |
| Backend Developer Agent | Updates ORM mappings to align with migrated schemas |

📊 Index Strategy and Optimization

Indexes are the primary mechanism for query acceleration in relational databases. The Database Blueprint defines a comprehensive index strategy that balances read performance, write overhead, and storage costs.


🧩 Index Types and Selection

| Index Type | Description | Use Case |
| --- | --- | --- |
| B-Tree | Default balanced tree index | Equality and range queries |
| Hash | Hash-based lookup | Exact equality only (rare in PostgreSQL) |
| GIN | Generalized Inverted Index | Full-text search, JSONB containment |
| GiST | Generalized Search Tree | Geometric, range, and proximity queries |
| BRIN | Block Range Index | Large tables with naturally ordered data |
| Partial/Filtered | Index with WHERE clause | Active records, non-deleted rows |
| Covering | Includes non-key columns via INCLUDE | Index-only scans for common query patterns |
| Composite | Multi-column index | Queries filtering on multiple columns |

📎 Index Blueprint Definition

indexes:
  orders:
    - name: ix_orders_tenant_id
      type: btree
      columns: [tenant_id]
      purpose: "Tenant isolation filter"

    - name: ix_orders_tenant_customer
      type: btree
      columns: [tenant_id, customer_id]
      purpose: "Customer order lookups within tenant"

    - name: ix_orders_tenant_status_date
      type: btree
      columns: [tenant_id, status, created_at]
      where: "is_deleted = false"
      purpose: "Active order filtering by status and date"

    - name: ix_orders_order_number
      type: btree
      columns: [order_number]
      unique: true
      purpose: "Unique order number lookups"

    - name: ix_orders_total_amount_covering
      type: btree
      columns: [tenant_id, created_at]
      include: [total_amount, currency_code, status]
      purpose: "Index-only scan for order summaries"

    - name: ix_orders_jsonb_metadata
      type: gin
      columns: [metadata]
      purpose: "JSONB containment queries on order metadata"

📜 Generated SQL Examples

-- Partial (filtered) index for active orders
CREATE INDEX CONCURRENTLY ix_orders_tenant_status_date
    ON orders (tenant_id, status, created_at DESC)
    WHERE is_deleted = false;

-- Covering index for index-only scans
CREATE INDEX CONCURRENTLY ix_orders_total_amount_covering
    ON orders (tenant_id, created_at DESC)
    INCLUDE (total_amount, currency_code, status);

-- GIN index for JSONB queries
CREATE INDEX CONCURRENTLY ix_orders_jsonb_metadata
    ON orders USING gin (metadata jsonb_path_ops);

-- BRIN index for time-series data
CREATE INDEX ix_audit_log_created_brin
    ON audit_log USING brin (created_at)
    WITH (pages_per_range = 128);

🔧 Index Maintenance Strategy

indexMaintenance:
  schedule:
    reindex: "weekly-sunday-02:00"
    analyzeFrequency: "daily-03:00"
    bloatThreshold: 30%
  monitoring:
    unusedIndexThreshold: 30d
    duplicateDetection: true
    bloatTracking: true
  automation:
    autoDropUnused: false
    alertOnUnused: true
    alertOnBloat: true
    alertOnDuplicates: true
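
The alerting thresholds above translate directly into a small decision function. A Python sketch (the index_stats shape and maintenance_alerts helper are illustrative):

```python
from datetime import date, timedelta

def maintenance_alerts(index_stats: dict, today: date) -> list[str]:
    """Apply the blueprint thresholds: 30% bloat, 30 days unused."""
    alerts = []
    if index_stats["bloat_pct"] > 30:
        alerts.append(f"{index_stats['name']}: bloat {index_stats['bloat_pct']}% exceeds 30%")
    if today - index_stats["last_used"] > timedelta(days=30):
        alerts.append(f"{index_stats['name']}: unused for more than 30 days")
    return alerts

stats = {"name": "ix_orders_currency", "bloat_pct": 42,
         "last_used": date(2025, 5, 1)}
print(maintenance_alerts(stats, today=date(2025, 7, 15)))
# ['ix_orders_currency: bloat 42% exceeds 30%', 'ix_orders_currency: unused for more than 30 days']
```

Because autoDropUnused is false in the policy, these findings surface as alerts for human review rather than automatic index drops.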

📊 Index Monitoring Queries

-- Find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
    AND indexrelname NOT LIKE 'uq_%'
    AND indexrelname NOT LIKE 'pk_%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find duplicate indexes
SELECT
    indrelid::regclass AS table_name,
    array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;

-- Index size relative to its table (rough heuristic only;
-- accurate bloat measurement requires the pgstattuple extension)
SELECT
    schemaname || '.' || tablename AS table,
    indexname AS index,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size,
    ROUND(100 * pg_relation_size(indexname::regclass)::numeric /
        NULLIF(pg_total_relation_size(tablename::regclass), 0), 2) AS pct_of_table
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Database Engineer Agent | Designs index strategy, generates DDL, monitors performance impact |
| Data Architect Agent | Reviews index alignment with query patterns and data access paths |
| DevOps Engineer Agent | Schedules maintenance windows, automates REINDEX operations |
| Test Automation Engineer Agent | Runs query performance tests to validate index effectiveness |

๐Ÿข Multi-Tenant Data Architecture

ConnectSoft's AI Software Factory supports multi-tenant SaaS systems by default. The Database Blueprint encodes strict controls for data partitioning, tenant-aware access, and scoped operations at the database layer.

This section defines the three primary isolation patterns, their trade-offs, and when each should be used.


🧩 Isolation Pattern Comparison

| Pattern | Isolation Level | Complexity | Cost | Compliance | Scale |
| --- | --- | --- | --- | --- | --- |
| Shared Schema with RLS | Logical | Low | Low | Moderate | High |
| Schema-Per-Tenant | Logical+ | Medium | Medium | High | Medium |
| Database-Per-Tenant | Physical | High | High | Very High | Per-tenant |

๐Ÿ” Pattern 1: Shared Schema with Row-Level Security (RLS)

The default and recommended pattern for most ConnectSoft services. All tenants share the same tables, with PostgreSQL RLS policies enforcing data isolation.

flowchart TD
    subgraph SharedDatabase["Shared Database"]
        subgraph Tables["Shared Tables"]
            T1["orders (all tenants)"]
            T2["customers (all tenants)"]
            T3["products (all tenants)"]
        end
        RLS["Row-Level Security Policies"]
    end

    App1["Tenant A Request"] -->|"SET app.current_tenant = 'A'"| RLS
    App2["Tenant B Request"] -->|"SET app.current_tenant = 'B'"| RLS
    RLS --> Tables

SQL Implementation

-- Enable RLS on table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- Create tenant isolation policy
CREATE POLICY tenant_isolation_orders ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Create tenant-scoped insert policy
CREATE POLICY tenant_insert_orders ON orders
    FOR INSERT
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets tenant context per connection
SET app.current_tenant = '550e8400-e29b-41d4-a716-446655440000';
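
The runtime behavior of this pattern can be illustrated with a pure-Python simulation of the tenant-scoped policy (a real application would set app.current_tenant through its database driver, e.g. via SELECT set_config(...); the TenantSession class below is purely illustrative):

```python
class TenantSession:
    """Simulates the RLS predicate: tenant_id = current_setting('app.current_tenant')."""

    def __init__(self, rows: list[dict]):
        self.rows = rows
        self.current_tenant = None  # session variable not yet set

    def set_tenant(self, tenant_id: str):
        # Stand-in for: SELECT set_config('app.current_tenant', <tenant_id>, false)
        self.current_tenant = tenant_id

    def select_orders(self) -> list[dict]:
        if self.current_tenant is None:
            return []  # no tenant context -> the policy matches no rows
        return [r for r in self.rows if r["tenant_id"] == self.current_tenant]

orders = [{"id": 1, "tenant_id": "A"}, {"id": 2, "tenant_id": "B"}]
session = TenantSession(orders)
assert session.select_orders() == []                      # no context set
session.set_tenant("A")
assert session.select_orders() == [{"id": 1, "tenant_id": "A"}]
```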

YAML Blueprint Declaration

multiTenant:
  strategy: shared-schema-with-rls
  tenantColumn: tenant_id
  tenantColumnType: uuid
  enforcement: row-level-security
  connectionManagement:
    pooling: shared
    tenantContextSetting: "app.current_tenant"
    setOnConnectionOpen: true
  tables:
    - orders
    - customers
    - order_items
    - products
    - invoices
  excludedTables:
    - reference_currencies
    - reference_countries
    - system_configurations

Pros and Cons

| Pros | Cons |
| --- | --- |
| ✅ Lowest operational complexity | ⚠️ Noisy neighbor risk on shared resources |
| ✅ Shared connection pool | ⚠️ Must ensure RLS policies on every table |
| ✅ Easy schema migrations (one database) | ⚠️ Harder compliance for strict data sovereignty |
| ✅ Cost-effective at scale | ⚠️ Cross-tenant queries require elevated access |

๐Ÿ” Pattern 2: Schema-Per-Tenant

Each tenant gets a dedicated PostgreSQL schema within a shared database, providing stronger isolation without separate database instances.

flowchart TD
    subgraph SharedDatabase["Shared Database"]
        subgraph SchemaA["Schema: tenant_alpha"]
            TA1["orders"]
            TA2["customers"]
        end
        subgraph SchemaB["Schema: tenant_beta"]
            TB1["orders"]
            TB2["customers"]
        end
        subgraph SchemaC["Schema: tenant_gamma"]
            TC1["orders"]
            TC2["customers"]
        end
    end

    App1["Tenant Alpha"] -->|"SET search_path = 'tenant_alpha'"| SchemaA
    App2["Tenant Beta"] -->|"SET search_path = 'tenant_beta'"| SchemaB
    App3["Tenant Gamma"] -->|"SET search_path = 'tenant_gamma'"| SchemaC

SQL Implementation

-- Create tenant schema
CREATE SCHEMA tenant_alpha AUTHORIZATION app_user;

-- Create tables within tenant schema
CREATE TABLE tenant_alpha.orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL REFERENCES tenant_alpha.customers(id),
    order_number VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total_amount DECIMAL(18,2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Tenant onboarding: clone template schema
CREATE SCHEMA tenant_new_tenant;

DO $$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN
        SELECT tablename FROM pg_tables WHERE schemaname = 'template_schema'
    LOOP
        EXECUTE format(
            'CREATE TABLE tenant_new_tenant.%I (LIKE template_schema.%I INCLUDING ALL)',
            rec.tablename, rec.tablename
        );
    END LOOP;
END $$;
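The PL/pgSQL onboarding loop above emits one `CREATE TABLE ... (LIKE ...)` per template table. A minimal Python sketch of the same DDL generation, useful in a provisioning script (the function name is hypothetical):

```python
def clone_schema_ddl(new_schema, template_schema, tables):
    """Emit the DDL the PL/pgSQL loop above executes: a new schema plus one
    structural copy (LIKE ... INCLUDING ALL) per template table."""
    statements = [f"CREATE SCHEMA {new_schema};"]
    for table in tables:
        statements.append(
            f"CREATE TABLE {new_schema}.{table} "
            f"(LIKE {template_schema}.{table} INCLUDING ALL);"
        )
    return statements

ddl = clone_schema_ddl("tenant_new_tenant", "template_schema", ["orders", "customers"])
for stmt in ddl:
    print(stmt)
```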

YAML Blueprint Declaration

multiTenant:
  strategy: schema-per-tenant
  schemaPrefix: "tenant_"
  templateSchema: "template_schema"
  connectionManagement:
    pooling: per-schema
    searchPathSetting: true
  provisioning:
    onTenantCreated: clone-template-schema
    onTenantDeleted: archive-and-drop-schema
  migrations:
    strategy: iterate-all-schemas
    parallelDegree: 4
    failurePolicy: stop-and-alert

Pros and Cons

| Pros | Cons |
| --- | --- |
| ✅ Strong logical isolation | ⚠️ Migrations must iterate across all schemas |
| ✅ Per-tenant backup/restore possible | ⚠️ Higher connection pool overhead |
| ✅ Schema-level GRANT/REVOKE controls | ⚠️ Schema proliferation at scale |
| ✅ Easier compliance than shared-schema | ⚠️ Cross-tenant analytics requires schema merging |

๐Ÿ” Pattern 3: Database-Per-Tenant

Each tenant gets a completely separate database instance. This provides maximum isolation and is typically reserved for enterprise or regulated tenants.

flowchart TD
    Router["Connection Router"] --> DB_A["Database: tenant_alpha_db"]
    Router --> DB_B["Database: tenant_beta_db"]
    Router --> DB_C["Database: tenant_gamma_db"]

    DB_A --> RA["Full Schema + Data"]
    DB_B --> RB["Full Schema + Data"]
    DB_C --> RC["Full Schema + Data"]

YAML Blueprint Declaration

multiTenant:
  strategy: database-per-tenant
  namingPattern: "{tenant_slug}_db"
  connectionManagement:
    routing: connection-string-per-tenant
    connectionStringSource: azure-key-vault
    pooling: per-database
  provisioning:
    provider: azure-database-for-postgresql
    skuPerTier:
      starter: "B_Standard_B1ms"
      professional: "GP_Standard_D2ds_v5"
      enterprise: "MO_Standard_E4ds_v5"
    onTenantCreated: provision-new-database
    onTenantDeleted: backup-and-decommission
  migrations:
    strategy: rolling-per-database
    parallelDegree: 10
    failurePolicy: continue-and-report
  backup:
    perTenantSchedule: true
    retentionDays: 90
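A sketch of the connection-routing side of this pattern. The `ConnectionRouter` class is an assumption, not a factory API, and the in-memory secrets dict merely stands in for the Azure Key Vault lookup declared above:

```python
class ConnectionRouter:
    """Resolve a tenant to its dedicated database per the namingPattern above.
    The secrets mapping stands in for an Azure Key Vault lookup."""

    def __init__(self, secrets, naming_pattern="{tenant_slug}_db"):
        self.secrets = secrets
        self.naming_pattern = naming_pattern

    def database_name(self, tenant_slug):
        # Apply the blueprint's naming pattern, e.g. "alpha" -> "alpha_db".
        return self.naming_pattern.format(tenant_slug=tenant_slug)

    def connection_string(self, tenant_slug):
        db_name = self.database_name(tenant_slug)
        if db_name not in self.secrets:
            raise LookupError(f"no connection secret provisioned for {db_name}")
        return self.secrets[db_name]

router = ConnectionRouter({"tenant_alpha_db": "Host=alpha;Database=tenant_alpha_db"})
print(router.connection_string("tenant_alpha"))
```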

Pros and Cons

| Pros | Cons |
| --- | --- |
| ✅ Maximum data isolation | ⚠️ Highest operational cost |
| ✅ Per-tenant performance tuning | ⚠️ Complex migration orchestration |
| ✅ Strongest compliance posture | ⚠️ Connection routing complexity |
| ✅ Independent backup/restore/scaling | ⚠️ Cross-tenant analytics extremely difficult |
| ✅ No noisy neighbor risk | ⚠️ Resource waste for low-activity tenants |

๐Ÿ“Š Decision Matrix

tenantIsolationDecision:
  defaultStrategy: shared-schema-with-rls
  overrideRules:
    - condition: "tenant.complianceLevel == 'SOC2_TYPE_II' OR tenant.dataResidency == true"
      strategy: schema-per-tenant
    - condition: "tenant.tier == 'enterprise' OR tenant.regulatoryRequirement == 'HIPAA'"
      strategy: database-per-tenant
    - condition: "tenant.expectedDataVolume > '100GB'"
      strategy: database-per-tenant
  hybridSupport:
    enabled: true
    description: "Mix strategies within the same platform based on tenant tier"
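The override rules above read naturally as a first-match-wins chain falling back to the default strategy. A minimal sketch; the first-match semantics and the field names on the tenant dict are assumptions about how an agent would evaluate this matrix:

```python
def select_isolation_strategy(tenant, default="shared-schema-with-rls"):
    """Evaluate the blueprint's override rules in order; first match wins
    (assumption), falling back to the declared defaultStrategy."""
    if tenant.get("complianceLevel") == "SOC2_TYPE_II" or tenant.get("dataResidency"):
        return "schema-per-tenant"
    if tenant.get("tier") == "enterprise" or tenant.get("regulatoryRequirement") == "HIPAA":
        return "database-per-tenant"
    if tenant.get("expectedDataVolumeGb", 0) > 100:
        return "database-per-tenant"
    return default

print(select_isolation_strategy({}))                                   # shared-schema-with-rls
print(select_isolation_strategy({"tier": "enterprise"}))               # database-per-tenant
print(select_isolation_strategy({"complianceLevel": "SOC2_TYPE_II"}))  # schema-per-tenant
```

Note that rule order matters under first-match semantics: a SOC2 enterprise tenant lands on schema-per-tenant because that rule is evaluated first.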

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Data Architect Agent | Selects isolation pattern based on requirements and constraints |
| Database Engineer Agent | Implements RLS policies, schema templates, or database provisioning |
| Security Architect Agent | Validates isolation enforcement and compliance alignment |
| Infrastructure Engineer Agent | Provisions database infrastructure per selected pattern |
| DevOps Engineer Agent | Automates tenant onboarding/offboarding and migration rollouts |

๐Ÿ“ˆ Query Performance Management

The Database Blueprint defines query performance expectations as first-class contract elements. Every service's data access patterns are documented, baselined, and monitored for regression.


๐ŸŽฏ Performance Budget Model

queryPerformance:
  budgets:
    - query: "GET /api/orders (list by tenant)"
      maxDurationMs: 50
      expectedRows: 100
      scanType: index-only
      index: ix_orders_tenant_status_date

    - query: "GET /api/orders/:id (single lookup)"
      maxDurationMs: 5
      expectedRows: 1
      scanType: index-scan
      index: ix_orders_order_number

    - query: "GET /api/orders/summary (aggregation)"
      maxDurationMs: 200
      expectedRows: 1
      scanType: index-only
      index: ix_orders_total_amount_covering

    - query: "POST /api/orders (insert)"
      maxDurationMs: 20
      expectedIndexUpdates: 4

    - query: "Dashboard: daily order totals"
      maxDurationMs: 500
      expectedRows: 30
      scanType: bitmap-index
      acceptsMatView: true
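A CI gate can evaluate measured latencies against these budgets directly. A sketch, assuming measurements arrive as a simple query-to-milliseconds mapping (the function and dict shapes are assumptions):

```python
def budget_violations(budgets, measured_ms):
    """Return (query, measured, budget) for every query over its maxDurationMs."""
    violations = []
    for budget in budgets:
        actual = measured_ms.get(budget["query"])
        if actual is not None and actual > budget["maxDurationMs"]:
            violations.append((budget["query"], actual, budget["maxDurationMs"]))
    return violations

budgets = [
    {"query": "GET /api/orders/:id (single lookup)", "maxDurationMs": 5},
    {"query": "POST /api/orders (insert)", "maxDurationMs": 20},
]
measured = {
    "GET /api/orders/:id (single lookup)": 12.4,  # over budget
    "POST /api/orders (insert)": 8.0,             # within budget
}
print(budget_violations(budgets, measured))
```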

๐Ÿ“Š Execution Plan Baselines

The blueprint captures expected execution plan shapes for critical queries:

executionPlanBaselines:
  - queryId: "orders_by_tenant_status"
    expectedPlan:
      type: "Index Only Scan"
      index: ix_orders_tenant_status_date
      estimatedCost: 0.43
      actualRows: 100
    regressionThreshold:
      costIncrease: 200%
      scanTypeChange: true
      seqScanDetection: true
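The three regression triggers above combine into one gate check. A sketch; note the interpretation of `costIncrease: 200%` as "cost may grow by at most 200% over baseline" (i.e. up to 3x) is an assumption about the blueprint's convention:

```python
def plan_regressed(baseline, observed, cost_increase_pct=200.0):
    """Flag a regression per the baseline rules above.

    Assumption: 'costIncrease: 200%' means estimated cost may grow by at
    most 200% over baseline (i.e. up to 3x the baseline cost).
    """
    if observed["type"] != baseline["type"]:
        return True  # scanTypeChange
    if "Seq Scan" in observed["type"]:
        return True  # seqScanDetection
    limit = baseline["estimatedCost"] * (1 + cost_increase_pct / 100.0)
    return observed["estimatedCost"] > limit

baseline = {"type": "Index Only Scan", "estimatedCost": 0.43}
print(plan_regressed(baseline, {"type": "Index Only Scan", "estimatedCost": 0.50}))  # False
print(plan_regressed(baseline, {"type": "Seq Scan", "estimatedCost": 812.0}))        # True
```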

๐Ÿ” Query Performance Monitoring

queryMonitoring:
  enabled: true
  tool: pg_stat_statements
  configuration:
    trackUtility: true
    trackPlanning: true
    maxStatements: 10000
  alerts:
    - condition: "mean_exec_time > 2x baseline"
      action: alert-data-team
    - condition: "seq_scan_count > 100 per hour on indexed table"
      action: alert-db-engineer
    - condition: "lock_wait_time > 5s"
      action: alert-devops
  dashboards:
    - name: "Query Performance Overview"
      metrics:
        - mean_exec_time_ms
        - calls_per_minute
        - rows_returned_avg
        - shared_blks_hit_ratio

๐Ÿ“œ Performance Monitoring SQL

-- Top 10 slowest queries
SELECT
    queryid,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    calls,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    ROUND((shared_blks_hit::float / NULLIF(shared_blks_hit + shared_blks_read, 0) * 100)::numeric, 2) AS cache_hit_pct,
    LEFT(query, 100) AS query_preview
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Detect sequential scans on indexed tables
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch,
    ROUND(100.0 * seq_scan / NULLIF(seq_scan + idx_scan, 0), 2) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
    AND idx_scan > 0
    AND seq_scan > idx_scan
ORDER BY seq_scan DESC;

๐Ÿ” Regression Detection Pipeline

flowchart LR
    A[Query Runs in CI] --> B[Capture EXPLAIN ANALYZE]
    B --> C[Compare to Baseline]
    C --> D{Regression Detected?}
    D -->|Yes| E[Block Deployment]
    D -->|No| F[Pass Gate]
    E --> G[Alert Database Engineer Agent]
    G --> H[Optimize Query / Index]
    H --> A
    F --> I[Deploy to Production]

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Database Engineer Agent | Defines performance budgets, creates baseline plans |
| Backend Developer Agent | Aligns data access code with expected query patterns |
| Test Automation Engineer Agent | Runs performance regression tests in CI pipeline |
| DevOps Engineer Agent | Configures monitoring dashboards and alert routing |

๐Ÿ’พ Backup, Recovery, and Disaster Recovery

The Database Blueprint defines comprehensive backup and disaster recovery specifications for every database component. These are not optional; they are contractual SLA elements that must be provisioned, tested, and monitored.


🎯 RPO/RTO Definitions

| Metric | Definition | Blueprint Field |
| --- | --- | --- |
| RPO | Recovery Point Objective: maximum acceptable data loss | rpoMinutes |
| RTO | Recovery Time Objective: maximum acceptable downtime | rtoMinutes |
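Compliance with an RPO target reduces to comparing the age of the newest recovery point against `rpoMinutes`. A minimal sketch (the function name is an assumption):

```python
from datetime import datetime, timedelta, timezone

def rpo_satisfied(last_recovery_point, now, rpo_minutes):
    """RPO is met while the newest recovery point is no older than rpoMinutes."""
    return now - last_recovery_point <= timedelta(minutes=rpo_minutes)

now = datetime(2025, 7, 15, 10, 0, tzinfo=timezone.utc)
print(rpo_satisfied(now - timedelta(minutes=10), now, 15))  # True
print(rpo_satisfied(now - timedelta(minutes=40), now, 15))  # False
```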

๐Ÿ“˜ Backup Strategy Blueprint

backupAndRecovery:
  engine: postgresql
  provider: azure-database-for-postgresql-flexible

  rpo:
    minutes: 15
    justification: "Business-critical order data with financial implications"

  rto:
    minutes: 60
    justification: "SLA requires 99.95% availability"

  backup:
    automated:
      type: continuous
      walArchiving: true
      fullBackup:
        schedule: "0 2 * * *"
        retention: 35d
      differentialBackup:
        schedule: "0 */6 * * *"
        retention: 7d
      transactionLogBackup:
        continuous: true
        retention: 7d

    pointInTimeRecovery:
      enabled: true
      granularity: "any point within retention window"
      maxRecoveryWindow: 35d

    storage:
      type: geo-redundant
      primaryRegion: eastus2
      secondaryRegion: westus2
      encryption: AES-256

  geoReplication:
    enabled: true
    mode: asynchronous
    readReplicas:
      - region: westus2
        purpose: read-offload
        lagThresholdSeconds: 30
      - region: northeurope
        purpose: disaster-recovery
        lagThresholdSeconds: 60

  disasterRecovery:
    strategy: geo-failover
    automaticFailover:
      enabled: true
      detectionThreshold: 60s
      gracePeriodSeconds: 300
    manualFailover:
      runbookLocation: "runbooks/database/failover-procedure.md"
      notifyChannels:
        - opsgenie
        - teams
    drTesting:
      schedule: quarterly
      type: full-failover-drill
      lastTested: "2025-06-15T14:00:00Z"
      results: "runbooks/database/dr-test-results-2025-q2.md"

๐Ÿ”„ Backup Architecture Flow

flowchart TD
    subgraph Primary["Primary Region (East US 2)"]
        DB_Primary["PostgreSQL Primary"]
        WAL["WAL Archiving"]
        FullBackup["Full Backup (Daily 2AM)"]
        DiffBackup["Differential (Every 6h)"]
    end

    subgraph Secondary["Secondary Region (West US 2)"]
        DB_ReadReplica["Read Replica"]
        DB_DR["DR Standby"]
    end

    subgraph Storage["Geo-Redundant Storage"]
        BackupStore["Backup Storage (AES-256)"]
        PITR["Point-in-Time Recovery Store"]
    end

    DB_Primary --> WAL
    WAL --> BackupStore
    WAL --> PITR
    DB_Primary --> FullBackup --> BackupStore
    DB_Primary --> DiffBackup --> BackupStore
    DB_Primary -->|"Async Replication"| DB_ReadReplica
    DB_Primary -->|"Async Replication"| DB_DR

    DB_DR -->|"Failover"| DB_Primary

📜 Recovery Procedures

| Scenario | Procedure | RTO Target |
| --- | --- | --- |
| Accidental Data Deletion | Point-in-time recovery to pre-deletion state | < 30 minutes |
| Database Corruption | Restore from latest full + WAL replay | < 60 minutes |
| Region Outage | Promote DR standby in secondary region | < 15 minutes |
| Full Infrastructure Failure | Restore from geo-redundant backup | < 4 hours |
| Ransomware/Security Breach | Restore from immutable backup + audit trail | < 2 hours |

๐Ÿงช DR Testing Configuration

drTesting:
  schedule:
    frequency: quarterly
    nextScheduled: "2025-09-15T14:00:00Z"
  tests:
    - type: failover-drill
      description: "Promote read replica to primary, validate application connectivity"
      acceptanceCriteria:
        - rtoMet: true
        - dataIntegrity: verified
        - applicationHealthy: true
    - type: point-in-time-recovery
      description: "Recover to specific timestamp, validate data consistency"
      acceptanceCriteria:
        - rpoMet: true
        - rowCountMatch: true
    - type: full-restore
      description: "Restore from geo-redundant backup to new instance"
      acceptanceCriteria:
        - schemaMatch: true
        - dataMatch: true
        - indexesIntact: true

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Infrastructure Engineer Agent | Provisions backup infrastructure, configures replication |
| Database Engineer Agent | Defines RPO/RTO targets, validates backup coverage |
| DevOps Engineer Agent | Automates backup scheduling, monitors backup health |
| Security Architect Agent | Ensures backup encryption and access controls |
| Test Automation Engineer Agent | Schedules and validates DR test exercises |

๐ŸŒฑ Data Seeding and Test Data

The Database Blueprint defines how seed data, reference data, and test data are managed across environments. This ensures consistent, reliable, and privacy-safe data for development, testing, and staging.


🧩 Seed Data Categories

| Category | Description | Environments |
| --- | --- | --- |
| Reference Data | Lookup tables: currencies, countries, status codes | All |
| System Configuration | Default settings, feature flags, role definitions | All |
| Sample Tenants | Pre-configured tenant accounts for development/testing | Dev, Test, Staging |
| Test Entities | Orders, customers, products for integration tests | Dev, Test |
| Performance Data | Large-volume generated data for load testing | Staging, Perf |

๐Ÿ“˜ Seed Data Blueprint

dataSeeding:
  referenceData:
    source: "seed/reference_data.sql"
    idempotent: true
    appliesTo: [dev, test, staging, production]
    tables:
      - reference_currencies
      - reference_countries
      - reference_order_statuses
      - reference_payment_methods

  testData:
    source: "seed/test_data.sql"
    appliesTo: [dev, test]
    generator:
      tool: faker
      language: csharp
      config:
        tenants: 5
        customersPerTenant: 100
        ordersPerCustomer: 20
        locale: "en_US"
    cleanup:
      strategy: truncate-and-reseed
      beforeEachTestRun: true

  performanceData:
    source: generated
    appliesTo: [staging, performance]
    generator:
      tenants: 50
      customersPerTenant: 10000
      ordersPerCustomer: 500
      totalExpectedRows: 250000000
      parallelInsertDegree: 8

๐Ÿ”’ Data Masking for Non-Production

dataMasking:
  enabled: true
  appliesTo: [dev, test, staging]
  rules:
    - column: "customers.email"
      strategy: hash-and-domain
      output: "{hash}@example.com"
    - column: "customers.phone_number"
      strategy: randomize
      pattern: "+1-555-XXX-XXXX"
    - column: "customers.first_name"
      strategy: faker
      fakerMethod: "Name.FirstName"
    - column: "customers.last_name"
      strategy: faker
      fakerMethod: "Name.LastName"
    - column: "orders.total_amount"
      strategy: noise
      variance: 15%
    - column: "payment_methods.card_number"
      strategy: redact
      output: "XXXX-XXXX-XXXX-XXXX"
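A few of the masking strategies above can be sketched in Python. These are illustrative implementations under stated assumptions, not the factory's masking engine; the deterministic hash in `hash-and-domain` is what keeps joins on masked emails consistent across tables:

```python
import hashlib
import random

def mask_email(email, domain="example.com"):
    """hash-and-domain: deterministic, so joins on masked emails still line up."""
    digest = hashlib.sha256(email.strip().lower().encode()).hexdigest()[:12]
    return f"{digest}@{domain}"

def redact_card(_card_number):
    """redact: the original value is never recoverable from the output."""
    return "XXXX-XXXX-XXXX-XXXX"

def noise_amount(amount, variance_pct=15, rng=None):
    """noise: perturb within +/- variance_pct so aggregates stay plausible."""
    rng = rng or random.Random()
    factor = 1 + rng.uniform(-variance_pct, variance_pct) / 100
    return round(amount * factor, 2)

print(mask_email("Jane.Doe@corp.com"))
print(redact_card("4111111111111111"))
print(noise_amount(120.00, rng=random.Random(42)))
```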

๐Ÿ“œ Reference Data SQL Example

-- Reference data: idempotent seed script
-- Applied to all environments including production

INSERT INTO reference_currencies (code, name, symbol, decimal_places)
VALUES
    ('USD', 'US Dollar', '$', 2),
    ('EUR', 'Euro', '€', 2),
    ('GBP', 'British Pound', '£', 2),
    ('JPY', 'Japanese Yen', '¥', 0),
    ('CAD', 'Canadian Dollar', 'C$', 2)
ON CONFLICT (code) DO UPDATE SET
    name = EXCLUDED.name,
    symbol = EXCLUDED.symbol,
    decimal_places = EXCLUDED.decimal_places;

INSERT INTO reference_order_statuses (code, name, display_order, is_terminal)
VALUES
    ('pending', 'Pending', 1, false),
    ('confirmed', 'Confirmed', 2, false),
    ('processing', 'Processing', 3, false),
    ('shipped', 'Shipped', 4, false),
    ('delivered', 'Delivered', 5, true),
    ('cancelled', 'Cancelled', 6, true),
    ('refunded', 'Refunded', 7, true)
ON CONFLICT (code) DO UPDATE SET
    name = EXCLUDED.name,
    display_order = EXCLUDED.display_order,
    is_terminal = EXCLUDED.is_terminal;

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| Database Engineer Agent | Generates seed scripts, configures idempotent reference data |
| Test Automation Engineer Agent | Defines test data requirements, generates faker-based data scripts |
| Security Architect Agent | Specifies data masking rules and ensures PII protection in non-prod |
| Backend Developer Agent | Integrates seed data with application bootstrap and test fixtures |

๐Ÿ”— Cross-Blueprint Intersections

The Database Blueprint does not exist in isolation. It is deeply interconnected with other blueprints in the Factory ecosystem, creating a coherent, multi-dimensional specification for every generated component.


๐Ÿ›ก๏ธ Security Blueprint Intersection

Intersection Area Security Blueprint Contribution Database Blueprint Contribution
Encryption at Rest Defines encryption requirements and key management Implements TDE, column-level encryption
Row-Level Security Defines tenant isolation requirements Implements RLS policies and tenant context propagation
Data Masking Specifies PII/PHI classification Implements masking rules and non-prod data generation
Audit Logging Requires audit trails for compliance Implements temporal tables and change tracking
Secrets Management Defines connection string storage in Key Vault Consumes connection secrets at runtime

📦 Infrastructure Blueprint Intersection

| Intersection Area | Infrastructure Blueprint Contribution | Database Blueprint Contribution |
| --- | --- | --- |
| Database Provisioning | Provisions Azure Database for PostgreSQL, networking | Defines required SKU, configuration, extensions |
| Replication Setup | Configures read replicas, geo-replication infrastructure | Defines replication topology and lag thresholds |
| Backup Infrastructure | Provisions backup storage, configures retention policies | Defines RPO/RTO targets and recovery procedures |
| Networking | Configures VNet integration, private endpoints | Specifies connection requirements and firewall rules |
| Monitoring | Deploys monitoring agents and dashboards | Defines query metrics and performance alert thresholds |

🧪 Test Blueprint Intersection

| Intersection Area | Test Blueprint Contribution | Database Blueprint Contribution |
| --- | --- | --- |
| Integration Tests | Defines test scenarios requiring database interaction | Provides schema definitions and seed data |
| Performance Tests | Defines load test scenarios and SLA targets | Provides performance budgets and baseline query plans |
| Migration Tests | Validates migration scripts execute correctly | Provides migration files and rollback scripts |
| Data Integrity Tests | Validates foreign key consistency and constraint enforcement | Defines constraints and relationship specifications |
| Test Data Management | Requires repeatable test data setup and teardown | Provides seeding scripts and data generation rules |

📊 Observability Blueprint Intersection

| Intersection Area | Observability Blueprint Contribution | Database Blueprint Contribution |
| --- | --- | --- |
| Query Metrics | Collects and dashboards query performance telemetry | Defines expected performance baselines and budgets |
| Connection Pool Monitoring | Monitors connection usage, pool exhaustion | Defines pool sizing and connection management strategy |
| Replication Lag | Alerts on replication lag exceeding thresholds | Defines acceptable lag thresholds per replica |
| Backup Monitoring | Monitors backup success/failure, storage usage | Defines backup schedules and retention requirements |
| Schema Drift Detection | Detects when deployed schema differs from blueprint | Provides canonical schema definitions for comparison |

๐Ÿ” Cross-Blueprint Flow

flowchart TD
    DB[Database Blueprint] --> SEC[Security Blueprint]
    DB --> INFRA[Infrastructure Blueprint]
    DB --> TEST[Test Blueprint]
    DB --> OBS[Observability Blueprint]
    DB --> SVC[Service Blueprint]

    SEC -->|"Encryption, RLS, Masking"| DB
    INFRA -->|"Provisioning, Replicas, Networking"| DB
    TEST -->|"Integration Tests, Seed Data"| DB
    OBS -->|"Query Metrics, Alerts"| DB
    SVC -->|"ORM Mappings, Repository Patterns"| DB

๐Ÿงญ Blueprint Location, Traceability, and Versioning

A Database Blueprint is not just content: it is a traceable artifact, part of a multi-agent lineage graph, and it lives at a predictable location in the Factory's file and memory hierarchy.

This enables cross-agent validation, rollback, comparison, and regeneration.


๐Ÿ“ File System Location

Each blueprint is stored in a consistent location within the Factory workspace:

blueprints/database/{component-name}/database-blueprint.md
blueprints/database/{component-name}/database-blueprint.json
blueprints/database/{component-name}/migrations/
blueprints/database/{component-name}/schemas/
blueprints/database/{component-name}/indexes/
  • Markdown is human-readable and Studio-rendered.
  • JSON is parsed by orchestrators and enforcement agents.
  • Migrations folder contains version-controlled SQL scripts.
  • Schemas folder contains per-table YAML definitions.
  • Indexes folder contains index strategy YAML files.

🧠 Traceability Fields

Each blueprint includes a set of required metadata fields for trace alignment:

| Field | Purpose |
| --- | --- |
| traceId | Links blueprint to full generation pipeline |
| agentId | Records which agent(s) emitted the artifact |
| originPrompt | Captures human-initiated signal or intent |
| createdAt | ISO timestamp for audit |
| dataProfile | Type of data architecture (relational, document) |
| engine | Database engine (postgresql, sqlserver, etc.) |
| tenantStrategy | Multi-tenant isolation pattern |
| migrationVersion | Current migration version |

These fields ensure full trace and observability for regeneration, validation, and compliance review.


๐Ÿ” Versioning and Mutation Tracking

| Mechanism | Purpose |
| --- | --- |
| v1, v2, ... | Manual or automatic version bumping by agents |
| diff-link: metadata | References upstream and downstream changes |
| GitOps snapshot tags | Bind blueprint versions to commit hashes or releases |
| Migration versioning | Each schema change is a numbered, traceable migration |
| Drift monitors | Alert if deployed schema deviates from blueprint |

๐Ÿ“œ Mutation History Example

metadata:
  traceId: "trc_82db_OrderService_v3"
  agentId: "db-engineer-001"
  originPrompt: "Add multi-currency support to order processing"
  createdAt: "2025-07-15T09:30:00Z"
  version: "v3"
  diffFrom: "v2"
  changedFields:
    - "tables.orders.columns.currency_code"
    - "tables.orders.columns.exchange_rate"
    - "indexes.ix_orders_currency"
    - "migrations.V3.2.0__add_currency_support"
  migrationVersion: "3.2.0"
  engine: "postgresql"
  tenantStrategy: "shared-schema-with-rls"

These mechanisms ensure that database design is not an afterthought, but a tracked, versioned, observable system artifact.


๐Ÿš€ Database CI/CD Integration

The Database Blueprint defines how database changes flow through the CI/CD pipeline, from schema validation to migration deployment to performance regression testing.


๐Ÿ”„ Pipeline Architecture

flowchart LR
    subgraph CI["Continuous Integration"]
        A[PR Created] --> B[Schema Lint]
        B --> C[Migration Validation]
        C --> D[Seed Data Test]
        D --> E[Integration Tests]
        E --> F[Performance Regression]
        F --> G{All Gates Pass?}
    end

    subgraph CD["Continuous Deployment"]
        G -->|Yes| H[Deploy to Staging]
        H --> I[Run Migrations]
        I --> J[Smoke Tests]
        J --> K[Performance Validation]
        K --> L{Staging OK?}
        L -->|Yes| M[Deploy to Production]
        M --> N[Run Migrations]
        N --> O[Health Check]
        O --> P[Monitor & Alert]
    end

    G -->|No| Q[Block & Notify]
    L -->|No| R[Rollback Staging]

๐Ÿ“˜ Pipeline Stage Definitions

databasePipeline:
  ci:
    stages:
      - name: schema-lint
        description: "Validate schema YAML against naming conventions and rules"
        tool: custom-schema-linter
        failOn: error
        config:
          rules: "blueprints/database/validation-rules.yaml"

      - name: migration-validation
        description: "Validate migration scripts are syntactically correct and idempotent"
        tool: flyway-validate
        failOn: error
        config:
          locations: "blueprints/database/{component}/migrations/"

      - name: migration-dry-run
        description: "Execute migrations against ephemeral test database"
        tool: flyway-migrate
        failOn: error
        config:
          target: ephemeral-postgres
          cleanOnStart: true
          validateOnMigrate: true

      - name: seed-data-test
        description: "Apply seed data and validate reference data integrity"
        tool: custom-seed-runner
        failOn: error

      - name: integration-tests
        description: "Run database integration tests"
        tool: dotnet-test
        failOn: error
        config:
          filter: "Category=DatabaseIntegration"
          database: ephemeral-postgres

      - name: performance-regression
        description: "Compare query execution plans against baselines"
        tool: custom-perf-gate
        failOn: warning
        config:
          baselines: "blueprints/database/{component}/performance-baselines.yaml"
          regressionThreshold: 200%

  cd:
    stages:
      - name: deploy-migrations
        description: "Apply pending migrations to target environment"
        tool: flyway-migrate
        config:
          outOfOrder: false
          validateOnMigrate: true
          baselineOnMigrate: false

      - name: verify-schema
        description: "Compare deployed schema against blueprint definition"
        tool: custom-schema-diff
        failOn: error

      - name: health-check
        description: "Validate database connectivity and basic query functionality"
        tool: custom-health-check
        config:
          timeout: 30s
          queries:
            - "SELECT 1"
            - "SELECT count(*) FROM reference_currencies"

๐Ÿ” Rollback Strategy

rollback:
  strategy: migration-rollback
  config:
    automaticRollback:
      onMigrationFailure: true
      onHealthCheckFailure: true
      onPerformanceRegression: false
    manualRollback:
      approvalRequired: true
      approvers: ["data-architect", "devops-lead"]
    rollbackScripts:
      required: true
      location: "blueprints/database/{component}/migrations/"
      namingPattern: "V{version}__rollback.sql"
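The `V{version}__rollback.sql` convention above can be enforced with a simple filename check in CI. A sketch, assuming Flyway-style `V{version}__{description}.sql` names (the helper name is hypothetical):

```python
import re

MIGRATION_RE = re.compile(r"V(?P<version>[\d.]+)__(?P<desc>.+)\.sql")

def missing_rollbacks(migration_files):
    """Return migration versions that have no V{version}__rollback.sql peer."""
    forward, rollback = set(), set()
    for name in migration_files:
        match = MIGRATION_RE.fullmatch(name)
        if not match:
            continue  # not a migration script; ignore
        bucket = rollback if match.group("desc") == "rollback" else forward
        bucket.add(match.group("version"))
    return sorted(forward - rollback)

files = [
    "V3.2.0__add_currency_support.sql",
    "V3.2.0__rollback.sql",
    "V3.3.0__add_refund_table.sql",
]
print(missing_rollbacks(files))  # ['3.3.0']
```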

๐Ÿงช Schema Validation Gate Example

schemaValidationGate:
  name: "Database Schema Validation"
  trigger: pull-request
  checks:
    - name: naming-conventions
      description: "All tables, columns, and indexes follow naming standards"
      severity: error
    - name: required-columns
      description: "All business tables have tenant_id, created_at, updated_at, is_deleted"
      severity: error
    - name: foreign-key-indexes
      description: "All foreign key columns have supporting indexes"
      severity: error
    - name: migration-reversibility
      description: "Every migration has a corresponding rollback script"
      severity: warning
    - name: zero-downtime-compliance
      description: "No prohibited DDL operations in migration"
      severity: error
    - name: performance-budget
      description: "New queries have documented performance budgets"
      severity: warning
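The required-columns check above is mechanical given a table-to-columns mapping. A sketch; exempting `reference_` and `system_` tables mirrors the blueprint's excludedTables convention, but that prefix rule is an assumption:

```python
REQUIRED_COLUMNS = {"tenant_id", "created_at", "updated_at", "is_deleted"}

def required_column_violations(tables):
    """tables: {table_name: iterable of column names}. Returns the missing
    required columns per business table. Reference/system tables are exempt
    (assumption, mirroring the blueprint's excludedTables convention)."""
    violations = {}
    for name, columns in tables.items():
        if name.startswith(("reference_", "system_")):
            continue
        missing = REQUIRED_COLUMNS - set(columns)
        if missing:
            violations[name] = sorted(missing)
    return violations

schema = {
    "orders": ["id", "tenant_id", "created_at", "updated_at", "is_deleted"],
    "invoices": ["id", "tenant_id", "created_at"],
    "reference_currencies": ["code", "name"],
}
print(required_column_violations(schema))  # {'invoices': ['is_deleted', 'updated_at']}
```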

🤖 Agent Collaboration

| Agent | Role |
| --- | --- |
| DevOps Engineer Agent | Designs pipeline stages, configures deployment automation |
| Database Engineer Agent | Creates validation rules, defines performance baselines |
| Test Automation Engineer Agent | Implements integration tests and performance regression tests |
| Backend Developer Agent | Ensures application code is compatible with migration changes |
| Infrastructure Engineer Agent | Provisions ephemeral databases for CI and manages environment configurations |

๐Ÿง  Memory Graph Representation

The Database Blueprint is embedded in the ConnectSoft Memory Graph, a semantic, vector-indexed knowledge graph that enables AI agents to reason about data architecture across the entire platform.


๐Ÿ”— Memory Node Structure

memoryNode:
  type: database-blueprint
  id: "blueprint:database:order-service:v3"
  traceId: "trc_82db_OrderService_v3"
  attributes:
    engine: postgresql
    tenantStrategy: shared-schema-with-rls
    migrationVersion: "3.2.0"
    tables: ["orders", "customers", "order_items", "products"]
    indexCount: 12
    rlsEnabled: true
    backupRpo: 15
    backupRto: 60
  links:
    - type: belongs-to
      target: "service:order-service"
    - type: secured-by
      target: "blueprint:security:order-service:v2"
    - type: provisioned-by
      target: "blueprint:infrastructure:order-service:v4"
    - type: tested-by
      target: "blueprint:test:order-service:v5"
    - type: observed-by
      target: "blueprint:observability:order-service:v3"
    - type: depends-on
      target: "blueprint:database:customer-service:v2"
  embedding:
    model: text-embedding-ada-002
    dimensions: 1536
    indexed: true
    concepts:
      - postgresql
      - multi-tenant
      - row-level-security
      - migration
      - index-optimization
      - backup-recovery

๐Ÿงฉ Graph Relationships

graph TD
    DB_BP["Database Blueprint\n(order-service v3)"] --> SVC["Service Blueprint\n(order-service)"]
    DB_BP --> SEC_BP["Security Blueprint\n(order-service v2)"]
    DB_BP --> INFRA_BP["Infrastructure Blueprint\n(order-service v4)"]
    DB_BP --> TEST_BP["Test Blueprint\n(order-service v5)"]
    DB_BP --> OBS_BP["Observability Blueprint\n(order-service v3)"]
    DB_BP --> CUST_DB["Database Blueprint\n(customer-service v2)"]

    DB_BP --> MIG["Migration: V3.2.0"]
    DB_BP --> IDX["Index Strategy"]
    DB_BP --> RLS["RLS Policies"]
    DB_BP --> BACKUP["Backup Config"]

    MIG --> TRACE["Trace: trc_82db"]
    IDX --> TRACE
    RLS --> TRACE

๐Ÿง  AI Agent Reasoning

The memory graph enables agents to:

  • 🔍 Find related blueprints: "What security policies apply to the orders database?"
  • 🔄 Detect drift: "Has the deployed schema diverged from the blueprint?"
  • 🧪 Generate tests: "What integration tests are needed for the new migration?"
  • 📊 Optimize performance: "Which indexes are unused across all services?"
  • 🔗 Trace lineage: "What prompt created this table? Which agent last modified it?"

๐Ÿ“Š Memory Query Examples

memoryQueries:
  - query: "Find all database blueprints using RLS"
    filter:
      type: database-blueprint
      attributes.rlsEnabled: true
    returns: [id, traceId, attributes.tables]

  - query: "Find blueprints with RPO > 30 minutes"
    filter:
      type: database-blueprint
      attributes.backupRpo: { $gt: 30 }
    returns: [id, attributes.engine, attributes.backupRpo]

  - query: "Find cross-service database dependencies"
    filter:
      type: database-blueprint
      links.type: depends-on
    returns: [id, links.target]
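The filter syntax above (dotted attribute paths plus a Mongo-style `$gt`) can be matched with a small evaluator. A sketch under stated assumptions; link-level filters such as `links.type` are out of scope here:

```python
def _resolve(node, dotted_path):
    """Walk a dotted path like 'attributes.backupRpo' into nested dicts."""
    current = node
    for part in dotted_path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(part)
    return current

def run_memory_query(nodes, node_filter):
    """Minimal matcher: exact equality on dotted paths, plus {'$gt': n}."""
    def matches(node):
        for path, expected in node_filter.items():
            actual = _resolve(node, path)
            if isinstance(expected, dict) and "$gt" in expected:
                if actual is None or not actual > expected["$gt"]:
                    return False
            elif actual != expected:
                return False
        return True
    return [node for node in nodes if matches(node)]

nodes = [
    {"id": "bp:orders", "type": "database-blueprint", "attributes": {"backupRpo": 15}},
    {"id": "bp:archive", "type": "database-blueprint", "attributes": {"backupRpo": 45}},
]
result = run_memory_query(nodes, {"type": "database-blueprint",
                                  "attributes.backupRpo": {"$gt": 30}})
print([node["id"] for node in result])  # ['bp:archive']
```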

๐Ÿ Final Summary

The Database Blueprint is a comprehensive, agent-generated artifact that serves as the data persistence contract of record for every component in the ConnectSoft AI Software Factory.


๐Ÿ“Š Blueprint Coverage Summary

| Dimension | Coverage |
|-----------|----------|
| Schema Design | Tables, columns, types, constraints, naming conventions, normalization |
| Entity Relationships | Foreign keys, aggregate boundaries, reference data, ER diagrams |
| Migrations | Version-controlled, zero-downtime, rollback-safe, CI/CD integrated |
| Index Strategy | B-Tree, GIN, GiST, BRIN, partial, covering, composite, maintenance |
| Multi-Tenant Isolation | Shared-schema + RLS, schema-per-tenant, database-per-tenant, hybrid |
| Query Performance | Budgets, baselines, execution plans, regression detection |
| Backup & Recovery | RPO/RTO targets, continuous backup, PITR, geo-replication, DR testing |
| Data Seeding | Reference data, test data generation, data masking, faker integration |
| CI/CD Integration | Schema linting, migration validation, performance gates, rollback |
| Cross-Blueprint Links | Security, Infrastructure, Test, Observability, Service blueprints |
| Memory Graph | Vectorized, concept-indexed, trace-linked, agent-queryable |

๐Ÿค– Agent Participation Summary

| Agent | Primary Responsibilities |
|-------|--------------------------|
| Database Engineer Agent | Schema design, migration generation, index optimization, query tuning |
| Data Architect Agent | Data modeling, entity relationships, bounded contexts, normalization |
| Backend Developer Agent | ORM configuration, data access layers, repository patterns |
| Infrastructure Engineer Agent | Database provisioning, replication, networking, backup infrastructure |
| Security Architect Agent | RLS policies, encryption, data masking, audit trails |
| DevOps Engineer Agent | Migration pipelines, CI/CD gates, monitoring, backup automation |
| Test Automation Engineer Agent | Integration tests, performance tests, seed data, DR drills |

๐Ÿ“ Artifact Output Summary

| Format | Purpose | Location |
|--------|---------|----------|
| `database-blueprint.md` | Human-readable specification | `blueprints/database/{component}/` |
| `database-blueprint.json` | Machine-readable schema definition | `blueprints/database/{component}/` |
| `*.sql` | Migration scripts (up and rollback) | `blueprints/database/{component}/migrations/` |
| `*.yaml` | Schema and index definitions | `blueprints/database/{component}/schemas/` |
| Embedding | Vector-indexed memory node | Memory graph |

๐Ÿง  Key Design Principles

- Agent-First: Every schema, migration, and index is generated and validated by specialized agents
- Trace-Linked: Every artifact carries a traceId linking it to the full generation pipeline
- Zero-Downtime: All production migrations follow expand-contract or additive-only patterns
- Multi-Tenant Native: Tenant isolation is not a feature — it's a systemic guarantee
- Performance-Budgeted: Every query has an expected execution plan and regression threshold
- Disaster-Ready: Backup and recovery are contractual SLA elements, not optional configurations
- Memory-Integrated: The blueprint lives in the semantic memory graph, enabling AI reasoning
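The "Performance-Budgeted" principle can be illustrated with a simple regression gate that compares observed query latency against a blueprint budget. The budget names and the 20% tolerance below are assumptions for the sketch, not a documented ConnectSoft contract.

```python
# Hedged sketch of a query-performance regression gate: a query fails the
# gate when its observed latency exceeds its blueprint budget by more than
# a tolerance. Budget values and threshold are illustrative assumptions.

BUDGETS_MS = {"orders.by_customer": 25.0, "orders.recent": 10.0}
REGRESSION_THRESHOLD = 0.20  # fail if observed exceeds budget by >20%

def check_budget(query_name, observed_ms):
    """Return (passed, overage_ratio) for a measured query latency."""
    budget = BUDGETS_MS[query_name]
    overage = (observed_ms - budget) / budget
    return overage <= REGRESSION_THRESHOLD, overage

ok, _ = check_budget("orders.by_customer", 28.0)  # 12% over budget: tolerated
bad, _ = check_budget("orders.recent", 14.0)      # 40% over budget: regression
print(ok, bad)  # -> True False
```

In a CI/CD pipeline such a check would run against captured execution-plan baselines rather than hardcoded numbers, but the gate logic is the same.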

๐Ÿ—„๏ธ The Database Blueprint transforms database design from a manual, error-prone activity into a deterministic, traceable, agent-orchestrated discipline โ€” ensuring every service has a reliable, performant, and secure data foundation.