🗄️ Database Blueprint
📘 What Is a Database Blueprint?
A Database Blueprint is a structured, agent-generated artifact that defines the database schema design, migration strategies, query optimization plans, and multi-tenant data isolation for a ConnectSoft-generated component, whether it's a microservice, module, bounded context, or shared data platform.
It represents the data persistence contract of record, created during the generation pipeline and continuously evaluated by downstream data engineering agents, DevOps pipelines, and runtime observability systems.
In the AI Software Factory, the Database Blueprint is not just a schema dump: it's a machine-readable, agent-enforceable contract for how data is structured, accessed, protected, migrated, and scaled.
🧠 Blueprint Roles in the Factory
The Database Blueprint plays a pivotal role in making data persistence composable, observable, and resilient:
- 🗄️ Defines table structures, relationships, constraints, and index strategies
- 🔄 Manages version-controlled migrations with rollback plans and zero-downtime strategies
- 🧠 Enables AI agents to enforce schema standards, naming conventions, and performance budgets
- 🏢 Encodes multi-tenant data isolation patterns: RLS, schema-per-tenant, or database-per-tenant
- 📈 Drives query performance baselines, index optimization, and execution plan monitoring
- 💾 Specifies backup schedules, RPO/RTO targets, and disaster recovery procedures
It ensures that data architecture is not an afterthought, but a first-class agent responsibility in the generation pipeline.
🎯 Purpose and Motivation
The Database Blueprint exists to solve one of the most persistent and costly problems in modern software delivery:
"Database schemas are inconsistently designed, migrations are fragile, queries degrade silently, and tenant data boundaries are poorly enforced."
In the ConnectSoft AI Software Factory, data persistence is integrated at the blueprint level, making it:
- ✅ Deterministic (agent-generated, based on traceable domain models)
- ✅ Repeatable (diffable migrations validated through CI/CD)
- ✅ Observable (integrated with query metrics, drift detection, and performance budgets)
- ✅ Composable (aligned with service, security, and infrastructure blueprints)
🚨 Problems It Solves
| Problem Area | How the Database Blueprint Helps |
|---|---|
| 🧩 Inconsistent Schema Designs | Enforces naming conventions, type standards, and constraint patterns |
| 🔄 Unmanaged Migrations | Generates version-controlled, rollback-safe migrations with zero-downtime |
| 🐌 Poor Query Performance | Defines performance budgets, index strategies, and execution plan baselines |
| 🏢 Tenant Data Leakage | Enforces RLS policies, schema isolation, or database-per-tenant partitioning |
| 📉 No Index Optimization Strategy | Specifies covering indexes, composite indexes, filtered indexes, and maintenance |
| 💾 Backup/Recovery Gaps | Defines RPO/RTO targets, backup schedules, and geo-replication strategies |
| 🧬 Schema Drift Across Environments | Tracks blueprint mutations and configuration drift over time |
| 🔍 Opaque Data Contracts | Makes data expectations visible to other agents and orchestrators |
🧠 Why Blueprints, Not Just Migrations?
While traditional environments rely on ad hoc migration scripts or ORM-generated schemas, the Factory approach uses blueprints because:
- Blueprints are memory-linked to every module, entity, and trace ID
- They are machine-generated and human-readable
- They support forward/backward analysis across schema versions and data model changes
- They coordinate multiple agents across Data Architecture, Backend Dev, Infra, Security, and DevOps
- They encode intent, not just the final DDL, enabling intelligent regeneration and optimization
This allows database design to be treated as code, but also as a living architectural asset.
🧑‍🤝‍🧑 Blueprint Consumers and Usage
The Database Blueprint is not an isolated artifact. It is actively consumed across the ConnectSoft AI Software Factory by agents, humans, and CI systems to design, implement, validate, and operate data persistence layers.
Each consumer interprets the blueprint differently based on its context, but all share a common source of truth.
🤖 Agent Consumers
| Agent | Role in Consumption |
|---|---|
| 🗄️ Database Engineer Agent | Designs schemas, generates migrations, optimizes indexes, tunes query performance |
| 🧠 Data Architect Agent | High-level data modeling, entity relationships, bounded context alignment |
| 💻 Backend Developer Agent | Implements data access layers, ORM configuration, repository patterns |
| 📦 Infrastructure Engineer Agent | Provisions database infrastructure, configures replicas and failover |
| 🛡️ Security Architect Agent | Defines row-level security, encryption at rest, data masking policies |
| 🚀 DevOps Engineer Agent | Manages database CI/CD pipelines, migration deployment automation |
| 🧪 Test Automation Engineer Agent | Database integration tests, data seeding, test data generation |
👤 Human Stakeholders
| Role | Value Derived from Database Blueprint |
|---|---|
| Data Architect | Validates data models, reviews entity relationships and bounded context boundaries |
| Backend Developer | Understands table structures, relationships, and ORM mapping expectations |
| DBA | Reviews index strategies, query plans, replication, and backup configurations |
| DevOps Engineer | Applies migration pipelines, ensures database provisioning aligns across environments |
| QA Lead | Confirms test data seeding strategies and integration test coverage |
| Security Officer | Verifies encryption, RLS policies, and data masking for compliance |
🔧 Machine Consumers
- GitOps systems: verify drift in schema definitions and migration state
- CI/CD pipelines: run migration validation gates and performance regression tests
- ORM frameworks: consume schema metadata for code generation and mapping
- Monitoring systems: compare query performance against declared baselines
- Memory Indexing Layer: links blueprint data assertions to downstream agents and alerts
The Database Blueprint becomes a persistence contract, providing guarantees that must be respected by services, infrastructure, CI/CD, and security processes alike.
🧠 Agent-Created, Trace-Ready Artifact
In the ConnectSoft AI Software Factory, the Database Blueprint is not written manually: it is generated, enriched, and validated by multiple agents, then stored as part of the system's memory graph.
This ensures every data persistence contract is:
- 🔍 Traceable to its origin domain model or product feature
- 🔁 Regenerable with context-aware mutation
- 📊 Auditable through observability-first design
- 🧠 Embedded into the long-term agentic memory system
🤖 Agents Involved in Creation
| Agent | Responsibility |
|---|---|
| 🧠 Data Architect Agent | Designs high-level data models, defines aggregate boundaries and relationships |
| 🗄️ Database Engineer Agent | Translates models into schemas, indexes, constraints, and migration scripts |
| 💻 Backend Developer Agent | Aligns ORM mappings and repository patterns with schema definitions |
| 🛡️ Security Architect Agent | Injects RLS policies, encryption specs, and data masking rules |
| 📦 Infrastructure Engineer Agent | Provisions database servers, replicas, and backup infrastructure |
| 🚀 DevOps Engineer Agent | Integrates migration pipelines and schema validation into CI/CD |
Each agent contributes signals, decisions, and enriched metadata to create a complete, executable artifact.
🔗 Memory Traceability
Database Blueprints are:
- 🔍 Linked to the project-wide trace ID
- 📂 Associated with the microservice, module, or bounded context
- 🧠 Indexed in vector memory for AI reasoning and enforcement
- 🏷️ Versioned and tagged (`v1`, `approved`, `migrated`, `drifted`, etc.)
This makes the blueprint machine-auditable, AI-searchable, and human-explainable.
📄 Example Storage and Trace Metadata
traceId: trc_82db_OrderService_v3
agentId: db-engineer-001
serviceName: OrderService
dataProfile: relational-multi-tenant
tags:
- postgresql
- rls
- migrations
- indexes
- multi-tenant
version: v3
state: approved
createdAt: "2025-07-15T09:30:00Z"
diffFrom: v2
changedFields:
- schemas.orders.columns
- indexes.orders_tenant_date_idx
- migrations.v3_add_order_status
📦 What It Captures
The Database Blueprint encodes a comprehensive set of data persistence dimensions that affect a service or module throughout its lifecycle, from initial design to production runtime.
It defines what data structures exist, how they relate, how they evolve, and how they perform, making it a living contract between the generated component and its data layer.
🗄️ Core Database Elements Captured
| Category | Captured Details |
|---|---|
| Schema Definitions | Tables, columns, data types, constraints, defaults, computed columns |
| Entity-Relationship Models | ER diagrams, aggregate boundaries, reference data, foreign key graphs |
| Migration Strategies | Version-controlled migrations, rollback plans, zero-downtime DDL changes |
| Index Optimization Plans | Covering indexes, composite indexes, filtered indexes, index maintenance windows |
| Multi-Tenant Data Isolation | Schema-per-tenant, RLS, shared-schema with tenant ID, database-per-tenant |
| Query Performance Baselines | Expected query patterns, execution plan targets, performance budgets |
| Backup and Recovery Specs | RPO/RTO targets, backup schedules, point-in-time recovery, geo-replication |
| Replication and Sharding | Read replicas, horizontal sharding strategies, data partitioning schemes |
| Data Seeding | Seed data definitions, test data generation rules, data masking for non-prod |
| Temporal Data | Temporal tables, audit trails, soft deletes, history tracking |
📄 Blueprint Snippet (Example)
schema:
name: OrderService
engine: postgresql
version: "16"
tables:
- name: orders
schema: public
columns:
- name: id
type: uuid
primaryKey: true
default: gen_random_uuid()
- name: tenant_id
type: uuid
nullable: false
index: true
- name: customer_id
type: uuid
nullable: false
foreignKey:
table: customers
column: id
onDelete: RESTRICT
- name: order_number
type: varchar(50)
nullable: false
unique: true
- name: status
type: varchar(20)
nullable: false
default: "'pending'"
check: "status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')"
- name: total_amount
type: decimal(18,2)
nullable: false
- name: currency_code
type: char(3)
nullable: false
default: "'USD'"
- name: created_at
type: timestamptz
nullable: false
default: now()
- name: updated_at
type: timestamptz
nullable: false
default: now()
- name: is_deleted
type: boolean
nullable: false
default: "false"
indexes:
- name: ix_orders_tenant_id
columns: [tenant_id]
- name: ix_orders_customer_id
columns: [tenant_id, customer_id]
- name: ix_orders_status_created
columns: [tenant_id, status, created_at]
where: "is_deleted = false"
rowLevelSecurity:
enabled: true
policy: "tenant_id = current_setting('app.current_tenant')::uuid"
multiTenant:
strategy: shared-schema-with-rls
tenantColumn: tenant_id
enforcement: row-level-security
migrations:
tool: flyway
baselineVersion: "1.0.0"
currentVersion: "3.2.0"
backupStrategy:
schedule: "0 2 * * *"
retention: 30d
pointInTimeRecovery: true
rpoMinutes: 15
rtoMinutes: 60
🧩 Cross-Blueprint Intersections (Overview)
- Security Blueprint → defines encryption at rest, RLS policies, data masking
- Infrastructure Blueprint → defines database provisioning, replicas, networking
- Service Blueprint → defines ORM mappings, connection management, data access patterns
- Test Blueprint → defines integration tests, seed data, and performance test baselines
- Observability Blueprint → defines query metrics, slow query alerts, connection pool monitoring
The Database Blueprint aggregates, links, and applies data persistence rules across all of these, ensuring coherence and alignment.
🗂️ Output Formats
The Database Blueprint is generated and consumed across multiple layers of the AI Software Factory, from human-readable design reviews to machine-enforced CI/CD migration gates.
To support both automation and collaboration, it is produced in multiple coordinated formats, each aligned with a different set of use cases.
📘 Human-Readable Markdown (.md)
Used in Studio, code reviews, data architecture reviews, and documentation layers.
- Sectioned by category: schemas, migrations, indexes, multi-tenancy, backup
- Rich formatting with ER diagrams and Mermaid visuals
- Includes YAML/SQL code samples and policy excerpts
- Links to upstream and downstream blueprints
📦 Machine-Readable JSON (.json)
Used by agents, pipelines, and enforcement scripts.
- Flattened and typed schema definitions
- Includes metadata and trace headers
- Validated against a shared schema specification
- Compatible with ORM code generators and migration tools
Example excerpt:
{
"traceId": "trc_82db_OrderService_v3",
"engine": "postgresql",
"version": "16",
"schemas": {
"public": {
"tables": {
"orders": {
"columns": {
"id": { "type": "uuid", "primaryKey": true },
"tenant_id": { "type": "uuid", "nullable": false },
"customer_id": { "type": "uuid", "nullable": false },
"order_number": { "type": "varchar(50)", "unique": true },
"status": { "type": "varchar(20)", "default": "pending" },
"total_amount": { "type": "decimal(18,2)" },
"created_at": { "type": "timestamptz", "default": "now()" }
},
"indexes": [
{ "name": "ix_orders_tenant_id", "columns": ["tenant_id"] },
{ "name": "ix_orders_status_created", "columns": ["tenant_id", "status", "created_at"], "filtered": true }
],
"rls": {
"enabled": true,
"tenantColumn": "tenant_id"
}
}
}
}
},
"migrations": {
"tool": "flyway",
"currentVersion": "3.2.0"
}
}
📜 SQL Migration Files (.sql)
Direct migration scripts generated from blueprint changes.
- Version-numbered and idempotent
- Include both `up` and `down` scripts
- Validated against zero-downtime migration rules
⚙️ CI/CD Compatible YAML (.yaml)
Used to configure database pipelines, backup schedules, and monitoring hooks.
- Migration deployment steps
- Schema validation gate configurations
- Performance regression test definitions
🧠 Embedded Memory Shape (Vectorized)
- Captured in agent long-term memory
- Indexed by concept (e.g., `rls`, `migration`, `index-strategy`, `backup`)
- Linked to all agent discussions, generations, and validations
- Enables trace-based enforcement and reuse
📁 Storage Convention
Each Database Blueprint is stored in a consistent, discoverable location within the Factory workspace:
blueprints/database/{component-name}/database-blueprint.md
blueprints/database/{component-name}/database-blueprint.json
blueprints/database/{component-name}/migrations/
blueprints/database/{component-name}/schemas/
blueprints/database/{component-name}/indexes/
📁 Directory Structure Example
blueprints/database/order-service/
├── database-blueprint.md        # Human-readable specification
├── database-blueprint.json      # Machine-readable schema definition
├── migrations/
│   ├── V1.0.0__initial_schema.sql
│   ├── V1.1.0__add_order_status.sql
│   ├── V2.0.0__add_rls_policies.sql
│   ├── V3.0.0__add_audit_columns.sql
│   ├── V3.1.0__add_composite_index.sql
│   └── V3.2.0__add_currency_support.sql
├── schemas/
│   ├── orders.yaml
│   ├── customers.yaml
│   ├── order_items.yaml
│   └── reference_data.yaml
├── indexes/
│   ├── orders_indexes.yaml
│   ├── customers_indexes.yaml
│   └── index_maintenance.yaml
└── seed/
    ├── reference_data.sql
    ├── test_tenants.sql
    └── test_orders.sql
- Markdown is human-readable and Studio-rendered.
- JSON is parsed by orchestrators and enforcement agents.
- SQL migration files are executed by Flyway, Liquibase, or EF Core Migrations.
- YAML schema/index files are consumed by agents for validation and regeneration.
🏗️ Schema Design Principles
The Database Blueprint enforces a rigorous set of schema design principles that ensure consistency, maintainability, and performance across all generated components.
These principles are not guidelines: they are agent-enforced constraints embedded in every schema definition.
🏷️ Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Tables | snake_case, plural | `orders`, `order_items` |
| Columns | snake_case, descriptive | `created_at`, `total_amount` |
| Primary Keys | `id` (UUID preferred) | `id uuid PRIMARY KEY` |
| Foreign Keys | `{referenced_table}_id` | `customer_id`, `tenant_id` |
| Indexes | `ix_{table}_{columns}` | `ix_orders_tenant_id` |
| Unique Constraints | `uq_{table}_{columns}` | `uq_orders_order_number` |
| Check Constraints | `ck_{table}_{description}` | `ck_orders_status_valid` |
| Sequences | `seq_{table}_{column}` | `seq_orders_order_number` |
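Conventions like these are mechanically checkable. A minimal linting sketch (Python, purely illustrative: it covers only a subset of the table above, and the plural-table heuristic is deliberately naive — the Factory's real enforcement agents are not shown here):

```python
import re

# Patterns mirroring an illustrative subset of the naming conventions table.
RULES = {
    "table": re.compile(r"^[a-z][a-z0-9_]*s$"),   # snake_case, plural (naive check)
    "column": re.compile(r"^[a-z][a-z0-9_]*$"),   # snake_case, descriptive
    "index": re.compile(r"^ix_[a-z0-9_]+$"),      # ix_{table}_{columns}
    "unique": re.compile(r"^uq_[a-z0-9_]+$"),     # uq_{table}_{columns}
    "check": re.compile(r"^ck_[a-z0-9_]+$"),      # ck_{table}_{description}
}

def check_name(kind: str, name: str) -> bool:
    """Return True when the identifier matches its convention."""
    return bool(RULES[kind].fullmatch(name))

# Collect violations from a sample list of (kind, identifier) pairs.
violations = [
    (kind, name)
    for kind, name in [
        ("table", "orders"),
        ("table", "Order"),        # PascalCase and singular: violation
        ("index", "ix_orders_tenant_id"),
        ("index", "orders_idx"),   # wrong prefix: violation
    ]
    if not check_name(kind, name)
]
```

A real validator would also cross-check that index and constraint names embed the actual table and column names, which requires the full blueprint rather than bare identifiers.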
📐 Normalization Strategy
The blueprint prescribes Third Normal Form (3NF) as the default, with explicit, documented exceptions for denormalization:
schemaDesign:
defaultNormalization: 3NF
denormalizationPolicy:
allowed: true
requiresJustification: true
commonPatterns:
- pattern: materialized-view
useCase: "Read-heavy dashboards and reporting queries"
- pattern: json-column
useCase: "Flexible metadata or configuration storage"
- pattern: computed-column
useCase: "Derived values that avoid expensive JOINs"
- pattern: denormalized-aggregate
useCase: "Pre-computed totals for high-frequency reads"
🕒 Temporal Tables and Audit Trails
Every table that requires audit history is configured with temporal tracking:
temporalTables:
enabled: true
strategy: system-versioned
tables:
- name: orders
historyTable: orders_history
periodColumns:
start: valid_from
end: valid_to
- name: customers
historyTable: customers_history
periodColumns:
start: valid_from
end: valid_to
Equivalent SQL (PostgreSQL with temporal extension):
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
customer_id UUID NOT NULL REFERENCES customers(id),
order_number VARCHAR(50) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_amount DECIMAL(18,2) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
is_deleted BOOLEAN NOT NULL DEFAULT false,
valid_from TIMESTAMPTZ NOT NULL DEFAULT now(),
valid_to TIMESTAMPTZ NOT NULL DEFAULT 'infinity'
);
CREATE TABLE orders_history (LIKE orders INCLUDING ALL);
🗑️ Soft Delete Strategy
All entities use soft deletes by default, with hard delete reserved for compliance-driven data purging:
softDelete:
enabled: true
column: is_deleted
type: boolean
default: false
globalFilter: true
hardDeletePolicy:
allowed: true
requiresApproval: true
useCases:
- gdpr-right-to-erasure
- data-retention-expiry
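With `globalFilter: true`, every read path is expected to append the soft-delete predicate automatically (in .NET this is typically an EF Core global query filter). A driver-agnostic sketch of the idea in Python, with hypothetical function names that are not part of the blueprint spec:

```python
def apply_global_filters(sql: str, *, soft_delete_column: str = "is_deleted",
                         include_deleted: bool = False) -> str:
    """Append the soft-delete predicate unless the caller explicitly
    opts out (e.g., a compliance-driven purge or audit job)."""
    if include_deleted:
        return sql
    # Naive connector choice: AND if a WHERE clause already exists.
    connector = "AND" if " where " in sql.lower() else "WHERE"
    return f"{sql} {connector} {soft_delete_column} = false"
```

For example, `apply_global_filters("SELECT * FROM orders WHERE tenant_id = $1")` yields the same query with `AND is_deleted = false` appended; production systems do this at the ORM layer rather than by string manipulation.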
📏 Blueprint Schema Validation Rules
schemaValidation:
rules:
- rule: all-tables-must-have-primary-key
severity: error
- rule: all-foreign-keys-must-have-index
severity: error
- rule: no-nullable-foreign-keys
severity: warning
- rule: all-tables-must-have-created-at
severity: error
- rule: all-tables-must-have-updated-at
severity: error
- rule: varchar-must-have-max-length
severity: error
- rule: decimal-must-have-precision
severity: error
- rule: tenant-column-required-for-business-tables
severity: error
- rule: soft-delete-column-required
severity: warning
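An enforcement agent can evaluate rules like these against the machine-readable form of the blueprint. A minimal sketch (Python) covering three of the rules above; the in-memory table representation is an assumption for illustration, not the Factory's actual JSON schema:

```python
def validate_schema(tables: dict) -> list:
    """Evaluate a subset of the blueprint validation rules.
    Returns (rule, severity, table) tuples for each violation."""
    findings = []
    for name, spec in tables.items():
        cols = spec.get("columns", {})
        if not any(c.get("primaryKey") for c in cols.values()):
            findings.append(("all-tables-must-have-primary-key", "error", name))
        if "created_at" not in cols:
            findings.append(("all-tables-must-have-created-at", "error", name))
        for col in cols.values():
            if col.get("type") == "varchar":  # bare varchar, no max length
                findings.append(("varchar-must-have-max-length", "error", name))
    return findings

# Sample input: "orders" is clean, "legacy_notes" violates all three rules.
tables = {
    "orders": {"columns": {
        "id": {"type": "uuid", "primaryKey": True},
        "created_at": {"type": "timestamptz"},
    }},
    "legacy_notes": {"columns": {"body": {"type": "varchar"}}},
}
issues = validate_schema(tables)
```

A CI gate would then fail the build on any `error`-severity finding and merely annotate `warning`-severity ones.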
🤖 Agent Collaboration
| Agent | Role |
|---|---|
| Data Architect Agent | Defines naming conventions, normalization rules, and domain models |
| Database Engineer Agent | Validates schemas against rules, generates DDL, applies constraints |
| Backend Developer Agent | Ensures ORM entities align with schema definitions |
| Security Architect Agent | Verifies sensitive columns are encrypted or masked |
🔄 Migration Lifecycle
Migrations are the mechanism by which schema changes are applied to database environments. The Database Blueprint treats migrations as first-class, version-controlled, agent-generated artifacts with full traceability and rollback capabilities.
🔁 Migration Lifecycle Flow
flowchart TD
A[Schema Change Request] --> B[Data Architect Agent]
B --> C[Database Engineer Agent]
C --> D[Generate Migration Script]
D --> E[Validate Against Rules]
E --> F{Passes Validation?}
F -->|Yes| G[Generate Rollback Script]
F -->|No| H[Return to Engineer Agent]
H --> C
G --> I[Test in Staging]
I --> J{Performance OK?}
J -->|Yes| K[Add to Migration Pipeline]
J -->|No| L[Optimize & Retry]
L --> D
K --> M[Deploy to Production]
M --> N[Verify & Update Blueprint]
N --> O[Update Memory Graph]
🧩 Migration Types
| Migration Type | Description | Example |
|---|---|---|
| Schema Migration | DDL changes: CREATE, ALTER, DROP | Add new table, modify column type |
| Data Migration | DML changes: INSERT, UPDATE, DELETE | Backfill computed column values |
| Index Migration | CREATE INDEX, DROP INDEX, REINDEX | Add covering index for new query |
| Security Migration | RLS policies, GRANT/REVOKE, encryption | Enable RLS on new table |
| Reference Data | Seed or update lookup/reference tables | Add new order status values |
| Rollback Migration | Reverse a previously applied migration | Undo column rename |
🏷️ Migration Naming Convention
Migration files follow the Flyway-style pattern `V{major}.{minor}.{patch}__{description}.sql`. Examples:
V1.0.0__initial_schema.sql
V1.1.0__add_order_items_table.sql
V1.2.0__add_rls_policies.sql
V2.0.0__add_audit_columns.sql
V2.0.0__rollback.sql
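Because deployment order is derived from these filenames, agents can parse and sort them deterministically. A minimal sketch (Python, illustrative only; in practice ordering is delegated to the migration tool itself, e.g. Flyway):

```python
import re

# Matches the Flyway-style pattern V{version}__{description}.sql
MIGRATION_RE = re.compile(r"^V(?P<version>\d+(?:\.\d+)*)__(?P<desc>\w+)\.sql$")

def parse_migration(filename: str):
    """Split a versioned migration filename into a sortable version
    tuple and its description."""
    m = MIGRATION_RE.fullmatch(filename)
    if not m:
        raise ValueError(f"not a versioned migration: {filename}")
    version = tuple(int(part) for part in m.group("version").split("."))
    return version, m.group("desc")

files = [
    "V2.0.0__add_audit_columns.sql",
    "V1.0.0__initial_schema.sql",
    "V1.2.0__add_rls_policies.sql",
]
# Sort by version tuple, not lexicographically (so V10 sorts after V9).
ordered = sorted(files, key=lambda f: parse_migration(f)[0])
```

Sorting on the parsed tuple avoids the classic lexicographic trap where `V10.0.0` would sort before `V9.0.0`.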
📜 Migration Script Example (Up)
-- V3.2.0__add_currency_support.sql
-- Migration: Add multi-currency support to orders table
-- Agent: db-engineer-001
-- TraceId: trc_82db_OrderService_v3
-- Zero-Downtime: Yes (additive only)
BEGIN;
ALTER TABLE orders
ADD COLUMN currency_code CHAR(3) NOT NULL DEFAULT 'USD';
ALTER TABLE orders
ADD COLUMN exchange_rate DECIMAL(12,6) DEFAULT 1.000000;
ALTER TABLE orders
ADD CONSTRAINT ck_orders_currency_valid
CHECK (currency_code ~ '^[A-Z]{3}$');
COMMENT ON COLUMN orders.currency_code IS 'ISO 4217 currency code';
COMMENT ON COLUMN orders.exchange_rate IS 'Exchange rate to base currency at time of order';
COMMIT;
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so the index is built after the transactional changes commit.
CREATE INDEX CONCURRENTLY ix_orders_currency
ON orders (tenant_id, currency_code)
WHERE is_deleted = false;
📜 Migration Script Example (Down / Rollback)
-- V3.2.0__rollback.sql
-- Rollback: Remove multi-currency support from orders table
-- DROP INDEX CONCURRENTLY cannot run inside a transaction block,
-- so the index is dropped first, outside the transaction.
DROP INDEX CONCURRENTLY IF EXISTS ix_orders_currency;
BEGIN;
ALTER TABLE orders
DROP CONSTRAINT IF EXISTS ck_orders_currency_valid;
ALTER TABLE orders
DROP COLUMN IF EXISTS exchange_rate;
ALTER TABLE orders
DROP COLUMN IF EXISTS currency_code;
COMMIT;
🔄 Zero-Downtime Migration Patterns
The blueprint enforces zero-downtime migration strategies for production deployments:
| Pattern | Description | When to Use |
|---|---|---|
| Expand-Contract | Add new → migrate data → remove old | Column renames, type changes |
| Additive Only | Only add columns/tables, never remove in same release | New features |
| Concurrent Index | `CREATE INDEX CONCURRENTLY` to avoid table locks | All index additions |
| Background Data Migration | Async backfill with batched updates | Large data transformations |
| Feature Flag Guard | Code reads old and new columns, flag controls which is authoritative | Breaking schema changes |
⚙️ Zero-Downtime YAML Configuration
migrationPolicy:
zeroDowntime: true
patterns:
allowedOperations:
- ADD_COLUMN
- ADD_INDEX_CONCURRENTLY
- ADD_CONSTRAINT_NOT_VALID
- CREATE_TABLE
- ADD_RLS_POLICY
prohibitedOperations:
- DROP_COLUMN_IMMEDIATE
- ALTER_COLUMN_TYPE_DIRECT
- DROP_TABLE_WITH_DATA
- ADD_NOT_NULL_WITHOUT_DEFAULT
expandContract:
enabled: true
maxPhases: 3
phaseDurationDays: 7
validation:
requireRollbackScript: true
requirePerformanceTest: true
requireZeroDowntimeProof: true
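A validation gate can classify each planned operation against the allowed/prohibited lists in this policy. An illustrative sketch (Python; the operation identifiers come from the YAML above, while the fail-closed handling of unknown operations is an assumption, not a documented Factory behavior):

```python
# Operation lists mirroring the zero-downtime policy above.
ALLOWED = {
    "ADD_COLUMN", "ADD_INDEX_CONCURRENTLY", "ADD_CONSTRAINT_NOT_VALID",
    "CREATE_TABLE", "ADD_RLS_POLICY",
}
PROHIBITED = {
    "DROP_COLUMN_IMMEDIATE", "ALTER_COLUMN_TYPE_DIRECT",
    "DROP_TABLE_WITH_DATA", "ADD_NOT_NULL_WITHOUT_DEFAULT",
}

def gate_migration(operations: list):
    """Pass/fail a migration plan against the zero-downtime policy.
    Operations that are neither allowed nor prohibited are rejected
    (fail-closed) pending human or agent review."""
    blocked = [op for op in operations
               if op in PROHIBITED or op not in ALLOWED]
    return (not blocked, blocked)

ok, blocked = gate_migration(["ADD_COLUMN", "ADD_INDEX_CONCURRENTLY"])
bad, why = gate_migration(["ADD_COLUMN", "DROP_COLUMN_IMMEDIATE"])
```

Failing closed on unknown operations forces every new DDL pattern to be explicitly classified before it can reach a production pipeline.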
🤖 Agent Collaboration
| Agent | Role |
|---|---|
| Database Engineer Agent | Generates migration scripts, validates zero-downtime compliance |
| Data Architect Agent | Approves schema-level changes that affect domain model boundaries |
| DevOps Engineer Agent | Integrates migrations into deployment pipelines |
| Test Automation Engineer Agent | Runs migration tests against staging databases |
| Backend Developer Agent | Updates ORM mappings to align with migrated schemas |
📈 Index Strategy and Optimization
Indexes are the primary mechanism for query acceleration in relational databases. The Database Blueprint defines a comprehensive index strategy that balances read performance, write overhead, and storage costs.
🧩 Index Types and Selection
| Index Type | Description | Use Case |
|---|---|---|
| B-Tree | Default balanced tree index | Equality and range queries |
| Hash | Hash-based lookup | Exact equality only (rare in PostgreSQL) |
| GIN | Generalized Inverted Index | Full-text search, JSONB containment |
| GiST | Generalized Search Tree | Geometric, range, and proximity queries |
| BRIN | Block Range Index | Large tables with naturally ordered data |
| Partial/Filtered | Index with WHERE clause | Active records, non-deleted rows |
| Covering | Includes non-key columns via INCLUDE | Index-only scans for common query patterns |
| Composite | Multi-column index | Queries filtering on multiple columns |
📄 Index Blueprint Definition
indexes:
orders:
- name: ix_orders_tenant_id
type: btree
columns: [tenant_id]
purpose: "Tenant isolation filter"
- name: ix_orders_tenant_customer
type: btree
columns: [tenant_id, customer_id]
purpose: "Customer order lookups within tenant"
- name: ix_orders_tenant_status_date
type: btree
columns: [tenant_id, status, created_at]
where: "is_deleted = false"
purpose: "Active order filtering by status and date"
- name: ix_orders_order_number
type: btree
columns: [order_number]
unique: true
purpose: "Unique order number lookups"
- name: ix_orders_total_amount_covering
type: btree
columns: [tenant_id, created_at]
include: [total_amount, currency_code, status]
purpose: "Index-only scan for order summaries"
- name: ix_orders_jsonb_metadata
type: gin
columns: [metadata]
purpose: "JSONB containment queries on order metadata"
📜 Generated SQL Examples
-- Partial (filtered) index for active orders
CREATE INDEX CONCURRENTLY ix_orders_tenant_status_date
ON orders (tenant_id, status, created_at DESC)
WHERE is_deleted = false;
-- Covering index for index-only scans
CREATE INDEX CONCURRENTLY ix_orders_total_amount_covering
ON orders (tenant_id, created_at DESC)
INCLUDE (total_amount, currency_code, status);
-- GIN index for JSONB queries
CREATE INDEX CONCURRENTLY ix_orders_jsonb_metadata
ON orders USING gin (metadata jsonb_path_ops);
-- BRIN index for time-series data
CREATE INDEX ix_audit_log_created_brin
ON audit_log USING brin (created_at)
WITH (pages_per_range = 128);
🔧 Index Maintenance Strategy
indexMaintenance:
schedule:
reindex: "weekly-sunday-02:00"
analyzeFrequency: "daily-03:00"
bloatThreshold: 30%
monitoring:
unusedIndexThreshold: 30d
duplicateDetection: true
bloatTracking: true
automation:
autoDropUnused: false
alertOnUnused: true
alertOnBloat: true
alertOnDuplicates: true
🔍 Index Monitoring Queries
-- Find unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'uq_%'
AND indexrelname NOT LIKE 'pk_%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Find duplicate indexes
SELECT
indrelid::regclass AS table_name,
array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
-- Index size relative to its table (a rough signal only; accurate
-- bloat estimation requires the pgstattuple extension)
SELECT
  schemaname || '.' || tablename AS table,
  indexname AS index,
  pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size,
  ROUND(100 * pg_relation_size(indexname::regclass)::numeric /
        NULLIF(pg_total_relation_size(tablename::regclass), 0), 2) AS pct_of_table_size
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;
🤖 Agent Collaboration
| Agent | Role |
|---|---|
| Database Engineer Agent | Designs index strategy, generates DDL, monitors performance impact |
| Data Architect Agent | Reviews index alignment with query patterns and data access paths |
| DevOps Engineer Agent | Schedules maintenance windows, automates REINDEX operations |
| Test Automation Engineer Agent | Runs query performance tests to validate index effectiveness |
🏢 Multi-Tenant Data Architecture
ConnectSoft's AI Software Factory supports multi-tenant SaaS systems by default. The Database Blueprint encodes strict controls for data partitioning, tenant-aware access, and scoped operations at the database layer.
This section defines the three primary isolation patterns, their trade-offs, and when each should be used.
🧩 Isolation Pattern Comparison
| Pattern | Isolation Level | Complexity | Cost | Compliance | Scale |
|---|---|---|---|---|---|
| Shared Schema with RLS | Logical | Low | Low | Moderate | High |
| Schema-Per-Tenant | Logical+ | Medium | Medium | High | Medium |
| Database-Per-Tenant | Physical | High | High | Very High | Per-tenant |
🔐 Pattern 1: Shared Schema with Row-Level Security (RLS)
The default and recommended pattern for most ConnectSoft services. All tenants share the same tables, with PostgreSQL RLS policies enforcing data isolation.
flowchart TD
subgraph SharedDatabase["Shared Database"]
subgraph Tables["Shared Tables"]
T1["orders (all tenants)"]
T2["customers (all tenants)"]
T3["products (all tenants)"]
end
RLS["Row-Level Security Policies"]
end
App1["Tenant A Request"] -->|"SET app.current_tenant = 'A'"| RLS
App2["Tenant B Request"] -->|"SET app.current_tenant = 'B'"| RLS
RLS --> Tables
SQL Implementation
-- Enable RLS on table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;
-- Create tenant isolation policy
CREATE POLICY tenant_isolation_orders ON orders
USING (tenant_id = current_setting('app.current_tenant')::uuid);
-- Create tenant-scoped insert policy
CREATE POLICY tenant_insert_orders ON orders
FOR INSERT
WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);
-- Application sets tenant context per connection
SET app.current_tenant = '550e8400-e29b-41d4-a716-446655440000';
YAML Blueprint Declaration
multiTenant:
strategy: shared-schema-with-rls
tenantColumn: tenant_id
tenantColumnType: uuid
enforcement: row-level-security
connectionManagement:
pooling: shared
tenantContextSetting: "app.current_tenant"
setOnConnectionOpen: true
tables:
- orders
- customers
- order_items
- products
- invoices
excludedTables:
- reference_currencies
- reference_countries
- system_configurations
Pros and Cons
| Pros | Cons |
|---|---|
| ✅ Lowest operational complexity | ⚠️ Noisy neighbor risk on shared resources |
| ✅ Shared connection pool | ⚠️ Must ensure RLS policies on every table |
| ✅ Easy schema migrations (one database) | ⚠️ Harder compliance for strict data sovereignty |
| ✅ Cost-effective at scale | ⚠️ Cross-tenant queries require elevated access |
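Under this pattern the application must set the tenant session variable on every pooled connection before use, and clear it on release so the next borrower cannot inherit it. A driver-agnostic sketch in Python (the `execute` callable stands in for whatever database driver is in use; the wrapper itself is illustrative, not a documented Factory component):

```python
from contextlib import contextmanager

@contextmanager
def tenant_context(execute, tenant_id: str):
    """Scope all statements on a pooled connection to one tenant by
    setting the RLS session variable the policies read, then reset it
    on release so the setting never leaks to the next borrower."""
    # set_config(..., false) sets the value for the whole session.
    execute("SELECT set_config('app.current_tenant', %s, false)", (tenant_id,))
    try:
        yield
    finally:
        # Clear the tenant context before the connection returns to the pool.
        execute("RESET app.current_tenant", ())
```

A middleware layer would typically wrap each request's unit of work in `tenant_context`, deriving `tenant_id` from the authenticated principal rather than from client input.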
🔐 Pattern 2: Schema-Per-Tenant
Each tenant gets a dedicated PostgreSQL schema within a shared database. Provides stronger isolation without separate database instances.
flowchart TD
subgraph SharedDatabase["Shared Database"]
subgraph SchemaA["Schema: tenant_alpha"]
TA1["orders"]
TA2["customers"]
end
subgraph SchemaB["Schema: tenant_beta"]
TB1["orders"]
TB2["customers"]
end
subgraph SchemaC["Schema: tenant_gamma"]
TC1["orders"]
TC2["customers"]
end
end
App1["Tenant Alpha"] -->|"SET search_path = 'tenant_alpha'"| SchemaA
App2["Tenant Beta"] -->|"SET search_path = 'tenant_beta'"| SchemaB
App3["Tenant Gamma"] -->|"SET search_path = 'tenant_gamma'"| SchemaC
SQL Implementation
-- Create tenant schema
CREATE SCHEMA tenant_alpha AUTHORIZATION app_user;
-- Create tables within tenant schema
CREATE TABLE tenant_alpha.orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL REFERENCES tenant_alpha.customers(id),
order_number VARCHAR(50) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_amount DECIMAL(18,2) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Tenant onboarding: clone template schema
CREATE SCHEMA tenant_new_tenant;
DO $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN
SELECT tablename FROM pg_tables WHERE schemaname = 'template_schema'
LOOP
EXECUTE format(
'CREATE TABLE tenant_new_tenant.%I (LIKE template_schema.%I INCLUDING ALL)',
rec.tablename, rec.tablename
);
END LOOP;
END $$;
YAML Blueprint Declaration¶
multiTenant:
strategy: schema-per-tenant
schemaPrefix: "tenant_"
templateSchema: "template_schema"
connectionManagement:
pooling: per-schema
searchPathSetting: true
provisioning:
onTenantCreated: clone-template-schema
onTenantDeleted: archive-and-drop-schema
migrations:
strategy: iterate-all-schemas
parallelDegree: 4
failurePolicy: stop-and-alert
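The migration settings above (strategy: iterate-all-schemas, parallelDegree: 4, failurePolicy: stop-and-alert) can be sketched as a small orchestration loop. This is illustrative: `migrate_schema` stands in for the real Flyway or driver invocation.

```python
# Sketch of the iterate-all-schemas migration strategy declared above.
from concurrent.futures import ThreadPoolExecutor, as_completed

def migrate_all_schemas(schemas, migrate_schema, parallel_degree=4):
    """Run migrations across tenant schemas with bounded parallelism.

    failurePolicy: stop-and-alert -> on the first failure, cancel pending
    work and return the failure list for alerting.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=parallel_degree) as pool:
        futures = {pool.submit(migrate_schema, s): s for s in schemas}
        for fut in as_completed(futures):
            schema = futures[fut]
            try:
                fut.result()
            except Exception as exc:
                failures.append((schema, str(exc)))
                for other in futures:  # stop: cancel anything not yet started
                    other.cancel()
                break
    return failures

def fake_migrate(schema):
    """Placeholder for the real migration call; tenant_beta fails on purpose."""
    if schema == "tenant_beta":
        raise RuntimeError("migration V3.2.0 failed")

failures = migrate_all_schemas(
    ["tenant_alpha", "tenant_beta", "tenant_gamma"], fake_migrate,
)
print(failures)
```

The returned failure list is what a stop-and-alert policy would forward to the alerting channel.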
Pros and Cons¶
| Pros | Cons |
|---|---|
| ✅ Strong logical isolation | ⚠️ Migrations must iterate across all schemas |
| ✅ Per-tenant backup/restore possible | ⚠️ Higher connection pool overhead |
| ✅ Schema-level GRANT/REVOKE controls | ⚠️ Schema proliferation at scale |
| ✅ Easier compliance than shared-schema | ⚠️ Cross-tenant analytics requires schema merging |
๐ Pattern 3: Database-Per-Tenant¶
Each tenant gets a completely separate database instance. This offers maximum isolation and is typically used for enterprise or regulated tenants.
flowchart TD
Router["Connection Router"] --> DB_A["Database: tenant_alpha_db"]
Router --> DB_B["Database: tenant_beta_db"]
Router --> DB_C["Database: tenant_gamma_db"]
DB_A --> RA["Full Schema + Data"]
DB_B --> RB["Full Schema + Data"]
DB_C --> RC["Full Schema + Data"]
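The Connection Router shown above resolves a distinct connection string per tenant. A minimal routing sketch (illustrative; the in-memory secret store stands in for the Azure Key Vault lookup the blueprint declares):

```python
# Hypothetical connection router for the database-per-tenant pattern.
class ConnectionRouter:
    def __init__(self, fetch_secret, naming_pattern="{tenant_slug}_db"):
        self._fetch_secret = fetch_secret  # e.g. a Key Vault client in production
        self._pattern = naming_pattern     # matches namingPattern in the blueprint
        self._cache = {}                   # avoid a secret lookup per request

    def connection_string(self, tenant_slug: str) -> str:
        db_name = self._pattern.format(tenant_slug=tenant_slug)
        if db_name not in self._cache:
            # Secret naming here ("connstr-<db>") is an assumption for the sketch.
            self._cache[db_name] = self._fetch_secret(f"connstr-{db_name}")
        return self._cache[db_name]

# Fake secret store standing in for Key Vault
secrets = {"connstr-tenant_alpha_db": "Host=pg-alpha;Database=tenant_alpha_db"}
router = ConnectionRouter(fetch_secret=secrets.__getitem__)
print(router.connection_string("tenant_alpha"))
```

Per-database pooling would then hang off the resolved connection string, one pool per tenant database.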
YAML Blueprint Declaration¶
multiTenant:
strategy: database-per-tenant
namingPattern: "{tenant_slug}_db"
connectionManagement:
routing: connection-string-per-tenant
connectionStringSource: azure-key-vault
pooling: per-database
provisioning:
provider: azure-database-for-postgresql
skuPerTier:
starter: "B_Standard_B1ms"
professional: "GP_Standard_D2ds_v5"
enterprise: "MO_Standard_E4ds_v5"
onTenantCreated: provision-new-database
onTenantDeleted: backup-and-decommission
migrations:
strategy: rolling-per-database
parallelDegree: 10
failurePolicy: continue-and-report
backup:
perTenantSchedule: true
retentionDays: 90
Pros and Cons¶
| Pros | Cons |
|---|---|
| ✅ Maximum data isolation | ⚠️ Highest operational cost |
| ✅ Per-tenant performance tuning | ⚠️ Complex migration orchestration |
| ✅ Strongest compliance posture | ⚠️ Connection routing complexity |
| ✅ Independent backup/restore/scaling | ⚠️ Cross-tenant analytics extremely difficult |
| ✅ No noisy neighbor risk | ⚠️ Resource waste for low-activity tenants |
๐ Decision Matrix¶
tenantIsolationDecision:
defaultStrategy: shared-schema-with-rls
overrideRules:
- condition: "tenant.complianceLevel == 'SOC2_TYPE_II' OR tenant.dataResidency == true"
strategy: schema-per-tenant
- condition: "tenant.tier == 'enterprise' OR tenant.regulatoryRequirement == 'HIPAA'"
strategy: database-per-tenant
- condition: "tenant.expectedDataVolume > '100GB'"
strategy: database-per-tenant
hybridSupport:
enabled: true
description: "Mix strategies within the same platform based on tenant tier"
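The decision matrix above reads as an ordered rule list: the first matching override wins, otherwise the default applies. A sketch of how an agent might evaluate it (the predicates re-express the YAML conditions; field names mirror the blueprint):

```python
# Ordered override rules mirroring tenantIsolationDecision above.
RULES = [
    (lambda t: t.get("complianceLevel") == "SOC2_TYPE_II" or t.get("dataResidency"),
     "schema-per-tenant"),
    (lambda t: t.get("tier") == "enterprise" or t.get("regulatoryRequirement") == "HIPAA",
     "database-per-tenant"),
    (lambda t: t.get("expectedDataVolumeGb", 0) > 100,  # "100GB" expressed numerically
     "database-per-tenant"),
]

def choose_strategy(tenant: dict) -> str:
    for predicate, strategy in RULES:
        if predicate(tenant):
            return strategy
    return "shared-schema-with-rls"  # defaultStrategy

print(choose_strategy({"tier": "starter"}))
print(choose_strategy({"complianceLevel": "SOC2_TYPE_II"}))
print(choose_strategy({"regulatoryRequirement": "HIPAA"}))
```

Because each tenant is evaluated independently, mixing strategies within one platform (the hybridSupport case) falls out naturally.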
๐ค Agent Collaboration¶
| Agent | Role |
|---|---|
| Data Architect Agent | Selects isolation pattern based on requirements and constraints |
| Database Engineer Agent | Implements RLS policies, schema templates, or database provisioning |
| Security Architect Agent | Validates isolation enforcement and compliance alignment |
| Infrastructure Engineer Agent | Provisions database infrastructure per selected pattern |
| DevOps Engineer Agent | Automates tenant onboarding/offboarding and migration rollouts |
๐ Query Performance Management¶
The Database Blueprint defines query performance expectations as first-class contract elements. Every service's data access patterns are documented, baselined, and monitored for regression.
๐ฏ Performance Budget Model¶
queryPerformance:
budgets:
- query: "GET /api/orders (list by tenant)"
maxDurationMs: 50
expectedRows: 100
scanType: index-only
index: ix_orders_tenant_status_date
- query: "GET /api/orders/:id (single lookup)"
maxDurationMs: 5
expectedRows: 1
scanType: index-scan
index: ix_orders_order_number
- query: "GET /api/orders/summary (aggregation)"
maxDurationMs: 200
expectedRows: 1
scanType: index-only
index: ix_orders_total_amount_covering
- query: "POST /api/orders (insert)"
maxDurationMs: 20
expectedIndexUpdates: 4
- query: "Dashboard: daily order totals"
maxDurationMs: 500
expectedRows: 30
scanType: bitmap-index
acceptsMatView: true
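A budget model like the one above is only useful if something enforces it. A sketch of a CI gate that compares measured timings to the declared budgets (the measurement source, e.g. pg_stat_statements, is out of scope here):

```python
# Budgets from the blueprint, keyed by query label (subset for illustration).
BUDGETS = {
    "GET /api/orders (list by tenant)": 50,
    "GET /api/orders/:id (single lookup)": 5,
    "GET /api/orders/summary (aggregation)": 200,
}

def budget_violations(measured_ms: dict) -> list:
    """Return (query, measured, budget) for every query over its budget."""
    return [
        (query, ms, BUDGETS[query])
        for query, ms in measured_ms.items()
        if query in BUDGETS and ms > BUDGETS[query]
    ]

violations = budget_violations({
    "GET /api/orders (list by tenant)": 72.4,    # over the 50 ms budget
    "GET /api/orders/:id (single lookup)": 3.1,  # within budget
})
print(violations)
```

A non-empty violation list would fail the performance gate and route the report to the Database Engineer Agent.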
๐ Execution Plan Baselines¶
The blueprint captures expected execution plan shapes for critical queries:
executionPlanBaselines:
- queryId: "orders_by_tenant_status"
expectedPlan:
type: "Index Only Scan"
index: ix_orders_tenant_status_date
estimatedCost: 0.43
actualRows: 100
regressionThreshold:
costIncrease: 200%
scanTypeChange: true
seqScanDetection: true
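The baseline comparison can be sketched as a small predicate over EXPLAIN output. One assumption here: "costIncrease: 200%" is read as the allowed *increase* over baseline, i.e. the observed cost may be up to 3× the baseline before the check trips.

```python
# Regression check against an execution plan baseline (illustrative).
def plan_regressed(baseline: dict, observed: dict) -> bool:
    if observed["type"] != baseline["type"]:
        return True  # scanTypeChange
    if observed["type"] == "Seq Scan":
        return True  # seqScanDetection
    # costIncrease: 200% -> observed cost may be at most 3x the baseline
    return observed["cost"] > baseline["cost"] * 3.0

baseline = {"type": "Index Only Scan", "cost": 0.43}
print(plan_regressed(baseline, {"type": "Index Only Scan", "cost": 0.51}))
print(plan_regressed(baseline, {"type": "Seq Scan", "cost": 120.0}))
```

In CI, the observed plan would come from `EXPLAIN (ANALYZE, FORMAT JSON)` against the ephemeral test database, parsed into the same shape as the baseline.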
๐ Query Performance Monitoring¶
queryMonitoring:
enabled: true
tool: pg_stat_statements
configuration:
trackUtility: true
trackPlanning: true
maxStatements: 10000
alerts:
- condition: "mean_exec_time > 2x baseline"
action: alert-data-team
- condition: "seq_scan_count > 100 per hour on indexed table"
action: alert-db-engineer
- condition: "lock_wait_time > 5s"
action: alert-devops
dashboards:
- name: "Query Performance Overview"
metrics:
- mean_exec_time_ms
- calls_per_minute
- rows_returned_avg
- shared_blks_hit_ratio
๐ Performance Monitoring SQL¶
-- Top 10 slowest queries
SELECT
queryid,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
calls,
ROUND(total_exec_time::numeric, 2) AS total_ms,
ROUND((shared_blks_hit::float / NULLIF(shared_blks_hit + shared_blks_read, 0) * 100)::numeric, 2) AS cache_hit_pct,
LEFT(query, 100) AS query_preview
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Detect sequential scans on indexed tables
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
ROUND(100.0 * seq_scan / NULLIF(seq_scan + idx_scan, 0), 2) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND idx_scan > 0
AND seq_scan > idx_scan
ORDER BY seq_scan DESC;
๐ Regression Detection Pipeline¶
flowchart LR
A[Query Runs in CI] --> B[Capture EXPLAIN ANALYZE]
B --> C[Compare to Baseline]
C --> D{Regression Detected?}
D -->|Yes| E[Block Deployment]
D -->|No| F[Pass Gate]
E --> G[Alert Database Engineer Agent]
G --> H[Optimize Query / Index]
H --> A
F --> I[Deploy to Production]
๐ค Agent Collaboration¶
| Agent | Role |
|---|---|
| Database Engineer Agent | Defines performance budgets, creates baseline plans |
| Backend Developer Agent | Aligns data access code with expected query patterns |
| Test Automation Engineer Agent | Runs performance regression tests in CI pipeline |
| DevOps Engineer Agent | Configures monitoring dashboards and alert routing |
๐พ Backup, Recovery, and Disaster Recovery¶
The Database Blueprint defines comprehensive backup and disaster recovery specifications for every database component. These are not optional; they are contractual SLA elements that must be provisioned, tested, and monitored.
๐ฏ RPO/RTO Definitions¶
| Metric | Definition | Blueprint Field |
|---|---|---|
| RPO | Recovery Point Objective: max acceptable data loss | rpoMinutes |
| RTO | Recovery Time Objective: max acceptable downtime | rtoMinutes |
๐ Backup Strategy Blueprint¶
backupAndRecovery:
engine: postgresql
provider: azure-database-for-postgresql-flexible
rpo:
minutes: 15
justification: "Business-critical order data with financial implications"
rto:
minutes: 60
justification: "SLA requires 99.95% availability"
backup:
automated:
type: continuous
walArchiving: true
fullBackup:
schedule: "0 2 * * *"
retention: 35d
differentialBackup:
schedule: "0 */6 * * *"
retention: 7d
transactionLogBackup:
continuous: true
retention: 7d
pointInTimeRecovery:
enabled: true
granularity: "any point within retention window"
maxRecoveryWindow: 35d
storage:
type: geo-redundant
primaryRegion: eastus2
secondaryRegion: westus2
encryption: AES-256
geoReplication:
enabled: true
mode: asynchronous
readReplicas:
- region: westus2
purpose: read-offload
lagThresholdSeconds: 30
- region: northeurope
purpose: disaster-recovery
lagThresholdSeconds: 60
disasterRecovery:
strategy: geo-failover
automaticFailover:
enabled: true
detectionThreshold: 60s
gracePeriodSeconds: 300
manualFailover:
runbookLocation: "runbooks/database/failover-procedure.md"
notifyChannels:
- opsgenie
- teams
drTesting:
schedule: quarterly
type: full-failover-drill
lastTested: "2025-06-15T14:00:00Z"
results: "runbooks/database/dr-test-results-2025-q2.md"
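The RPO target above translates directly into a monitoring check: the newest restorable point (latest archived WAL segment or backup) must never be older than 15 minutes. A sketch with illustrative timestamps:

```python
# RPO compliance check for the 15-minute target declared above.
from datetime import datetime, timedelta, timezone

RPO = timedelta(minutes=15)

def rpo_satisfied(last_restorable: datetime, now: datetime) -> bool:
    """True if the newest restorable point is recent enough to meet the RPO."""
    return now - last_restorable <= RPO

now = datetime(2025, 7, 15, 12, 0, tzinfo=timezone.utc)
print(rpo_satisfied(now - timedelta(minutes=5), now))   # within RPO
print(rpo_satisfied(now - timedelta(minutes=40), now))  # RPO breached
```

The equivalent RTO check is time-based as well, measured from outage detection to the health check passing after failover.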
๐ Backup Architecture Flow¶
flowchart TD
subgraph Primary["Primary Region (East US 2)"]
DB_Primary["PostgreSQL Primary"]
WAL["WAL Archiving"]
FullBackup["Full Backup (Daily 2AM)"]
DiffBackup["Differential (Every 6h)"]
end
subgraph Secondary["Secondary Region (West US 2)"]
DB_ReadReplica["Read Replica"]
DB_DR["DR Standby"]
end
subgraph Storage["Geo-Redundant Storage"]
BackupStore["Backup Storage (AES-256)"]
PITR["Point-in-Time Recovery Store"]
end
DB_Primary --> WAL
WAL --> BackupStore
WAL --> PITR
DB_Primary --> FullBackup --> BackupStore
DB_Primary --> DiffBackup --> BackupStore
DB_Primary -->|"Async Replication"| DB_ReadReplica
DB_Primary -->|"Async Replication"| DB_DR
DB_DR -->|"Failover"| DB_Primary
๐ Recovery Procedures¶
| Scenario | Procedure | RTO Target |
|---|---|---|
| Accidental Data Deletion | Point-in-time recovery to pre-deletion state | < 30 minutes |
| Database Corruption | Restore from latest full + WAL replay | < 60 minutes |
| Region Outage | Promote DR standby in secondary region | < 15 minutes |
| Full Infrastructure Failure | Restore from geo-redundant backup | < 4 hours |
| Ransomware/Security Breach | Restore from immutable backup + audit trail | < 2 hours |
๐งช DR Testing Configuration¶
drTesting:
schedule:
frequency: quarterly
nextScheduled: "2025-09-15T14:00:00Z"
tests:
- type: failover-drill
description: "Promote read replica to primary, validate application connectivity"
acceptanceCriteria:
- rtoMet: true
- dataIntegrity: verified
- applicationHealthy: true
- type: point-in-time-recovery
description: "Recover to specific timestamp, validate data consistency"
acceptanceCriteria:
- rpoMet: true
- rowCountMatch: true
- type: full-restore
description: "Restore from geo-redundant backup to new instance"
acceptanceCriteria:
- schemaMatch: true
- dataMatch: true
- indexesIntact: true
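Each drill's pass/fail decision reduces to checking observed results against the declared acceptance criteria. A sketch of that evaluation (criteria names mirror the YAML keys above):

```python
# Evaluate a DR drill's results against its acceptance criteria.
def drill_passed(results: dict, criteria: dict) -> bool:
    """Every declared criterion must be present and match exactly."""
    return all(results.get(name) == expected for name, expected in criteria.items())

failover_criteria = {
    "rtoMet": True,
    "dataIntegrity": "verified",
    "applicationHealthy": True,
}
print(drill_passed(
    {"rtoMet": True, "dataIntegrity": "verified", "applicationHealthy": True},
    failover_criteria,
))
print(drill_passed({"rtoMet": False}, failover_criteria))
```

A failed drill would be recorded alongside the results document referenced in the drTesting block, so the next quarterly cycle starts from known gaps.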
๐ค Agent Collaboration¶
| Agent | Role |
|---|---|
| Infrastructure Engineer Agent | Provisions backup infrastructure, configures replication |
| Database Engineer Agent | Defines RPO/RTO targets, validates backup coverage |
| DevOps Engineer Agent | Automates backup scheduling, monitors backup health |
| Security Architect Agent | Ensures backup encryption and access controls |
| Test Automation Engineer Agent | Schedules and validates DR test exercises |
๐ฑ Data Seeding and Test Data¶
The Database Blueprint defines how seed data, reference data, and test data are managed across environments. This ensures consistent, reliable, and privacy-safe data for development, testing, and staging.
๐งฉ Seed Data Categories¶
| Category | Description | Environments |
|---|---|---|
| Reference Data | Lookup tables: currencies, countries, status codes | All |
| System Configuration | Default settings, feature flags, role definitions | All |
| Sample Tenants | Pre-configured tenant accounts for development/testing | Dev, Test, Staging |
| Test Entities | Orders, customers, products for integration tests | Dev, Test |
| Performance Data | Large-volume generated data for load testing | Staging, Perf |
๐ Seed Data Blueprint¶
dataSeeding:
referenceData:
source: "seed/reference_data.sql"
idempotent: true
appliesTo: [dev, test, staging, production]
tables:
- reference_currencies
- reference_countries
- reference_order_statuses
- reference_payment_methods
testData:
source: "seed/test_data.sql"
appliesTo: [dev, test]
generator:
tool: faker
language: csharp
config:
tenants: 5
customersPerTenant: 100
ordersPerCustomer: 20
locale: "en_US"
cleanup:
strategy: truncate-and-reseed
beforeEachTestRun: true
performanceData:
source: generated
appliesTo: [staging, performance]
generator:
tenants: 50
customersPerTenant: 10000
ordersPerCustomer: 500
totalExpectedRows: 250000000
parallelInsertDegree: 8
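The generator configuration above nests tenants → customers → orders, so expected row counts are the product of the three settings. A plain-Python sketch of the same shape (the real pipeline uses a C# Faker-based tool; this only illustrates the nesting and the deterministic seeding that makes reseeds repeatable):

```python
# Illustrative test-data generator mirroring the testData config above.
import random

def generate_test_data(tenants=5, customers_per_tenant=100,
                       orders_per_customer=20, seed=42):
    rng = random.Random(seed)  # fixed seed -> repeatable truncate-and-reseed runs
    rows = []
    for t in range(tenants):
        for c in range(customers_per_tenant):
            for o in range(orders_per_customer):
                rows.append({
                    "tenant_id": f"tenant_{t:02d}",
                    "customer_id": f"cust_{t:02d}_{c:04d}",
                    "order_number": f"ORD-{t:02d}-{c:04d}-{o:03d}",
                    "total_amount": round(rng.uniform(10, 500), 2),
                })
    return rows

rows = generate_test_data()
print(len(rows))  # 5 tenants * 100 customers * 20 orders
```

The same arithmetic explains the performanceData figures: 50 × 10,000 × 500 = 250,000,000 expected rows.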
๐ Data Masking for Non-Production¶
dataMasking:
enabled: true
appliesTo: [dev, test, staging]
rules:
- column: "customers.email"
strategy: hash-and-domain
output: "{hash}@example.com"
- column: "customers.phone_number"
strategy: randomize
pattern: "+1-555-XXX-XXXX"
- column: "customers.first_name"
strategy: faker
fakerMethod: "Name.FirstName"
- column: "customers.last_name"
strategy: faker
fakerMethod: "Name.LastName"
- column: "orders.total_amount"
strategy: noise
variance: 15%
- column: "payment_methods.card_number"
strategy: redact
output: "XXXX-XXXX-XXXX-XXXX"
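Two of the masking strategies above can be sketched concretely: hash-and-domain keeps emails unique and joinable while removing PII, and redact destroys the value entirely. The choice of SHA-256 here is an assumption; the blueprint only names the strategy.

```python
# Sketches of the hash-and-domain and redact masking strategies above.
import hashlib

def mask_email(email: str) -> str:
    """hash-and-domain: '{hash}@example.com', stable for the same input."""
    digest = hashlib.sha256(email.lower().encode()).hexdigest()[:12]
    return f"{digest}@example.com"

def redact_card(card_number: str) -> str:
    """redact: replace the value wholesale with the configured output."""
    return "XXXX-XXXX-XXXX-XXXX"

masked = mask_email("jane.doe@contoso.com")
print(masked.endswith("@example.com"), masked != "jane.doe@contoso.com")
print(redact_card("4111-1111-1111-1111"))
```

Stability matters: because the hash is deterministic, foreign-key-like joins on masked emails still work across tables in non-production copies.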
๐ Reference Data SQL Example¶
-- Reference data: idempotent seed script
-- Applied to all environments including production
INSERT INTO reference_currencies (code, name, symbol, decimal_places)
VALUES
('USD', 'US Dollar', '$', 2),
('EUR', 'Euro', '€', 2),
('GBP', 'British Pound', '£', 2),
('JPY', 'Japanese Yen', '¥', 0),
('CAD', 'Canadian Dollar', 'C$', 2)
ON CONFLICT (code) DO UPDATE SET
name = EXCLUDED.name,
symbol = EXCLUDED.symbol,
decimal_places = EXCLUDED.decimal_places;
INSERT INTO reference_order_statuses (code, name, display_order, is_terminal)
VALUES
('pending', 'Pending', 1, false),
('confirmed', 'Confirmed', 2, false),
('processing', 'Processing', 3, false),
('shipped', 'Shipped', 4, false),
('delivered', 'Delivered', 5, true),
('cancelled', 'Cancelled', 6, true),
('refunded', 'Refunded', 7, true)
ON CONFLICT (code) DO UPDATE SET
name = EXCLUDED.name,
display_order = EXCLUDED.display_order,
is_terminal = EXCLUDED.is_terminal;
๐ค Agent Collaboration¶
| Agent | Role |
|---|---|
| Database Engineer Agent | Generates seed scripts, configures idempotent reference data |
| Test Automation Engineer Agent | Defines test data requirements, generates faker-based data scripts |
| Security Architect Agent | Specifies data masking rules and ensures PII protection in non-prod |
| Backend Developer Agent | Integrates seed data with application bootstrap and test fixtures |
๐ Cross-Blueprint Intersections¶
The Database Blueprint does not exist in isolation. It is deeply interconnected with other blueprints in the Factory ecosystem, creating a coherent, multi-dimensional specification for every generated component.
๐ก๏ธ Security Blueprint Intersection¶
| Intersection Area | Security Blueprint Contribution | Database Blueprint Contribution |
|---|---|---|
| Encryption at Rest | Defines encryption requirements and key management | Implements TDE, column-level encryption |
| Row-Level Security | Defines tenant isolation requirements | Implements RLS policies and tenant context propagation |
| Data Masking | Specifies PII/PHI classification | Implements masking rules and non-prod data generation |
| Audit Logging | Requires audit trails for compliance | Implements temporal tables and change tracking |
| Secrets Management | Defines connection string storage in Key Vault | Consumes connection secrets at runtime |
๐ฆ Infrastructure Blueprint Intersection¶
| Intersection Area | Infrastructure Blueprint Contribution | Database Blueprint Contribution |
|---|---|---|
| Database Provisioning | Provisions Azure Database for PostgreSQL, networking | Defines required SKU, configuration, extensions |
| Replication Setup | Configures read replicas, geo-replication infrastructure | Defines replication topology and lag thresholds |
| Backup Infrastructure | Provisions backup storage, configures retention policies | Defines RPO/RTO targets and recovery procedures |
| Networking | Configures VNet integration, private endpoints | Specifies connection requirements and firewall rules |
| Monitoring | Deploys monitoring agents and dashboards | Defines query metrics and performance alert thresholds |
๐งช Test Blueprint Intersection¶
| Intersection Area | Test Blueprint Contribution | Database Blueprint Contribution |
|---|---|---|
| Integration Tests | Defines test scenarios requiring database interaction | Provides schema definitions and seed data |
| Performance Tests | Defines load test scenarios and SLA targets | Provides performance budgets and baseline query plans |
| Migration Tests | Validates migration scripts execute correctly | Provides migration files and rollback scripts |
| Data Integrity Tests | Validates foreign key consistency and constraint enforcement | Defines constraints and relationship specifications |
| Test Data Management | Requires repeatable test data setup and teardown | Provides seeding scripts and data generation rules |
๐ Observability Blueprint Intersection¶
| Intersection Area | Observability Blueprint Contribution | Database Blueprint Contribution |
|---|---|---|
| Query Metrics | Collects and dashboards query performance telemetry | Defines expected performance baselines and budgets |
| Connection Pool Monitoring | Monitors connection usage, pool exhaustion | Defines pool sizing and connection management strategy |
| Replication Lag | Alerts on replication lag exceeding thresholds | Defines acceptable lag thresholds per replica |
| Backup Monitoring | Monitors backup success/failure, storage usage | Defines backup schedules and retention requirements |
| Schema Drift Detection | Detects when deployed schema differs from blueprint | Provides canonical schema definitions for comparison |
๐ Cross-Blueprint Flow¶
flowchart TD
DB[Database Blueprint] --> SEC[Security Blueprint]
DB --> INFRA[Infrastructure Blueprint]
DB --> TEST[Test Blueprint]
DB --> OBS[Observability Blueprint]
DB --> SVC[Service Blueprint]
SEC -->|"Encryption, RLS, Masking"| DB
INFRA -->|"Provisioning, Replicas, Networking"| DB
TEST -->|"Integration Tests, Seed Data"| DB
OBS -->|"Query Metrics, Alerts"| DB
SVC -->|"ORM Mappings, Repository Patterns"| DB
๐งญ Blueprint Location, Traceability, and Versioning¶
A Database Blueprint is not just content; it is a traceable artifact, part of a multi-agent lineage graph, and lives at a predictable location in the Factory's file and memory hierarchy.
This enables cross-agent validation, rollback, comparison, and regeneration.
๐ File System Location¶
Each blueprint is stored in a consistent location within the Factory workspace:
blueprints/database/{component-name}/database-blueprint.md
blueprints/database/{component-name}/database-blueprint.json
blueprints/database/{component-name}/migrations/
blueprints/database/{component-name}/schemas/
blueprints/database/{component-name}/indexes/
- Markdown is human-readable and Studio-rendered.
- JSON is parsed by orchestrators and enforcement agents.
- Migrations folder contains version-controlled SQL scripts.
- Schemas folder contains per-table YAML definitions.
- Indexes folder contains index strategy YAML files.
๐ง Traceability Fields¶
Each blueprint includes a set of required metadata fields for trace alignment:
| Field | Purpose |
|---|---|
| traceId | Links blueprint to full generation pipeline |
| agentId | Records which agent(s) emitted the artifact |
| originPrompt | Captures human-initiated signal or intent |
| createdAt | ISO timestamp for audit |
| dataProfile | Type of data architecture (relational, document) |
| engine | Database engine (postgresql, sqlserver, etc.) |
| tenantStrategy | Multi-tenant isolation pattern |
| migrationVersion | Current migration version |
These fields ensure full trace and observability for regeneration, validation, and compliance review.
๐ Versioning and Mutation Tracking¶
| Mechanism | Purpose |
|---|---|
| v1, v2, ... | Manual or automatic version bumping by agents |
| diff-link: metadata | References upstream and downstream changes |
| GitOps snapshot tags | Bind blueprint versions to commit hashes or releases |
| Migration versioning | Each schema change is a numbered, traceable migration |
| Drift monitors | Alert if deployed schema deviates from blueprint |
๐ Mutation History Example¶
metadata:
traceId: "trc_82db_OrderService_v3"
agentId: "db-engineer-001"
originPrompt: "Add multi-currency support to order processing"
createdAt: "2025-07-15T09:30:00Z"
version: "v3"
diffFrom: "v2"
changedFields:
- "tables.orders.columns.currency_code"
- "tables.orders.columns.exchange_rate"
- "indexes.ix_orders_currency"
- "migrations.V3.2.0__add_currency_support"
migrationVersion: "3.2.0"
engine: "postgresql"
tenantStrategy: "shared-schema-with-rls"
These mechanisms ensure that database design is not an afterthought, but a tracked, versioned, observable system artifact.
๐ Database CI/CD Integration¶
The Database Blueprint defines how database changes flow through the CI/CD pipeline โ from schema validation to migration deployment to performance regression testing.
๐ Pipeline Architecture¶
flowchart LR
subgraph CI["Continuous Integration"]
A[PR Created] --> B[Schema Lint]
B --> C[Migration Validation]
C --> D[Seed Data Test]
D --> E[Integration Tests]
E --> F[Performance Regression]
F --> G{All Gates Pass?}
end
subgraph CD["Continuous Deployment"]
G -->|Yes| H[Deploy to Staging]
H --> I[Run Migrations]
I --> J[Smoke Tests]
J --> K[Performance Validation]
K --> L{Staging OK?}
L -->|Yes| M[Deploy to Production]
M --> N[Run Migrations]
N --> O[Health Check]
O --> P[Monitor & Alert]
end
G -->|No| Q[Block & Notify]
L -->|No| R[Rollback Staging]
๐ Pipeline Stage Definitions¶
databasePipeline:
ci:
stages:
- name: schema-lint
description: "Validate schema YAML against naming conventions and rules"
tool: custom-schema-linter
failOn: error
config:
rules: "blueprints/database/validation-rules.yaml"
- name: migration-validation
description: "Validate migration scripts are syntactically correct and idempotent"
tool: flyway-validate
failOn: error
config:
locations: "blueprints/database/{component}/migrations/"
- name: migration-dry-run
description: "Execute migrations against ephemeral test database"
tool: flyway-migrate
failOn: error
config:
target: ephemeral-postgres
cleanOnStart: true
validateOnMigrate: true
- name: seed-data-test
description: "Apply seed data and validate reference data integrity"
tool: custom-seed-runner
failOn: error
- name: integration-tests
description: "Run database integration tests"
tool: dotnet-test
failOn: error
config:
filter: "Category=DatabaseIntegration"
database: ephemeral-postgres
- name: performance-regression
description: "Compare query execution plans against baselines"
tool: custom-perf-gate
failOn: warning
config:
baselines: "blueprints/database/{component}/performance-baselines.yaml"
regressionThreshold: 200%
cd:
stages:
- name: deploy-migrations
description: "Apply pending migrations to target environment"
tool: flyway-migrate
config:
outOfOrder: false
validateOnMigrate: true
baselineOnMigrate: false
- name: verify-schema
description: "Compare deployed schema against blueprint definition"
tool: custom-schema-diff
failOn: error
- name: health-check
description: "Validate database connectivity and basic query functionality"
tool: custom-health-check
config:
timeout: 30s
queries:
- "SELECT 1"
- "SELECT count(*) FROM reference_currencies"
๐ Rollback Strategy¶
rollback:
strategy: migration-rollback
config:
automaticRollback:
onMigrationFailure: true
onHealthCheckFailure: true
onPerformanceRegression: false
manualRollback:
approvalRequired: true
approvers: ["data-architect", "devops-lead"]
rollbackScripts:
required: true
location: "blueprints/database/{component}/migrations/"
namingPattern: "V{version}__rollback.sql"
๐งช Schema Validation Gate Example¶
schemaValidationGate:
name: "Database Schema Validation"
trigger: pull-request
checks:
- name: naming-conventions
description: "All tables, columns, and indexes follow naming standards"
severity: error
- name: required-columns
description: "All business tables have tenant_id, created_at, updated_at, is_deleted"
severity: error
- name: foreign-key-indexes
description: "All foreign key columns have supporting indexes"
severity: error
- name: migration-reversibility
description: "Every migration has a corresponding rollback script"
severity: warning
- name: zero-downtime-compliance
description: "No prohibited DDL operations in migration"
severity: error
- name: performance-budget
description: "New queries have documented performance budgets"
severity: warning
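The required-columns check from the gate above is a good example of how these checks stay mechanical: compare each table's column set against the mandated set. A sketch (the schema dict would be parsed from the per-table YAML definitions):

```python
# required-columns gate: every business table must carry these columns.
REQUIRED = {"tenant_id", "created_at", "updated_at", "is_deleted"}

def missing_required_columns(schema: dict) -> dict:
    """Map each non-compliant table to the columns it lacks."""
    return {
        table: sorted(REQUIRED - set(columns))
        for table, columns in schema.items()
        if not REQUIRED <= set(columns)
    }

schema = {
    "orders": ["id", "tenant_id", "created_at", "updated_at", "is_deleted"],
    "order_items": ["id", "order_id", "tenant_id", "created_at"],
}
print(missing_required_columns(schema))
```

Because the check's severity is `error`, any non-empty result blocks the pull request; reference tables excluded from tenancy would be filtered out before the check runs.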
๐ค Agent Collaboration¶
| Agent | Role |
|---|---|
| DevOps Engineer Agent | Designs pipeline stages, configures deployment automation |
| Database Engineer Agent | Creates validation rules, defines performance baselines |
| Test Automation Engineer Agent | Implements integration tests and performance regression tests |
| Backend Developer Agent | Ensures application code is compatible with migration changes |
| Infrastructure Engineer Agent | Provisions ephemeral databases for CI and manages environment configurations |
๐ง Memory Graph Representation¶
The Database Blueprint is embedded in the ConnectSoft Memory Graph, a semantic, vector-indexed knowledge graph that enables AI agents to reason about data architecture across the entire platform.
๐ Memory Node Structure¶
memoryNode:
type: database-blueprint
id: "blueprint:database:order-service:v3"
traceId: "trc_82db_OrderService_v3"
attributes:
engine: postgresql
tenantStrategy: shared-schema-with-rls
migrationVersion: "3.2.0"
tables: ["orders", "customers", "order_items", "products"]
indexCount: 12
rlsEnabled: true
backupRpo: 15
backupRto: 60
links:
- type: belongs-to
target: "service:order-service"
- type: secured-by
target: "blueprint:security:order-service:v2"
- type: provisioned-by
target: "blueprint:infrastructure:order-service:v4"
- type: tested-by
target: "blueprint:test:order-service:v5"
- type: observed-by
target: "blueprint:observability:order-service:v3"
- type: depends-on
target: "blueprint:database:customer-service:v2"
embedding:
model: text-embedding-ada-002
dimensions: 1536
indexed: true
concepts:
- postgresql
- multi-tenant
- row-level-security
- migration
- index-optimization
- backup-recovery
๐งฉ Graph Relationships¶
graph TD
DB_BP["Database Blueprint\n(order-service v3)"] --> SVC["Service Blueprint\n(order-service)"]
DB_BP --> SEC_BP["Security Blueprint\n(order-service v2)"]
DB_BP --> INFRA_BP["Infrastructure Blueprint\n(order-service v4)"]
DB_BP --> TEST_BP["Test Blueprint\n(order-service v5)"]
DB_BP --> OBS_BP["Observability Blueprint\n(order-service v3)"]
DB_BP --> CUST_DB["Database Blueprint\n(customer-service v2)"]
DB_BP --> MIG["Migration: V3.2.0"]
DB_BP --> IDX["Index Strategy"]
DB_BP --> RLS["RLS Policies"]
DB_BP --> BACKUP["Backup Config"]
MIG --> TRACE["Trace: trc_82db"]
IDX --> TRACE
RLS --> TRACE
๐ง AI Agent Reasoning¶
The memory graph enables agents to:
- ๐ Find related blueprints: "What security policies apply to the orders database?"
- ๐ Detect drift: "Has the deployed schema diverged from the blueprint?"
- ๐งช Generate tests: "What integration tests are needed for the new migration?"
- ๐ Optimize performance: "Which indexes are unused across all services?"
- ๐ Trace lineage: "What prompt created this table? Which agent last modified it?"
๐ Memory Query Examples¶
memoryQueries:
- query: "Find all database blueprints using RLS"
filter:
type: database-blueprint
attributes.rlsEnabled: true
returns: [id, traceId, attributes.tables]
- query: "Find blueprints with RPO > 30 minutes"
filter:
type: database-blueprint
attributes.backupRpo: { $gt: 30 }
returns: [id, attributes.engine, attributes.backupRpo]
- query: "Find cross-service database dependencies"
filter:
type: database-blueprint
links.type: depends-on
returns: [id, links.target]
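The query examples above use dotted attribute paths and a `$gt` operator. A sketch of evaluating them against in-memory nodes (illustrative; the real graph store and its query engine are not specified here):

```python
# Minimal evaluator for the memory-query filter syntax shown above.
def get_path(node, path):
    """Resolve a dotted path like 'attributes.rlsEnabled' against a node."""
    cur = node
    for part in path.split("."):
        cur = cur.get(part) if isinstance(cur, dict) else None
    return cur

def matches(node, filt):
    for path, expected in filt.items():
        value = get_path(node, path)
        if isinstance(expected, dict) and "$gt" in expected:
            if not (isinstance(value, (int, float)) and value > expected["$gt"]):
                return False
        elif value != expected:
            return False
    return True

nodes = [
    {"type": "database-blueprint", "id": "blueprint:database:order-service:v3",
     "attributes": {"rlsEnabled": True, "backupRpo": 15}},
    {"type": "database-blueprint", "id": "blueprint:database:billing-service:v1",
     "attributes": {"rlsEnabled": False, "backupRpo": 60}},
]
hits = [n["id"] for n in nodes
        if matches(n, {"type": "database-blueprint",
                       "attributes.rlsEnabled": True})]
print(hits)
```

The second example query ("RPO > 30 minutes") would use the same evaluator with `{"attributes.backupRpo": {"$gt": 30}}`.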
๐ Final Summary¶
The Database Blueprint is a comprehensive, agent-generated artifact that serves as the data persistence contract of record for every component in the ConnectSoft AI Software Factory.
๐ Blueprint Coverage Summary¶
| Dimension | Coverage |
|---|---|
| Schema Design | Tables, columns, types, constraints, naming conventions, normalization |
| Entity Relationships | Foreign keys, aggregate boundaries, reference data, ER diagrams |
| Migrations | Version-controlled, zero-downtime, rollback-safe, CI/CD integrated |
| Index Strategy | B-Tree, GIN, GiST, BRIN, partial, covering, composite, maintenance |
| Multi-Tenant Isolation | Shared-schema + RLS, schema-per-tenant, database-per-tenant, hybrid |
| Query Performance | Budgets, baselines, execution plans, regression detection |
| Backup & Recovery | RPO/RTO targets, continuous backup, PITR, geo-replication, DR testing |
| Data Seeding | Reference data, test data generation, data masking, faker integration |
| CI/CD Integration | Schema linting, migration validation, performance gates, rollback |
| Cross-Blueprint Links | Security, Infrastructure, Test, Observability, Service blueprints |
| Memory Graph | Vectorized, concept-indexed, trace-linked, agent-queryable |
๐ค Agent Participation Summary¶
| Agent | Primary Responsibilities |
|---|---|
| Database Engineer Agent | Schema design, migration generation, index optimization, query tuning |
| Data Architect Agent | Data modeling, entity relationships, bounded contexts, normalization |
| Backend Developer Agent | ORM configuration, data access layers, repository patterns |
| Infrastructure Engineer Agent | Database provisioning, replication, networking, backup infrastructure |
| Security Architect Agent | RLS policies, encryption, data masking, audit trails |
| DevOps Engineer Agent | Migration pipelines, CI/CD gates, monitoring, backup automation |
| Test Automation Engineer Agent | Integration tests, performance tests, seed data, DR drills |
๐ Artifact Output Summary¶
| Format | Purpose | Location |
|---|---|---|
| database-blueprint.md | Human-readable specification | blueprints/database/{component}/ |
| database-blueprint.json | Machine-readable schema definition | blueprints/database/{component}/ |
| *.sql | Migration scripts (up and rollback) | blueprints/database/{component}/migrations/ |
| *.yaml | Schema and index definitions | blueprints/database/{component}/schemas/ |
| Embedding | Vector-indexed memory node | Memory graph |
๐ง Key Design Principles¶
- Agent-First: Every schema, migration, and index is generated and validated by specialized agents
- Trace-Linked: Every artifact carries a traceId linking it to the full generation pipeline
- Zero-Downtime: All production migrations follow expand-contract or additive-only patterns
- Multi-Tenant Native: Tenant isolation is not a feature; it is a systemic guarantee
- Performance-Budgeted: Every query has an expected execution plan and regression threshold
- Disaster-Ready: Backup and recovery are contractual SLA elements, not optional configurations
- Memory-Integrated: The blueprint lives in the semantic memory graph, enabling AI reasoning
๐๏ธ The Database Blueprint transforms database design from a manual, error-prone activity into a deterministic, traceable, agent-orchestrated discipline, ensuring every service has a reliable, performant, and secure data foundation.