🗄️ Database Engineer Agent Specification¶

🎯 Purpose¶

The Database Engineer Agent is responsible for implementation-level database work within the ConnectSoft AI Software Factory — translating Data Architect designs into executable schemas, migration scripts, optimized queries, indexing strategies, and multi-tenant data isolation implementations.

While the Data Architect Agent focuses on logical data modeling and governance, the Database Engineer Agent focuses on the physical realization: turning conceptual models into performant, deployable, and maintainable database artifacts.

💡 Why This Agent Matters¶

Without this agent:

Schema implementations would drift from Data Architect designs
Migration scripts would be manually written, error-prone, and inconsistent
Query performance issues would surface late in production rather than at design time
Multi-tenant data isolation would be implemented ad-hoc, risking cross-tenant data leakage
Index management would be reactive rather than proactive

With the Database Engineer Agent:

✅ Schemas are auto-generated from Data Architect logical models with full fidelity ✅ Migrations are versioned, reversible, and CI/CD-integrated ✅ Queries are analyzed and optimized before deployment ✅ Multi-tenant isolation is enforced at the database layer through row-level security, schema separation, or database-per-tenant strategies ✅ Index recommendations are data-driven and continuously refined

🧱 What This Agent Enables¶

Capability	Impact
🗃️ Schema implementation	Translate logical models to physical DDL scripts
🔄 Migration management	Versioned, reversible schema migrations with rollback support
⚡ Query optimization	Analyze and rewrite slow queries, suggest execution plan improvements
📊 Index management	Create, analyze, and maintain indexes based on query patterns
🔐 Multi-tenant isolation	Implement row-level security, schema separation, or database-per-tenant
💾 Backup/recovery strategy	Define backup schedules, retention policies, and recovery procedures
📈 Performance tuning	Connection pooling, query caching, partitioning, and statistics management

🧭 Role in the Platform¶

The Database Engineer Agent sits within the Software Engineering cluster, acting as the bridge between data architecture decisions and their physical database implementation.

flowchart TD
    DataArchitect[Data Architect Agent]
    DatabaseEngineer[🗄️ Database Engineer Agent]
    BackendDev[Backend Developer Agent]
    InfraEngineer[Infrastructure Engineer Agent]
    DevOps[DevOps Architect Agent]
    CodeReviewer[Code Reviewer Agent]

    DataArchitect --> DatabaseEngineer
    DatabaseEngineer --> BackendDev
    DatabaseEngineer --> InfraEngineer
    DatabaseEngineer --> DevOps
    DatabaseEngineer --> CodeReviewer

Hold "Alt" / "Option" to enable pan & zoom

📡 Triggering Events¶

Event	Source	Description
`data_model_finalized`	Data Architect Agent	Logical data model approved and ready for physical implementation
`schema_migration_required`	Backend Developer Agent / Tech Lead	Schema change needed for new feature or modification
`performance_optimization_requested`	Observability Engineer Agent	Slow queries or database bottlenecks detected in production
`tenant_onboarding_initiated`	Platform Orchestrator	New tenant requires database provisioning and isolation setup
`backup_policy_review_requested`	Infrastructure Engineer Agent	Backup/recovery strategy needs creation or update
`index_analysis_triggered`	Scheduled / Performance Monitor	Periodic index health and optimization analysis

📋 Core Responsibilities¶

🔧 Responsibilities Overview¶

#	Responsibility	Description	Output
1	Schema implementation	Convert Data Architect logical models into physical DDL with constraints, types, and relationships	`database-schema.sql`, `schema-definition.yaml`
2	Migration script generation	Create versioned, reversible migration scripts for schema changes	`migrations/` directory with up/down scripts
3	Query optimization	Analyze query execution plans, rewrite inefficient queries, suggest improvements	`query-optimization-report.yaml`
4	Index management	Design and maintain indexes based on query patterns, workload analysis, and cardinality	`index-strategy.yaml`
5	Multi-tenant isolation	Implement tenant data separation using RLS, schema-per-tenant, or database-per-tenant	`tenant-isolation-config.yaml`
6	Database performance tuning	Configure connection pooling, query caching, partitioning, and statistics	`performance-config.yaml`
7	Backup/recovery strategy	Define backup schedules, retention, point-in-time recovery, and disaster recovery procedures	`backup-recovery-plan.yaml`

📦 Deliverables¶

Artifact	Format	Purpose
`database-schema.sql`	SQL	Physical schema DDL with tables, constraints, indexes, and types
`schema-definition.yaml`	YAML	Declarative schema definition for code-first workflows
`migrations/`	SQL / C#	Versioned migration scripts (EF Core, Flyway, or raw SQL)
`migration-plan.yaml`	YAML	Migration execution plan with ordering, dependencies, and rollback steps
`query-optimization-report.yaml`	YAML	Analysis of slow queries with recommendations and rewritten alternatives
`index-strategy.yaml`	YAML	Index definitions, covering indexes, and maintenance schedules
`tenant-isolation-config.yaml`	YAML	Multi-tenant isolation implementation configuration
`performance-config.yaml`	YAML	Database performance tuning parameters
`backup-recovery-plan.yaml`	YAML	Backup schedules, retention policies, and recovery procedures
`database-topology.mmd`	Mermaid	Visual diagram of database relationships and tenant topology
`DatabaseSchemaPublished`	JSON (Event)	Lifecycle event emitted on successful schema generation

🗃️ 1. Schema Implementation¶

Task	Output
Parse logical data model from Data Architect	Entity-relationship validation report
Generate physical DDL with appropriate data types	`database-schema.sql`
Apply constraints (PK, FK, unique, check, defaults)	Constraint definitions per table
Create stored procedures and functions where required	Procedure scripts
Generate EF Core entity configurations for code-first	`EntityTypeConfiguration` classes

Sample Schema Output¶

CREATE TABLE [dbo].[Invoices] (
    [Id]          UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    [TenantId]    UNIQUEIDENTIFIER NOT NULL,
    [CustomerId]  UNIQUEIDENTIFIER NOT NULL,
    [Amount]      DECIMAL(18, 2)   NOT NULL,
    [Currency]    NVARCHAR(3)      NOT NULL DEFAULT 'USD',
    [Status]      NVARCHAR(50)     NOT NULL DEFAULT 'Draft',
    [CreatedAt]   DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME(),
    [ModifiedAt]  DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT [PK_Invoices] PRIMARY KEY CLUSTERED ([Id]),
    CONSTRAINT [FK_Invoices_Customers] FOREIGN KEY ([CustomerId])
        REFERENCES [dbo].[Customers] ([Id]),
    INDEX [IX_Invoices_TenantId] NONCLUSTERED ([TenantId]),
    INDEX [IX_Invoices_Status] NONCLUSTERED ([Status]) INCLUDE ([Amount], [Currency])
);

🔄 2. Migration Management¶

Task	Output
Detect schema differences between current and target state	Diff report
Generate idempotent up/down migration scripts	Versioned migration files
Validate migration ordering and dependency chain	`migration-plan.yaml`
Support EF Core migrations, Flyway, and raw SQL modes	Format-specific output
Include rollback verification and data preservation checks	Rollback validation report

Migration Naming Convention¶

V{version}_{timestamp}_{description}.sql

Example: V003_20260329_AddInvoiceStatusIndex.sql

⚡ 3. Query Optimization¶

Analysis	Action
Execution plan analysis	Identify table scans, missing indexes, key lookups
Query rewriting	Suggest JOIN optimizations, CTE refactoring, predicate pushdown
Parameter sniffing detection	Flag queries vulnerable to plan cache pollution
N+1 query detection	Identify ORM-generated N+1 patterns from EF Core
Statistics staleness	Recommend statistics update schedules

🔐 4. Multi-Tenant Isolation¶

Strategy	When Used	Implementation
Row-Level Security (RLS)	Shared database, shared schema	Security policies filtering by `TenantId`
Schema-per-Tenant	Shared database, separate schemas	Dynamic schema routing in connection middleware
Database-per-Tenant	Full isolation required	Separate connection strings, database provisioning
Hybrid	Mixed workloads	RLS for standard tenants, dedicated DB for premium

🤝 Collaboration Interfaces¶

🔼 Upstream Providers¶

Agent	Input Artifact
Data Architect Agent	Logical data models, entity relationships, retention policies
Solution Architect Agent	Multi-tenancy strategy and data isolation requirements
Security Architect Agent	Encryption-at-rest policies, access control requirements

🔽 Downstream Consumers¶

Agent	Consumes
Backend Developer Agent	Schema definitions, EF Core configurations, migration scripts
Infrastructure Engineer Agent	Database provisioning requirements, backup policies
DevOps Architect Agent	Migration execution plans for CI/CD pipeline integration
Code Reviewer Agent	Query optimization reports for performance review gates
Observability Engineer Agent	Database performance metrics and alerting thresholds

📡 Collaboration Flow¶

flowchart TD
    DataArchitect --> DatabaseEngineer
    SolutionArchitect --> DatabaseEngineer
    SecurityArchitect --> DatabaseEngineer

    DatabaseEngineer --> BackendDeveloper
    DatabaseEngineer --> InfrastructureEngineer
    DatabaseEngineer --> DevOpsArchitect
    DatabaseEngineer --> CodeReviewer
    DatabaseEngineer --> ObservabilityEngineer

Hold "Alt" / "Option" to enable pan & zoom

📚 Memory and Knowledge¶

🕐 Short-Term (Session) Memory¶

Key	Purpose
`trace_id`	Tracks all outputs and events for the current schema design cycle
`target_database`	Database engine and version (SQL Server, PostgreSQL, CosmosDB)
`migration_version`	Current schema version and target version
`tenant_strategy`	Active multi-tenant isolation strategy
`optimization_targets[]`	Queries and tables flagged for optimization

🧠 Long-Term Semantic Memory¶

Data	Used For
Historical migration chains	Ensure migration compatibility and prevent conflicts
Query performance baselines	Compare optimization results against historical benchmarks
Index usage statistics	Identify unused indexes for removal, suggest missing indexes
Tenant growth patterns	Proactive capacity planning and isolation strategy adjustments
Schema evolution history	Track breaking changes and maintain backward compatibility

🧠 Knowledge Base¶

Domain	Content
Database Engine Specifics	SQL Server, PostgreSQL, CosmosDB optimization techniques and limitations
EF Core Patterns	Code-first conventions, migration best practices, performance pitfalls
Index Design	B-tree, hash, columnstore, filtered, and covering index strategies
Multi-Tenancy Patterns	RLS implementation, schema routing, connection management
ConnectSoft Standards	Naming conventions, data type preferences, constraint policies
Migration Safety	Zero-downtime migration techniques, backward-compatible schema changes

✅ Validation and Correction¶

🔍 Validation Rules¶

Rule	Description	Severity
All tables must have a primary key	No heap tables allowed	Critical
Foreign keys must reference existing tables	Referential integrity enforcement	Critical
`TenantId` column required on all tenant-scoped tables	Multi-tenant compliance	Critical
Migrations must be idempotent and reversible	Safe deployment guarantee	High
All indexed columns must have appropriate statistics	Query optimizer support	Medium
Naming must follow ConnectSoft conventions	`PascalCase` tables, `IX_` prefix for indexes	Medium
No `SELECT *` in stored procedures or views	Explicit column selection	High

🔁 Auto-Correction Behaviors¶

Condition	Correction
Missing `TenantId` on tenant-scoped table	Auto-add column with RLS policy
Index missing on foreign key column	Generate index definition
Migration lacks rollback script	Generate inverse DDL
Data type mismatch with logical model	Align to Data Architect specification
Missing `CreatedAt` / `ModifiedAt` audit columns	Inject standard audit columns

📈 Observability Spans¶

Span Name	Trigger
`schema_generation_started`	Agent begins schema creation
`migration_scripts_generated`	Migration files created
`query_optimization_completed`	Query analysis finished
`tenant_isolation_configured`	Multi-tenant setup completed
`database_schema_published`	All artifacts emitted successfully

📢 Lifecycle Events¶

Event	Trigger	Payload
`DatabaseSchemaPublished`	Schema generation completed	Schema version, table count, trace_id
`MigrationPlanCreated`	Migration scripts ready	Migration version, step count, rollback status
`QueryOptimizationCompleted`	Performance analysis done	Query count, improvement percentage
`TenantIsolationConfigured`	Multi-tenant setup completed	Strategy type, tenant count
`DatabaseValidationFailed`	Validation rules not met	Failed rules, affected tables

✅ Summary¶

The Database Engineer Agent ensures that all database implementations in the ConnectSoft platform are:

Capability	Status
🗃️ Faithful physical implementation of Data Architect designs	✅
🔄 Versioned, reversible, CI/CD-ready migrations	✅
⚡ Optimized queries with execution plan analysis	✅
📊 Data-driven index management and maintenance	✅
🔐 Enforced multi-tenant data isolation	✅
💾 Defined backup/recovery strategies	✅
📈 Continuous database performance tuning	✅

"The Database Engineer Agent bridges the gap between data architecture vision and database reality — ensuring every schema, migration, and query in the ConnectSoft platform is performant, secure, and production-ready."