Test Automation Engineer Agent Specification

Purpose

The Test Automation Engineer Agent is responsible for:
Orchestrating, executing, monitoring, and reporting automated tests across all layers of the platform, transforming static test artifacts into fully operational, edition-aware, CI/CD-integrated test pipelines.
It ensures that all tests generated by other agents (such as the Test Case Generator and Test Generator) are:
- Executed correctly across the relevant roles, editions, and environments
- Validated continuously during builds, merges, and releases
- Reported back into Studio and observability dashboards
- Maintained through retries, environment preparation, and isolation strategies
What Sets It Apart from Other QA Agents?

| Agent | Primary Role |
|---|---|
| Test Case Generator | Creates static `.cs`, `.feature`, and test metadata files |
| Test Generator | Expands test coverage based on prompts, gaps, and behavior |
| Test Automation Engineer Agent | Runs the tests, connects them to pipelines, interprets results, manages environments |
| Test Coverage Validator Agent | Measures static and dynamic test coverage |
| QA Engineer Agent | Guides strategy, approves coverage, collaborates via Studio |
Responsibilities in Factory Flow

- Executes all test types: unit, integration, BDD, security, performance
- Chooses which tests to run per pipeline context (pre-merge, nightly, release)
- Manages runtime environments, mocks, and infrastructure dependencies
- Collects results, logs, screenshots, and traces
- Validates test completeness against trace/edition coverage targets
- Reports status and regressions into:
  - Studio dashboards
  - Pull Request annotations
  - QA and CI/CD artifacts
Factory Blueprint: Execution Lifecycle

    flowchart TD
        A[TestCaseGeneratorAgent] --> B[TestArtifacts]
        B --> C[TestAutomationEngineerAgent]
        C --> D[TestExecutionPlan]
        D --> E[TestExecution]
        E --> F[TestResults]
        F --> G[StudioDashboard]
        F --> H[TestCoverageValidatorAgent]

It is the execution orchestrator of the QA pipeline.
Example Responsibilities

Given:
- `.feature` file: `create_invoice.feature`
- Unit test: `CreateInvoiceHandlerTests.cs`
- Metadata: `edition = enterprise`, `roles = [FinanceManager, Guest]`
- Trigger: PR pre-merge validation

Agent will:
- Plan an edition-specific execution matrix
- Select runners (`SpecFlow`, `dotnet test`, `Playwright`, etc.)
- Provision test config for the `enterprise` edition with mocks
- Run tests with the `FinanceManager` and `Guest` roles
- Collect results
- Attach results to Studio + PR + CI
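The planning step above (roles and an edition expanded into an execution matrix) can be sketched as follows. This is a minimal illustration, not the agent's real API: `plan_execution_matrix` and its field names are hypothetical.

```python
from itertools import product

def plan_execution_matrix(roles, editions, test_files):
    """Expand roles x editions into independent execution variants.

    Hypothetical helper: each (edition, role) pair becomes one variant
    that runs every listed test artifact under that identity/config.
    """
    return [
        {"role": role, "edition": edition, "tests": list(test_files)}
        for edition, role in product(editions, roles)
    ]

matrix = plan_execution_matrix(
    roles=["FinanceManager", "Guest"],
    editions=["enterprise"],
    test_files=["CreateInvoiceHandlerTests.cs", "create_invoice.feature"],
)
# one variant per (edition, role) pair
```

Each variant can then be dispatched to the appropriate runner with its own identity and edition configuration.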
Continuous Role

This agent stays active throughout:
- Pre-commit test validation
- Nightly test runs
- Pre-release quality gates
- Studio feedback and rerun triggers
- Reruns of failed test suites with new configuration
Summary

The Test Automation Engineer Agent is the backbone of operational QA automation, ensuring that:
- All generated tests become running tests
- Test plans adapt to edition, role, and environment
- Results flow back into Studio and feedback loops
- Everything is trace-tagged, observable, and CI-ready

It transforms ConnectSoft's QA system from static test definitions into a living, self-updating quality automation mesh.
Strategic Role in the Factory

The Test Automation Engineer Agent is the operational executor and runtime orchestrator in the QA Engineering Cluster. It connects test creation (definition) with test validation (execution) in the factory pipeline.
It ensures that all test artifacts, regardless of how or where they were generated, are executed:
- Across environments
- Across roles and editions
- Within CI/CD gates
- With traceable feedback into Studio and QA agents
Position in Factory Cluster Topology

QA Engineering Cluster

    flowchart TD
        subgraph QA Engineering Agents
            A[Test Case Generator Agent]
            B[Test Generator Agent]
            C[Test Coverage Validator Agent]
            D[Test Automation Engineer Agent]
            E[QA Engineer Agent]
        end
        A --> D
        B --> D
        D --> C
        D --> E
        C --> E
CI/CD Pipeline Integration Points

    flowchart TD
        CodeCommit --> Generate[Test Generation]
        Generate --> TestPlan[TestExecutionPlan.yaml]
        TestPlan --> TestRun[Test Automation Engineer Agent]
        TestRun --> Results[TestResults + Logs]
        Results --> QAReview[Studio + QA Feedback]
        Results --> Coverage[Test Coverage Validator Agent]
Pipeline Touchpoints

| Stage | Test Automation Engineer Agent Role |
|---|---|
| Pre-Build | Validates whether any test setup/mocks need to be injected |
| Build & Test | Runs unit tests, integration tests, and BDD scenarios |
| Retry on Failure | Re-runs quarantined or flaky tests |
| Quality Gate | Emits result summaries and thresholds |
| Pre-PR Merge | Annotates test results in the Git PR |
| Post-Release | Executes long-running validations or scheduled test jobs |
| Bug Reproduction | Re-runs tests related to failed production traces or bug triggers |
Factory Context: Service Edition & Role Flow

The agent sits at the intersection of QA artifacts and execution environments.

| Input | Source Agent | Consumption |
|---|---|---|
| `.feature`, `.cs` | Test Case / Test Generator Agents | Schedules test run |
| `test-metadata.yaml` | Generator Agents | Builds test plan matrix |
| Studio prompt | QA Agent | Replays or triggers tests |
| `trace_id` + edition | Blueprint + CI | Isolates test context |
| `qa-plan.yaml` | QA Engineer Agent | Orchestrates which sets must run in this build |
Real-Time Role in Studio

- Monitors trace coverage gaps
- Responds to "Rerun with new edition config" requests
- Logs scenario results into the UI per edition/role combination
- Sends test failures to the bug resolver or retry system
Sample Workflow: Pull Request

1. Developer commits a new handler
2. Test Case Generator adds `CreateInvoiceHandlerTests.cs`
3. Test Generator adds a `.feature` for the Guest role
4. Test Automation Engineer Agent:
   - Selects edition: `enterprise`
   - Executes `.feature` scenarios + unit tests with the role matrix
   - Collects logs and reports
   - Posts PR comment:
     ✅ 6 tests passed | ❌ 1 test failed for Guest role → opened retry job
Summary

The Test Automation Engineer Agent is strategically positioned to:
- Operate at the boundary of test design and test validation
- Connect generation with runtime execution across editions
- Power Studio insights, QA metrics, and CI test reliability
- Serve as the automated executor and verifier in ConnectSoft's QA flow
Responsibilities

The Test Automation Engineer Agent owns the end-to-end orchestration of test execution: planning which tests to run, executing them across environments and roles, and reporting results with full traceability.
Its goal is to ensure that all tests produced in the factory are continuously validated, observable, reproducible, and reliable.
Key Responsibilities Breakdown
| Responsibility | Description |
|---|---|
| 1. Execute All Test Types | Runs unit tests, integration tests, BDD .feature scenarios, security tests, edge cases |
| 2. Apply Edition + Role Context | Executes tests with edition-specific configuration and per-role identity injection |
| 3. Orchestrate CI/CD Test Runs | Integrates with Azure DevOps (or other CI), runs during build, PR, and release |
| 4. Monitor Test Results | Collects pass/fail states, logs, telemetry, screenshots (for UI/e2e) |
| 5. Handle Retries and Quarantine | Re-runs flaky or failed tests and marks unstable ones for investigation |
| 6. Generate Test Execution Plan | Uses test-metadata.yaml and QA plan files to construct dynamic run sets |
| 7. Enforce Test Execution Policies | Applies timeouts, concurrency rules, isolation modes, and system constraints |
| 8. Emit Execution Metrics | Publishes results and execution stats to Studio, QA reports, and dashboards |
| 9. Trace Result Back to Test Generator | Links results to originating generator, trace ID, edition, role, handler |
| 10. Integrate with Studio | Shows real-time and historical test results, role/edition matrix views, retry buttons |
| 11. Support Manual Triggers from QA | Allows on-demand test execution per trace, scenario, edition, or prompt |
| 12. Schedule Tests | Runs tests at regular intervals (e.g., nightly regression, weekly release blockers) |
| 13. Provide Failure Context | Logs, screenshots, span traces, and output are made available for debugging |
| 14. Generate Artifacts | Produces .trx, .xml, .json, and Markdown reports for every test run |
| 15. Monitor Resource Usage | Optimizes test execution for parallelism and execution time tracking |
| 16. Support Cross-Service Integration Tests | Coordinates with other services or mocks where needed |
| 17. Handle Edition/Feature Toggles | Injects correct feature flags or behavior constraints before execution |
| 18. Maintain Observability Hooks | Emits OpenTelemetry spans and error metrics for each test run |
| 19. Recover from Failures Gracefully | Runs retries, captures logs, and prevents blocking unrelated pipelines |
| 20. Validate Test Definitions Before Execution | Ensures test is syntactically and structurally valid before run time |
Responsibility Scope vs. Other QA Agents

| Capability | Generator Agents | Test Automation Engineer Agent |
|---|---|---|
| Test Definition | ✅ | ❌ |
| Test Expansion | ✅ | ❌ |
| Test Execution | ❌ | ✅ |
| CI/CD Orchestration | ❌ | ✅ |
| Retry Handling | ❌ (except for prompt-level) | ✅ |
| Logs, Artifacts, Dashboards | ❌ | ✅ |
| Trace Tagging, Assertion Monitoring | Partial | Full runtime span capture |
Real-World Execution Example

For handler `CapturePaymentHandler`, with these generated files:
- `CapturePaymentHandlerTests.cs`
- `capture_payment.feature`
- `test-metadata.yaml`

And this edition context:

The agent will:
1. Generate a matrix of `(role, edition)` pairs
2. Inject the `enterprise` configuration into the test environment
3. Set identity to `Cashier`, then run unit and `.feature` tests
4. Repeat with the `Guest` role
5. Record pass/fail logs per step
6. Emit a Studio summary and CI/CD job artifact
7. If Guest fails with 403, send a retry trigger to the QA workflow
Summary

The Test Automation Engineer Agent transforms ConnectSoft's tests from static artifacts into:
- Live, continuous, traceable executions
- Measurable QA results with Studio visibility
- Reliable CI/CD signals that validate quality gates
- Retryable, observable, and role/edition-specific feedback loops

This ensures that quality is enforced, not assumed, across the entire software factory lifecycle.
Inputs

To execute tests intelligently and reliably, the Test Automation Engineer Agent consumes a multi-source set of inputs from upstream agents, CI pipelines, configuration files, and Studio.
These inputs allow it to:
- Know what to run
- Determine how and where to run it
- Apply context: trace, edition, roles, feature flags
- Support retry logic, execution control, and observability hooks
Primary Inputs by Type

| Input Type | Description | Source |
|---|---|---|
| Test Artifacts | `.cs`, `.feature`, step definitions, test classes | Test Case Generator / Test Generator |
| Test Metadata | `test-metadata.yaml`, `test-augmentation-metadata.yaml` | Generator Agents |
| Trace Context | `trace_id`, `handler_name`, `edition`, `roles`, `blueprint_id` | Blueprint, Generator Agents |
| QA Plan Definitions | `qa-plan.yaml` per microservice or feature cluster | QA Engineer Agent |
| CI/CD Trigger Metadata | Pipeline ID, PR ID, environment, build scope | Azure DevOps / GitHub Actions |
| Studio Prompts or Actions | Manual rerun request, trace-specific execution | Studio (QA UI) |
| Test Matrix Templates | Role × Edition × Scenario pairing logic | Factory test matrix schema |
| Environment Variables & Secrets | Edition config, identity injection, mocked services | Config Services / Secrets Store |
| Test Execution Constraints | Timeouts, max retries, parallel limits, tags to skip | Config files + QA Agent input |
| Memory Lookups (optional) | Past test runs for trace-aware diff coverage | Memory + History Service |
| Bug Trace or Rerun Request | Replay of a test case linked to a past bug ID | Bug Resolver Agent or Studio |
Example: test-metadata.yaml

    trace_id: capture-2025-0281
    blueprint_id: usecase-9361
    module: PaymentsService
    handler: CapturePaymentHandler
    roles_tested: [Cashier, Guest]
    test_cases:
      - type: unit
        file: CapturePaymentHandlerTests.cs
      - type: bdd
        file: capture_payment.feature
      - type: validator
        file: CapturePaymentValidatorTests.cs
    edition_variants: [enterprise, lite]

From this metadata, the agent builds an execution plan:
- Run the `.cs` and `.feature` files
- For `Cashier` and `Guest`
- In both the `enterprise` and `lite` editions
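The derivation of that plan can be sketched as below. The sketch assumes the `test-metadata.yaml` above has already been parsed into a dictionary; `build_plan` is an illustrative helper, not the agent's actual interface.

```python
metadata = {  # as if parsed from the test-metadata.yaml above
    "trace_id": "capture-2025-0281",
    "roles_tested": ["Cashier", "Guest"],
    "edition_variants": ["enterprise", "lite"],
    "test_cases": [
        {"type": "unit", "file": "CapturePaymentHandlerTests.cs"},
        {"type": "bdd", "file": "capture_payment.feature"},
        {"type": "validator", "file": "CapturePaymentValidatorTests.cs"},
    ],
}

def build_plan(meta):
    """Cross roles_tested with edition_variants; every variant runs all files."""
    files = [case["file"] for case in meta["test_cases"]]
    return [
        {"trace_id": meta["trace_id"], "edition": edition, "role": role, "files": files}
        for edition in meta["edition_variants"]
        for role in meta["roles_tested"]
    ]

plan = build_plan(metadata)
# 2 editions x 2 roles -> 4 execution variants
```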
Input: QA Execution Plan

    service: PaymentsService
    build_type: pull_request
    required_tests:
      - all unit
      - all BDD tagged @security
      - edition-specific scenarios for @enterprise
    optional_tests:
      - validator
      - duplicate scenarios (marked @retired)

This determines the filtering and prioritization of what will be run for the build step.
Input: Studio Manual Trigger

    {
      "action": "rerun",
      "trace_id": "invoice-2025-0147",
      "edition": "enterprise",
      "role": "Guest",
      "scenario": "Guest tries to approve invoice"
    }

The agent re-executes only the matching `.feature` scenario under the specified context.
Environment Inputs

| Variable | Used For |
|---|---|
| `EDITION=enterprise` | Injects config toggles, flags, mocks |
| `USER_ROLE=Guest` | Sets identity or token for the test runner |
| `TEST_RUN_ID=build_4871` | Traceability |
| `QA_TRIGGER_SOURCE=studio.manual` | Audit trails and retry tagging |
| `FEATURE_FLAGS_ENABLED=true` | Enables toggled behaviors at runtime |
| `ISOLATE_TESTS=true` | Enforces a containerized test environment |
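Reading these variables into a single execution context could look like the sketch below. The variable names follow the table above; the defaults and the `load_execution_context` helper are illustrative assumptions.

```python
import os

def load_execution_context(env=os.environ):
    """Collect the runtime context the agent reads before a run.

    Defaults are illustrative: absent flags fall back to the most
    conservative value (lite edition, no isolation).
    """
    return {
        "edition": env.get("EDITION", "lite"),
        "role": env.get("USER_ROLE"),
        "run_id": env.get("TEST_RUN_ID"),
        "trigger": env.get("QA_TRIGGER_SOURCE", "ci.pipeline"),
        "feature_flags": env.get("FEATURE_FLAGS_ENABLED", "false") == "true",
        "isolate": env.get("ISOLATE_TESTS", "false") == "true",
    }

# Pass a plain dict to simulate a CI environment
ctx = load_execution_context(
    {"EDITION": "enterprise", "USER_ROLE": "Guest", "ISOLATE_TESTS": "true"}
)
```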
Summary

The Test Automation Engineer Agent consumes a comprehensive and trace-rich input graph that includes:
- All test assets from factory generation
- Execution instructions, constraints, roles, editions
- Environment config and identity injection
- Trace-aware hooks for audit, retry, and QA validation

This gives it the flexibility and intelligence to run only what matters, while maintaining full traceability and CI compliance.
Outputs

The Test Automation Engineer Agent transforms static test artifacts and trace metadata into executed results, runtime logs, and trace-linked feedback.
Its outputs are designed to:
- Feed Studio dashboards with result status
- Generate detailed test reports and logs for CI/CD
- Maintain traceability back to `trace_id`, edition, role, and blueprint
- Support retries, analysis, and coverage deltas
Primary Output Artifacts
| Output Type | Description | Format/Example |
|---|---|---|
| Test Results | Pass/fail status for each test executed | .trx, .json, .xml, Markdown |
| Test Execution Summary | High-level run result per trace/handler | test-execution-summary.yaml |
| Test Logs | Output from test runners, assertions, stack traces | .log, .txt |
| Screenshots & Traces | UI or system-level artifacts for failures | .png, .har, .trace.json |
| Coverage Delta Reports | Before/after snapshot of role/scenario coverage | trace-coverage-diff.yaml |
| QA Report Files | Human-readable reports pushed to Studio | qa-execution-report.md |
| Observability Events | OTel spans, test-level metrics and logs | test-execution-events.jsonl |
| Retry Metadata | Captures retry attempts and success/failure status | execution-retry-history.yaml |
Example: Test Execution Summary

    trace_id: invoice-2025-0147
    handler: CreateInvoiceHandler
    edition: enterprise
    roles_executed:
      - FinanceManager
      - Guest
    summary:
      total_tests: 6
      passed: 5
      failed: 1
      retries: 1
      duration_seconds: 27.4
    last_run_at: 2025-05-17T12:44:09Z
    report_files:
      - invoice_trace0147_execution.trx
      - test-output.log
      - qa-execution-report.md
Markdown QA Report Output

    ### Test Execution Report: CreateInvoiceHandler

    Trace: invoice-2025-0147
    Edition: enterprise
    Roles Tested: FinanceManager, Guest

    ✅ Passed:
    - Handle_ShouldReturnSuccess_WhenValidInput
    - Scenario: Successful invoice creation

    ❌ Failed:
    - Scenario: Guest attempts invoice approval
      Reason: StatusCode was 200, expected 403

    Retry Attempt: ✅ success on second run

    Trigger: Pre-merge CI pipeline
    Artifacts: `invoice-2025-0147.trx`, `guest-403.log`
Observability Output (JSONL Span Log)

    {
      "event": "TestExecuted",
      "trace_id": "invoice-2025-0147",
      "role": "Guest",
      "scenario": "Guest attempts invoice approval",
      "status": "failed",
      "status_code": 200,
      "expected_code": 403,
      "duration_ms": 420,
      "retried": true
    }

These events are ingested by Studio and the monitoring dashboards.
File Output Directory (example)

    /test-results/
    └── invoice-2025-0147/
        ├── qa-execution-report.md
        ├── invoice-2025-0147.trx
        ├── guest-403.log
        ├── invoice-2025-0147.trace.json
        └── test-execution-summary.yaml
Traceability Metadata

Each test result includes:

    augmented_by: test-automation-engineer-agent
    source_trace_id: invoice-2025-0147
    generated_from: CreateInvoiceHandlerTests.cs
    executed_roles: [Guest]
    edition: enterprise
    execution_status: failed
    retry_count: 1
Output Triggers for Other Agents

| Agent | Triggered By |
|---|---|
| Bug Resolver Agent | Test failure with trace → emits reproduction workflow |
| Test Coverage Validator Agent | Coverage delta report → updates scenario heatmap |
| Pull Request Creator Agent | Pass/fail status → PR comment annotations |
| QA Engineer Agent | Markdown + artifact summary → integrated into test plan reviews |
| Studio | Real-time display of test outcome per role + edition |
Summary

The Test Automation Engineer Agent emits:
- Machine-readable results (`.trx`, `.json`)
- Human-readable Markdown reports
- Execution summaries per trace/handler/role/edition
- Logs, artifacts, and retry metadata
- Events and observability spans for feedback loops

This ensures that every test produced by the factory is verifiably executed, auditable, and explorable by humans and machines.
Supported Test Types and Runners

The Test Automation Engineer Agent supports execution of all test types produced by the ConnectSoft factory:
- Unit tests
- Integration tests
- BDD/scenario tests
- Validator/FluentValidation tests
- Security and access control tests
- Retry, resiliency, and chaos scenario validations
- Edition-variant test paths
- Prompt-augmented edge and AI-generated cases

Each test type is executed using the appropriate test runner, identity injection, and configuration context.
Supported Test Types

| Test Type | Description | Source |
|---|---|---|
| Unit Tests | Test `IHandle<T>` logic in isolation with mocks | `.cs` via Test Case Generator |
| Validator Tests | Test FluentValidation rules in DTOs | `.cs` via Test Case Generator |
| Integration Tests | Run HTTP/gRPC endpoints in a test host | `.cs`, `WebApplicationFactory` |
| BDD Scenario Tests | Run `.feature` files + `Steps.cs` via SpecFlow | Test Generator |
| Security Role Tests | Assert behavior under different roles/claims | Scenario + Role Matrix |
| Negative & Edge Case Tests | Handle nulls, invalid values, format issues | AI-generated `.cs` + `.feature` |
| Edition-Aware Scenarios | Tests scoped to specific feature toggles | Edition variants in `.feature` |
| Performance Hooks (Optional) | Smoke & runtime diagnostics | BDD + scenario timer tags |
| Replay & Regression Tests | Tests triggered from a bug trace | Bug Resolver / Studio Manual |
| Manually Triggered Tests | Executed on demand from the Studio trace view | QA Prompt or QA Plan |
Test Runners and Tools Used

| Runner | Test Type | Integration |
|---|---|---|
| `dotnet test` | Unit, validator, integration | MSTest, xUnit |
| `SpecFlow CLI` | `.feature` + `Steps.cs` | Scenario tests |
| `Playwright` or `Cypress` (optional) | End-to-end UI | For web-scenario validation |
| `Azure DevOps Test Tasks` | CI/CD orchestration | Hosted agents |
| `FeatureToggleHarness` | Simulates edition-based configurations | Injects runtime context |
| `TestIsolationExecutor` | Runs scenarios in isolated containers for concurrency | Parallel test runs |
| `RoleInjectionRunner` | Wraps test runs in identity context (JWT, headers, claims) | Security scenario execution |
BDD Scenario Execution

For a `.feature` file like:

    Scenario: Guest cannot cancel invoice
      Given the user is Guest
      When they submit a cancel request
      Then the system returns 403 Forbidden

The agent uses:
- The SpecFlow test runner
- The injected `enterprise` config
- A generated Guest token
- Parallel scenario execution

Outputs:
- A `Passed` or `Failed` status
- Studio shows the result and trace ID
- The CI pipeline validates pre-merge status
Execution by Scenario Tags

The agent dynamically selects test cases using tags:

| Tag | Action |
|---|---|
| `@role:CFO` | Sets identity as CFO |
| `@edition:lite` | Injects feature flags/config for the lite edition |
| `@security` | Prioritized in pre-release checks |
| `@retry` | Rerun until successful or marked unstable |
| `@prompt_generated` | Runs with explanation Markdown included |
| `@performance` | Measures step duration and total scenario time |
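Tag-driven selection like the table above can be sketched as a simple include/exclude filter. The `select_by_tags` helper and the scenario records are illustrative, not the agent's real data model.

```python
def select_by_tags(scenarios, required=(), excluded=("@retired",)):
    """Keep scenarios carrying every required tag and none of the
    excluded ones. Tag names follow the table above."""
    return [
        s for s in scenarios
        if all(tag in s["tags"] for tag in required)
        and not any(tag in s["tags"] for tag in excluded)
    ]

scenarios = [
    {"name": "Guest cannot cancel invoice", "tags": ["@security", "@role:Guest"]},
    {"name": "CFO approves invoice", "tags": ["@role:CFO"]},
    {"name": "Old flow", "tags": ["@security", "@retired"]},
]

# Pre-release: prioritize @security, dropping retired scenarios
security_set = select_by_tags(scenarios, required=("@security",))
```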
Real-World Example Execution Plan

    trace_id: refund-2025-0143
    handler: IssueRefundHandler
    roles: [SupportAgent, Guest]
    edition: enterprise
    test_types:
      - unit: IssueRefundHandlerTests.cs
      - bdd: refund_flow.feature
    execution_matrix:
      - edition: enterprise
        role: SupportAgent
      - edition: enterprise
        role: Guest
    runners_used:
      - dotnet test
      - SpecFlow CLI
    identity: jwt injected
    env_config: edition-enterprise.json
Summary

The Test Automation Engineer Agent supports a wide range of test types and execution strategies, including:
- Precision selection by role and edition
- Rich support for both `.cs`-based and `.feature`-based test flows
- Handling of intelligent, AI-suggested edge and security tests
- Runner selection that matches the architecture: MSTest, SpecFlow, Playwright, etc.
- Trace-tagged, version-aware, CI-integrated runs

This flexibility ensures every type of test generated by the factory can be executed, validated, and reported, reliably and observably.
Edition- and Role-Aware Test Execution Planning

One of the Test Automation Engineer Agent's most powerful capabilities is its ability to execute the same test logic under different editions and roles, enabling:
- Validation of feature toggles and edition-specific behavior
- Testing of role-based access control paths (success, rejection, escalation)
- Traceability of behavior per edition and user identity
- Proper QA coverage across multi-tenant, multi-tier SaaS configurations
Key Execution Concepts
| Dimension | Description |
|---|---|
| Edition Awareness | Executes tests in different product tiers (lite, pro, enterprise) by injecting config, flags, mocks |
| Role Awareness | Wraps test executions in identity contexts (claims, JWTs, headers) to simulate real user roles |
| Matrix Execution | Builds edition Γ role matrix per test case and executes each variant independently |
| Test Grouping | Batches tests into parallelizable segments per edition/role pair |
| Trace Result Aggregation | Tags and stores test results per trace_id, edition, and role, enabling Studio dashboards to reflect fine-grained outcomes |
Execution Matrix Example

For the following scenario:
- Handler: `CancelInvoiceHandler`
- Editions: `lite`, `enterprise`
- Roles: `FinanceManager`, `CFO`, `Guest`

The agent builds this plan:

| Edition | Role | Action |
|---|---|---|
| `lite` | `FinanceManager` | Run unit + `.feature` |
| `lite` | `Guest` | Run `.feature` → expect 403 |
| `enterprise` | `CFO` | Run unit + `.feature` |
| `enterprise` | `Guest` | Run `.feature` → expect 403 |

Total test executions: 4 variants
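Note that the 4-variant plan is not the full roles × editions product (which would be 6): some roles only exist in some editions. One way to sketch that filtering, where the `supported` role map is an illustrative assumption standing in for the factory's edition/role access map:

```python
from itertools import product

# Which roles exist in which edition (illustrative assumption)
supported = {
    "lite": {"FinanceManager", "Guest"},
    "enterprise": {"CFO", "Guest"},
}
roles = ["FinanceManager", "CFO", "Guest"]
editions = ["lite", "enterprise"]

# Full cartesian product, filtered to supported (edition, role) pairs
variants = [
    (edition, role)
    for edition, role in product(editions, roles)
    if role in supported[edition]
]
# 4 variants remain, matching the plan above
```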
How the Agent Applies Edition Context

| Step | Action |
|---|---|
| 1. Load edition config | Reads flags from `edition-enterprise.json` |
| 2. Inject runtime env | Passes config to the test host, DI container, or service harness |
| 3. Override mocks | Enables/disables behaviors based on edition toggles |
| 4. Set test tag context | Tags test results with the edition in metadata and Studio |
| 5. Record outcomes per edition | Stores results for dashboards, deltas, and trace views |
Role Injection Flow

| Role Type | Injection Strategy |
|---|---|
| JWT Claims | `role=FinanceManager`, `scope=invoice.write` |
| Header | `x-user-role: CFO` |
| CLI Arg | `--role Guest` passed to the test runner |
| DI Overload | Injects identity context into handlers/controllers |

The agent ensures identity is enforced at:
- The test framework level (for BDD steps)
- The HTTP/gRPC request level (for integration tests)
- Middleware/auth policies during in-process tests
Output Metadata per Variant

    trace_id: invoice-2025-0147
    handler: CancelInvoiceHandler
    executed_roles:
      - Guest
      - CFO
      - FinanceManager
    edition: enterprise
    scenario: Cancel invoice after approval
    results:
      - role: Guest
        edition: enterprise
        result: failed
        reason: expected 403, got 200
      - role: CFO
        edition: enterprise
        result: passed
Studio Dashboard View

| Scenario | Guest (lite) | Guest (enterprise) | CFO (enterprise) |
|---|---|---|---|
| Cancel after approval | ✅ Forbidden (403) | ❌ Unexpected 200 | ✅ Approved |

A red cell triggers QA review or regeneration from the Test Generator Agent.
Adaptive Execution Planning

The agent automatically:
- Skips unneeded editions/roles if already covered
- Merges overlapping editions if the config is equivalent
- Prioritizes `@security`-tagged roles in pre-release runs
- Schedules missing variants flagged by the Test Coverage Validator Agent
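The skip-and-prioritize behavior described above could be sketched as below. `prune_variants` and the `security_roles` parameter are hypothetical names for illustration; the real agent would consult coverage records and tag metadata instead of in-memory sets.

```python
def prune_variants(variants, already_covered, security_roles=("Guest",)):
    """Drop variants already validated in this pipeline, then move
    security-relevant roles to the front of the queue (illustrative)."""
    pending = [v for v in variants if v not in already_covered]
    # Python's sort is stable: security roles (key False) come first,
    # original ordering is otherwise preserved.
    return sorted(pending, key=lambda v: v[1] not in security_roles)

variants = [
    ("enterprise", "CFO"),
    ("lite", "CFO"),
    ("enterprise", "Guest"),
    ("lite", "Guest"),
]
plan = prune_variants(variants, already_covered={("lite", "CFO")})
# Guest variants run first; the already-covered CFO variant is skipped
```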
Summary

The Test Automation Engineer Agent executes all tests using a matrix of editions and roles, ensuring:
- Multi-tier SaaS configurations are fully covered
- Access control paths are enforced and validated
- Results are traceable by edition/role per test
- QA and Studio benefit from full-spectrum observability

This capability is essential for validating multi-tenant, feature-variant, role-sensitive SaaS platforms like those produced by the ConnectSoft AI Software Factory.
Pipeline Integration (CI/CD, Pre-Merge, Release Gates)

The Test Automation Engineer Agent isn't just a test executor; it is a test execution orchestrator for factory pipelines. It ensures that all generated tests:
- Run at the right stage (PR, build, release, nightly)
- Block or permit deployment based on test results
- Report to the right systems (Studio, QA reports, Azure DevOps)
- Rerun selectively in case of failure or QA request
- Respect edition, role, scope, and test type constraints
Key CI/CD Integration Responsibilities
| Stage | Agent Responsibility |
|---|---|
| Pre-Build | Validates test folders, prepares edition config/mocks |
| Build/PR | Executes unit + BDD + security tests for scope of change |
| Post-Build / Coverage Check | Compares what ran vs. what was expected |
| Pre-Release | Runs full matrix (editions Γ roles Γ scenarios) |
| Nightly / Scheduled | Executes slow, full, exploratory, or randomized sets |
| On-Demand | Supports Studio-triggered re-runs or trace validation jobs |
| Test Coverage Validator Hooks | Feeds actual run data into gap analysis |
| Pull Request Integration | Annotates PRs with test summary and trace status |
| Release Gate | Blocks pipeline if regressions, failures, or coverage drop below threshold |
Sample CI/CD Pipeline Structure

    - stage: BuildAndTest
      jobs:
        - job: UnitTests
          steps:
            - run: dotnet test Payments.UnitTests.csproj
            - publish: '*.trx'
        - job: BDDTests
          steps:
            - run: specflow run refund_flow.feature --edition=enterprise --role=Guest
            - publish: '*.json'
        - job: StudioReport
          steps:
            - run: agent generate-qa-report --trace invoice-2025-0147
            - publish: qa-execution-report.md

The agent controls test runner invocation, context injection, and result publication. (Glob patterns such as `*.trx` are quoted so the YAML stays valid.)
Agent Output → PR Annotation Example

    Trace: invoice-2025-0147
    ✅ 6 tests passed | ❌ 1 failed (Guest role, edition=enterprise)
    Retry: success on 2nd attempt
    Report: /test-results/invoice-2025-0147/qa-execution-report.md
Release Gates Example

The agent emits thresholds:

    required_coverage:
      roles: 100%
      editions: 100%
      @security: all must pass
    allow_if:
      unstable_scenarios < 2
      retries < 3

The gate fails if a `@security` scenario for Guest returns 200 instead of 403.
Pipeline-Aware Execution Scoping
| Change Detected | Action |
|---|---|
| Handler changed | Only run tests with that trace_id |
| Edition config updated | Run edition-specific .feature only |
| DTO structure changed | Run validator + integration tests |
| Role added to access map | Run security scenarios across new role |
| Studio prompt | Rerun trace-scenario on demand |
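The change-to-scope mapping in the table above can be sketched as a simple classifier. The suffix/prefix rules in `scope_tests` are illustrative assumptions, not the agent's real file matcher.

```python
def scope_tests(changed_files):
    """Map changed files to test scopes, following the table above.

    Illustrative rules: handler code triggers trace-scoped tests,
    edition config triggers edition scenarios, DTO changes trigger
    validator and integration tests.
    """
    scopes = set()
    for path in changed_files:
        if path.endswith("Handler.cs"):
            scopes.add("trace-scoped unit + feature tests")
        elif path.startswith("edition-") and path.endswith(".json"):
            scopes.add("edition-specific scenarios")
        elif path.endswith("Dto.cs"):
            scopes.add("validator + integration tests")
    return scopes

scope = scope_tests(["CreateInvoiceHandler.cs", "edition-enterprise.json"])
```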
Outputs Stored in CI

| File | Description |
|---|---|
| `*.trx` | MSTest/xUnit result output |
| `*.json` | SpecFlow/BDD structured test output |
| `qa-execution-report.md` | Human-readable QA result summary |
| `test-execution-summary.yaml` | Machine-readable pipeline trace |
| `retry-metadata.json` | Retry history and resolution status |
Studio & DevOps Integration

| Feature | Outcome |
|---|---|
| Retry failed test | Run a single scenario via trace ID |
| Link test → PR | Add trace result to a Git PR comment |
| Studio dashboard updates | Show the edition/role matrix from actual execution |
| Execution delta detection | Only run what changed (intelligent scoping) |
Summary

The Test Automation Engineer Agent integrates seamlessly into CI/CD by:
- Running the right tests at the right stage
- Reporting test status per trace, role, and edition
- Enforcing gates for regressions and coverage
- Supporting retries, replays, and QA-initiated flows
- Publishing human- and machine-readable results across platforms

This ensures that every pipeline in the ConnectSoft Factory is quality-enforced, trace-driven, and test-aware by design.
Test Suite Composition and Selection Strategy

The Test Automation Engineer Agent must not run everything on every build; it must select the most relevant and trace-aligned tests based on:
- What changed (code, edition config, roles)
- What trace ID or module was triggered
- Which tests cover the affected paths
- What QA policy, prompt, or CI gate applies
- What has already been tested and validated

This results in precise, efficient, edition-aware test execution optimized for both velocity and quality.
What a Test Suite Consists Of
| Element | Description |
|---|---|
| Test Units | Individual test methods or .feature scenarios |
| Execution Targets | Role Γ Edition Γ Scenario variants |
| Runner Context | Identity injection, env overrides, edition flags |
| Tags/Scopes | e.g. @security, @regression, @edge |
| Test Type | Unit, BDD, Validator, Integration, Replay, Prompt-based |
| Priority Level | High (pre-merge), Medium (post-build), Low (nightly) |
Strategy for Test Suite Selection

1. Trace-Aware Matching

- If `trace_id: cancel-2025-0142` was generated, select:
  - All `.cs` tests from `CancelInvoiceHandlerTests.cs`
  - All scenarios from `cancel_invoice.feature`
  - Role/edition combinations listed in `test-metadata.yaml`

2. Tag-Based Inclusion

- Select all scenarios with:
  - `@security` → enforced in all editions
  - `@prompt_generated` → always run once
  - `@chaos`, `@retry` → scheduled or nightly only

3. Edition/Role Expansion

- If `edition = pro` and `role = CFO`, auto-expand `.feature` scenarios tagged:
  - `@edition:pro`
  - `@role:CFO`
  - Default (`@edition:all` or untagged)

4. Change-Based Diff Scoping

- Code diff touches:
  - `CreateInvoiceHandler.cs` → select unit tests + `.feature` mapped via trace
  - `edition-enterprise.json` → select edition-sensitive scenarios

5. QA Plan Inclusion

- `qa-plan.yaml` defines:

6. Bug Trace Replay

- Bug #4281 marked `CustomerId = null`
  - Find all matching failed traces
  - Rerun only the affected `.feature` scenarios + validator tests
Test Selection YAML Snapshot

    execution_scope:
      trace_id: cancel-2025-0142
      selected_tests:
        - CancelInvoiceHandlerTests.cs
        - cancel_invoice.feature
      roles: [CFO, Guest]
      editions: [lite, enterprise]
      tags_required: [security]
      sources: [test-generator, test-case-generator]
Test Exclusion Logic

| Reason | Exclusion |
|---|---|
| Scenario tagged `@retired` | Not executed |
| Marked as flaky in retry log | Deferred to nightly job |
| Role not supported in edition | Skipped with note |
| Already passed in current pipeline context | Result reused unless an override is requested |
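The exclusion rules above can be sketched as a partitioning filter. `filter_suite` and its parameters are hypothetical illustrations; the real agent would read flaky status and prior results from retry logs and pipeline state.

```python
def filter_suite(tests, flaky, passed_in_pipeline):
    """Apply the exclusion rules above: drop @retired tests, defer
    flaky tests to the nightly job, and reuse prior passes."""
    run_now, nightly, reused = [], [], []
    for t in tests:
        if "@retired" in t["tags"]:
            continue  # never executed
        if t["name"] in flaky:
            nightly.append(t)  # deferred to nightly job
        elif t["name"] in passed_in_pipeline:
            reused.append(t)  # result reused, no re-run
        else:
            run_now.append(t)
    return run_now, nightly, reused

tests = [
    {"name": "t_retired", "tags": ["@retired"]},
    {"name": "t_flaky", "tags": []},
    {"name": "t_passed", "tags": []},
    {"name": "t_new", "tags": []},
]
run_now, nightly, reused = filter_suite(
    tests, flaky={"t_flaky"}, passed_in_pipeline={"t_passed"}
)
```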
Studio-Controlled Scope Override

Studio allows:
- Manual trace re-runs (specific handler + scenario)
- Edition re-simulation
- "Run only security scenarios for this handler"
- Prompt-based scope triggers that select only new prompt scenarios
β Summary¶
The Test Automation Engineer Agent composes test suites dynamically by:
- π§ Selecting only whatβs relevant based on trace, diff, tags, edition, and role
- π¦ Using QA plans, change diffs, and test metadata to optimize scope
- π Ensuring all tests run in the correct configuration context
- π Allowing full traceability and replay from Studio or CI events
This strategy keeps execution fast, relevant, and intelligent across all QA pipelines.
π― Failure Handling, Retries, and Quarantining Logic¶
Even well-defined tests fail β due to:
- π« Intermittent infrastructure issues
- π Flaky behavior
- π Authorization mismatch
- π§ Logic or data regressions
- π§ͺ Newly introduced bugs
The Test Automation Engineer Agent handles failures through a controlled, trace-aware, and observable retry + quarantine system that ensures:
- π Legitimate bugs are surfaced
- β οΈ Flaky tests are isolated, not ignored
- π‘οΈ Release gates are protected from noise
- π Trace logs, retries, and failures are audit-safe
π Retry Strategy¶
| Type | Behavior |
|---|---|
| Automatic Retry | Reruns failed test up to N times (default: 2) |
| Conditional Retry | Only reruns on network, timeout, or transient conditions |
| Prompt-Aware Retry | For QA-triggered scenarios, retries with modified assertions |
| Edition-Specific Retry | Only retries failed combinations of role Γ edition |
| Feature Retry Scope | .feature scenario failed β rerun only that scenario, not the whole suite |
| Replay + Compare | Compares retry result to initial run β records delta |
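The conditional-retry rule can be sketched as follows. The result-dict shape and the set of transient reasons are illustrative assumptions; the default of two retries matches the table above:

```python
TRANSIENT_REASONS = {"network", "timeout", "connection_reset"}

def run_with_retry(run_test, max_retries=2):
    """Run a test, retrying only transient failures, up to max_retries times.

    run_test() returns a dict like {"status": "passed"} or
    {"status": "failed", "reason": "timeout"} (illustrative shape).
    """
    attempts = []
    for _ in range(max_retries + 1):
        result = run_test()
        attempts.append(result)
        if result["status"] == "passed":
            break  # success: stop retrying
        if result.get("reason") not in TRANSIENT_REASONS:
            break  # hard failure (assertion/exception): surface immediately
    return attempts

# A flaky timeout that passes on the second attempt records two attempts.
flaky = iter([{"status": "failed", "reason": "timeout"},
              {"status": "passed"}])
attempts = run_with_retry(lambda: next(flaky))
```

A hard assertion failure is never retried, so legitimate bugs surface on the first run rather than being masked by retries.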
π Retry Metadata¶
```yaml
trace_id: refund-2025-0143
scenario: Refund twice → error
first_result: Failed (403 expected, 200 received)
retry_count: 1
retry_successful: true
recovery_type: edition_misconfig
auto_quarantine: false
```
π¦ Failure Triage Levels¶
| Failure Type | Action |
|---|---|
| β Hard Failure (assertion, exception) | Marked as failed; reported in Studio and the PR |
| π Transient Error (network, timeout) | Retry up to limit |
| β οΈ Flaky Detected (in retry history) | Auto-quarantine or mark unstable |
| π§ Prompt-Sourced Test Fails | Re-evaluate scenario accuracy + alert QA Agent |
| π Role-Specific Misbehavior | Trigger Security Scenario Review flow |
| π Bug Trace Test Fails Again | Escalate to Bug Resolver Agent |
π§± Quarantine Process¶
```mermaid
sequenceDiagram
  TestRun->>RetryCheck: Detect flaky scenario
  RetryCheck-->>QuarantineStore: Flag test as unstable
  QuarantineStore->>Studio: Display warning icon
  QuarantineStore->>QAEngineerAgent: Suggest review
  TestAutomationAgent->>CI: Exclude from gate evaluation
```
Tagged in metadata:
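A quarantined test might be tagged along these lines (all field names here are illustrative assumptions, not a fixed schema):

```yaml
scenario: Guest refund
trace_id: refund-2025-0143
flaky: true
quarantined: true
quarantine_reason: failed 2 of last 3 runs
excluded_from_gates: true
qa_review_requested: true
```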
π Studio Display (Flaky / Quarantined)¶
| Scenario | Role | Edition | Status | Retry | Quarantine |
|---|---|---|---|---|---|
| Cancel after approval | CFO | enterprise | β | β success | π« flagged |
| Guest refund | Guest | lite | β | β | β quarantined |
β Users can manually retry or "unquarantine and rerun."
π§ͺ Retry Triggers¶
| Trigger | Source |
|---|---|
| Failure pattern matched | Test result log |
| βRetry testβ clicked in Studio | Manual trigger |
| Bug Resolver Agent rerun | Post-regression |
| New edition config pushed | Retest edition-variant scenarios |
| CI instability warning | Test run duration variance, memory spike |
π Retry Reporting Summary¶
Markdown report:
```markdown
### π Retry Summary for Trace: refund-2025-0143

- β First run failed: Guest refund → expected 403, got 200
- π Retried with correct edition config
- β Result: Passed on retry
- π§ Trace marked as unstable (retry count = 1)
- π Added to retry-metadata.yaml + Studio warning dashboard
```
β Summary¶
The Test Automation Engineer Agent handles failures using:
- π Controlled, intelligent retries
- π§ͺ Scenario-level execution granularity
- π Trace-linked failure metadata and retry history
- π§ Auto-quarantine with Studio + QA visibility
- π Markdown and machine-readable summaries for all retry events
This ensures the platform maintains trustworthy, reproducible, and fault-tolerant test execution β and avoids false passes or false fails.
π§© Environment Provisioning and Configuration Injection¶
This cycle defines how the Test Automation Engineer Agent ensures that automated tests can run in consistent, isolated, and configurable environments across dev, staging, and production, supporting both infrastructure and configuration-specific automation.
ποΈ Role in the ConnectSoft Platform¶
```mermaid
flowchart TD
  BlueprintReady -->|Triggers| TestAutomationEngineerAgent
  TestAutomationEngineerAgent -->|Emits| TestContainersConfig
  TestContainersConfig --> CIEnvironment
  CIEnvironment -->|Executes| AutomatedTests
```
The agent ensures that test containers, mocks, secrets, and tenant-specific runtime settings are properly initialized before executing test suites.
βοΈ Responsibilities in this Cycle¶
| Area | Behavior |
|---|---|
| Container Test Environment | Use Docker/TestContainers to spin up databases, queues, and services. |
| Environment-Specific Config | Inject appsettings.Development.json, .env.test, or Pulumi configs. |
| Secrets Injection | Pull environment-specific secrets (e.g., test tokens) from Key Vault. |
| Feature Flag Toggle | Enable test-only scenarios via Microsoft.FeatureManagement or mocks. |
| Parallelization Strategy | Coordinate parallel test execution across agent pools with isolated state. |
𧬠Memory & Retrieval¶
The agent retrieves:
- Test container configurations from template blueprints
- Runtime dependencies from `execution-metadata.json`
- Config layers from Memory Blob or DevOps Git

It emits:
- `testcontainers.config.yaml`
- `.env.test`
- `TestRuntimeInstructions.md`
π§ Prompt Design Snippet¶
```yaml
skill: ProvisionTestEnvironment
context:
  moduleId: NotificationService
  stage: CI
  tenantId: vetclinic-001
  runtime:
    db: PostgreSQL
    messaging: RabbitMQ
  env: test
  secrets: true
```
β Output Expectation¶
| File | Description |
|---|---|
| `docker-compose.test.yaml` | Starts services in test mode (DB, queues, mock APIs) |
| `.env.test` | Injects runtime variables for the test context |
| `appsettings.Test.json` | Overrides to support test instrumentation |
| `TestRuntimeInstructions.md` | Documentation for human/agent consumption |
π Observability & Traceability¶
Each provisioning step emits:
- `traceId`, `executionId`
- `testEnvironmentProvisioned` event
- Span: `setup:test-container`
- Log metadata: which services spun up, ports used, secrets resolved
π£ Collaboration Hooks¶
| Partner Agent | Exchange |
|---|---|
| Infrastructure Engineer | Shares Pulumi, Bicep, or infra mocks |
| DevOps Agent | Injects generated config into the CI/CD pipeline |
| QA Agent | Pulls test run config and runtime logs |
π§© Summary¶
Cycle 11 ensures that all tests run in clean, reproducible, and isolated environments, enforcing:
- Config fidelity across test environments
- Predictable runtime behavior with secrets and mocks
- Clear, observable execution chains
π§ Without this cycle, automated tests would become fragile, flaky, or misconfigured across tenants and stages.
π§ Multi-Tenant Test Adaptation¶
In the ConnectSoft AI Software Factory, every test must be aware of the tenant it targets β because tenants may differ in:
- π Locale or language
- π¦ Feature flags or modules
- π Security policies
- π³ Edition (Lite, Pro, Enterprise)
- π οΈ Custom business rules (e.g., VAT, timezone logic)
The Test Automation Engineer Agent is responsible for dynamically adapting test executions to match the correct tenant context, making all test results tenant-accurate and traceable.
π§© What Multi-Tenant Adaptation Includes¶
| Aspect | Description |
|---|---|
| Tenant Context Injection | Injects tenant ID, tenant-specific config, identity providers |
| Edition Filtering | Runs only those tests that are applicable to a tenant's edition |
| Custom Rule Overrides | Activates or disables rule sets per tenant in test config |
| Localized Assertions | Adjusts assertion expectations (e.g., messages in fr-FR, he-IL) |
| Isolated Runtime Environments | Runs each tenant test in isolated state (e.g., DB per tenant) |
βοΈ Configuration Strategy¶
Agent retrieves:
- Tenant blueprint from `tenant-manifest.yaml`
- Edition + feature flags from `edition-config.json`
- Secrets and connection strings from `KeyVault:{tenant}`
- Localization strings from `tenant-locale-resources.json`
Applies them before test execution, logs result as:
```yaml
test_context:
  tenant_id: vetclinic-001
  edition: enterprise
  locale: en-US
  feature_flags: [EnableLateFee, AllowBulkCancel]
```
π§ͺ Test Matrix Expansion¶
Given 3 tenants:
| Tenant | Edition | Locale |
|---|---|---|
| `vetclinic-001` | enterprise | en-US |
| `dentalcare-033` | lite | fr-FR |
| `visionplus-902` | pro | he-IL |
And a .feature file for "Invoice Cancellation"
Agent executes:
- 3 runs Γ roles Γ scenarios
- Adjusts expected error messages (localized)
- Loads tenant-specific feature toggles (e.g., bulk invoice disablement)
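The matrix expansion above is essentially a cross product of tenant context, roles, and scenarios. A sketch, with tenant data taken from the table and the role/scenario lists as illustrative assumptions:

```python
from itertools import product

tenants = [
    {"id": "vetclinic-001", "edition": "enterprise", "locale": "en-US"},
    {"id": "dentalcare-033", "edition": "lite", "locale": "fr-FR"},
    {"id": "visionplus-902", "edition": "pro", "locale": "he-IL"},
]
roles = ["CFO", "Guest"]            # illustrative role list
scenarios = ["Cancel paid invoice"]  # illustrative scenario from the .feature

# One execution per tenant x role x scenario, carrying full tenant context
# so assertions can be localized and feature flags loaded per run.
runs = [
    {"tenant": t["id"], "edition": t["edition"], "locale": t["locale"],
     "role": role, "scenario": s}
    for t, role, s in product(tenants, roles, scenarios)
]
print(len(runs))  # 3 tenants x 2 roles x 1 scenario = 6 runs
```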
π Test Metadata Per Tenant¶
```yaml
trace_id: invoice-2025-0172
tenant_id: vetclinic-001
edition: enterprise
locale: en-US
role: CFO
feature_flags:
  - AllowLateInvoice
result: passed
```
π Retry & Feedback Adaptation¶
If test fails due to tenant-specific config:
- Agent logs failure reason
- Adjusts config and retries
- Tags result as:
π Studio Impact¶
Test results are grouped per tenant:
```
π§ͺ Tenant: vetclinic-001
β CFO Approves Invoice
β Guest Access Denied
β Missing VAT Scenario → Suggested by Test Generator
```
β QA engineers can toggle tenant filters in Studio to see test impact.
π Collaboration Hooks¶
| Agent | Integration |
|---|---|
| Studio Agent | Renders per-tenant test matrix |
| Tenant Provisioner Agent | Provides dynamic tenant blueprint |
| Test Generator Agent | Suggests scenarios for uncovered tenant rules |
| Test Coverage Validator | Detects gaps per tenant+edition |
β Summary¶
This cycle ensures the Test Automation Engineer Agent can run every test as if it were the tenant, providing:
- π§ Accurate rule validation
- π‘οΈ Configuration-scoped testing
- π Locale- and language-aware assertions
- π Tenant-specific observability in dashboards
- β CI/CD trust per SaaS tenant instance
Without multi-tenant test adaptation, the factoryβs SaaS coverage model would break down in real-world deployments.
π§ Execution Observability and Traceability¶
In a platform as large and dynamic as ConnectSoft, every test execution must be observable, traceable, and audit-safe β across tenants, editions, roles, environments, and blueprints.
The Test Automation Engineer Agent emits observability signals and metadata that enable:
- π Real-time execution tracking
- π§ͺ Debugging of failed tests
- π Trace-to-test lineage
- π Retry visibility
- β QA validation and audit trails
π‘ Observability Data Emitted¶
| Data Type | Tool/Format | Description |
|---|---|---|
| OpenTelemetry Spans | OTel JSON | Captures start/end of test run, trace ID, role, edition |
| Structured Logs | `.jsonl` or Serilog | Logs test inputs, outputs, assertions, retries |
| Execution Snapshots | `execution-metadata.yaml` | Per-test result data with context |
| Trace Logs | `.trace.json` or `.har` | Captures test-level request/response data |
| Error/Retry Metadata | `retry-history.yaml` | Tracks retries, failure types, recovery paths |
| QA Markdown Reports | `qa-execution-report.md` | Human-readable output for Studio |
π Span Example (OpenTelemetry)¶
```json
{
  "trace_id": "refund-2025-0143",
  "span_name": "ExecuteScenario:GuestCannotCancel",
  "start_time": "2025-05-17T12:00:00Z",
  "duration_ms": 428,
  "attributes": {
    "tenant": "vetclinic-001",
    "edition": "enterprise",
    "role": "Guest",
    "result": "failed",
    "expected_status": 403,
    "actual_status": 200
  }
}
```
β Forwarded to centralized observability system or Studio backend.
π Execution Metadata YAML¶
```yaml
trace_id: invoice-2025-0147
handler: CancelInvoiceHandler
role: CFO
edition: enterprise
locale: en-US
status: passed
start_time: 2025-05-17T12:01:02Z
duration: 3.4s
assertions:
  - type: status_code
    expected: 200
    actual: 200
    result: passed
retry_count: 0
trigger_source: ci:pull_request
```
π§ͺ Failure Analysis Log¶
```json
{
  "event": "TestFailure",
  "trace_id": "invoice-2025-0147",
  "handler": "CancelInvoiceHandler",
  "role": "Guest",
  "edition": "enterprise",
  "failure_reason": "Expected status 403, got 200",
  "retried": true,
  "retry_success": false
}
```
β Consumed by QA Agent, Studio dashboards, or alert systems.
π Traceability Practices¶
| Mechanism | Description |
|---|---|
| Trace ID Tagging | All tests are linked to trace_id and blueprint ID |
| Edition/Role Tags | Included in span and metadata outputs |
| Scenario + Source Linking | Tracks whether test was generated via prompt, regression, or default |
| Test Class β Handler Mapping | Ensures reverse lookup from test β blueprint |
π§ Studio Dashboard Integration¶
- β Per-scenario test result status
- π Tooltip view of retry history and test input
- π Visual βPlayβ button to re-run test with last input
- π Test result heatmap per role Γ edition Γ trace
π£ Alerts & Diagnostics¶
| Failure Type | Alert Action |
|---|---|
| β Role Failure | Sends Studio alert with link to replay test |
| π§ͺ Repeated Flaky Scenario | Marks test as unstable β QA review panel |
| π§ Unexpected Pass/Fail Delta | Triggers regression reasoning via Bug Resolver Agent |
| π Execution Slowness | Metrics flagged for performance anomalies |
β Summary¶
The Test Automation Engineer Agent transforms test execution into a fully observable stream of trace-aligned, role-aware, and edition-specific spans, ensuring:
- π Every test is traceable back to its blueprint and prompt
- π Dashboards and metrics are updated in real-time
- π Failures are retry-visible, auditable, and explainable
- π§ͺ QA engineers and developers can navigate test lineage with confidence
Without this, test coverage would become opaque, and QA feedback would lack context or control.
π§ Test Sharding and Parallel Execution Management¶
To maintain fast, scalable, and reliable execution of thousands of tests across traces, roles, editions, tenants, and environments, the Test Automation Engineer Agent implements:
π§© Intelligent sharding and parallel execution orchestration β across CI agents, containers, or cloud test nodes.
This enables optimal use of compute resources and prevents bottlenecks in CI/CD pipelines, nightly jobs, or Studio-triggered replays.
βοΈ Key Execution Strategies¶
| Strategy | Description |
|---|---|
| Sharding by Trace | Each trace IDβs test suite runs in isolation from others |
| Edition Γ Role Partitioning | Matrix split across roles and editions, each sharded independently |
| Scenario Chunking | Large .feature files split by scenario for parallelism |
| Test Type Segmentation | Unit, integration, and BDD tests executed in separate pools |
| Tenant-Aware Execution Pools | Each tenantβs tests isolated by runtime container/test cluster |
π§± Example: Sharded Matrix for a Feature¶
```yaml
feature: capture_payment.feature
scenarios: 6
editions: [lite, enterprise]
roles: [Cashier, Guest]
```
→ Total variants: 6 × 2 × 2 = 24
→ Shards: 6 groups × 4 shards each (by edition × role)
Each shard:
- Loads a subset of tests
- Injects correct edition/role config
- Runs tests in isolation
- Sends results back to central aggregator
π§° Sharding Methods Supported¶
| Method | Tool / Layer |
|---|---|
| Azure DevOps Parallel Jobs | Shards run as matrix jobs |
| Docker-Based Isolation | Each job starts a test-runner container per shard |
| Orleans-Based Agent Pool (future) | Cloud-native distributed test node orchestration |
| Local Threaded Runner (lite) | For small test sets or CLI-triggered runs |
| Kubernetes Executor (optional) | Large-scale distributed .feature execution via pod-per-scenario model |
π Dynamic Sharding Algorithm¶
Agent evaluates:
- Number of test cases per dimension (role × edition × scenario)
- Historical duration metrics (via `test-history.json`)
- Retry counts (flaky tests are isolated)
- Infrastructure constraints (max parallelism)
- Priority weights (security tests run first)
And emits:
`test-shard-plan.yaml`:

```yaml
- shard_id: 1
  trace_ids: [cancel-2025-0142]
  roles: [CFO]
  edition: enterprise
- shard_id: 2
  trace_ids: [cancel-2025-0142]
  roles: [Guest]
  edition: lite
```
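One way to realize the duration-aware part of this algorithm is greedy longest-job-first packing into the currently lightest shard. This is a sketch under the assumption that the historical metrics reduce to a test-name → seconds map:

```python
import heapq

def plan_shards(durations, num_shards):
    """Greedy longest-job-first sharding: assign each test (name -> historical
    seconds) to the shard with the smallest running total duration."""
    # Heap entries: (total_seconds_so_far, shard_id, assigned_tests)
    heap = [(0.0, shard_id, []) for shard_id in range(num_shards)]
    heapq.heapify(heap)
    for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, shard_id, tests = heapq.heappop(heap)  # lightest shard so far
        tests.append(name)
        heapq.heappush(heap, (total + seconds, shard_id, tests))
    return {shard_id: tests for _, shard_id, tests in heap}

# One slow test dominates shard 0; the three fast tests balance onto shard 1.
shards = plan_shards({"cfo_cancel": 30, "guest_cancel": 7,
                      "cfo_refund": 6, "guest_refund": 5}, num_shards=2)
```

Placing the longest tests first keeps the slowest shard (and thus wall-clock pipeline time) close to the theoretical minimum.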
π Load Balancing Behavior¶
| Rule | Action |
|---|---|
| Tests exceed 30s | Force split to a separate shard |
| Scenario tagged `@slow` | Run on a dedicated low-priority shard |
| Retry required | Force isolate and deprioritize |
| Bug trace replay | High-priority fast-track shard |
| Edition = pro, Role = Admin | Run on enterprise test pool nodes |
π Runtime Metadata Per Shard¶
```yaml
shard_id: 9
execution_group: refund-2025-0143
edition: enterprise
role: Guest
status: passed
retry_count: 0
duration: 7.2s
agent_instance: test-runner-shard9
```
β Used by Studio to show per-scenario result timeline and heatmap.
π§ Coordination Flow¶
```mermaid
flowchart TD
  Plan[Test Suite Plan]
  Plan --> Shard1[Shard A]
  Plan --> Shard2[Shard B]
  Plan --> Shard3[Shard C]
  Shard1 --> Results
  Shard2 --> Results
  Shard3 --> Results
  Results --> Aggregator[Test Result Aggregator]
  Aggregator --> Studio
```
β Summary¶
The Test Automation Engineer Agent manages large-scale test execution by:
- π Sharding tests intelligently across roles, editions, and scenarios
- β‘ Running everything in parallel, isolated, and trace-safe environments
- π Feeding aggregated results back into Studio, PRs, and QA reports
- π§ Scaling test execution linearly as test volume grows
This is the execution engine for continuous quality across 100s of modules and 1000s of trace IDs.
π§ Metrics, Thresholds, and Quality Gates¶
The Test Automation Engineer Agent enforces quality assurance policies by generating and emitting metrics, thresholds, and pass/fail gates that:
- π Quantify test health across traces, roles, and editions
- π¦ Enforce CI/CD safety before merges or releases
- π§ͺ Detect regressions, flaky behavior, and coverage degradation
- π Support automated decisions for deployment control, QA signoff, and retry triggers
π¦ Core Metrics Tracked¶
| Metric | Description |
|---|---|
| `test.success_rate` | % of tests passed in this shard/trace/test type |
| `test.retry_rate` | % of tests that needed a retry |
| `test.flaky_rate` | Ratio of unstable tests (flaky over time) |
| `scenario.coverage` | Percent of blueprint scenarios executed per trace |
| `edition_completeness` | Edition/role matrix coverage score |
| `assertion_density` | Average number of assertions per test |
| `critical_failures` | Number of failed `@security` or `@regression` scenarios |
| `test.duration.avg` | Average execution time across the matrix |
| `test.blockers` | Total tests marked as blocking release |
| `quarantine_count` | Number of tests flagged as unstable |
π¦ Quality Gate Rules¶
Agent evaluates every test suite and emits a quality gate status:
| Rule | Threshold | Action |
|---|---|---|
| β Success Rate | > 95% | Pass |
| β οΈ Retry Rate | < 5% | Pass |
| β Critical Failures | = 0 | Required |
| β Security Scenario Pass | 100% | Required for merge/release |
| β οΈ Test Duration (avg) | < 15s per test | Info only |
| β Quarantine Count | < 3 unstable tests | Pass |
| β οΈ Coverage Delta | β₯ last build | Warning on drop |
| β Assertion Density | β₯ 1.5 per test | Optional for observability gate |
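The blocking rules in this table could be evaluated as a simple predicate over the suite metrics. Thresholds mirror the table; the metrics-dict keys are illustrative assumptions:

```python
def evaluate_quality_gate(metrics):
    """Return (status, reasons) for the blocking rules in the gate table."""
    reasons = []
    if metrics["success_rate"] <= 0.95:
        reasons.append("success rate not above 95%")
    if metrics["retry_rate"] >= 0.05:
        reasons.append("retry rate not below 5%")
    if metrics["critical_failures"] != 0:
        reasons.append("critical @security/@regression failures present")
    if not metrics["security_pass"]:
        reasons.append("security scenarios did not all pass")
    if metrics["quarantine_count"] >= 3:
        reasons.append("3 or more quarantined tests")
    return ("blocked" if reasons else "passed"), reasons

# The failing-suite example from this section trips every blocking rule.
status, why = evaluate_quality_gate({
    "success_rate": 0.87, "retry_rate": 0.08, "critical_failures": 2,
    "security_pass": False, "quarantine_count": 4,
})
```

Warning-level rules (duration, coverage delta, assertion density) would feed dashboards and alerts rather than this blocking predicate.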
π Example: Quality Gate Summary (YAML)¶
```yaml
trace_id: invoice-2025-0147
suite_status: failed
gate_summary:
  success_rate: 87%
  retry_rate: 8%
  critical_failures: 2
  security_pass: false
  edition_matrix_coverage: 5/6
  quarantine_count: 4
reasons:
  - "Failed: Guest scenario expected 403 but returned 200"
  - "Missing test for CFO in pro edition"
actions:
  - Suggest regenerate from Test Generator Agent
  - Rerun flaky tests in isolation
```
π Markdown QA Report Excerpt¶
```markdown
### Quality Gate Result: β Blocked

- π΄ 2 critical security tests failed
- β οΈ 4 tests flagged as flaky (quarantined)
- π 3/6 roles tested (missing: CFO, Admin, Analyst)
- π Coverage dropped from 82% → 74%
- π¦ CI pipeline halted (requires QA review + rerun approval)
```
π Studio Display¶
| Trace | Edition | Role | Status | Gate | Coverage |
|---|---|---|---|---|---|
| `cancel-2025-0142` | enterprise | CFO | β | β Pass | 100% |
| `refund-2025-0143` | pro | Guest | β | β Blocked | 66% |
| `invoice-2025-0172` | lite | FinanceManager | β | β οΈ Warning | 90% |
π Gate Actions Triggered¶
| Action | Trigger |
|---|---|
| π Retry test | Threshold: flaky = true |
| π§ͺ Re-gen scenario | Trigger: missing role coverage |
| β Mark test unstable | Failure in 2 of last 3 builds |
| π« Block release | Critical security regression |
| β οΈ Show Studio alert | Coverage or quality drop from baseline |
π§ Metrics Emitted Format¶
- `metrics/test-results.json`
- `test-metrics.prometheus.txt` (for monitoring integration)
- `qa-summary.yaml`
- `markdown-status.md`

All tagged with:

- `trace_id`, `edition`, `role`, `test_type`
- `source_agent`, `execution_id`, `retry_count`
β Summary¶
With this cycle, the Test Automation Engineer Agent becomes the guardian of continuous quality by:
- π Measuring execution health with rich QA metrics
- π¦ Enforcing pass/fail gates at merge and release stages
- π§ Supporting Studio visibility and feedback loops
- π Connecting test failures to intelligent next steps (rerun, regenerate, revalidate)
Without this, test automation would become invisible and unreliable to the factoryβs DevOps and QA loops.
π― Support for Manual and Scheduled Test Runs¶
In addition to running tests in response to CI/CD events, the Test Automation Engineer Agent must support:
- π Manual execution requests (e.g., via Studio or QA prompt)
- π Scheduled jobs (e.g., nightly regressions, weekly chaos validation)
- π On-demand replays, edge-case runs, and exploratory test sweeps
This enables QA engineers and product owners to validate scenarios on demand, without waiting for pipeline events β ensuring continuous validation of critical business flows, long-running tests, and non-blocking coverage.
π Manual Execution Use Cases¶
| Scenario | Trigger Source | Action |
|---|---|---|
| QA reviews a bug fix | Studio β βRerun failed scenarioβ | Runs exact .feature/.cs combo |
| Prompt-based trace generated | QA prompt in Studio | Test Generator β Agent executes immediately |
| Edition configuration updated | QA clicks "Retest all scenarios for `lite`" | Full matrix rerun for the edition |
| QA validates access rule changes | Manual run scoped by role | Security test matrix re-executed |
π Scheduled Execution Use Cases¶
| Schedule Type | Example |
|---|---|
| Nightly Regressions | Run all @regression and @security scenarios |
| Weekly Chaos/Retry Tests | Run scenarios tagged @retry, @chaos, @flaky |
| Edition Consistency Audits | Validate functional parity between pro and enterprise editions |
| Tenant Health Checks | Run 5β10 core tests across all tenants nightly |
| Prompt Backlog Drains | Re-execute tests generated from prompt backlog that werenβt prioritized in CI |
π Example Manual Trigger (Studio API)¶
```json
{
  "action": "manual_execute",
  "trace_id": "invoice-2025-0172",
  "role": "CFO",
  "edition": "enterprise",
  "scenarios": ["Invoice approval denied for Guest"]
}
```
Agent response:
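The response shape is not specified here; a hypothetical acknowledgment (all field names are assumptions) might look like:

```json
{
  "run_id": "exec-1124",
  "status": "accepted",
  "trace_id": "invoice-2025-0172",
  "scenarios_queued": 1,
  "results_path": "manual-results/exec-1124/"
}
```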
π§ Scheduled Plan Definition (YAML)¶
```yaml
schedule_id: nightly-qa-core-suite
schedule: 0 2 * * *
tests:
  tags: [@core, @security]
  roles: [Admin, CFO]
  editions: [lite, pro, enterprise]
  tenants: all
notifications:
  on_failure: slack://qa-alerts
  on_complete: post_summary_to_studio
```
Agent loads schedule.yaml, provisions isolated runner pools, and executes across shards.
π Metadata for Manual/Scheduled Runs¶
```yaml
trigger: manual
trigger_source: studio.qa
triggered_by: olga.qa
execution_mode: on_demand
run_id: exec-1123
trace_id: refund-2025-0188
scenario: Refund fails for CFO with locked invoice
```
π Output Location¶
Manual and scheduled results are published to:
- `manual-results/<run_id>/*.md`
- `studio-trace-results/<trace_id>/<role>/<edition>/qa-execution-report.md`
- Studio Test History view
- `qa-backlog.yaml` (for any tests queued due to infra limits)
β Summary¶
The agent supports:
- π Manual QA- or PM-initiated runs
- π Repeatable scheduled test suites
- π Reruns by trace, role, scenario, or edition
- π Full audit trail of who ran what, why, and when
- π Markdown summaries and logs for human review
This gives ConnectSoft a continuous QA safety net, outside the CI/CD pipeline β supporting experimentation, confidence, and coverage.
π― Collaboration with QA Engineer and Coverage Validator Agents¶
To maintain complete, role-aware, edition-specific test coverage, the Test Automation Engineer Agent must actively collaborate with:
- π§ͺ QA Engineer Agent β for test plan validation, feedback loops, and Studio integrations
- π Test Coverage Validator Agent β for real-time measurement of what was tested and what remains untested
This triad ensures test automation is strategic, traceable, and coverage-aligned, not just reactive or mechanical.
π€ Integration with QA Engineer Agent¶
| Collaboration Mode | Description |
|---|---|
| Execution Planning | Accepts test run instructions from QA plan (qa-plan.yaml) |
| Manual Feedback Handling | Accepts QA actions from Studio (approve/reject test run, rerun scenario) |
| Scenario Validation Status | Reports results of manual prompt-based or critical-path tests |
| QA Test Gap Review | Agent emits list of failed, flaky, or missing test runs for QA to review |
| Studio Trace Sync | Agent populates per-scenario execution summaries to Studio dashboards |
π Integration with Coverage Validator Agent¶
| Integration Type | Description |
|---|---|
| Before Execution | Validator agent provides trace/role/edition coverage expectations |
| After Execution | Automation agent emits actual test run matrix and results |
| Gap Resolution Triggers | If gaps remain, triggers Test Generator Agent or suggests QA Rerun |
| Failure Clustering | Validator tags frequently failing or uncovered role-edition-scenario clusters |
| Delta Reporting | Agent helps generate before/after coverage heatmaps post execution |
π Sample QA Plan Fragment (from QA Engineer Agent)¶
```yaml
qa-plan:
  trace_id: capture-2025-0143
  required_roles:
    - Cashier
    - Guest
  required_editions:
    - lite
    - enterprise
  test_types:
    - bdd
    - security
  test_tags:
    - @retry
    - @prompt_generated
  must_pass:
    - Scenario: Guest cannot approve payment
```
The Test Automation Engineer Agent:
- Executes specified matrix
- Validates results and marks required `must_pass` scenarios
- Publishes the report back to the QA Engineer Agent and Studio
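Marking the required scenarios could be as simple as checking each `must_pass` entry against the result map. A sketch; the plan and result shapes are assumptions:

```python
def validate_must_pass(qa_plan, results):
    """Verify every must_pass scenario in the QA plan actually passed.

    results maps scenario name -> "passed" / "failed" (illustrative shape).
    A scenario that never ran counts as a violation, not a pass.
    """
    violations = [scenario for scenario in qa_plan["must_pass"]
                  if results.get(scenario) != "passed"]
    return {"must_pass_ok": not violations, "violations": violations}
```

Treating "not executed" the same as "failed" is what makes `must_pass` a hard gate rather than a best-effort check.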
π Studio Feedback Workflow¶
```mermaid
sequenceDiagram
  QAEngineer->>Studio: Request scenario rerun
  Studio->>TestAutomationAgent: Execute scenario(trace_id, role)
  TestAutomationAgent->>QAEngineerAgent: Report pass/fail
  QAEngineerAgent->>Studio: Update status + coverage marker
```
π Sample Coverage Delta Report¶
```yaml
trace_id: cancel-2025-0142
coverage_before:
  total_roles: 4
  tested: 2
coverage_after:
  tested: 4
full_matrix_passed: true
summary: All critical paths executed successfully
```
β Used by Coverage Validator and QA Engineer Agents to update QA dashboards.
π Gap Remediation Loop¶
| Detected By | Resolved By |
|---|---|
| QA Agent flags missing test | Test Generator Agent β Automation Agent executes |
| Validator detects partial role matrix | Automation Agent runs missing combinations |
| Automation Agent detects unexpected behavior | Opens feedback task in Studio + retry ticket |
β Summary¶
The Test Automation Engineer Agent is not an isolated executor β it:
- π€ Aligns tightly with QA strategies via the QA Engineer Agent
- π Closes the loop with the Coverage Validator Agent to enforce test completeness
- π Supports Studio-driven actions, scenario replays, and test plan validations
- π Links every execution to feedback, regression prevention, and test health evolution
This collaborative structure turns automation into continuous quality assurance β not just test running.
π― Automation Metadata, Execution Snapshots, and Logs¶
Every test execution triggered by the Test Automation Engineer Agent must leave behind:
- π A complete execution snapshot
- π§Ύ Machine-readable metadata for CI, QA, and dashboards
- π Human-readable summaries for Studio, QA, and documentation
- π Logs and error traces for reproducibility, audits, and debugging
This cycle ensures every test run is fully inspectable, self-documented, and linked back to its origin.
π¦ Core Output Artifacts¶
| File | Description |
|---|---|
| `test-execution-summary.yaml` | Machine-readable result per test/role/edition |
| `qa-execution-report.md` | Markdown summary of execution, for QA dashboards |
| `.trx`, `.xml`, `.json` | Framework-specific result files (MSTest, SpecFlow, etc.) |
| `retry-history.yaml` | Retry reason, success, retry count |
| `assertion-logs.jsonl` | Structured logs of what was asserted and why |
| `execution.env.json` | Captures the environment context (role, edition, tenant) |
| `test-run.trace.json` | Detailed trace of input/output pairs, responses, exceptions |
π§ Metadata Example: test-execution-summary.yaml¶
```yaml
execution_id: exec-9034
trace_id: refund-2025-0143
handler: IssueRefundHandler
role: Guest
edition: enterprise
locale: en-US
status: passed
test_type: bdd
assertions:
  - expected: 403
    actual: 403
    type: status_code
    result: passed
duration_seconds: 4.2
retried: false
started_by: ci:pull_request
```
π Markdown Summary: qa-execution-report.md¶
```markdown
### π§ͺ Test Execution Report β refund-2025-0143

πΉ Handler: IssueRefundHandler
πΉ Edition: Enterprise
πΉ Role: Guest
πΉ Locale: en-US
πΉ Status: β Passed
πΉ Duration: 4.2s

**Scenario**: Guest tries to issue a refund
- β Status code = 403
- β Error message = "Access Denied"

π Trigger: CI Pull Request #4829
```
π Artifact Directory Structure¶
```
/test-results/
└── refund-2025-0143/
    ├── test-execution-summary.yaml
    ├── qa-execution-report.md
    ├── refund_guest_enterprise.trx
    ├── retry-history.yaml
    ├── execution.env.json
    └── assertion-logs.jsonl
```
π Log Example: assertion-logs.jsonl¶
```json
{
  "trace_id": "refund-2025-0143",
  "scenario": "Guest issues refund",
  "assertion": "StatusCode == 403",
  "result": "passed",
  "duration_ms": 82
}
```
β Used in Studioβs log viewer, QA diagnostic panels, and metrics dashboards.
π§© Observability Metadata¶
Each artifact is tagged with:
- `trace_id`, `role`, `edition`, `execution_id`, `source`, `test_type`
- Retry info, CI build ID, and runtime env hash
π QA & Studio Usage¶
| Purpose | Artifact |
|---|---|
| Review failed test | qa-execution-report.md |
| Debug unexpected result | assertion-logs.jsonl, trace.json |
| Track retry history | retry-history.yaml |
| Show test config context | execution.env.json |
| Sync dashboards | test-execution-summary.yaml |
β Summary¶
This cycle ensures the Test Automation Engineer Agent emits:
- π Human-readable summaries for Studio and QA
- π Machine-readable metadata for CI/CD, validators, and coverage reports
- π§ͺ Execution context, retries, and assertion logs for diagnostics
- π Organized file structure for all test traces, failures, replays, and audits
It turns every test run into a self-contained, traceable QA asset β not just a log line in a CI server.
π― Error Feedback Loop β Triggering Retries, Generator Feedback, and QA Recovery¶
When a test fails, the Test Automation Engineer Agent doesnβt just log the failure β it activates an intelligent feedback loop that:
- π Retries recoverable tests
- π€ Sends failed cases to the Test Generator Agent for patching or augmentation
- π§βπΌ Alerts the QA Engineer Agent for manual review, tagging, or regression response
- π Records all outcomes for traceability and future retries
This feedback loop helps ConnectSoft achieve self-healing QA across the platform.
π Retry + Feedback Cycle¶
```mermaid
flowchart TD
  A[Test Fails] --> B[Evaluate Failure Type]
  B -->|Flaky| C[Retry]
  B -->|Assertion Mismatch| D[QA Alert + Prompt Rerun]
  B -->|Missing Scenario| E[Test Generator Agent Trigger]
  C --> F[Retry Outcome: Pass/Fail]
  D --> G[Studio Feedback]
  E --> H[Patch Scenario or Suggest Fix]
```
Feedback Triggers

| Condition | Feedback Action |
|---|---|
| Scenario fails with missing role | Trigger Test Generator to emit the missing role variant |
| Invalid assertion (e.g. 200 instead of 403) | Flag in Studio and the QA review dashboard |
| Retry succeeds | Record as flaky, tag scenario for nightly audit |
| Retry fails again | Open "regression suspect" report in `test-regression-candidates.yaml` |
| Prompt-based test fails | QA may edit, refine, or regenerate the test using Studio |
| Missing `.feature` coverage | Coverage Validator suggests expansion plan |
| Infra/setup issue | Create retry job and optionally skip temporarily |
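The trigger table above amounts to a small dispatch rule: classify the failure, then route it to a feedback action. A minimal sketch of that routing follows; the failure-type names, action names, and retry threshold are illustrative assumptions, not an actual ConnectSoft API.

```python
# Illustrative sketch: route a failed test to a feedback action, mirroring
# the trigger table above. All names here are hypothetical examples.
FEEDBACK_ACTIONS = {
    "flaky": "retry",
    "assertion_mismatch": "qa_alert",
    "missing_scenario": "trigger_test_generator",
    "infra_failure": "create_retry_job",
}

def route_failure(failure_type: str, retry_count: int, max_retries: int = 2) -> str:
    """Pick a feedback action; exhausted retries escalate to regression review."""
    if failure_type == "flaky" and retry_count >= max_retries:
        return "open_regression_candidate"
    # Unknown failure types default to a QA alert rather than silent retry.
    return FEEDBACK_ACTIONS.get(failure_type, "qa_alert")
```

A first flaky failure would route to `retry`, while the same flaky test after exhausting its retry budget would escalate to `open_regression_candidate`.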
Retry Metadata Log Example

```yaml
trace_id: cancel-2025-0142
scenario: CFO cannot cancel paid invoice
first_attempt:
  status: failed
  actual: 200
  expected: 403
retry_attempt:
  status: passed
  reason: edition misconfigured
tag: flaky
feedback_actions:
  - notify_qa
  - quarantine_scenario
  - suggest_regeneration
```
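From a record shaped like the retry log above, the recovery tag can be derived mechanically. The sketch below is a simplified assumption that mirrors the recovery-tag table later in this section; it is not the agent's actual implementation.

```python
# Illustrative sketch: derive a recovery tag from a retry record shaped like
# the YAML example above. Tag names follow the recovery-tag table; the
# decision logic is a simplified assumption.
def derive_tag(record: dict) -> str:
    first = record.get("first_attempt", {}).get("status")
    retry = record.get("retry_attempt", {}).get("status")
    if first == "passed":
        return "stable"
    if retry == "passed":
        return "retry_success"          # passed on retry: observe, don't block
    if retry == "failed":
        return "regression_candidate"   # failed twice: feed bug resolver
    return "needs_triage"               # failed, no retry recorded yet

record = {
    "trace_id": "cancel-2025-0142",
    "first_attempt": {"status": "failed", "actual": 200, "expected": 403},
    "retry_attempt": {"status": "passed", "reason": "edition misconfigured"},
}
```

Under this rule, the example record above tags as `retry_success`, queueing the scenario for observation rather than blocking the pipeline.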
QA Recovery Loop
| Trigger | Action |
|---|---|
| Prompt test failed | Agent posts Studio message: "Scenario failed, review recommended." |
| Test removed by generator | QA notified to review gap |
| Retry count > threshold | QA must approve re-test or regeneration |
| Quarantined test | QA Engineer Agent tags with quarantine reason and remediation plan |
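The "retry count > threshold" gate in the recovery loop above can be sketched as a small predicate; the threshold value and names are assumptions for illustration only.

```python
# Illustrative sketch: the retry-threshold gate from the recovery loop above.
# The threshold value and function name are hypothetical.
RETRY_THRESHOLD = 3

def needs_qa_approval(retry_count: int, quarantined: bool) -> bool:
    """Above the retry threshold, or once quarantined, only QA may re-run."""
    return retry_count > RETRY_THRESHOLD or quarantined
```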
Generator Feedback API (Test Generator Agent)

```json
{
  "trace_id": "invoice-2025-0172",
  "failure_reason": "Missing THEN clause for assertion",
  "scenario": "Guest cancels paid invoice",
  "recommendation": "Regenerate using prompt: 'What if Guest cancels after invoice paid?'"
}
```

The generator receives the prompt context and trace metadata, then generates a patched `.feature` file.
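Assembling such a payload is straightforward; a minimal sketch follows. The field names come from the JSON example above, but the helper itself is a hypothetical convenience, not an actual ConnectSoft API.

```python
import json

# Illustrative sketch: build a generator feedback payload with the fields
# shown in the JSON example above. The helper name is hypothetical.
def build_generator_feedback(trace_id: str, scenario: str,
                             failure_reason: str, prompt: str) -> str:
    payload = {
        "trace_id": trace_id,
        "failure_reason": failure_reason,
        "scenario": scenario,
        "recommendation": f"Regenerate using prompt: '{prompt}'",
    }
    return json.dumps(payload, indent=2)
```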
QA Feedback View in Studio

| Trace | Scenario | Result | Retry | Feedback |
|---|---|---|---|---|
| refund-2025-0143 | Guest issues refund | Failed | Passed on 2nd attempt | Tagged as flaky |
| invoice-2025-0172 | Guest cancels invoice | Failed | Failed again | Requires scenario patch |
Recovery Tags Emitted

| Tag | Meaning |
|---|---|
| `retry_success` | Passed on retry, needs observation |
| `flaky_scenario` | Repeated failures; QA to monitor |
| `regression_candidate` | Retry failed twice; feed bug resolver |
| `missing_variant` | Generator missed the scenario |
| `studio_feedback_required` | QA interaction needed |
Summary

This cycle enables the Test Automation Engineer Agent to:
- Automatically retry when safe
- Send failed tests to prompt-based regeneration
- Alert QA for review, retry, or reclassification
- Record all results, tags, and recovery plans for trace-safe feedback

It ensures the system not only detects failure but also responds intelligently, maintaining platform-wide test resilience.
Summary and Positioning Within the QA Automation Ecosystem

The Test Automation Engineer Agent is the execution orchestrator and quality enforcer of the ConnectSoft AI Software Factory QA Cluster.
It ensures that:
- Every test is executed in the correct role, edition, and tenant context
- Quality gates, retries, and observability pipelines are enforced
- Studio dashboards, CI/CD pipelines, and QA engineers have full traceability
- Failures are not final; they trigger remediation loops via retries, regeneration, and feedback
Position in the QA Cluster

```mermaid
flowchart TD
    A[TestCaseGeneratorAgent] --> D[TestAutomationEngineerAgent]
    B[TestGeneratorAgent] --> D
    C[CoverageValidatorAgent] --> D
    D --> E[Studio]
    D --> F[QAEngineerAgent]
    D --> G[BugResolverAgent]
```
This agent is where static test artifacts become executable, observable validation logic.
Key Capabilities Overview

| Capability | Description |
|---|---|
| Test Execution | Unit, integration, BDD, validator, security, edition-aware |
| Role × Edition Matrix | Automatically expands and executes per configuration |
| Retry and Quarantine | Smart retries with traceability and retry logs |
| Environment Provisioning | TestContainers, mocks, edition configs, tenant injection |
| Metrics & Quality Gates | Emits coverage, success rate, instability, and blockers |
| Observability | Span logs, metrics, YAML/JSON/Markdown reports |
| Collaboration | Connects with QA Agent, Test Generator, Coverage Validator |
| Manual Execution | Studio-triggered test runs and replays |
| Scheduled Execution | Nightly, regression, chaos, long-running tests |
| Feedback Loops | Sends failures to Test Generator or QA workflows for patching |
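The Role × Edition matrix expansion mentioned in the capabilities table is essentially a cross product over configuration axes. A minimal sketch, assuming hypothetical role and edition names:

```python
from itertools import product

# Illustrative sketch: expanding a Role x Edition matrix into concrete
# execution contexts. Role and edition names are hypothetical examples.
ROLES = ["Admin", "CFO", "Guest"]
EDITIONS = ["Free", "Pro", "Enterprise"]

def expand_matrix(roles, editions):
    """One execution context per (role, edition) pair."""
    return [{"role": r, "edition": e} for r, e in product(roles, editions)]
```

With three roles and three editions this yields nine execution contexts, each of which the agent would run as a separately traced test configuration.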
Outputs Summary

- `.yaml`: `test-execution-summary.yaml`, `retry-history.yaml`, `qa-plan-results.yaml`
- `.jsonl`: assertion logs, span traces
- `.md`: QA-friendly test run reports
- `.trx` / `.xml`: native test runner output
- Studio: per-trace, per-role dashboards
Final Comparison with Other QA Agents
| Agent | Role |
|---|---|
| Test Case Generator Agent | Creates static unit/integration test classes |
| Test Generator Agent | Adds intelligent, prompt-based, edge-case test scenarios |
| QA Engineer Agent | Curates test plans, reviews execution, manages QA lifecycle |
| Test Coverage Validator Agent | Identifies gaps, coverage deltas, and audit failures |
| Test Automation Engineer Agent | Runs tests, logs results, handles retries, and reports quality |
Summary Statement

The Test Automation Engineer Agent is the operational heartbeat of ConnectSoft's QA cluster: executing thousands of tests daily, maintaining coverage across tenants and editions, and continuously enforcing the platform's observability-first, edition-aware, security-first testing principles.
Without this agent, test coverage is static and unvalidated. With it, the QA system becomes alive, intelligent, and continuously self-correcting.