State & Memory

Overview

The Factory runtime manages two distinct but related concepts:

Run State — Operational state for active and recent runs (ephemeral, per-run)
AI Memory — Long-term knowledge stored in the Knowledge & Memory System (persistent, cross-project)

This separation keeps per-run bookkeeping fast and consistent during execution, while the knowledge base accumulates reusable patterns that improve future runs.

Run State Store

What Is Stored

The Run State Store maintains operational state for Factory runs:

Run Metadata

Run Identifiers — runId, tenantId, projectId, templateRecipeId
Run Status — Current state (Requested, Validated, Queued, Running, Succeeded, Failed, Cancelled)
Timestamps — requestedAt, startedAt, completedAt, updatedAt
Request Context — User who requested, request parameters, configuration

Step/Job Status

Job Identifiers — jobId, stepName, attempt number
Job Status — Current state (Pending, Running, Succeeded, Failed, Cancelled)
Job Results — Success/failure status, error messages, execution duration
Artifact References — Links to generated artifacts (repo URLs, pipeline IDs, etc.)

Execution Context

Correlation IDs — traceId, spanId for distributed tracing
External System IDs — Azure DevOps buildId, repoId, pipelineId, workItemId
Execution Metadata — Worker instance, execution environment, resource usage
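To make the fields above concrete, here is a minimal sketch of run and job records as Python dataclasses (Python 3.9+). The snake_case field names, the split into two classes, and the free-form `parameters` and `external_ids` dictionaries are assumptions for illustration, not the Factory's actual schema; only the status values come from the lists above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    REQUESTED = "Requested"
    VALIDATED = "Validated"
    QUEUED = "Queued"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


class JobStatus(str, Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class JobRecord:
    job_id: str
    step_name: str
    attempt: int = 1
    status: JobStatus = JobStatus.PENDING
    error_message: Optional[str] = None
    duration_seconds: Optional[float] = None
    # Links to generated artifacts, e.g. {"repoUrl": ..., "pipelineId": ...} (hypothetical keys)
    artifact_refs: dict[str, str] = field(default_factory=dict)


@dataclass
class RunRecord:
    # Run identifiers
    run_id: str
    tenant_id: str
    project_id: str
    template_recipe_id: str
    # Request context
    requested_by: str
    parameters: dict[str, str] = field(default_factory=dict)
    status: RunStatus = RunStatus.REQUESTED
    # Timestamps (timezone-aware UTC)
    requested_at: datetime = field(default_factory=_utcnow)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    updated_at: datetime = field(default_factory=_utcnow)
    # Execution context: tracing correlation and external system IDs
    trace_id: Optional[str] = None
    external_ids: dict[str, str] = field(default_factory=dict)  # e.g. {"buildId": ...}
    jobs: list[JobRecord] = field(default_factory=list)
```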

Storage Choice

The Run State Store is typically implemented with one of two kinds of database:

Relational Database (SQL) — For structured queries, joins, and transactional consistency
- Examples: PostgreSQL, SQL Server, Azure SQL Database
- Benefits: ACID transactions, complex queries, referential integrity
Document Database — For flexible schema and horizontal scaling
- Examples: Cosmos DB, MongoDB
- Benefits: Schema flexibility, horizontal scaling, JSON-native storage

Considerations:

Query Patterns — Relational DB for complex queries (e.g., "all runs for project X in last 30 days")
Scale Requirements — Document DB for high-scale, multi-tenant scenarios
Consistency Needs — Relational DB for strong consistency requirements
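The query-pattern trade-off can be illustrated with a small relational sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL or Azure SQL Database; the table and column names and the `recent_runs` helper are hypothetical. It answers the example query from the list (all runs for a project in the last 30 days) with the help of a composite index.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE runs (
    run_id        TEXT PRIMARY KEY,
    tenant_id     TEXT NOT NULL,
    project_id    TEXT NOT NULL,
    status        TEXT NOT NULL,
    requested_at  TEXT NOT NULL  -- ISO-8601 UTC string, so text order = time order
);
CREATE TABLE jobs (
    job_id     TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL REFERENCES runs(run_id),  -- enforced only with PRAGMA foreign_keys = ON
    step_name  TEXT NOT NULL,
    attempt    INTEGER NOT NULL,
    status     TEXT NOT NULL
);
-- Supports "runs for project X since time T" without a full table scan.
CREATE INDEX ix_runs_project_time ON runs(tenant_id, project_id, requested_at);
"""


def recent_runs(conn, tenant_id, project_id, days=30, now=None):
    """Return (run_id, status) rows for a project's runs requested in the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    cur = conn.execute(
        "SELECT run_id, status FROM runs "
        "WHERE tenant_id = ? AND project_id = ? AND requested_at >= ? "
        "ORDER BY requested_at DESC",
        (tenant_id, project_id, cutoff),
    )
    return cur.fetchall()
```

The string comparison on `requested_at` is only correct because every timestamp is written with the same UTC `isoformat()` layout; a production schema would more likely use a native timestamp column.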

Job State & Idempotency Keys

Idempotency Key Structure

Jobs use structured idempotency keys to ensure safe retries:

Format: {runId}:{stepName}:{attempt}
Example: run-abc123:generate-repo:1
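Purpose: Uniquely identifies one execution attempt of a step. A redelivered message for the same attempt carries the same key and can be discarded as a duplicate, while a deliberate retry increments the attempt number and therefore gets a new key.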
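A small helper for building and parsing keys in the `{runId}:{stepName}:{attempt}` format might look like the following. `IdempotencyKey` is a hypothetical name, and the sketch assumes step names never contain `:` (run IDs may, because parsing splits from the right).

```python
from typing import NamedTuple


class IdempotencyKey(NamedTuple):
    run_id: str
    step_name: str
    attempt: int

    def __str__(self) -> str:
        return f"{self.run_id}:{self.step_name}:{self.attempt}"

    @classmethod
    def parse(cls, raw: str) -> "IdempotencyKey":
        # Split only on the last two ':' separators.
        run_id, step_name, attempt = raw.rsplit(":", 2)
        return cls(run_id, step_name, int(attempt))

    def next_attempt(self) -> "IdempotencyKey":
        # A retry is a new attempt, so it gets a new key.
        return self._replace(attempt=self.attempt + 1)
```

Because the key is a tuple, it can also serve directly as a dictionary or set key for deduplication.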

State Fields for Safe Retries

Job state includes fields that enable safe retries:

idempotencyKey — Unique key for deduplication
status — Current job status (Pending, Running, Succeeded, Failed, Cancelled)
attemptNumber — Current retry attempt (1, 2, 3, ...)
lastAttemptAt — Timestamp of last execution attempt
result — Execution result (success/failure, error details)
checkpoint — Progress checkpoint for resumable jobs
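The following sketch shows how those fields could support a safe retry: a duplicate delivery of an already-finished key is skipped, and a failed attempt spawns a new attempt that inherits its checkpoint. `JobState`, `is_duplicate`, `finish`, and `retry_from` are hypothetical names; statuses are plain strings and the checkpoint is assumed to be a flat dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class JobState:
    idempotency_key: str                         # "{runId}:{stepName}:{attempt}"
    status: str = "Pending"                      # Pending, Running, Succeeded, Failed, Cancelled
    attempt_number: int = 1
    last_attempt_at: Optional[datetime] = None
    result: Optional[dict[str, Any]] = None
    checkpoint: Optional[dict[str, Any]] = None  # progress marker for resumable jobs


def is_duplicate(existing: Optional[JobState]) -> bool:
    """True if this exact key already reached a terminal, non-retryable status."""
    return existing is not None and existing.status in ("Succeeded", "Cancelled")


def finish(state: JobState, ok: bool, error: Optional[str] = None) -> None:
    """Record the outcome of an attempt."""
    state.status = "Succeeded" if ok else "Failed"
    state.last_attempt_at = datetime.now(timezone.utc)
    state.result = {"success": ok, "error": error}


def retry_from(failed: JobState) -> JobState:
    """Create state for the next attempt, resuming from the failed attempt's checkpoint."""
    if failed.status != "Failed":
        raise ValueError(f"cannot retry a job in status {failed.status}")
    # Split from the right so run IDs containing ':' still parse.
    run_id, step_name, _ = failed.idempotency_key.rsplit(":", 2)
    nxt = failed.attempt_number + 1
    return JobState(
        idempotency_key=f"{run_id}:{step_name}:{nxt}",
        attempt_number=nxt,
        # Copy so the new attempt cannot mutate the old attempt's record.
        checkpoint=dict(failed.checkpoint) if failed.checkpoint else None,
    )
```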

Atomic State Updates

State updates are atomic to prevent race conditions:

Optimistic Locking — Use version numbers or timestamps to detect concurrent updates
Transactional Updates — Use database transactions for multi-field updates
Idempotent Operations — State updates are idempotent (applying the same update twice leaves the state exactly as applying it once)
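Optimistic locking combined with a transactional update can be sketched as a compare-and-set `UPDATE` that only matches the row version the caller originally read. SQLite again stands in for the production database; the `jobs` table with a `version` column, `update_job_status`, and `ConcurrentUpdateError` are assumptions for illustration.

```python
import sqlite3


class ConcurrentUpdateError(Exception):
    """Raised when another writer changed the row since it was read."""


def update_job_status(conn: sqlite3.Connection, job_id: str,
                      new_status: str, expected_version: int) -> int:
    """Compare-and-set: update only if the row still has `expected_version`.

    Returns the new version number.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE jobs SET status = ?, version = version + 1 "
            "WHERE job_id = ? AND version = ?",
            (new_status, job_id, expected_version),
        )
        if cur.rowcount == 0:
            # Either the job does not exist or someone else bumped the version.
            raise ConcurrentUpdateError(f"job {job_id} was modified concurrently")
    return expected_version + 1
```

A caller that receives `ConcurrentUpdateError` re-reads the row and decides whether its update still applies, instead of blindly overwriting another worker's change.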

Artifacts & Metadata

Artifact Storage

Generated artifacts are stored in external systems, not in the Run State Store:

Git Repositories — Code, tests, and documentation in Azure DevOps or GitHub repositories
Azure DevOps — Pipelines, work items, and build artifacts
Blob Storage — Large artifacts (diagrams, models) in Azure Blob Storage
Container Registries — Docker images in Azure Container Registry

Artifact References

The Run State Store maintains references to artifacts:

Repository URLs — Links to generated Git repositories
Pipeline IDs — References to generated CI/CD pipelines
Work Item IDs — References to created Azure DevOps work items
Blob URLs — Links to stored artifacts in blob storage
Artifact Metadata — Size, type, creation timestamp, checksums
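An artifact reference only needs enough metadata to locate and verify the artifact. Here is a sketch of such a record with a SHA-256 checksum; `ArtifactRef`, `make_blob_ref`, `verify`, and the example URL are hypothetical, and the upload to blob storage itself is not shown.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArtifactRef:
    kind: str          # e.g. "repository", "pipeline", "workItem", "blob"
    uri: str           # URL or external ID; the artifact itself lives elsewhere
    size_bytes: int
    sha256: str        # hex digest of the content at creation time
    created_at: datetime


def make_blob_ref(uri: str, content: bytes) -> ArtifactRef:
    """Build a reference for content that has been uploaded to blob storage."""
    return ArtifactRef(
        kind="blob",
        uri=uri,
        size_bytes=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
        created_at=datetime.now(timezone.utc),
    )


def verify(ref: ArtifactRef, content: bytes) -> bool:
    """Check that downloaded content still matches the recorded checksum."""
    return hashlib.sha256(content).hexdigest() == ref.sha256
```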

Artifact Lifecycle

Artifacts follow a lifecycle managed by the Factory:

Generation — Artifacts are generated by workers
Storage — Artifacts are stored in external systems (Git, Azure DevOps, etc.)
Reference — Artifact references are stored in Run State Store
Indexing — Selected artifacts are indexed in Knowledge & Memory System
Retention — Artifacts are retained according to retention policies
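The lifecycle can be read as a small state machine. This sketch encodes the five stages as allowed forward transitions and treats indexing as optional, since only selected artifacts are indexed; the `Stage` enum and `advance` function are illustrative names, not part of the Factory.

```python
from enum import Enum


class Stage(Enum):
    GENERATED = "Generation"
    STORED = "Storage"
    REFERENCED = "Reference"
    INDEXED = "Indexing"
    RETAINED = "Retention"


# Allowed forward transitions. A referenced artifact may skip indexing
# and go straight to retention.
TRANSITIONS = {
    Stage.GENERATED: {Stage.STORED},
    Stage.STORED: {Stage.REFERENCED},
    Stage.REFERENCED: {Stage.INDEXED, Stage.RETAINED},
    Stage.INDEXED: {Stage.RETAINED},
    Stage.RETAINED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move an artifact to `target`, rejecting out-of-order transitions."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```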

AI Memory & Knowledge System Integration

Operational State vs. Long-Term Memory

The Factory maintains a clear separation:

| Aspect | Operational State (Run State Store) | Long-Term Memory (Knowledge System) |
|---|---|---|
| Purpose | Track active runs, enable execution | Learn patterns, enable reuse |
| Lifetime | Ephemeral (weeks/months) | Persistent (years) |
| Scope | Per-run, per-project | Cross-project, cross-tenant |
| Query Pattern | Structured queries (SQL) | Semantic search (vector) |
| Update Frequency | High (real-time) | Low (batch/indexing) |

How Factory Interacts with Knowledge System

During Execution

Pattern Retrieval — Agents query the Knowledge System for similar past solutions
Template Lookup — Agents retrieve templates and patterns from the Knowledge System
Context Enrichment — Retrieve relevant historical context for agents

After Execution

Run Summaries — Store run summaries, outcomes, and learnings
Pattern Extraction — Extract reusable patterns from generated artifacts
Failure Analysis — Store failure patterns and resolutions for future reference
Success Patterns — Index successful solutions for reuse
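After a run finishes, its outcome has to be turned into something the Knowledge System can embed and filter. A minimal sketch, assuming a summary document is a text body (the part that gets embedded) plus structured metadata for filtering; `RunSummaryDoc`, `summarize_run`, and the metadata keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunSummaryDoc:
    doc_id: str
    text: str                  # embedded for semantic search
    metadata: dict[str, Any]   # structured filters (tenant, template, outcome)


def summarize_run(run_id: str, tenant_id: str, template_recipe_id: str, status: str,
                  steps: list[tuple[str, str, Optional[str]]],
                  learnings: list[str]) -> RunSummaryDoc:
    """Build an indexable summary; `steps` holds (step_name, status, error_or_None)."""
    failed = [(name, err) for name, st, err in steps if st == "Failed"]
    lines = [f"Run {run_id} using template {template_recipe_id} finished with status {status}."]
    lines += [f"Step {name} failed: {err}" for name, err in failed]      # failure patterns
    lines += [f"Learning: {item}" for item in learnings]                  # reusable lessons
    return RunSummaryDoc(
        doc_id=f"run-summary:{run_id}",
        text="\n".join(lines),
        metadata={
            "tenantId": tenant_id,
            "templateRecipeId": template_recipe_id,
            "outcome": status,
            "failedSteps": [name for name, _ in failed],
        },
    )
```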

Vector Indexes

The Knowledge System uses vector indexes for semantic search:

Template Knowledge — Vector embeddings of templates, blueprints, and patterns
Past Runs — Vector embeddings of run summaries, decisions, and outcomes
Code Patterns — Vector embeddings of code snippets and architectural patterns
Domain Knowledge — Vector embeddings of domain-specific solutions

Example Query:

```text
Query: "multi-tenant user management with role-based access"
→ Vector search finds:
  - Past runs that generated similar solutions
  - Templates for multi-tenant patterns
  - Code patterns for RBAC implementation
```
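A query like this can be illustrated with brute-force cosine similarity over an in-memory list. This is only a stand-in for a real vector database, which would use an approximate nearest-neighbour index; the `IndexedItem` type, the item IDs, and the hand-written three-dimensional vectors are assumptions (real embeddings come from an embedding model and have many more dimensions).

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedItem:
    item_id: str
    kind: str               # "past-run", "template", "code-pattern", ...
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[IndexedItem], query_vec: list[float],
           top_k: int = 3) -> list[tuple[str, float]]:
    """Exhaustive nearest-neighbour search, best match first."""
    scored = [(item.item_id, cosine(item.embedding, query_vec)) for item in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```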

State & Memory Architecture

```mermaid
graph TD
    RunStore[(Run State DB)]
    Queue[(Job Queue)]
    Artifacts[(Repos, Pipelines, Docs)]
    Memory[Knowledge & Memory System<br/>Vector DB, search]

    Worker --> RunStore
    Worker --> Queue
    Worker --> Artifacts
    Orchestrator --> RunStore
    Orchestrator --> Queue

    RunStore --> Memory
    Artifacts --> Memory
    Worker --> Memory
    Orchestrator --> Memory
```


Data Flows:

Operational Flow — Workers update RunStore with execution state
Artifact Flow — Workers create artifacts in external systems, store references in RunStore
Indexing Flow — Selected artifacts and run summaries are indexed in Memory System
Query Flow — Agents query Memory System for patterns and context

State Retention and Cleanup

Run State Retention

Run state is retained for operational and audit purposes:

Active Runs — Retained for as long as the run is active
Completed Runs — Retained for configurable period (e.g., 90 days, 1 year)
Failed Runs — Retained longer for debugging and analysis (e.g., 1 year)
Archived Runs — Old runs can be archived to cold storage
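A periodic cleanup job could apply these rules to each run record. The windows below use the example values from the list (90 days for completed runs, one year for failed runs); treating Cancelled runs like completed ones, the `retention_action` helper, and archiving rather than deleting are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Example windows; in practice these are configurable per deployment.
RETENTION = {
    "Succeeded": timedelta(days=90),
    "Cancelled": timedelta(days=90),
    "Failed": timedelta(days=365),  # kept longer for debugging and analysis
}
ACTIVE = {"Requested", "Validated", "Queued", "Running"}


def retention_action(status: str, completed_at: Optional[datetime], now: datetime) -> str:
    """Return 'keep' or 'archive' for a single run record."""
    if status in ACTIVE or completed_at is None:
        return "keep"      # active runs are never cleaned up
    if now - completed_at > RETENTION[status]:
        return "archive"   # move to cold storage rather than delete
    return "keep"
```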

Memory System Retention

The Knowledge & Memory System retains data indefinitely:

Patterns — Retained permanently for pattern reuse
Run Summaries — Retained for historical context and learning
Artifact Indexes — Retained for semantic search and retrieval
Failure Patterns — Retained for failure analysis and prevention

Related Documentation

Knowledge and Memory System — Comprehensive guide to the Knowledge & Memory System
Knowledge Indices — Vector search and semantic retrieval
Knowledge Graph — Graph-based knowledge representation
Execution Engine — How runs and jobs use state during execution
Control Plane — How control plane manages state