Spec Kitty mission lifecycle: a domain modeling pass through Giiken
# Engineering Deterministic AI Workflows: A State-Machine Approach to Agent Missions
## Current Situation Analysis
The modern AI agent landscape is saturated with frameworks that promise end-to-end automation. In practice, most of these systems operate as stateless prompt engines. They ingest a request, generate a plan, execute code, and terminate. The fundamental flaw isn't in the code generation capability; it's in the absence of structural accountability. When a session ends, the conversational context evaporates. The next agent or engineer must reconstruct intent, decisions, and constraints by reverse-engineering the codebase. This creates a fragile development loop where output is prioritized over process, and reproducibility is treated as an afterthought.
This problem is frequently misunderstood because teams conflate context window size with persistent memory. A larger context window delays the forgetting curve, but it does not solve the architectural need for deterministic state transitions. Without explicit phase boundaries, audit trails, and hard gates, AI-driven development becomes a black box. Teams cannot audit why a specific architectural decision was made, cannot resume interrupted work without re-prompting, and cannot enforce quality standards across autonomous execution cycles.
The industry is beginning to recognize that sustainable AI integration requires engineering discipline, not just model capability. Evidence from production deployments shows that missions which enforce structured lifecycles are significantly easier to audit, support human-in-the-loop reviews, and allow multi-agent handoffs without context degradation. The shift is from conversational automation to deterministic pipeline execution, where every phase writes artifacts to disk, every transition is validated, and the process trail becomes the primary source of truth.
## WOW Moment: Key Findings
The critical differentiator between traditional AI agent workflows and structured mission pipelines is not the quality of the generated code, but the persistence of the decision trail. When comparing stateless prompt-driven execution against a disk-backed state machine, the operational advantages become concrete.
| Approach | Context Persistence | Phase Enforcement | Audit Granularity | Resumption Capability |
|---|---|---|---|---|
| Stateless Prompt Engine | Session-bound (evaporates on termination) | None (linear execution) | Low (final diff only) | Poor (requires full re-prompt) |
| State-Machine Mission Pipeline | Disk-backed (survives termination) | Hard gates between phases | High (per-phase artifacts + JSONL logs) | Excellent (resume from last validated state) |
This finding matters because it inverts the traditional AI development model. Instead of treating the agent as a code generator that leaves behind a commit, the mission directory becomes the persistent context. The next agent, reviewer, or human engineer does not need to reconstruct intent from scattered chat logs or commit messages. They open the mission root and find the contract (`spec.md`), the chosen approach (`plan.md`), the evidence trail (`research/`, `data-model.md`), and the immutable audit ledger (`.jsonl` files). This enables deterministic replay, compliance auditing, and parallel work package execution without losing structural coherence.
## Core Solution
Building a deterministic AI mission pipeline requires treating the agent lifecycle as a finite state machine with explicit phase validators, artifact registries, and immutable audit trails. The implementation below demonstrates a TypeScript-based orchestrator that mirrors production-grade mission architectures.
### Architecture Decisions & Rationale
- Disk as Source of Truth: State is persisted to the filesystem rather than held in memory. This guarantees crash recovery, enables external tooling to inspect progress, and decouples execution from the orchestrator process.
- Immutable Audit Logs: JSONL files are used for event logging because they support append-only writes, are trivially parseable, and maintain chronological integrity without database dependencies.
- Hard Phase Gates: Transitions are blocked until validation passes. This prevents partial execution, enforces quality standards, and ensures that downstream phases only consume verified artifacts.
- Work Package Decomposition: Plans are split into independent work packages (WPs) to enable parallel execution, isolated review, and granular rollback.
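The append-only logging decision can be illustrated in isolation. The following is a minimal sketch, not the orchestrator's actual code: the helper names `appendEvent` and `readEvents` and the temporary demo directory are assumptions made for illustration, while the event shape (`timestamp`, `event`, `payload`) follows the orchestrator in the Implementation section.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

interface AuditEvent {
  timestamp: number;
  event: string;
  payload: Record<string, unknown>;
}

// Append one event as a single JSON line; earlier lines are never rewritten.
function appendEvent(logPath: string, event: string, payload: Record<string, unknown>): void {
  const entry: AuditEvent = { timestamp: Date.now(), event, payload };
  appendFileSync(logPath, JSON.stringify(entry) + '\n');
}

// Read the log back in write order, skipping blank lines.
function readEvents(logPath: string): AuditEvent[] {
  return readFileSync(logPath, 'utf8')
    .split('\n')
    .filter(line => line.trim().length > 0)
    .map(line => JSON.parse(line) as AuditEvent);
}

// Demo against a throwaway directory (hypothetical location, not a real mission root).
const demoRoot = mkdtempSync(join(tmpdir(), 'mission-'));
const demoLog = join(demoRoot, 'mission-events.jsonl');
appendEvent(demoLog, 'MISSION_CREATED', { slug: 'demo' });
appendEvent(demoLog, 'PHASE_TRANSITION', { from: 'SPECIFY', to: 'PLAN' });
const loggedEvents = readEvents(demoLog);
console.log(loggedEvents.map(e => e.event).join(' -> '));
```

Because each event is one self-contained line, any external tool can tail or parse the file without coordinating with the orchestrator process.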
### Implementation
```typescript
import { existsSync, mkdirSync, writeFileSync } from 'fs';
import { join } from 'path';
// The ulid package exports `ulid`; `v7` is an export of the uuid package.
import { ulid } from 'ulid';

// Phase definitions with strict transition rules
type MissionPhase = 'SPECIFY' | 'PLAN' | 'TASKS' | 'IMPLEMENT' | 'REVIEW' | 'MERGE';

interface PhaseGate {
  phase: MissionPhase;
  requiredArtifacts: string[];
  validate(): boolean;
}

interface WorkPackage {
  id: string;
  title: string;
  status: 'PENDING' | 'IN_PROGRESS' | 'REVIEW' | 'APPROVED' | 'REJECTED';
  files: string[];
}

class MissionOrchestrator {
  private missionRoot: string;
  private currentPhase: MissionPhase = 'SPECIFY';
  private auditLog: string[] = [];

  constructor(slug: string) {
    const missionId = ulid();
    this.missionRoot = join('kitty-specs', `${slug}-${missionId}`);
    this.initializeDirectory();
    this.logEvent('MISSION_CREATED', { slug, missionId });
  }

  private initializeDirectory(): void {
    const dirs = ['checklists', 'research'];
    mkdirSync(this.missionRoot, { recursive: true });
    dirs.forEach(d => mkdirSync(join(this.missionRoot, d), { recursive: true }));
  }

  // Append one JSON line per event; the 'a' flag never truncates earlier entries.
  private logEvent(event: string, payload: Record<string, unknown>): void {
    const entry = JSON.stringify({ timestamp: Date.now(), event, payload });
    this.auditLog.push(entry);
    writeFileSync(join(this.missionRoot, 'mission-events.jsonl'), entry + '\n', { flag: 'a' });
  }

  // Hard gate: the current phase's artifacts must exist before moving on.
  private validatePhaseTransition(nextPhase: MissionPhase): boolean {
    const requiredArtifacts = this.getRequiredArtifacts(this.currentPhase);
    const gate: PhaseGate = {
      phase: this.currentPhase,
      requiredArtifacts,
      validate: () =>
        requiredArtifacts.every(art => existsSync(join(this.missionRoot, art)))
    };
    if (!gate.validate()) {
      throw new Error(`Phase gate failed: missing required artifacts for ${this.currentPhase}`);
    }
    this.currentPhase = nextPhase;
    this.logEvent('PHASE_TRANSITION', { from: gate.phase, to: nextPhase });
    return true;
  }

  private getRequiredArtifacts(phase: MissionPhase): string[] {
    const map: Record<MissionPhase, string[]> = {
      SPECIFY: ['spec.md', 'checklists/requirements.md'],
      PLAN: ['plan.md', 'research/evidence-log.csv'],
      TASKS: ['tasks.json'],
      IMPLEMENT: ['status.json'],
      REVIEW: ['review-report.md'],
      MERGE: []
    };
    return map[phase];
  }

  public executePhase(phase: MissionPhase): void {
    if (phase !== this.currentPhase) {
      throw new Error(`Invalid phase execution. Expected ${this.currentPhase}, got ${phase}`);
    }
    switch (phase) {
      case 'SPECIFY': this.runSpecifyPhase(); break;
      case 'PLAN': this.runPlanPhase(); break;
      case 'TASKS': this.runTasksPhase(); break;
      case 'IMPLEMENT': this.runImplementPhase(); break;
      case 'REVIEW': this.runReviewPhase(); break;
      case 'MERGE': this.runMergePhase(); break;
    }
  }

  private runSpecifyPhase(): void {
    const spec = `# Mission Specification\n\n## Scope\nDomain modeling for knowledge service.\n## Constraints\nMust align with existing migration patterns.\n## Acceptance Criteria\n- Architecture docs generated\n- Data model migration validated\n- Test suites passing`;
    writeFileSync(join(this.missionRoot, 'spec.md'), spec);
    writeFileSync(join(this.missionRoot, 'checklists/requirements.md'), '- [x] Scope defined\n- [x] Constraints documented');
    this.logEvent('ARTIFACT_WRITTEN', { file: 'spec.md' });
  }

  private runPlanPhase(): void {
    const plan = `# Implementation Plan\n\n## Approach\nDecompose domain model into entity boundaries.\n## Evidence\n- Schema analysis: research/source-register.csv\n- Migration strategy: research/evidence-log.csv`;
    writeFileSync(join(this.missionRoot, 'plan.md'), plan);
    writeFileSync(join(this.missionRoot, 'research/evidence-log.csv'), 'timestamp,source,finding\n1715000000,waaseyaa/docs,entity-boundary-mapping');
    this.logEvent('ARTIFACT_WRITTEN', { file: 'plan.md' });
  }

  private runTasksPhase(): void {
    const tasks: WorkPackage[] = [
      { id: 'WP01', title: 'Domain entity definitions', status: 'PENDING', files: ['src/entities/User.ts'] },
      { id: 'WP02', title: 'Migration alignment', status: 'PENDING', files: ['migrations/001_init.sql'] }
    ];
    writeFileSync(join(this.missionRoot, 'tasks.json'), JSON.stringify(tasks, null, 2));
    this.logEvent('ARTIFACT_WRITTEN', { file: 'tasks.json' });
  }

  private runImplementPhase(): void {
    const status = { phase: 'IMPLEMENT', wps: 2, approved: 0, rejected: 0 };
    writeFileSync(join(this.missionRoot, 'status.json'), JSON.stringify(status, null, 2));
    this.logEvent('ARTIFACT_WRITTEN', { file: 'status.json' });
  }

  private runReviewPhase(): void {
    const report = `# Review Report\n\n## WP01: Approved\n## WP02: Approved\n## Notes\nAll acceptance criteria met. Tests passing.`;
    writeFileSync(join(this.missionRoot, 'review-report.md'), report);
    this.logEvent('ARTIFACT_WRITTEN', { file: 'review-report.md' });
  }

  private runMergePhase(): void {
    // The artifact count is a hard-coded value, not computed from the directory.
    this.logEvent('MISSION_TERMINAL', { state: 'MERGED', artifacts: 13 });
    console.log(`Mission complete. Directory: ${this.missionRoot}`);
  }

  // Move to the next phase in lifecycle order, subject to the gate check.
  public advance(): void {
    const phases: MissionPhase[] = ['SPECIFY', 'PLAN', 'TASKS', 'IMPLEMENT', 'REVIEW', 'MERGE'];
    const currentIndex = phases.indexOf(this.currentPhase);
    if (currentIndex < phases.length - 1) {
      const nextPhase = phases[currentIndex + 1];
      this.validatePhaseTransition(nextPhase);
    } else {
      throw new Error('Mission already in terminal state');
    }
  }
}

// Usage
const mission = new MissionOrchestrator('giiken-domain-modeling');
mission.executePhase('SPECIFY');
mission.advance();
mission.executePhase('PLAN');
mission.advance();
mission.executePhase('TASKS');
mission.advance();
mission.executePhase('IMPLEMENT');
mission.advance();
mission.executePhase('REVIEW');
mission.advance();
mission.executePhase('MERGE');
```
### Why This Architecture Works
The orchestrator enforces deterministic progression by coupling phase execution with artifact validation. Each phase writes specific files that serve as inputs for the next phase. The `validatePhaseTransition` method acts as a circuit breaker, preventing execution from proceeding until all required artifacts exist. JSONL logging ensures that every state change, artifact write, and transition is recorded chronologically. This design eliminates ambiguity, supports external inspection, and guarantees that the mission directory remains a complete, self-documenting record of the entire lifecycle.
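The gate in `validatePhaseTransition` reports only that something is missing. One possible refinement, sketched here under stated assumptions, is to name the missing files so a resuming agent knows what to produce. The helper `missingArtifacts` and the injected `exists` predicate are hypothetical names introduced for illustration; the artifact map is copied from `getRequiredArtifacts`.

```typescript
type MissionPhase = 'SPECIFY' | 'PLAN' | 'TASKS' | 'IMPLEMENT' | 'REVIEW' | 'MERGE';

const REQUIRED_ARTIFACTS: Record<MissionPhase, string[]> = {
  SPECIFY: ['spec.md', 'checklists/requirements.md'],
  PLAN: ['plan.md', 'research/evidence-log.csv'],
  TASKS: ['tasks.json'],
  IMPLEMENT: ['status.json'],
  REVIEW: ['review-report.md'],
  MERGE: []
};

// Returns the artifacts the gate for `phase` still needs.
// `exists` is injected so the check can run against a real filesystem
// (for example, a wrapper around existsSync) or against a test double.
function missingArtifacts(
  phase: MissionPhase,
  exists: (relativePath: string) => boolean
): string[] {
  return REQUIRED_ARTIFACTS[phase].filter(artifact => !exists(artifact));
}

// Demo: only plan.md has been written so far.
const onDisk = new Set<string>(['plan.md']);
const gateGaps = missingArtifacts('PLAN', relativePath => onDisk.has(relativePath));
if (gateGaps.length > 0) {
  console.log(`Phase gate failed for PLAN, missing: ${gateGaps.join(', ')}`);
}
```

Note that this still checks existence only, as the original gate does; validating artifact content would be a separate step.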
## Pitfall Guide
### 1. Treating Specifications as Optional Documentation
**Explanation:** Teams often skip rigorous `spec.md` generation, assuming the agent will infer requirements from the prompt. This leads to scope drift and misaligned deliverables.
**Fix:** Enforce `spec.md` as a hard contract. Require explicit acceptance criteria, constraints, and boundary definitions before allowing phase progression.
### 2. Bypassing Review Gates for Speed
**Explanation:** Pressure to ship code leads to skipping the `REVIEW` phase or marking work packages as approved without validation. This corrupts the audit trail and introduces unverified changes.
**Fix:** Implement strict state validation in the orchestrator. Reject any transition to `MERGE` unless every work package is `APPROVED`; packages left in `PENDING`, `IN_PROGRESS`, `REVIEW`, or `REJECTED` must block the merge.
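A merge guard of this kind could look like the sketch below. It treats any status other than `APPROVED` as blocking. The function name `assertMergeable` is an assumption for illustration; the `WorkPackage` shape matches the interface used by the orchestrator.

```typescript
type WorkPackageStatus = 'PENDING' | 'IN_PROGRESS' | 'REVIEW' | 'APPROVED' | 'REJECTED';

interface WorkPackage {
  id: string;
  title: string;
  status: WorkPackageStatus;
  files: string[];
}

// Throws if any work package is not APPROVED, listing each blocker.
// An empty list passes; whether that should be allowed is a policy choice.
function assertMergeable(workPackages: WorkPackage[]): void {
  const blocking = workPackages.filter(wp => wp.status !== 'APPROVED');
  if (blocking.length > 0) {
    const detail = blocking.map(wp => `${wp.id} (${wp.status})`).join(', ');
    throw new Error(`Merge blocked by unapproved work packages: ${detail}`);
  }
}

// Demo: WP02 is still under review, so the guard refuses the merge.
const reviewQueue: WorkPackage[] = [
  { id: 'WP01', title: 'Domain entity definitions', status: 'APPROVED', files: [] },
  { id: 'WP02', title: 'Migration alignment', status: 'REVIEW', files: [] }
];
let mergeError = '';
try {
  assertMergeable(reviewQueue);
} catch (err) {
  mergeError = err instanceof Error ? err.message : String(err);
}
console.log(mergeError);
```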
### 3. Overloading Work Packages
**Explanation:** Decomposing plans into monolithic work packages makes parallel execution impossible and complicates review. Agents struggle to maintain context across large scopes.
**Fix:** Limit each WP to a single cohesive change set. Ensure WPs are independently testable and reviewable. Target 3-7 WPs per mission for optimal throughput.
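A rough automated check for this guidance is sketched below. It uses two proxies that are assumptions of this example, not rules stated by the lifecycle itself: a package count above a configurable maximum, and the same file being claimed by more than one work package (a hint that the packages are not independent). The names `WorkPackageOutline` and `findDecompositionIssues` are hypothetical.

```typescript
interface WorkPackageOutline {
  id: string;
  files: string[];
}

// Returns human-readable issues; an empty array means no issue was detected.
function findDecompositionIssues(
  outlines: WorkPackageOutline[],
  maxPerMission = 7
): string[] {
  const issues: string[] = [];
  if (outlines.length > maxPerMission) {
    issues.push(`${outlines.length} work packages exceeds the limit of ${maxPerMission}`);
  }
  // Track the first work package that claims each file.
  const ownerByFile = new Map<string, string>();
  for (const wp of outlines) {
    for (const file of wp.files) {
      const owner = ownerByFile.get(file);
      if (owner !== undefined && owner !== wp.id) {
        issues.push(`${file} is claimed by both ${owner} and ${wp.id}`);
      } else {
        ownerByFile.set(file, wp.id);
      }
    }
  }
  return issues;
}

// Demo: WP02 touches a file that WP01 already owns.
const decompositionIssues = findDecompositionIssues([
  { id: 'WP01', files: ['src/entities/User.ts'] },
  { id: 'WP02', files: ['migrations/001_init.sql', 'src/entities/User.ts'] }
]);
console.log(decompositionIssues.join('\n'));
```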
### 4. Ignoring the Evidence Trail
**Explanation:** Teams focus on code output and neglect `research/` directories, CSVs, and source registers. This breaks traceability and makes future audits impossible.
**Fix:** Mandate artifact generation per phase. Require `source-register.csv` for external references and `evidence-log.csv` for decision justification. Treat these as first-class deliverables.
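To keep the evidence log machine-readable, rows can be built through a small formatter rather than by string concatenation. The sketch below assumes the three-column header `timestamp,source,finding` that the orchestrator writes; the names `EvidenceEntry`, `csvField`, and `toEvidenceRow` are hypothetical, and the quoting follows common CSV conventions (RFC 4180 style).

```typescript
interface EvidenceEntry {
  timestamp: number;
  source: string;
  finding: string;
}

// Quote a field only when it contains a comma, a double quote, or a line break;
// embedded double quotes are doubled.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Builds one CSV row; rejects entries that would leave the trail unusable.
function toEvidenceRow(entry: EvidenceEntry): string {
  if (entry.source.trim() === '' || entry.finding.trim() === '') {
    throw new Error('Evidence rows need both a source and a finding');
  }
  return [String(entry.timestamp), csvField(entry.source), csvField(entry.finding)].join(',');
}

const sampleEvidenceRow = toEvidenceRow({
  timestamp: 1715000000,
  source: 'waaseyaa/docs',
  finding: 'entity-boundary-mapping'
});
console.log(sampleEvidenceRow);
```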
### 5. Assuming JSONL Logs Are Debugging Artifacts
**Explanation:** Audit logs are often treated as transient debugging output rather than immutable records. This leads to log rotation, truncation, or manual editing.
**Fix:** Store JSONL files with append-only permissions. Implement log verification checksums. Treat the audit trail as a compliance ledger, not a debugging tool.
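One way to implement verification checksums is a hash chain, where each line stores the hash of the line before it. This is a sketch of that general technique, not a format the lifecycle prescribes: the `prevHash` and `hash` fields, the all-zero genesis value, and the helper names are assumptions made for this example.

```typescript
import { createHash } from 'crypto';

interface ChainedEntry {
  event: string;
  payload: Record<string, unknown>;
  prevHash: string; // hash of the previous line, or GENESIS_HASH for the first line
  hash: string;     // SHA-256 over prevHash plus this entry's event and payload
}

// Hypothetical starting value for the chain.
const GENESIS_HASH = '0'.repeat(64);

function entryHash(prevHash: string, event: string, payload: Record<string, unknown>): string {
  return createHash('sha256')
    .update(prevHash)
    .update(JSON.stringify({ event, payload }))
    .digest('hex');
}

// Returns the JSONL line to append after `existingLines`.
function nextLine(existingLines: string[], event: string, payload: Record<string, unknown>): string {
  const lastLine = existingLines[existingLines.length - 1];
  const prevHash = lastLine === undefined ? GENESIS_HASH : (JSON.parse(lastLine) as ChainedEntry).hash;
  const entry: ChainedEntry = { event, payload, prevHash, hash: entryHash(prevHash, event, payload) };
  return JSON.stringify(entry);
}

// Recomputes every hash. An edited or reordered line, or a line removed from the
// start or middle, breaks the chain. Truncation at the tail is not detected here.
function verifyChain(lines: string[]): boolean {
  let expectedPrev = GENESIS_HASH;
  for (const line of lines) {
    const entry = JSON.parse(line) as ChainedEntry;
    if (entry.prevHash !== expectedPrev) return false;
    if (entry.hash !== entryHash(entry.prevHash, entry.event, entry.payload)) return false;
    expectedPrev = entry.hash;
  }
  return true;
}

const ledger: string[] = [];
ledger.push(nextLine(ledger, 'MISSION_CREATED', { slug: 'demo' }));
ledger.push(nextLine(ledger, 'PHASE_TRANSITION', { from: 'SPECIFY', to: 'PLAN' }));
console.log(verifyChain(ledger));
```

Detecting tail truncation would need an external anchor, such as recording the latest hash somewhere outside the log file.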
### 6. Mixing Agent and Human Context
**Explanation:** Storing agent runtime state and human review notes in the same files creates merge conflicts and context pollution.
**Fix:** Separate `meta.json` (agent runtime state, ULID, timestamps) from `status.json` (pipeline state, WP statuses). Keep human annotations in dedicated `review-report.md` or `checklists/` files.
### 7. No Rollback Strategy on Gate Failure
**Explanation:** When a phase gate fails, teams often delete the mission directory and restart. This loses partial progress and audit history.
**Fix:** Implement checkpointing via `status.events.jsonl`. Store the last validated phase and allow the orchestrator to resume from that state without re-executing completed phases.
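Resumption can be derived by replaying the event log instead of trusting in-memory state. The sketch below assumes the log contains `PHASE_TRANSITION` events with a `payload.to` field, as written by the orchestrator's `logEvent`; the function name `resumePhase` is hypothetical, and malformed or unrelated lines are simply ignored for phase purposes.

```typescript
type MissionPhase = 'SPECIFY' | 'PLAN' | 'TASKS' | 'IMPLEMENT' | 'REVIEW' | 'MERGE';

const PHASES: readonly MissionPhase[] = ['SPECIFY', 'PLAN', 'TASKS', 'IMPLEMENT', 'REVIEW', 'MERGE'];

function isMissionPhase(value: unknown): value is MissionPhase {
  return typeof value === 'string' && PHASES.some(phase => phase === value);
}

// Replays JSONL text and returns the phase reached by the last recorded transition.
// A log with no transitions resumes at SPECIFY.
function resumePhase(jsonl: string): MissionPhase {
  let current: MissionPhase = 'SPECIFY';
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    const entry = JSON.parse(line) as { event?: unknown; payload?: { to?: unknown } | null };
    const target = entry.payload?.to;
    if (entry.event === 'PHASE_TRANSITION' && isMissionPhase(target)) {
      current = target;
    }
  }
  return current;
}

// Demo: the mission stopped after entering PLAN and writing plan.md.
const interruptedLog = [
  JSON.stringify({ timestamp: 1, event: 'MISSION_CREATED', payload: { slug: 'demo' } }),
  JSON.stringify({ timestamp: 2, event: 'PHASE_TRANSITION', payload: { from: 'SPECIFY', to: 'PLAN' } }),
  JSON.stringify({ timestamp: 3, event: 'ARTIFACT_WRITTEN', payload: { file: 'plan.md' } })
].join('\n');
console.log(resumePhase(interruptedLog));
```

Because a transition is only logged after its gate passes, the last `PHASE_TRANSITION` entry marks the last validated state.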
## Production Bundle
### Action Checklist
- [ ] Initialize mission directory with ULID-suffixed slug and required subdirectories
- [ ] Generate `spec.md` with explicit acceptance criteria and constraints
- [ ] Populate `checklists/requirements.md` and mark validation items
- [ ] Create `plan.md` referencing `research/evidence-log.csv` and `source-register.csv`
- [ ] Decompose plan into independent work packages in `tasks.json`
- [ ] Execute implementation phases with strict artifact validation per transition
- [ ] Run review gates and generate `review-report.md` before merge
- [ ] Verify `mission-events.jsonl` contains complete chronological audit trail
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|----------|---------------------|-----|-------------|
| Small, well-defined feature | Stateless prompt + direct commit | Low overhead, fast iteration | Minimal compute cost |
| Multi-agent handoff required | State-machine mission pipeline | Ensures context persistence and deterministic resumption | Moderate storage cost, higher initial setup |
| Compliance/audit required | JSONL-backed mission with hard gates | Immutable trail, phase validation, artifact traceability | Higher storage, strict process overhead |
| Experimental/prototype | Conversational agent + manual docs | Flexibility, rapid iteration | Low process cost, high context loss risk |
| Production domain modeling | Spec Kitty-style lifecycle | Enforces structure, evidence trail, WP decomposition | Balanced cost, high reliability |
### Configuration Template
```json
{
  "mission": {
    "slug": "project-domain-modeling",
    "version": "1.0.0",
    "phases": ["SPECIFY", "PLAN", "TASKS", "IMPLEMENT", "REVIEW", "MERGE"],
    "gate_policy": "HARD",
    "audit_format": "JSONL",
    "artifacts": {
      "spec": "spec.md",
      "plan": "plan.md",
      "tasks": "tasks.json",
      "status": "status.json",
      "review": "review-report.md",
      "checklist": "checklists/requirements.md",
      "evidence": "research/evidence-log.csv",
      "sources": "research/source-register.csv"
    },
    "work_packages": {
      "max_per_mission": 7,
      "independent_review": true,
      "parallel_execution": true
    }
  }
}
```
### Quick Start Guide
- Initialize the pipeline: Create a new mission directory using the ULID-slug naming convention. Run the orchestrator constructor with your project slug to generate the base structure.
- Define the contract: Populate `spec.md` with scope, constraints, and acceptance criteria. Complete `checklists/requirements.md` to mark validation items.
- Execute phase transitions: Run `executePhase()` for each lifecycle stage. Call `advance()` to validate gates and move to the next phase. The orchestrator will block progression if required artifacts are missing.
- Monitor audit trail: Inspect `mission-events.jsonl` for chronological state changes. Verify artifact generation matches the configuration template.
- Terminate and merge: Once all work packages reach `APPROVED`, execute the `MERGE` phase. The pipeline will log the terminal state and finalize the mission directory for archival or handoff.
