ess criteria.
```typescript
interface TaskContract {
  id: string;
  objective: string;
  constraints: string[];
  acceptanceCriteria: string[];
  dependencies: string[];
  targetModel: 'chatgpt' | 'claude' | 'gemini';
}

interface SubtaskResult {
  taskId: string;
  output: string;
  validationStatus: 'pass' | 'fail' | 'ambiguous';
  flaggedAmbiguities: string[];
}
```
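The Quick Start below recommends 3-5 hard constraints and 2-3 measurable acceptance criteria per contract. The sketch below is a pre-flight check that enforces those counts before a contract is routed. The `checkContract` helper and the sample contract values are hypothetical, and the bounds are easy to adjust.

```typescript
// Mirrors the TaskContract interface above so this sketch stands alone.
interface TaskContract {
  id: string;
  objective: string;
  constraints: string[];
  acceptanceCriteria: string[];
  dependencies: string[];
  targetModel: 'chatgpt' | 'claude' | 'gemini';
}

// Hypothetical pre-flight check: returns a list of problems (empty means OK).
function checkContract(c: TaskContract): string[] {
  const problems: string[] = [];
  if (c.objective.trim() === '') problems.push('objective is empty');
  if (c.constraints.length < 3 || c.constraints.length > 5) {
    problems.push(`expected 3-5 constraints, got ${c.constraints.length}`);
  }
  if (c.acceptanceCriteria.length < 2 || c.acceptanceCriteria.length > 3) {
    problems.push(`expected 2-3 acceptance criteria, got ${c.acceptanceCriteria.length}`);
  }
  // A task must not depend on itself.
  if (c.dependencies.includes(c.id)) problems.push('task lists itself as a dependency');
  return problems;
}

const example: TaskContract = {
  id: 'AUTH-12',
  objective: 'Add rate limiting to the login endpoint',
  constraints: ['TypeScript only', 'No new dependencies', 'Keep the existing route signature'],
  acceptanceCriteria: ['returns 429 after 5 failed attempts', 'unit tests cover the lockout window'],
  dependencies: ['AUTH-07'],
  targetModel: 'claude',
};

console.log(checkContract(example)); // []
```

Returning a list of problems instead of throwing on the first one lets the author fix every gap in a single pass.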
2. Implement Capability-Aware Routing
Different models are stronger at different kinds of work. Routing should follow fixed, explicit rules, so the same task always goes to the same model, rather than being decided ad hoc per request.
```typescript
class ModelRouter {
  // Rules are checked in order; the first match wins.
  route(task: TaskContract): TaskContract['targetModel'] {
    // Normalize case so 'Refactor' and 'refactor' route the same way
    const objective = task.objective.toLowerCase();
    const constraints = task.constraints.map(c => c.toLowerCase());

    const hasDeepContext = constraints.length > 4 || objective.includes('refactor');
    const requiresToolUse = constraints.some(c => c.includes('web') || c.includes('database'));
    const isIdeation = objective.includes('design') || objective.includes('strategy');

    if (hasDeepContext) return 'claude';
    if (requiresToolUse) return 'gemini';
    if (isIdeation) return 'chatgpt';
    return 'claude'; // Default for implementation-heavy tasks
  }
}
```
Rationale: Claude's long context window and strength on code make it well suited to multi-file refactoring and deep reasoning. Gemini's integrated tool use and search suit data aggregation and external API validation. ChatGPT's broad training distribution handles architectural ideation and scaffolding efficiently. Fixing the routing rules in code prevents capability mismatch, a common source of hallucination and scope creep.
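The same first-match-wins policy can also be expressed as an ordered rule table. This keeps the routing policy in data and makes precedence visible at a glance. The sketch below uses keyword lists taken from the router above. The `RoutingRule` type, the rule labels, and the `routeByRules` name are hypothetical.

```typescript
type Model = 'chatgpt' | 'claude' | 'gemini';

interface RoutableTask {
  objective: string;
  constraints: string[];
}

// One rule per routing decision; rules are evaluated top to bottom.
interface RoutingRule {
  label: string;
  model: Model;
  matches: (task: RoutableTask) => boolean;
}

// Case-insensitive "contains any of these words" check.
const has = (text: string, words: string[]): boolean =>
  words.some(w => text.toLowerCase().includes(w));

// Order encodes precedence: deep context beats tool use beats ideation.
const RULES: RoutingRule[] = [
  {
    label: 'deep-context',
    model: 'claude',
    matches: t => t.constraints.length > 4 || has(t.objective, ['refactor']),
  },
  {
    label: 'tool-use',
    model: 'gemini',
    matches: t => t.constraints.some(c => has(c, ['web', 'database'])),
  },
  {
    label: 'ideation',
    model: 'chatgpt',
    matches: t => has(t.objective, ['design', 'strategy']),
  },
];

function routeByRules(task: RoutableTask, fallback: Model = 'claude'): { model: Model; rule: string } {
  const hit = RULES.find(r => r.matches(task));
  return hit ? { model: hit.model, rule: hit.label } : { model: fallback, rule: 'default' };
}

console.log(routeByRules({ objective: 'Design a caching strategy', constraints: [] }));
// { model: 'chatgpt', rule: 'ideation' }
```

Returning the matched rule's label alongside the model makes every routing decision auditable in logs.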
3. Build a Validation Pipeline
Output must be verified against the original contract before proceeding.
```typescript
class ValidationGate {
  // Naive lexical check: each acceptance criterion must appear verbatim
  // (case-insensitive) in the output. Records the outcome on `result`.
  async verify(result: SubtaskResult, contract: TaskContract): Promise<boolean> {
    const missingCriteria = contract.acceptanceCriteria.filter(
      criterion => !result.output.toLowerCase().includes(criterion.toLowerCase())
    );
    if (missingCriteria.length > 0) {
      result.validationStatus = 'fail';
      result.flaggedAmbiguities.push(`Missing acceptance criteria: ${missingCriteria.join(', ')}`);
      return false;
    }
    result.validationStatus = 'pass';
    return true;
  }
}
```
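The `ValidationGate` above passes only when every acceptance criterion appears verbatim in the output. That makes it easy to satisfy by echoing the criteria back, and easy to fail with a correct paraphrase. The sketch below is a stricter alternative that pairs each criterion with a machine-checkable predicate, here a regular expression over the output. The `CheckableCriterion` type and the two sample checks are assumptions, not part of the contract defined earlier.

```typescript
// Hypothetical: each criterion carries a readable description and a check.
interface CheckableCriterion {
  description: string;
  check: (output: string) => boolean;
}

interface GateReport {
  passed: boolean;
  missing: string[];
}

// Runs every check and reports which criteria were not satisfied.
function verifyOutput(output: string, criteria: CheckableCriterion[]): GateReport {
  const missing = criteria.filter(c => !c.check(output)).map(c => c.description);
  return { passed: missing.length === 0, missing };
}

const criteria: CheckableCriterion[] = [
  {
    description: 'includes a fenced TypeScript code block',
    check: out => /```(ts|typescript)\n[\s\S]*?```/.test(out),
  },
  {
    description: 'includes at least one test skeleton',
    check: out => /\b(describe|it|test)\(/.test(out),
  },
];

const sample = '```ts\nexport const add = (a: number, b: number) => a + b;\n```';
console.log(verifyOutput(sample, criteria).missing);
// [ 'includes at least one test skeleton' ]
```

Regular expressions are only one option. The same `check` slot can hold a call to a compiler, a linter, or a test runner when a stronger guarantee is needed.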
4. Orchestrate the ReAct Loop
The execution engine chains routing, prompt construction, generation, and validation into a fixed, repeatable cycle. When validation fails, it retries with a corrective prompt.
```typescript
class WorkflowEngine {
  private router = new ModelRouter();
  private gate = new ValidationGate();
  private readonly maxRetries = 2; // Bounded so a persistently failing task cannot loop forever

  async execute(task: TaskContract): Promise<SubtaskResult> {
    const model = this.router.route(task);
    const prompt = this.buildPrompt(task, model);
    const rawOutput = await this.invokeModel(model, prompt);
    const result: SubtaskResult = {
      taskId: task.id,
      output: rawOutput,
      validationStatus: 'ambiguous',
      flaggedAmbiguities: []
    };

    let isValid = await this.gate.verify(result, task);
    for (let attempt = 0; !isValid && attempt < this.maxRetries; attempt++) {
      // Replan against the latest gaps, then re-validate the corrected output
      const gaps = result.flaggedAmbiguities;
      result.flaggedAmbiguities = [];
      result.output = await this.replanAndRetry(task, gaps);
      isValid = await this.gate.verify(result, task);
    }
    // validationStatus reflects the final check, so an unresolved failure stays visible
    return result;
  }

  private buildPrompt(task: TaskContract, model: string): string {
    return [
      `SYSTEM: You are a senior ${model === 'claude' ? 'backend' : 'software'} engineer.`,
      `OBJECTIVE: ${task.objective}`,
      `CONSTRAINTS: ${task.constraints.join(' | ')}`,
      `ACCEPTANCE: ${task.acceptanceCriteria.join(' | ')}`,
      'INSTRUCTION: Decompose into verifiable steps. Flag ambiguities before implementation.',
      'OUTPUT FORMAT: Markdown with explicit code blocks and test skeletons.'
    ].join('\n');
  }

  private async invokeModel(model: string, prompt: string): Promise<string> {
    // Abstracted API call to the respective provider.
    // Implement rate limiting, retry logic, and temperature control (0.2-0.4) here.
    return ''; // Placeholder
  }

  private async replanAndRetry(task: TaskContract, ambiguities: string[]): Promise<string> {
    // Generate a corrective prompt focused on the flagged gaps.
    return ''; // Placeholder
  }
}
```
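The retry policy can also be factored into a standalone loop with an explicit attempt budget and a re-check after every attempt. This makes the policy testable without any provider. The sketch below uses `generate` and `findMissing` as hypothetical stand-ins for the model call and the validation gate. The corrective-prompt wording is illustrative only.

```typescript
interface Attempt {
  output: string;
  missing: string[];
}

interface LoopResult {
  status: 'pass' | 'fail';
  attempts: Attempt[];
}

// Hypothetical signatures: `generate` stands in for a provider call, and
// `findMissing` returns the acceptance criteria the output does not satisfy.
type Generate = (prompt: string) => Promise<string>;
type FindMissing = (output: string) => string[];

async function runWithRetries(
  basePrompt: string,
  generate: Generate,
  findMissing: FindMissing,
  maxAttempts = 3,
): Promise<LoopResult> {
  const attempts: Attempt[] = [];
  let prompt = basePrompt;
  for (let i = 0; i < maxAttempts; i++) {
    const output = await generate(prompt);
    const missing = findMissing(output);
    attempts.push({ output, missing });
    if (missing.length === 0) return { status: 'pass', attempts };
    // Corrective prompt: restate the task and list only the unmet criteria.
    prompt = `${basePrompt}\nPREVIOUS ATTEMPT MISSED: ${missing.join(' | ')}\nAddress only these gaps.`;
  }
  // Budget exhausted: report failure instead of returning unchecked output.
  return { status: 'fail', attempts };
}

// Stub model: succeeds only once the corrective prompt names the gap.
const stubModel: Generate = async p => (p.includes('MISSED') ? 'code + tests' : 'code');
const needsTests: FindMissing = out => (out.includes('tests') ? [] : ['tests']);

runWithRetries('Implement add()', stubModel, needsTests).then(r => {
  console.log(r.status, r.attempts.length); // pass 2
});
```

Keeping every attempt in the result gives reviewers the full correction history, not just the final output.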
Architecture Decisions:
- Stateless Prompt Sessions: Each execution starts from a fresh system prompt with no carried-over history. This limits context drift and keeps instructions from earlier tasks from contaminating new ones.
- Low Temperature: A temperature between 0.2 and 0.4 reduces sampling variance while preserving reasoning ability. Outputs are still not fully deterministic at these settings.
- Contract-First Validation: Acceptance criteria are checked programmatically before human review, so incomplete outputs are filtered out automatically.
- Explicit Ambiguity Flagging: The prompt instructs the model to surface unknowns rather than invent solutions, shifting risk detection upstream.
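The first two decisions can be enforced where requests are built. Every call gets a freshly constructed message list with no prior turns, and the temperature is clamped into the 0.2-0.4 band. The sketch below uses `ChatMessage` and `ModelRequest` shapes that are hypothetical. They are loosely modeled on chat-style APIs and do not follow any specific provider's SDK.

```typescript
interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

interface ModelRequest {
  model: string;
  temperature: number;
  messages: ChatMessage[];
}

const TEMP_MIN = 0.2;
const TEMP_MAX = 0.4;

// Builds a request from scratch on every call: no earlier turns are reused,
// so instructions from previous tasks cannot leak into this one.
function buildStatelessRequest(
  model: string,
  systemPrompt: string,
  taskBrief: string,
  requestedTemp = 0.3,
): ModelRequest {
  // Clamp into the configured band instead of trusting the caller.
  const temperature = Math.min(TEMP_MAX, Math.max(TEMP_MIN, requestedTemp));
  return {
    model,
    temperature,
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: taskBrief },
    ],
  };
}

const req = buildStatelessRequest('claude', 'You are a senior backend engineer.', 'OBJECTIVE: add rate limiting', 0.9);
console.log(req.temperature, req.messages.length); // 0.4 2
```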
Pitfall Guide
| Pitfall | Explanation | Fix |
|---|---|---|
| Context Window Saturation | Feeding entire codebases or long conversation histories degrades reasoning quality and increases hallucination rates. | Chunk inputs by module. Use stateless briefs per task. Reset sessions after 3-4 turns. |
| Implicit Requirement Assumption | Models fill gaps with plausible but incorrect assumptions when constraints are vague. | Enforce explicit constraint arrays in contracts. Require the model to list assumptions before coding. |
| Model Capability Mismatch | Using a general-purpose model for deep refactoring or tool-dependent tasks yields shallow or broken outputs. | Route deterministically based on task characteristics (context depth, tool need, ideation vs implementation). |
| Validation Bypass | Skipping automated checks against acceptance criteria lets incomplete code reach review. | Implement a validation gate that verifies output against contract criteria before human inspection. |
| Security Afterthought | Treating security as a post-generation checklist misses injection vectors and auth flaws baked into initial design. | Embed OWASP-aligned constraints directly in the prompt contract. Require explicit input/output threat modeling. |
| Multi-Turn Drift | Conversational threads accumulate contradictory instructions, causing output inconsistency. | Use per-task briefs. Avoid appending to old threads. Maintain a decision journal for architectural choices. |
| Over-Automation of Critical Paths | Fully automating security-critical or financial logic removes necessary human oversight. | Reserve human-in-the-loop gates for auth, data persistence, and compliance modules. Use AI for scaffolding and testing only. |
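For the context-saturation pitfall, "chunk inputs by module" can be as simple as grouping file paths by their top-level directory under the source root. Each group then goes into its own task brief. The sketch below assumes a `src/<module>/...` layout, which is a hypothetical convention. The `chunkByModule` name is also illustrative.

```typescript
// Groups file paths into per-module chunks, assuming a src/<module>/... layout.
// Files directly under src/ (or outside it) land in a shared 'root' chunk.
function chunkByModule(paths: string[], srcRoot = 'src'): Record<string, string[]> {
  const chunks: Record<string, string[]> = {};
  for (const p of paths) {
    const parts = p.split('/');
    // src/<module>/<file> -> <module>; anything shallower goes to 'root'.
    const moduleName = parts[0] === srcRoot && parts.length > 2 ? parts[1] : 'root';
    const bucket = chunks[moduleName] ?? [];
    bucket.push(p);
    chunks[moduleName] = bucket;
  }
  return chunks;
}

const files = [
  'src/auth/login.ts',
  'src/auth/session.ts',
  'src/billing/invoice.ts',
  'src/index.ts',
];
const byModule = chunkByModule(files);
for (const mod of Object.keys(byModule)) {
  console.log(mod, byModule[mod].length);
}
// auth 2
// billing 1
// root 1
```

Real codebases may need a dependency-aware grouping instead. For example, you might keep a module together with the interfaces it imports, but the directory split is a cheap starting point.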
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Rapid Prototyping | Ad-hoc Prompting + ChatGPT | Speed prioritized over fidelity; acceptable for throwaway code | Low API cost, high revision risk |
| Production Feature Delivery | Structured Decomposition + Claude | Balanced fidelity and context handling; reduces review overhead | Moderate API cost, 40% faster delivery |
| Security-Critical Module | Orchestrated ReAct + Validation Gate + Human Gate | Enforces OWASP constraints, catches injection vectors, requires oversight | Higher API cost, near-zero defect leakage |
| Legacy Refactoring | Deep Context Routing + Claude + Test Skeletons | Handles multi-file dependencies, preserves behavior, generates regression tests | High initial cost, long-term maintenance savings |
| Data Aggregation/Research | Tool-Use Routing + Gemini | Native web/database integration reduces manual scraping overhead | Moderate cost, high accuracy for external data |
Configuration Template
```typescript
// workflow.config.ts
export const AI_WORKFLOW_CONFIG = {
  routing: {
    ideation: 'chatgpt',
    implementation: 'claude',
    research: 'gemini',
    refactoring: 'claude'
  },
  generation: {
    temperature: 0.3,
    maxTokens: 4096,
    // '```' is deliberately not a stop sequence: the prompt requests fenced
    // code blocks, so stopping on it would truncate output at the first fence.
    stopSequences: ['## Acceptance']
  },
  validation: {
    autoCheckCriteria: true,
    requireAmbiguityFlag: true,
    securityConstraints: [
      'OWASP Top 10 compliance',
      'Explicit input sanitization',
      'Rate limiting considerations',
      'Auth failure handling'
    ]
  },
  session: {
    maxTurns: 3,
    resetOnValidationFail: true,
    contextChunkSize: 'module'
  }
};
```
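The pitfall guide recommends embedding security constraints directly in the prompt contract. The sketch below shows how the `securityConstraints` list from the config could be merged into a contract before routing. It uses a standalone copy of that list, and the `withSecurityConstraints` helper and `ContractLike` type are hypothetical.

```typescript
interface ContractLike {
  id: string;
  constraints: string[];
}

// Copy of validation.securityConstraints from the config above.
const SECURITY_CONSTRAINTS: string[] = [
  'OWASP Top 10 compliance',
  'Explicit input sanitization',
  'Rate limiting considerations',
  'Auth failure handling',
];

// Returns a new contract with the security constraints appended.
// The input is left untouched, and duplicates are skipped.
function withSecurityConstraints<T extends ContractLike>(contract: T): T {
  const merged = [...contract.constraints];
  for (const c of SECURITY_CONSTRAINTS) {
    if (!merged.includes(c)) merged.push(c);
  }
  return { ...contract, constraints: merged };
}

const hardened = withSecurityConstraints({ id: 'AUTH-12', constraints: ['Auth failure handling'] });
console.log(hardened.constraints.length); // 4
```

Merging at contract-creation time means the security constraints reach both the prompt and any validation that reads `constraints`, rather than being checked only after generation.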
Quick Start Guide
- Initialize the Contract: Define your objective, list 3-5 hard constraints, and write 2-3 measurable acceptance criteria. Store them in a `TaskContract` object.
- Route & Generate: Pass the contract to the `WorkflowEngine`. The router selects a model, and the engine sends a stateless prompt with explicit formatting instructions.
- Validate Automatically: The validation gate checks output against acceptance criteria. If criteria are missing, the engine triggers a targeted replan prompt focusing only on the gaps.
- Human Review Gate: Inspect the validated output. Verify security constraints, edge-case handling, and architectural alignment. Log decisions in your journal.
- Iterate or Ship: If validation passes and human review approves, merge. If not, refine constraints and rerun. Keep sessions stateless to maintain consistency.
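The last two steps reduce to a small decision. Merge only when automated validation and human review both pass; otherwise refine and rerun, and log the outcome either way. In the sketch below, the `JournalEntry` shape for the decision journal and the `decide` function are hypothetical.

```typescript
type Verdict = 'merge' | 'refine';

interface JournalEntry {
  taskId: string;
  verdict: Verdict;
  reason: string;
  at: string; // ISO-8601 timestamp
}

// Merging requires both gates; anything else sends the task back for refinement.
// Every decision is appended to the journal, including rejections.
function decide(
  taskId: string,
  validationPassed: boolean,
  humanApproved: boolean,
  journal: JournalEntry[],
  now: Date = new Date(),
): Verdict {
  const verdict: Verdict = validationPassed && humanApproved ? 'merge' : 'refine';
  const reason = !validationPassed
    ? 'automated validation failed'
    : !humanApproved
      ? 'human review rejected'
      : 'validated and approved';
  journal.push({ taskId, verdict, reason, at: now.toISOString() });
  return verdict;
}

const journal: JournalEntry[] = [];
console.log(decide('AUTH-12', true, false, journal)); // refine
console.log(journal[0].reason); // human review rejected
```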
By treating AI assistance as an engineered workflow rather than a conversational tool, teams can reduce drift, enforce quality contracts, and scale delivery while keeping reliability in check. The model doesn't replace engineering discipline; it amplifies it when properly orchestrated.