The File Modification Boundary We Found After 12 ForgeFlow Projects
Beyond the Modification Deadlock: Architecting AI Agents for Clean-Sheet Generation
Current Situation Analysis
Autonomous coding agents have reached a maturity threshold where greenfield module generation is highly reliable. Teams deploying local or cloud-based LLMs in closed-loop TDD environments consistently observe first-pass success when the agent is asked to create a file from scratch. The industry pain point emerges when the workflow demands incremental modifications to existing codebases. Despite advances in prompt engineering, retry logic, and diff-generation tooling, agents frequently stall when tasked with patching established files.
This failure mode is routinely misunderstood as a prompt quality issue or a model capability gap. Engineering teams respond by tightening constraints, adding chain-of-thought instructions, or increasing retry budgets. These approaches treat the symptom rather than the structural cause. The underlying issue is statistical: when an LLM receives a large, syntactically complete file as context, the token distribution heavily favors reproducing the existing code. The smaller the required delta, the stronger the context window acts as a statistical attractor, pulling the generation toward verbatim reproduction rather than targeted mutation.
Empirical telemetry from 12 autonomous execution projects and 81 development sessions confirms this boundary. Across two distinct backend architectures (raw completion via Ollama and diff-based patching via Aider), tasks targeting new files achieved a 100% first-pass success rate. Tasks requiring modifications to existing files consistently triggered identical-output deadlocks, regardless of retry count or prompt refinement. The dataset was constrained to the Qwen3-Coder-Next model family (45GB Q4_K_M quantization) running on Apple Silicon M5 Max hardware, but the pattern aligns with broader observations about LLM generation dynamics in structured loops. The industry has optimized for creation; it has not yet engineered around modification constraints.
WOW Moment: Key Findings
The critical insight emerges when comparing execution outcomes across task types. The data reveals a hard boundary between generation and mutation within autonomous loops.
| Approach | First-Pass Success Rate | Average Execution Cycles | Failure Mode |
|---|---|---|---|
| New File Generation | 100% | 1.0 | None observed |
| Existing File Modification | 0% | DEADLOCK (3 identical cycles) | Identical GREEN / Diff Timeout |
This finding matters because it forces a fundamental shift in how autonomous workflows are architected. Instead of treating file modification as a standard task type, it must be classified as a high-risk operation that requires deterministic handling. In this dataset, retry loops never recovered once generation collapsed into reproduction: each retry received the same context and returned the same output. Recognizing this boundary enables teams to restructure PRDs, decouple dependency graphs, and route autonomous tasks exclusively toward clean-sheet generation. The result is a more predictable execution pipeline where statistical uncertainty is isolated to file creation, while existing codebase mutations are handled through deterministic setup scripts or architectural decoupling.
Core Solution
The solution replaces iterative patching with a modular generation pattern. Infrastructure and existing-file mutations are resolved deterministically during environment setup. Autonomous agent tasks are restricted to new file creation. This separation eliminates the context window attractor problem while preserving TDD enforcement.
Step-by-Step Implementation
- Decouple Infrastructure from Feature Logic: Separate boilerplate, routing registration, and model definitions from feature-specific implementations. Infrastructure is generated or patched via deterministic scripts before the agent loop begins.
- Route Tasks to New File Targets: Every autonomous task must specify a file path that does not exist at execution time. The agent generates the complete module, writes it, and runs tests.
- Implement Hash-Based Deadlock Detection: During the GREEN phase, compute a SHA-256 hash of the generated output and compare it against a reference hash: the input file when one exists, or the previous cycle's output for a new-file task. If they match, terminate the retry loop immediately.
- Validate Task Specifications Pre-Execution: Run a validation pass against the project directory to flag any task targeting an existing file. Reject or rewrite the task before execution.
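The hash-based detection step can be sketched in isolation. The class below is a minimal illustration under stated assumptions, not the pipeline's actual implementation: `DeadlockDetector`, `check`, and the verdict strings are hypothetical names, and a deadlock is assumed to mean output that is byte-identical either to the input file or to the previous cycle's output.

```typescript
import { createHash } from 'crypto';

type Verdict = 'PROGRESS' | 'UNCHANGED_FROM_INPUT' | 'REPEATED_OUTPUT';

// Hypothetical helper: tracks hashes across GREEN-phase cycles for one task.
class DeadlockDetector {
  private readonly inputHash: string | null;
  private previousHash: string | null = null;

  // Pass the existing file content for modification tasks, or null for new files.
  constructor(inputContent: string | null) {
    this.inputHash = inputContent === null ? null : DeadlockDetector.sha256(inputContent);
  }

  static sha256(content: string): string {
    return createHash('sha256').update(content).digest('hex');
  }

  // Call once per cycle with the freshly generated output.
  check(output: string): Verdict {
    const hash = DeadlockDetector.sha256(output);
    if (hash === this.inputHash) return 'UNCHANGED_FROM_INPUT';
    if (hash === this.previousHash) return 'REPEATED_OUTPUT';
    this.previousHash = hash;
    return 'PROGRESS';
  }
}
```

Any verdict other than `PROGRESS` is a signal to stop retrying, since another cycle with the same context is unlikely to produce a different result.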
Architecture Rationale
The decision to restrict autonomous tasks to new files is driven by generation mechanics, not arbitrary preference. LLMs operate on next-token probability distributions. When the prompt contains a complete, syntactically valid file, the highest-probability continuation is often the file itself. Diff-generation backends attempt to bypass this by outputting unified diff chunks, but quantized local models struggle with precise line-matching on complex files containing async chains, mixed imports, and dense type definitions. The result is either timeout exhaustion or fallback to unchanged output.
By isolating the agent to clean-sheet generation, we remove the statistical attractor. The model receives a specification, generates a new module, and passes it through the test suite. Infrastructure dependencies are resolved deterministically, ensuring the agent never receives a partially complete or heavily modified file as context.
New Code Example: Clean-Sheet Generation Pipeline
The following TypeScript implementation demonstrates the pattern. It replaces iterative patching with a setup-driven workflow and includes hash-based deadlock detection.
import { createHash } from 'crypto';
import { writeFileSync, existsSync, mkdirSync } from 'fs';
import { execSync } from 'child_process';

interface TaskSpec {
  id: string;
  targetPath: string;
  specification: string;
  testCommand: string;
}

interface ExecutionResult {
  taskId: string;
  status: 'PASS' | 'FAIL' | 'DEADLOCK';
  cycles: number;
  hashMatch: boolean;
}

class AgentExecutionEngine {
  private maxRetries = 3;

  async executeTask(task: TaskSpec): Promise<ExecutionResult> {
    // Enforce the clean-sheet constraint before any generation happens.
    if (existsSync(task.targetPath)) {
      throw new Error(`Task ${task.id} targets existing file. Violates clean-sheet constraint.`);
    }
    let cycles = 0;
    // A new file has no input to compare against, so track the previous cycle's output.
    let previousHash: string | null = null;
    while (cycles < this.maxRetries) {
      cycles++;
      const output = await this.generateModule(task.specification);
      const outputHash = this.computeHash(output);
      // Identical output on consecutive cycles: further retries will not help.
      if (outputHash === previousHash) {
        return { taskId: task.id, status: 'DEADLOCK', cycles, hashMatch: true };
      }
      previousHash = outputHash;
      writeFileSync(task.targetPath, output);
      const testPassed = await this.runTests(task.testCommand);
      if (testPassed) {
        return { taskId: task.id, status: 'PASS', cycles, hashMatch: false };
      }
    }
    return { taskId: task.id, status: 'FAIL', cycles, hashMatch: false };
  }

  private computeHash(content: string): string {
    return createHash('sha256').update(content).digest('hex');
  }

  private async generateModule(spec: string): Promise<string> {
    // Placeholder for LLM API call (Ollama/Aider/Cloud)
    return `// Generated module\nexport const handler = () => { /* ${spec} */ };`;
  }

  private async runTests(command: string): Promise<boolean> {
    try {
      execSync(command, { stdio: 'inherit' });
      return true;
    } catch {
      return false;
    }
  }
}

// Setup script: deterministic infrastructure generation
function bootstrapInfrastructure() {
  const dir = './src/modules';
  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  // Deterministic regeneration of the router file (overwrites any existing copy)
  const routerContent = `import { Router } from 'express';\nexport const appRouter = Router();`;
  writeFileSync('./src/router.ts', routerContent);
}

// Task definition: exclusively new files
const tasks: TaskSpec[] = [
  {
    id: 'TASK-001',
    targetPath: './src/modules/inventory.ts',
    specification: 'Implement stock tracking with async database calls',
    testCommand: 'npx jest modules/inventory.test.ts'
  },
  {
    id: 'TASK-002',
    targetPath: './src/modules/shipping.ts',
    specification: 'Create shipping calculator with rate limiting',
    testCommand: 'npx jest modules/shipping.test.ts'
  }
];

async function main() {
  bootstrapInfrastructure();
  const engine = new AgentExecutionEngine();
  for (const task of tasks) {
    const result = await engine.executeTask(task);
    console.log(`[${result.taskId}] Status: ${result.status} | Cycles: ${result.cycles}`);
  }
}

// Surface clean-sheet violations and other errors instead of leaving an unhandled rejection.
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
This implementation enforces the clean-sheet constraint at runtime, detects identical-output deadlocks via hash comparison, and separates deterministic setup from stochastic generation. The architecture scales because each task operates in isolation, eliminating cross-file mutation risks.
Pitfall Guide
1. Context Window Attractor Trap
Explanation: When an LLM receives a large, complete file as context, the token probability distribution heavily favors reproducing the input. The model treats the existing code as a verified pattern and defaults to replication rather than mutation.
Fix: Never pass existing files as context for modification tasks. Route the task to a new file path and resolve dependencies through imports or setup scripts.
2. Diff Generation on Dense Files
Explanation: Unified diff generation requires precise line-matching and token alignment. Quantized local models frequently produce malformed diff chunks when handling files with complex async chains, mixed imports, or dense type definitions. This results in timeouts or silent fallbacks to unchanged output.
Fix: Use AST-based patchers (e.g., ts-morph, libclang) for deterministic mutations. Reserve LLM diff generation for simple, low-complexity files only.
3. Over-Fragmentation at Scale
Explanation: Enforcing a strict "new file only" rule across 20+ tasks can create import spaghetti, increase compilation overhead, and fragment related logic across multiple modules.
Fix: Group related schemas, routes, or utilities into logical modules rather than single-function files. Use barrel exports (index.ts) to maintain clean import paths. Validate file count against project complexity thresholds.
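As a rough illustration of the barrel-export suggestion, the helper below renders the contents of an `index.ts` from a list of module names. `renderBarrel` is a hypothetical name, and the sketch assumes every module lives next to the barrel file and uses named exports.

```typescript
// Hypothetical helper: renders the text of a barrel file (index.ts).
// Names are de-duplicated and sorted so the output does not depend on input order.
function renderBarrel(moduleNames: string[]): string {
  const unique = Array.from(new Set(moduleNames)).sort();
  return unique.map((name) => `export * from './${name}';`).join('\n') + '\n';
}
```

Because the output is a pure function of the module list, the barrel can be regenerated in the setup script on every run instead of being patched by the agent.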
4. Blocking Dependency Chains
Explanation: A single modification deadlock halts all downstream tasks that depend on the mutated file. In linear execution pipelines, this creates cascading failures.
Fix: Parallelize independent modules. Use mock interfaces or stub implementations to unblock downstream tasks while upstream modules are being generated. Implement dependency graph validation before execution.
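One way to validate the dependency graph before execution is to group tasks into waves, where each wave depends only on earlier ones. This is a sketch under assumptions: `TaskNode`, `dependsOn`, and `planWaves` are hypothetical names, and task IDs are assumed to be unique.

```typescript
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Groups tasks into waves: every task in a wave depends only on earlier waves,
// so tasks inside one wave can run in parallel.
// Throws on unknown dependencies and on dependency cycles.
function planWaves(tasks: TaskNode[]): string[][] {
  const ids = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!ids.has(dep)) {
        throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      }
    }
  }
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error(`Dependency cycle among: ${remaining.map((t) => t.id).join(', ')}`);
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}
```

A failed or deadlocked task then blocks only the tasks in later waves that list it in `dependsOn`, rather than the whole queue.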
5. Ignoring Non-Deterministic Flakiness
Explanation: Identical prompts and model configurations can yield different outputs across runs due to temperature sampling, token sampling variance, or backend state. Assuming deterministic behavior leads to false confidence in retry loops.
Fix: Seed generation requests where supported. Implement early-exit conditions on hash matches. Track pass/fail rates per task type and adjust retry budgets based on empirical data, not assumptions.
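Tracking pass rates per task type could look like the sketch below. The class name, the two task types, and the budget policy are all assumptions made for illustration; in particular, the rule "allow a single attempt once a type has never passed first time across `minSamples` runs" is one possible policy, not one taken from the telemetry above.

```typescript
type TaskType = 'new-file' | 'modify-existing';

interface Outcome {
  type: TaskType;
  firstPassSuccess: boolean;
}

// Hypothetical tracker: accumulates first-pass results per task type.
class PassRateTracker {
  private readonly stats = new Map<TaskType, { runs: number; firstPass: number }>();

  record(outcome: Outcome): void {
    const entry = this.stats.get(outcome.type) ?? { runs: 0, firstPass: 0 };
    entry.runs += 1;
    if (outcome.firstPassSuccess) entry.firstPass += 1;
    this.stats.set(outcome.type, entry);
  }

  // Returns null until at least one run has been recorded for the type.
  firstPassRate(type: TaskType): number | null {
    const entry = this.stats.get(type);
    return entry ? entry.firstPass / entry.runs : null;
  }

  // Assumed policy: once a type has at least minSamples runs and has never
  // passed on the first attempt, allow a single attempt only.
  retryBudget(type: TaskType, defaultBudget = 3, minSamples = 5): number {
    const entry = this.stats.get(type);
    if (!entry || entry.runs < minSamples) return defaultBudget;
    return entry.firstPass === 0 ? 1 : defaultBudget;
  }
}
```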
6. Setup Script Brittleness
Explanation: Regex or string-replacement patching breaks when source files undergo minor formatting changes, comment additions, or import reordering.
Fix: Use version-controlled templates or code generation tools (e.g., Jinja2, Handlebars, or TypeScript AST transformers). Enforce consistent formatting with linters/formatters (Prettier, ESLint) before and after patching.
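A template-driven alternative to patching is to regenerate the whole file from a small manifest. The sketch below renders an Express router file that way; `RouteModule`, `renderRouter`, and the convention that each module exports `<name>Router` are assumptions for illustration, not part of the original setup script.

```typescript
interface RouteModule {
  name: string;      // module file name under ./modules, without extension
  mountPath: string; // URL prefix the module is mounted at
}

// Hypothetical template: regenerates the full router file from a manifest,
// so there is no existing text to match against or patch.
function renderRouter(modules: RouteModule[]): string {
  // Plain code-unit comparison keeps the ordering independent of locale.
  const sorted = [...modules].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const imports = sorted.map((m) => `import { ${m.name}Router } from './modules/${m.name}';`);
  const mounts = sorted.map((m) => `appRouter.use('${m.mountPath}', ${m.name}Router);`);
  return [
    `import { Router } from 'express';`,
    ...imports,
    '',
    'export const appRouter = Router();',
    ...mounts,
    '',
  ].join('\n');
}
```

Since the file is rebuilt from the manifest each time, formatting changes, added comments, or reordered imports in a previous version of the file cannot break the setup step.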
7. Missing Pre-Execution Validation
Explanation: Tasks targeting existing files slip through PRD reviews and trigger deadlocks during execution, wasting compute and session time.
Fix: Implement a validation pass that scans the project directory and flags any task whose targetPath already exists. Reject or rewrite the task before it enters the execution queue.
Production Bundle
Action Checklist
- Audit existing task definitions: Ensure every autonomous task targets a file path that does not currently exist in the repository.
- Implement SHA-256 hash detection: Compare generated output against input context during the GREEN phase. Terminate retry loops on match.
- Decouple infrastructure generation: Move all existing-file mutations to deterministic setup scripts using AST patchers or templating engines.
- Validate dependency graphs: Map task dependencies and parallelize independent modules to prevent cascading deadlocks.
- Enforce formatting consistency: Run linters and formatters before and after file generation to prevent diff/patch failures.
- Track empirical pass rates: Log first-pass success rates per task type. Adjust retry budgets and architecture based on observed data.
- Review file fragmentation thresholds: Monitor module count and import complexity. Consolidate related logic when approaching maintenance overhead limits.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Greenfield module creation | Clean-sheet generation | LLMs excel at first-pass creation; no context attractor | Low compute, high reliability |
| Legacy codebase patching | Deterministic AST or template-based patching | Avoids diff generation instability; precise, repeatable targeting | Medium setup cost, high predictability |
| High-complexity async routes | Modular generation + barrel exports | Prevents import spaghetti; isolates mutation risk | Slightly higher file count, improved maintainability |
| Rapid prototyping | Cloud API with relaxed constraints | Faster iteration; acceptable flakiness for throwaway code | Higher API cost, lower engineering overhead |
| Production TDD loops | New-file only + setup scripts | Eliminates identical GREEN deadlocks; enforces testable boundaries | Initial architecture shift, long-term stability |
Configuration Template
// prd-validator.ts
import { existsSync } from 'fs';
import { resolve } from 'path';

interface PRDTask {
  id: string;
  targetPath: string;
  description: string;
}

export function validatePRD(tasks: PRDTask[], projectRoot: string): string[] {
  const violations: string[] = [];
  for (const task of tasks) {
    const fullPath = resolve(projectRoot, task.targetPath);
    if (existsSync(fullPath)) {
      violations.push(
        `Task ${task.id} targets existing file: ${task.targetPath}. ` +
        `Clean-sheet constraint violated. Rewrite task to generate a new module.`
      );
    }
  }
  return violations;
}

// Usage in CI/CD or pre-execution hook
const tasks: PRDTask[] = [
  { id: 'TASK-001', targetPath: 'src/modules/analytics.ts', description: 'Implement event tracking' },
  { id: 'TASK-002', targetPath: 'src/router.ts', description: 'Add analytics route' } // Will trigger violation
];

const violations = validatePRD(tasks, process.cwd());
if (violations.length > 0) {
  console.error('PRD Validation Failed:\n', violations.join('\n'));
  process.exit(1);
}
Quick Start Guide
- Audit your task queue: Scan all pending autonomous tasks. Flag any that reference existing file paths. Rewrite them to target new modules.
- Create a setup script: Extract all infrastructure generation, routing registration, and existing-file mutations into a deterministic script. Use AST patchers or templating for reliability.
- Integrate hash detection: Add SHA-256 comparison to your GREEN phase execution loop. Terminate retries immediately when output matches input.
- Run validation pre-execution: Execute the PRD validator against your project directory. Block any task that violates the clean-sheet constraint.
- Execute and monitor: Run the pipeline. Log pass/fail rates, cycle counts, and deadlock triggers. Adjust architecture based on empirical telemetry.
This pattern shifts complexity from generation-time editing to project-level organization. It is not a universal solution, but it is a production-tested boundary condition for autonomous coding loops. When modification deadlocks dominate your execution metrics, restructuring the workflow around clean-sheet generation is the most reliable path to stable, repeatable results.
