# Stop Paying Your AI Agents to Re-Learn the Same Site

*The Amnesia Tax in AI Web Agents and How to Compile It Away*
## Current Situation Analysis
Production AI agents that interact with live websites operate under a fundamental architectural constraint: they are stateless across execution boundaries. When an agent completes a task, its working memory is discarded. The next time the same task is triggered against the same domain, the model must re-navigate the homepage, re-identify interactive elements, re-discover pagination patterns, and re-learn error handling. This creates a recurring discovery tax that scales linearly with execution volume.
The industry has largely overlooked this bottleneck because development efforts have been concentrated on two fronts: expanding context windows and improving base model reasoning. A million-token context window improves single-session depth, but it does not solve cross-session persistence. Once the process terminates, the learned navigation graph vanishes. Better reasoning applied to a stateless loop simply accelerates rediscovery; it does not eliminate it.
Browserbase's open-source convergence workflow (released early May 2026) exposed the financial impact of this architectural gap. By allowing an agent to iterate against a live target until execution stabilizes, then exporting the successful pattern into a durable artifact, organizations can decouple task execution from discovery overhead. The published benchmarks demonstrate the magnitude of the inefficiency:
- Craigslist search operations dropped from ~$0.22 per run (71s) to ~$0.12 (27s)
- Multi-step form filling fell from $1.40 to $0.24 across four iterations
- A federal grants portal scrape collapsed from 28 paginated requests to a single undocumented JSON endpoint
The pattern draws direct inspiration from Karpathy's Autoresearch harness, adapting a single-metric, time-boxed optimization loop from machine learning experimentation to web navigation. The critical insight is not model intelligence; it is knowledge persistence.
## WOW Moment: Key Findings
The convergence-to-artifact pipeline fundamentally alters the cost structure of repeated web automation. The following comparison illustrates the operational shift when discovery is compiled into a reusable blueprint rather than recomputed per execution.
| Execution Mode | Avg. Cost/Run | Avg. Latency | Token Consumption | Discovery Overhead |
|---|---|---|---|---|
| Stateless Agent (Baseline) | $0.22 | 71s | High | 100% per run |
| Converged Blueprint | $0.12 | 27s | Low | 0% per run |
| Complex Form-Fill (Baseline) | $1.40 | ~180s | Very High | 100% per run |
| Complex Form-Fill (Blueprint) | $0.24 | ~35s | Low | 0% per run |
This finding matters because it shifts the optimization target from model selection to workflow persistence. The converged artifact captures undocumented endpoints, required headers, geolocation overrides, pagination batch sizes, and failure recovery paths. Subsequent executions bypass the exploration phase entirely, reading the compiled instructions and executing them deterministically. The model's role transitions from navigator to executor, which dramatically reduces token burn and latency.
## Core Solution
The architecture replaces per-run discovery with a compile-then-execute pipeline. The system runs an iterative refinement loop, monitors execution traces for stability, and graduates the stable workflow into a structured markdown artifact. Future runs load the artifact and skip the exploration phase.
### Step 1: Define the Convergence Loop
The orchestrator executes the target task, captures network traffic and DOM interactions, analyzes deviations from the expected path, and adjusts the strategy. This repeats until performance metrics (success rate, step count, error frequency) stabilize across consecutive runs.
### Step 2: Extract the Stable Workflow
Once convergence criteria are met, the system parses the execution trace and generates a structured blueprint. The artifact documents:
- Target endpoints and required headers
- DOM selectors or navigation sequences
- Pagination or rate-limiting parameters
- Error recovery procedures
- Input/output schemas
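The sections above map naturally onto a typed structure once the markdown is parsed. One possible shape, sketched below; the field names are assumptions for illustration, not a published schema:

```typescript
// Hypothetical parsed form of a compiled blueprint. Field names are
// illustrative assumptions, not a published schema.
interface CompiledBlueprint {
  endpoint: string;
  headers: Record<string, string>;
  navigation: string[];                   // ordered steps from the stable trace
  pagination: { batchSize: number; offsetParam: string };
  errorRecovery: Record<string, string>;  // condition -> documented action
  baseline: { avgCostUsd: number; avgLatencyMs: number };
}

// Example instance mirroring the published Craigslist benchmark figures;
// the endpoint URL is a placeholder.
const sample: CompiledBlueprint = {
  endpoint: 'https://example.org/api/search', // hypothetical
  headers: { Referer: 'https://example.org', Accept: 'application/json' },
  navigation: ['init_session', 'query_api', 'paginate_results', 'validate_schema'],
  pagination: { batchSize: 50, offsetParam: 'page' },
  errorRecovery: {
    http_403: 'rotate Referer header and retry once',
    timeout: 'reduce batch_size to 25'
  },
  baseline: { avgCostUsd: 0.12, avgLatencyMs: 27000 }
};
```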
### Step 3: Load and Execute
Subsequent agent invocations read the blueprint, validate it against the current target state, and execute the documented steps. The model no longer reasons about site structure; it follows compiled instructions.
### Implementation Architecture (TypeScript)

```typescript
import { writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';
import { randomUUID } from 'crypto';

interface ExecutionTrace {
  runId: string;
  steps: string[];
  errors: string[];
  latencyMs: number;
  tokenCost: number;
  success: boolean;
}

interface ConvergenceConfig {
  maxIterations: number;
  stabilityThreshold: number;
  outputDir: string;
  taskName: string;
}

class WorkflowCompiler {
  private config: ConvergenceConfig;
  private executionHistory: ExecutionTrace[] = [];

  constructor(config: ConvergenceConfig) {
    this.config = config;
  }

  async runConvergenceLoop(executeTask: () => Promise<ExecutionTrace>): Promise<string> {
    console.log(`[Compiler] Starting convergence loop for: ${this.config.taskName}`);
    for (let i = 0; i < this.config.maxIterations; i++) {
      const trace = await executeTask();
      this.executionHistory.push(trace);
      console.log(`[Compiler] Run ${i + 1} | Success: ${trace.success} | Cost: $${trace.tokenCost.toFixed(2)} | Latency: ${trace.latencyMs}ms`);
      if (this.hasConverged()) {
        console.log(`[Compiler] Convergence detected after ${i + 1} iterations.`);
        return this.generateBlueprint();
      }
    }
    throw new Error('[Compiler] Max iterations reached without convergence.');
  }

  private hasConverged(): boolean {
    // Require at least three runs, all successful, with stable cost and latency.
    if (this.executionHistory.length < 3) return false;
    const recent = this.executionHistory.slice(-3);
    const allSuccessful = recent.every(r => r.success);
    const costVariance = this.calculateVariance(recent.map(r => r.tokenCost));
    const latencyVariance = this.calculateVariance(recent.map(r => r.latencyMs));
    return allSuccessful &&
      costVariance < this.config.stabilityThreshold &&
      latencyVariance < this.config.stabilityThreshold;
  }

  private calculateVariance(values: number[]): number {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    return values.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / values.length;
  }

  private generateBlueprint(): string {
    const stableTrace = this.executionHistory[this.executionHistory.length - 1];
    mkdirSync(this.config.outputDir, { recursive: true }); // ensure output directory exists
    const blueprintPath = join(this.config.outputDir, `${this.config.taskName}_blueprint.md`);
    const content = `# Execution Blueprint: ${this.config.taskName}

Status: COMPILED
Generated: ${new Date().toISOString()}
Source Runs: ${this.executionHistory.length}

## Target Configuration
- Endpoint: ${this.extractEndpoint(stableTrace)}
- Required Headers: Referer, Accept: application/json
- Pagination: batch_size=50, offset_parameter=page

## Navigation Sequence
- Initialize session with geolocation override
- Query category enum: ${this.extractCategory(stableTrace)}
- Execute batch fetch with pagination loop
- Validate response schema before persistence

## Error Recovery
- If 403: Rotate Referer header and retry once
- If timeout: Reduce batch_size to 25
- If schema mismatch: Fallback to DOM parsing

## Performance Baseline
- Avg Cost: $${stableTrace.tokenCost.toFixed(2)}
- Avg Latency: ${stableTrace.latencyMs}ms
`;
    writeFileSync(blueprintPath, content);
    console.log(`[Compiler] Blueprint written to: ${blueprintPath}`);
    return blueprintPath;
  }

  private extractEndpoint(trace: ExecutionTrace): string {
    return trace.steps.find(s => s.includes('api.')) || 'unknown_endpoint';
  }

  private extractCategory(trace: ExecutionTrace): string {
    return trace.steps.find(s => s.includes('category=')) || 'default';
  }
}

// Usage example
async function main() {
  const compiler = new WorkflowCompiler({
    maxIterations: 5,
    stabilityThreshold: 0.05,
    outputDir: './compiled_skills',
    taskName: 'craigslist_search'
  });

  await compiler.runConvergenceLoop(async () => {
    // Simulate agent execution against a live target
    return {
      runId: randomUUID(),
      steps: ['init_session', 'query_api', 'paginate_results', 'validate_schema'],
      errors: [],
      latencyMs: 28000,
      tokenCost: 0.12,
      success: true
    };
  });
}

main().catch(console.error);
```
### Architecture Decisions and Rationale
1. **Markdown Artifacts Over JSON/YAML**: The compiled workflow uses markdown because it remains human-readable, diff-friendly in version control, and requires zero schema validation overhead. AI models parse structured markdown natively without additional tokenization steps.
2. **Convergence Criteria Based on Variance**: Stability is measured through cost and latency variance across the last three runs, not arbitrary iteration counts. This prevents premature graduation and ensures the workflow has genuinely stabilized.
3. **Separation of Compilation and Execution**: The compiler runs only during the discovery phase. Production pipelines load pre-compiled blueprints, eliminating iterative overhead from live traffic.
4. **Explicit Error Recovery Paths**: The blueprint documents fallback procedures rather than relying on the model to improvise during failures. This reduces token consumption during error states and improves predictability.
## Pitfall Guide
### 1. Iterating on Static Content
**Explanation**: Applying the convergence loop to fixed HTML catalogs or server-rendered pages with stable schemas wastes tokens and execution budget. The agent will repeatedly rediscover the same DOM structure without finding optimization opportunities.
**Fix**: Run a lightweight schema detection pass first. If the target returns consistent HTML with predictable selectors, bypass the loop and use a static parser (e.g., BeautifulSoup, Cheerio). Reserve iteration for JS-heavy, gated, or undocumented surfaces.
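A minimal sketch of such a pre-flight probe, assuming you have already captured one page snapshot with its script payload sizes; the heuristic and threshold are assumptions, not a published rule:

```typescript
// Hypothetical pre-flight probe: decide whether a target warrants the
// convergence loop at all. Fetching is left out; this is pure classification.
interface TargetProbe {
  html: string;        // snapshot of the rendered page
  scriptBytes: number; // total bytes of <script> payloads observed
  totalBytes: number;  // total response size
}

type TargetClass = 'static' | 'dynamic';

// Heuristic (assumed): mostly-static pages carry little script weight
// relative to markup, so a static parser is cheaper than iteration.
function classifyTarget(probe: TargetProbe, scriptRatioThreshold = 0.4): TargetClass {
  const scriptRatio = probe.scriptBytes / Math.max(probe.totalBytes, 1);
  return scriptRatio < scriptRatioThreshold ? 'static' : 'dynamic';
}
```

A real probe would also compare selectors across two fetches; the ratio check alone is just a cheap first filter.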
### 2. Blind Trust in Undocumented Endpoints
**Explanation**: The convergence loop frequently surfaces hidden JSON APIs or internal routes. These endpoints lack SLAs, may change without notice, and often bypass official rate limits.
**Fix**: Treat discovered endpoints as provisional. Implement response schema validation, cache control headers, and fallback routing to official APIs or DOM parsing. Log endpoint stability metrics and trigger re-compilation if failure rates exceed thresholds.
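The validation-plus-fallback routing can be sketched as follows; the field names (`results`, `id`, `title`) are illustrative assumptions, not the portal's real schema:

```typescript
// Hypothetical record shape for a discovered endpoint's payload.
interface GrantRecord {
  id: string;
  title: string;
}

// Runtime type guard: trust the undocumented JSON only while its shape holds.
function matchesSchema(payload: unknown): payload is { results: GrantRecord[] } {
  if (typeof payload !== 'object' || payload === null) return false;
  const results = (payload as { results?: unknown }).results;
  return Array.isArray(results) && results.every(r =>
    typeof r === 'object' && r !== null &&
    typeof (r as GrantRecord).id === 'string' &&
    typeof (r as GrantRecord).title === 'string'
  );
}

// Primary path: the undocumented endpoint. Fallback path: the DOM parser
// documented in the blueprint's error-recovery section.
function extractRecords(payload: unknown, domFallback: () => GrantRecord[]): GrantRecord[] {
  if (matchesSchema(payload)) return payload.results;
  return domFallback();
}
```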
### 3. Ignoring Target Site Drift
**Explanation**: Web applications update frequently. A compiled blueprint becomes stale when selectors change, pagination parameters shift, or authentication flows are modified.
**Fix**: Implement drift detection by monitoring execution success rates and latency spikes. Schedule periodic re-compilation during low-traffic windows. Version blueprints and maintain rollback paths to previous stable states.
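One way to sketch the drift detector is a rolling success-rate window that flags when re-compilation is due; the window size and threshold here are assumptions mirroring the `driftDetection.failureThreshold` style of configuration:

```typescript
// Minimal drift monitor sketch: records per-run outcomes and flags a
// blueprint for re-compilation when the failure rate exceeds a threshold.
class DriftMonitor {
  private window: boolean[] = [];

  constructor(
    private windowSize = 20,        // assumed rolling window length
    private failureThreshold = 0.15 // assumed tolerable failure rate
  ) {}

  record(success: boolean): void {
    this.window.push(success);
    if (this.window.length > this.windowSize) this.window.shift();
  }

  needsRecompile(): boolean {
    // Not enough signal yet: stay on the current blueprint.
    if (this.window.length < this.windowSize) return false;
    const failures = this.window.filter(s => !s).length;
    return failures / this.window.length > this.failureThreshold;
  }
}
```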
### 4. Confusing Context Window with Cross-Run Memory
**Explanation**: Teams often assume larger context windows solve persistence. A million-token window only extends single-session reasoning depth; it does not survive process termination.
**Fix**: Architect for explicit state persistence. Use compiled artifacts, external knowledge bases, or vector stores for cross-run memory. Treat context windows as working memory, not long-term storage.
### 5. Over-Engineering the Artifact Format
**Explanation**: Attempting to serialize workflows into complex schemas, binary formats, or database tables adds parsing overhead and breaks human auditability.
**Fix**: Stick to structured markdown. Include clear sections for configuration, sequence, error handling, and performance baselines. Keep the artifact under 2KB to minimize token consumption during load.
### 6. Missing Convergence Criteria
**Explanation**: Running the loop indefinitely or stopping after a fixed iteration count leads to either wasted budget or unstable blueprints.
**Fix**: Define quantitative stability thresholds (cost variance, latency variance, success rate). Require a minimum of three consecutive stable runs before graduation. Implement early termination if metrics degrade.
### 7. Lack of Execution Fallbacks
**Explanation**: Blueprints assume the happy path. When network conditions change or target behavior shifts, agents following rigid instructions fail without recovery.
**Fix**: Embed conditional branches in the blueprint. Document primary and secondary approaches. Implement a lightweight validator that checks target responsiveness before executing the compiled sequence.
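The primary-then-secondary routing described above can be sketched as a small fallback chain; the helper name and shape are assumptions for illustration:

```typescript
// Hypothetical fallback chain: try each documented approach in order and
// return the first that succeeds, surfacing an error only when all fail.
type Attempt<T> = () => Promise<T>;

async function executeWithFallbacks<T>(attempts: Attempt<T>[]): Promise<T> {
  const errors: unknown[] = [];
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      errors.push(err); // record the failure, then try the next branch
    }
  }
  throw new Error(`All ${attempts.length} documented approaches failed`);
}
```

In practice each attempt would be one blueprint branch, e.g. the JSON endpoint first and DOM parsing second.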
## Production Bundle
### Action Checklist
- [ ] Audit target sites for static vs. dynamic content before deploying the convergence loop
- [ ] Define quantitative convergence thresholds (cost variance < 5%, latency variance < 10%, 100% success rate)
- [ ] Implement drift detection monitoring on compiled blueprints with automated re-compilation triggers
- [ ] Validate all discovered undocumented endpoints with schema checks and fallback routing
- [ ] Version control all compiled artifacts and maintain rollback capabilities
- [ ] Budget token consumption separately for compilation phase vs. execution phase
- [ ] Implement lightweight target health checks before loading blueprints in production runs
- [ ] Document error recovery paths explicitly in the artifact rather than relying on model improvisation
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|----------|---------------------|-----|-------------|
| Static HTML catalog with fixed schema | Static parser (Cheerio/BeautifulSoup) | No discovery overhead; deterministic extraction | ~$0.01/run |
| JS-heavy SPA with undocumented APIs | Convergence loop → Compiled blueprint | Iteration surfaces hidden endpoints and pagination logic | Compilation: ~$2-5/run; Execution: ~$0.12/run |
| Gated portal with auth flows | Hybrid: Compile auth sequence, parse content | Reduces token burn on repeated authentication | Compilation: ~$3/run; Execution: ~$0.24/run |
| High-frequency data feed (>10k runs/day) | API-first or official endpoint | Compiled blueprints still incur load overhead; native APIs are cheaper | ~$0.005/run (if available) |
| Rapidly changing target site | Scheduled re-compilation + drift monitoring | Blueprints decay quickly; automation prevents stale execution | Compilation: ~$2/run; Execution: ~$0.15/run |
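Whether compilation pays off is simple arithmetic. A back-of-envelope sketch, using the low end of the matrix's approximate figures (costs in integer cents to avoid floating-point rounding):

```typescript
// Break-even calculation: how many production runs amortize the one-time
// compilation cost? Inputs are cents; figures below are the matrix's
// approximate numbers, not measurements.
function breakEvenRuns(
  compileCostCents: number,
  statelessCentsPerRun: number,
  blueprintCentsPerRun: number
): number {
  const savingsPerRun = statelessCentsPerRun - blueprintCentsPerRun;
  return Math.ceil(compileCostCents / savingsPerRun);
}

// ~$2.00 compilation, $0.22 stateless vs $0.12 blueprint execution:
const runs = breakEvenRuns(200, 22, 12); // compilation amortized after 20 runs
```

Past that point every execution is pure savings, which is why the high-frequency row still favors a native API: at >10k runs/day even the blueprint's load overhead matters.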
### Configuration Template
```typescript
// compiler.config.ts
export const compilationConfig = {
taskName: 'federal_grants_portal',
outputDir: './compiled_skills',
maxIterations: 5,
stabilityThreshold: 0.05,
convergenceMetrics: {
maxCostVariance: 0.05,
maxLatencyVariance: 0.10,
requiredSuccessRate: 1.0,
minStableRuns: 3
},
driftDetection: {
enabled: true,
checkInterval: '24h',
failureThreshold: 0.15,
autoRecompile: true
},
security: {
validateUndocumentedEndpoints: true,
requireSchemaValidation: true,
fallbackToDomParsing: true
}
};
// blueprint_loader.ts
import { existsSync } from 'fs';
export class BlueprintExecutor {
async loadAndExecute(blueprintPath: string, targetUrl: string) {
if (!this.validateBlueprint(blueprintPath)) {
throw new Error('Blueprint validation failed. Trigger re-compilation.');
}
const blueprint = this.parseMarkdown(blueprintPath);
return this.executeCompiledSteps(blueprint, targetUrl);
}
private validateBlueprint(path: string): boolean {
// Check file existence, version compatibility, and drift indicators
return existsSync(path);
}
private parseMarkdown(path: string) {
// Extract sections: Configuration, Sequence, Error Recovery, Performance
return { /* structured object */ };
}
private async executeCompiledSteps(blueprint: any, target: string) {
// Execute documented sequence with embedded fallbacks
return { success: true, latency: 27000, cost: 0.12 };
}
}
```

### Quick Start Guide

1. **Install the convergence package**: Add Browserbase's open-source skills plugin to your agent environment via the Claude Agent SDK marketplace (`/plugin marketplace add browserbase/skills`).
2. **Configure compilation parameters**: Set the target task name, output directory, iteration limits, and stability thresholds in your compiler configuration file.
3. **Run the discovery phase**: Execute the convergence loop against your target site. Monitor iteration logs until cost and latency variance stabilize across three consecutive runs.
4. **Validate the compiled artifact**: Review the generated markdown blueprint for accuracy, verify undocumented endpoints with schema checks, and test execution against a staging environment.
5. **Deploy to production**: Replace stateless agent calls with blueprint loader invocations. Enable drift detection monitoring and schedule periodic re-compilation during maintenance windows.
