Beyond the JSON Blob: Incremental State Accumulation for Reliable LLM Outputs

Current Situation Analysis

Large language models excel at pattern completion and natural language generation. They perform poorly when forced to act as deterministic data serializers. When engineering teams push LLMs beyond conversational prototypes into production pipelines, the first major fracture point is almost always structured output generation. Requesting a complete JSON object in a single completion pass introduces systemic fragility that schema validation alone cannot fix.

The industry pain point is not syntax; it is cognitive load. LLMs operate as stateless predictors with a finite context window. When you ask a model to ingest a 50-page document, cross-reference external APIs, maintain a mental model of relationships, and simultaneously emit a perfectly typed 200-line JSON payload, you are asking it to juggle several distinct tasks in a single completion. The result is predictable: schema drift, hallucinated fields, type mismatches, and silent truncation when the output token budget is exhausted.

This problem is frequently misunderstood because teams conflate syntactic correctness with semantic reliability. Modern provider APIs offer response_format: json_schema (OpenAI) or structured result schemas (AWS Bedrock, Anthropic). These features are designed to guarantee that the output parses as valid JSON and matches the declared structure. They do not guarantee that the values are correct, that nested objects are logically consistent with one another, or that the model has not invented data to satisfy a required field. The model still generates the entire structure in one shot, so a single hallucination can compromise the whole payload, and a response cut off at the output token limit may not be complete at all.
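
To make the distinction concrete, here is a minimal sketch. The Report shape and the isStructurallyValid and danglingEvidenceIds helpers are hypothetical names chosen for illustration: the first function stands in for what a schema-constrained API can enforce, and the second checks a relationship that a type schema cannot express.

```typescript
// Hypothetical payload shape, loosely modeled on an audit report.
interface Entity { id: string; name: string }
interface Finding { severity: string; evidence_ids: string[] }
interface Report { entities: Entity[]; findings: Finding[] }

const SEVERITIES = ['low', 'medium', 'high', 'critical'];

// Structural check: roughly the kind of guarantee a JSON schema provides.
function isStructurallyValid(value: unknown): value is Report {
  if (typeof value !== 'object' || value === null) return false;
  const r = value as Record<string, unknown>;
  if (!Array.isArray(r.entities) || !Array.isArray(r.findings)) return false;
  const entitiesOk = r.entities.every(
    (e: any) => typeof e?.id === 'string' && typeof e?.name === 'string'
  );
  const findingsOk = r.findings.every(
    (f: any) =>
      SEVERITIES.indexOf(f?.severity) !== -1 &&
      Array.isArray(f?.evidence_ids) &&
      f.evidence_ids.every((id: unknown) => typeof id === 'string')
  );
  return entitiesOk && findingsOk;
}

// Semantic check: every evidence ID must point at a known entity.
function danglingEvidenceIds(report: Report): string[] {
  const known = new Set(report.entities.map(e => e.id));
  const dangling: string[] = [];
  for (const finding of report.findings) {
    for (const id of finding.evidence_ids) {
      if (!known.has(id)) dangling.push(id);
    }
  }
  return dangling;
}

// Parses cleanly and matches the schema, yet references an entity that does not exist.
const payload: unknown = {
  entities: [{ id: 'e-1', name: 'Acme Corp' }],
  findings: [{ severity: 'high', evidence_ids: ['e-1', 'e-99'] }]
};
```

The payload passes the structural check while still carrying a dangling reference, which is the class of error that schema enforcement alone leaves undetected.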

Empirical observations from production agent deployments confirm this limitation. Frameworks like Kiro CLI initially struggled with large structured outputs, forcing maintainers to implement heavy-handed workarounds: aggressive grep/tail filtering, external jq pipelines, and context window budgeting strategies. The underlying issue remains consistent across stacks: the reliability of monolithic generation degrades as input complexity grows. As document volume increases, the probability of a clean, complete structured output drops sharply.

WOW Moment: Key Findings

Shifting from monolithic generation to incremental state accumulation fundamentally changes how LLMs interact with structured data. Instead of treating the output as a final artifact, we treat it as a side effect of controlled tool invocations. The model never has to produce the complete structure in one pass. It only calls discrete functions that mutate an external state container.

Approach	Context Efficiency	Error Recovery Latency	Schema Drift Rate	Validation Overhead
Monolithic JSON Generation	Low (output consumes tokens)	High (full regeneration required)	18-34% (varies by model)	Post-hoc parsing & retry loops
Incremental Tool Accumulation	High (state lives outside context)	Near-zero (partial state preserved)	<2% (enforced at tool boundary)	Real-time, per-call validation

This finding matters because it decouples reasoning from serialization. The model can focus entirely on information extraction, cross-referencing, and decision-making while the structured output emerges deterministically from validated tool calls. Context window exhaustion no longer destroys collected data, because the accumulator exists in application memory, not in the message history. Crash recovery becomes simpler: partial state lives outside the conversation and can be checkpointed as it accumulates. Teams can compress conversation history aggressively without losing a single extracted field.
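
As a rough illustration of why state held outside the conversation survives a failed run, the sketch below checkpoints after every mutation and restores on construction. InMemoryStore, CheckpointedAccumulator, and the runId key are assumptions made for this example; a real deployment would swap the in-memory map for a file, cache, or database.

```typescript
// Hypothetical minimal state; a real accumulator would follow its own schema.
interface PartialState { entities: string[]; summary: string | null }

// Storage stand-in for the example. Replace with durable storage in practice.
class InMemoryStore {
  private slots = new Map<string, string>();
  save(key: string, json: string): void { this.slots.set(key, json); }
  load(key: string): string | undefined { return this.slots.get(key); }
}

class CheckpointedAccumulator {
  private state: PartialState;
  private store: InMemoryStore;
  private runId: string;

  constructor(store: InMemoryStore, runId: string) {
    this.store = store;
    this.runId = runId;
    // Resume from the last checkpoint if one exists for this run.
    const saved = store.load(runId);
    this.state = saved
      ? (JSON.parse(saved) as PartialState)
      : { entities: [], summary: null };
  }

  addEntity(name: string): void {
    this.state.entities.push(name);
    // Checkpoint after each successful mutation.
    this.store.save(this.runId, JSON.stringify(this.state));
  }

  snapshot(): PartialState {
    // Return a copy so callers cannot mutate internal state.
    return JSON.parse(JSON.stringify(this.state)) as PartialState;
  }
}
```

Constructing a second CheckpointedAccumulator with the same store and runId picks up where the first one stopped, regardless of what happened to the message history in between.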

Core Solution

The architectural pattern is straightforward: implement the Builder pattern via LLM tool definitions. Each tool represents a single, typed mutation to an external state object. The model's job shifts from "generate this structure" to "invoke these functions in logical sequence." Validation occurs at the tool boundary. State management occurs outside the context window.

Step 1: Define the External State Container

The accumulator must be strictly typed and isolated from the conversation history. It should expose controlled mutation methods rather than direct property access.

import { z } from 'zod';

const AuditStateSchema = z.object({
  entities: z.array(z.object({
    id: z.string().uuid(),
    name: z.string().min(1),
    type: z.enum(['vendor', 'system', 'user', 'regulator']),
    metadata: z.record(z.string()).optional()
  })),
  timeline: z.array(z.object({
    timestamp: z.string().datetime(),
    event: z.string(),
    source_ref: z.string().optional()
  })),
  findings: z.array(z.object({
    severity: z.enum(['low', 'medium', 'high', 'critical']),
    description: z.string(),
    evidence_ids: z.array(z.string().uuid()),
    status: z.enum(['open', 'investigating', 'resolved'])
  })),
  summary: z.string().nullable()
});

type AuditState = z.infer<typeof AuditStateSchema>;

class StateAccumulator {
  private state: AuditState;

  constructor() {
    this.state = {
      entities: [],
      timeline: [],
      findings: [],
      summary: null
    };
  }

  getState(): Readonly<AuditState> {
    return structuredClone(this.state);
  }

  // Controlled mutations with validation hooks
  addEntity(entity: AuditState['entities'][number]): string {
    AuditStateSchema.shape.entities.element.parse(entity);
    this.state.entities.push(entity);
    return `Entity registered: ${entity.name} (${entity.type})`;
  }

  logTimelineEntry(entry: AuditState['timeline'][number]): string {
    AuditStateSchema.shape.timeline.element.parse(entry);
    this.state.timeline.push(entry);
    return `Timeline updated: ${this.state.timeline.length} entries recorded`;
  }

  recordFinding(finding: AuditState['findings'][number]): string {
    AuditStateSchema.shape.findings.element.parse(finding);
    this.state.findings.push(finding);
    return `Finding logged: ${finding.severity} severity`;
  }

  finalizeSummary(text: string): string {
    this.state.summary = text;
    return 'Audit summary finalized';
  }
}

Step 2: Map Tools to State Mutations

Tool definitions should mirror the accumulator's mutation methods. Each tool receives strictly typed parameters, validates them at the boundary, and returns deterministic feedback. The model never constructs the final object.

import Anthropic from '@anthropic-ai/sdk';

// Anthropic.Tool is the SDK's type for a tool definition.
const tools: Anthropic.Tool[] = [
  {
    name: 'register_entity',
    description: 'Record a new entity involved in the audit scope. Call once per unique entity.',
    input_schema: {
      type: 'object',
      properties: {
        id: { type: 'string', format: 'uuid' },
        name: { type: 'string', minLength: 1 },
        type: { type: 'string', enum: ['vendor', 'system', 'user', 'regulator'] },
        metadata: { type: 'object', additionalProperties: { type: 'string' } }
      },
      required: ['id', 'name', 'type']
    }
  },
  {
    name: 'log_event',
    description: 'Append a chronological event to the audit timeline. Dates must be ISO 8601.',
    input_schema: {
      type: 'object',
      properties: {
        timestamp: { type: 'string', format: 'date-time' },
        event: { type: 'string', minLength: 10 },
        source_ref: { type: 'string' }
      },
      required: ['timestamp', 'event']
    }
  },
  {
    name: 'create_finding',
    description: 'Document an audit finding with severity and evidence references.',
    input_schema: {
      type: 'object',
      properties: {
        severity: { type: 'string', enum: ['low', 'medium', 'high', 'critical'] },
        description: { type: 'string', minLength: 20 },
        evidence_ids: { type: 'array', items: { type: 'string', format: 'uuid' } },
        status: { type: 'string', enum: ['open', 'investigating', 'resolved'] }
      },
      // status has no default in AuditStateSchema, so it must be required here too
      required: ['severity', 'description', 'evidence_ids', 'status']
    }
  },
  {
    name: 'finalize_report',
    description: 'Write the executive summary and mark the audit as complete.',
    input_schema: {
      type: 'object',
      properties: {
        summary_text: { type: 'string', minLength: 50 }
      },
      required: ['summary_text']
    }
  }
];

Step 3: Implement Boundary Validation & Feedback

Validation must occur synchronously during tool execution. The model receives immediate feedback, allowing it to self-correct in the next turn. Failures are caught at the point of the call instead of surfacing later in a post-hoc parsing step.

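async function executeToolCall(toolName: string, args: Record<string, unknown>, accumulator: StateAccumulator): Promise<string> {
  try {
    switch (toolName) {
      case 'register_entity': {
        const parsed = AuditStateSchema.shape.entities.element.parse(args);
        // Idempotency guard: a repeated call must not create a duplicate.
        const exists = accumulator.getState().entities.some(e => e.id === parsed.id);
        if (exists) return `Warning: Entity ${parsed.id} already registered. Skipping duplicate.`;
        return accumulator.addEntity(parsed);
      }
      case 'log_event': {
        const parsed = AuditStateSchema.shape.timeline.element.parse(args);
        return accumulator.logTimelineEntry(parsed);
      }
      case 'create_finding': {
        const parsed = AuditStateSchema.shape.findings.element.parse(args);
        // Referential check: every evidence ID must point at a registered entity.
        const state = accumulator.getState();
        const missingEvidence = parsed.evidence_ids.filter(id =>
          !state.entities.some(e => e.id === id)
        );
        if (missingEvidence.length > 0) {
          return `Error: Evidence IDs not found in entity registry: ${missingEvidence.join(', ')}. Register entities first.`;
        }
        return accumulator.recordFinding(parsed);
      }
      case 'finalize_report': {
        // Validate the summary instead of trusting a type cast.
        const { summary_text } = z.object({ summary_text: z.string().min(50) }).parse(args);
        return accumulator.finalizeSummary(summary_text);
      }
      default:
        return `Error: Unknown tool ${toolName}`;
    }
  } catch (err) {
    // parse() throws on invalid input. Turn that into feedback the model can act on
    // rather than letting the exception abort the agent loop.
    if (err instanceof z.ZodError) {
      const details = err.issues
        .map(issue => `${issue.path.join('.') || '(root)'}: ${issue.message}`)
        .join('; ');
      return `Error: Invalid arguments for ${toolName}. ${details}`;
    }
    throw err;
  }
}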

Step 4: Decouple Reasoning from State Construction

The agent loop should interleave reading tools (document parsing, API queries, search) with writing tools (state mutations). This prevents the model from committing to a structure before gathering all facts.

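// Provider-agnostic message shapes. Adapt these to your SDK's actual types.
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type Message =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// llmProvider is a placeholder for your own SDK wrapper, not a real library API.
declare const llmProvider: {
  complete(request: { messages: Message[]; tools: typeof tools; tool_choice: 'auto' }): Promise<{
    stop_reason: 'tool_use' | 'end_turn' | 'max_tokens';
    content: string;
    tool_calls: ToolCall[];
  }>;
};

async function runAuditAgent(context: string, accumulator: StateAccumulator) {
  const conversationHistory: Message[] = [
    { role: 'user', content: `Process the following audit materials and build a structured report:\n\n${context}` }
  ];

  const MAX_TURNS = 25;

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const response = await llmProvider.complete({
      messages: conversationHistory,
      tools: tools,
      tool_choice: 'auto'
    });

    // Record the assistant turn first so each tool result follows the call that produced it.
    conversationHistory.push({ role: 'assistant', content: response.content, tool_calls: response.tool_calls });

    if (response.stop_reason === 'tool_use') {
      for (const toolCall of response.tool_calls) {
        const result = await executeToolCall(toolCall.name, toolCall.input, accumulator);
        conversationHistory.push({ role: 'tool', tool_call_id: toolCall.id, content: result });
      }
    } else if (accumulator.getState().summary !== null) {
      break; // The model stopped and the report is finalized.
    } else {
      // The model stopped early: nudge it instead of resending an unchanged history.
      conversationHistory.push({ role: 'user', content: 'The report is not finalized yet. Continue processing, then call finalize_report.' });
    }

    // Context compression: replace older messages with a state snapshot.
    if (conversationHistory.length > 12) {
      const stateSnapshot = JSON.stringify(accumulator.getState(), null, 2);
      // Keep the original task (index 0) and everything from the latest assistant turn onward,
      // so tool results are never separated from their tool calls.
      let tailStart = conversationHistory.length - 1;
      while (tailStart > 1 && conversationHistory[tailStart].role !== 'assistant') tailStart--;
      // The snapshot goes in as a user message because some providers reject mid-conversation system messages.
      conversationHistory.splice(1, tailStart - 1, {
        role: 'user',
        content: `[CONTEXT COMPRESSED] Current audit state:\n${stateSnapshot}\nContinue processing remaining materials.`
      });
    }
  }

  return accumulator.getState();
}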

Architecture Rationale

External State Container: Keeps structured data out of the context window. Prevents token budget exhaustion from corrupting outputs.
Tool-as-Mutation Pattern: Forces the model to operate incrementally. Each call is a discrete, validated step rather than a monolithic guess.
Boundary Validation: Catches type mismatches, missing references, and logical inconsistencies immediately. The model self-corrects before errors compound.
Context Compression: Replaces verbose conversation history with a compact state snapshot. Preserves all extracted data while freeing tokens for reasoning.
Decoupled Reading/Writing: Allows the model to gather facts, cross-reference sources, and only commit to the accumulator when confidence is high.

Pitfall Guide

1. Tool Proliferation

Explanation: Defining dozens of hyper-granular tools increases cognitive load and token consumption. The model wastes turns deciding which tool to call instead of processing data. Fix: Group related mutations under cohesive tools. Use optional parameters for edge cases. Aim for 4-8 core tools per workflow.

2. Missing Idempotency Guards

Explanation: LLMs may repeat a tool call or re-register an entity after context compression or after a failed request is retried. Without idempotency checks, the accumulator fills with duplicates. Fix: Always check for existing records before insertion. Return deterministic warnings for duplicates. Use UUIDs or composite keys for identity resolution.
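
A composite-key guard might look like the sketch below. EntityRegistry and entityKey are hypothetical names, and the choice of key (type plus normalized name) is an assumption; use whatever identifies a record uniquely in your domain.

```typescript
interface EntityRecord {
  name: string;
  type: 'vendor' | 'system' | 'user' | 'regulator';
}

// Assumed composite key: entity type plus a normalized name.
function entityKey(entity: EntityRecord): string {
  return `${entity.type}:${entity.name.trim().toLowerCase()}`;
}

class EntityRegistry {
  private byKey = new Map<string, EntityRecord>();

  // Returns a deterministic message either way, so a repeated call is harmless.
  register(entity: EntityRecord): string {
    const key = entityKey(entity);
    if (this.byKey.has(key)) {
      return `Warning: ${entity.type} "${entity.name}" already registered. Skipping duplicate.`;
    }
    this.byKey.set(key, entity);
    return `Entity registered: ${entity.name} (${entity.type})`;
  }

  size(): number {
    return this.byKey.size;
  }
}
```

Because the second call returns a warning instead of throwing, the model gets a clear signal that the record already exists and can move on.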

3. Synchronous State Blocking

Explanation: Blocking the agent loop on external validation or database writes adds latency to every turn and can trip request timeouts on long workflows. Fix: Keep state mutations in-memory during the session. Persist to durable storage asynchronously, off the critical path of the loop. Use optimistic updates with rollback capabilities.

4. Over-Compressing Context

Explanation: Replacing too much conversation history with state snapshots removes critical reasoning traces. The model loses track of why certain decisions were made. Fix: Preserve the last 2-3 assistant turns alongside the state snapshot. Compress only older user/tool exchanges. Maintain a separate reasoning log if audit trails are required.
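
One way to keep recent reasoning while still compressing is sketched below. The compressHistory function, the Msg shape, and the keepAssistantTurns parameter are illustrative assumptions rather than part of any provider SDK.

```typescript
type Role = 'user' | 'assistant' | 'tool';
interface Msg { role: Role; content: string }

// Keeps the first message (the task), replaces the middle with a state snapshot,
// and preserves everything from the Nth-most-recent assistant turn onward.
// If no assistant turn is found, everything after the task is summarized.
function compressHistory(history: Msg[], snapshot: string, keepAssistantTurns = 2): Msg[] {
  let seen = 0;
  let tailStart = history.length;
  for (let i = history.length - 1; i >= 1; i--) {
    if (history[i].role === 'assistant') {
      seen++;
      tailStart = i;
      if (seen === keepAssistantTurns) break;
    }
  }
  // Nothing sits between the task and the preserved tail, so there is nothing to compress.
  if (tailStart <= 1) return history.slice();

  const summary: Msg = {
    role: 'user',
    content: `[CONTEXT COMPRESSED] Current state:\n${snapshot}`
  };
  return [history[0], summary, ...history.slice(tailStart)];
}
```

Cutting at an assistant turn rather than at a fixed message count keeps each preserved tool result next to the call that produced it.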

5. Ignoring Tool-Call Rate Limits

Explanation: Provider APIs enforce rate limits on requests and tokens. Incremental patterns increase the number of round trips per workflow, which can trigger 429 errors mid-run. Fix: Implement exponential backoff with jitter. Batch non-critical mutations where possible. Monitor token-to-call ratios and adjust tool granularity accordingly.
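
A minimal backoff sketch follows. RetryPolicy, backoffDelayMs, and withRetry are hypothetical names, and the assumption that a rate-limit error exposes a numeric status of 429 depends on your SDK; check how your client surfaces HTTP errors before relying on it.

```typescript
interface RetryPolicy { maxAttempts: number; backoffBaseMs: number; jitter: boolean }

// Delay before retry number `attempt` (0-based): base * 2^attempt.
// With jitter enabled, a random fraction of that ceiling is used ("full jitter").
function backoffDelayMs(
  attempt: number,
  policy: RetryPolicy,
  random: () => number = Math.random
): number {
  const ceiling = policy.backoffBaseMs * Math.pow(2, attempt);
  return policy.jitter ? Math.floor(random() * ceiling) : ceiling;
}

// Assumption: the thrown error carries an HTTP `status` field.
function isRateLimit(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { status?: number }).status === 429;
}

async function withRetry<T>(call: () => Promise<T>, policy: RetryPolicy): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-rate-limit errors or once the attempt budget is spent.
      if (!isRateLimit(err) || attempt + 1 >= policy.maxAttempts) throw err;
      const delay = backoffDelayMs(attempt, policy);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

Wrapping only the provider request in withRetry keeps retries away from tool execution, so a retried request cannot apply the same mutation twice.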

6. Schema Drift in Accumulator

Explanation: Over time, application code changes the state schema without updating tool definitions. The model continues calling outdated signatures, causing silent failures. Fix: Version your state schema. Run integration tests that validate tool signatures against the accumulator. Use code generation to sync TypeScript/Zod types with tool schemas automatically.
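
A drift check can be as small as the sketch below, which could run in CI. The ToolSchema shape, the RequiredByState map, and findSchemaDrift are assumptions for illustration; in practice the list of mandatory fields would be derived from your state schema instead of written by hand.

```typescript
interface ToolSchema {
  name: string;
  properties: Record<string, unknown>;
  required: string[];
}

// Assumed input: for each tool, the fields the state schema treats as mandatory.
type RequiredByState = Record<string, string[]>;

function findSchemaDrift(toolSchemas: ToolSchema[], requiredByState: RequiredByState): string[] {
  const problems: string[] = [];
  for (const tool of toolSchemas) {
    const expected = requiredByState[tool.name];
    if (!expected) {
      problems.push(`${tool.name}: no matching state mutation`);
      continue;
    }
    for (const field of expected) {
      if (!(field in tool.properties)) {
        problems.push(`${tool.name}: missing property "${field}"`);
      } else if (tool.required.indexOf(field) === -1) {
        problems.push(`${tool.name}: "${field}" must be required`);
      }
    }
  }
  return problems;
}
```

A non-empty result fails the build, which surfaces a mismatch such as a field that the state schema requires but the tool definition leaves optional.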

7. Validation Feedback Loops

Explanation: Returning vague error messages forces the model to guess corrections. "Invalid input" without specifics can lead to retry loops that burn turns until the limit is reached. Fix: Provide exact field names, expected types, and missing references in error responses. Example: Error: evidence_ids [abc-123] not found. Call register_entity first.
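
A small formatter keeps error feedback consistent across tools. The ValidationIssue shape and formatToolError are hypothetical; map your validation library's own issue objects onto whatever fields you decide to expose.

```typescript
// Hypothetical issue shape, similar in spirit to what validation libraries report.
interface ValidationIssue {
  path: string[];
  expected: string;
  received: string;
}

// Builds one message naming every offending field, plus an optional next-step hint.
function formatToolError(toolName: string, issues: ValidationIssue[], hint?: string): string {
  const details = issues
    .map(issue => {
      const field = issue.path.join('.') || '(root)';
      return `${field}: expected ${issue.expected}, received ${issue.received}`;
    })
    .join('; ');
  return `Error in ${toolName}: ${details}.${hint ? ' ' + hint : ''}`;
}
```

Naming the field, the expectation, and the next action in one line gives the model enough to correct the call on its next turn.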

Production Bundle

Action Checklist

Define external state schema with strict typing (Zod/Pydantic) before writing tools
Map each tool to a single, idempotent state mutation with boundary validation
Implement context compression that preserves state snapshots alongside recent turns
Add duplicate detection and UUID-based identity resolution to all insertion tools
Configure rate limit handling with exponential backoff and jitter
Version state schemas and sync tool definitions via automated code generation
Test incremental workflows with synthetic documents containing missing/contradictory data
Log all tool calls and state transitions for post-run auditability

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Small documents (<5 pages), simple schemas	Monolithic JSON with `response_format`	Lower latency, fewer API calls	Baseline
Large documents (>20 pages), complex relationships	Incremental tool accumulation	Prevents context overflow, enables validation	+15-30% token cost
Multi-step reasoning with external API calls	Hybrid: reading tools + incremental writers	Decouples data gathering from serialization	+20-40% token cost
High-throughput batch processing	Stream-based JSONL with post-validation	Optimizes for throughput over interactivity	-10% token cost, higher compute

Configuration Template

// agent.config.ts
export const AGENT_CONFIG = {
  model: 'claude-sonnet-4-20250514',
  maxTokens: 8192,
  temperature: 0.1,
  toolChoice: 'auto',
  contextWindow: 200000,
  compressionThreshold: 12, // messages before compression
  compressionStrategy: 'state_snapshot',
  validation: {
    strictMode: true,
    idempotencyCheck: true,
    referenceValidation: true
  },
  retryPolicy: {
    maxAttempts: 3,
    backoffBase: 1000,
    jitter: true
  }
};

// state.schema.ts
import { z } from 'zod';

export const ProductionStateSchema = z.object({
  records: z.array(z.object({
    id: z.string().uuid(),
    source: z.string(),
    extracted_fields: z.record(z.unknown()),
    confidence: z.number().min(0).max(1)
  })),
  cross_references: z.array(z.object({
    from_id: z.string().uuid(),
    to_id: z.string().uuid(),
    relationship: z.string()
  })),
  metadata: z.object({
    processed_at: z.string().datetime(),
    total_records: z.number(),
    version: z.string()
  })
});

Quick Start Guide

Initialize the accumulator: Create a state container with strict schema validation. Export controlled mutation methods instead of direct property access.
Define 4-6 core tools: Map each tool to a single state mutation. Include input validation, idempotency checks, and descriptive error messages.
Wire the agent loop: Pass tools to the LLM provider. Execute tool calls synchronously, append results to conversation history, and trigger context compression when thresholds are met.
Test with edge cases: Run the workflow against documents with missing fields, contradictory data, and partial information. Verify that the model self-corrects using tool feedback.
Deploy with observability: Log all tool invocations, state transitions, and compression events. Monitor token-to-call ratios and adjust tool granularity based on production metrics.
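
For the observability step, a thin wrapper around the tool executor is often enough to start with. The sketch below assumes a synchronous executor for brevity and relies on the convention, used in the error strings earlier in this article, that failures begin with "Error"; ToolCallLog and withLogging are hypothetical names.

```typescript
interface ToolCallLog {
  tool: string;
  args: unknown;
  result: string;
  ok: boolean;
  durationMs: number;
}

type ToolExecutor = (tool: string, args: Record<string, unknown>) => string;

// Wraps any executor so every invocation is recorded, including failed ones.
function withLogging(execute: ToolExecutor, sink: ToolCallLog[]): ToolExecutor {
  return (tool, args) => {
    const started = Date.now();
    const result = execute(tool, args);
    sink.push({
      tool,
      args,
      result,
      ok: !result.startsWith('Error'), // assumed convention for failure messages
      durationMs: Date.now() - started
    });
    return result;
  };
}
```

The sink array can later be flushed to a logging backend, and the ratio of failed to successful entries is a quick signal for whether tool descriptions or error messages need work.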
