Engineering Deterministic LLM Output: The Repair-Validate-Retry Architecture

Current Situation Analysis

Large language models are fundamentally probabilistic text generators. When developers request structured data, they typically append a JSON schema to the system prompt and instruct the model to respond strictly in that format. In controlled testing environments with short prompts and low temperature settings, this approach appears reliable. The model complies, the parser succeeds, and the pipeline moves forward.

Production traffic exposes a different reality. Under real-world conditions, approximately one in five LLM responses fails strict JSON parsing or schema validation. These failures are not random noise. They follow predictable patterns tied to generation length, context window pressure, and temperature variance.

The most common failure modes include:

Trailing commas in object or array literals, which cause native JSON.parse to throw
Markdown code fences wrapping the output, which breaks parsers expecting raw JSON
Type coercion mismatches, where the model returns a string representation of a number or boolean
Prose contamination, where conversational filler surrounds the structured payload
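
To make these failure modes concrete, the sketch below runs four invented sample responses through native JSON.parse. The sample strings and the `parses` helper are illustrative assumptions, not captured model output.

```typescript
// Hypothetical sample responses, one per failure mode listed above.
const fence = "`".repeat(3); // a markdown code fence, built programmatically

const samples: { mode: string; text: string }[] = [
  { mode: "trailingComma", text: '{"name": "Ada", "tags": ["a", "b",],}' },
  { mode: "markdownFence", text: fence + 'json\n{"name": "Ada"}\n' + fence },
  { mode: "typeMismatch", text: '{"yearsExperience": "7"}' },
  { mode: "proseContamination", text: 'Sure! Here is the JSON:\n{"name": "Ada"}' },
];

// True when JSON.parse accepts the text, false when it throws.
function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Names of the failure modes that break at the syntax level.
const parseFailures = samples
  .filter((sample) => !parses(sample.text))
  .map((sample) => sample.mode);
```

Three of the four samples fail at the syntax level. The type mismatch parses cleanly, which is why syntax repair alone is not enough and schema validation has to follow.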

These patterns correlate directly with operational conditions. Trailing commas spike when response length increases. Markdown fences appear frequently when the system prompt consumes a significant portion of the context window. Type mismatches occur when the model prioritizes semantic fluency over syntactic rigidity.

The problem is routinely overlooked because developers treat LLM output as a black box. They assume that if the prompt contains a schema, the model will respect it. In reality, the model optimizes for token probability, not JSON compliance. Without an explicit enforcement layer, structured extraction becomes a game of statistical luck rather than engineering certainty.

WOW Moment: Key Findings

The shift from probabilistic hope to deterministic control comes from implementing a dedicated output enforcement loop. The following comparison illustrates the operational impact of three common approaches to LLM structured output.

Approach	Parse Success Rate	Avg Latency (ms)	Schema Compliance	Implementation Complexity
Raw LLM Output	~80%	450	Low (depends on prompt)	Minimal
Heuristic Repair + Retry	~96%	620	High (enforced)	Moderate
Provider-Native Structured Output	~99%	480	Very High	Low (if supported)

The repair-validate-retry architecture bridges the gap between raw generation and native enforcement. By applying lightweight syntax correction first, then validating against a schema, and finally feeding validation errors back into the model for self-correction, teams achieve near-native compliance without vendor lock-in.

This finding matters because it decouples output reliability from model capability. You no longer need to wait for every provider to implement structured output modes, nor do you need to accept a 20% failure rate in production. The loop transforms inconsistent model behavior into a predictable, observable, and debuggable pipeline.

Core Solution

The architecture consists of three distinct phases: repair, validation, and retry orchestration. Each phase serves a specific purpose and must remain isolated to maintain predictability.

Phase 1: Heuristic Repair

LLM outputs frequently contain syntax errors that are trivial to fix but fatal to parsers. The repair phase applies deterministic transformations before any schema checking occurs. Common operations include:

Stripping markdown code fences (```json and ```)
Removing trailing commas before closing braces or brackets
Converting single quotes to double quotes where appropriate
Normalizing whitespace and line breaks

This phase is intentionally lightweight. It does not attempt to reconstruct severely malformed output. If the repaired text still fails to parse, the pipeline skips schema validation, records the attempt as a parse failure, and moves on to retry handling.
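
One way to keep the repair phase honest about its limits is to have it return an explicit result instead of throwing. The following is a minimal sketch under that assumption; `RepairResult` and `repairAndParse` are hypothetical names, and only fence stripping and trailing-comma removal are covered.

```typescript
type RepairResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Lightweight, deterministic repair followed by a single parse attempt.
// Anything it cannot fix is reported, not reconstructed.
function repairAndParse(raw: string): RepairResult {
  const fence = "`".repeat(3);
  let text = raw.trim();

  // Strip one leading fence (with an optional "json" tag) and one trailing fence.
  if (text.startsWith(fence)) {
    text = text.slice(fence.length).replace(/^json\b/i, "").trim();
  }
  if (text.endsWith(fence)) {
    text = text.slice(0, -fence.length).trim();
  }

  // Drop commas that sit directly before a closing brace or bracket.
  // Caveat: this regex is not string-aware, so a literal ",}" inside a
  // string value would also be altered.
  text = text.replace(/,(\s*[}\]])/g, "$1");

  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : "unknown parse error";
    return { ok: false, reason };
  }
}
```

A response wrapped in fences with trailing commas comes back as `ok: true`, while a response with prose around the payload comes back as `ok: false` and can be routed straight to a retry.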

Phase 2: Schema Validation

Once syntactically valid JSON is obtained, it must be validated against your domain schema. The validation layer should remain completely decoupled from the pipeline logic. This allows you to swap schema libraries (Zod, Ajv, custom validators) without modifying the orchestration code.

The validator receives the parsed JSON value, typed as unknown, and returns a standardized result shape. This contract ensures the retry orchestrator can interpret success or failure uniformly.
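
To illustrate the contract without tying it to any library, here is a hand-written validator for a hypothetical `Ticket` shape. The `Ticket` type and its rules are invented for this sketch; a Zod or Ajv adapter returning the same `ValidationResult` would be interchangeable with it.

```typescript
interface ValidationResult<T> {
  success: boolean;
  data?: T;
  errors?: string[];
}

// Hypothetical domain type, used only for this illustration.
interface Ticket {
  title: string;
  priority: number;
}

function validateTicket(input: unknown): ValidationResult<Ticket> {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { success: false, errors: ["Root value must be a JSON object"] };
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];

  const title = record["title"];
  if (typeof title !== "string" || title.length === 0) {
    errors.push("Field 'title' must be a non-empty string");
  }

  const priority = record["priority"];
  if (typeof priority !== "number" || !Number.isInteger(priority)) {
    errors.push("Field 'priority' must be an integer");
  }

  // The typeof checks are repeated so the compiler narrows both fields.
  if (errors.length === 0 && typeof title === "string" && typeof priority === "number") {
    return { success: true, data: { title, priority } };
  }
  return { success: false, errors };
}
```

Because every error names the field and the rule it broke, the same strings can be passed to the retry prompt unchanged.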

Phase 3: Retry Orchestration

When validation fails, the pipeline must decide whether to retry. Successful retries require the model to understand exactly what went wrong. The orchestrator formats validation errors into a follow-up prompt, appends it to the conversation history, and re-invokes the LLM.

The retry prompt should include:

The original validation errors in human-readable form
An explicit instruction to return only valid JSON
A reminder to avoid markdown fences or prose

Models self-correct effectively when given precise error context. Generic failure messages like "invalid output" provide insufficient signal for correction.
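
The three ingredients of the retry prompt can be assembled by a small pure function. This is a sketch only: `buildRetryPrompt` is a hypothetical helper, and the exact wording of the instructions is an assumption you should adapt to your model.

```typescript
// Builds the follow-up prompt for a retry: the original request, the
// numbered validation errors, and the formatting instructions.
function buildRetryPrompt(basePrompt: string, errors: string[]): string {
  const errorLines =
    errors.length > 0
      ? errors.map((message, index) => `${index + 1}. ${message}`).join("\n")
      : "1. Unknown validation failure";

  return [
    basePrompt,
    "",
    "Your previous response failed validation with these errors:",
    errorLines,
    "",
    "Return ONLY valid JSON.",
    "Do not include markdown fences or explanatory text.",
  ].join("\n");
}
```

Keeping this function separate from the orchestrator makes the prompt wording easy to test and tune without touching the retry loop.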

Implementation Architecture

The following TypeScript implementation demonstrates the complete pipeline. The design prioritizes explicit contracts, observable state, and vendor neutrality.

import { z } from "zod";

// Domain schema definition
const UserProfileSchema = z.object({
  fullName: z.string().min(2),
  yearsExperience: z.number().int().min(0),
  primaryRole: z.enum(["frontend", "backend", "devops", "data"]),
  contactEmail: z.string().email()
});

// Standardized validation contract
interface ValidationResult<T> {
  success: boolean;
  data?: T;
  errors?: string[];
}

// Core pipeline configuration
interface PipelineConfig {
  generate: (prompt: string) => Promise<string>;
  validate: (parsed: unknown) => ValidationResult<unknown>;
  maxAttempts: number;
  repairOnly?: boolean;
}

// Syntax repair utilities
function sanitizeLLMOutput(raw: string): string {
  let cleaned = raw.trim();
  
  // Remove markdown code fences
  cleaned = cleaned.replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/i, "");
  
  // Fix trailing commas before closing brackets/braces
  cleaned = cleaned.replace(/,(\s*[}\]])/g, "$1");
  
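  // Normalize single quotes to double quotes (basic heuristic). Only applied
  // when the text contains no double quotes at all, so apostrophes inside
  // already-valid JSON strings are not corrupted.
  if (!cleaned.includes('"')) {
    cleaned = cleaned.replace(/'/g, '"');
  }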
  
  return cleaned;
}

// Main pipeline class
class OutputPipeline {
  private config: PipelineConfig;
  private attemptLog: AttemptRecord[] = [];

  constructor(config: PipelineConfig) {
    this.config = config;
  }

  async execute(initialPrompt: string): Promise<unknown> {
    let currentPrompt = initialPrompt;
    
    for (let attempt = 1; attempt <= this.config.maxAttempts; attempt++) {
      const startTime = performance.now();
      let rawResponse = ""; // declared outside try so the catch block can log it
      
      try {
        rawResponse = await this.config.generate(currentPrompt);
        const sanitized = sanitizeLLMOutput(rawResponse);
        const parsed = JSON.parse(sanitized);
        
        const validation = this.config.validate(parsed);
        
        if (validation.success) {
          this.logAttempt(attempt, rawResponse, parsed, null, performance.now() - startTime);
          return validation.data;
        }
        
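        if (this.config.repairOnly) {
          // Record the failed attempt first so the thrown history is complete
          this.logAttempt(attempt, rawResponse, parsed, validation.errors ?? ["Unknown validation failure"], performance.now() - startTime);
          throw new OutputPipelineError("Validation failed in repair-only mode", this.attemptLog);
        }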
        
        // Prepare retry context
        const errorSummary = validation.errors?.join("\n") || "Unknown validation failure";
        currentPrompt = `${currentPrompt}\n\nYour previous response failed validation with these errors:\n${errorSummary}\n\nPlease return ONLY valid JSON. Do not include markdown fences or explanatory text.`;
        
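        // errors is optional on ValidationResult, so fall back to the summary to satisfy string[] | null
        this.logAttempt(attempt, rawResponse, parsed, validation.errors ?? [errorSummary], performance.now() - startTime);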
        
      } catch (err) {
        if (err instanceof OutputPipelineError) throw err;
        
        this.logAttempt(attempt, rawResponse || "", null, [err instanceof Error ? err.message : "Parse failure"], performance.now() - startTime);
        
        if (attempt === this.config.maxAttempts) {
          throw new OutputPipelineError("All retry attempts exhausted", this.attemptLog);
        }
        
        currentPrompt = `${currentPrompt}\n\nYour previous response could not be parsed as JSON. Please return ONLY valid JSON. Do not include markdown fences.`;
      }
    }
    
    throw new OutputPipelineError("Pipeline execution completed without success", this.attemptLog);
  }

  private logAttempt(
    attempt: number,
    raw: string,
    parsed: unknown,
    errors: string[] | null,
    latency: number
  ) {
    this.attemptLog.push({ attempt, raw, parsed, errors, latency });
  }
}

class OutputPipelineError extends Error {
  constructor(message: string, public readonly history: AttemptRecord[]) {
    super(message);
    this.name = "OutputPipelineError";
  }
}

interface AttemptRecord {
  attempt: number;
  raw: string;
  parsed: unknown;
  errors: string[] | null;
  latency: number;
}

Architecture Decisions and Rationale

Separation of Repair and Validation: Repair operates on string-level syntax. Validation operates on object-level semantics. Mixing these concerns creates unpredictable failure states. By isolating repair, you ensure that schema validators only process syntactically valid JSON, reducing false negatives.

Validator-Agnostic Contract: The pipeline class itself does not depend on any schema library; Zod appears only in the schema definition and the validator adapter. The validate function takes the parsed value and returns the standardized ValidationResult shape. This design prevents vendor lock-in and allows teams to migrate between Zod, Ajv, or custom validators without rewriting orchestration logic.

Error-Driven Retry Prompts: Models self-correct when given precise feedback. The retry prompt injects validation errors directly into the conversation context. This transforms the LLM from a blind generator into a self-debugging system. Generic retry prompts without error context yield significantly lower success rates.

Attempt History Tracking: Every execution records raw output, parsed objects, validation errors, and latency. This data is critical for production monitoring, cost analysis, and debugging. Throwing errors with attached history ensures observability without external logging dependencies.

Pitfall Guide

1. Assuming Heuristic Repair Handles All Malformation

Explanation: The repair phase fixes common syntax patterns. It cannot reconstruct responses where the model outputs prose with embedded JSON fragments or completely ignores the requested structure. Fix: Implement a severity threshold. If repair fails to produce valid JSON, immediately trigger a retry with explicit structural instructions. Do not attempt complex string manipulation that risks data corruption.

2. Using Generic Validation Error Messages

Explanation: Returning messages like "invalid input" or "schema mismatch" provides insufficient signal for the model to self-correct. The model cannot infer which field failed or why. Fix: Map schema validation errors to human-readable, field-specific messages. Zod's .issues array naturally provides path and message data. Format these explicitly in the retry prompt.

3. Ignoring Context Window Limits During Retries

Explanation: Each retry appends error context to the prompt. Without monitoring, the conversation can exceed the model's context window, causing truncation or degraded performance. Fix: Track cumulative prompt length. Implement a retry budget that caps total token usage. Consider summarizing previous attempts or truncating older context when approaching limits.
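
One way to keep retries inside a budget is to rebuild each retry prompt from the original prompt plus only the latest errors, instead of appending to the previous prompt. The sketch below assumes a rough estimate of 4 characters per token, which is only a rule of thumb; `estimateTokens`, `RetryBudget`, and `nextPromptWithinBudget` are hypothetical names, and real budgeting should use your provider's tokenizer.

```typescript
// ASSUMPTION: roughly 4 characters per token. This is a crude heuristic,
// not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface RetryBudget {
  maxPromptTokens: number; // hypothetical cap, set below the model's context window
}

// Builds the next prompt from the ORIGINAL prompt plus as many of the
// latest errors as fit. Returns null when no error fits (or none were
// given), which the caller can treat as "stop retrying".
function nextPromptWithinBudget(
  originalPrompt: string,
  latestErrors: string[],
  budget: RetryBudget
): string | null {
  const header = "\n\nYour previous response failed validation with these errors:\n";
  const footer = "\n\nPlease return ONLY valid JSON.";
  const kept: string[] = [];

  for (const error of latestErrors) {
    const candidate = originalPrompt + header + kept.concat(error).join("\n") + footer;
    if (estimateTokens(candidate) > budget.maxPromptTokens) break;
    kept.push(error);
  }

  if (kept.length === 0) return null;
  return originalPrompt + header + kept.join("\n") + footer;
}
```

With this approach the prompt size is bounded by the original prompt plus one round of errors, regardless of how many attempts have been made.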

4. Hardcoding Retry Counts Without Circuit Breakers

Explanation: Fixed retry limits can lead to excessive API costs and latency spikes when the model consistently fails due to prompt design flaws or capability mismatches. Fix: Implement dynamic retry logic based on error classification. Distinguish between syntax errors (retry) and semantic impossibilities (fail fast). Add cost-aware circuit breakers that halt execution after a budget threshold.
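
A minimal sketch of error classification combined with a budget-based breaker follows. The `FailureKind` categories, the refusal patterns, and the `RetryBreaker` class are all assumptions made for illustration; in particular, treating a refusal as a fail-fast case is one possible policy, not a rule from any provider.

```typescript
type FailureKind = "syntax" | "schema" | "refusal";

// Classifies a failed attempt. The refusal patterns are hypothetical
// examples; tune them to what your model actually emits.
function classifyFailure(raw: string, parseError: string | null): FailureKind {
  const looksLikeRefusal = /\b(i cannot|i can't|unable to)\b/i.test(raw);
  if (looksLikeRefusal && raw.indexOf("{") === -1) return "refusal";
  return parseError !== null ? "syntax" : "schema";
}

// Stops retrying once the attempt cap or the token budget is reached.
class RetryBreaker {
  private attempts = 0;
  private spentTokens = 0;
  private readonly maxAttempts: number;
  private readonly maxTokens: number;

  constructor(maxAttempts: number, maxTokens: number) {
    this.maxAttempts = maxAttempts;
    this.maxTokens = maxTokens;
  }

  // Call once per completed attempt with the tokens it consumed.
  record(tokensUsed: number): void {
    this.attempts += 1;
    this.spentTokens += tokensUsed;
  }

  shouldRetry(kind: FailureKind): boolean {
    if (kind === "refusal") return false; // fail fast in this sketch
    if (this.attempts >= this.maxAttempts) return false;
    return this.spentTokens < this.maxTokens;
  }
}
```

The orchestrator would call `record` after each attempt and consult `shouldRetry` before building the next prompt, so a run of unfixable responses stops before it consumes the full attempt count.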

5. Skipping Attempt History for Production Monitoring

Explanation: Without structured attempt logs, debugging production failures requires guessing. You lose visibility into whether failures stem from syntax, schema, or model capability. Fix: Always attach attempt history to thrown errors. Integrate with your observability stack to track success rates, average retries, and latency distributions. Use this data to refine prompts and adjust retry budgets.
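
As a sketch of what can be derived from the attempt history, the function below reduces a list of `AttemptRecord` entries to a few numbers suitable for a metrics backend. `RunSummary` and `summarizeRun` are hypothetical names, and the sketch assumes the logging convention used in the pipeline above: `errors` is null on success and `parsed` is null when parsing failed.

```typescript
interface AttemptRecord {
  attempt: number;
  raw: string;
  parsed: unknown;
  errors: string[] | null;
  latency: number;
}

interface RunSummary {
  attempts: number;
  succeeded: boolean;
  totalLatencyMs: number;
  parseFailures: number; // attempts where nothing could be parsed
  schemaFailures: number; // attempts that parsed but failed validation
}

function summarizeRun(history: AttemptRecord[]): RunSummary {
  let totalLatencyMs = 0;
  let parseFailures = 0;
  let schemaFailures = 0;

  for (const record of history) {
    totalLatencyMs += record.latency;
    if (record.errors === null) continue; // successful attempt
    // Caveat: a response that is the literal JSON value null is counted
    // as a parse failure under this convention.
    if (record.parsed === null) parseFailures += 1;
    else schemaFailures += 1;
  }

  const last = history.length > 0 ? history[history.length - 1] : undefined;
  return {
    attempts: history.length,
    succeeded: last !== undefined && last.errors === null,
    totalLatencyMs,
    parseFailures,
    schemaFailures,
  };
}
```

Splitting failures into parse and schema counts shows whether to invest in the repair step or in the prompt and schema instructions.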

6. Applying the Pipeline to Streaming Outputs

Explanation: The repair-validate-retry architecture operates on complete responses. Streaming token-by-token validation breaks the repair phase and complicates retry orchestration. Fix: Use provider-native structured output modes for streaming scenarios. If streaming is mandatory, implement incremental validation that buffers complete JSON objects before schema checking, accepting higher complexity for lower latency.
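
If streaming is unavoidable, the buffering step might look like the sketch below: it accumulates chunks and reports when one complete top-level JSON object has arrived, so that repair and validation still run on a whole payload. `JsonObjectBuffer` is a hypothetical class; it handles a single object per stream and ignores anything before the first opening brace or after the matching closing brace.

```typescript
// Buffers streamed chunks until one complete top-level JSON object is
// available. Tracks brace depth and ignores braces inside string literals.
class JsonObjectBuffer {
  private text = "";
  private depth = 0;
  private started = false;
  private inString = false;
  private escaped = false;

  // Returns the full object text once its closing brace arrives, else null.
  push(chunk: string): string | null {
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk.charAt(i);

      if (!this.started) {
        if (ch !== "{") continue; // skip prose or fences before the payload
        this.started = true;
      }
      this.text += ch;

      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }

      if (ch === '"') {
        this.inString = true;
      } else if (ch === "{") {
        this.depth += 1;
      } else if (ch === "}") {
        this.depth -= 1;
        if (this.depth === 0) return this.text;
      }
    }
    return null;
  }
}
```

The returned text still goes through the normal repair and validation phases; the buffer only decides when there is enough input to try.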

7. Relying on the Pipeline to Fix Poor Prompts

Explanation: The pipeline handles post-generation enforcement. It cannot compensate for ambiguous system prompts, missing schema examples, or contradictory instructions. Fix: Treat the pipeline as a safety net, not a prompt replacement. Invest in prompt engineering: provide explicit JSON examples, specify field constraints, and forbid conversational filler. The pipeline should handle edge cases, not fundamental design flaws.

Production Bundle

Action Checklist

Define explicit JSON schema with field types, constraints, and examples in the system prompt
Implement heuristic repair covering markdown fences, trailing commas, and quote normalization
Create a validator function that returns standardized success/error shapes
Configure retry logic with explicit error injection into follow-up prompts
Attach attempt history to all pipeline errors for production debugging
Set maximum retry limits with cost-aware circuit breakers
Monitor success rates, average latency, and retry distribution in production
Validate pipeline behavior against edge cases: long responses, high temperature, context pressure

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Provider supports native structured output	Use provider-native mode	Highest compliance, lowest latency, no custom logic	Baseline
Multi-provider deployment or legacy models	Repair-validate-retry pipeline	Vendor-neutral, consistent enforcement across models	+15-25% (retry overhead)
Strict latency requirements (<200ms)	Repair-only mode	Fixes syntax without retry delay	+5% (repair compute)
Streaming token-by-token output	Provider-native streaming or incremental parser	Architecture mismatch with batch retry loop	+30% (complexity)
High-volume extraction with budget constraints	Circuit-broken retry with error classification	Prevents cost spikes on unfixable prompts	-20% (failed fast)

Configuration Template

import { z } from "zod";
// Assumes output-pipeline.ts exports both classes from the core solution
import { OutputPipeline, OutputPipelineError } from "./output-pipeline";

// 1. Define domain schema
const InvoiceSchema = z.object({
  invoiceId: z.string().uuid(),
  amount: z.number().positive(),
  currency: z.string().length(3),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number().int().positive(),
    unitPrice: z.number().positive()
  }))
});

// 2. Create validator adapter
const validateInvoice = (input: unknown) => {
  const result = InvoiceSchema.safeParse(input);
  if (result.success) {
    return { success: true, data: result.data };
  }
  return {
    success: false,
    errors: result.error.issues.map(issue => 
      `Field '${issue.path.join('.')}' ${issue.message}`
    )
  };
};

// 3. Configure pipeline
const invoicePipeline = new OutputPipeline({
  generate: async (prompt) => {
    // Replace with your LLM client implementation
    return await llmClient.complete(prompt, { temperature: 0.2 });
  },
  validate: validateInvoice,
  maxAttempts: 3,
  repairOnly: false
});

// 4. Execute with structured prompt
const prompt = `Extract invoice details from the following text. Return ONLY valid JSON matching the schema. Do not include markdown fences or explanatory text.

Text: ${rawInvoiceText}`;

try {
  const invoice = await invoicePipeline.execute(prompt);
  console.log("Parsed invoice:", invoice);
} catch (err) {
  if (err instanceof OutputPipelineError) {
    console.error("Pipeline failed after", err.history.length, "attempts");
    err.history.forEach(attempt => {
      console.log(`Attempt ${attempt.attempt}: ${attempt.latency}ms | Errors: ${attempt.errors?.join(", ") || "None"}`);
    });
  }
}

Quick Start Guide

Install dependencies: Add zod (or your preferred schema library) and create the pipeline class from the core solution section.
Define your schema: Write a strict schema with explicit types, constraints, and examples. Include this in your system prompt alongside clear formatting instructions.
Wire your LLM client: Replace the generate function with your provider's API call. Set temperature ≤ 0.3 for structured extraction tasks.
Execute and monitor: Run the pipeline with your extraction prompt. Log attempt history on failure. Track success rates and adjust retry budgets based on production data.
Iterate on prompts: If retry rates exceed 30%, refine your system prompt with explicit JSON examples and stricter formatting rules. The pipeline should handle edge cases, not compensate for ambiguous instructions.
