agentcast: Validate and Retry LLM JSON Responses Until They Match Your Schema
Engineering Deterministic LLM Output: The Repair-Validate-Retry Architecture
Current Situation Analysis
Large language models are fundamentally probabilistic text generators. When developers request structured data, they typically append a JSON schema to the system prompt and instruct the model to respond strictly in that format. In controlled testing environments with short prompts and low temperature settings, this approach appears reliable. The model complies, the parser succeeds, and the pipeline moves forward.
Production traffic exposes a different reality. Under real-world conditions, approximately one in five LLM responses fails strict JSON parsing or schema validation. These failures are not random noise. They follow predictable patterns tied to generation length, context window pressure, and temperature variance.
The most common failure modes include:
- Trailing commas in object or array literals, which cause native JSON.parse to throw
- Markdown code fences wrapping the output, which breaks parsers expecting raw JSON
- Type coercion mismatches, where the model returns a string representation of a number or boolean
- Prose contamination, where conversational filler surrounds the structured payload
These patterns correlate directly with operational conditions. Trailing commas spike when response length increases. Markdown fences appear frequently when the system prompt consumes a significant portion of the context window. Type mismatches occur when the model prioritizes semantic fluency over syntactic rigidity.
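To make the failure modes concrete, the sketch below feeds one sample per mode to the native parser. The sample strings and the parses helper are illustrative assumptions, not captured model output.

```typescript
// Hypothetical samples, one per failure mode described above.
const failureSamples: Record<string, string> = {
  trailingComma: '{"name": "Ada", "tags": ["a", "b",]}',
  markdownFence: '```json\n{"name": "Ada"}\n```',
  proseContamination: 'Sure! Here is the JSON:\n{"name": "Ada"}',
  typeMismatch: '{"name": "Ada", "yearsExperience": "7"}',
};

// Returns true when the native parser accepts the string as-is.
function parses(raw: string): boolean {
  try {
    JSON.parse(raw);
    return true;
  } catch (_err) {
    return false;
  }
}

const parseReport: Record<string, boolean> = {};
for (const mode of Object.keys(failureSamples)) {
  parseReport[mode] = parses(failureSamples[mode]);
}
console.log(parseReport);
```

The first three samples are rejected by the parser. The type mismatch sample parses cleanly, which is why it can only be caught by schema validation.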
The problem is routinely overlooked because developers treat LLM output as a black box. They assume that if the prompt contains a schema, the model will respect it. In reality, the model optimizes for token probability, not JSON compliance. Without an explicit enforcement layer, structured extraction becomes a game of statistical luck rather than engineering certainty.
WOW Moment: Key Findings
The shift from probabilistic hope to deterministic control comes from implementing a dedicated output enforcement loop. The following comparison illustrates the operational impact of three common approaches to LLM structured output.
| Approach | Parse Success Rate | Avg Latency (ms) | Schema Compliance | Implementation Complexity |
|---|---|---|---|---|
| Raw LLM Output | ~80% | 450 | Low (depends on prompt) | Minimal |
| Heuristic Repair + Retry | ~96% | 620 | High (enforced) | Moderate |
| Provider-Native Structured Output | ~99% | 480 | Very High | Low (if supported) |
The repair-validate-retry architecture bridges the gap between raw generation and native enforcement. By applying lightweight syntax correction first, then validating against a schema, and finally feeding validation errors back into the model for self-correction, teams achieve near-native compliance without vendor lock-in.
This finding matters because it decouples output reliability from model capability. You no longer need to wait for every provider to implement structured output modes, nor do you need to accept a 20% failure rate in production. The loop transforms inconsistent model behavior into a predictable, observable, and debuggable pipeline.
Core Solution
The architecture consists of three distinct phases: repair, validation, and retry orchestration. Each phase serves a specific purpose and must remain isolated to maintain predictability.
Phase 1: Heuristic Repair
LLM outputs frequently contain syntax errors that are trivial to fix but fatal to parsers. The repair phase applies deterministic transformations before any schema checking occurs. Common operations include:
- Stripping markdown code fences (```json and ```)
- Removing trailing commas before closing braces or brackets
- Converting single quotes to double quotes where appropriate
- Normalizing whitespace and line breaks
This phase is intentionally lightweight. It does not attempt to reconstruct severely malformed output. If the repair step cannot produce valid JSON, the pipeline records a parse failure and moves on to retry handling.
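One way to keep repair lightweight is to apply each transformation only when the text still fails to parse, and to record which steps were needed. The sketch below is a minimal illustration under that assumption; repairJson, repairSteps, and RepairResult are hypothetical names, and whitespace handling is reduced to a simple trim.

```typescript
interface RepairResult {
  ok: boolean;
  value?: unknown;
  applied: string[]; // names of the repair steps that changed the text
}

function tryParse(text: string): { ok: boolean; value?: unknown } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (_err) {
    return { ok: false };
  }
}

// Each step is a pure string transform, ordered from least to most invasive.
const repairSteps: Array<{ name: string; apply: (s: string) => string }> = [
  { name: "trim", apply: (s) => s.trim() },
  {
    name: "stripFences",
    apply: (s) => s.replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/, ""),
  },
  // Not string-aware: a value containing ",}" would also be altered.
  { name: "stripTrailingCommas", apply: (s) => s.replace(/,(\s*[}\]])/g, "$1") },
  {
    // Only applied when the payload has no double quotes at all; otherwise
    // apostrophes inside string values would be corrupted.
    name: "singleToDoubleQuotes",
    apply: (s) => (s.includes('"') ? s : s.replace(/'/g, '"')),
  },
];

function repairJson(raw: string): RepairResult {
  let current = raw;
  const applied: string[] = [];
  let attempt = tryParse(current);
  for (const step of repairSteps) {
    if (attempt.ok) break; // stop as soon as the text parses
    const next = step.apply(current);
    if (next !== current) {
      applied.push(step.name);
      current = next;
      attempt = tryParse(current);
    }
  }
  return { ok: attempt.ok, value: attempt.value, applied };
}

console.log(repairJson('```json\n{"a": 1,}\n```'));
```

The applied list doubles as a cheap metric: if one step fires on most responses, the prompt is probably the better place to fix it.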
Phase 2: Schema Validation
Once syntactically valid JSON is obtained, it must be validated against your domain schema. The validation layer should remain completely decoupled from the pipeline logic. This allows you to swap schema libraries (Zod, Ajv, custom validators) without modifying the orchestration code.
The validator receives a plain JavaScript object and returns a standardized result shape. This contract ensures the retry orchestrator can interpret success or failure uniformly.
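To illustrate the contract without tying it to a schema library, here is a hand-written validator that returns the standardized result shape. The Ticket type and its rules are hypothetical and exist only for this sketch.

```typescript
interface ValidationResult<T> {
  success: boolean;
  data?: T;
  errors?: string[];
}

// Hypothetical domain type used only for this illustration.
interface Ticket {
  title: string;
  priority: number;
}

function validateTicket(parsed: unknown): ValidationResult<Ticket> {
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { success: false, errors: ["Root value must be a JSON object"] };
  }
  const candidate = parsed as Record<string, unknown>;
  const title = candidate.title;
  const priority = candidate.priority;
  const errors: string[] = [];

  if (typeof title !== "string" || title.length === 0) {
    errors.push("Field 'title' must be a non-empty string");
  }
  if (
    typeof priority !== "number" ||
    !Number.isInteger(priority) ||
    priority < 1 ||
    priority > 5
  ) {
    errors.push("Field 'priority' must be an integer between 1 and 5");
  }

  if (errors.length > 0) return { success: false, errors };
  return { success: true, data: { title: title as string, priority: priority as number } };
}

console.log(validateTicket({ title: "Fix login", priority: "3" }));
```

Because the orchestrator only reads success, data, and errors, a function like this can later be swapped for a Zod or Ajv adapter with the same return shape.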
Phase 3: Retry Orchestration
When validation fails, the pipeline must decide whether to retry. Successful retries require the model to understand exactly what went wrong. The orchestrator formats validation errors into a follow-up prompt, appends it to the conversation history, and re-invokes the LLM.
The retry prompt should include:
- The original validation errors in human-readable form
- An explicit instruction to return only valid JSON
- A reminder to avoid markdown fences or prose
Models self-correct effectively when given precise error context. Generic failure messages like "invalid output" provide insufficient signal for correction.
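For chat-style APIs, the retry turn can be appended to the message history instead of being concatenated onto a single prompt string. The sketch below assumes a generic ChatMessage shape; that type and the function names are hypothetical and should be adapted to your provider's SDK.

```typescript
// Hypothetical chat message shape; adapt it to your provider's SDK.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Formats validation errors into a follow-up instruction for the model.
function buildRetryMessage(errors: string[]): string {
  const bulletList = errors.map((e) => `- ${e}`).join("\n");
  return [
    "Your previous response failed validation with these errors:",
    bulletList,
    "Return ONLY valid JSON that fixes every error above.",
    "Do not include markdown fences or explanatory text.",
  ].join("\n");
}

// Returns a new history; the input array is left untouched.
function appendRetryTurn(
  history: ChatMessage[],
  failedResponse: string,
  errors: string[]
): ChatMessage[] {
  const turns: ChatMessage[] = [
    { role: "assistant", content: failedResponse },
    { role: "user", content: buildRetryMessage(errors) },
  ];
  return history.concat(turns);
}

const firstTurn: ChatMessage[] = [{ role: "user", content: "Extract the profile as JSON." }];
const retryHistory = appendRetryTurn(firstTurn, '{"age": "ten"}', ["Field 'age' must be a number"]);
console.log(retryHistory.length);
```

Keeping the failed response in the history lets the model see exactly what it produced alongside the errors it needs to fix.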
Implementation Architecture
The following TypeScript implementation demonstrates the complete pipeline. The design prioritizes explicit contracts, observable state, and vendor neutrality.
```typescript
import { z } from "zod";

// Domain schema definition
const UserProfileSchema = z.object({
  fullName: z.string().min(2),
  yearsExperience: z.number().int().min(0),
  primaryRole: z.enum(["frontend", "backend", "devops", "data"]),
  contactEmail: z.string().email()
});

// Standardized validation contract
interface ValidationResult<T> {
  success: boolean;
  data?: T;
  errors?: string[];
}

// Core pipeline configuration
interface PipelineConfig {
  generate: (prompt: string) => Promise<string>;
  validate: (parsed: unknown) => ValidationResult<unknown>;
  maxAttempts: number;
  repairOnly?: boolean;
}

// Syntax repair utilities
function sanitizeLLMOutput(raw: string): string {
  let cleaned = raw.trim();
  // Remove markdown code fences
  cleaned = cleaned.replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/i, "");
  // Fix trailing commas before closing brackets/braces
  cleaned = cleaned.replace(/,(\s*[}\]])/g, "$1");
  // Normalize single quotes to double quotes (basic heuristic). Skipped when the
  // text already contains double quotes, so apostrophes inside values survive.
  if (!cleaned.includes('"')) {
    cleaned = cleaned.replace(/'/g, '"');
  }
  return cleaned;
}

// Main pipeline class
export class OutputPipeline {
  private config: PipelineConfig;
  private attemptLog: AttemptRecord[] = [];

  constructor(config: PipelineConfig) {
    this.config = config;
  }

  async execute(initialPrompt: string): Promise<unknown> {
    this.attemptLog = []; // start each run with a clean history
    let currentPrompt = initialPrompt;

    for (let attempt = 1; attempt <= this.config.maxAttempts; attempt++) {
      const startTime = performance.now();
      // Declared outside the try block so the catch handler can log it
      let rawResponse = "";
      try {
        rawResponse = await this.config.generate(currentPrompt);
        const sanitized = sanitizeLLMOutput(rawResponse);
        const parsed = JSON.parse(sanitized);
        const validation = this.config.validate(parsed);

        if (validation.success) {
          this.logAttempt(attempt, rawResponse, parsed, null, performance.now() - startTime);
          return validation.data;
        }

        // Record the failed attempt before deciding whether to retry
        this.logAttempt(attempt, rawResponse, parsed, validation.errors ?? null, performance.now() - startTime);

        if (this.config.repairOnly) {
          throw new OutputPipelineError("Validation failed in repair-only mode", this.attemptLog);
        }

        // Prepare retry context
        const errorSummary = validation.errors?.join("\n") || "Unknown validation failure";
        currentPrompt = `${currentPrompt}\n\nYour previous response failed validation with these errors:\n${errorSummary}\n\nPlease return ONLY valid JSON. Do not include markdown fences or explanatory text.`;
      } catch (err) {
        if (err instanceof OutputPipelineError) throw err;

        // Reached when generation or JSON.parse throws
        this.logAttempt(attempt, rawResponse, null, [err instanceof Error ? err.message : "Parse failure"], performance.now() - startTime);

        if (this.config.repairOnly) {
          throw new OutputPipelineError("Parsing failed in repair-only mode", this.attemptLog);
        }
        if (attempt === this.config.maxAttempts) {
          throw new OutputPipelineError("All retry attempts exhausted", this.attemptLog);
        }

        currentPrompt = `${currentPrompt}\n\nYour previous response could not be parsed as JSON. Please return ONLY valid JSON. Do not include markdown fences.`;
      }
    }

    throw new OutputPipelineError("Pipeline execution completed without success", this.attemptLog);
  }

  private logAttempt(
    attempt: number,
    raw: string,
    parsed: unknown,
    errors: string[] | null,
    latency: number
  ) {
    this.attemptLog.push({ attempt, raw, parsed, errors, latency });
  }
}

export class OutputPipelineError extends Error {
  constructor(message: string, public readonly history: AttemptRecord[]) {
    super(message);
    this.name = "OutputPipelineError";
  }
}

// One record per generation attempt, attached to every thrown pipeline error
export interface AttemptRecord {
  attempt: number;
  raw: string;
  parsed: unknown;
  errors: string[] | null;
  latency: number;
}
```
Architecture Decisions and Rationale
Separation of Repair and Validation
Repair operates on string-level syntax. Validation operates on object-level semantics. Mixing these concerns creates unpredictable failure states. By isolating repair, you ensure that schema validators only process syntactically valid JSON, so syntax noise never shows up as a schema error.
Validator-Agnostic Contract
The pipeline class does not depend on any schema library. The validate function accepts the parsed value and returns the standardized ValidationResult shape. This design prevents vendor lock-in and allows teams to migrate between Zod, Ajv, or custom validators without rewriting orchestration logic.
Error-Driven Retry Prompts
Models self-correct when given precise feedback. The retry prompt injects validation errors directly into the conversation context. This transforms the LLM from a blind generator into a self-debugging system. Generic retry prompts without error context yield significantly lower success rates.
Attempt History Tracking
Every execution records raw output, parsed objects, validation errors, and latency. This data is critical for production monitoring, cost analysis, and debugging. Throwing errors with attached history ensures observability without external logging dependencies.
Pitfall Guide
1. Assuming Heuristic Repair Handles All Malformation
Explanation: The repair phase fixes common syntax patterns. It cannot reconstruct responses where the model outputs prose with embedded JSON fragments or completely ignores the requested structure.
Fix: Implement a severity threshold. If repair fails to produce valid JSON, immediately trigger a retry with explicit structural instructions. Do not attempt complex string manipulation that risks data corruption.
2. Using Generic Validation Error Messages
Explanation: Returning messages like "invalid input" or "schema mismatch" provides insufficient signal for the model to self-correct. The model cannot infer which field failed or why.
Fix: Map schema validation errors to human-readable, field-specific messages. Zod's .issues array naturally provides path and message data. Format these explicitly in the retry prompt.
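A small formatter can turn issue objects into field-specific lines for the retry prompt. The sketch below assumes each issue exposes a path array of strings and numbers plus a message, as Zod 3 issues do; IssueLike, formatPath, and formatIssues are hypothetical names.

```typescript
// Minimal subset of an issue: a path into the value plus a message.
interface IssueLike {
  path: Array<string | number>;
  message: string;
}

// Renders ["lineItems", 0, "quantity"] as "lineItems[0].quantity".
function formatPath(path: Array<string | number>): string {
  if (path.length === 0) return "(root)";
  return path
    .map((segment, index) => {
      if (typeof segment === "number") return `[${segment}]`;
      return index === 0 ? segment : `.${segment}`;
    })
    .join("");
}

// Produces one human-readable line per issue.
function formatIssues(issues: IssueLike[]): string[] {
  return issues.map((issue) => `Field '${formatPath(issue.path)}': ${issue.message}`);
}

console.log(
  formatIssues([
    { path: ["lineItems", 0, "quantity"], message: "Expected number, received string" },
  ])
);
```

Array indices are rendered in bracket form so the model can tell which element of a list failed.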
3. Ignoring Context Window Limits During Retries
Explanation: Each retry appends error context to the prompt. Without monitoring, the conversation can exceed the model's context window, causing truncation or degraded performance.
Fix: Track cumulative prompt length. Implement a retry budget that caps total token usage. Consider summarizing previous attempts or truncating older context when approaching limits.
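One way to keep retries bounded is to rebuild each retry prompt from the original prompt plus only the latest errors, rather than appending to the previous retry prompt. The sketch below assumes a rough estimate of four characters per token, which is not a real tokenizer; all names here are hypothetical.

```typescript
// Rough heuristic, NOT a real tokenizer: assumes about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface BoundedPrompt {
  prompt: string;
  estimatedTokens: number;
  withinBudget: boolean;
}

// Rebuilds the retry prompt from the original prompt plus only the latest
// errors, so earlier retry instructions do not pile up across attempts.
function buildBoundedRetryPrompt(
  basePrompt: string,
  latestErrors: string[],
  maxPromptTokens: number
): BoundedPrompt {
  const header = "\n\nYour previous response failed validation with these errors:\n";
  const footer = "\n\nReturn ONLY valid JSON. Do not include markdown fences or explanatory text.";
  let used = estimateTokens(basePrompt + header + footer);
  const kept: string[] = [];

  for (const error of latestErrors) {
    const cost = estimateTokens(error + "\n");
    if (used + cost > maxPromptTokens) break; // drop errors that do not fit
    kept.push(error);
    used += cost;
  }

  const prompt = basePrompt + header + kept.join("\n") + footer;
  const estimatedTokens = estimateTokens(prompt);
  return { prompt, estimatedTokens, withinBudget: estimatedTokens <= maxPromptTokens };
}

const bounded = buildBoundedRetryPrompt(
  "Extract the invoice as JSON.",
  ["Field 'amount' must be positive", "Field 'currency' must have 3 characters"],
  4000
);
console.log(bounded.estimatedTokens, bounded.withinBudget);
```

When withinBudget is false even with no errors included, the base prompt alone exceeds the budget and retrying is unlikely to help. For real budgets, swap the estimate for your provider's tokenizer.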
4. Hardcoding Retry Counts Without Circuit Breakers
Explanation: Fixed retry limits can lead to excessive API costs and latency spikes when the model consistently fails due to prompt design flaws or capability mismatches.
Fix: Implement dynamic retry logic based on error classification. Distinguish between syntax errors (retry) and semantic impossibilities (fail fast). Add cost-aware circuit breakers that halt execution after a budget threshold.
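A circuit breaker along these lines could be sketched as follows. The failure categories, the refusal phrases, and the rule that repeated schema failures count as stuck are all assumptions made for illustration, not part of the pipeline shown earlier.

```typescript
type FailureKind = "syntax" | "schema" | "refusal";

// Hypothetical classifier: the refusal phrases are illustrative guesses.
function classifyFailure(raw: string, parsedOk: boolean): FailureKind {
  if (parsedOk) return "schema"; // parsed fine, so the schema rejected it
  const looksLikeRefusal = /\b(i can(?:'|no)t|i am unable|i'm unable)\b/i.test(raw);
  return looksLikeRefusal && !raw.includes("{") ? "refusal" : "syntax";
}

interface BreakerState {
  attempts: number;
  spentTokens: number;
  lastKinds: FailureKind[]; // oldest first
}

interface BreakerLimits {
  maxAttempts: number;
  maxTokens: number;
  maxRepeatedSchemaFailures: number; // assumed to be at least 1
}

// Decides whether another attempt is worth its cost.
function shouldRetry(state: BreakerState, limits: BreakerLimits): boolean {
  if (state.attempts >= limits.maxAttempts) return false;
  if (state.spentTokens >= limits.maxTokens) return false; // budget exhausted
  const last = state.lastKinds[state.lastKinds.length - 1];
  if (last === "refusal") return false; // fail fast on refusals

  // The same schema failure several times in a row suggests a prompt or
  // capability problem that further retries will not fix.
  const recent = state.lastKinds.slice(-limits.maxRepeatedSchemaFailures);
  const stuck =
    recent.length === limits.maxRepeatedSchemaFailures &&
    recent.every((kind) => kind === "schema");
  return !stuck;
}

const limits: BreakerLimits = { maxAttempts: 3, maxTokens: 1000, maxRepeatedSchemaFailures: 2 };
console.log(shouldRetry({ attempts: 1, spentTokens: 100, lastKinds: ["syntax"] }, limits));
```

The thresholds are placeholders; tune them from the attempt history collected in production.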
5. Skipping Attempt History for Production Monitoring
Explanation: Without structured attempt logs, debugging production failures requires guessing. You lose visibility into whether failures stem from syntax, schema, or model capability.
Fix: Always attach attempt history to thrown errors. Integrate with your observability stack to track success rates, average retries, and latency distributions. Use this data to refine prompts and adjust retry budgets.
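The attempt history can be reduced to a few counters before it is shipped to a metrics backend. The sketch below reuses the AttemptRecord shape from the implementation section; RunSummary and summarizeRun are hypothetical names, and the parse-versus-schema split relies on the convention that parse failures are logged with a null parsed value.

```typescript
interface AttemptRecord {
  attempt: number;
  raw: string;
  parsed: unknown;
  errors: string[] | null;
  latency: number;
}

interface RunSummary {
  attempts: number;
  succeeded: boolean;
  parseFailures: number;  // failed attempts with no parsed value recorded
  schemaFailures: number; // attempts that parsed but failed validation
  totalLatencyMs: number;
}

function summarizeRun(history: AttemptRecord[]): RunSummary {
  const summary: RunSummary = {
    attempts: history.length,
    succeeded: false,
    parseFailures: 0,
    schemaFailures: 0,
    totalLatencyMs: 0,
  };
  for (const record of history) {
    summary.totalLatencyMs += record.latency;
    if (record.errors === null) {
      summary.succeeded = true; // a record without errors is the successful attempt
    } else if (record.parsed === null) {
      summary.parseFailures += 1;
    } else {
      summary.schemaFailures += 1;
    }
  }
  return summary;
}

const runSummary = summarizeRun([
  { attempt: 1, raw: "Sure! {", parsed: null, errors: ["Unexpected token"], latency: 400 },
  { attempt: 2, raw: '{"amount": "5"}', parsed: { amount: "5" }, errors: ["Field 'amount' must be a number"], latency: 350 },
  { attempt: 3, raw: '{"amount": 5}', parsed: { amount: 5 }, errors: null, latency: 300 },
]);
console.log(runSummary);
```

One limitation: a response that is the literal JSON value null and then fails validation is counted as a parse failure here.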
6. Applying the Pipeline to Streaming Outputs
Explanation: The repair-validate-retry architecture operates on complete responses. Streaming token-by-token validation breaks the repair phase and complicates retry orchestration.
Fix: Use provider-native structured output modes for streaming scenarios. If streaming is mandatory, implement incremental validation that buffers complete JSON objects before schema checking, accepting higher complexity for lower latency.
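If streaming is mandatory, the buffering step might look like the sketch below: it accumulates chunks and reports when one complete top-level JSON object has arrived, tracking string and escape state so braces inside string values are ignored. JsonObjectBuffer is a hypothetical helper that handles a single object per stream and skips any text before the first opening brace.

```typescript
// Buffers streamed chunks until one complete top-level JSON object is present.
class JsonObjectBuffer {
  private buffer = "";
  private depth = 0;
  private inString = false;
  private escaped = false;
  private started = false;
  private complete = false;

  // Returns the complete JSON text once the top-level object closes, else null.
  push(chunk: string): string | null {
    if (this.complete) return this.buffer;
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk.charAt(i);
      if (!this.started) {
        if (ch !== "{") continue; // skip anything before the object starts
        this.started = true;
      }
      this.buffer += ch;
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }
      if (ch === '"') {
        this.inString = true;
      } else if (ch === "{") {
        this.depth++;
      } else if (ch === "}") {
        this.depth--;
        if (this.depth === 0) {
          this.complete = true; // ignore anything after the closing brace
          return this.buffer;
        }
      }
    }
    return null;
  }
}

const streamBuffer = new JsonObjectBuffer();
const streamResults = ['{"a": "x}', 'y", "b": {"c"', ": 1}}", " trailing"].map((chunk) =>
  streamBuffer.push(chunk)
);
console.log(streamResults);
```

Once push returns a string, hand it to the same repair and validation steps used for non-streamed responses.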
7. Relying on the Pipeline to Fix Poor Prompts
Explanation: The pipeline handles post-generation enforcement. It cannot compensate for ambiguous system prompts, missing schema examples, or contradictory instructions.
Fix: Treat the pipeline as a safety net, not a prompt replacement. Invest in prompt engineering: provide explicit JSON examples, specify field constraints, and forbid conversational filler. The pipeline should handle edge cases, not fundamental design flaws.
Production Bundle
Action Checklist
- Define explicit JSON schema with field types, constraints, and examples in the system prompt
- Implement heuristic repair covering markdown fences, trailing commas, and quote normalization
- Create a validator function that returns standardized success/error shapes
- Configure retry logic with explicit error injection into follow-up prompts
- Attach attempt history to all pipeline errors for production debugging
- Set maximum retry limits with cost-aware circuit breakers
- Monitor success rates, average latency, and retry distribution in production
- Validate pipeline behavior against edge cases: long responses, high temperature, context pressure
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Provider supports native structured output | Use provider-native mode | Highest compliance, lowest latency, no custom logic | Baseline |
| Multi-provider deployment or legacy models | Repair-validate-retry pipeline | Vendor-neutral, consistent enforcement across models | +15-25% (retry overhead) |
| Strict latency requirements (<200ms) | Repair-only mode | Fixes syntax without retry delay | +5% (repair compute) |
| Streaming token-by-token output | Provider-native streaming or incremental parser | Architecture mismatch with batch retry loop | +30% (complexity) |
| High-volume extraction with budget constraints | Circuit-broken retry with error classification | Prevents cost spikes on unfixable prompts | -20% (fail fast) |
Configuration Template
```typescript
import { z } from "zod";
import { OutputPipeline, OutputPipelineError } from "./output-pipeline";

// Placeholders: supply your own LLM client and input text.
declare const llmClient: {
  complete(prompt: string, options: { temperature: number }): Promise<string>;
};
declare const rawInvoiceText: string;

// 1. Define domain schema
const InvoiceSchema = z.object({
  invoiceId: z.string().uuid(),
  amount: z.number().positive(),
  currency: z.string().length(3),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number().int().positive(),
    unitPrice: z.number().positive()
  }))
});

// 2. Create validator adapter
const validateInvoice = (input: unknown) => {
  const result = InvoiceSchema.safeParse(input);
  if (result.success) {
    return { success: true, data: result.data };
  }
  return {
    success: false,
    // One field-specific message per issue, ready for the retry prompt
    errors: result.error.issues.map(issue =>
      `Field '${issue.path.join('.')}' ${issue.message}`
    )
  };
};

// 3. Configure pipeline
const invoicePipeline = new OutputPipeline({
  generate: async (prompt) => {
    // Replace with your LLM client implementation
    return await llmClient.complete(prompt, { temperature: 0.2 });
  },
  validate: validateInvoice,
  maxAttempts: 3,
  repairOnly: false
});

// 4. Execute with structured prompt
const prompt = `Extract invoice details from the following text. Return ONLY valid JSON matching the schema. Do not include markdown fences or explanatory text.
Text: ${rawInvoiceText}`;

try {
  const invoice = await invoicePipeline.execute(prompt);
  console.log("Parsed invoice:", invoice);
} catch (err) {
  if (err instanceof OutputPipelineError) {
    console.error("Pipeline failed after", err.history.length, "attempts");
    err.history.forEach(attempt => {
      console.log(`Attempt ${attempt.attempt}: ${attempt.latency}ms | Errors: ${attempt.errors?.join(", ") || "None"}`);
    });
  } else {
    throw err; // do not swallow unrelated errors
  }
}
```
Quick Start Guide
- Install dependencies: Add zod (or your preferred schema library) and create the pipeline class from the core solution section.
- Define your schema: Write a strict schema with explicit types, constraints, and examples. Include this in your system prompt alongside clear formatting instructions.
- Wire your LLM client: Replace the generate function with your provider's API call. Set temperature ≤ 0.3 for structured extraction tasks.
- Execute and monitor: Run the pipeline with your extraction prompt. Log attempt history on failure. Track success rates and adjust retry budgets based on production data.
- Iterate on prompts: If retry rates exceed 30%, refine your system prompt with explicit JSON examples and stricter formatting rules. The pipeline should handle edge cases, not compensate for ambiguous instructions.
