Phase 1: Define the Output Contract
Start with a strict schema that maps directly to your downstream data model. Use a validation library that supports runtime checking and TypeScript type inference. The schema should enforce types, enums, required fields, and length constraints. Avoid optional fields unless the business logic explicitly permits missing data.
import { z } from 'zod';

export const PolicyChangeSchema = z.object({
  policy_id: z.string().uuid(),
  jurisdiction: z.enum(['federal', 'state', 'municipal', 'international']),
  change_category: z.enum(['amendment', 'new_regulation', 'repeal', 'enforcement_guidance']),
  affected_sectors: z.array(z.string()).min(1),
  effective_date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  source_reference: z.string().url(),
  confidence_score: z.number().min(0).max(1),
  requires_human_review: z.boolean()
});

export type PolicyChange = z.infer<typeof PolicyChangeSchema>;
Phase 2: Pre-Filter Context Before Generation
Hallucination in domain-specific pipelines is rarely a model failure; it's a context pollution problem. Classical retrieval and relevance scoring must occur before the LLM receives any input. Filter documents by recency, jurisdictional match, and semantic similarity. Trim context to the minimum viable window.
interface ContextDocument {
  id: string;
  text: string;
  relevance_score: number;
  publication_date: string;
}

function curateContext(rawDocs: ContextDocument[], targetJurisdiction: string): string[] {
  const cutoffDate = new Date();
  cutoffDate.setFullYear(cutoffDate.getFullYear() - 2);

  return rawDocs
    .filter(doc => {
      const isRecent = new Date(doc.publication_date) >= cutoffDate;
      const isRelevant = doc.relevance_score > 0.75;
      // Lowercase both sides so 'Federal' in the document still matches 'federal'
      const matchesJurisdiction = doc.text.toLowerCase().includes(targetJurisdiction.toLowerCase());
      return isRecent && isRelevant && matchesJurisdiction;
    })
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .slice(0, 5)
    .map(doc => doc.text);
}
Phase 3: Constrained Generation
Pass the curated context and the schema to the model using provider-native structured output APIs. These APIs enforce JSON schema compliance at the token level, preventing invalid structures from being generated. Configure the request to return only the structured payload, stripping any conversational filler.
import { OpenAI } from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { PolicyChangeSchema, PolicyChange } from './schema'; // Phase 1 definitions

const client = new OpenAI();

async function generatePolicyUpdate(
  context: string[],
  query: string
): Promise<PolicyChange> {
  const systemPrompt = `
You are a regulatory analysis engine. Extract policy changes from the provided context.
Output must strictly follow the defined JSON schema. Do not include explanations.
`;

  const response = await client.chat.completions.create({
    model: 'gpt-4o-2024-08-06',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: `Context:\n${context.join('\n---\n')}\n\nQuery: ${query}` }
    ],
    // zodResponseFormat converts the Zod schema into the JSON Schema payload the
    // API expects; passing the Zod object directly is not a valid response_format.
    response_format: zodResponseFormat(PolicyChangeSchema, 'policy_change'),
    temperature: 0.1,
    max_tokens: 1024
  });

  const rawOutput = response.choices[0].message.content;
  if (!rawOutput) throw new Error('Empty model response');

  // Re-validate locally: provider enforcement is not a substitute for a local gate.
  return PolicyChangeSchema.parse(JSON.parse(rawOutput));
}
Phase 4: Schema-Driven Routing
Use schema fields to route outputs deterministically. The requires_human_review flag should be populated by the model based on explicit rules encoded in the system prompt, but validated against business thresholds. Route high-confidence, low-risk updates directly to ingestion pipelines. Queue flagged items for compliance review.
function routePolicyUpdate(update: PolicyChange): 'ingest' | 'review_queue' {
  if (update.requires_human_review || update.confidence_score < 0.85) {
    return 'review_queue';
  }
  if (update.change_category === 'repeal' || update.affected_sectors.length > 3) {
    return 'review_queue';
  }
  return 'ingest';
}
Architecture Rationale:
- Schema as source of truth: The Zod schema drives validation, TypeScript types, and provider configuration. Changes propagate automatically across the stack.
- Context curation over prompt stuffing: Limiting input to high-signal documents reduces token cost, lowers latency, and prevents the model from hallucinating based on irrelevant noise.
- Deterministic routing: Business rules encoded in the schema and routing function replace probabilistic decision-making. The pipeline becomes auditable and version-controlled.
- Left-shifted validation: Validation occurs immediately after generation, before any downstream service touches the data. Failures are caught at the boundary, not in production databases.
Pitfall Guide
1. Schema Over-Engineering
Explanation: Teams add excessive nested objects, optional fields, and complex enums to capture every edge case. This increases token consumption, slows generation, and raises validation failure rates.
Fix: Start with a minimal viable schema. Add fields only when downstream systems explicitly require them. Use flat structures where possible. Reserve enums for closed sets; use strings for open-ended values.
2. Context Window Bloat
Explanation: Feeding entire regulatory documents or long conversation histories into the prompt dilutes signal-to-noise ratio. The model attends to irrelevant tokens, increasing hallucination probability and cost.
Fix: Implement a pre-generation filter that scores documents by recency, jurisdictional match, and semantic relevance. Cap context at 3–5 high-signal excerpts. Use chunking with overlap only when legal text requires contiguous clause preservation.
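When contiguous clause preservation does force chunking, the overlap logic can be sketched as below. The character-based split and the sizes are illustrative only; production code would split on clause or sentence boundaries:

```typescript
// Split text into fixed-size chunks with overlap so that a clause falling on
// a chunk boundary is fully contained in at least one chunk.
function chunkWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk already reached the end
  }
  return chunks;
}
```

Each chunk repeats the last `overlap` characters of its predecessor, so a boundary-straddling clause survives intact in one of the two copies.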
3. Treating Human Review as an Afterthought
Explanation: Review queues are often bolted on after compliance incidents. This creates race conditions where unvalidated data enters production before review completes.
Fix: Make review routing a first-class pipeline stage. The schema must include a requires_human_review boolean. Configure the model to set this flag based on explicit thresholds (low confidence, high-impact categories, conflicting sources). Route flagged items to a dedicated queue before database insertion.
4. Ignoring Schema Versioning
Explanation: Downstream systems evolve. Adding or removing fields breaks existing pipelines when schema changes aren't versioned or migrated.
Fix: Embed a schema_version field in every output. Maintain backward-compatible parsers that handle version deltas. Use feature flags to roll out schema changes incrementally. Never mutate a deployed schema without a migration strategy.
5. Blind Trust in Model Confidence
Explanation: Models can output high confidence scores even when extraction is incorrect. A self-reported confidence score is itself generated text, shaped by token probabilities rather than factual accuracy.
Fix: Treat confidence scores as routing hints, not truth guarantees. Cross-validate high-impact fields against external sources when available. Implement a secondary validation step for critical domains (e.g., financial thresholds, safety classifications). Log confidence distributions to detect model drift.
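A minimal sketch of confidence-distribution logging: keep a rolling window of scores and alert when the mean leaves an expected band. The class name, window size, and tolerance are illustrative placeholders, not a published API:

```typescript
// Rolling record of confidence scores; a sustained shift in the rolling mean
// is a cheap first signal of model drift.
class ConfidenceMonitor {
  private scores: number[] = [];

  constructor(private windowSize = 500) {}

  record(score: number): void {
    this.scores.push(score);
    if (this.scores.length > this.windowSize) this.scores.shift(); // keep window bounded
  }

  mean(): number {
    if (this.scores.length === 0) return NaN;
    return this.scores.reduce((a, b) => a + b, 0) / this.scores.length;
  }

  // Alert when the rolling mean moves outside the expected band.
  drifted(expectedMean: number, tolerance: number): boolean {
    return Math.abs(this.mean() - expectedMean) > tolerance;
  }
}
```

In practice the mean is only a starting point; logging the full distribution per model version lets you spot variance changes that a mean-only check misses.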
6. Skipping Validation Gates
Explanation: Assuming provider-level schema enforcement eliminates the need for application-level validation. Even with enforcement enabled, responses can be truncated by token limits, replaced by refusals, or change shape after provider model updates.
Fix: Always validate output against your local schema library (Zod, Pydantic, etc.) before routing. Wrap generation calls in try/catch blocks that log raw responses and trigger fallback regeneration. Never trust network-layer guarantees.
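The try/catch-and-regenerate pattern can be sketched provider-agnostically by injecting the generation and validation steps. `generateWithValidation` is a hypothetical helper, not a library API; `validate` would typically be `Schema.parse` composed with `JSON.parse`:

```typescript
// Generic validation gate: run generation, validate locally, retry on failure.
// Injecting `generate` and `validate` keeps the gate provider-agnostic.
async function generateWithValidation<T>(
  generate: () => Promise<string>,
  validate: (raw: string) => T, // throws on invalid payloads (e.g. Schema.parse)
  maxAttempts = 2
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    try {
      return validate(raw);
    } catch (err) {
      lastError = err;
      // Log the raw response so failures can be diagnosed offline.
      console.error(`Validation failed (attempt ${attempt}):`, raw);
    }
  }
  throw new Error(`Generation failed after ${maxAttempts} attempts: ${lastError}`);
}
```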
7. Hardcoding Provider-Specific Syntax
Explanation: Tying implementation to OpenAI's response_format or Anthropic's tool use syntax creates vendor lock-in. Provider APIs change frequently.
Fix: Abstract the generation layer behind a provider-agnostic interface. Pass schemas as generic JSON objects. Implement fallback routing to alternative providers when primary APIs degrade. Keep provider-specific code isolated in adapter modules.
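A sketch of such an adapter boundary, with hypothetical interface names: schemas cross it as plain JSON objects, and fallback routing walks an ordered list of providers until one succeeds:

```typescript
// Provider-agnostic generation interface. No Zod or provider-specific types
// cross this boundary; adapters translate to each vendor's API internally.
interface StructuredGenerator {
  generate(params: {
    systemPrompt: string;
    userPrompt: string;
    jsonSchema: Record<string, unknown>; // generic JSON Schema object
  }): Promise<string>; // raw JSON string; the caller validates locally
}

// Fallback routing: try providers in order until one succeeds.
async function generateWithFallback(
  providers: StructuredGenerator[],
  params: Parameters<StructuredGenerator['generate']>[0]
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.generate(params);
    } catch (err) {
      lastError = err; // provider degraded; fall through to the next adapter
    }
  }
  throw new Error(`All providers failed: ${lastError}`);
}
```

Each vendor-specific adapter (OpenAI, Anthropic, etc.) implements `StructuredGenerator` in its own module, so API changes stay contained there.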
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume, low-risk updates (e.g., minor policy clarifications) | Schema-first with auto-ingestion | Deterministic routing eliminates manual review; low error tolerance acceptable | Low (single model call, minimal compute) |
| High-stakes regulatory changes (e.g., enforcement guidance, repeals) | Schema-first + mandatory human review | Business risk outweighs latency cost; review queue prevents compliance gaps | Medium (review overhead, but avoids regulatory penalties) |
| Multi-jurisdictional monitoring with conflicting sources | Schema-first + confidence threshold routing | Conflicting data triggers requires_human_review; prevents false positives | Medium-High (additional validation steps, but reduces downstream correction costs) |
| Legacy system integration with rigid database schemas | Schema-first with adapter layer | Strict JSON mapping eliminates parsing failures; adapter handles field translation | Low (one-time adapter development, long-term maintenance savings) |
Configuration Template
// schema.config.ts
import { z } from 'zod';

export const RegulatoryUpdateSchema = z.object({
  update_id: z.string().uuid(),
  jurisdiction: z.enum(['federal', 'state', 'municipal', 'international']),
  category: z.enum(['amendment', 'new_regulation', 'repeal', 'guidance']),
  affected_entities: z.array(z.string()).min(1),
  effective_date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  source_url: z.string().url(),
  confidence: z.number().min(0).max(1),
  review_required: z.boolean(),
  schema_version: z.literal('1.0.0')
});

export type RegulatoryUpdate = z.infer<typeof RegulatoryUpdateSchema>;
// provider.adapter.ts
import { OpenAI } from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { RegulatoryUpdateSchema, RegulatoryUpdate } from './schema.config';

export async function generateStructuredUpdate(
  context: string[],
  prompt: string
): Promise<RegulatoryUpdate> {
  const client = new OpenAI();

  const response = await client.chat.completions.create({
    model: 'gpt-4o-2024-08-06',
    messages: [
      { role: 'system', content: 'Extract regulatory changes. Output strictly matches JSON schema.' },
      { role: 'user', content: `Context:\n${context.join('\n---\n')}\n\nTask: ${prompt}` }
    ],
    // Convert the Zod schema to the JSON Schema payload the API expects;
    // the raw Zod object is not a valid response_format value.
    response_format: zodResponseFormat(RegulatoryUpdateSchema, 'regulatory_update'),
    temperature: 0.1,
    max_tokens: 800
  });

  const raw = response.choices[0].message.content;
  if (!raw) throw new Error('Model returned empty payload');
  return RegulatoryUpdateSchema.parse(JSON.parse(raw));
}
Quick Start Guide
- Define your schema: Create a Zod or JSON Schema object that maps exactly to your downstream database or API contract. Include required fields, enums, and a review_required boolean.
- Build a context filter: Write a function that scores incoming documents by recency, jurisdictional match, and relevance. Return only the top 3–5 excerpts.
- Configure structured generation: Use your provider's native schema enforcement API. Pass the curated context, set temperature to 0.1–0.2, and cap tokens to prevent verbose output.
- Validate and route: Parse the response through your local schema validator. Route outputs with review_required: true or confidence below threshold to a human queue. Ingest the rest directly.
- Instrument monitoring: Log raw responses, validation results, confidence scores, and routing decisions. Set alerts for validation failure spikes or confidence distribution shifts.
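The monitoring step above can start as a single structured record per pipeline run. The record shape below is an assumption to adapt to your logging backend; it covers the signals the checklist names (raw response, validation result, confidence, routing decision):

```typescript
// One structured log record per pipeline run. Field names are illustrative.
interface PipelineLogRecord {
  timestamp: string;
  raw_response: string;
  validation_passed: boolean;
  confidence_score: number | null; // null when validation failed
  routing_decision: 'ingest' | 'review_queue' | 'failed';
}

function buildLogRecord(
  raw: string,
  parsed: { confidence_score: number; requires_human_review: boolean } | null,
  route: PipelineLogRecord['routing_decision']
): PipelineLogRecord {
  return {
    timestamp: new Date().toISOString(),
    raw_response: raw,
    validation_passed: parsed !== null,
    confidence_score: parsed?.confidence_score ?? null,
    routing_decision: route
  };
}
```

Emitting these records to the same sink on every run, including failures, is what makes validation-failure spikes and confidence drift visible later.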