lution
Building a production-grade AI workflow requires enforcing a strict separation between perception and execution. The architecture follows a linear pipeline: inference → structured proposal → deterministic validation → policy enforcement → execution → audit. Each stage owns a specific responsibility, and no stage bypasses the next.
Step 1: Define the Inference Boundary
The model's sole responsibility is to convert unstructured input into a structured candidate action. It does not check permissions, verify resource existence, or calculate business thresholds. It outputs a proposal.
interface TierUpgradeProposal {
  proposal_id: string;
  customer_id: string;
  current_tier: 'basic' | 'standard' | 'premium';
  target_tier: 'basic' | 'standard' | 'premium';
  effective_date: string;
  justification: string;
  metadata: Record<string, unknown>;
}
Step 2: Enforce Structured Output Contracts
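Raw text generation is unsuitable for transactional systems. Use constrained decoding, JSON schema validation, or both to guarantee output shape. OpenAI's Structured Outputs and equivalent vendor features constrain token sampling to a supplied JSON schema, so a completed response matches the declared shape. Refusals and responses truncated by the token limit can still occur, so keep application-side validation as a second check.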
import { z } from 'zod';

// Runtime contract for the model's proposal; mirrors TierUpgradeProposal.
const TierUpgradeSchema = z.object({
  proposal_id: z.string().uuid(),
  customer_id: z.string().min(1),
  current_tier: z.enum(['basic', 'standard', 'premium']),
  target_tier: z.enum(['basic', 'standard', 'premium']),
  effective_date: z.string().datetime(),
  justification: z.string().min(10).max(500),
  // Explicit key and value schemas; this two-argument form works in Zod 3 and Zod 4.
  metadata: z.record(z.string(), z.unknown()).optional()
});

type ValidatedProposal = z.infer<typeof TierUpgradeSchema>;
Step 3: Build the Deterministic Validator
Validation logic must reside entirely in application code. It checks business rules, permissions, resource state, and idempotency. The validator returns a decision object, never a side effect.
class TierUpgradeValidator {
  constructor(
    private readonly billingRepo: BillingRepository,
    private readonly policyEngine: PolicyEngine,
    private readonly idempotencyStore: IdempotencyStore
  ) {}

  async validate(proposal: ValidatedProposal): Promise<ValidationResult> {
    // 1. Idempotency check
    const existing = await this.idempotencyStore.get(proposal.proposal_id);
    if (existing) return { status: 'rejected', reason: 'duplicate_proposal' };

    // 2. Resource existence & state
    const customer = await this.billingRepo.getCustomer(proposal.customer_id);
    if (!customer) return { status: 'rejected', reason: 'customer_not_found' };
    if (customer.current_tier !== proposal.current_tier) {
      return { status: 'rejected', reason: 'tier_mismatch' };
    }

    // 3. Policy & threshold enforcement
    const policyCheck = await this.policyEngine.evaluate({
      action: 'tier_upgrade',
      customer_id: proposal.customer_id,
      target_tier: proposal.target_tier,
      effective_date: proposal.effective_date
    });
    if (!policyCheck.allowed) {
      return { status: 'rejected', reason: policyCheck.violation };
    }

    // 4. Approval routing
    const requiresApproval = this.policyEngine.needsHumanReview(proposal);
    return {
      status: requiresApproval ? 'pending_approval' : 'approved',
      proposal,
      audit_trail: { validated_at: new Date().toISOString(), validator_version: '1.2.0' }
    };
  }
}
Step 4: Orchestrate State Transitions
The workflow engine owns the state machine. It routes proposals based on validation outcomes, handles retries, manages dead-letter queues, and persists audit logs. The model never interacts with this layer directly.
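class WorkflowOrchestrator {
  // Dependencies are injected; BillingService, ApprovalQueue, and AuditLogger are assumed type names.
  constructor(
    private readonly validator: TierUpgradeValidator,
    private readonly billingService: BillingService,
    private readonly approvalQueue: ApprovalQueue,
    private readonly idempotencyStore: IdempotencyStore,
    private readonly auditLogger: AuditLogger
  ) {}

  async execute(proposal: ValidatedProposal): Promise<ExecutionResult> {
    const validation = await this.validator.validate(proposal);
    switch (validation.status) {
      case 'rejected':
        await this.auditLogger.logRejection(proposal, validation.reason);
        return { status: 'failed', error: validation.reason };
      case 'pending_approval':
        await this.approvalQueue.enqueue(proposal);
        await this.auditLogger.logPending(proposal);
        return { status: 'awaiting_review' };
      case 'approved': {
        // Braces scope the const declaration to this case.
        const execution = await this.billingService.applyUpgrade(proposal);
        await this.idempotencyStore.set(proposal.proposal_id, execution);
        await this.auditLogger.logExecution(proposal, execution);
        return { status: 'completed', transaction_id: execution.id };
      }
    }
  }
}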
Step 5: Scope Tool Access
Tools exposed to the model must follow the principle of least privilege. Each tool should be narrow, explicitly permissioned, and designed for read-only or dry-run operations unless wrapped by the validation layer. OpenAI's agent guidance emphasizes risk assessment based on write access, reversibility, and financial impact. High-risk tools require human escalation or sandboxed execution.
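The three risk factors named above can be expressed as a small gating function. The following is a minimal sketch, not a vendor API: `ToolDescriptor`, `classifyRisk`, `gateToolCall`, and the specific low/medium/high mapping are hypothetical choices made for illustration.

```typescript
// Hypothetical tool descriptor; the field names are illustrative assumptions.
interface ToolDescriptor {
  toolName: string;
  writes: boolean;          // does the tool mutate state?
  reversible: boolean;      // can its effect be undone?
  financialImpact: boolean; // does it move or commit money?
}

type RiskLevel = 'low' | 'medium' | 'high';
type GateDecision = 'execute' | 'dry_run' | 'escalate';

// Rate a tool by write access, reversibility, and financial impact.
function classifyRisk(tool: ToolDescriptor): RiskLevel {
  if (!tool.writes) return 'low';
  if (tool.financialImpact || !tool.reversible) return 'high';
  return 'medium';
}

// Decide how a model-initiated call may proceed.
// `validated` means the call already passed the deterministic validation layer.
function gateToolCall(tool: ToolDescriptor, validated: boolean): GateDecision {
  const risk = classifyRisk(tool);
  if (risk === 'low') return 'execute';   // read-only tools run directly
  if (risk === 'high') return 'escalate'; // high-risk tools go to a human
  return validated ? 'execute' : 'dry_run';
}
```

Under these assumptions, a reversible write that has not been validated only ever runs as a dry run, and nothing with financial impact executes without escalation.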
Step 6: Add Observability & Tracing
Every pipeline execution must generate a deterministic trace linking the original input, model proposal, validation decisions, policy evaluations, and final state change. Without this, debugging becomes speculative. Use structured logging with correlation IDs, and integrate with distributed tracing systems to capture latency, rejection rates, and approval bottlenecks.
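One way to link every stage to a single request is to stamp each event with the same correlation ID and emit one JSON object per log line. This is a sketch under stated assumptions: `createTrace`, the `Stage` names, and the event fields are hypothetical, and a real deployment would forward these events to its tracing system rather than hold them in memory.

```typescript
type Stage = 'input' | 'proposal' | 'validation' | 'policy' | 'execution';

interface TraceEvent {
  correlation_id: string;
  stage: Stage;
  timestamp: string;
  detail: Record<string, unknown>;
}

// Hypothetical helper. `now` is injectable so tests can pin the clock.
function createTrace(correlationId: string, now: () => Date = () => new Date()) {
  const events: TraceEvent[] = [];
  return {
    // Append one event for a pipeline stage, tagged with the shared correlation ID.
    record(stage: Stage, detail: Record<string, unknown>): void {
      events.push({
        correlation_id: correlationId,
        stage,
        timestamp: now().toISOString(),
        detail,
      });
    },
    // One JSON object per line, ready for a structured log sink.
    toLogLines(): string[] {
      return events.map((event) => JSON.stringify(event));
    },
  };
}
```

Filtering the logs by `correlation_id` then reconstructs the full path from input to final state change for any single proposal.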
Architecture Decisions & Rationale
- Why separate inference from execution? LLMs optimize for token likelihood, not business correctness. Deterministic code optimizes for invariants. Separation allows independent scaling, testing, and auditing.
- Why enforce schema validation? Unstructured output introduces parsing fragility. Constrained decoding eliminates shape drift and reduces validation overhead.
- Why externalize state? Model context windows are volatile and expensive. State machines provide durability, concurrency control, and rollback capabilities.
- Why require explicit approval paths? High-risk actions (financial adjustments, account modifications, compliance changes) demand human oversight. Automated routing ensures consistent policy application without blocking low-risk operations.
Pitfall Guide
1. Schema Drift in Model Output
Explanation: Foundation models occasionally omit fields, change casing, or return nested structures that break downstream parsers. This is especially common when prompts are modified or models are upgraded.
Fix: Implement strict JSON schema validation with fallback routing. Version your schemas, and maintain a compatibility layer that normalizes minor structural variations before validation.
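A compatibility layer of this kind can be small. The sketch below assumes the only tolerated variations are key casing and whitespace or casing in tier values; `toSnakeCase` and `normalizeProposal` are hypothetical helpers, and their output still has to pass strict schema validation afterwards.

```typescript
// Convert camelCase or PascalCase keys to snake_case, e.g. "customerId" -> "customer_id".
function toSnakeCase(key: string): string {
  return key.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toLowerCase();
}

// Normalize minor structural variations before strict validation.
// Returns null when the input is not a plain object, so the caller can route it to a fallback.
function normalizeProposal(raw: unknown): Record<string, unknown> | null {
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) return null;

  const normalized: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    normalized[toSnakeCase(key)] = value;
  }

  // Tier values are compared case-insensitively; anything else is left for the schema to reject.
  for (const field of ['current_tier', 'target_tier']) {
    const value = normalized[field];
    if (typeof value === 'string') normalized[field] = value.trim().toLowerCase();
  }
  return normalized;
}
```

Keeping normalization separate from validation means the schema stays strict: the normalizer only repairs shape, and it never decides whether a proposal is acceptable.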
2. Implicit State Ownership by the Model
Explanation: Developers sometimes rely on model memory or conversation history to track counters, statuses, or multi-step progress. Context windows truncate, tokens expire, and state becomes inconsistent.
Fix: Externalize all state to a durable store. Use explicit state machines or workflow engines. Pass only necessary context to the model; never assume it remembers prior steps.
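An explicit state machine can be as simple as a transition table that application code consults before persisting a new status. The state names below are assumptions derived from the validation and execution outcomes used earlier in this article; `TRANSITIONS` and `transition` are hypothetical names.

```typescript
type ProposalState =
  | 'proposed'
  | 'pending_approval'
  | 'approved'
  | 'rejected'
  | 'completed'
  | 'failed';

// Allowed next states for each state; terminal states have none.
const TRANSITIONS: Record<ProposalState, readonly ProposalState[]> = {
  proposed: ['pending_approval', 'approved', 'rejected'],
  pending_approval: ['approved', 'rejected'],
  approved: ['completed', 'failed'],
  rejected: [],
  completed: [],
  failed: [],
};

// Return the next state, or throw if the move is not in the table.
// The caller persists the returned state to a durable store; the model never sees this table.
function transition(current: ProposalState, next: ProposalState): ProposalState {
  if (!TRANSITIONS[current].includes(next)) {
    throw new Error(`Illegal transition: ${current} -> ${next}`);
  }
  return next;
}
```

Because the table lives in code, an illegal jump such as `completed` back to `approved` fails loudly instead of depending on what the model remembers.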
3. Overly Broad Tool Permissions
Explanation: Granting models broad tool permissions (e.g., "update any record", "send any email") creates attack surfaces and violates least-privilege principles. Prompt injection can manipulate tool parameters.
Fix: Scope tools to specific resources and actions. Implement parameter validation at the tool boundary. Use dry-run modes for destructive operations, and require explicit approval for high-impact calls.
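Parameter validation at the tool boundary might look like the sketch below. It assumes a single narrowly scoped tool that previews a tier change; the tool's argument shape, the customer ID pattern, and the error codes are all hypothetical.

```typescript
const ALLOWED_TIERS = ['basic', 'standard', 'premium'] as const;
type Tier = (typeof ALLOWED_TIERS)[number];

// Arguments for a hypothetical, narrowly scoped "preview tier change" tool.
interface PreviewTierChangeArgs {
  customer_id: string;
  target_tier: Tier;
}

type ParseResult =
  | { ok: true; args: PreviewTierChangeArgs }
  | { ok: false; error: string };

// Validate model-supplied arguments before the tool body runs.
function parseToolArgs(raw: Record<string, unknown>): ParseResult {
  // Reject unknown keys outright so injected parameters cannot ride along.
  const allowedKeys = ['customer_id', 'target_tier'];
  const unexpected = Object.keys(raw).filter((key) => !allowedKeys.includes(key));
  if (unexpected.length > 0) {
    return { ok: false, error: `unexpected_parameters: ${unexpected.join(',')}` };
  }

  const { customer_id, target_tier } = raw;
  // The ID pattern is an assumption; use whatever format your system issues.
  if (typeof customer_id !== 'string' || !/^[A-Za-z0-9_-]{1,64}$/.test(customer_id)) {
    return { ok: false, error: 'invalid_customer_id' };
  }

  const tier = ALLOWED_TIERS.find((candidate) => candidate === target_tier);
  if (tier === undefined) return { ok: false, error: 'invalid_target_tier' };

  return { ok: true, args: { customer_id, target_tier: tier } };
}
```

Rejecting unexpected keys, rather than ignoring them, surfaces prompt-injection attempts as explicit errors that can be logged and alerted on.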
4. Validation Bypass via Context Pollution
Explanation: User inputs or retrieved documents may contain instructions that override validation logic if the model is tasked with both interpretation and rule enforcement.
Fix: Isolate validation from inference. Never pass business rules to the model for execution. Keep policy enforcement in application code, and sanitize inputs before they reach the validation layer.
5. Silent Failure Routing
Explanation: Invalid proposals are sometimes dropped or logged without alerting, creating blind spots in production monitoring. Teams discover failures only after customer complaints or financial discrepancies.
Fix: Implement explicit error states with dead-letter queues. Route rejections to monitoring dashboards, and set up alerts for anomaly spikes in rejection rates or validation latency.
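Alerting on rejection spikes needs only a rolling rate over recent outcomes. The following sketch keeps a fixed-size window in memory; `createRejectionMonitor` is a hypothetical helper, the window size is an assumption, and the 0.15 threshold used in the usage note mirrors `rejection_rate_threshold` in the configuration template later in this article.

```typescript
// Track the rejection rate over the most recent `windowSize` proposals.
function createRejectionMonitor(windowSize: number, threshold: number) {
  const outcomes: boolean[] = []; // true means the proposal was rejected

  const rate = (): number =>
    outcomes.length === 0
      ? 0
      : outcomes.filter((rejected) => rejected).length / outcomes.length;

  return {
    record(rejected: boolean): void {
      outcomes.push(rejected);
      if (outcomes.length > windowSize) outcomes.shift(); // keep only the newest entries
    },
    rate,
    // Alert only on a full window so one early rejection does not trigger a page.
    shouldAlert: (): boolean => outcomes.length >= windowSize && rate() > threshold,
  };
}
```

A monitor created with `createRejectionMonitor(100, 0.15)` would flag the pipeline once more than 15 of the last 100 proposals were rejected; the rejected proposals themselves still belong in the dead-letter queue for inspection.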
6. Over-Engineering the Validator
Explanation: Teams attempt to encode every edge case into the validation layer, creating brittle rule engines that require constant maintenance. Some try to replace validation with secondary model calls.
Fix: Start with deterministic rule checks. Use secondary models only for ambiguous classification tasks, never for policy enforcement. Maintain a rule registry with version control and automated testing.
7. Ignoring Idempotency
Explanation: Network retries, workflow engine restarts, or duplicate user requests can trigger the same proposal multiple times, causing double charges, duplicate notifications, or state corruption.
Fix: Generate idempotency keys at the proposal stage. Store execution results keyed by these IDs. Ensure all downstream APIs support idempotent operations, and validate against existing records before execution.
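The core of the pattern is to reserve the key before the side effect runs, so a duplicate that arrives mid-flight shares the first call's result instead of starting a second one. This is a minimal single-process sketch: `runOnce` is a hypothetical helper, and the in-memory `Map` stands in for a durable store with an atomic reserve operation (for example, a unique-key insert), which a multi-instance deployment would need.

```typescript
// Run `action` at most once per key; duplicates receive the same promise.
function runOnce<T>(
  store: Map<string, Promise<T>>,
  key: string,
  action: () => Promise<T>
): Promise<T> {
  const existing = store.get(key);
  if (existing !== undefined) return existing;

  // Reserve the key before awaiting anything, so concurrent duplicates find it.
  const pending = action();
  store.set(key, pending);

  // Release the key on failure so a later retry can run the action again.
  pending.catch(() => {
    store.delete(key);
  });
  return pending;
}
```

Checking the store only before execution and writing to it only afterwards leaves a window in which two identical proposals can both pass the check; reserving first closes that window within a single process.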
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Low-risk classification (e.g., ticket routing) | Direct model inference + lightweight validation | Speed outweighs strict governance; failures are recoverable | Low |
| Medium-risk state change (e.g., tier upgrade) | Proposal-validation pipeline with automated policy checks | Balances automation with deterministic safeguards; reduces incident rate | Moderate |
| High-risk transaction (e.g., refund, contract modification) | Proposal-validation + mandatory human approval + dry-run simulation | Compliance requirements; financial exposure demands explicit oversight | High |
| Multi-step workflow with external dependencies | State machine orchestration with scoped tools & idempotency | Handles retries, partial failures, and concurrency safely | Moderate-High |
Configuration Template
# workflow-config.yaml
pipeline:
  inference:
    model: "gpt-4o-2024-08-06"
    temperature: 0.2
    max_tokens: 1024
    schema_version: "v2.1"
    constrained_decoding: true
  validation:
    idempotency_ttl: "24h"
    policy_engine: "internal-policy-service"
    approval_threshold: "high_risk"
    retry_policy:
      max_attempts: 3
      backoff: "exponential"
      jitter: true
  observability:
    tracing: "otel"
    audit_log_retention: "90d"
    alerting:
      rejection_rate_threshold: 0.15
      validation_latency_p99: "2s"
    dead_letter_queue: "dlq-ai-proposals"
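The `retry_policy` settings (exponential backoff, jitter, three attempts) translate into a short delay calculation. The template does not specify a base delay or a cap, so `baseMs` and `capMs` below are assumptions, and "full jitter" (a uniform random delay up to the exponential bound) is one common strategy rather than the only reading of `jitter: true`.

```typescript
// Delay before retry number `attempt` (1-based), using exponential backoff with full jitter.
// `random` is injectable so tests can make the result deterministic.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
  // Full jitter: pick uniformly in [0, exponential) to spread out retrying clients.
  return Math.floor(random() * exponential);
}

// Delays for every retry allowed by `max_attempts` (the first attempt has no delay).
function retrySchedule(
  maxAttempts: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random
): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt += 1) {
    delays.push(backoffDelayMs(attempt, baseMs, capMs, random));
  }
  return delays;
}
```

With `max_attempts: 3` there are two retries, so `retrySchedule(3, 200, 5000)` returns two delays bounded by 200 ms and 400 ms respectively.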
Quick Start Guide
- Define your proposal schema: Create a Zod or JSON Schema definition that captures only the data the model needs to propose an action. Exclude permissions, pricing, or state checks.
- Implement the validator: Write a TypeScript class that checks resource existence, applies business rules, verifies permissions, and checks idempotency keys. Return explicit status objects (approved, rejected, pending_approval).
- Wire the workflow engine: Use a lightweight state machine (e.g., XState, Temporal, or a custom queue) to route proposals based on validation outcomes. Add audit logging at each transition.
- Deploy with observability: Instrument correlation IDs, enable structured logging, and configure alerts for rejection spikes and validation latency. Test with synthetic failure scenarios before production rollout.