overestimating framework capabilities and underestimating the operational overhead of production approvals.
Core Solution
A production-ready HITL system must decouple the agent execution loop from human interaction. The architecture requires five distinct components: a durable state store, a typed request/response contract, a channel adapter interface, a verification middleware layer, and an idempotent resume handler. Below is a reference implementation pattern in TypeScript; the Postgres-backed store it imports is not shown.
Step 1: Define Typed Contracts
Human approvals require strict schema validation on both sides. The agent emits a structured request; the human returns a structured response. Using Zod ensures runtime validation and TypeScript type inference.
```typescript
import { z } from "zod";

// What the agent emits when it needs a human decision.
export const ApprovalRequestSchema = z.object({
  requestId: z.string().uuid(), // idempotency key and primary key in the store
  actionType: z.enum(["fund_transfer", "content_publish", "config_update"]),
  payload: z.record(z.string(), z.unknown()),
  metadata: z.object({
    initiatedBy: z.string(),
    priority: z.enum(["low", "medium", "high"]),
    expiresAt: z.coerce.date(), // accepts ISO strings (e.g. from JSON) and converts them to Date
  }),
});

// What the reviewer sends back through a channel.
export const HumanResponseSchema = z.object({
  requestId: z.string().uuid(),
  decision: z.enum(["approved", "rejected", "modified"]),
  comment: z.string().max(500).optional(),
  modifiedPayload: z.record(z.string(), z.unknown()).optional(), // required for "modified"; enforced by the verifier
  reviewerId: z.string(),
});

export type ApprovalRequest = z.infer<typeof ApprovalRequestSchema>;
export type HumanResponse = z.infer<typeof HumanResponseSchema>;
```
Step 2: Build the Channel Adapter Interface
Channel routing must be abstracted. The approval engine should never contain Slack, email, or dashboard logic directly. Instead, it delegates to adapters that implement a unified interface.
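```typescript
import { WebClient } from "@slack/web-api";
import type { ApprovalRequest } from "./contracts"; // module path assumed: the Step 1 schemas

// Bot token read from the environment (variable name assumed).
const slackClient = new WebClient(process.env.SLACK_BOT_TOKEN);

export interface ApprovalChannel {
  id: string;
  sendRequest(request: ApprovalRequest): Promise<void>;
  supportsPriority(priority: ApprovalRequest["metadata"]["priority"]): boolean;
}

export class SlackApprovalChannel implements ApprovalChannel {
  id = "slack";

  async sendRequest(request: ApprovalRequest): Promise<void> {
    await slackClient.chat.postMessage({
      channel: "#ops-approvals",
      text: `Approval required: ${request.actionType}`, // plain-text fallback for notifications
      blocks: this.formatMessage(request),
    });
  }

  // Low-priority requests are left to another channel (e.g. email).
  supportsPriority(priority: ApprovalRequest["metadata"]["priority"]): boolean {
    return priority !== "low";
  }

  private formatMessage(req: ApprovalRequest) {
    return [
      { type: "header", text: { type: "plain_text", text: `Approval Required: ${req.actionType}` } },
      { type: "section", text: { type: "mrkdwn", text: `*Request ID:* \`${req.requestId}\`\n*Priority:* ${req.metadata.priority}` } },
    ];
  }
}
```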
Step 3: Implement the Approval Orchestrator
The orchestrator manages state persistence, channel routing, verification, and resume logic. It writes the paused state to durable storage before yielding control back to the agent runtime.
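```typescript
import { PostgresStore } from "./postgres-store";
import { HumanResponseSchema } from "./contracts"; // module paths assumed
import type { ApprovalRequest, HumanResponse } from "./contracts";
import type { ApprovalChannel } from "./channels";
import { ApprovalVerifier } from "./verifier";

// What the agent receives on resume: the reviewer's response plus the verified payload.
export interface ResolvedApproval {
  response: HumanResponse;
  processedPayload: unknown;
}

export class ApprovalOrchestrator {
  constructor(
    private store: PostgresStore,
    private channels: ApprovalChannel[],
    private verifier: ApprovalVerifier
  ) {}

  async pauseForApproval(request: ApprovalRequest): Promise<void> {
    // Pick a channel first so an unroutable request is never left orphaned in the store.
    const targetChannel = this.channels.find(c =>
      c.supportsPriority(request.metadata.priority)
    );
    if (!targetChannel) {
      throw new Error(`No channel available for priority: ${request.metadata.priority}`);
    }
    // Persist before delivery: if the process dies after this write, the request can be reloaded.
    await this.store.savePendingRequest(request);
    await targetChannel.sendRequest(request);
  }

  async submitResponse(raw: unknown): Promise<void> {
    // Webhook bodies are untrusted: parse() throws a ZodError on malformed input.
    const response: HumanResponse = HumanResponseSchema.parse(raw);
    const pending = await this.store.getPendingRequest(response.requestId);
    if (!pending) throw new Error("Request not found or already resolved");
    // Assumes the store returns expiresAt as a Date.
    if (pending.metadata.expiresAt.getTime() <= Date.now()) {
      throw new Error(`Request ${response.requestId} has expired`);
    }
    const verified = await this.verifier.validate(response, pending);
    if (!verified.isValid) {
      await this.store.flagForReview(response.requestId, verified.reason);
      return;
    }
    // Should be a conditional update (WHERE status = 'pending') so a duplicate delivery cannot resolve twice.
    await this.store.markResolved(response.requestId, response, verified.processedPayload);
  }

  // Returns null while the request is still pending, flagged, or unknown.
  async resumeAgent(requestId: string): Promise<ResolvedApproval | null> {
    const record = await this.store.getResolvedRequest(requestId);
    if (!record) return null;
    return { response: record.response, processedPayload: record.processedPayload };
  }
}
```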
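The orchestrator imports `PostgresStore` from a module that isn't shown. Below is a minimal sketch of the contract it appears to rely on. The method names come from the orchestrator's calls; the exact signatures, the `ApprovalStore` and `InMemoryApprovalStore` names, and the row states are assumptions. The in-memory version is a test double only and is not durable. Its types are redeclared inline so the block compiles on its own.

```typescript
// Local types mirroring the Step 1 schemas (redeclared so this sketch is self-contained).
type Priority = "low" | "medium" | "high";
interface ApprovalRequest {
  requestId: string;
  actionType: "fund_transfer" | "content_publish" | "config_update";
  payload: Record<string, unknown>;
  metadata: { initiatedBy: string; priority: Priority; expiresAt: Date };
}
interface HumanResponse {
  requestId: string;
  decision: "approved" | "rejected" | "modified";
  comment?: string;
  modifiedPayload?: Record<string, unknown>;
  reviewerId: string;
}
interface ResolvedApproval {
  response: HumanResponse;
  processedPayload: unknown;
}

// Hypothetical contract; a Postgres-backed class would implement the same methods.
interface ApprovalStore {
  savePendingRequest(request: ApprovalRequest): Promise<void>;
  getPendingRequest(requestId: string): Promise<ApprovalRequest | null>;
  flagForReview(requestId: string, reason?: string): Promise<void>;
  markResolved(requestId: string, response: HumanResponse, processedPayload: unknown): Promise<boolean>;
  getResolvedRequest(requestId: string): Promise<ResolvedApproval | null>;
}

type Row =
  | { status: "pending"; request: ApprovalRequest }
  | { status: "flagged"; request: ApprovalRequest; reason?: string }
  | { status: "resolved"; request: ApprovalRequest; resolved: ResolvedApproval };

// Test double only: state lives in process memory and is lost on restart.
class InMemoryApprovalStore implements ApprovalStore {
  private rows = new Map<string, Row>();

  async savePendingRequest(request: ApprovalRequest): Promise<void> {
    // requestId acts as the primary key: a second insert with the same ID is rejected.
    if (this.rows.has(request.requestId)) {
      throw new Error(`Duplicate requestId: ${request.requestId}`);
    }
    this.rows.set(request.requestId, { status: "pending", request });
  }

  async getPendingRequest(requestId: string): Promise<ApprovalRequest | null> {
    const row = this.rows.get(requestId);
    return row && row.status === "pending" ? row.request : null;
  }

  async flagForReview(requestId: string, reason?: string): Promise<void> {
    const row = this.rows.get(requestId);
    if (row && row.status === "pending") {
      this.rows.set(requestId, { status: "flagged", request: row.request, reason });
    }
  }

  // Compare-and-set: only a pending row can become resolved.
  async markResolved(requestId: string, response: HumanResponse, processedPayload: unknown): Promise<boolean> {
    const row = this.rows.get(requestId);
    if (!row || row.status !== "pending") return false;
    this.rows.set(requestId, {
      status: "resolved",
      request: row.request,
      resolved: { response, processedPayload },
    });
    return true;
  }

  async getResolvedRequest(requestId: string): Promise<ResolvedApproval | null> {
    const row = this.rows.get(requestId);
    return row && row.status === "resolved" ? row.resolved : null;
  }
}
```

A Postgres implementation could map `markResolved` to `UPDATE approval_requests ... WHERE request_id = $1 AND status = 'pending'` and report whether a row changed. The boolean return then lets a caller detect that a concurrent delivery won the race.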
Step 4: Add Verification Middleware
Human responses must be validated before the agent resumes. This prevents accidental approvals, malformed payloads, or policy violations from propagating into the execution loop.
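```typescript
import { z } from "zod";
import type { ApprovalRequest, HumanResponse } from "./contracts"; // module path assumed

type VerificationResult = { isValid: boolean; reason?: string; processedPayload: unknown };

export class ApprovalVerifier {
  async validate(response: HumanResponse, original: ApprovalRequest): Promise<VerificationResult> {
    if (response.requestId !== original.requestId) {
      return { isValid: false, reason: "Response does not match request", processedPayload: null };
    }
    // A rejection carries no payload for the agent to act on.
    if (response.decision === "rejected") {
      return { isValid: true, processedPayload: null };
    }
    // "modified" must supply a replacement payload; "approved" reuses the original.
    const candidate = response.decision === "modified" ? response.modifiedPayload : original.payload;
    if (!candidate) {
      return { isValid: false, reason: "Modified decision requires modifiedPayload", processedPayload: null };
    }
    // Both paths go through the same action schema, so reviewer edits are validated too.
    // z.object() strips unknown keys by default, dropping fields the action does not define.
    const parsed = this.getSchemaForAction(original.actionType).safeParse(candidate);
    if (!parsed.success) {
      return { isValid: false, reason: "Payload schema mismatch", processedPayload: null };
    }
    // Business-rule (policy) checks would run here before the payload is released.
    return { isValid: true, processedPayload: parsed.data };
  }

  private getSchemaForAction(type: ApprovalRequest["actionType"]): z.ZodType {
    const schemas: Record<ApprovalRequest["actionType"], z.ZodType> = {
      fund_transfer: z.object({ amount: z.number().positive(), recipient: z.string() }),
      content_publish: z.object({ url: z.string().url(), tags: z.array(z.string()) }),
      config_update: z.object({ key: z.string(), value: z.unknown() }),
    };
    return schemas[type];
  }
}
```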
Architecture Rationale
- Durable Storage First: The orchestrator writes to Postgres before delivering the request to any channel. Once that write succeeds, a worker rotation, container restart, or network partition cannot lose the pending approval, because a restarted worker can reload it from the store.
- Explicit Idempotency: Every request carries a UUID. The store uses `requestId` as the primary key. Resume operations check resolution status before re-executing agent logic, preventing duplicate actions.
- Channel Decoupling: The `ApprovalChannel` interface allows swapping Slack for email, SMS, or internal dashboards without touching the agent code. Priority routing ensures high-severity approvals reach the correct team.
- Verification Layer: Human input is treated as untrusted data. The verifier runs schema validation, policy checks, and payload sanitization before the agent resumes. This catches typos, malformed modifications, and unauthorized changes.
- Separation of Concerns: The agent loop only calls `pauseForApproval()` and `resumeAgent()`. All infrastructure logic lives in the orchestrator, making the agent framework-agnostic and testable in isolation.
Pitfall Guide
1. Synchronous Blocking in Async Runtimes
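Explanation: Blocking calls such as Python's input() or synchronous callbacks halt the thread. In containerized environments this stalls health checks and other requests, and the orchestrator eventually kills the process. Waiting in-process with await new Promise(resolve => setTimeout(...)) does not block the event loop, but the paused run then exists only in that process's memory, so a restart or redeploy silently drops it.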
Fix: Always yield control back to the runtime. Persist state, route to a channel, and return immediately. Resume only when an external webhook or poller triggers the continuation.
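Here is a minimal sketch of that shape. The `requestApprovalStep`/`onApprovalResolved` pair is hypothetical, and in-memory structures stand in for the durable store and the channel outbox. The step records where the run stopped and returns a `paused` result instead of awaiting the reviewer. A separate entry point, called by the webhook, continues the run later.

```typescript
// Hypothetical run state persisted between pause and resume (a database row in practice).
interface RunState {
  runId: string;
  step: "awaiting_approval" | "completed" | "cancelled";
  pendingRequestId?: string;
}

type StepResult = { kind: "paused"; requestId: string } | { kind: "done" };

// Stand-ins for durable storage and channel delivery (assumed, not a real API).
const runs = new Map<string, RunState>();
const outbox: string[] = [];

// Persist, route, return. Nothing here waits for the human,
// so the worker stays free to serve other runs and health checks.
function requestApprovalStep(runId: string, requestId: string): StepResult {
  runs.set(runId, { runId, step: "awaiting_approval", pendingRequestId: requestId });
  outbox.push(requestId); // e.g. where pauseForApproval() would post to Slack
  return { kind: "paused", requestId };
}

// Invoked later by the /submit webhook or a poller, possibly in another process.
function onApprovalResolved(runId: string, decision: "approved" | "rejected"): StepResult {
  const run = runs.get(runId);
  if (!run || run.step !== "awaiting_approval") {
    // Unknown or already-continued run: ignore it instead of re-executing.
    return { kind: "done" };
  }
  runs.set(runId, { runId, step: decision === "approved" ? "completed" : "cancelled" });
  return { kind: "done" };
}
```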
2. Implicit State Resumption (Double Execution)
Explanation: Several frameworks replay the entire step when an approval event arrives. Any side effects (API calls, database writes, LLM invocations) before the pause execute twice, causing duplicate charges or corrupted state.
Fix: Structure agent nodes as pure functions before the pause point. Place all side effects after the resume check. Use idempotency keys on all external calls and verify execution status before proceeding.
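The sketch below splits a hypothetical fund-transfer step into two parts: a pure builder that can be replayed safely, and a guarded executor that runs after the resume check. `PaymentClient` is an invented interface. Its idempotency-key argument reflects a common pattern in payment APIs, not any specific provider's signature. The `executed` set stands in for a durable table keyed by `requestId`.

```typescript
// Hypothetical external client; many payment APIs accept an idempotency key per call.
interface PaymentClient {
  transfer(args: { amount: number; recipient: string }, idempotencyKey: string): void;
}

// Record of approved actions that already ran (durable in practice).
const executed = new Set<string>();

// Pure part: no I/O, so a framework replay before the pause is harmless.
function buildTransfer(amount: number, recipient: string) {
  return { amount, recipient };
}

// Side-effecting part: placed after the resume check and guarded twice.
function executeAfterApproval(
  requestId: string,
  decision: "approved" | "rejected",
  transfer: { amount: number; recipient: string },
  client: PaymentClient
): "executed" | "skipped" {
  if (decision !== "approved") return "skipped";
  if (executed.has(requestId)) return "skipped"; // the step was replayed
  client.transfer(transfer, requestId); // requestId doubles as the idempotency key
  executed.add(requestId);
  return "executed";
}
```

The two guards cover different failures. The `executed` check handles a framework replay. The idempotency key covers a crash between the external call and the `executed` write.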
3. Unvalidated Human Input
Explanation: Treating human responses as trusted data leads to schema mismatches, injection vulnerabilities, and policy violations. A reviewer clicking "approve" on a malformed payload can crash downstream services.
Fix: Implement a verification middleware that validates the response against the original request schema, checks business rules, and sanitizes modified payloads before resuming the agent.
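Schema validation alone does not catch business-rule violations. One way to add policy checks on top is a list of small rule functions evaluated in order, where the first violation wins. The two rules below (no self-approval, and no raising a transfer amount on modification) and all the names are illustrative assumptions, not rules from the source. The sketch assumes schema validation has already run.

```typescript
// Hypothetical input to the policy layer, assembled after schema validation.
interface PolicyInput {
  actionType: "fund_transfer" | "content_publish" | "config_update";
  initiatedBy: string;
  reviewerId: string;
  originalPayload: Record<string, unknown>;
  finalPayload: Record<string, unknown>;
}

type PolicyResult = { ok: true } | { ok: false; reason: string };
type PolicyRule = (input: PolicyInput) => PolicyResult;

// Separation of duties: whoever initiated the action cannot approve it.
const noSelfApproval: PolicyRule = (i) =>
  i.initiatedBy === i.reviewerId ? { ok: false, reason: "Reviewer initiated this request" } : { ok: true };

// Assumed rule: a reviewer may lower a transfer amount but not raise it.
const noAmountIncrease: PolicyRule = (i) => {
  if (i.actionType !== "fund_transfer") return { ok: true };
  const before = Number(i.originalPayload.amount);
  const after = Number(i.finalPayload.amount);
  return after > before ? { ok: false, reason: "Modified amount exceeds original" } : { ok: true };
};

// Evaluate rules in order and stop at the first violation.
function checkPolicies(input: PolicyInput, rules: PolicyRule[]): PolicyResult {
  for (const rule of rules) {
    const result = rule(input);
    if (!result.ok) return result;
  }
  return { ok: true };
}
```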
4. Hardcoded Channel Logic
Explanation: Embedding Slack, email, or dashboard logic directly into the agent loop creates tight coupling. Changing routing rules requires redeploying the agent. Adding new channels breaks existing code.
Fix: Abstract channel routing behind an interface. Use configuration-driven routing based on priority, action type, or team ownership. Keep the agent loop unaware of delivery mechanisms.
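Here is a sketch of configuration-driven routing. It assumes a `RoutingConfig` shape that mirrors the `routing` block of the configuration template later in this section (`default_channel`, `priority_map`, `team_routing`). `resolveRoute` is a hypothetical pure function. Falling back to the default channel when the preferred adapter isn't registered is one design choice. Throwing instead would be equally defensible for high-priority actions.

```typescript
type Priority = "low" | "medium" | "high";
type ActionType = "fund_transfer" | "content_publish" | "config_update";

// Assumed to mirror the `routing` block of the configuration template.
interface RoutingConfig {
  default_channel: string;
  priority_map: Partial<Record<Priority, string>>;
  team_routing: Partial<Record<ActionType, string>>;
}

interface Route {
  channelId: string;    // which adapter delivers the request
  destination?: string; // team-specific target, e.g. a Slack channel name
}

// Pure function: routing changes by editing config, not agent code.
function resolveRoute(
  config: RoutingConfig,
  priority: Priority,
  actionType: ActionType,
  registered: Set<string>
): Route {
  const preferred = config.priority_map[priority] ?? config.default_channel;
  const channelId = registered.has(preferred) ? preferred : config.default_channel;
  if (!registered.has(channelId)) {
    throw new Error(`No registered adapter for "${preferred}" or default "${config.default_channel}"`);
  }
  return { channelId, destination: config.team_routing[actionType] };
}
```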
5. Missing Idempotency Tokens
Explanation: Without explicit request IDs, retries and duplicate webhook deliveries cause multiple approvals for the same action. The agent cannot distinguish between a fresh request and a duplicate.
Fix: Generate UUIDs at request creation and store them as primary keys. Reject or ignore submissions for requests that are already resolved or have expired. Log all resolution attempts for audit trails.
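Below is a minimal sketch of first-write-wins resolution with an audit trail. It uses a hypothetical `SubmissionLedger` class with an injectable clock for testing. In Postgres, the `resolved` flip would need to be a single conditional `UPDATE` so two concurrent deliveries cannot both succeed. The in-memory version is only safe because JavaScript runs it on one thread.

```typescript
type Outcome = "accepted" | "duplicate" | "expired" | "unknown";

interface PendingEntry {
  expiresAt: Date;
  resolved: boolean;
}

interface AuditEntry {
  requestId: string;
  at: Date;
  outcome: Outcome;
}

// Hypothetical resolver: requestId is the key, every attempt is logged, only the first valid one wins.
class SubmissionLedger {
  private pending = new Map<string, PendingEntry>();
  readonly audit: AuditEntry[] = [];

  register(requestId: string, expiresAt: Date): void {
    if (this.pending.has(requestId)) throw new Error(`Duplicate requestId: ${requestId}`);
    this.pending.set(requestId, { expiresAt, resolved: false });
  }

  submit(requestId: string, now: Date = new Date()): Outcome {
    const entry = this.pending.get(requestId);
    let outcome: Outcome;
    if (!entry) outcome = "unknown";
    else if (entry.resolved) outcome = "duplicate"; // retried or redelivered webhook
    else if (now.getTime() >= entry.expiresAt.getTime()) outcome = "expired";
    else {
      entry.resolved = true;
      outcome = "accepted";
    }
    this.audit.push({ requestId, at: now, outcome });
    return outcome;
  }
}
```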
6. In-Memory State Assumptions
Explanation: Storing pending approvals in process memory or local variables works in development but fails during pod restarts, scaling events, or crashes. The agent loses context and cannot resume.
Fix: Use external durable storage (Postgres, Redis, or managed workflow engines). Serialize the full request payload, metadata, and channel routing state. Verify persistence before yielding control.
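Serializing the full request has a TypeScript-specific trap: `JSON.stringify` writes a `Date` as an ISO-8601 string, and `JSON.parse` does not turn it back into a `Date`. The sketch below uses a hypothetical `PendingRecord` shape and a `Map` standing in for the database. It revives `expiresAt` explicitly, then reads the record back and compares it before the agent pauses.

```typescript
interface PendingRecord {
  requestId: string;
  actionType: string;
  payload: Record<string, unknown>;
  metadata: { initiatedBy: string; priority: "low" | "medium" | "high"; expiresAt: Date };
  channelId: string; // routing state, so a restarted worker knows where the request went
}

// JSON.stringify writes Date values as ISO-8601 strings (via Date.prototype.toJSON).
function serialize(record: PendingRecord): string {
  return JSON.stringify(record);
}

// JSON.parse leaves dates as strings, so expiresAt is converted back explicitly.
function deserialize(raw: string): PendingRecord {
  const parsed = JSON.parse(raw) as Omit<PendingRecord, "metadata"> & {
    metadata: Omit<PendingRecord["metadata"], "expiresAt"> & { expiresAt: string };
  };
  const expiresAt = new Date(parsed.metadata.expiresAt);
  if (Number.isNaN(expiresAt.getTime())) {
    throw new Error(`Invalid expiresAt for ${parsed.requestId}`);
  }
  return { ...parsed, metadata: { ...parsed.metadata, expiresAt } };
}

// Write, read back, and compare before yielding control to the runtime.
function persistAndVerify(store: Map<string, string>, record: PendingRecord): void {
  store.set(record.requestId, serialize(record));
  const roundTripped = deserialize(store.get(record.requestId)!);
  if (serialize(roundTripped) !== serialize(record)) {
    throw new Error(`Persisted record for ${record.requestId} does not match`);
  }
}
```

If the loaded record is re-parsed through `ApprovalRequestSchema` from Step 1, `z.coerce.date()` performs the same conversion.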
7. Ignoring Timeout & Escalation Paths
Explanation: Approvals left pending indefinitely block agent execution. Without expiration or escalation, critical workflows stall, and reviewers miss time-sensitive requests.
Fix: Attach expiresAt timestamps to all requests. Implement background jobs that flag expired approvals, trigger escalation channels, or auto-reject based on policy. Log timeout events for operational visibility.
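The sketch below covers the decision logic for such a background job. `decideSweepAction`, `sweep`, and the `escalated` flag are hypothetical. The policy fields loosely follow the `escalation` block of the configuration template, and measuring `autoRejectAfterHours` from `expiresAt` is an assumed interpretation. The functions return actions instead of performing them, so the caller can apply and log each one.

```typescript
type SweepAction = "none" | "escalate" | "auto_reject";

interface PendingApproval {
  requestId: string;
  expiresAt: Date;
  escalated: boolean; // set once so reviewers are not paged on every sweep
}

interface SweepPolicy {
  notifyOnExpiry: boolean;
  autoRejectAfterHours: number; // measured from expiresAt in this sketch (assumed semantics)
}

const HOUR_MS = 60 * 60 * 1000;

// Decide what one periodic sweep should do with a single pending request.
function decideSweepAction(a: PendingApproval, policy: SweepPolicy, now: Date): SweepAction {
  const overdueMs = now.getTime() - a.expiresAt.getTime();
  if (overdueMs < 0) return "none"; // not expired yet
  if (overdueMs >= policy.autoRejectAfterHours * HOUR_MS) return "auto_reject";
  if (policy.notifyOnExpiry && !a.escalated) return "escalate";
  return "none";
}

// One sweep pass over all pending requests; "none" results are filtered out.
function sweep(
  pending: PendingApproval[],
  policy: SweepPolicy,
  now: Date
): Array<{ requestId: string; action: SweepAction }> {
  return pending
    .map((a) => ({ requestId: a.requestId, action: decideSweepAction(a, policy, now) }))
    .filter((r) => r.action !== "none");
}
```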
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small team, low approval volume | Framework-native pause + custom webhook | Minimal infrastructure overhead; fast to ship | Low (developer time) |
| High-volume, multi-channel routing | Custom Approval Orchestrator + Postgres | Decouples channels, ensures durability, scales horizontally | Medium (storage + routing service) |
| Strict compliance & audit requirements | External workflow engine (Temporal/Restate) | Built-in idempotency, replay, and audit logging | High (runtime licensing/ops) |
| Multi-agent coordination | State machine orchestrator + shared approval store | Prevents race conditions, centralizes policy enforcement | Medium-High (orchestration layer) |
Configuration Template
```yaml
approval_engine:
  storage:
    type: postgres
    connection_string: "${DB_URL}"
    table: approval_requests
    ttl_hours: 24
  routing:
    default_channel: slack
    priority_map:
      low: email
      medium: slack
      high: pagerduty
    team_routing:
      fund_transfer: "#finance-ops"
      content_publish: "#editorial"
      config_update: "#platform-eng"
  verification:
    enabled: true
    schema_strictness: strict
    allow_modifications: true
    max_comment_length: 500
  escalation:
    auto_reject_after_hours: 48
    notify_on_expiry: true
    escalation_channel: "#ops-alerts"
```
Quick Start Guide
- Initialize the storage layer: Run the provided migration script to create the `approval_requests` table with columns for request_id, payload, status, channel, expires_at, and resolved_at.
- Deploy the orchestrator service: Containerize the `ApprovalOrchestrator` class. Expose a `/submit` endpoint for webhook deliveries and a `/resume/{requestId}` endpoint for agent polling.
- Configure channel adapters: Register your Slack, email, or dashboard adapters in the orchestrator configuration. Map priorities and action types to target channels.
- Integrate with your agent: Replace synchronous pause calls with `await orchestrator.pauseForApproval(request)`. After resuming, call `await orchestrator.resumeAgent(requestId)` and inject the verified payload into the agent state.
- Validate with chaos testing: Simulate worker restarts, duplicate webhook deliveries, and expired requests. Verify that the agent resumes exactly once, rejects malformed inputs, and escalates stalled approvals.
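Below is a compact harness for the first two chaos checks. It assumes a hypothetical stateless `ApprovalWorker`, with a `Map` standing in for Postgres. The worker instance is discarded and replaced between pause and resume to simulate a restart. The webhook arrives three times (once malformed, once valid, once as a duplicate), and the test asserts that the side effect ran exactly once.

```typescript
// Stand-in for durable storage shared by all workers.
interface DurableRow {
  status: "pending" | "resolved" | "cancelled" | "executed";
  amount: number;
}
type DurableStore = Map<string, DurableRow>;

// Hypothetical worker with no state of its own, so it can be killed and replaced at any time.
class ApprovalWorker {
  constructor(private store: DurableStore, private onExecute: (id: string) => void) {}

  pause(requestId: string, amount: number): void {
    this.store.set(requestId, { status: "pending", amount });
  }

  // Webhook handler: malformed bodies are rejected, repeat deliveries are ignored.
  submit(requestId: string, body: unknown): "ok" | "invalid" | "ignored" {
    const row = this.store.get(requestId);
    if (!row || row.status !== "pending") return "ignored";
    const decision =
      typeof body === "object" && body !== null ? (body as Record<string, unknown>).decision : undefined;
    if (decision !== "approved" && decision !== "rejected") return "invalid";
    row.status = decision === "approved" ? "resolved" : "cancelled";
    return "ok";
  }

  // Resume: the side effect runs only for a resolved, not-yet-executed row.
  resume(requestId: string): void {
    const row = this.store.get(requestId);
    if (!row || row.status !== "resolved") return;
    this.onExecute(requestId);
    row.status = "executed";
  }
}
```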