Below is a step-by-step implementation strategy.
1. Define the Risk Ladder
Risk classification must be deterministic. Relying on the LLM to self-assess danger introduces hallucination into the security boundary. Instead, map every available tool or action to a predefined risk tier based on impact, reversibility, and data sensitivity.
```typescript
export enum RiskTier {
  READ_ONLY = 0,
  DRAFT_ONLY = 1,
  LOW_IMPACT_WRITE = 2,
  EXTERNAL_COMMUNICATION = 3,
  FINANCIAL_OR_PERMISSION = 4,
  DESTRUCTIVE_OR_CROSS_TENANT = 5
}

export interface ActionDefinition {
  toolName: string;
  tier: RiskTier;
  requiresIdempotency: boolean;
  maxRecordsAffected: number;
}
```
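The mapping is static, so it can live in a registry that the control plane consults before any policy check. The following is a minimal sketch that assumes a hypothetical `ToolRegistry` class. The tool names come from the configuration template later in this guide, and the `maxRecordsAffected` values are illustrative placeholders. For unregistered tools, `lookup` returns `undefined`, which lets callers deny them outright instead of guessing a tier.

```typescript
enum RiskTier {
  READ_ONLY = 0,
  DRAFT_ONLY = 1,
  LOW_IMPACT_WRITE = 2,
  EXTERNAL_COMMUNICATION = 3,
  FINANCIAL_OR_PERMISSION = 4,
  DESTRUCTIVE_OR_CROSS_TENANT = 5,
}

interface ActionDefinition {
  toolName: string;
  tier: RiskTier;
  requiresIdempotency: boolean;
  maxRecordsAffected: number;
}

// Hypothetical registry: the single source of truth for tool risk tiers.
class ToolRegistry {
  private readonly defs = new Map<string, ActionDefinition>();

  register(def: ActionDefinition): void {
    // Refuse re-registration so a tool's tier cannot be silently downgraded at runtime
    if (this.defs.has(def.toolName)) {
      throw new Error(`Tool already registered: ${def.toolName}`);
    }
    this.defs.set(def.toolName, Object.freeze({ ...def }));
  }

  // Returns undefined for unknown tools; callers should treat that as DENY (fail closed)
  lookup(toolName: string): ActionDefinition | undefined {
    return this.defs.get(toolName);
  }
}

// Illustrative entries; limits are placeholders, not recommendations
const registry = new ToolRegistry();
registry.register({ toolName: 'fetch_ticket', tier: RiskTier.READ_ONLY, requiresIdempotency: false, maxRecordsAffected: 100 });
registry.register({ toolName: 'send_email', tier: RiskTier.EXTERNAL_COMMUNICATION, requiresIdempotency: true, maxRecordsAffected: 1 });
registry.register({ toolName: 'issue_refund', tier: RiskTier.FINANCIAL_OR_PERMISSION, requiresIdempotency: true, maxRecordsAffected: 1 });
```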
2. Implement the Policy Decision Point (PDP)
The PDP evaluates proposed actions against business rules. It should never query the model for safety judgments. Policies are evaluated deterministically using action metadata, tenant context, payload constraints, and source attribution.
```typescript
export type PolicyVerdict = 'ALLOW' | 'REVIEW_REQUIRED' | 'DENY' | 'ESCALATE';

export interface ActionIntent {
  toolName: string;
  tenantId: string;
  // Where the content that produced this intent originated
  sourceContext: 'internal' | 'customer_email' | 'webhook' | 'external_api';
  payload: Record<string, unknown>;
  estimatedFinancialImpact?: number;
  recordCount?: number;
  isReversible: boolean;
}

export class PolicyEngine {
  evaluate(intent: ActionIntent, actionDef: ActionDefinition): PolicyVerdict {
    // The highest tier always goes to an administrator
    if (actionDef.tier === RiskTier.DESTRUCTIVE_OR_CROSS_TENANT) {
      return 'ESCALATE';
    }
    // Bulk operations beyond the tool's declared limit escalate regardless of tier
    if (intent.recordCount !== undefined && intent.recordCount > actionDef.maxRecordsAffected) {
      return 'ESCALATE';
    }
    // External communication, financial, and permission changes require review
    if (actionDef.tier >= RiskTier.EXTERNAL_COMMUNICATION) {
      return 'REVIEW_REQUIRED';
    }
    // Irreversible actions triggered by non-internal sources require review (confused deputy guard)
    if (intent.sourceContext !== 'internal' && !intent.isReversible) {
      return 'REVIEW_REQUIRED';
    }
    // Remaining tiers (READ_ONLY, DRAFT_ONLY, LOW_IMPACT_WRITE) proceed automatically
    return 'ALLOW';
  }
}
```
Architectural Rationale: Deterministic evaluation eliminates model drift from security decisions. The engine operates on structured metadata, making policies auditable, testable, and version-controlled. Source attribution (sourceContext) is critical for detecting confused deputy attacks where external content attempts to trigger internal actions.
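Source attribution only helps if an intent cannot shed its origin. One approach is to have each intent inherit the least-trusted origin among all the inputs the agent read while producing it. The sketch below works under that assumption. `deriveSourceContext`, `ContextInput`, and the trust ranking in `DISTRUST` are hypothetical, so order the contexts to match your own threat model.

```typescript
type SourceContext = 'internal' | 'customer_email' | 'webhook' | 'external_api';

// Assumed trust ranking (higher = less trusted); tune to your threat model.
const DISTRUST: Record<SourceContext, number> = {
  internal: 0,
  external_api: 1,
  webhook: 2,
  customer_email: 3,
};

interface ContextInput {
  origin: SourceContext;
  ref: string; // e.g. message or event id, reusable as approval-ticket evidence
}

// An intent inherits the least-trusted origin among everything the agent read,
// so external text cannot "launder" itself into an internally sourced action.
function deriveSourceContext(inputs: ContextInput[]): SourceContext {
  if (inputs.length === 0) {
    throw new Error('Intent has no provenance; refusing to classify');
  }
  return inputs.reduce<SourceContext>(
    (worst, input) => (DISTRUST[input.origin] > DISTRUST[worst] ? input.origin : worst),
    'internal',
  );
}
```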
3. Decouple Planning from Execution
Agents should generate ActionIntent objects, never execute tools directly. The execution path flows through the control plane:
Agent generates intent → Policy Engine evaluates → Approval Ticket created (if needed) → Human/System approves → Execution Broker runs with scoped credentials → Audit log written
This separation ensures the agent never holds long-lived administrative privileges. It also enables payload freezing: the exact data reviewed during approval is what gets executed, preventing post-approval mutation.
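Payload freezing becomes verifiable if you hash a canonical serialization at approval time and check it again just before execution. This is a minimal sketch that assumes payloads are plain JSON values. `canonicalJson`, `freezePayload`, and `verifyFrozen` are hypothetical helpers, and SHA-256 from Node's built-in `node:crypto` module is one reasonable choice of digest.

```typescript
import { createHash } from 'node:crypto';

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so logically equal payloads hash identically.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

interface FrozenPayload {
  readonly body: Json;
  readonly sha256: string;
}

// Snapshot the payload at approval time: deep copy, then record its content hash.
function freezePayload(payload: Json): FrozenPayload {
  const canonical = canonicalJson(payload);
  const body = JSON.parse(canonical) as Json; // deep copy detached from the agent's object
  return Object.freeze({ body, sha256: createHash('sha256').update(canonical).digest('hex') });
}

// The broker recomputes the hash immediately before execution; any mutation changes it.
function verifyFrozen(frozen: FrozenPayload): boolean {
  return createHash('sha256').update(canonicalJson(frozen.body)).digest('hex') === frozen.sha256;
}
```

The broker could store `sha256` on the ticket and refuse to run if `verifyFrozen` fails. The same hash can also serve as input to an idempotency key.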
4. Design the Approval Artifact
Review interfaces must prevent rubber-stamping. Provide concise, structured context rather than raw chain-of-thought dumps. Reviewers need impact clarity, source traceability, and rollback options.
```typescript
export interface ApprovalTicket {
  ticketId: string;
  tenantId: string;
  riskTier: RiskTier;
  actionType: string;
  plainLanguageSummary: string;
  sourceEvidence: Array<{ type: string; id: string; url?: string }>;
  frozenPayload: Record<string, unknown>;
  reversibilityStatus: 'full' | 'partial' | 'none';
  expiresAt: Date;
  status: 'PENDING' | 'APPROVED' | 'REJECTED' | 'EXPIRED';
}
```
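The `status` field is only meaningful if its transitions are enforced. The sketch below shows one possible state machine. It assumes that only `PENDING` tickets may change state and that an approval arriving after `expiresAt` is recorded as `EXPIRED`. `transition` and `TicketState` are hypothetical names.

```typescript
type TicketStatus = 'PENDING' | 'APPROVED' | 'REJECTED' | 'EXPIRED';

interface TicketState {
  status: TicketStatus;
  expiresAt: Date;
}

// Assumed rules: only PENDING tickets can change state; every other state is terminal.
function transition(
  ticket: TicketState,
  next: Exclude<TicketStatus, 'PENDING'>,
  now: Date = new Date(),
): TicketState {
  if (ticket.status !== 'PENDING') {
    throw new Error(`Ticket is already ${ticket.status}`);
  }
  // An approval that arrives after expiry is recorded as EXPIRED, never APPROVED
  if (next === 'APPROVED' && now.getTime() >= ticket.expiresAt.getTime()) {
    return { ...ticket, status: 'EXPIRED' };
  }
  // Return a new object so the caller can persist it atomically
  return { ...ticket, status: next };
}
```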
5. Deploy the Execution Broker with Scoped Credentials
The broker is the only component permitted to invoke production APIs. It validates approval state, checks for state drift, and executes with short-lived, scoped tokens.
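```typescript
// Collaborator contracts the broker depends on (shapes inferred from usage; adapt to your stack)
interface TicketStore { findById(id: string): Promise<ApprovalTicket | null>; }
interface StateValidator { check(tenantId: string, payload: Record<string, unknown>): Promise<{ isValid: boolean }>; }
interface TokenIssuer {
  generate(req: { tenantId: string; actionType: string; ttlSeconds: number; approverId: string }): Promise<string>;
}
interface ToolRunner {
  invoke(actionType: string, payload: Record<string, unknown>, token: string): Promise<{ success: boolean }>;
}
interface AuditLogger { record(entry: Record<string, unknown>): Promise<void>; }

export class ExecutionBroker {
  constructor(
    private readonly ticketStore: TicketStore,
    private readonly stateValidator: StateValidator,
    private readonly tokenIssuer: TokenIssuer,
    private readonly toolRunner: ToolRunner,
    private readonly auditLogger: AuditLogger,
  ) {}

  async executeApprovedAction(ticketId: string, approverId: string) {
    const ticket = await this.ticketStore.findById(ticketId);
    if (!ticket || ticket.status !== 'APPROVED') {
      throw new Error('Invalid or unapproved ticket');
    }
    if (ticket.expiresAt < new Date()) {
      throw new Error('Approval expired');
    }
    // Re-validate current state to prevent drift exploitation
    const currentState = await this.stateValidator.check(ticket.tenantId, ticket.frozenPayload);
    if (!currentState.isValid) {
      throw new Error('Target state changed since approval; execution halted');
    }
    // Issue a short-lived token scoped to this tenant and action only
    const scopedToken = await this.tokenIssuer.generate({
      tenantId: ticket.tenantId,
      actionType: ticket.actionType,
      ttlSeconds: 60,
      approverId,
    });
    // Execute the exact frozen payload; audit the outcome even if the call throws
    let resultStatus: 'SUCCESS' | 'FAILED' = 'FAILED';
    try {
      const result = await this.toolRunner.invoke(ticket.actionType, ticket.frozenPayload, scopedToken);
      resultStatus = result.success ? 'SUCCESS' : 'FAILED';
      return result;
    } finally {
      await this.auditLogger.record({
        ticketId,
        approverId,
        actionType: ticket.actionType,
        resultStatus,
        executedAt: new Date(),
      });
    }
  }
}
```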
Architectural Rationale: Short-lived tokens limit blast radius. State re-validation prevents attackers from exploiting the pause window to modify underlying records. Executing the frozen payload guarantees the reviewed action matches the executed action. Idempotency keys (handled internally by toolRunner) prevent duplicate execution on retries.
Pitfall Guide
1. Model-Dependent Risk Scoring
Explanation: Asking the LLM to classify its own action as safe or dangerous. Models are probabilistic and can be influenced by prompt context, leading to inconsistent security decisions.
Fix: Route all risk classification through a deterministic policy engine. Use the model only for intent extraction and payload generation, never for security judgment.
2. Long-Lived Agent Credentials
Explanation: Granting the agent permanent admin or service account tokens. If the agent is compromised or tricked, the approval gate becomes decorative because the agent can bypass it by calling APIs directly.
Fix: Implement an execution broker that issues short-lived, scoped tokens only after policy approval. The agent should never receive or store production credentials.
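To make the scoping concrete, the sketch below mints an HMAC-signed token that is bound to one tenant and one action and carries a hard expiry. It uses Node's built-in `node:crypto`. The token format, `issueScopedToken`, and `verifyScopedToken` are assumptions for illustration. In practice you would more likely obtain such tokens from your identity provider than roll your own format.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

interface ScopeClaims {
  tenantId: string;
  actionType: string;
  approverId: string;
  exp: number; // Unix epoch seconds
}

function sign(data: string, secret: string): string {
  return createHmac('sha256', secret).update(data).digest('base64url');
}

// Hypothetical format: base64url(JSON claims) + "." + base64url(HMAC-SHA256)
function issueScopedToken(
  claims: Omit<ScopeClaims, 'exp'>, ttlSeconds: number, secret: string, nowMs = Date.now(),
): string {
  const exp = Math.floor(nowMs / 1000) + ttlSeconds;
  const body = Buffer.from(JSON.stringify({ ...claims, exp })).toString('base64url');
  return `${body}.${sign(body, secret)}`;
}

// Accept the token only for the exact tenant and action it was minted for, and only before expiry.
function verifyScopedToken(
  token: string, tenantId: string, actionType: string, secret: string, nowMs = Date.now(),
): boolean {
  const [body, sig] = token.split('.');
  if (!body || !sig) return false;
  const expected = Buffer.from(sign(body, secret));
  const given = Buffer.from(sig);
  // Constant-time comparison; timingSafeEqual requires equal lengths
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;
  try {
    const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8')) as ScopeClaims;
    return claims.tenantId === tenantId
      && claims.actionType === actionType
      && Math.floor(nowMs / 1000) < claims.exp;
  } catch {
    return false; // malformed claims
  }
}
```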
3. Ignoring State Drift During Pauses
Explanation: Resuming execution after human approval without re-checking underlying data. Records may have been modified, deleted, or permission levels revoked during the approval window.
Fix: Implement a pre-execution validation step that re-fetches critical records, verifies permissions, and confirms the frozen payload still aligns with current state. Halt execution if drift is detected.
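A pre-execution check can compare what the reviewer saw with what exists now. This sketch assumes each target record carries a version number captured when the ticket was created, such as an optimistic-concurrency counter or an ETag. `detectDrift` and `RecordSnapshot` are hypothetical. A broker's state validator could delegate to something like this.

```typescript
interface RecordSnapshot {
  id: string;
  version: number; // optimistic-concurrency counter or ETag captured at approval time
  ownerTenantId: string;
}

interface DriftResult {
  isValid: boolean;
  reasons: string[];
}

// Compare snapshots captured at ticket creation with freshly fetched records.
function detectDrift(
  approved: RecordSnapshot[],
  current: Map<string, RecordSnapshot>,
  tenantId: string,
): DriftResult {
  const reasons: string[] = [];
  for (const snap of approved) {
    const live = current.get(snap.id);
    if (!live) {
      reasons.push(`${snap.id}: deleted since approval`);
    } else if (live.version !== snap.version) {
      reasons.push(`${snap.id}: modified (v${snap.version} -> v${live.version})`);
    } else if (live.ownerTenantId !== tenantId) {
      reasons.push(`${snap.id}: no longer owned by tenant ${tenantId}`);
    }
  }
  // Any reason at all halts execution
  return { isValid: reasons.length === 0, reasons };
}
```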
4. Rubber-Stamp UI Design
Explanation: Overwhelming reviewers with raw model outputs, lengthy reasoning chains, or ambiguous action descriptions. Reviewers will approve blindly to clear queues.
Fix: Structure approval tickets with plain-language summaries, frozen payload previews, source evidence links, reversibility status, and explicit risk warnings. Limit cognitive load to five key decision points.
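One way to enforce the five-point limit is to render every ticket through a fixed template with exactly one line per decision point. This sketch uses hypothetical names. The 200-character preview cap and the warning rules are assumptions, and the full frozen payload should still be available on demand.

```typescript
interface ReviewTicket {
  plainLanguageSummary: string;
  frozenPayload: Record<string, unknown>;
  sourceEvidence: Array<{ type: string; id: string; url?: string }>;
  reversibilityStatus: 'full' | 'partial' | 'none';
  riskTier: number;
}

const MAX_PREVIEW_CHARS = 200; // assumed cap to keep payload previews skimmable

// Render exactly five labelled lines, one per decision point, instead of raw model output.
function renderReviewCard(t: ReviewTicket): string[] {
  const preview = JSON.stringify(t.frozenPayload);
  const warnings: string[] = [];
  if (t.reversibilityStatus === 'none') warnings.push('IRREVERSIBLE');
  if (t.riskTier >= 4) warnings.push('FINANCIAL/PERMISSION OR HIGHER');
  return [
    `What: ${t.plainLanguageSummary}`,
    `Payload: ${preview.length > MAX_PREVIEW_CHARS ? preview.slice(0, MAX_PREVIEW_CHARS) + '...' : preview}`,
    `Evidence: ${t.sourceEvidence.map((e) => e.url ?? `${e.type}:${e.id}`).join(', ') || 'none'}`,
    `Undo: ${t.reversibilityStatus}`,
    `Warnings: ${warnings.length ? warnings.join('; ') : 'none'}`,
  ];
}
```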
5. Conflating Intent with Payload
Explanation: Allowing the model to regenerate or modify the action payload after human approval. The reviewer approves one set of parameters, but the system executes a different one.
Fix: Freeze the payload at the moment of approval. The execution broker must run the exact snapshot reviewed by the human or policy system. Never pass post-approval model outputs directly to production APIs.
6. Missing Idempotency Guards
Explanation: Retry mechanisms or network timeouts causing duplicate financial charges, record updates, or email dispatches.
Fix: Enforce strict idempotency keys at the execution layer. The broker should generate a deterministic key based on ticket ID, action type, and frozen payload hash. Deduplicate at the API gateway or service layer.
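The following is a minimal sketch of the key derivation and a dedupe wrapper. It assumes the frozen payload hash is already available as a hex string. `idempotencyKey` and `IdempotentRunner` are hypothetical. The in-memory map stands in for a shared store, such as a database table with a unique constraint.

```typescript
import { createHash } from 'node:crypto';

// Deterministic key: the same ticket, action, and payload hash always yield the same key.
function idempotencyKey(ticketId: string, actionType: string, payloadSha256: string): string {
  // Length-prefix each part so ('ab', 'c') and ('a', 'bc') cannot collide
  const parts = [ticketId, actionType, payloadSha256].map((p) => `${p.length}:${p}`).join('|');
  return createHash('sha256').update(parts).digest('hex');
}

// Hypothetical in-memory dedupe layer; production would use a shared store with TTLs.
class IdempotentRunner<R> {
  private readonly results = new Map<string, Promise<R>>();

  run(key: string, action: (key: string) => Promise<R>): Promise<R> {
    const existing = this.results.get(key);
    if (existing) return existing; // retry or concurrent duplicate: reuse the first outcome
    // Pass the key downstream so the target service can deduplicate independently
    const pending = action(key);
    this.results.set(key, pending);
    return pending;
  }
}
```

Failed attempts are cached as well. A timeout can hide a side effect that actually completed, so clearing failures for retry is a deliberate policy decision rather than a default.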
7. Treating Approval Gates as Injection Defense
Explanation: Assuming approval gates eliminate the need for prompt injection controls. Gates are a last-resort pause, not a substitute for input sanitization and instruction hierarchy.
Fix: Maintain instruction/data separation, use explicit delimiters for untrusted content, enforce tool allowlists, and apply least-privilege OAuth scopes. Approval gates catch what injection defenses miss.
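Two of these controls are cheap to sketch: wrapping untrusted content in explicit delimiters and checking tool calls against a per-role allowlist. The `<untrusted>` tag, the `support_agent` role, and both helpers are hypothetical. Delimiting lowers injection risk but does not eliminate it, so the approval gate still needs to sit behind it.

```typescript
// Wrap untrusted text in explicit delimiters and neutralize delimiter look-alikes
// inside it, so injected text cannot "close" the untrusted block early.
function wrapUntrusted(source: string, text: string): string {
  const neutralized = text.replace(/<\/?untrusted[^>]*>/gi, '[removed-delimiter]');
  const safeSource = source.replace(/"/g, '');
  return `<untrusted source="${safeSource}">\n${neutralized}\n</untrusted>`;
}

// Per-role tool allowlist: anything not listed is rejected before policy evaluation.
const ALLOWLIST: Record<string, ReadonlySet<string>> = {
  support_agent: new Set(['search_docs', 'fetch_ticket', 'draft_email', 'send_email']),
};

function isToolAllowed(role: string, toolName: string): boolean {
  // Unknown roles get an empty allowlist (fail closed)
  return ALLOWLIST[role]?.has(toolName) ?? false;
}
```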
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Internal tooling & read-only queries | Allow with logging | Low blast radius; agents operate on isolated datasets | Minimal infrastructure overhead |
| Customer-facing automation (drafts, summaries) | Allow with draft watermarking | Prevents accidental external dispatch while maintaining UX | Low engineering cost |
| Financial operations (refunds, plan changes) | Human approval + scoped execution | High compliance risk; requires explicit accountability | Moderate ops overhead, high risk reduction |
| Cross-tenant data synchronization | Escalate to admin review + policy freeze | Prevents data leakage and confused deputy exploitation | High security value, requires admin staffing |
| Infrastructure mutations (deployments, config changes) | Deny by default + manual ticket workflow | Irreversible impact; agents lack contextual awareness of system state | Highest operational friction, lowest incident rate |
Configuration Template
```yaml
# policy-config.yaml
risk_tiers:
  - tier: 0
    label: "READ_ONLY"
    default_verdict: "ALLOW"
    tools: ["search_docs", "fetch_ticket", "summarize_usage"]
  - tier: 1
    label: "DRAFT_ONLY"
    default_verdict: "ALLOW"
    tools: ["draft_email", "prepare_crm_note"]
  - tier: 2
    label: "LOW_IMPACT_WRITE"
    default_verdict: "ALLOW"
    tools: ["add_internal_note", "tag_ticket"]
  - tier: 3
    label: "EXTERNAL_COMMUNICATION"
    default_verdict: "REVIEW_REQUIRED"
    tools: ["send_email", "post_slack_message"]
  - tier: 4
    label: "FINANCIAL_OR_PERMISSION"
    default_verdict: "REVIEW_REQUIRED"
    tools: ["issue_refund", "update_subscription", "create_api_key"]
  - tier: 5
    label: "DESTRUCTIVE_OR_CROSS_TENANT"
    default_verdict: "ESCALATE"
    tools: ["delete_records", "export_tenant_data", "modify_access_control"]

policy_rules:
  - condition: "record_count > 100"
    verdict: "ESCALATE"
  - condition: "source_context != 'internal' AND is_reversible == false"
    verdict: "REVIEW_REQUIRED"
  - condition: "estimated_financial_impact > 5000"
    verdict: "REVIEW_REQUIRED"
  - condition: "tool_name starts_with 'draft_'"
    verdict: "ALLOW"

execution:
  token_ttl_seconds: 60
  require_idempotency: true
  state_drift_check: true
  audit_retention_days: 365
```
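Parsing YAML requires a third-party library, so the sketch below transcribes the template into typed TypeScript data and evaluates it. The template does not state how verdicts combine, so the combining rule here is an assumption: the most severe verdict wins (`ALLOW` < `REVIEW_REQUIRED` < `ESCALATE` < `DENY`). Tools missing from the configuration are denied. `evaluateConfig` and the other names are hypothetical.

```typescript
type Verdict = 'ALLOW' | 'REVIEW_REQUIRED' | 'ESCALATE' | 'DENY';

interface IntentFacts {
  toolName: string;
  sourceContext: string;
  isReversible: boolean;
  recordCount?: number;
  estimatedFinancialImpact?: number;
}

// Assumed severity order used to combine verdicts (higher wins).
const SEVERITY: Record<Verdict, number> = { ALLOW: 0, REVIEW_REQUIRED: 1, ESCALATE: 2, DENY: 3 };

// Per-tool defaults transcribed from risk_tiers in policy-config.yaml.
const TOOL_DEFAULTS: Record<string, Verdict> = {
  search_docs: 'ALLOW', fetch_ticket: 'ALLOW', summarize_usage: 'ALLOW',
  draft_email: 'ALLOW', prepare_crm_note: 'ALLOW',
  add_internal_note: 'ALLOW', tag_ticket: 'ALLOW',
  send_email: 'REVIEW_REQUIRED', post_slack_message: 'REVIEW_REQUIRED',
  issue_refund: 'REVIEW_REQUIRED', update_subscription: 'REVIEW_REQUIRED', create_api_key: 'REVIEW_REQUIRED',
  delete_records: 'ESCALATE', export_tenant_data: 'ESCALATE', modify_access_control: 'ESCALATE',
};

// policy_rules rewritten as typed predicates instead of condition strings.
const RULES: Array<{ when: (f: IntentFacts) => boolean; verdict: Verdict }> = [
  { when: (f) => (f.recordCount ?? 0) > 100, verdict: 'ESCALATE' },
  { when: (f) => f.sourceContext !== 'internal' && !f.isReversible, verdict: 'REVIEW_REQUIRED' },
  { when: (f) => (f.estimatedFinancialImpact ?? 0) > 5000, verdict: 'REVIEW_REQUIRED' },
  { when: (f) => f.toolName.startsWith('draft_'), verdict: 'ALLOW' },
];

function evaluateConfig(facts: IntentFacts): Verdict {
  // Tools missing from the config are denied (fail closed)
  let verdict: Verdict = TOOL_DEFAULTS[facts.toolName] ?? 'DENY';
  for (const rule of RULES) {
    if (rule.when(facts) && SEVERITY[rule.verdict] > SEVERITY[verdict]) {
      verdict = rule.verdict;
    }
  }
  return verdict;
}
```

Under this most-severe-wins rule, an `ALLOW` rule such as the `draft_` prefix can never relax a stricter verdict. If that rule is meant as an override, you would need first-match semantics instead, and rule order would then become significant.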
Quick Start Guide
- Map your tools to risk tiers: Inventory every action your agent can perform. Assign each to a tier based on financial impact, data sensitivity, and reversibility. Start with a conservative baseline; you can relax policies after monitoring.
- Deploy the policy engine: Implement the deterministic evaluator using your configuration. Route all agent tool calls through it before execution. Log every verdict for baseline monitoring.
- Build the approval ticket schema: Create a structured object that captures the frozen payload, plain-language summary, source evidence, and expiry. Integrate it with your internal admin dashboard or notification system.
- Implement the execution broker: Replace direct agent API calls with broker invocations. Enforce short-lived tokens, state re-validation, and idempotency. Connect the broker to your audit logging pipeline.
- Run controlled simulations: Test the gate with synthetic confused deputy prompts, high-volume record updates, and financial mutations. Verify that low-risk actions proceed automatically while high-risk actions pause, freeze payloads, and require explicit approval. Iterate on policy thresholds based on false positive/negative rates.
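You can compute the false positive and false negative rates for a simulation run from labeled cases. The sketch below uses hypothetical names. It assumes each synthetic scenario is labeled with whether it should pause and paired with the verdict your engine actually returned. Any verdict other than `ALLOW` counts as paused.

```typescript
type Verdict = 'ALLOW' | 'REVIEW_REQUIRED' | 'ESCALATE' | 'DENY';

interface SimulationCase {
  name: string;
  shouldPause: boolean; // label from your threat model: must this action stop before execution?
  actual: Verdict; // verdict the policy engine returned for the synthetic intent
}

// Treat any non-ALLOW verdict as paused (or blocked).
function gateErrorRates(cases: SimulationCase[]): { falsePositiveRate: number; falseNegativeRate: number } {
  const paused = (v: Verdict) => v !== 'ALLOW';
  const negatives = cases.filter((c) => !c.shouldPause);
  const positives = cases.filter((c) => c.shouldPause);
  const fp = negatives.filter((c) => paused(c.actual)).length;
  const fn = positives.filter((c) => !paused(c.actual)).length;
  return {
    // Low-risk actions that were needlessly paused
    falsePositiveRate: negatives.length ? fp / negatives.length : 0,
    // High-risk actions that slipped through unpaused: the number to drive toward zero
    falseNegativeRate: positives.length ? fn / positives.length : 0,
  };
}
```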