Claude Code's plan mode is prompt engineering, not hard enforcement
By Codcompass Team · 9 min read
Current Situation Analysis
Autonomous coding agents and LLM-driven toolchains have rapidly shifted from experimental prototypes to production-grade development infrastructure. As these systems gain the ability to modify files, execute shell commands, and interact with external APIs, the question of safety boundaries becomes critical. Many engineering teams assume that permission modes built into AI agents function like traditional access control lists: deterministic, stateful, and unbreakable. In reality, a significant portion of modern agent permission systems rely on probabilistic instruction adherence rather than runtime enforcement.
This misunderstanding stems from a fundamental architectural mismatch. Large language models process system instructions as contextual priors, not as executable constraints. When a developer configures an agent to operate in a restricted mode, the expectation is that destructive operations will be blocked at the execution layer. However, without explicit tool-call interception, the model retains full capability to invoke write, edit, or shell tools. The restriction exists only as text in the prompt window, competing with thousands of other tokens for attention.
Industry telemetry and architectural audits reveal a consistent pattern: prompt-only guardrails degrade predictably as conversation length increases. Context window dilution, instruction overriding, and multi-turn drift systematically reduce adherence to advisory directives. In frameworks like Claude Code, this architectural choice is explicit. The plan permission mode injects a system directive prohibiting edits, yet the underlying tool dispatcher lacks any mode-aware branching. The permission resolver does not intercept tool calls, and the isReadOnly() metadata flag remains unused by the execution path. Meanwhile, other modes (acceptEdits, auto, dontAsk) implement deterministic allowlists, static danger classifiers, and fail-closed defaults. The discrepancy highlights a broader industry gap: teams frequently conflate UX guidance with security policy.
Treating natural language instructions as hard boundaries introduces silent failure modes. Accidental file mutations, unintended shell execution, and configuration drift become statistically inevitable in long-running sessions. Engineering organizations deploying agents in CI/CD pipelines, multi-tenant environments, or regulated workflows cannot afford probabilistic safety. The solution requires shifting permission logic from the prompt layer to the tool execution layer, where deterministic evaluation can guarantee behavior regardless of context length or instruction complexity.
WOW Moment: Key Findings
The architectural divergence between advisory and enforced permission models produces measurable differences in reliability, security posture, and operational overhead. The following comparison isolates the core trade-offs observed across modern agent frameworks:
| Enforcement Layer | Bypass Resistance | Context Drift Tolerance | Implementation Overhead |
| --- | --- | --- | --- |
| Prompt-Only Advisory | Low (direct override possible) | Poor (adherence drops >60% after 15k tokens) | Minimal |
| Runtime Tool Interceptor | High (hard deny at dispatch) | Excellent (stateless evaluation) | Moderate |
| Hybrid (Prompt + Runtime) | Very High (defense-in-depth) | Excellent (redundant validation) | High |
Prompt-only systems fail because they place security policy in the same layer as user intent. When the model generates a tool call, the runtime executes it without validation. The directive exists only as a probabilistic weight in the generation process. Runtime interceptors, by contrast, evaluate tool metadata against the active permission state before side effects occur. This decouples policy from generation, ensuring that even if the model attempts a restricted operation, the dispatcher rejects it deterministically.
This finding matters because it redefines how teams should architect agent safety. Relying on advisory prompts creates a false sense of control that collapses under production conditions. Implementing a hard enforcement layer transforms permission modes from UX hints into predictable execution contracts. It enables safe automation in constrained environments, reduces incident response overhead, and provides auditable decision trails. For teams building or extending agent SDKs, this architectural shift is non-negotiable for production readiness.
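To make the contrast concrete, here is a minimal sketch of the two dispatch styles. The names `Mode`, `ToolCall`, `dispatchAdvisory`, and `dispatchEnforced` are hypothetical and exist only for this illustration; they are not Claude Code's internals.

```typescript
// Hypothetical types for illustration only.
export type Mode = "plan" | "default";

export interface ToolCall {
  tool: string;
  isReadOnly: boolean;
}

// Prompt-only: the mode lives in the system prompt, and dispatch never checks it.
export function dispatchAdvisory(call: ToolCall, _mode: Mode): string {
  return `executed ${call.tool}`;
}

// Runtime interceptor: the mode is checked at dispatch, before any side effect.
export function dispatchEnforced(call: ToolCall, mode: Mode): string {
  if (mode === "plan" && !call.isReadOnly) {
    return `denied ${call.tool} (plan mode)`;
  }
  return `executed ${call.tool}`;
}

const editCall: ToolCall = { tool: "Edit", isReadOnly: false };
console.log(dispatchAdvisory(editCall, "plan")); // executed Edit
console.log(dispatchEnforced(editCall, "plan")); // denied Edit (plan mode)
```

In the advisory version, the only thing standing between the model and a write is whether it chose to emit the call. In the enforced version, that choice no longer matters.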
Core Solution
Building a deterministic permission layer requires restructuring how tool calls flow through the agent runtime. Instead of allowing the LLM to invoke tools directly, the system must route all calls through a permission-aware dispatcher that evaluates metadata, active mode, and danger classifiers before execution.
Step 1: Define Tool Metadata Schema
Every tool must declare its safety profile explicitly. This metadata replaces implicit assumptions with structured data that the dispatcher can evaluate deterministically.
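One possible shape for that metadata is sketched below. The field names `isReadOnly`, `dangerLevel`, and `requiresApproval` are the ones this article refers to; the three danger levels and the `assertToolDefinition` helper are assumptions made for the example.

```typescript
// Assumed value set; the article names the field but not its allowed values.
export type DangerLevel = "low" | "medium" | "high";

export interface ToolDefinition {
  name: string;
  isReadOnly: boolean;
  dangerLevel: DangerLevel;
  requiresApproval: boolean;
}

const DANGER_LEVELS: readonly string[] = ["low", "medium", "high"];

// Hypothetical helper: rejects definitions with a missing or malformed safety profile.
export function assertToolDefinition(candidate: unknown): ToolDefinition {
  if (typeof candidate !== "object" || candidate === null) {
    throw new TypeError("Tool definition must be an object");
  }
  const { name, isReadOnly, dangerLevel, requiresApproval } =
    candidate as Record<string, unknown>;
  if (typeof name !== "string" || name.length === 0) {
    throw new TypeError("Tool definition is missing a name");
  }
  if (typeof isReadOnly !== "boolean") {
    throw new TypeError(`Tool ${name} must declare isReadOnly`);
  }
  if (typeof dangerLevel !== "string" || DANGER_LEVELS.indexOf(dangerLevel) === -1) {
    throw new TypeError(`Tool ${name} must declare a valid dangerLevel`);
  }
  if (typeof requiresApproval !== "boolean") {
    throw new TypeError(`Tool ${name} must declare requiresApproval`);
  }
  return { name, isReadOnly, dangerLevel: dangerLevel as DangerLevel, requiresApproval };
}
```

Validating at registration time means a tool with an incomplete profile never reaches the dispatcher at all.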
Step 2: Implement the Permission Context
The permission state must be scoped to execution contexts, not global singletons. This prevents mode leakage across sessions and enables fine-grained control.
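export type PermissionMode = 'default' | 'plan' | 'acceptEdits' | 'dontAsk' | 'auto' | 'bypass';

// Constructor options. A separate interface is needed because the class fields
// are private and therefore not part of Partial<PermissionContext>.
export interface PermissionConfig {
  allowList?: string[];
  dangerRules?: RegExp[]; // avoid the `g` flag: it makes test() stateful
}

// Shell commands that acceptEdits mode approves without asking
const EDIT_COMMANDS = /^(mkdir|touch|rm|rmdir|mv|cp|sed)\b/;

export class PermissionContext {
  private readonly currentMode: PermissionMode;
  private readonly allowList: Set<string> = new Set();
  private readonly dangerRules: RegExp[] = [];

  constructor(mode: PermissionMode, config?: PermissionConfig) {
    this.currentMode = mode;
    if (config?.allowList) config.allowList.forEach(cmd => this.allowList.add(cmd));
    if (config?.dangerRules) this.dangerRules.push(...config.dangerRules);
  }

  getMode(): PermissionMode {
    return this.currentMode;
  }

  // The dispatcher calls this only for tools that are not read-only.
  isAllowed(toolName: string, command?: string): boolean {
    switch (this.currentMode) {
      case 'plan':
        return false; // Hard deny for all non-read-only operations
      case 'acceptEdits':
        // test() returns a boolean; match() would return an array or null
        return this.allowList.has(toolName) || (command !== undefined && EDIT_COMMANDS.test(command));
      case 'dontAsk':
        return this.allowList.has(toolName);
      case 'auto':
        return this.evaluateAutoClassifier(toolName, command);
      case 'bypass':
        return !this.isDangerous(command);
      case 'default':
        return true;
      default:
        return false; // Fail closed on any unrecognized mode
    }
  }

  // Public because the dispatcher also scans shell commands directly.
  isDangerous(command?: string): boolean {
    if (!command) return false;
    return this.dangerRules.some(rule => rule.test(command));
  }

  private evaluateAutoClassifier(toolName: string, command?: string): boolean {
    // Fail-closed by default; external classifier or static analysis required
    return this.allowList.has(toolName) && !this.isDangerous(command);
  }
}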
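As a rough illustration of per-session scoping, the sketch below builds an independent, frozen context for each session. `SessionMode`, `SessionContext`, and `createSessionContext` are hypothetical stand-ins that model only the mode and allowlist; they are not part of the class above.

```typescript
// Hypothetical stand-in types; only mode and allowlist are modelled.
export type SessionMode = "default" | "plan" | "acceptEdits" | "dontAsk" | "auto" | "bypass";

export interface SessionContext {
  readonly sessionId: string;
  readonly mode: SessionMode;
  readonly allowList: ReadonlySet<string>;
}

// Each call returns a new object with its own copy of the allowlist,
// so later changes to the caller's array cannot leak into the session.
export function createSessionContext(
  sessionId: string,
  mode: SessionMode,
  allowList: string[] = []
): SessionContext {
  return Object.freeze({ sessionId, mode, allowList: new Set(allowList) });
}

const planning = createSessionContext("session-a", "plan");
const editing = createSessionContext("session-b", "acceptEdits", ["Edit", "Write"]);
console.log(planning.mode, editing.mode); // plan acceptEdits
```

Because nothing is stored in module-level state, two concurrent agent runs can hold different modes without either one observing the other.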
Step 3: Build the Execution Dispatcher
The dispatcher acts as the single entry point for all tool invocations. It validates metadata, checks mode constraints, and rejects unauthorized calls before they reach the filesystem or shell.
// ToolDefinition is the metadata interface from Step 1 (name, isReadOnly, dangerLevel).
export class ToolDispatcher {
  private registry: Map<string, ToolDefinition> = new Map();
  private permissionCtx: PermissionContext;

  constructor(permissionCtx: PermissionContext) {
    this.permissionCtx = permissionCtx;
  }

  registerTool(tool: ToolDefinition): void {
    // Fail closed: refuse tools that do not declare their safety profile
    if (typeof tool.isReadOnly !== 'boolean') {
      throw new Error(`Tool ${tool.name} is missing isReadOnly metadata`);
    }
    this.registry.set(tool.name, tool);
  }

  async execute(toolName: string, params: Record<string, unknown>, command?: string): Promise<unknown> {
    const tool = this.registry.get(toolName);
    if (!tool) throw new Error(`Tool ${toolName} not registered`);

    // Hard enforcement: reject before side effects
    if (!tool.isReadOnly && !this.permissionCtx.isAllowed(toolName, command)) {
      throw new PermissionError(`Execution blocked by ${this.permissionCtx.getMode()} mode`);
    }

    // Danger scanning for shell operations, applied in every mode.
    // Requires PermissionContext.isDangerous to be public.
    if (toolName === 'Bash' && command && this.permissionCtx.isDangerous(command)) {
      throw new SecurityError(`Dangerous pattern detected in command: ${command}`);
    }

    // Delegate to actual implementation
    return this.invokeTool(tool, params);
  }

  private async invokeTool(tool: ToolDefinition, params: Record<string, unknown>): Promise<unknown> {
    // Actual tool execution logic (filesystem, API, shell wrapper)
    // Omitted for brevity; should be isolated from permission logic
    return { status: 'executed', tool: tool.name };
  }
}

// Thrown when the active mode forbids the call
export class PermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'PermissionError';
  }
}

// Thrown when a command matches a danger rule
export class SecurityError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SecurityError';
  }
}
Architecture Decisions & Rationale
Metadata-Driven Validation: Tools declare isReadOnly and dangerLevel explicitly. This decouples policy from implementation and enables deterministic evaluation without LLM involvement.
Fail-Closed Defaults: The dispatcher throws on unregistered tools or missing metadata. This prevents silent fallbacks that could bypass safety checks.
Mode Scoping: PermissionContext is instantiated per execution session. This eliminates cross-session state leakage and supports concurrent agent runs with different safety profiles.
Separation of Concerns: Danger scanning, mode evaluation, and tool invocation are isolated. This simplifies testing, enables hot-swapping of classifiers, and prevents permission logic from polluting business logic.
Deterministic Over Probabilistic: The system never asks the model to judge its own safety. Static patterns, allowlists, and explicit mode rules replace LLM classifiers for critical boundaries.
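A static danger scanner of the kind these decisions describe could look roughly like the following. The three rules and the names `DangerRule`, `EXAMPLE_DANGER_RULES`, and `findDangerRule` are illustrative assumptions, not a vetted rule set.

```typescript
export interface DangerRule {
  id: string;
  pattern: RegExp; // no `g` flag, so test() keeps no state between calls
}

// Illustrative rules only; a real deployment needs a reviewed and much longer list.
export const EXAMPLE_DANGER_RULES: DangerRule[] = [
  {
    id: "recursive-force-delete",
    pattern: /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r/,
  },
  { id: "pipe-to-shell", pattern: /\|\s*(sh|bash|zsh)\b/ },
  { id: "privilege-escalation", pattern: /\bsudo\b/ },
];

// Returns the id of the first matching rule, or null when nothing matches.
export function findDangerRule(
  command: string,
  rules: DangerRule[] = EXAMPLE_DANGER_RULES
): string | null {
  const hit = rules.find(rule => rule.pattern.test(command));
  return hit ? hit.id : null;
}

console.log(findDangerRule("rm -rf /tmp/build")); // recursive-force-delete
console.log(findDangerRule("git status")); // null
```

Pattern matching like this is easy to evade (for example by splitting flags, as in `rm -r -f`), so it is best treated as one layer alongside allowlists and mode rules rather than a complete classifier.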
Pitfall Guide
1. Prompt-as-Policy Fallacy
Explanation: Treating system instructions as binding constraints. LLMs process prompts as contextual weights, not executable code. Direct overrides, context drift, and multi-turn dilution systematically break advisory guardrails.
Fix: Treat prompts as UX guidance only. Implement hard enforcement at the tool dispatch layer. Validate all side-effect operations deterministically before execution.
2. Context Window Dilution
Explanation: Long conversations bury system directives in token noise. Adherence to advisory rules drops significantly after 10k–15k tokens of back-and-forth. The model effectively "forgets" restrictions.
Fix: Implement periodic state re-injection or external policy checks. Use short-lived execution contexts for sensitive operations. Never rely on prompt persistence for security boundaries.
3. Missing Tool Metadata
Explanation: Tools lack explicit safety declarations (isReadOnly, dangerLevel). The dispatcher cannot evaluate permissions without structured data, forcing fallback to prompt-based assumptions.
Fix: Enforce schema validation on all tool definitions. Require metadata registration before tools can be added to the registry. Reject tools that omit safety profiles.
4. Over-Reliance on LLM Classifiers
Explanation: Using the model to judge whether its own tool calls are safe. This creates circular reasoning and introduces probabilistic failure modes into deterministic workflows.
Fix: Use static analysis, regex pattern matching, or external policy engines for danger detection. Reserve LLMs for intent parsing, not safety validation.
5. State Leakage Across Sessions
Explanation: Permission modes persist as global singletons. When agents run concurrently or reuse contexts, modes bleed across executions, causing unexpected allow/deny behavior.
Fix: Scope permission state to execution contexts. Instantiate fresh PermissionContext objects per session. Implement explicit mode reset on session termination.
6. False Sense of "Auto" Safety
Explanation: Assuming auto-approval modes are inherently safe because they use classifiers. Without fail-closed defaults and explicit allowlists, auto modes can approve dangerous operations under ambiguous conditions.
Fix: Implement fail-closed defaults. Require explicit allowlist registration for auto-approved tools. Log all auto-approvals for audit trails.
7. Neglecting Fallback States
Explanation: When permission evaluation fails or throws, the system defaults to execution. This turns safety checks into optional gates rather than hard boundaries.
Fix: Design dispatchers to fail-closed. Any evaluation error, missing metadata, or unhandled mode must result in denial. Log the failure for investigation.
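The fail-closed behavior described in the last pitfall can be sketched as a thin wrapper around whatever evaluator is in use. The `Decision` and `Evaluator` types and the `decideFailClosed` function are hypothetical names introduced for this example.

```typescript
export type Decision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Hypothetical evaluator signature: returns true to allow, and may throw on bad input.
export type Evaluator = (toolName: string, command?: string) => boolean;

export function decideFailClosed(
  evaluate: Evaluator,
  toolName: string,
  command?: string
): Decision {
  try {
    // Strict comparison: anything other than an explicit `true` is a denial.
    if (evaluate(toolName, command) === true) {
      return { allowed: true };
    }
    return { allowed: false, reason: "denied by policy" };
  } catch (err) {
    // An evaluator crash must never fall through to execution.
    const detail = err instanceof Error ? err.message : String(err);
    return { allowed: false, reason: `evaluation failed: ${detail}` };
  }
}
```

Returning a reason with every denial also gives the audit log something to record when the evaluator itself is what failed.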
Production Bundle
Action Checklist
Define explicit tool metadata schema: Require isReadOnly, dangerLevel, and requiresApproval for all registered tools.
Implement mode-aware dispatcher: Route all tool calls through a single entry point that evaluates permissions before execution.
Scope permission state per session: Avoid global singletons. Instantiate fresh context objects for each execution run.
Add static danger scanning: Use regex patterns or external policy engines to flag dangerous shell commands, wildcards, and interpreters.
Enforce fail-closed defaults: Ensure unhandled modes, missing metadata, or evaluation errors result in denial, not execution.
Instrument audit logging: Record every permission decision, tool invocation, and denial with timestamp, mode, and tool metadata.
Test bypass scenarios: Simulate prompt injection, context drift, and malformed tool calls to validate hard enforcement boundaries.
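For the audit logging item, one possible record shape is shown below. `AuditRecord` and `AuditLog` are hypothetical names, and the in-memory array is a placeholder for whatever log sink a deployment actually uses.

```typescript
export interface AuditRecord {
  timestamp: string; // ISO 8601
  mode: string;
  tool: string;
  isReadOnly: boolean;
  outcome: "allowed" | "denied";
  reason?: string;
}

export class AuditLog {
  private readonly records: AuditRecord[] = [];

  // `now` is injectable so tests can use a fixed clock.
  record(entry: Omit<AuditRecord, "timestamp">, now: Date = new Date()): AuditRecord {
    const stored: AuditRecord = { timestamp: now.toISOString(), ...entry };
    this.records.push(stored);
    return stored;
  }

  denials(): AuditRecord[] {
    return this.records.filter(r => r.outcome === "denied");
  }

  // One JSON object per line, which most log shippers can ingest directly.
  toJsonLines(): string {
    return this.records.map(r => JSON.stringify(r)).join("\n");
  }
}
```

A dispatcher would call `record` once per decision, immediately before returning or throwing, so that allowed and denied calls appear in the same trail.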
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Local Development | Prompt + Runtime Hybrid | Balances flexibility with safety; allows quick iteration while preventing accidental mutations | Low |
| CI/CD Pipeline | Runtime Tool Interceptor | Deterministic enforcement required; no prompt drift; predictable execution contracts | Medium |
| Multi-Tenant SaaS | Hybrid + External Policy Engine | Isolation between tenants; centralized policy management; audit compliance | High |
| Research/Experimentation | Prompt-Only Advisory | Maximum flexibility; low overhead; acceptable risk for non-production workloads | |
Install & Initialize: Add the permission dispatcher to your agent runtime. Instantiate PermissionContext with your target mode and danger rules.
Register Tools: Define your tool registry using the ToolDefinition interface. Ensure every tool declares isReadOnly and dangerLevel.
Route Calls: Replace direct tool invocations with dispatcher.execute(toolName, params, command). The dispatcher handles validation automatically.
Test Boundaries: Run bypass simulations. Verify that restricted modes deny write/shell operations deterministically, regardless of prompt content.
Deploy & Monitor: Enable audit logging. Track permission decisions, denials, and danger flags. Adjust allowlists and rules based on production telemetry.