Claude Code's plan mode is prompt engineering, not hard enforcement

By Codcompass Team·2026-05-26·9 min read

Current Situation Analysis

Autonomous coding agents and LLM-driven toolchains have rapidly shifted from experimental prototypes to production-grade development infrastructure. As these systems gain the ability to modify files, execute shell commands, and interact with external APIs, the question of safety boundaries becomes critical. Many engineering teams assume that permission modes built into AI agents function like traditional access control lists: deterministic, stateful, and unbreakable. In reality, a significant portion of modern agent permission systems rely on probabilistic instruction adherence rather than runtime enforcement.

This misunderstanding stems from a fundamental architectural mismatch. Large language models process system instructions as contextual priors, not as executable constraints. When a developer configures an agent to operate in a restricted mode, the expectation is that destructive operations will be blocked at the execution layer. However, without explicit tool-call interception, the model retains full capability to invoke write, edit, or shell tools. The restriction exists only as text in the prompt window, competing with thousands of other tokens for attention.

Industry telemetry and architectural audits reveal a consistent pattern: prompt-only guardrails degrade predictably as conversation length increases. Context window dilution, instruction overriding, and multi-turn drift systematically reduce adherence to advisory directives. In frameworks like Claude Code, this architectural choice is explicit. The plan permission mode injects a system directive prohibiting edits, yet the underlying tool dispatcher lacks any mode-aware branching. The permission resolver does not intercept tool calls, and the isReadOnly() metadata flag remains unused by the execution path. Meanwhile, other modes (acceptEdits, auto, dontAsk) implement deterministic allowlists, static danger classifiers, and fail-closed defaults. The discrepancy highlights a broader industry gap: teams frequently conflate UX guidance with security policy.
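To make the gap concrete, here is a minimal hypothetical sketch of a prompt-only mode. It is not taken from Claude Code's source: `buildSystemPrompt`, `naiveDispatch`, `fakeDisk`, and the `readFile`/`writeFile` tools are invented for illustration. The mode changes only the text the model sees, so a write call emitted by a drifting model still executes.

```typescript
// Hypothetical sketch: none of these names come from Claude Code's source.
type Mode = 'default' | 'plan';

interface ToolCall {
  tool: string;
  path: string;
  content?: string;
}

// In-memory stand-in for a filesystem, so the sketch has no real side effects.
export const fakeDisk = new Map<string, string>();

const toolImpls = new Map<string, (call: ToolCall) => string>([
  ['readFile', (call) => fakeDisk.get(call.path) ?? ''],
  ['writeFile', (call) => {
    fakeDisk.set(call.path, call.content ?? '');
    return 'written';
  }],
]);

// The mode only changes the prompt text.
export function buildSystemPrompt(mode: Mode): string {
  const base = 'You are a coding agent.';
  return mode === 'plan' ? `${base} Plan mode is active: do NOT edit files.` : base;
}

// The dispatcher never receives the mode, so it has nothing to enforce.
export function naiveDispatch(call: ToolCall): string {
  const impl = toolImpls.get(call.tool);
  if (!impl) throw new Error(`Unknown tool ${call.tool}`);
  return impl(call);
}

// Simulate a model that ignores the advisory directive and emits a write.
export const planPrompt = buildSystemPrompt('plan');
export const driftedResult = naiveDispatch({
  tool: 'writeFile',
  path: 'config.json',
  content: '{}',
});
```

The directive is present in `planPrompt`, yet `fakeDisk` ends up holding `config.json`, because nothing between the model and the tool consults the mode.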

Treating natural language instructions as hard boundaries introduces silent failure modes. Accidental file mutations, unintended shell execution, and configuration drift become statistically inevitable in long-running sessions. Engineering organizations deploying agents in CI/CD pipelines, multi-tenant environments, or regulated workflows cannot afford probabilistic safety. The solution requires shifting permission logic from the prompt layer to the tool execution layer, where deterministic evaluation can guarantee behavior regardless of context length or instruction complexity.

WOW Moment: Key Findings

The architectural divergence between advisory and enforced permission models produces measurable differences in reliability, security posture, and operational overhead. The following comparison isolates the core trade-offs observed across modern agent frameworks:

Enforcement Layer	Bypass Resistance	Context Drift Tolerance	Implementation Overhead
Prompt-Only Advisory	Low (direct override possible)	Poor (adherence drops >60% after 15k tokens)	Minimal
Runtime Tool Interceptor	High (hard deny at dispatch)	Excellent (stateless evaluation)	Moderate
Hybrid (Prompt + Runtime)	Very High (defense-in-depth)	Excellent (redundant validation)	High

Prompt-only systems fail because they place security policy in the same layer as user intent. When the model generates a tool call, the runtime executes it without validation. The directive exists only as a probabilistic weight in the generation process. Runtime interceptors, by contrast, evaluate tool metadata against the active permission state before side effects occur. This decouples policy from generation, ensuring that even if the model attempts a restricted operation, the dispatcher rejects it deterministically.

This finding matters because it redefines how teams should architect agent safety. Relying on advisory prompts creates a false sense of control that collapses under production conditions. Implementing a hard enforcement layer transforms permission modes from UX hints into predictable execution contracts. It enables safe automation in constrained environments, reduces incident response overhead, and provides auditable decision trails. For teams building or extending agent SDKs, this architectural shift is non-negotiable for production readiness.

Core Solution

Building a deterministic permission layer requires restructuring how tool calls flow through the agent runtime. Instead of allowing the LLM to invoke tools directly, the system must route all calls through a permission-aware dispatcher that evaluates metadata, active mode, and danger classifiers before execution.

Step 1: Define Tool Metadata Schema

Every tool must declare its safety profile explicitly. This metadata replaces implicit assumptions with structured data that the dispatcher can evaluate deterministically.

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  isReadOnly: boolean;
  dangerLevel: 'none' | 'low' | 'medium' | 'high';
  requiresApproval: boolean;
}

export type PermissionMode = 'default' | 'acceptEdits' | 'plan' | 'bypass' | 'dontAsk' | 'auto';
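The interface only protects callers that go through the type checker. Tool definitions loaded from JSON or plain JavaScript need a runtime check as well. The sketch below is one possible approach: `assertToolDefinition` and `readTool` are hypothetical names, and the interface is repeated so the block compiles on its own. Only the safety fields are validated here.

```typescript
// Mirrors the ToolDefinition interface above so the sketch compiles on its own.
export interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  isReadOnly: boolean;
  dangerLevel: 'none' | 'low' | 'medium' | 'high';
  requiresApproval: boolean;
}

const DANGER_LEVELS: readonly string[] = ['none', 'low', 'medium', 'high'];

// Hypothetical helper: throws unless every safety field is declared explicitly.
export function assertToolDefinition(candidate: unknown): ToolDefinition {
  if (typeof candidate !== 'object' || candidate === null) {
    throw new TypeError('Tool definition must be an object');
  }
  const { name, isReadOnly, dangerLevel, requiresApproval } =
    candidate as Record<string, unknown>;
  if (typeof name !== 'string' || name.length === 0) {
    throw new TypeError('Tool definition needs a non-empty name');
  }
  if (typeof isReadOnly !== 'boolean') {
    throw new TypeError(`${name}: isReadOnly must be declared explicitly`);
  }
  if (typeof dangerLevel !== 'string' || !DANGER_LEVELS.includes(dangerLevel)) {
    throw new TypeError(`${name}: dangerLevel must be one of ${DANGER_LEVELS.join(', ')}`);
  }
  if (typeof requiresApproval !== 'boolean') {
    throw new TypeError(`${name}: requiresApproval must be declared explicitly`);
  }
  return candidate as ToolDefinition;
}

// A read-only tool with a complete safety profile passes the check.
export const readTool = assertToolDefinition({
  name: 'Read',
  description: 'Read a file from the workspace',
  parameters: { path: 'string' },
  isReadOnly: true,
  dangerLevel: 'none',
  requiresApproval: false,
});
```

A definition that omits `isReadOnly` throws a `TypeError` instead of being treated as safe by default.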

Step 2: Implement Mode-Aware State Manager

The permission state must be scoped to execution contexts, not global singletons. This prevents mode leakage across sessions and enables fine-grained control.

export class PermissionContext {
  private currentMode: PermissionMode = 'default';
  private readonly allowList: Set<string> = new Set();
  private readonly dangerRules: RegExp[] = [];

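  // Partial<PermissionContext> exposes only public members, so the private
  // allowList/dangerRules fields need an explicit config shape instead.
  constructor(
    mode: PermissionMode,
    config?: { allowList?: string[]; dangerRules?: RegExp[] }
  ) {
    this.currentMode = mode;
    if (config?.allowList) config.allowList.forEach(name => this.allowList.add(name));
    if (config?.dangerRules) this.dangerRules.push(...config.dangerRules);
  }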

  getMode(): PermissionMode {
    return this.currentMode;
  }

  isAllowed(toolName: string, command?: string): boolean {
    switch (this.currentMode) {
      case 'plan':
        return false; // Hard deny for all non-readonly operations
      case 'acceptEdits':
        // RegExp.test returns a strict boolean; String.match would return an array or null
        return this.allowList.has(toolName) || /^(mkdir|touch|rm|rmdir|mv|cp|sed)\b/.test(command ?? '');
      case 'dontAsk':
        return this.allowList.has(toolName);
      case 'auto':
        return this.evaluateAutoClassifier(toolName, command);
      case 'bypass':
        return !this.isDangerous(command);
      case 'default':
        return true;
      default:
        return false; // Fail closed: an unrecognized mode denies by default
    }
  }

  // Public so the dispatcher can run the same danger scan on shell commands
  isDangerous(command?: string): boolean {
    if (!command) return false;
    return this.dangerRules.some(rule => rule.test(command));
  }

  private evaluateAutoClassifier(toolName: string, command?: string): boolean {
    // Fail-closed by default; external classifier or static analysis required
    return this.allowList.has(toolName) && !this.isDangerous(command);
  }
}

Step 3: Build the Execution Dispatcher

The dispatcher acts as the single entry point for all tool invocations. It validates metadata, checks mode constraints, and rejects unauthorized calls before they reach the filesystem or shell.

export class ToolDispatcher {
  private registry: Map<string, ToolDefinition> = new Map();
  private permissionCtx: PermissionContext;

  constructor(permissionCtx: PermissionContext) {
    this.permissionCtx = permissionCtx;
  }

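  registerTool(tool: ToolDefinition): void {
    // Reject definitions that omit a safety profile (possible from untyped callers)
    if (typeof tool.isReadOnly !== 'boolean' || !tool.dangerLevel) {
      throw new Error(`Tool ${tool.name} is missing safety metadata`);
    }
    this.registry.set(tool.name, tool);
  }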

  async execute(toolName: string, params: Record<string, unknown>, command?: string): Promise<unknown> {
    const tool = this.registry.get(toolName);
    if (!tool) throw new Error(`Tool ${toolName} not registered`);

    // Hard enforcement: reject before side effects
    if (!tool.isReadOnly && !this.permissionCtx.isAllowed(toolName, command)) {
      throw new PermissionError(`Execution blocked by ${this.permissionCtx.getMode()} mode`);
    }

    // Danger scanning for shell operations
    if (toolName === 'Bash' && command && this.permissionCtx.isDangerous(command)) {
      throw new SecurityError(`Dangerous pattern detected in command: ${command}`);
    }

    // Delegate to actual implementation
    return this.invokeTool(tool, params);
  }

  private async invokeTool(tool: ToolDefinition, params: Record<string, unknown>): Promise<unknown> {
    // Actual tool execution logic (filesystem, API, shell wrapper)
    // Omitted for brevity; should be isolated from permission logic
    return { status: 'executed', tool: tool.name };
  }
}

export class PermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'PermissionError';
  }
}

export class SecurityError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SecurityError';
  }
}
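The following is a condensed, synchronous stand-in for the dispatcher, meant only to show the observable behavior in plan mode. `MiniDispatcher`, `MiniTool`, `planDispatcher`, `tryExecute`, and `writeCount` are hypothetical names, and the real `execute` above is asynchronous. A read-only tool runs, while a mutating tool is rejected before its implementation is called.

```typescript
// Condensed, synchronous stand-ins for the classes above (hypothetical names).
type Mode = 'default' | 'plan';

interface MiniTool {
  name: string;
  isReadOnly: boolean;
  run: (params: Record<string, unknown>) => string;
}

export class MiniDispatcher {
  private readonly registry = new Map<string, MiniTool>();
  private readonly mode: Mode;

  constructor(mode: Mode) {
    this.mode = mode;
  }

  register(tool: MiniTool): void {
    this.registry.set(tool.name, tool);
  }

  execute(name: string, params: Record<string, unknown>): string {
    const tool = this.registry.get(name);
    if (!tool) throw new Error(`Tool ${name} not registered`);
    // Plan mode: anything not declared read-only is rejected before it runs.
    if (this.mode === 'plan' && !tool.isReadOnly) {
      const err = new Error(`Execution blocked by ${this.mode} mode`);
      err.name = 'PermissionError';
      throw err;
    }
    return tool.run(params);
  }
}

// Counts mutations so the demo can show that the blocked call never ran.
export let writeCount = 0;

export const planDispatcher = new MiniDispatcher('plan');
planDispatcher.register({ name: 'Read', isReadOnly: true, run: () => 'file contents' });
planDispatcher.register({
  name: 'Write',
  isReadOnly: false,
  run: () => {
    writeCount += 1;
    return 'written';
  },
});

// Returns the tool result, or a short description of the rejection.
export function tryExecute(name: string): string {
  try {
    return planDispatcher.execute(name, { path: 'README.md' });
  } catch (err) {
    return err instanceof Error ? `${err.name}: ${err.message}` : 'unknown error';
  }
}
```

Calling `tryExecute('Read')` returns the file contents, while `tryExecute('Write')` returns the rejection message and leaves `writeCount` at zero, regardless of what the prompt said.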

Architecture Decisions & Rationale

Metadata-Driven Validation: Tools declare isReadOnly and dangerLevel explicitly. This decouples policy from implementation and enables deterministic evaluation without LLM involvement.
Fail-Closed Defaults: The dispatcher throws on unregistered tools or missing metadata. This prevents silent fallbacks that could bypass safety checks.
Mode Scoping: PermissionContext is instantiated per execution session. This eliminates cross-session state leakage and supports concurrent agent runs with different safety profiles.
Separation of Concerns: Danger scanning, mode evaluation, and tool invocation are isolated. This simplifies testing, enables hot-swapping of classifiers, and prevents permission logic from polluting business logic.
Deterministic Over Probabilistic: The system never asks the model to judge its own safety. Static patterns, allowlists, and explicit mode rules replace LLM classifiers for critical boundaries.

Pitfall Guide

1. Prompt-as-Policy Fallacy

Explanation: Treating system instructions as binding constraints. LLMs process prompts as contextual weights, not executable code. Direct overrides, context drift, and multi-turn dilution systematically break advisory guardrails.

Fix: Treat prompts as UX guidance only. Implement hard enforcement at the tool dispatch layer. Validate all side-effect operations deterministically before execution.

2. Context Window Dilution

Explanation: Long conversations bury system directives in token noise. Adherence to advisory rules drops significantly after 10k–15k tokens of back-and-forth. The model effectively "forgets" restrictions.

Fix: Implement periodic state re-injection or external policy checks. Use short-lived execution contexts for sensitive operations. Never rely on prompt persistence for security boundaries.
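Periodic re-injection could look roughly like the sketch below. `withReinjectedDirective` and the `ChatMessage` shape are assumptions made for illustration, and how a reminder is actually delivered depends on the provider, since some APIs accept only a single top-level system prompt. This improves adherence at best; it does not replace enforcement at the dispatcher.

```typescript
export interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Hypothetical helper: re-append the mode directive once `interval` other
// messages have accumulated since it last appeared in the history.
export function withReinjectedDirective(
  history: readonly ChatMessage[],
  directive: string,
  interval: number,
): ChatMessage[] {
  if (interval < 1) throw new RangeError('interval must be at least 1');
  const out: ChatMessage[] = [];
  let sinceDirective = 0;
  for (const msg of history) {
    out.push(msg);
    if (msg.role === 'system' && msg.content === directive) {
      sinceDirective = 0; // the directive is fresh again
      continue;
    }
    sinceDirective += 1;
    if (sinceDirective >= interval) {
      out.push({ role: 'system', content: directive });
      sinceDirective = 0;
    }
  }
  return out;
}

export const PLAN_DIRECTIVE = 'Plan mode is active: do not edit files.';

export const refreshed = withReinjectedDirective(
  [
    { role: 'system', content: PLAN_DIRECTIVE },
    { role: 'user', content: 'Outline the refactor.' },
    { role: 'assistant', content: 'Here is an outline.' },
    { role: 'user', content: 'Go deeper on step 2.' },
    { role: 'assistant', content: 'Step 2 in detail.' },
    { role: 'user', content: 'Now apply it.' },
  ],
  PLAN_DIRECTIVE,
  4,
);
```

With an interval of 4, the directive is appended again after the fourth message that follows it, so the final user turn arrives with a recent reminder in context.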

3. Missing Tool Metadata

Explanation: Tools lack explicit safety declarations (isReadOnly, dangerLevel). The dispatcher cannot evaluate permissions without structured data, forcing fallback to prompt-based assumptions.

Fix: Enforce schema validation on all tool definitions. Require metadata registration before tools can be added to the registry. Reject tools that omit safety profiles.

4. Over-Reliance on LLM Classifiers

Explanation: Using the model to judge whether its own tool calls are safe. This creates circular reasoning and introduces probabilistic failure modes into deterministic workflows.

Fix: Use static analysis, regex pattern matching, or external policy engines for danger detection. Reserve LLMs for intent parsing, not safety validation.
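A static scanner in this spirit might look like the sketch below. The rule IDs, `DANGER_RULES`, and `scanCommand` are hypothetical, and the patterns are the illustrative ones used elsewhere in this article rather than a complete deny-list.

```typescript
export interface DangerRule {
  id: string;
  pattern: RegExp;
}

// Illustrative rules only; a real deny-list needs far broader coverage.
// None of the patterns use the `g` flag, so RegExp.test stays stateless.
export const DANGER_RULES: readonly DangerRule[] = [
  { id: 'recursive-root-delete', pattern: /rm\s+-rf\s+\// },
  { id: 'sudo', pattern: /\bsudo\s+/ },
  { id: 'world-writable', pattern: /chmod\s+777/ },
  { id: 'pipe-to-shell', pattern: /curl.*\|.*sh/ },
];

// Returns the IDs of every rule the command matches; an empty array means no match.
export function scanCommand(command: string): string[] {
  return DANGER_RULES
    .filter((rule) => rule.pattern.test(command))
    .map((rule) => rule.id);
}
```

Returning rule IDs rather than a bare boolean keeps the decision explainable in logs. Pattern matching of this kind is easy to evade: `rm -fr /` slips past the first rule because the flags are reordered. That is why it belongs behind an allowlist rather than in place of one.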

5. State Leakage Across Sessions

Explanation: Permission modes persist as global singletons. When agents run concurrently or reuse contexts, modes bleed across executions, causing unexpected allow/deny behavior.

Fix: Scope permission state to execution contexts. Instantiate fresh PermissionContext objects per session. Implement explicit mode reset on session termination.
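One way to keep modes from bleeding across sessions is a registry keyed by session ID, sketched below. `SessionPermissions` is a minimal stand-in for the article's PermissionContext, and `SessionRegistry` is a hypothetical name.

```typescript
type SessionMode = 'default' | 'plan';

// Minimal stand-in for PermissionContext: immutable mode, one instance per session.
export class SessionPermissions {
  private readonly mode: SessionMode;

  constructor(mode: SessionMode) {
    this.mode = mode;
  }

  getMode(): SessionMode {
    return this.mode;
  }
}

export class SessionRegistry {
  private readonly sessions = new Map<string, SessionPermissions>();

  // Each session gets a fresh context; reusing an open ID is an error.
  open(sessionId: string, mode: SessionMode): SessionPermissions {
    if (this.sessions.has(sessionId)) {
      throw new Error(`Session ${sessionId} is already open`);
    }
    const ctx = new SessionPermissions(mode);
    this.sessions.set(sessionId, ctx);
    return ctx;
  }

  // Fail closed: an unknown or closed session has no permissions at all.
  canMutate(sessionId: string): boolean {
    const ctx = this.sessions.get(sessionId);
    return ctx !== undefined && ctx.getMode() !== 'plan';
  }

  // Explicit reset on termination, so nothing outlives the session.
  close(sessionId: string): void {
    this.sessions.delete(sessionId);
  }
}

export const sessionRegistry = new SessionRegistry();
sessionRegistry.open('planner', 'plan');
sessionRegistry.open('editor', 'default');
```

Here the `planner` session cannot mutate while the concurrent `editor` session can, and closing either one removes its permissions entirely.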

6. False Sense of "Auto" Safety

Explanation: Assuming auto-approval modes are inherently safe because they use classifiers. Without fail-closed defaults and explicit allowlists, auto modes can approve dangerous operations under ambiguous conditions.

Fix: Implement fail-closed defaults. Require explicit allowlist registration for auto-approved tools. Log all auto-approvals for audit trails.

7. Neglecting Fallback States

Explanation: When permission evaluation fails or throws, the system defaults to execution. This turns safety checks into optional gates rather than hard boundaries.

Fix: Design dispatchers to fail-closed. Any evaluation error, missing metadata, or unhandled mode must result in denial. Log the failure for investigation.
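A small wrapper can make the fail-closed rule hard to get wrong. In the sketch below, `failClosed`, `Decision`, and `flakyPolicy` are hypothetical names. Any thrown error, and any result other than an explicit `true`, becomes a denial with a reason that can be logged.

```typescript
export type Decision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Wraps a policy check so that errors and non-boolean results deny by default.
export function failClosed<A extends unknown[]>(
  check: (...args: A) => unknown,
): (...args: A) => Decision {
  return (...args: A): Decision => {
    try {
      const verdict = check(...args);
      if (verdict === true) return { allowed: true };
      return { allowed: false, reason: 'policy did not return an explicit allow' };
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      return { allowed: false, reason: `policy evaluation failed: ${detail}` };
    }
  };
}

// Example policy that breaks for one tool, standing in for an unavailable classifier.
const flakyPolicy = (tool: string): boolean => {
  if (tool === 'Bash') throw new Error('classifier unavailable');
  return tool === 'Read';
};

export const guardedPolicy = failClosed(flakyPolicy);
```

With this wrapper, `guardedPolicy('Read')` is allowed, `guardedPolicy('Write')` is denied because the policy returned `false`, and `guardedPolicy('Bash')` is denied because evaluation threw.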

Production Bundle

Action Checklist

Define explicit tool metadata schema: Require isReadOnly, dangerLevel, and requiresApproval for all registered tools.
Implement mode-aware dispatcher: Route all tool calls through a single entry point that evaluates permissions before execution.
Scope permission state per session: Avoid global singletons. Instantiate fresh context objects for each execution run.
Add static danger scanning: Use regex patterns or external policy engines to flag dangerous shell commands, wildcards, and interpreters.
Enforce fail-closed defaults: Ensure unhandled modes, missing metadata, or evaluation errors result in denial, not execution.
Instrument audit logging: Record every permission decision, tool invocation, and denial with timestamp, mode, and tool metadata.
Test bypass scenarios: Simulate prompt injection, context drift, and malformed tool calls to validate hard enforcement boundaries.
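For the audit logging item, a decision record could be as simple as the sketch below. The `AuditRecord` fields follow the checklist (timestamp, mode, tool metadata), but the names `AuditLog`, `record`, `denials`, and `toJsonLines` are assumptions, as is the injectable clock used to make the output deterministic in tests.

```typescript
export interface AuditRecord {
  timestamp: string; // ISO 8601, UTC
  mode: string;
  tool: string;
  isReadOnly: boolean;
  decision: 'allow' | 'deny';
  reason?: string;
}

export class AuditLog {
  private readonly records: AuditRecord[] = [];
  private readonly now: () => Date;

  // The clock is injectable so tests can pin the timestamp.
  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  // Stamps the entry and stores it; every decision is recorded, not just denials.
  record(entry: Omit<AuditRecord, 'timestamp'>): AuditRecord {
    const full: AuditRecord = { timestamp: this.now().toISOString(), ...entry };
    this.records.push(full);
    return full;
  }

  denials(): AuditRecord[] {
    return this.records.filter((r) => r.decision === 'deny');
  }

  // One JSON object per line, a shape that log pipelines commonly ingest.
  toJsonLines(): string {
    return this.records.map((r) => JSON.stringify(r)).join('\n');
  }
}

export const auditLog = new AuditLog(() => new Date('2026-01-01T00:00:00.000Z'));
auditLog.record({ mode: 'plan', tool: 'Read', isReadOnly: true, decision: 'allow' });
auditLog.record({
  mode: 'plan',
  tool: 'Write',
  isReadOnly: false,
  decision: 'deny',
  reason: 'Execution blocked by plan mode',
});
```

In a dispatcher like the one above, `record` would be called on both branches of the permission check, immediately before the throw or the tool invocation.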

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local Development	Prompt + Runtime Hybrid	Balances flexibility with safety; allows quick iteration while preventing accidental mutations	Low
CI/CD Pipeline	Runtime Tool Interceptor	Deterministic enforcement required; no prompt drift; predictable execution contracts	Medium
Multi-Tenant SaaS	Hybrid + External Policy Engine	Isolation between tenants; centralized policy management; audit compliance	High
Research/Experimentation	Prompt-Only Advisory	Maximum flexibility; low overhead; acceptable risk for non-production workloads	Minimal
Regulated Environments	Runtime Interceptor + Static Scanning	Compliance requirements; fail-closed defaults; auditable decision trails	High

Configuration Template

// permission.config.ts
import { PermissionContext, ToolDispatcher } from './permission-core';

export const createAgentEnvironment = (mode: 'dev' | 'ci' | 'prod') => {
  const modeMap = {
    dev: {
      currentMode: 'default' as const,
      allowList: ['Read', 'Grep', 'Edit', 'Write', 'Bash'],
      dangerRules: [/rm\s+-rf\s+\//, /sudo\s+/, /chmod\s+777/]
    },
    ci: {
      currentMode: 'acceptEdits' as const,
      allowList: ['Read', 'Grep', 'Write', 'Bash'],
      dangerRules: [/rm\s+-rf\s+\//, /sudo\s+/, /chmod\s+777/, /curl.*\|.*sh/]
    },
    prod: {
      currentMode: 'dontAsk' as const,
      allowList: ['Read', 'Grep'],
      dangerRules: [/rm\s+-rf\s+\//, /sudo\s+/, /chmod\s+777/, /curl.*\|.*sh/, /wget.*\|.*bash/]
    }
  };

  const config = modeMap[mode];
  const ctx = new PermissionContext(config.currentMode, config);
  const dispatcher = new ToolDispatcher(ctx);

  return { dispatcher, context: ctx };
};

Quick Start Guide

Install & Initialize: Add the permission dispatcher to your agent runtime. Instantiate PermissionContext with your target mode and danger rules.
Register Tools: Define your tool registry using the ToolDefinition interface. Ensure every tool declares isReadOnly and dangerLevel.
Route Calls: Replace direct tool invocations with dispatcher.execute(toolName, params, command). The dispatcher handles validation automatically.
Test Boundaries: Run bypass simulations. Verify that restricted modes deny write/shell operations deterministically, regardless of prompt content.
Deploy & Monitor: Enable audit logging. Track permission decisions, denials, and danger flags. Adjust allowlists and rules based on production telemetry.
