AgentWall: A Runtime Safety Layer for Local AI Agents

By Codcompass Team · 2026-05-21 · 8 min read

Runtime Governance for Autonomous Agents: Intercepting Actions Before Execution

Current Situation Analysis

The transition of large language models from passive text generators to autonomous agents capable of executing shell commands, modifying filesystems, and invoking external APIs has fundamentally altered the security posture of local development environments. As agents gain the ability to act, the attack surface expands from prompt injection to direct system compromise.

Traditional AI safety mechanisms focus on upstream controls: model alignment via reinforcement learning, input filtering, and output sanitization. While these reduce the probability of malicious generation, they do not address the execution boundary. A model may produce a syntactically valid tool call that violates organizational policy, or an adversarial prompt may coerce the agent into performing sensitive operations that appear benign in isolation. This gap is particularly acute in local workflows where developers utilize tools like Claude Desktop, Cursor, Windsurf, and Claude Code against their own filesystems, credentials, and infrastructure. In these scenarios, the agent operates with the same privileges as the user, creating a high-risk environment where a single misaligned action can result in data loss or credential exfiltration.

The critical oversight in current workflows is the lack of a runtime enforcement layer. Without interception at the moment of action, safety relies entirely on the model's internal alignment, which is probabilistic and susceptible to context manipulation. A deterministic control plane is required to evaluate every proposed action against explicit policy before it reaches the host environment.

WOW Moment: Key Findings

Benchmark results for a runtime safety proxy suggest that high-fidelity enforcement is achievable with negligible performance impact, which challenges the assumption that safety layers introduce prohibitive latency. By intercepting actions at the protocol level, the proxy enforces declarative policies more accurately than upstream filtering alone while adding sub-millisecond overhead.

The following comparison highlights the efficacy of a runtime policy proxy versus traditional upstream alignment approaches, based on benchmark data across 14 test scenarios:

Governance Strategy	Enforcement Accuracy	Latency Overhead	Auditability	Human-in-the-Loop Support
Upstream Model Alignment	~65–75%	0 ms (inherent)	Low (requires external logging)	No
Runtime Policy Proxy	92.9%	Sub-millisecond	Full execution trail	Yes

Why this matters: On this 14-scenario benchmark, the runtime proxy enforced policy with 92.9% accuracy while adding sub-millisecond latency. The sample is small, so the figures are indicative rather than conclusive, but they support a "safety without friction" model in which developers keep the speed of autonomous agents and gain an explicit, rule-based checkpoint on sensitive operations. Requiring human approval for flagged actions and recording a complete execution trail turns agent interactions from opaque black boxes into auditable, replayable workflows.

Core Solution

The solution architecture centers on a policy-enforcing proxy that sits between the agent client and the tool execution environment. This proxy intercepts every proposed action, evaluates it against a declarative policy, and enforces the decision before the action reaches the host system.

Architecture Overview

1. Interception Layer: The proxy implements the Model Context Protocol (MCP) to intercept tool calls from compatible clients (e.g., Claude Desktop, Cursor, Windsurf, OpenClaw). By operating at the protocol level, the proxy remains agnostic to the specific agent implementation, providing cross-tool compatibility.
2. Policy Engine: A declarative policy engine evaluates each intercepted action. Policies are defined using structured rules that specify allowed, denied, or review-required operations based on tool name, arguments, and context.
3. Decision Handler: The proxy routes actions based on policy evaluation:
   * Allow: Forward the action to the tool server immediately.
   * Deny: Block the action and return an error response to the agent.
   * Review: Pause execution and request human approval via a notification interface.
4. Audit Logger: Every action, decision, and outcome is recorded in an immutable execution trail. This log supports post-hoc analysis, replay, and compliance auditing.
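
As a rough illustration of the interception step, the sketch below shows how a proxy might separate tool invocations from the rest of the JSON-RPC traffic. MCP carries tool invocations as JSON-RPC 2.0 requests with the method `tools/call`, with the tool name in `params.name` and its arguments in `params.arguments`. The names `RpcMessage`, `Classified`, and `classify` are assumptions made for this example and do not come from any SDK.

```typescript
// Minimal JSON-RPC 2.0 message shape; only the fields this sketch reads.
interface RpcMessage {
  jsonrpc: '2.0';
  id?: number | string;
  method?: string;
  params?: { name?: unknown; arguments?: unknown };
}

// Hypothetical classification result used by this sketch.
type Classified =
  | { kind: 'passthrough' } // not a tool call: forward without policy evaluation
  | { kind: 'malformed' }   // a tool call the proxy cannot evaluate: reject it
  | { kind: 'tool-call'; name: string; args: Record<string, unknown> };

function classify(msg: RpcMessage): Classified {
  if (msg.method !== 'tools/call') return { kind: 'passthrough' };

  const name = msg.params?.name;
  const raw = msg.params?.arguments;

  // A tool call without a usable name must not slip past the policy engine.
  if (typeof name !== 'string' || name === '') return { kind: 'malformed' };

  // Arguments are optional, but when present they must be a plain object.
  if (raw !== undefined && (typeof raw !== 'object' || raw === null || Array.isArray(raw))) {
    return { kind: 'malformed' };
  }

  return { kind: 'tool-call', name, args: (raw ?? {}) as Record<string, unknown> };
}
```

Treating malformed tool calls as their own category, rather than passing them through, keeps the proxy from forwarding requests it could not evaluate.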

Implementation Details

The following TypeScript example demonstrates a simplified policy engine and interceptor logic. This implementation uses a rule-based approach with context-aware evaluation.

Policy Definition:

interface PolicyRule {
  id: string;
  toolPattern: RegExp;
  argumentFilter?: (args: Record<string, unknown>) => boolean;
  action: 'allow' | 'deny' | 'review';
  description: string;
}

const defaultPolicies: PolicyRule[] = [
  {
    id: 'deny-destructive-shell',
    toolPattern: /^execute_command$/,
    argumentFilter: (args) => {
      const cmd = String(args.command || '');
      return /rm\s+-rf|sudo\s+rm|chmod\s+777/.test(cmd);
    },
    action: 'deny',
    description: 'Block destructive shell commands',
  },
  {
    id: 'review-file-write',
    toolPattern: /^write_file$/,
    argumentFilter: (args) => {
      const path = String(args.path || '');
      return path.startsWith('/etc/') || path.includes('.env');
    },
    action: 'review',
    description: 'Require approval for sensitive file writes',
  },
  {
    id: 'allow-read-ops',
    toolPattern: /^(read_file|list_directory)$/,
    action: 'allow',
    description: 'Permit read-only operations',
  },
];
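
To make the matching semantics concrete, here is a small self-contained sketch of how rules of this shape could be evaluated: in order, first match wins, and anything unmatched falls back to review. The trimmed-down `Rule` type and the `decide` helper are assumptions for illustration, not the article's engine.

```typescript
type Action = 'allow' | 'deny' | 'review';

interface Rule {
  id: string;
  toolPattern: RegExp;
  argumentFilter?: (args: Record<string, unknown>) => boolean;
  action: Action;
}

// Abbreviated copies of the rules above, kept local so the sketch runs alone.
const rules: Rule[] = [
  {
    id: 'deny-destructive-shell',
    toolPattern: /^execute_command$/,
    argumentFilter: (args) => /rm\s+-rf|chmod\s+777/.test(String(args.command ?? '')),
    action: 'deny',
  },
  {
    id: 'review-file-write',
    toolPattern: /^write_file$/,
    argumentFilter: (args) => String(args.path ?? '').startsWith('/etc/'),
    action: 'review',
  },
  { id: 'allow-read-ops', toolPattern: /^(read_file|list_directory)$/, action: 'allow' },
];

// Walk the rules in order; the first rule whose tool pattern and argument
// filter both match decides the outcome.
function decide(
  toolName: string,
  args: Record<string, unknown>
): { action: Action; ruleId: string | null } {
  for (const rule of rules) {
    if (!rule.toolPattern.test(toolName)) continue;
    if (rule.argumentFilter && !rule.argumentFilter(args)) continue;
    return { action: rule.action, ruleId: rule.id };
  }
  // No rule matched: fall back to human review instead of allowing.
  return { action: 'review', ruleId: null };
}
```

One consequence worth noticing: with only these rules, a harmless `execute_command` call such as `npm test` matches nothing and therefore lands in review. A real policy would probably need explicit allow rules for routine commands to avoid approval fatigue.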

Interceptor Logic:

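// Minimal local stand-ins for the JSON-RPC message types, so the example
// does not depend on a particular SDK package.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  result?: unknown;
  error?: { code: number; message: string };
}

// Supplied by the host application: run the tool, and ask a human to approve.
declare function executeTool(params: JsonRpcRequest['params']): Promise<unknown>;
declare function notifyUserForApproval(request: JsonRpcRequest, reason: string): Promise<boolean>;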

class AgentPolicyGuard {
  private rules: PolicyRule[];
  private auditLog: string[] = [];

  constructor(rules: PolicyRule[]) {
    this.rules = rules;
  }

  async evaluate(request: JsonRpcRequest): Promise<JsonRpcResponse> {
    // Coerce defensively: a malformed request may omit the tool name or arguments.
    const toolName = String(request.params?.name ?? '');
    const args = request.params?.arguments ?? {};

    const matchedRule = this.rules.find((rule) => {
      if (!rule.toolPattern.test(toolName)) return false;
      if (rule.argumentFilter && !rule.argumentFilter(args)) return false;
      return true;
    });

    // No matching rule: fall back to human review rather than allowing by default.
    const decision = matchedRule?.action ?? 'review';
    const logEntry = `[${new Date().toISOString()}] Tool: ${toolName} | Decision: ${decision}`;
    this.auditLog.push(logEntry);

    switch (decision) {
      case 'allow':
        return this.forwardRequest(request);
      case 'deny':
        return this.errorResponse(matchedRule?.description || 'Policy violation');
      case 'review':
        return this.requestHumanApproval(request, matchedRule?.description ?? 'No matching policy rule');
    }
  }

  private async forwardRequest(request: JsonRpcRequest): Promise<JsonRpcResponse> {
    // Forward to underlying tool server
    return { result: await executeTool(request.params) };
  }

  private errorResponse(reason: string): JsonRpcResponse {
    return { error: { code: -32603, message: `Blocked: ${reason}` } };
  }

  private async requestHumanApproval(
    request: JsonRpcRequest,
    reason: string
  ): Promise<JsonRpcResponse> {
    // Integrate with notification system for human review
    const approved = await notifyUserForApproval(request, reason);
    if (approved) {
      return this.forwardRequest(request);
    }
    return this.errorResponse('Denied by human reviewer');
  }
}
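
The interceptor above logs a plain string and records only the policy decision, not what happened after a human review. As a hedged sketch of one alternative, the code below uses a structured entry that captures both, serialized one JSON object per line, plus a retention helper. The `AuditEntry` shape and the helper names are assumptions for this example.

```typescript
// Hypothetical structured audit record; field names are illustrative.
interface AuditEntry {
  timestamp: string; // ISO 8601
  tool: string;
  ruleId: string | null; // null when no rule matched
  decision: 'allow' | 'deny' | 'review';
  outcome: 'forwarded' | 'blocked' | 'approved' | 'rejected';
}

// One JSON object per line keeps the log append-only and easy to stream.
function toJsonLine(entry: AuditEntry): string {
  return JSON.stringify(entry);
}

// Keep only entries newer than the retention window, measured back from `now`.
function pruneExpired(entries: AuditEntry[], retentionDays: number, now: Date): AuditEntry[] {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return entries.filter((entry) => Date.parse(entry.timestamp) >= cutoff);
}
```

Passing `now` in explicitly, instead of reading the clock inside the helper, makes retention behavior straightforward to test.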

Architecture Rationale

MCP Proxy: Using the Model Context Protocol ensures compatibility across a wide range of agent clients. This avoids the need for custom integrations with each tool, reducing maintenance overhead.
Declarative Policy: Separating policy definition from enforcement logic allows security teams to update rules without modifying code. Policies can be version-controlled and reviewed independently.
Context-Aware Evaluation: Rules can inspect arguments and context, enabling fine-grained control. For example, rm might be allowed in a temporary directory but denied in the project root.
Human-in-the-Loop: Requiring approval for sensitive operations balances automation with safety. Developers retain control over high-risk actions while allowing the agent to proceed autonomously on low-risk tasks.
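
The temporary-directory example can be sketched as a path check that resolves `.` and `..` before comparing, so that `/tmp/../home/...` is not mistaken for a path under `/tmp`. This is a minimal illustration under stated assumptions: the helper names are hypothetical, only absolute POSIX paths are handled, and normalization is purely lexical, so symlinks are not resolved.

```typescript
// Resolve "." and ".." segments of an absolute POSIX path without touching
// the filesystem.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') out.pop();
    else out.push(segment);
  }
  return '/' + out.join('/');
}

// True when `target` is a descendant of `dir` (but not `dir` itself).
function isStrictlyInside(target: string, dir: string): boolean {
  const t = normalizePosix(target);
  const d = normalizePosix(dir);
  const prefix = d === '/' ? '/' : d + '/';
  return t !== d && t.startsWith(prefix);
}

// Allow removal only for absolute paths strictly inside the safe directory.
function decideRemoval(target: string, safeDir: string): 'allow' | 'deny' {
  // Relative paths depend on the working directory, so fail closed.
  if (!target.startsWith('/')) return 'deny';
  return isStrictlyInside(target, safeDir) ? 'allow' : 'deny';
}
```

Comparing against `dir + '/'` rather than `dir` alone matters: a plain prefix test would treat `/tmpfoo/x` as if it lived under `/tmp`.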

Pitfall Guide

Implementing a runtime safety layer introduces new complexities. The following pitfalls and best practices are derived from production experience with agent governance systems.

Overly Restrictive Policies
- Explanation: Policies that deny too many actions can paralyze the agent, forcing developers to disable safety controls entirely.
- Fix: Start with a permissive baseline and gradually tighten rules. Use review actions for borderline cases rather than immediate denial. Monitor false positive rates and adjust rules accordingly.
Ignoring Context in Rule Evaluation
- Explanation: Rules that only check tool names without inspecting arguments may block safe operations or allow dangerous ones. For example, denying all execute_command calls prevents legitimate build scripts.
- Fix: Implement argument-aware filtering. Use regex or AST parsing to analyze command strings and file paths. Allow operations in safe contexts (e.g., temporary directories) while restricting sensitive paths.
Performance Degradation
- Explanation: Complex policy evaluation or synchronous human approval loops can introduce latency, degrading the user experience.
- Fix: Optimize rule matching with efficient data structures. Ensure policy evaluation completes in sub-millisecond time. For human approval, use asynchronous notifications so that only the action under review waits while the proxy continues to serve other requests.
Audit Log Bloat
- Explanation: Recording every action can quickly consume storage, especially in high-throughput environments.
- Fix: Implement log rotation and retention policies. Consider sampling or filtering low-severity entries (for example, routine allow decisions) while retaining every deny and review decision. Store logs in a compressed, append-only format to minimize overhead.
False Sense of Security
- Explanation: Relying solely on a runtime proxy may lead to neglecting other security measures, such as input validation and model alignment.
- Fix: Adopt a defense-in-depth strategy. Combine runtime enforcement with upstream alignment, input sanitization, and least-privilege execution environments. Regularly test the proxy against adversarial prompts.
Policy Drift
- Explanation: Over time, policies may become outdated or inconsistent, leading to gaps in enforcement.
- Fix: Version control all policy files. Conduct periodic reviews to ensure rules align with current security requirements. Use automated testing to validate policy changes before deployment.
Bypass via Indirect Calls
- Explanation: Agents may attempt to bypass policies by encoding commands or using indirect tool calls.
- Fix: Implement deep inspection of arguments. Decode and normalize inputs before evaluation. Monitor for anomalous patterns that may indicate evasion attempts.
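
For the last pitfall, the sketch below shows one way to normalize a command string before pattern matching and to route obviously indirect execution to review. It is deliberately incomplete and rests on assumptions: the helper names and both patterns are illustrative, only the simplest obfuscations are handled, and stripping quotes can produce false positives (for instance, a command that merely echoes a dangerous string).

```typescript
// Strip the simplest shell obfuscations so rules see a canonical form.
function normalizeCommand(raw: string): string {
  return raw
    .replace(/\\(?=\S)/g, '') // r\m -> rm (backslash before a non-space char)
    .replace(/["']/g, '')     // "rm" -> rm
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

// rm with r and f combined in one flag group, in either order (-rf, -fr, -Rf).
const DESTRUCTIVE = /\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r/i;

// Constructs that hide the real command from static inspection.
const INDIRECT = /\|\s*(sh|bash|zsh)\b|\beval\b|\bbase64\s+(-d|--decode)\b/;

function inspectCommand(raw: string): 'deny' | 'review' | 'pass' {
  const cmd = normalizeCommand(raw);
  if (DESTRUCTIVE.test(cmd)) return 'deny';
  if (INDIRECT.test(cmd)) return 'review';
  return 'pass';
}
```

Sending encoded or piped-to-shell commands to review, instead of trying to decode every possible encoding, keeps the check simple while still putting a human in front of anything the patterns cannot see through. Forms such as `rm -r -f` with separated flags are not caught here and would need additional rules or a real shell parser.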

Production Bundle

Action Checklist

Define Threat Model: Identify sensitive operations and data in your local environment. Determine which actions require strict enforcement versus review.
Draft Declarative Policy: Create a policy file with rules for shell commands, file operations, and network calls. Start with a permissive baseline and iterate.
Deploy MCP Proxy: Install the policy-enforcing proxy and configure it to intercept traffic from your agent client. Verify compatibility with your toolchain.
Configure Human-in-the-Loop: Set up notification channels for review actions. Ensure developers can approve or deny requests quickly.
Enable Audit Logging: Configure the audit logger to record all actions and decisions. Set up log rotation and retention policies.
Test Against Benchmarks: Run the proxy against a suite of test cases to verify enforcement accuracy and latency. Adjust rules based on results.
Monitor and Iterate: Continuously monitor false positive rates and user feedback. Update policies to balance safety and productivity.
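
For the benchmark step, enforcement accuracy can be computed as the share of scenarios where the proxy's decision matches the expected one. The sketch below assumes a simple `Scenario` record; the type and function names are hypothetical. The article does not say how its 14 scenarios broke down, but 13 correct decisions out of 14 is the count consistent with a 92.9% figure.

```typescript
type Decision = 'allow' | 'deny' | 'review';

// Hypothetical test-case record: what the policy should do vs. what it did.
interface Scenario {
  name: string;
  expected: Decision;
  actual: Decision;
}

// Fraction of scenarios where the enforced decision matched the expectation.
function enforcementAccuracy(scenarios: Scenario[]): number {
  if (scenarios.length === 0) throw new Error('no scenarios to score');
  const correct = scenarios.filter((s) => s.expected === s.actual).length;
  return correct / scenarios.length;
}

// Format a ratio such as 13/14 as a percentage with one decimal place.
function formatPercent(ratio: number): string {
  return (ratio * 100).toFixed(1) + '%';
}

// List the misses so rules can be adjusted against concrete cases.
function failures(scenarios: Scenario[]): string[] {
  return scenarios.filter((s) => s.expected !== s.actual).map((s) => s.name);
}
```

Reporting the failing scenario names alongside the percentage is usually more actionable than the number alone, especially with a suite this small.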

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Solo Developer	Review mode for sensitive ops	Balances speed with safety; allows quick approval	Low
Team CI Pipeline	Strict deny for destructive actions	Prevents accidental damage to shared resources	Medium (setup overhead)
High-Security Prod	Multi-layer enforcement + audit	Defense in depth; ensures compliance and traceability	High
Experimental Agent	Permissive policy with logging	Allows exploration while capturing data for analysis	Low

Configuration Template

The following YAML template provides a starting point for defining policies. Customize rules based on your threat model and operational requirements.

policy:
  version: "1.0"
  rules:
    - id: "deny-destructive-shell"
      tool_pattern: "^execute_command$"
      argument_filter:
        command: "rm\\s+-rf|sudo\\s+rm|chmod\\s+777"
      action: "deny"
      description: "Block destructive shell commands"

    - id: "review-sensitive-files"
      tool_pattern: "^write_file$"
      argument_filter:
        path: "^/etc/|\\.env"
      action: "review"
      description: "Require approval for sensitive file writes"

    - id: "allow-read-ops"
      tool_pattern: "^(read_file|list_directory)$"
      action: "allow"
      description: "Permit read-only operations"

    - id: "review-external-network"
      tool_pattern: "^curl$"
      argument_filter:
        url: "^https?://(?!(localhost|127\\.0\\.0\\.1)([:/]|$))"
      action: "review"
      description: "Review external network requests"

audit:
  enabled: true
  retention_days: 90
  format: "json"
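
Once a template like this has been parsed into plain objects by a YAML library, each rule still has to be turned into something executable. The sketch below shows one possible compilation step, assuming every `argument_filter` value is a regular expression tested against the argument of the same name and that all filters must match. The `RawRule` and `CompiledRule` types and `compileRule` are assumptions for this example, not a documented schema.

```typescript
type Action = 'allow' | 'deny' | 'review';

// Shape of one rule after the YAML file has been parsed into plain objects.
interface RawRule {
  id: string;
  tool_pattern: string;
  argument_filter?: Record<string, string>;
  action: Action;
}

interface CompiledRule {
  id: string;
  action: Action;
  matches: (toolName: string, args: Record<string, unknown>) => boolean;
}

// Compile patterns once at load time; an invalid pattern throws a SyntaxError
// here rather than failing later during enforcement.
function compileRule(raw: RawRule): CompiledRule {
  const toolRe = new RegExp(raw.tool_pattern);
  const filterSource = raw.argument_filter ?? {};
  const filters = Object.keys(filterSource).map((key) => ({
    key,
    re: new RegExp(filterSource[key]),
  }));

  return {
    id: raw.id,
    action: raw.action,
    // A rule applies when the tool name matches and every listed argument
    // matches its pattern; missing arguments are treated as empty strings.
    matches: (toolName, args) =>
      toolRe.test(toolName) &&
      filters.every((f) => f.re.test(String(args[f.key] ?? ''))),
  };
}
```

Compiling at load time also gives a natural place to reject a policy file with a malformed pattern before the proxy starts accepting traffic.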

Quick Start Guide

Install the Proxy: Deploy the policy-enforcing MCP proxy using your preferred package manager. Ensure it is accessible to your agent client.
Create Policy File: Save the configuration template as safety-policy.yaml and customize rules for your environment.
Configure Agent Client: Point your agent client (e.g., Claude Desktop, Cursor) to the proxy endpoint. Update connection settings to route tool calls through the proxy.
Verify Enforcement: Run a test action that triggers a policy rule. Confirm that the proxy intercepts the action, evaluates the policy, and enforces the correct decision.
Monitor Logs: Check the audit log to verify that actions and decisions are recorded correctly. Adjust policies as needed based on test results.
