col level, the proxy remains agnostic to the specific agent implementation, providing cross-tool compatibility.
2. Policy Engine: A declarative policy engine evaluates each intercepted action. Policies are defined using structured rules that specify allowed, denied, or review-required operations based on tool name, arguments, and context.
3. Decision Handler: The proxy routes actions based on policy evaluation:
* Allow: Forward the action to the tool server immediately.
* Deny: Block the action and return an error response to the agent.
* Review: Pause execution and request human approval via a notification interface.
4. Audit Logger: Every action, decision, and outcome is recorded in an immutable execution trail. This log supports post-hoc analysis, replay, and compliance auditing.
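One way to make an execution trail tamper-evident is to chain entries by hash, so that editing an earlier record invalidates the check. The sketch below is a minimal in-memory illustration under that assumption. `AuditEntry`, `AuditTrail`, and the field names are hypothetical and not taken from any particular proxy; a real deployment would also persist entries to append-only storage.

```typescript
import { createHash } from "node:crypto";

type Decision = "allow" | "deny" | "review";

// Hypothetical record shape; field names are illustrative only.
interface AuditEntry {
  timestamp: string;
  tool: string;
  decision: Decision;
  prevHash: string; // hash of the previous entry, or GENESIS for the first one
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

const GENESIS = "0".repeat(64);

function hashEntry(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.timestamp, e.tool, e.decision, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

class AuditTrail {
  private entries: AuditEntry[] = [];
  private lastHash = GENESIS;

  append(tool: string, decision: Decision, now: Date = new Date()): AuditEntry {
    const partial = { timestamp: now.toISOString(), tool, decision, prevHash: this.lastHash };
    const entry: AuditEntry = { ...partial, hash: hashEntry(partial) };
    this.entries.push(entry);
    this.lastHash = entry.hash;
    return entry;
  }

  // Recompute every hash and check each link; false means the trail was altered.
  verify(): boolean {
    let prev = GENESIS;
    for (const e of this.entries) {
      if (e.prevHash !== prev || hashEntry(e) !== e.hash) return false;
      prev = e.hash;
    }
    return true;
  }
}

const trail = new AuditTrail();
trail.append("read_file", "allow");
const denied = trail.append("execute_command", "deny");
console.log(trail.verify()); // true
denied.decision = "allow";   // simulate after-the-fact tampering
console.log(trail.verify()); // false
```

The chain only detects modification; preventing it still depends on where the entries are stored and who can write there.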
Implementation Details
The following TypeScript example demonstrates a simplified policy engine and interceptor logic. This implementation uses a rule-based approach with context-aware evaluation.
Policy Definition:
interface PolicyRule {
  id: string;
  // Matched against the tool name.
  toolPattern: RegExp;
  // Optional: when present, the rule applies only if this returns true.
  argumentFilter?: (args: Record<string, unknown>) => boolean;
  action: 'allow' | 'deny' | 'review';
  description: string;
}

// Rules are checked in order, so list narrow rules before broad ones.
const defaultPolicies: PolicyRule[] = [
  {
    id: 'deny-destructive-shell',
    toolPattern: /^execute_command$/,
    argumentFilter: (args) => {
      const cmd = String(args.command ?? '');
      // Pattern matching is a heuristic: variants such as `rm -fr` are not caught.
      return /rm\s+-rf|sudo\s+rm|chmod\s+777/.test(cmd);
    },
    action: 'deny',
    description: 'Block destructive shell commands',
  },
  {
    id: 'review-file-write',
    toolPattern: /^write_file$/,
    argumentFilter: (args) => {
      const path = String(args.path ?? '');
      return path.startsWith('/etc/') || path.includes('.env');
    },
    action: 'review',
    description: 'Require approval for sensitive file writes',
  },
  {
    id: 'allow-read-ops',
    toolPattern: /^(read_file|list_directory)$/,
    action: 'allow',
    description: 'Permit read-only operations',
  },
];
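Because the rules form an ordered list, their order can change the outcome. The sketch below assumes first-match-wins evaluation with a fallback when nothing matches; `Rule`, `decide`, and the `allow-shell` rule are hypothetical names introduced only to show how a broad rule placed first can shadow a narrower one.

```typescript
type Decision = "allow" | "deny" | "review";

// Trimmed-down, hypothetical mirror of the rule shape above.
interface Rule {
  id: string;
  toolPattern: RegExp; // avoid the `g` flag: RegExp.test is stateful with it
  argumentFilter?: (args: Record<string, unknown>) => boolean;
  action: Decision;
}

// First matching rule wins; unmatched calls fall back to `fallback`.
function decide(
  rules: Rule[],
  tool: string,
  args: Record<string, unknown>,
  fallback: Decision = "review",
): { decision: Decision; ruleId: string | null } {
  for (const rule of rules) {
    if (!rule.toolPattern.test(tool)) continue;
    if (rule.argumentFilter && !rule.argumentFilter(args)) continue;
    return { decision: rule.action, ruleId: rule.id };
  }
  return { decision: fallback, ruleId: null };
}

const denyDestructive: Rule = {
  id: "deny-destructive-shell",
  toolPattern: /^execute_command$/,
  argumentFilter: (args) => /rm\s+-rf/.test(String(args.command ?? "")),
  action: "deny",
};

// Hypothetical broad rule, used only to demonstrate shadowing.
const allowShell: Rule = {
  id: "allow-shell",
  toolPattern: /^execute_command$/,
  action: "allow",
};

const call = { command: "rm -rf build" };
console.log(decide([denyDestructive, allowShell], "execute_command", call).decision); // "deny"
console.log(decide([allowShell, denyDestructive], "execute_command", call).decision); // "allow"
```

With the broad rule first, the destructive command is allowed, which is why narrow deny rules usually belong at the top of the list.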
Interceptor Logic:
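// Minimal JSON-RPC 2.0 shapes for this example. They are local stand-ins,
// not types exported by a particular MCP SDK.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: string | number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: '2.0';
  id: string | number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Assumed to exist elsewhere: the tool-server client and the approval channel.
declare function executeTool(params: JsonRpcRequest['params']): Promise<unknown>;
declare function notifyUserForApproval(request: JsonRpcRequest, reason: string): Promise<boolean>;

class AgentPolicyGuard {
  private rules: PolicyRule[];
  private auditLog: string[] = [];

  constructor(rules: PolicyRule[]) {
    this.rules = rules;
  }

  async evaluate(request: JsonRpcRequest): Promise<JsonRpcResponse> {
    const toolName = request.params?.name;
    if (typeof toolName !== 'string') {
      // -32602 is the JSON-RPC "Invalid params" code.
      return this.errorResponse(request, 'Missing tool name', -32602);
    }
    const args = request.params?.arguments ?? {};

    // First matching rule wins.
    const matchedRule = this.rules.find(
      (rule) => rule.toolPattern.test(toolName) && (!rule.argumentFilter || rule.argumentFilter(args)),
    );
    // Calls that match no rule default to human review rather than silent approval.
    const decision = matchedRule?.action ?? 'review';
    const reason = matchedRule?.description ?? 'No matching policy rule';

    this.auditLog.push(`[${new Date().toISOString()}] Tool: ${toolName} | Decision: ${decision}`);

    switch (decision) {
      case 'allow':
        return this.forwardRequest(request);
      case 'deny':
        return this.errorResponse(request, reason);
      case 'review':
        return this.requestHumanApproval(request, reason);
    }
  }

  private async forwardRequest(request: JsonRpcRequest): Promise<JsonRpcResponse> {
    // Forward to the underlying tool server.
    return { jsonrpc: '2.0', id: request.id, result: await executeTool(request.params) };
  }

  // -32000 lies in the JSON-RPC range reserved for implementation-defined server errors.
  private errorResponse(request: JsonRpcRequest, reason: string, code = -32000): JsonRpcResponse {
    return { jsonrpc: '2.0', id: request.id, error: { code, message: `Blocked: ${reason}` } };
  }

  private async requestHumanApproval(request: JsonRpcRequest, reason: string): Promise<JsonRpcResponse> {
    // Integrate with a notification system for human review.
    const approved = await notifyUserForApproval(request, reason);
    if (approved) {
      return this.forwardRequest(request);
    }
    return this.errorResponse(request, 'Denied by human reviewer');
  }
}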
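The `requestHumanApproval` path above waits for a reviewer with no upper bound. A possible refinement, sketched below, races the approval against a timer and treats silence or a failed notification as a denial. `approveWithTimeout` and the `ask` callback are hypothetical stand-ins for whatever notification integration is in use, and failing closed is an assumption of this sketch rather than a requirement.

```typescript
// Resolves to the reviewer's answer, or to false (deny) if no answer arrives
// within timeoutMs or the notification channel itself fails.
async function approveWithTimeout(
  ask: () => Promise<boolean>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([ask().catch(() => false), timeout]);
  } finally {
    // Clear the timer so a fast answer does not keep the process alive.
    clearTimeout(timer);
  }
}
```

Inside the guard, the call would look roughly like `approveWithTimeout(() => notifyUserForApproval(request, reason), 60_000)`, with the timeout value chosen to suit the team's review workflow.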
Architecture Rationale
- MCP Proxy: Using the Model Context Protocol ensures compatibility across a wide range of agent clients. This avoids the need for custom integrations with each tool, reducing maintenance overhead.
- Declarative Policy: Separating policy definition from enforcement logic allows security teams to update rules without modifying code. Policies can be version-controlled and reviewed independently.
- Context-Aware Evaluation: Rules can inspect arguments and context, enabling fine-grained control. For example, rm might be allowed in a temporary directory but denied in the project root.
- Human-in-the-Loop: Requiring approval for sensitive operations balances automation with safety. Developers retain control over high-risk actions while allowing the agent to proceed autonomously on low-risk tasks.
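The context-aware point above can be illustrated with a path check. The sketch below assumes the target of a delete has already been extracted from the command and that all paths are absolute POSIX paths; `PathContext`, `isInside`, and `decideDelete` are hypothetical names, and symlinks are not resolved.

```typescript
import * as path from "node:path";

type Decision = "allow" | "deny" | "review";

// Hypothetical context: where scratch files live and where the project is.
interface PathContext {
  tempDir: string;
  projectRoot: string;
}

// True when target resolves to root itself or to something beneath it.
// Resolving first collapses ".." segments, so traversal tricks do not slip through.
function isInside(root: string, target: string): boolean {
  const rel = path.posix.relative(path.posix.resolve(root), path.posix.resolve(target));
  return rel === "" || (rel !== ".." && !rel.startsWith("../"));
}

function decideDelete(target: string, ctx: PathContext): Decision {
  if (isInside(ctx.tempDir, target)) return "allow";
  if (isInside(ctx.projectRoot, target)) return "deny";
  return "review"; // anything else goes to a human
}

const ctx: PathContext = { tempDir: "/tmp/agent", projectRoot: "/home/dev/app" };
console.log(decideDelete("/tmp/agent/cache", ctx));                  // "allow"
console.log(decideDelete("/tmp/agent/../../home/dev/app/src", ctx)); // "deny"
console.log(decideDelete("/var/log/syslog", ctx));                   // "review"
```

Comparing resolved paths rather than raw prefixes is the design choice here: a plain `startsWith` check would treat the second example as a temporary file.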
Pitfall Guide
Implementing a runtime safety layer introduces new complexities. The following pitfalls and best practices are derived from production experience with agent governance systems.
- Overly Restrictive Policies
  - Explanation: Policies that deny too many actions can paralyze the agent, forcing developers to disable safety controls entirely.
  - Fix: Start with a permissive baseline and gradually tighten rules. Use review actions for borderline cases rather than immediate denial. Monitor false positive rates and adjust rules accordingly.
- Ignoring Context in Rule Evaluation
  - Explanation: Rules that only check tool names without inspecting arguments may block safe operations or allow dangerous ones. For example, denying all execute_command calls prevents legitimate build scripts.
  - Fix: Implement argument-aware filtering. Use regex or AST parsing to analyze command strings and file paths. Allow operations in safe contexts (e.g., temporary directories) while restricting sensitive paths.
- Performance Degradation
  - Explanation: Complex policy evaluation or synchronous human approval loops can introduce latency, degrading the user experience.
  - Fix: Optimize rule matching with efficient data structures. Ensure policy evaluation completes in sub-millisecond time. For human approval, use asynchronous notifications that do not block the main execution thread.
- Audit Log Bloat
  - Explanation: Recording every action can quickly consume storage, especially in high-throughput environments.
  - Fix: Implement log rotation and retention policies. Consider sampling or filtering logs based on severity. Store logs in a compressed, append-only format to minimize overhead.
- False Sense of Security
  - Explanation: Relying solely on a runtime proxy may lead to neglecting other security measures, such as input validation and model alignment.
  - Fix: Adopt a defense-in-depth strategy. Combine runtime enforcement with upstream alignment, input sanitization, and least-privilege execution environments. Regularly test the proxy against adversarial prompts.
- Policy Drift
  - Explanation: Over time, policies may become outdated or inconsistent, leading to gaps in enforcement.
  - Fix: Version control all policy files. Conduct periodic reviews to ensure rules align with current security requirements. Use automated testing to validate policy changes before deployment.
- Bypass via Indirect Calls
  - Explanation: Agents may attempt to bypass policies by encoding commands or using indirect tool calls.
  - Fix: Implement deep inspection of arguments. Decode and normalize inputs before evaluation. Monitor for anomalous patterns that may indicate evasion attempts.
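For the last pitfall, "decode and normalize before evaluation" might look like the sketch below. It is a heuristic under stated assumptions, not a complete defense: it folds Unicode compatibility forms, collapses whitespace, and also checks the decoded form of any token that looks like base64. `candidateStrings` and `isDestructive` are hypothetical helpers, and the pattern is the one used in the destructive-shell rule earlier.

```typescript
import { Buffer } from "node:buffer";

// Pattern taken from the destructive-shell rule earlier in the document.
const DESTRUCTIVE = /rm\s+-rf|sudo\s+rm|chmod\s+777/;

// Returns the normalized command plus any decoded variants worth checking.
function candidateStrings(command: string): string[] {
  const normalized = command.normalize("NFKC").replace(/\s+/g, " ").trim();
  const candidates = [normalized];
  // Heuristic: treat tokens of 8+ base64 characters as possibly encoded payloads.
  for (const token of normalized.match(/[A-Za-z0-9+/]{8,}={0,2}/g) ?? []) {
    const decoded = Buffer.from(token, "base64").toString("utf8");
    // Keep only decodings that are plain printable ASCII; the rest is noise.
    if (/^[\x20-\x7e]+$/.test(decoded)) candidates.push(decoded);
  }
  return candidates;
}

function isDestructive(command: string): boolean {
  return candidateStrings(command).some((s) => DESTRUCTIVE.test(s));
}

console.log(isDestructive("echo cm0gLXJmIC8= | base64 -d | sh")); // true
console.log(isDestructive("npm run build"));                      // false
```

Other encodings (hex, nested base64, variable expansion) would need their own handling, which is one reason the source pairs inspection with anomaly monitoring.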
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Solo Developer | Review mode for sensitive ops | Balances speed with safety; allows quick approval | Low |
| Team CI Pipeline | Strict deny for destructive actions | Prevents accidental damage to shared resources | Medium (setup overhead) |
| High-Security Prod | Multi-layer enforcement + audit | Defense in depth; ensures compliance and traceability | High |
| Experimental Agent | Permissive policy with logging | Allows exploration while capturing data for analysis | Low |
Configuration Template
The following YAML template provides a starting point for defining policies. Customize rules based on your threat model and operational requirements.
policy:
  version: "1.0"
  rules:
    - id: "deny-destructive-shell"
      tool_pattern: "^execute_command$"
      argument_filter:
        command: "rm\\s+-rf|sudo\\s+rm|chmod\\s+777"
      action: "deny"
      description: "Block destructive shell commands"
    - id: "review-sensitive-files"
      tool_pattern: "^write_file$"
      argument_filter:
        path: "^/etc/|\\.env"
      action: "review"
      description: "Require approval for sensitive file writes"
    - id: "allow-read-ops"
      tool_pattern: "^(read_file|list_directory)$"
      action: "allow"
      description: "Permit read-only operations"
    - id: "review-network-egress"
      tool_pattern: "^curl$"
      argument_filter:
        url: "https?://(?!localhost|127\\.0\\.0\\.1).*"
      action: "review"
      description: "Review external network requests"
  audit:
    enabled: true
    retention_days: 90
    format: "json"
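To connect the template to the TypeScript rules shown earlier, the sketch below compiles one parsed rule into a matcher. It assumes the YAML has already been parsed into plain objects by a YAML library of your choice, and that every entry under `argument_filter` must match for the rule to apply. `RawRule`, `CompiledRule`, and `compileRule` are hypothetical names.

```typescript
type Decision = "allow" | "deny" | "review";

// Shape of one rule after YAML parsing (mirrors the template's keys).
interface RawRule {
  id: string;
  tool_pattern: string;
  argument_filter?: Record<string, string>;
  action: Decision;
  description: string;
}

interface CompiledRule {
  id: string;
  toolPattern: RegExp;
  argumentFilter?: (args: Record<string, unknown>) => boolean;
  action: Decision;
  description: string;
}

// new RegExp throws a SyntaxError on an invalid pattern, so a bad policy
// file fails at load time instead of at the first tool call.
function compileRule(raw: RawRule): CompiledRule {
  const filters = Object.entries(raw.argument_filter ?? {}).map(
    ([key, pattern]) => [key, new RegExp(pattern)] as const,
  );
  const rule: CompiledRule = {
    id: raw.id,
    toolPattern: new RegExp(raw.tool_pattern),
    action: raw.action,
    description: raw.description,
  };
  if (filters.length > 0) {
    // Assumption: all listed arguments must match (logical AND).
    rule.argumentFilter = (args) =>
      filters.every(([key, re]) => re.test(String(args[key] ?? "")));
  }
  return rule;
}

const egress = compileRule({
  id: "review-network-egress",
  tool_pattern: "^curl$",
  argument_filter: { url: "https?://(?!localhost|127\\.0\\.0\\.1).*" },
  action: "review",
  description: "Review external network requests",
});
console.log(egress.argumentFilter?.({ url: "https://example.com/data" })); // true
console.log(egress.argumentFilter?.({ url: "http://localhost:3000" }));    // false
```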
Quick Start Guide
- Install the Proxy: Deploy the policy-enforcing MCP proxy using your preferred package manager. Ensure it is accessible to your agent client.
- Create Policy File: Save the configuration template as safety-policy.yaml and customize rules for your environment.
- Configure Agent Client: Point your agent client (e.g., Claude Desktop, Cursor) to the proxy endpoint. Update connection settings to route tool calls through the proxy.
- Verify Enforcement: Run a test action that triggers a policy rule. Confirm that the proxy intercepts the action, evaluates the policy, and enforces the correct decision.
- Monitor Logs: Check the audit log to verify that actions and decisions are recorded correctly. Adjust policies as needed based on test results.