Anthropic Self-Hosted Sandboxes + MCP Tunnels: Enterprise AI Agents That Keep Your Data Behind Your Walls

By Codcompass Team·2026-05-27·9 min read

Architecting Sovereign AI Workflows: On-Prem Execution and Secure Tunneling with Anthropic's MCP

Current Situation Analysis

Enterprise AI adoption has hit a structural wall: the mismatch between cloud-native agent architectures and strict data residency requirements. Financial institutions, healthcare providers, and defense contractors cannot route raw proprietary data through third-party inference endpoints, yet they still require the reasoning capabilities of modern large language models. The industry has historically treated AI agents as monolithic cloud services, forcing organizations to choose between capability and compliance.

This problem is frequently misunderstood because teams conflate model inference with code execution. They assume that if a model processes a prompt, the underlying data must leave the perimeter. In reality, the architectural boundary that matters is where tool execution and file manipulation occur. Anthropic's recent infrastructure updates explicitly decouple these concerns. Agent orchestration and prompt routing remain on Anthropic's cloud, while code execution, filesystem access, and shell operations run inside self-hosted sandboxes deployed on your infrastructure. This split is now available across managed compute providers like Cloudflare, Vercel, and Modal, as well as traditional on-prem environments.

The oversight stems from legacy networking assumptions. Traditional enterprise integrations require inbound firewall rules, public DNS records, and certificate management to expose internal databases or APIs to external services. Each opened port expands the attack surface and triggers security review cycles. Anthropic's Model Context Protocol (MCP) tunnels invert this model by establishing a single encrypted outbound connection from your network to the agent runtime. No inbound rules are required. No public endpoints are created. The tunnel carries MCP tool calls as if the agent were operating inside your private network, while maintaining strict cryptographic boundaries.

Additionally, long-running agent sessions historically degrade when tool outputs consume the context window. Querying a production database or parsing a large codebase can easily generate 100,000+ tokens of output, starving the model of working memory. The new architecture automatically offloads outputs exceeding this threshold to sandbox-local files, preserving context for reasoning rather than raw data storage. Combined with OS-level isolation primitives (Seatbelt on macOS, bubblewrap on Linux), this creates a defense-in-depth model where filesystem restrictions, network proxy controls, and physical infrastructure boundaries operate independently.

WOW Moment: Key Findings

The architectural shift from monolithic cloud agents to split orchestration/execution environments fundamentally changes how regulated teams can deploy AI. The following comparison highlights the operational and security deltas:

Approach	Data Residency	Network Exposure	Context Efficiency	Compliance Overhead
Traditional Cloud Agent	Model + Execution on vendor cloud	Inbound ports, public endpoints, VPNs	Degrades rapidly with large outputs	High (DPA, cross-border reviews)
Self-Hosted Sandbox + MCP Tunnel	Execution on your infrastructure	Single outbound encrypted tunnel	Auto-offloads >100K tokens to files	Low (data never leaves perimeter)

This finding matters because it decouples capability from location. Organizations can now run complex, multi-step agent workflows that interact with internal Postgres clusters, legacy REST APIs, and proprietary file systems without exposing those services to the public internet. The single outbound tunnel pattern eliminates the need for DMZ deployments or reverse proxy farms. Mid-session tool swapping further reduces operational friction by allowing dynamic capability injection without context loss. For teams operating under GDPR, HIPAA, SOC 2, or FedRAMP, this architecture provides auditable execution boundaries while preserving the model's reasoning velocity.

Core Solution

Implementing a sovereign AI workflow requires coordinating three subsystems: the execution sandbox, the MCP tunnel client, and the dynamic tool registry. The following implementation demonstrates how to wire these components together in TypeScript.

Step 1: Provision the Execution Environment

The sandbox must run on infrastructure you control. Whether you deploy to Cloudflare Workers, Vercel Edge, Modal containers, or a bare-metal Kubernetes cluster, the runtime must support OS-level isolation. Anthropic's architecture relies on Seatbelt (macOS) or bubblewrap (Linux) to enforce filesystem and network boundaries at the kernel level.

import { SandboxRuntime, SandboxConfig } from '@anthropic-enterprise/sandbox-core';

const sandboxConfig: SandboxConfig = {
  runtime: 'bubblewrap', // or 'seatbelt' for macOS
  filesystem: {
    readOnly: ['/etc', '/usr', '/var/log'],
    writable: ['/workspace', '/tmp/agent-cache'],
    mountPoints: {
      '/data/internal-db': process.env.INTERNAL_DB_MOUNT_PATH
    }
  },
  network: {
    proxyEnabled: true,
    allowlist: ['*.internal.corp', 'api.staging.corp'],
    blockExternal: true
  }
};

const sandbox = new SandboxRuntime(sandboxConfig);
await sandbox.initialize();

Rationale: Filesystem restrictions prevent unauthorized reads/writes outside designated directories. Network proxy enforcement ensures the sandbox cannot reach external services unless explicitly permitted. This layered approach guarantees that even if a tool call is compromised, lateral movement is contained.
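The filesystem restriction can also be mirrored in the tool layer as a pre-flight check, so that a bad path is rejected with a clear error before the OS sandbox denies it. The following is a minimal sketch under stated assumptions: `normalizePosixPath` and `isWritablePath` are hypothetical helpers, not part of any Anthropic package, and the check complements OS-level enforcement rather than replacing it (it does not resolve symlinks, for instance).

```typescript
// Hypothetical pre-flight helpers; names are illustrative, not a real API.

// Collapse '.', '..', and duplicate slashes in an absolute POSIX path.
function normalizePosixPath(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') parts.pop();
    else parts.push(segment);
  }
  return '/' + parts.join('/');
}

// True only when the normalized path is a writable root or sits beneath one.
function isWritablePath(path: string, writableRoots: string[]): boolean {
  if (!path.startsWith('/')) return false; // relative paths are rejected outright
  const normalized = normalizePosixPath(path);
  return writableRoots.some((root) => {
    const base = normalizePosixPath(root);
    // The trailing '/' prevents '/workspace-evil' from matching '/workspace'.
    return normalized === base || normalized.startsWith(base + '/');
  });
}

const writableRoots = ['/workspace', '/tmp/agent-cache'];
console.log(isWritablePath('/workspace/report.txt', writableRoots));     // true
console.log(isWritablePath('/workspace/../etc/passwd', writableRoots));  // false
```

Normalizing before comparing is the important step: a naive prefix check on the raw string would accept `/workspace/../etc/passwd`.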

Step 2: Establish the MCP Tunnel

The tunnel client initiates a single outbound WebSocket connection to Anthropic's routing layer. All MCP tool calls flow through this encrypted channel. No inbound ports are opened.

import { McpTunnelClient, TunnelAuth } from '@anthropic-enterprise/mcp-tunnel';

const tunnelAuth: TunnelAuth = {
  clientId: process.env.ANTHROPIC_CLIENT_ID,
  clientSecret: process.env.ANTHROPIC_CLIENT_SECRET,
  region: 'us-east-1'
};

const tunnel = new McpTunnelClient(tunnelAuth);
tunnel.on('connection:established', () => {
  console.log('[Tunnel] Secure outbound channel active');
});

tunnel.on('tool:call', async (request) => {
  const result = await sandbox.executeTool(request);
  return result;
});

await tunnel.connect();

Rationale: Outbound-only connectivity eliminates firewall rule sprawl. The tunnel authenticates via client credentials, and all traffic is encrypted in transit. The tool:call event handler bridges Anthropic's orchestration layer with your local sandbox runtime.

Step 3: Implement Dynamic Tool Registration

Mid-session tool swapping requires a registry that can inject, remove, or reconfigure tools without restarting the agent. This is critical for workflows that evolve based on intermediate findings.

import { ToolRegistry, ToolDefinition } from '@anthropic-enterprise/tool-registry';

const registry = new ToolRegistry();

const dbQueryTool: ToolDefinition = {
  name: 'query_internal_db',
  description: 'Execute read-only SQL against the internal analytics cluster',
  parameters: { query: 'string', limit: 'number' },
  handler: async (params) => {
    return await sandbox.runSql(params.query, params.limit);
  }
};

const slackNotifyTool: ToolDefinition = {
  name: 'notify_security_team',
  description: 'Send alert to internal Slack channel',
  parameters: { channel: 'string', message: 'string' },
  handler: async (params) => {
    return await sandbox.sendSlackWebhook(params.channel, params.message);
  }
};

// Register initial tools
registry.register(dbQueryTool);

// Mid-session swap: add notification tool when audit finds vulnerability
registry.register(slackNotifyTool);

// Remove an unused tool to reduce context overhead
// (assumes 'legacy_api_connector' was registered earlier in the session)
registry.unregister('legacy_api_connector');

Rationale: Dynamic registration prevents context bloat from unused tool definitions. It also allows workflows to adapt: a code analysis phase might only need filesystem tools, while a subsequent security review requires database and communication endpoints. The registry streams configuration updates to the orchestration layer without session interruption.

Step 4: Handle Large Output Offloading

Outputs exceeding 100,000 tokens are automatically written to sandbox-local files. The agent receives a file reference instead of raw content, preserving context window capacity.

import { OutputStreamer, FileReference } from '@anthropic-enterprise/output-streamer';

const streamer = new OutputStreamer({
  thresholdTokens: 100000,
  storagePath: '/tmp/agent-cache/large-outputs',
  compression: 'gzip'
});

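// Small outputs stay inline; large ones are replaced by a file reference.
type ToolOutput = FileReference | { type: 'inline'; content: string };

async function handleToolOutput(rawData: string): Promise<ToolOutput> {
  const tokenCount = await streamer.estimateTokens(rawData);

  // Keep this value in sync with thresholdTokens in the streamer config
  if (tokenCount > 100000) {
    const fileRef = await streamer.writeToFile(rawData);
    return fileRef; // Agent receives path + metadata
  }

  return { type: 'inline', content: rawData };
}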

Rationale: Context windows are finite. Streaming massive datasets directly into prompts degrades instruction following and reasoning quality. File offloading mimics how engineers interact with large datasets: reference first, inspect subsets later. The agent can use chunked reads, grep, or offset-based pagination to navigate the file without reloading the entire payload.

Pitfall Guide

1. Assuming Zero Cloud Data Exposure

Explanation: The orchestration split only applies to execution. Prompts, tool schemas, and metadata still traverse Anthropic's cloud infrastructure. Sensitive data embedded in prompts will leave your perimeter. Fix: Implement prompt sanitization middleware. Strip PII, API keys, and proprietary content before routing to the model. Use placeholder references that the sandbox resolves locally.
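A sanitization step with locally resolved placeholders might look like the sketch below. Everything here is an assumption for illustration: the function names, the `{{LABEL_n}}` placeholder format, and the two regular expressions, which are far too simple for production use and stand in for a vetted PII and secret detector.

```typescript
// Hypothetical sanitizer; patterns and placeholder format are illustrative assumptions.
type Redaction = { placeholder: string; original: string };

interface SanitizedPrompt {
  text: string;            // safe to send to the orchestration layer
  redactions: Redaction[]; // stays inside the perimeter for local resolution
}

const REDACTION_PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'SECRET', regex: /\b(?:sk|key|token)[-_][A-Za-z0-9_-]{16,}\b/g },
];

// Replace each match with a numbered placeholder and remember the original value.
function sanitizePrompt(prompt: string): SanitizedPrompt {
  const redactions: Redaction[] = [];
  let text = prompt;
  for (const { label, regex } of REDACTION_PATTERNS) {
    text = text.replace(regex, (match) => {
      const placeholder = `{{${label}_${redactions.length + 1}}}`;
      redactions.push({ placeholder, original: match });
      return placeholder;
    });
  }
  return { text, redactions };
}

// Runs inside the sandbox: restore real values just before a tool uses them.
function resolvePlaceholders(text: string, redactions: Redaction[]): string {
  return redactions.reduce(
    (acc, r) => acc.split(r.placeholder).join(r.original),
    text,
  );
}

const sanitized = sanitizePrompt('Email alice@example.com the report');
console.log(sanitized.text); // Email {{EMAIL_1}} the report
```

The redaction map never leaves the perimeter; only the placeholder text is sent to the model, and the sandbox substitutes real values when a tool call arrives.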

2. Misconfiguring Network Proxy Allowlists

Explanation: The sandbox's network proxy blocks all external traffic by default. If internal service domains are not explicitly allowlisted, tool calls will fail with connection timeouts. Fix: Maintain a version-controlled allowlist configuration. Use wildcard patterns cautiously (*.internal.corp) and validate DNS resolution inside the sandbox runtime before deployment.
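To catch allowlist gaps before deployment, the hosts each tool needs can be checked against the configured patterns in CI. This is a rough sketch that assumes `*.domain` matches subdomains only and never the bare apex; the actual proxy's matching rules may differ, and both function names are hypothetical.

```typescript
// Hypothetical helpers; the wildcard semantics here are an assumption, not documented behavior.
function isHostAllowed(host: string, allowlist: string[]): boolean {
  // Lowercase and drop a trailing dot so 'DB.Internal.Corp.' compares cleanly.
  const normalized = host.trim().toLowerCase().replace(/\.$/, '');
  return allowlist.some((entry) => {
    const pattern = entry.toLowerCase();
    if (pattern.startsWith('*.')) {
      const suffix = pattern.slice(1); // '*.internal.corp' -> '.internal.corp'
      // Requires at least one label before the suffix, so the apex does not match.
      return normalized.length > suffix.length && normalized.endsWith(suffix);
    }
    return normalized === pattern;
  });
}

// Pre-deployment check: list every host a tool needs that no pattern covers.
function findUncoveredHosts(requiredHosts: string[], allowlist: string[]): string[] {
  return requiredHosts.filter((host) => !isHostAllowed(host, allowlist));
}

const proxyAllowlist = ['*.internal.corp', 'api.staging.corp'];
console.log(findUncoveredHosts(
  ['db.internal.corp', 'api.staging.corp', 'hooks.slack.com'],
  proxyAllowlist,
)); // [ 'hooks.slack.com' ]
```

Matching on the leading dot of the suffix is what keeps a lookalike such as `evil-internal.corp` from slipping through a wildcard entry.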

3. Ignoring Context Window Limits Despite File Offloading

Explanation: File offloading prevents context bloat from large outputs, but the agent still needs to read those files. Blindly loading entire files back into prompts defeats the optimization. Fix: Implement chunked reading strategies. Use offset, limit, and keyword filtering when instructing the agent to inspect offloaded files. Monitor token consumption per session and enforce read quotas.
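One way to expose offloaded content to the agent is a paginated reader that returns a bounded slice plus a cursor. The sketch below is an illustration only: `readChunk` and its request shape are hypothetical, it works on content the sandbox has already loaded from the offloaded file, and limits are counted in lines rather than tokens for simplicity.

```typescript
// Hypothetical paginated reader; names and shapes are illustrative assumptions.
interface ChunkRequest {
  offset: number;   // zero-based index of the first line to scan
  limit: number;    // maximum number of matching lines to return
  keyword?: string; // optional case-insensitive filter
}

interface Chunk {
  lines: string[];
  nextOffset: number | null; // null once the end of the content is reached
}

function readChunk(content: string, req: ChunkRequest): Chunk {
  const all = content.split('\n');
  const needle = req.keyword?.toLowerCase();
  const lines: string[] = [];
  let i = Math.max(0, req.offset);
  // Scan forward until enough matching lines are collected or the content ends.
  for (; i < all.length && lines.length < req.limit; i++) {
    const line = all[i] ?? '';
    if (!needle || line.toLowerCase().includes(needle)) lines.push(line);
  }
  return { lines, nextOffset: i < all.length ? i : null };
}

const offloaded = 'ok\nERROR disk full\nok\nerror timeout\nok';
const firstPage = readChunk(offloaded, { offset: 0, limit: 1, keyword: 'error' });
console.log(firstPage); // { lines: [ 'ERROR disk full' ], nextOffset: 2 }
```

Returning `nextOffset` lets the agent resume where it stopped, so each read adds a small, predictable amount to the context instead of the whole payload.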

4. Hardcoding Tool Configurations

Explanation: Static tool definitions force session restarts when requirements change. This breaks long-running workflows and discards accumulated context. Fix: Use the dynamic registry pattern shown in Step 3. Expose configuration endpoints that allow runtime updates. Validate parameter schemas before injection to prevent runtime errors.
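Validation before injection can be a plain function that returns a list of problems, so a bad definition is rejected without disturbing the running session. The sketch below assumes the `{ paramName: 'type' }` parameter shape used in Step 3; the naming rule, the set of allowed types, and both function names are hypothetical choices for illustration.

```typescript
// Hypothetical validator; the rules below are assumptions, not a documented schema.
interface ToolDefinitionLike {
  name: string;
  description: string;
  parameters: Record<string, string>;
}

const ALLOWED_PARAM_TYPES = ['string', 'number', 'boolean'];
const TOOL_NAME_PATTERN = /^[a-z][a-z0-9_]{0,63}$/;

// Returns an empty array when the definition is acceptable.
function validateToolDefinition(def: ToolDefinitionLike): string[] {
  const errors: string[] = [];
  if (!TOOL_NAME_PATTERN.test(def.name)) {
    errors.push(`invalid tool name "${def.name}"`);
  }
  if (def.description.trim().length === 0) {
    errors.push('description must not be empty');
  }
  for (const [param, type] of Object.entries(def.parameters)) {
    if (!ALLOWED_PARAM_TYPES.includes(type)) {
      errors.push(`parameter "${param}" has unsupported type "${type}"`);
    }
  }
  return errors;
}

// Register only when validation passes; `register` stands in for the real registry call.
function registerIfValid(
  def: ToolDefinitionLike,
  register: (d: ToolDefinitionLike) => void,
): string[] {
  const errors = validateToolDefinition(def);
  if (errors.length === 0) register(def);
  return errors;
}

console.log(validateToolDefinition({
  name: 'query_internal_db',
  description: 'Execute read-only SQL against the internal analytics cluster',
  parameters: { query: 'string', limit: 'integer' },
})); // [ 'parameter "limit" has unsupported type "integer"' ]
```

Returning errors instead of throwing makes it straightforward to surface every problem to whoever submitted the runtime update.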

5. Overlooking OS-Level Sandbox Constraints

Explanation: Seatbelt and bubblewrap enforce strict filesystem and network boundaries. Tools that attempt to read /etc/shadow, modify system binaries, or reach unallowlisted domains are denied at the OS or proxy layer, so the call fails with a permission or connection error. Fix: Audit tool dependencies before deployment. Ensure all required binaries are pre-installed in the sandbox image. Use explicit mount points for data directories rather than relying on host filesystem traversal.

6. Neglecting Tunnel Reconnection Handling

Explanation: Network interruptions can drop the outbound tunnel. Without proper reconnection logic, the agent session will hang or fail silently. Fix: Implement exponential backoff with jitter for tunnel reconnection. Add health check pings every 30 seconds. Cache pending tool calls locally and replay them after reconnection.
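Exponential backoff with jitter can be sketched independently of any tunnel client. The version below uses the "full jitter" variant, where each delay is drawn uniformly between zero and a capped exponential ceiling. The `connect` callback stands in for whatever the tunnel client exposes, and the option names and defaults are assumptions.

```typescript
// Hypothetical retry helpers; option names and values are illustrative assumptions.
interface BackoffOptions {
  baseMs: number;     // delay ceiling for the first retry
  capMs: number;      // upper bound on any single delay
  maxRetries: number; // retries allowed after the initial attempt
}

// Full jitter: uniform delay in [0, min(cap, base * 2^attempt)).
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Calls `connect` until it succeeds or retries are exhausted.
// Resolves with the number of failed attempts that preceded success.
async function connectWithRetry(
  connect: () => Promise<void>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return attempt;
    } catch (err) {
      if (attempt >= opts.maxRetries) throw err; // give up and surface the last error
      await sleep(backoffDelay(attempt, opts));
    }
  }
}

const retryOptions: BackoffOptions = { baseMs: 1000, capMs: 30000, maxRetries: 5 };
console.log(backoffDelay(3, retryOptions, () => 0.5)); // 4000
```

Injecting `random` and `sleep` keeps the logic deterministic under test. The jitter matters when many sandboxes lose connectivity at once, since it spreads their reconnection attempts instead of synchronizing them.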

7. Compliance Audit Gaps

Explanation: Regulated environments require proof of where data was processed. Self-hosted sandboxes generate execution logs, but teams often fail to centralize them. Fix: Stream sandbox audit logs to your SIEM or compliance vault. Include timestamps, tool names, file paths accessed, and network destinations. Retain logs according to your regulatory framework's requirements.
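A structured audit record makes the fields listed above easy to ship and query. This sketch emits one JSON object per line, a format most log shippers can tail. The field names and the `AuditEvent` shape are assumptions chosen for illustration and should be aligned with whatever your SIEM expects.

```typescript
// Hypothetical audit record; field names are illustrative assumptions.
interface AuditEvent {
  timestamp: string;             // ISO 8601, UTC
  sessionId: string;
  tool: string;
  filePaths: string[];           // files the tool call read or wrote
  networkDestinations: string[]; // hosts contacted through the proxy
  outcome: 'success' | 'denied' | 'error';
}

// Stamp the event with the current time unless a clock value is supplied.
function buildAuditEvent(
  input: Omit<AuditEvent, 'timestamp'>,
  now: Date = new Date(),
): AuditEvent {
  return { timestamp: now.toISOString(), ...input };
}

// Serialize as a single line; JSON.stringify escapes any embedded newlines.
function toAuditLine(event: AuditEvent): string {
  return JSON.stringify(event);
}

const auditEvent = buildAuditEvent({
  sessionId: 'session-123',
  tool: 'query_internal_db',
  filePaths: ['/workspace/results.csv'],
  networkDestinations: ['db.internal.corp'],
  outcome: 'success',
});
console.log(toAuditLine(auditEvent));
```

Accepting the clock as a parameter keeps the record reproducible in tests and lets a deployment substitute a trusted time source if its compliance regime requires one.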

Production Bundle

Action Checklist

Provision execution runtime on approved infrastructure (Cloudflare, Vercel, Modal, or on-prem)
Configure OS-level sandbox profiles (Seatbelt/bubblewrap) with explicit filesystem mounts
Deploy MCP tunnel client with outbound-only connectivity and client credential auth
Implement dynamic tool registry with schema validation and mid-session swap support
Enable output offloading threshold at 100K tokens with chunked read patterns
Centralize sandbox audit logs to SIEM/compliance vault with retention policies
Test tunnel reconnection logic with simulated network drops and latency spikes
Validate prompt sanitization pipeline to prevent sensitive data leakage to orchestration layer

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Internal DB queries with strict residency	Self-hosted sandbox + MCP tunnel	Data never leaves perimeter; single outbound connection	Moderate (compute + tunnel egress)
Multi-region compliance (GDPR/HIPAA)	Regional sandbox deployment + local tunnel	Keeps execution within legal jurisdiction	Higher (multi-region infra)
High-throughput ETL pipelines	Sandbox with file offloading + chunked reads	Prevents context exhaustion; maintains throughput	Low (storage + compute)
Rapid prototyping / PoC	Managed provider (Vercel/Modal) + default sandbox	Fastest deployment; minimal infra overhead	Low (pay-per-use)
Legacy API integration	Mid-session tool swap + proxy allowlist	Avoids restarts; maintains session continuity	Low (configuration only)

Configuration Template

# enterprise-agent-config.yaml
sandbox:
  runtime: bubblewrap
  filesystem:
    readOnly:
      - /etc
      - /usr
      - /var/log
    writable:
      - /workspace
      - /tmp/agent-cache
    mounts:
      /data/internal-db: ${INTERNAL_DB_VOLUME}
  network:
    proxy:
      enabled: true
      allowlist:
        - "*.internal.corp"
        - "api.staging.corp"
      block_external: true

tunnel:
  auth:
    client_id: ${ANTHROPIC_CLIENT_ID}
    client_secret: ${ANTHROPIC_CLIENT_SECRET}
  connection:
    region: us-east-1
    health_check_interval: 30s
    reconnect_backoff: exponential
    max_retries: 5

output_streaming:
  token_threshold: 100000
  storage_path: /tmp/agent-cache/large-outputs
  compression: gzip
  chunk_size: 8192

tool_registry:
  dynamic_swap: true
  schema_validation: strict
  context_cleanup_on_unregister: true

Quick Start Guide

1. Initialize the runtime: Deploy the sandbox container to your preferred infrastructure using the provided YAML template. Ensure bubblewrap/Seatbelt profiles are active and filesystem mounts are verified.
2. Launch the tunnel client: Export your Anthropic client credentials and run the MCP tunnel binary. Confirm the outbound WebSocket connection establishes without inbound firewall rules.
3. Register baseline tools: Load your initial tool definitions into the dynamic registry. Verify schema validation and handler execution inside the sandbox.
4. Test output offloading: Execute a tool that returns >100K tokens. Confirm the system writes the payload to /tmp/agent-cache/large-outputs and returns a file reference instead of inline content.
5. Validate mid-session swap: Inject a new tool definition while the agent is active. Confirm the orchestration layer receives the update without session restart or context loss.

This architecture shifts enterprise AI from a trust-based cloud model to a verifiable execution boundary. By separating reasoning from execution, enforcing outbound-only networking, and managing context through file offloading, teams can deploy complex agent workflows while maintaining strict data sovereignty. The operational overhead is front-loaded during infrastructure provisioning, but the long-term compliance and security posture justifies the investment.
