Intercepting LLM Traffic at the Network Layer: A Plugin-Free Proxy Architecture

Current Situation Analysis

The modern AI-assisted development workflow is fragmented by design. Every major editor—Cursor, VS Code, JetBrains IDEs, Zed, and terminal-based CLIs like Claude Code—implements its own extension model, update cadence, and internal API surface. When you need to intercept, transform, or audit every prompt and response flowing between a developer and an LLM, the conventional approach is to write an IDE plugin.

This path quickly becomes unsustainable. Plugin APIs drift with every editor release. Packaging, signing, and distribution add operational overhead. More critically, plugins operate inside the UI thread or extension host, forcing you to reverse-engineer undocumented hooks just to access raw HTTP payloads. You end up rebuilding the same wire-protocol logic N times, once per editor, while chasing breaking changes.

The problem is overlooked because developers assume the integration boundary must be the editor itself. In practice, the integration boundary can be the network. Many AI coding clients, and the official SDKs they build on, honor a base URL override: ANTHROPIC_BASE_URL for Anthropic's ecosystem, OPENAI_BASE_URL for OpenAI-compatible providers, and comparable environment variables or settings for other providers such as Google and Mistral. By pointing the override at a localhost endpoint, you collapse the integration problem from "maintain N plugins" to "run a single reverse proxy." The client needs no modification and behaves as if it were talking to the provider directly. The contract is strictly HTTP, so the approach works for any terminal tool, GUI, or headless CI runner that lets you change the base URL.

WOW Moment: Key Findings

Shifting from UI-layer plugins to network-layer proxies fundamentally changes the maintenance curve and compatibility matrix. The following comparison illustrates the operational trade-offs:

Integration Strategy	Maintenance Overhead	IDE Compatibility	Latency Impact	Implementation Complexity
IDE Plugin/Extension	High (per-IDE API drift, release cycles)	Limited to supported editors	Minimal (<1ms)	High (UI hooks, packaging, sandbox restrictions)
Network Proxy (Env Var)	Low (single codebase, framework-agnostic)	Universal (any HTTP/SSE client)	<5ms overhead	Medium (streaming, protocol normalization, header management)

This finding matters because it decouples business logic (obfuscation, prompt auditing, guardrails, cost tracking) from UI churn. You write the transformation logic once. Every client that respects the base URL override works immediately. The proxy becomes a transparent middleware layer that survives editor upgrades, OS changes, and workflow migrations.

Core Solution

The architecture replaces IDE-specific hooks with a lightweight localhost HTTP server that intercepts, transforms, and forwards traffic. The implementation uses Node.js with TypeScript, relying on the built-in fetch and web streams to forward responses incrementally, and on explicit header normalization to prevent protocol leakage.

Step 1: Server Initialization & Route Binding

Bind to 127.0.0.1 exclusively. Exposing to 0.0.0.0 creates a local network attack surface. Use a dedicated port (e.g., 8077) and validate that it's available before binding.

import http, { createServer } from 'node:http';
import { parseEnv } from './proxy.config'; // file name used by the Configuration Template

const config = parseEnv();
const server = createServer();

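server.on('request', async (req, res) => {
  if (req.method !== 'POST' || !req.url?.includes('/v1/messages')) {
    res.writeHead(404);
    return res.end();
  }

  try {
    await handleIntercept(req, res);
  } catch (err) {
    // An upstream failure must not crash the proxy with an unhandled rejection
    console.error('Intercept failed:', err);
    if (!res.headersSent) res.writeHead(502);
    res.end();
  }
});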

server.listen(config.proxyPort, '127.0.0.1', () => {
  console.log(`Interceptor listening on http://127.0.0.1:${config.proxyPort}`);
});

Step 2: Request Interception & Payload Transformation

Read the incoming request body, parse the JSON, apply domain-specific transformations, and forward to the upstream API. The transformation layer must distinguish between user-facing text and internal tool payloads.

async function handleIntercept(req: http.IncomingMessage, res: http.ServerResponse) {
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk);
  const rawBody = Buffer.concat(chunks).toString('utf-8');

  let payload: any;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    res.writeHead(400);
    return res.end('Invalid JSON');
  }

  // Apply transformation rules
  const transformed = applyTransformations(payload);

  // Forward to upstream
  const upstreamRes = await fetch(`${config.upstreamUrl}${req.url}`, {
    method: 'POST',
    headers: buildForwardHeaders(req),
    body: JSON.stringify(transformed),
  });

  await streamUpstreamResponse(upstreamRes, res);
}
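
Step 2 calls buildForwardHeaders without defining it. The following is a minimal sketch of what such a helper might look like, not the author's implementation. It assumes the request headers arrive as a plain object (as Node's http module provides them), and both the drop list and the choice to send Accept-Encoding: identity are illustrative assumptions.

```typescript
// Hypothetical helper: the original only references buildForwardHeaders by name.
type HeaderBag = Record<string, string | string[] | undefined>;

// Headers that describe the client-to-proxy hop or that the proxy must recompute.
const DROPPED_REQUEST_HEADERS = new Set([
  'host',              // must name the upstream, not 127.0.0.1
  'connection',
  'keep-alive',
  'transfer-encoding',
  'content-length',    // body length changes after transformation
  'accept-encoding',   // the client's preference is replaced below
]);

function buildForwardHeaders(req: { headers: HeaderBag }): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    const key = name.toLowerCase(); // one canonical spelling per header
    if (value === undefined) continue;
    if (key.startsWith(':') || DROPPED_REQUEST_HEADERS.has(key)) continue;
    out[key] = Array.isArray(value) ? value.join(', ') : value;
  }
  // Assumption: explicitly asking for an uncompressed body is acceptable to the upstream.
  out['accept-encoding'] = 'identity';
  return out;
}
```

Authentication headers such as x-api-key and provider-specific headers such as anthropic-version pass through untouched, so the upstream sees the same credentials the client sent.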

Step 3: Selective Block Translation

LLM APIs structure content as typed arrays. User prompts and AI replies require translation. Tool results and tool calls originate from the workspace or execution environment and must remain untouched to prevent context corruption.

function applyTransformations(payload: any): any {
  if (!Array.isArray(payload.messages)) return payload;

  payload.messages = payload.messages.map((msg: any) => {
    // The Messages API also accepts a plain string as shorthand for a single text block
    if (typeof msg.content === 'string') {
      msg.content = translateIdentifiers(msg.content);
      return msg;
    }
    if (!Array.isArray(msg.content)) return msg;

    msg.content = msg.content.map((block: any) => {
      // Only translate explicit text blocks
      if (block.type === 'text' && typeof block.text === 'string') {
        block.text = translateIdentifiers(block.text);
      }
      // tool_use, tool_result, image, etc. remain untouched
      return block;
    });

    return msg;
  });

  return payload;
}

function translateIdentifiers(text: string): string {
  // Replace with your mapping logic (e.g., regex, trie, or lookup table)
  return text.replace(/\b[A-Z][a-zA-Z0-9_]{3,}\b/g, (match) => {
    return config.mappingTable[match] || match;
  });
}

Step 4: Streaming Response Forwarding

LLM clients expect Server-Sent Events (SSE). Buffering the entire response destroys the interactive experience. Forward chunks as they arrive, but handle token fragmentation carefully.

async function streamUpstreamResponse(upstreamRes: Response, clientRes: http.ServerResponse) {
  clientRes.writeHead(upstreamRes.status, normalizeHeaders(upstreamRes.headers));

  if (!upstreamRes.body) return clientRes.end();

  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let carryOver = '';

  const reader = upstreamRes.body.getReader();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    const combined = carryOver + chunk;
    
    // Process SSE lines
    const lines = combined.split('\n');
    carryOver = lines.pop() || ''; // Keep incomplete line for next chunk

    for (const line of lines) {
      const processed = processSSELine(line);
      clientRes.write(encoder.encode(processed + '\n'));
    }
  }

  // Flush bytes still held by the decoder, then any remaining carry-over
  carryOver += decoder.decode();
  if (carryOver) {
    clientRes.write(encoder.encode(processSSELine(carryOver) + '\n'));
  }
  
  clientRes.end();
}
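
Step 4 calls processSSELine, which the original leaves undefined. Here is a rough sketch under stated assumptions: the upstream speaks Anthropic's streaming format, where text arrives in data: lines whose JSON has type content_block_delta and a delta of type text_delta, and reverseMapping is a hypothetical lookup from obfuscated names back to real ones. Anything that is not such a line passes through unchanged.

```typescript
// Hypothetical reverse lookup (obfuscated name -> real name); not part of the original.
const reverseMapping: Record<string, string> = { Zq81Alpha: 'InvoiceService' };

function restoreIdentifiers(text: string, mapping: Record<string, string>): string {
  return text.replace(/\b[A-Za-z_][A-Za-z0-9_]*\b/g, (word) =>
    Object.prototype.hasOwnProperty.call(mapping, word) ? mapping[word] : word,
  );
}

function processSSELine(
  line: string,
  mapping: Record<string, string> = reverseMapping,
): string {
  // Only "data:" lines carry JSON; "event:" lines and blank separators pass through.
  if (!line.startsWith('data:')) return line;

  let event: any;
  try {
    event = JSON.parse(line.slice('data:'.length));
  } catch {
    return line; // not JSON: forward untouched rather than guess
  }

  // Assumed shape: { type: 'content_block_delta', delta: { type: 'text_delta', text } }
  const isTextDelta =
    event?.type === 'content_block_delta' &&
    event.delta?.type === 'text_delta' &&
    typeof event.delta.text === 'string';
  if (!isTextDelta) return line; // tool input deltas, pings, message_start, etc.

  event.delta.text = restoreIdentifiers(event.delta.text, mapping);
  return 'data: ' + JSON.stringify(event);
}
```

Because the function rewrites one line at a time, it only restores identifiers that fit inside a single delta. An OpenAI-compatible upstream would need a different shape check.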

Architecture Rationale

Native Streams Over Third-Party Proxies: Libraries like http-proxy are built to pipe responses through unchanged, so rewriting SSE payloads in flight means working against the abstraction. Reading the fetch ReadableStream with a TextDecoder gives explicit control over line boundaries and carry-over buffers.
Explicit Header Normalization: Forwarding all headers verbatim leaks HTTP/2 pseudo-headers and compression directives. Whitelisting safe headers prevents protocol mismatches.
Typed Block Filtering: LLM APIs are not flat strings. Translating tool payloads corrupts workspace state. Filtering by type preserves execution integrity.
Localhost Binding: Security-first design. The proxy never leaves the loopback interface, eliminating lateral movement risks.

Pitfall Guide

1. SSE Token Fragmentation

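Explanation: Streaming responses are fragmented at two levels. Network chunks do not align with SSE line boundaries, so a single data: line can arrive split across two reads, and parsing each chunk in isolation yields invalid JSON. In addition, the model emits text in token-sized deltas, so a single identifier like InvoiceService might arrive as Invoice in one event and Service in the next. Naive per-chunk replacement never sees the full identifier, leaving partial obfuscations. Fix: Maintain a carryOver buffer that holds the trailing incomplete line. Concatenate it with the next chunk before splitting on \n. Only process complete lines, and flush the buffer on stream end. Line buffering alone does not rejoin an identifier that spans two delta events; for that, hold back the trailing partial word of each text delta until the next delta or the end of the content block arrives.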
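
The carry-over technique can be isolated into a small unit that is easy to reason about. This sketch uses a hypothetical createLineSplitter helper (the name is not from the original) that accepts decoded text chunks and returns only the lines that are complete so far.

```typescript
// Hypothetical helper illustrating the carry-over buffer.
function createLineSplitter() {
  let carryOver = '';
  return {
    // Feed one decoded chunk; get back the lines that are now complete.
    push(chunk: string): string[] {
      const parts = (carryOver + chunk).split('\n');
      carryOver = parts.pop() ?? ''; // last element is an unfinished line (or '')
      return parts;
    },
    // Call once at stream end to release whatever is still buffered.
    flush(): string[] {
      const rest = carryOver;
      carryOver = '';
      return rest ? [rest] : [];
    },
  };
}

// A data line split mid-JSON across two network chunks
const splitter = createLineSplitter();
const first = splitter.push('event: ping\ndata: {"type":"pi');
const second = splitter.push('ng"}\n\n');
// first  contains only 'event: ping'; the partial data line is held back
// second contains the rejoined 'data: {"type":"ping"}' followed by the blank separator ''
```

The blank string in the second result is the empty line that terminates an SSE event, so it has to be forwarded like any other line.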

2. Silent Compression Failure

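Explanation: Forwarding Accept-Encoding: gzip, br causes the upstream API to return compressed payloads. An interceptor that reads the raw bytes as text parses binary garbage, finds zero matches, and forwards the compressed bytes unchanged. The client decompresses successfully, but transformations are silently skipped. Fix: Strip Accept-Encoding from forwarded requests, or send Accept-Encoding: identity, so the upstream returns uncompressed JSON and SSE that the interceptor can reliably parse and transform. If your HTTP client decompresses transparently, as Node's built-in fetch does, also drop Content-Encoding and Content-Length from the response you relay, because they may no longer describe the body.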

3. Tool Block Contamination

Explanation: tool_result blocks contain file contents read from the workspace. If the workspace is already obfuscated, running these blocks through the translator attempts to double-encode or corrupt internal references. Comments or strings containing real identifiers get mangled, breaking round-trip consistency. Fix: Inspect the type field of each content block. Apply transformations exclusively to text blocks. Leave tool_use, tool_result, image, and document blocks untouched.
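
To make the distinction concrete, here is a small self-contained sketch with an invented mapping and an invented message. translateTextBlocksOnly is a hypothetical stand-in for applyTransformations, reduced to the type check.

```typescript
// Illustrative only: the mapping, message, and function name are invented for this example.
type ContentBlock = { type: string; text?: string; [key: string]: unknown };

const sampleMapping: Record<string, string> = { InvoiceService: 'Zq81Alpha' };

function translateTextBlocksOnly(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block) => {
    // Anything that is not a text block is returned as the same object, untouched
    if (block.type !== 'text' || typeof block.text !== 'string') return block;
    const text = block.text.replace(
      /\b[A-Z][a-zA-Z0-9_]{3,}\b/g,
      (match) => sampleMapping[match] ?? match,
    );
    return { ...block, text };
  });
}

const userTurn: ContentBlock[] = [
  { type: 'text', text: 'Why does InvoiceService time out?' },
  // File content read from an already-obfuscated workspace
  { type: 'tool_result', tool_use_id: 'toolu_01', content: 'class Zq81Alpha {} // was InvoiceService' },
];

const translated = translateTextBlocksOnly(userTurn);
// translated[0].text is 'Why does Zq81Alpha time out?'
// translated[1] is userTurn[1] itself; the comment inside the file is not rewritten
```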

4. HTTP/2 Pseudo-Header Leakage

Explanation: Modern APIs use HTTP/2, which introduces pseudo-headers like :status, :method, and :path. These are valid at the protocol layer but illegal in HTTP/1.1 responses. Forwarding them causes strict clients (like Claude Code) to reject the response with protocol errors. Fix: Filter headers during forwarding. Drop any header starting with :. Normalize casing to lowercase to prevent duplicate header issues.
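
Step 4 passes the upstream headers through normalizeHeaders, which the original does not show. One possible shape, offered as a sketch: it accepts any iterable of name/value pairs (a fetch Headers object is one), lowercases names, drops pseudo-headers, and drops a few headers that stop being accurate once the proxy rewrites the body. The exact drop list is an assumption.

```typescript
// Hypothetical implementation; adjust the drop list to your upstream.
const DROPPED_RESPONSE_HEADERS = new Set([
  'connection',
  'keep-alive',
  'transfer-encoding',
  'content-length',   // body size changes after rewriting
  'content-encoding', // body is relayed as plain text
]);

function normalizeHeaders(headers: Iterable<[string, string]>): Record<string, string> {
  const merged = new Map<string, string>();
  for (const [name, value] of headers) {
    const key = name.toLowerCase();
    // Pseudo-headers such as :status are not valid HTTP/1.1 header names
    if (key.startsWith(':') || DROPPED_RESPONSE_HEADERS.has(key)) continue;
    // Same header seen twice (for example differing only in case): merge, do not duplicate
    const previous = merged.get(key);
    merged.set(key, previous === undefined ? value : `${previous}, ${value}`);
  }
  return Object.fromEntries(merged);
}
```

Comma-merging is fine for most headers but not for set-cookie; if your upstream sets cookies, that header needs array handling instead.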

5. Background Process Orchestration

Explanation: Running the proxy in the foreground blocks the terminal. Developers expect to open a new shell and immediately run claude or cursor. Foreground processes also die when the terminal session closes, breaking long-running workflows. Fix: Implement a --detach flag that spawns a child process, redirects stdout/stderr to a log file, writes the PID to a known location, and exits. Provide a --stop command that reads the PID and terminates the process gracefully.
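
A detach/stop pair might be sketched as follows. Everything named here is an assumption: the PID and log file locations, the function names, and the idea that the proxy is started by a command such as npx tsx server.ts. It also assumes a POSIX-like system where SIGTERM is delivered to the process.

```typescript
import { spawn } from 'node:child_process';
import { existsSync, openSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical locations; the original does not say where the PID or log lives.
const PID_FILE = join(tmpdir(), 'llm-interceptor.pid');
const LOG_FILE = join(tmpdir(), 'llm-interceptor.log');

// Launch the proxy command in the background and record its PID.
function startDetached(command: string, args: string[], pidFile = PID_FILE, logFile = LOG_FILE): number {
  const log = openSync(logFile, 'a'); // append stdout/stderr to the log file
  const child = spawn(command, args, {
    detached: true,               // keep running after the launching shell exits
    stdio: ['ignore', log, log],
  });
  child.unref();                  // let the launcher exit without waiting
  if (child.pid === undefined) throw new Error('Failed to spawn proxy process');
  writeFileSync(pidFile, String(child.pid));
  return child.pid;
}

// Read the PID file and ask the process to exit. Returns false if nothing was stopped.
function stopDetached(pidFile = PID_FILE): boolean {
  if (!existsSync(pidFile)) return false;
  const pid = Number.parseInt(readFileSync(pidFile, 'utf-8'), 10);
  rmSync(pidFile, { force: true });
  if (Number.isNaN(pid)) return false;
  try {
    process.kill(pid, 'SIGTERM'); // the proxy should close its server on SIGTERM
    return true;
  } catch {
    return false;                 // stale PID file: the process is already gone
  }
}
```

A --detach flag would then call something like startDetached('npx', ['tsx', 'server.ts']) and exit, while --stop would call stopDetached().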

6. Unbounded Buffer Growth

Explanation: Accumulating request bodies or response chunks without size limits causes memory exhaustion during large file reads or extended conversations. Fix: Enforce a maximum payload size (e.g., 10MB). Reject requests exceeding the limit with 413 Payload Too Large. Use streaming parsers for JSON when possible, or chunk the body with a bounded accumulator.
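
A bounded accumulator can be as small as the sketch below. The class and error names are hypothetical, and the limit is whatever maxPayloadBytes is configured to.

```typescript
// Hypothetical helper; names are not from the original.
class PayloadTooLargeError extends Error {}

class BoundedAccumulator {
  private chunks: Uint8Array[] = [];
  private size = 0;
  private maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Add one chunk, refusing as soon as the running total would exceed the limit.
  push(chunk: Uint8Array): void {
    if (this.size + chunk.byteLength > this.maxBytes) {
      throw new PayloadTooLargeError(`Body exceeds ${this.maxBytes} bytes`);
    }
    this.chunks.push(chunk);
    this.size += chunk.byteLength;
  }

  // Join the chunks and decode them as UTF-8 once the body is complete.
  toText(): string {
    const joined = new Uint8Array(this.size);
    let offset = 0;
    for (const chunk of this.chunks) {
      joined.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return new TextDecoder().decode(joined);
  }
}
```

In handleIntercept, the for await loop would call push on each chunk inside a try block and answer with 413 when PayloadTooLargeError is thrown, so an oversized body is rejected before it is fully read into memory.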

7. Duplicate Headers from Case Mismatch

Explanation: HTTP header names are case-insensitive, but keys in a plain JavaScript object are not. Node.js lowercases the names it parses from incoming requests, so merging them with hand-written entries such as Content-Type leaves both Content-Type and content-type in the same object. Forwarding both creates duplicate headers, causing upstream API validation failures. Fix: Normalize all headers to lowercase before forwarding. Use a Map or object to deduplicate, keeping the last value or merging arrays as appropriate.

Production Bundle

Action Checklist

Bind proxy to 127.0.0.1 only; never expose to 0.0.0.0
Strip Accept-Encoding from all forwarded requests
Filter HTTP/2 pseudo-headers (:status, :method, :path) before response forwarding
Implement carry-over buffering for SSE line processing
Restrict transformations to type: "text" blocks; ignore tool payloads
Enforce payload size limits to prevent memory exhaustion
Normalize all headers to lowercase and deduplicate before forwarding
Implement daemon mode with PID tracking and log rotation

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Single IDE, frequent updates	Network Proxy	Survives editor upgrades; zero plugin maintenance	Low (dev time)
Multi-IDE team, unified policy	Network Proxy	Single enforcement point across Cursor, VS Code, JetBrains	Low (infra)
Deep UI integration required (e.g., inline diffs)	IDE Plugin	Proxy cannot manipulate editor DOM or selection state	High (maintenance)
Enterprise compliance/auditing	API Gateway + Proxy	Proxy handles transformation; gateway handles auth/logging	Medium (infra)
Offline/air-gapped environment	Local Proxy	No external dependencies; full control over traffic flow	Low (dev time)

Configuration Template

// proxy.config.ts
import dotenv from 'dotenv';
dotenv.config();

export interface ProxyConfig {
  proxyPort: number;
  upstreamUrl: string;
  mappingTable: Record<string, string>;
  maxPayloadBytes: number;
  logLevel: 'info' | 'debug' | 'error';
}

export function parseEnv(): ProxyConfig {
  const upstream = process.env.UPSTREAM_API_URL || 'https://api.anthropic.com';
  const port = parseInt(process.env.PROXY_PORT || '8077', 10);
  
  // Load the mapping from the IDENTIFIER_MAP environment variable (a JSON object string)
  const mappingRaw = process.env.IDENTIFIER_MAP || '{}';
  const mappingTable: Record<string, string> = JSON.parse(mappingRaw);

  return {
    proxyPort: port,
    upstreamUrl: upstream,
    mappingTable,
    maxPayloadBytes: 10 * 1024 * 1024, // 10MB
    logLevel: (process.env.LOG_LEVEL as any) || 'info',
  };
}

Quick Start Guide

Initialize the project: npm init -y && npm install typescript @types/node tsx dotenv
Create the server file: Save the core solution code as server.ts
Configure environment: Create .env with UPSTREAM_API_URL=https://api.anthropic.com and PROXY_PORT=8077
Launch the proxy: npx tsx server.ts
Point your AI client: Run export ANTHROPIC_BASE_URL=http://127.0.0.1:8077 (or OPENAI_BASE_URL for compatible providers), then launch your IDE or CLI. Traffic flows through the proxy transparently.
