Building a transparent terminal-based proxy for Claude Code in Cursor (or any IDE)
Intercepting LLM Traffic at the Network Layer: A Plugin-Free Proxy Architecture
Current Situation Analysis
The modern AI-assisted development workflow is fragmented by design. Every major client (Cursor, VS Code, JetBrains IDEs, Zed, and terminal-based CLIs like Claude Code) implements its own extension model, update cadence, and internal API surface. When you need to intercept, transform, or audit every prompt and response flowing between a developer and an LLM, the conventional approach is to write an IDE plugin.
This path quickly becomes unsustainable. Plugin APIs drift with every editor release. Packaging, signing, and distribution add operational overhead. More critically, plugins operate inside the UI thread or extension host, forcing you to reverse-engineer undocumented hooks just to access raw HTTP payloads. You end up rebuilding the same wire-protocol logic N times, once per editor, while chasing breaking changes.
The problem is overlooked because developers assume the integration boundary must be the editor itself. In practice, the integration boundary is the network. Many AI coding clients let you override the API base URL: Claude Code and Anthropic's SDKs read ANTHROPIC_BASE_URL, OpenAI's SDKs read OPENAI_BASE_URL, and some Google and Mistral clients offer similar overrides. By pointing these variables at a localhost endpoint, you collapse the integration problem from "maintain N plugins" to "run a single reverse proxy." The client never needs to know the traffic was intercepted. The contract is plain HTTP, so the approach works the same way in terminals, GUIs, and headless CI runners, for any client that honors the override.
WOW Moment: Key Findings
Shifting from UI-layer plugins to network-layer proxies fundamentally changes the maintenance curve and compatibility matrix. The following comparison illustrates the operational trade-offs:
| Integration Strategy | Maintenance Overhead | IDE Compatibility | Latency Impact | Implementation Complexity |
|---|---|---|---|---|
| IDE Plugin/Extension | High (per-IDE API drift, release cycles) | Limited to supported editors | Minimal (<1ms) | High (UI hooks, packaging, sandbox restrictions) |
| Network Proxy (Env Var) | Low (single codebase, framework-agnostic) | Broad (any HTTP/SSE client that honors a base URL override) | <5ms overhead | Medium (streaming, protocol normalization, header management) |
This finding matters because it decouples business logic (obfuscation, prompt auditing, guardrails, cost tracking) from UI churn. You write the transformation logic once. Every client that respects the base URL override works immediately. The proxy becomes a transparent middleware layer that survives editor upgrades, OS changes, and workflow migrations.
Core Solution
The architecture replaces IDE-specific hooks with a lightweight localhost HTTP server that intercepts, transforms, and forwards traffic. The implementation uses Node.js (18 or later, for the built-in fetch) with TypeScript, relaying upstream responses incrementally through the Web Streams reader API and normalizing headers explicitly to prevent protocol leakage.
Step 1: Server Initialization & Route Binding
Bind to 127.0.0.1 exclusively. Exposing to 0.0.0.0 creates a local network attack surface. Use a dedicated port (e.g., 8077) and validate that it's available before binding.
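```typescript
import http from 'node:http';
import { parseEnv } from './proxy.config';
const config = parseEnv();
const server = http.createServer();

server.on('request', async (req, res) => {
  // Only the Messages endpoint is intercepted; everything else gets a 404
  if (req.method !== 'POST' || !req.url?.includes('/v1/messages')) {
    res.writeHead(404);
    return res.end();
  }
  try {
    await handleIntercept(req, res);
  } catch (err) {
    // Network or upstream failures become a 502 instead of an unhandled rejection
    console.error('Intercept failed:', err);
    if (!res.headersSent) res.writeHead(502);
    res.end();
  }
});

// Report a taken port clearly instead of crashing with a raw stack trace
server.on('error', (err: NodeJS.ErrnoException) => {
  console.error(err.code === 'EADDRINUSE' ? `Port ${config.proxyPort} is already in use` : err);
  process.exit(1);
});

// Loopback only: never bind to 0.0.0.0
server.listen(config.proxyPort, '127.0.0.1', () => {
  console.log(`Interceptor listening on http://127.0.0.1:${config.proxyPort}`);
});
```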
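Step 1 asks for a port-availability check before binding. One way to do that, sketched below, is to listen on the port briefly with the net module and release it again; isPortFree is a hypothetical helper, not part of the original code. The check is inherently racy (another process can take the port between the probe and the real listen), so it complements rather than replaces an 'error' handler on the server. It is most useful before spawning a detached daemon, where the parent process can still report the conflict to the user.

```typescript
import * as net from 'node:net';

// Hypothetical pre-flight check: try to bind host:port, then release it immediately.
function isPortFree(port: number, host = '127.0.0.1'): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err: NodeJS.ErrnoException) => {
      if (err.code === 'EADDRINUSE') resolve(false); // someone else is listening
      else reject(err); // e.g. EACCES for privileged ports
    });
    // Bound successfully: close the probe and report the port as free
    probe.once('listening', () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}
```

A launcher might run `if (!(await isPortFree(config.proxyPort)))` and exit with a readable message before doing anything else.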
Step 2: Request Interception & Payload Transformation
Read the incoming request body, parse the JSON, apply domain-specific transformations, and forward to the upstream API. The transformation layer must distinguish between user-facing text and internal tool payloads.
```typescript
async function handleIntercept(req: http.IncomingMessage, res: http.ServerResponse) {
  // Buffer the full request body (unbounded here; see Pitfall 6 for a size cap)
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk);
  const rawBody = Buffer.concat(chunks).toString('utf-8');

  let payload: any;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    res.writeHead(400);
    return res.end('Invalid JSON');
  }

  // Apply transformation rules to user-facing text only (Step 3)
  const transformed = applyTransformations(payload);

  // Forward to upstream; req.url carries the path plus any query string
  const upstreamRes = await fetch(`${config.upstreamUrl}${req.url}`, {
    method: 'POST',
    headers: buildForwardHeaders(req),
    body: JSON.stringify(transformed),
  });
  await streamUpstreamResponse(upstreamRes, res);
}
```
Step 3: Selective Block Translation
LLM APIs structure content as typed arrays. User prompts and AI replies require translation. Tool results and tool calls originate from the workspace or execution environment and must remain untouched to prevent context corruption.
```typescript
function applyTransformations(payload: any): any {
  if (!Array.isArray(payload?.messages)) return payload;
  payload.messages = payload.messages.map((msg: any) => {
    // The Messages API also accepts a plain string as content; translate it directly
    if (typeof msg.content === 'string') {
      return { ...msg, content: translateIdentifiers(msg.content) };
    }
    if (!Array.isArray(msg.content)) return msg;
    msg.content = msg.content.map((block: any) => {
      // Only translate explicit text blocks
      if (block.type === 'text' && typeof block.text === 'string') {
        block.text = translateIdentifiers(block.text);
      }
      // tool_use, tool_result, image, etc. remain untouched
      return block;
    });
    return msg;
  });
  return payload;
}

function translateIdentifiers(text: string): string {
  // Placeholder rule: capitalized words of 4+ characters are looked up in the
  // mapping table. Replace with your own logic (regex, trie, or lookup table).
  return text.replace(/\b[A-Z][a-zA-Z0-9_]{3,}\b/g, (match) =>
    Object.hasOwn(config.mappingTable, match) ? config.mappingTable[match] : match,
  );
}
```
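The placeholder regex in translateIdentifiers only catches capitalized words. If the mapping table is the source of truth, a tighter alternative is to compile the table's own keys into a single alternation, longest key first, so only known identifiers are rewritten. This is a sketch under the assumption that keys are identifier-like (letters, digits, underscores), because the \b anchors rely on word characters at both ends; buildMatcher is a hypothetical name.

```typescript
// Hypothetical helper: compile the mapping table into one matcher function.
function buildMatcher(table: Record<string, string>): (text: string) => string {
  // Longest keys first so a longer identifier wins over its own prefix
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return (text) => text; // an empty alternation would match ""
  // Escape regex metacharacters so keys are matched literally
  const escaped = keys.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp(`\\b(?:${escaped.join('|')})\\b`, 'g');
  // Every match is a key of the table, so the lookup always succeeds
  return (text) => text.replace(pattern, (match) => table[match]);
}
```

Building the matcher once from config.mappingTable and reusing it per text block avoids recompiling the pattern for every message.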
Step 4: Streaming Response Forwarding
LLM clients expect Server-Sent Events (SSE). Buffering the entire response destroys the interactive experience. Forward chunks as they arrive, but handle token fragmentation carefully.
```typescript
async function streamUpstreamResponse(upstreamRes: Response, clientRes: http.ServerResponse) {
  clientRes.writeHead(upstreamRes.status, normalizeHeaders(upstreamRes.headers));
  if (!upstreamRes.body) return clientRes.end();

  const decoder = new TextDecoder();
  let carryOver = '';
  const reader = upstreamRes.body.getReader();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte UTF-8 sequences split across chunks intact
    const combined = carryOver + decoder.decode(value, { stream: true });
    // Process complete SSE lines only
    const lines = combined.split('\n');
    carryOver = lines.pop() ?? ''; // Keep the incomplete trailing line for the next chunk
    for (const line of lines) {
      clientRes.write(processSSELine(line) + '\n');
    }
  }

  // Flush the decoder and any remaining partial line once the stream ends
  carryOver += decoder.decode();
  if (carryOver) clientRes.write(processSSELine(carryOver));
  clientRes.end();
}
```
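Step 4 calls processSSELine without defining it. A minimal sketch, assuming the Anthropic Messages streaming format (text arrives on data: lines carrying content_block_delta events with a text_delta payload) and a hypothetical reverse table that maps replacement names back to real identifiers, could look like this. Non-data lines (event:, blank separators) and non-text events such as tool-input deltas pass through untouched, mirroring the request-side rule that tool payloads are never rewritten.

```typescript
// Hypothetical reverse table: replacement name -> real identifier.
type ReverseMap = Record<string, string>;

// Restores every known replacement name found in a piece of text.
function restoreIdentifiers(text: string, reverse: ReverseMap): string {
  return text.replace(/\b[A-Za-z_][A-Za-z0-9_]*\b/g, (word) =>
    Object.hasOwn(reverse, word) ? reverse[word] : word,
  );
}

// Rewrites one SSE line; anything that is not a text delta is returned unchanged.
function rewriteSSELine(line: string, reverse: ReverseMap): string {
  if (!line.startsWith('data:')) return line; // event:, id:, comments, blank separators
  let event: any;
  try {
    event = JSON.parse(line.slice('data:'.length));
  } catch {
    return line; // not JSON: forward as-is
  }
  if (
    event?.type === 'content_block_delta' &&
    event.delta?.type === 'text_delta' &&
    typeof event.delta.text === 'string'
  ) {
    event.delta.text = restoreIdentifiers(event.delta.text, reverse);
    return `data: ${JSON.stringify(event)}`;
  }
  return line;
}
```

To match the one-argument call in Step 4, bind the table once, e.g. `const processSSELine = (line: string) => rewriteSSELine(line, reverseTable)`, where reverseTable would be the inverse of config.mappingTable. This version still rewrites each event in isolation, so identifiers split across two deltas slip through (see Pitfall 1).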
Architecture Rationale
- Native Streams Over Third-Party Proxies: Libraries like http-proxy pipe response bodies through untouched by default, so transforming SSE means taking over chunk handling yourself anyway. A native ReadableStream reader plus TextDecoder gives explicit control over line boundaries and carry-over buffers.
- Explicit Header Normalization: Forwarding all headers verbatim can leak HTTP/2 pseudo-headers and compression directives. Filtering headers explicitly in both directions prevents protocol mismatches.
- Typed Block Filtering: LLM API messages are arrays of typed blocks, not flat strings. Translating tool payloads corrupts workspace state; filtering on the type field preserves execution integrity.
- Localhost Binding: The proxy listens only on the loopback interface, so other machines on the network cannot reach it (other processes on the same machine still can).
Pitfall Guide
1. SSE Token Fragmentation
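Explanation: Responses are fragmented at two levels. The network delivers the SSE stream in arbitrary chunks, so a single data: line can be cut in half. Separately, LLM servers stream text token by token as many small text_delta events, so an identifier like InvoiceService might arrive as Invoice in one event and Service in the next. Replacing text per raw chunk, or even per complete event, misses identifiers that straddle a boundary and leaves partial obfuscations.
Fix: Maintain a carryOver buffer that holds the trailing incomplete line, concatenate it with the next chunk before splitting on \n, process only complete lines, and flush the buffer on stream end. That repairs split lines. For identifiers split across events, also hold back the trailing word fragment of each text delta until the next delta (or the end of the content block) completes it.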
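The carryOver buffer in Step 4 only repairs network-level splits. For identifiers split across text_delta events, one option, sketched here as a hypothetical DeltaRewriter kept per content block, is to hold back any trailing run of word characters until the next delta shows where the word ends. It assumes identifiers consist of [A-Za-z0-9_] and that you emit whatever flush() returns as one extra text_delta before forwarding the block's content_block_stop event.

```typescript
// Hypothetical helper: one instance per content block index.
class DeltaRewriter {
  private pending = '';
  private readonly rewrite: (text: string) => string;

  constructor(rewrite: (text: string) => string) {
    this.rewrite = rewrite;
  }

  // Returns the part of the stream that is safe to rewrite and emit now.
  push(fragment: string): string {
    const combined = this.pending + fragment;
    // A trailing run of word characters may be the start of a longer identifier
    const tail = /[A-Za-z0-9_]+$/.exec(combined);
    const cut = tail ? tail.index : combined.length;
    this.pending = combined.slice(cut);
    return this.rewrite(combined.slice(0, cut));
  }

  // Call at content_block_stop: nothing more can extend the held-back word.
  flush(): string {
    const rest = this.pending;
    this.pending = '';
    return this.rewrite(rest);
  }
}
```

The cost is a small display lag: the last partial word of each delta reaches the client one event later.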
2. Silent Compression Failure
Explanation: Forwarding Accept-Encoding: gzip, br causes the upstream API to return compressed payloads. Text-based interceptors parse binary garbage, find zero matches, and forward the compressed bytes unchanged. The client decompresses successfully, but transformations are silently skipped.
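Fix: Ask upstream for an uncompressed body. Stripping Accept-Encoding works with a raw HTTP client, but Node's built-in fetch adds its own Accept-Encoding when none is set and decompresses transparently while leaving content-encoding in the response headers. Send accept-encoding: identity explicitly, and drop content-encoding and content-length from the headers you relay to the client.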
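Step 2 calls buildForwardHeaders without defining it. The sketch below is one possible version, assuming Node's built-in fetch as the upstream client: it copies the client's headers (including auth headers such as x-api-key), drops the ones tied to the client-to-proxy hop or to the original body length, and requests identity encoding. The drop list is an assumption to adjust for your client, not an exhaustive hop-by-hop list.

```typescript
import type { IncomingHttpHeaders } from 'node:http';

// Headers tied to the client->proxy hop or to the original body; never copied upstream.
const DROP_ON_FORWARD = new Set([
  'host', 'connection', 'keep-alive', 'proxy-connection', 'transfer-encoding',
  'te', 'trailer', 'upgrade', 'content-length', 'accept-encoding',
]);

function buildForwardHeaders(req: { headers: IncomingHttpHeaders }): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    // Node already lowercases names in req.headers
    if (value === undefined || DROP_ON_FORWARD.has(name)) continue;
    out[name] = Array.isArray(value) ? value.join(', ') : value;
  }
  // The body is re-serialized, so fetch computes Content-Length itself.
  // Request identity encoding explicitly: Node's fetch otherwise adds its own
  // Accept-Encoding and decompresses transparently.
  out['accept-encoding'] = 'identity';
  return out;
}
```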
3. Tool Block Contamination
Explanation: tool_result blocks contain file contents read from the workspace. If the workspace is already obfuscated, running these blocks through the translator attempts to double-encode or corrupt internal references. Comments or strings containing real identifiers get mangled, breaking round-trip consistency.
Fix: Inspect the type field of each content block. Apply transformations exclusively to text blocks. Leave tool_use, tool_result, image, and document blocks untouched.
4. HTTP/2 Pseudo-Header Leakage
Explanation: Modern APIs use HTTP/2, which introduces pseudo-headers like :status, :method, and :path. These are valid at the protocol layer but illegal in HTTP/1.1 responses. Forwarding them causes strict clients (like Claude Code) to reject the response with protocol errors.
Fix: Filter headers during forwarding. Drop any header starting with :. Normalize casing to lowercase to prevent duplicate header issues.
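normalizeHeaders, called in Step 4, is likewise left undefined. This sketch accepts either a fetch Headers object or a plain header record such as the one Node's http2 client passes to its 'response' handler (which does include :status). It drops pseudo-headers plus the headers invalidated by re-encoding the body, and lowercases the rest. The drop list is an assumption, and set-cookie, where comma-joining is lossy, is not special-cased here.

```typescript
// Headers invalidated because the proxy re-encodes the body or owns the client connection.
const DROP_ON_RELAY = new Set([
  'content-encoding', 'content-length', 'transfer-encoding', 'connection', 'keep-alive',
]);

type HeaderSource = Headers | Record<string, string | string[] | number | undefined>;

function normalizeHeaders(source: HeaderSource): Record<string, string> {
  const entries: Array<[string, unknown]> =
    source instanceof Headers ? [...source.entries()] : Object.entries(source);
  const out: Record<string, string> = {};
  for (const [name, value] of entries) {
    const key = name.toLowerCase(); // one spelling per header, no duplicates
    // HTTP/2 pseudo-headers (:status, :path, ...) are illegal in HTTP/1.1 responses
    if (key.startsWith(':') || DROP_ON_RELAY.has(key) || value === undefined) continue;
    out[key] = Array.isArray(value) ? value.join(', ') : String(value);
  }
  return out;
}
```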
5. Background Process Orchestration
Explanation: Running the proxy in the foreground blocks the terminal. Developers expect to open a new shell and immediately run claude or cursor. Foreground processes also die when the terminal session closes, breaking long-running workflows.
Fix: Implement a --detach flag that spawns a child process, redirects stdout/stderr to a log file, writes the PID to a known location, and exits. Provide a --stop command that reads the PID and terminates the process gracefully.
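The --detach and --stop behavior can be sketched with child_process.spawn. The helper names, file paths, and argument layout below are hypothetical. With detached: true the child leads its own process group and session on POSIX systems, so closing the terminal does not take it down; stdout and stderr go to an append-mode log file, and unref() lets the parent exit immediately. SIGTERM shuts the proxy down gracefully only if it installs a handler (for example, one that calls server.close()); otherwise the default action simply ends the process.

```typescript
import { spawn } from 'node:child_process';
import { closeSync, existsSync, openSync, readFileSync, rmSync, writeFileSync } from 'node:fs';

// Hypothetical helper: start `node <args>` in the background and record its PID.
function detach(args: string[], logFile: string, pidFile: string): number {
  const logFd = openSync(logFile, 'a'); // append so restarts keep earlier logs
  const child = spawn(process.execPath, args, {
    detached: true, // new process group/session on POSIX: survives the terminal closing
    stdio: ['ignore', logFd, logFd], // stdout and stderr go to the log file
  });
  closeSync(logFd); // the child holds its own copy of the descriptor
  if (child.pid === undefined) throw new Error('failed to start proxy process');
  writeFileSync(pidFile, String(child.pid));
  child.unref(); // let the parent exit without waiting for the child
  return child.pid;
}

// Hypothetical helper: signal the recorded PID and remove the PID file.
function stopDaemon(pidFile: string): boolean {
  if (!existsSync(pidFile)) return false;
  const pid = Number(readFileSync(pidFile, 'utf8').trim());
  rmSync(pidFile);
  try {
    process.kill(pid, 'SIGTERM');
    return true;
  } catch {
    return false; // process was already gone
  }
}
```

Because the child is started with the bare Node binary, args should point at compiled JavaScript (or include whatever loader flags your TypeScript runner needs).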
6. Unbounded Buffer Growth
Explanation: Accumulating request bodies or response chunks without size limits causes memory exhaustion during large file reads or extended conversations.
Fix: Enforce a maximum payload size (e.g., 10MB). Reject requests exceeding the limit with 413 Payload Too Large. Use streaming parsers for JSON when possible, or chunk the body with a bounded accumulator.
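A bounded accumulator might look like the following; PayloadTooLargeError and readBodyWithLimit are hypothetical names. It rejects early when the declared Content-Length already exceeds the cap, which lets the handler answer 413 cleanly, and otherwise enforces the same cap while reading, for chunked bodies that carry no length header.

```typescript
// Hypothetical error type so the handler can map it to 413 Payload Too Large.
class PayloadTooLargeError extends Error {}

async function readBodyWithLimit(
  body: AsyncIterable<Buffer>,
  maxBytes: number,
  declaredLength?: string, // e.g. req.headers['content-length']
): Promise<Buffer> {
  // Cheap early exit: the client already announced an oversized body
  if (declaredLength !== undefined && Number(declaredLength) > maxBytes) {
    throw new PayloadTooLargeError(`declared ${declaredLength} bytes exceeds ${maxBytes}`);
  }
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of body) {
    total += chunk.length;
    // Enforce the cap while reading, for chunked bodies without a Content-Length
    if (total > maxBytes) throw new PayloadTooLargeError(`body exceeds ${maxBytes} bytes`);
    chunks.push(chunk);
  }
  return Buffer.concat(chunks, total);
}
```

In handleIntercept this would be called as readBodyWithLimit(req, config.maxPayloadBytes, req.headers['content-length']), mapping PayloadTooLargeError to res.writeHead(413). When the cap trips mid-stream, leaving the for await loop destroys the request stream, so the client may see a dropped connection rather than the 413.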
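7. Duplicate Headers from Case-Mismatched Keys
Explanation: HTTP header names are case-insensitive. Node.js lowercases incoming names in req.headers, but a plain object you build by hand is case-sensitive: spreading req.headers and then setting Content-Type leaves both content-type and Content-Type as separate keys. Forwarding that object sends the header twice (fetch combines the two into one comma-joined value), which can fail upstream API validation.
Fix: Normalize all header names to lowercase before forwarding. Use a Map or object keyed by the lowercase name to deduplicate, keeping the last value or merging arrays as appropriate.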
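The failure mode and the fix can be seen side by side. The sketch below, with a hypothetical mergeHeaders helper, shows a naive object spread producing two spellings of the same header, which the Fetch API's Headers class combines into a comma-joined value, and a merge that lowercases names so the last source wins.

```typescript
// Merges header sources case-insensitively: names are lowercased and later sources win.
function mergeHeaders(...sources: Array<Record<string, string>>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const source of sources) {
    for (const [name, value] of Object.entries(source)) {
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}

// What Node hands you: req.headers uses lowercase names
const incoming = { 'content-type': 'application/json', 'x-api-key': 'sk-test' };

// Naive: the spread keeps both spellings as separate object keys...
const naive = { ...incoming, 'Content-Type': 'application/json' };
// ...and fetch's Headers class appends both, joining them with a comma
console.log(new Headers(naive).get('content-type')); // application/json, application/json

// Fixed: one lowercase key per header
const fixed = mergeHeaders(incoming, { 'Content-Type': 'application/json' });
console.log(new Headers(fixed).get('content-type')); // application/json
```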
Production Bundle
Action Checklist
- Bind the proxy to 127.0.0.1 only; never expose it on 0.0.0.0
- Request uncompressed upstream responses (accept-encoding: identity) and drop content-encoding from relayed response headers
- Filter HTTP/2 pseudo-headers (:status, :method, :path) before forwarding responses
- Implement carry-over buffering for SSE line processing
- Restrict transformations to type: "text" blocks; ignore tool payloads
- Enforce payload size limits to prevent memory exhaustion
- Normalize all headers to lowercase and deduplicate before forwarding
- Implement daemon mode with PID tracking and log rotation
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single IDE, frequent updates | Network Proxy | Survives editor upgrades; zero plugin maintenance | Low (dev time) |
| Multi-IDE team, unified policy | Network Proxy | Single enforcement point across Cursor, VS Code, JetBrains | Low (infra) |
| Deep UI integration required (e.g., inline diffs) | IDE Plugin | Proxy cannot manipulate editor DOM or selection state | High (maintenance) |
| Enterprise compliance/auditing | API Gateway + Proxy | Proxy handles transformation; gateway handles auth/logging | Medium (infra) |
| Offline/air-gapped environment | Local Proxy | No external dependencies; full control over traffic flow | Low (dev time) |
Configuration Template
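```typescript
// proxy.config.ts
import dotenv from 'dotenv';

dotenv.config();

export interface ProxyConfig {
  proxyPort: number;
  upstreamUrl: string;
  mappingTable: Record<string, string>;
  maxPayloadBytes: number;
  logLevel: 'info' | 'debug' | 'error';
}

const LOG_LEVELS = ['info', 'debug', 'error'] as const;

export function parseEnv(): ProxyConfig {
  // Strip trailing slashes so `${upstreamUrl}${req.url}` never produces "//v1/..."
  const upstream = (process.env.UPSTREAM_API_URL || 'https://api.anthropic.com').replace(/\/+$/, '');
  const port = Number.parseInt(process.env.PROXY_PORT || '8077', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PROXY_PORT: ${process.env.PROXY_PORT}`);
  }

  // Identifier mapping (real name -> replacement) as a JSON object in IDENTIFIER_MAP
  let mappingTable: Record<string, string>;
  try {
    mappingTable = JSON.parse(process.env.IDENTIFIER_MAP || '{}');
  } catch {
    throw new Error('IDENTIFIER_MAP is not valid JSON');
  }
  if (typeof mappingTable !== 'object' || mappingTable === null || Array.isArray(mappingTable)) {
    throw new Error('IDENTIFIER_MAP must be a JSON object');
  }

  // Fall back to 'info' for missing or unknown levels
  const logLevel = LOG_LEVELS.find((level) => level === process.env.LOG_LEVEL) ?? 'info';

  return {
    proxyPort: port,
    upstreamUrl: upstream,
    mappingTable,
    maxPayloadBytes: 10 * 1024 * 1024, // 10MB
    logLevel,
  };
}
```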
Quick Start Guide
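- Initialize the project: npm init -y && npm install typescript @types/node tsx dotenv (Node.js 18 or later is needed for the built-in fetch)
- Create the source files: save the configuration template as proxy.config.ts and the Core Solution code as server.ts, then add definitions for the helpers it calls but does not define (buildForwardHeaders, normalizeHeaders, processSSELine)
- Configure the environment: create .env with UPSTREAM_API_URL=https://api.anthropic.com and PROXY_PORT=8077, plus IDENTIFIER_MAP (a JSON object of real-to-replacement names) if you want identifiers rewritten
- Launch the proxy: npx tsx server.ts
- Point your AI client at it: run export ANTHROPIC_BASE_URL=http://127.0.0.1:8077 in a shell, then launch your IDE or CLI from that same shell. Traffic then flows through the proxy transparently. The sample routes only Anthropic's /v1/messages endpoint, so OpenAI-compatible clients (OPENAI_BASE_URL) would need an equivalent route for their endpoints.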
