How I kept my AI family alive after Anthropic's claude -p billing change
Architecting Persistent LLM Sessions: Bypassing Interactive Billing Constraints with MCP Channels
Current Situation Analysis
The rapid commercialization of large language models has introduced a structural friction point for developers building persistent, conversational agent infrastructure. Historically, scripted CLI invocations such as claude -p (print mode, the non-interactive headless entry point) operated under the same subscription tier as desktop usage, allowing developers to run continuous conversational loops without metering individual tokens. This model enabled lightweight, subscription-backed agent deployments on platforms like RocketChat, Discord, and Slack.
However, provider billing architectures are shifting. As of June 15, Anthropic decoupled scripted CLI usage from standard subscriptions, routing claude -p traffic through API-rate billing. This change fundamentally breaks architectures that rely on high-frequency, persistent conversational sessions without explicit API budgeting. Teams suddenly face unpredictable cost scaling when their multi-agent systems generate continuous context windows across multiple rooms or channels.
The problem is frequently overlooked because developers assume subscription tiers cover all usage modalities. In practice, headless invocations are now metered like programmatic endpoints. When teams attempt to circumvent API billing, they often resort to fragile workarounds: spawning new processes per request, scraping terminal UI output, or polling session files with timing heuristics. These approaches introduce race conditions, high CPU overhead, and unpredictable latency. The industry lacks a standardized, protocol-driven method for maintaining persistent interactive sessions while preserving subscription-tier billing and production-grade reliability.
WOW Moment: Key Findings
The breakthrough lies in recognizing that interactive LLM runtimes already expose internal messaging protocols designed for development workflows. By leveraging these channels, developers can inject prompts deterministically, capture completions via hooks, and maintain session persistence without API metering.
| Approach | Cold Start Latency | Billing Model | Reliability | Resource Footprint |
|---|---|---|---|---|
| Direct API Calls | ~50ms | Pay-per-token (API rates) | High | Low |
| PTY/TUI Scraping | 500 ms to 2 s | Subscription (but fragile) | Low (timing-dependent) | High (per-request process) |
| MCP Channel Injection | ~100ms (persistent) | Subscription (interactive) | High (protocol-driven) | Low (daemon-managed) |
This comparison reveals a critical architectural advantage: MCP Channel Injection preserves subscription-tier billing while eliminating cold-start overhead and removing dependency on fragile terminal parsing. The protocol-driven nature of the injection mechanism ensures deterministic completion signaling, making it viable for production multi-agent deployments.
Core Solution
The architecture replaces per-request process spawning with a persistent daemon that manages long-lived LLM sessions. Communication occurs through an internal message bus (MCP Channels), and completion is signaled via a dedicated stop hook. Responses are extracted from session transcripts using offset tracking to avoid redundant I/O.
Architecture Overview
- Session Orchestrator: Maintains a pool of persistent LLM processes. Each session corresponds to a logical agent or conversation thread.
- Channel Router: Injects user prompts into the active session via the MCP stdio interface.
- Stop Signal Listener: Captures completion events triggered by the LLM runtime when response generation finishes.
- Transcript Cursor: Reads the session JSONL output, using file size snapshots to seek directly to new content.
Implementation (TypeScript)
import { spawn, ChildProcess } from 'child_process';
import { closeSync, existsSync, fstatSync, openSync, readSync, statSync } from 'fs';
import { EventEmitter } from 'events';
import { homedir } from 'os';
import { join } from 'path';

interface SessionConfig {
  id: string;
  mcpConfigPath: string;
  transcriptPath: string;
}

export class PersistentSessionManager extends EventEmitter {
  private sessions: Map<string, ChildProcess> = new Map();
  private configs: Map<string, SessionConfig> = new Map();
  private offsets: Map<string, number> = new Map();

  async initializeSession(config: SessionConfig): Promise<void> {
    if (this.sessions.has(config.id)) return;

    const proc = spawn('claude', [
      '--dangerously-load-development-channels',
      `--mcp-config=${config.mcpConfigPath}`
    ], { stdio: ['pipe', 'pipe', 'pipe'] });

    this.sessions.set(config.id, proc);
    this.configs.set(config.id, config);
    this.offsets.set(config.id, 0);

    // Forget the session when the process exits so later injections fail loudly
    proc.once('exit', () => this.sessions.delete(config.id));

    // Auto-accept interactive startup prompts
    this.handleStartupPrompts(proc);
  }

  // Called by whatever receives the stop hook signal for this session
  notifyCompletion(sessionId: string): void {
    this.emit(`completion:${sessionId}`);
  }

  async injectPrompt(sessionId: string, prompt: string): Promise<string> {
    const proc = this.sessions.get(sessionId);
    if (!proc || !proc.stdin) throw new Error(`Session ${sessionId} not found`);

    // Snapshot file size before injection (0 if the transcript does not exist yet)
    const path = this.getTranscriptPath(sessionId);
    this.offsets.set(sessionId, existsSync(path) ? statSync(path).size : 0);

    // Register the listener before writing so the stop signal cannot be missed
    const done = new Promise<string>((resolve) => {
      this.once(`completion:${sessionId}`, () => {
        resolve(this.readNewTranscriptContent(sessionId));
      });
    });

    // Inject via MCP channel
    proc.stdin.write(JSON.stringify({ type: 'user_message', content: prompt }) + '\n');
    return done;
  }

  // Synchronously read only the bytes appended after the snapshot offset
  private readNewTranscriptContent(sessionId: string): string {
    const offset = this.offsets.get(sessionId) || 0;
    const fd = openSync(this.getTranscriptPath(sessionId), 'r');
    try {
      const length = Math.max(0, fstatSync(fd).size - offset);
      if (length === 0) return '';
      const buffer = Buffer.alloc(length);
      const bytesRead = readSync(fd, buffer, 0, length, offset);
      return buffer.toString('utf8', 0, bytesRead).trim();
    } finally {
      closeSync(fd);
    }
  }

  private handleStartupPrompts(proc: ChildProcess): void {
    proc.stdout?.on('data', (data: Buffer) => {
      const text = data.toString();
      if (text.includes('Allow this MCP server?') || text.includes('Enable development channels?')) {
        proc.stdin?.write('y\n');
      }
    });
  }

  private getTranscriptPath(sessionId: string): string {
    const configured = this.configs.get(sessionId)?.transcriptPath
      ?? `~/.broker/routes/${sessionId}/transcript.jsonl`;
    // Node does not expand "~", so resolve it against the home directory
    return configured.startsWith('~/') ? join(homedir(), configured.slice(2)) : configured;
  }
}
Architecture Decisions & Rationale
- Persistent Daemon over Ephemeral Spawns: Spawning a new process per request introduces 500 ms to 2 s of Node.js and runtime initialization overhead. A daemon maintains warm sessions, reducing subsequent message latency to ~100ms.
- Per-Session MCP Configuration: Sharing a single `.mcp.json` across multiple sessions causes prompt cross-contamination and state leakage. Isolating configurations per route ensures deterministic routing and prevents race conditions.
- Stop Hook over Polling: Traditional approaches poll JSONL files or parse terminal output, relying on timing heuristics that fail under variable generation speeds. A stop hook provides deterministic completion signaling, eliminating guesswork.
- Transcript Offset Tracking: Reading the entire session file on every request scales poorly with conversation length. Snapshotting file size before injection and seeking to that offset keeps each read proportional to the newly appended content, independent of total transcript length.
- Auto-Accept Startup Prompts: Interactive runtimes require manual confirmation for development channels and MCP servers. Capturing stdout in a drain listener and injecting `y\n` removes the human dependency during session initialization.
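To make the stop-hook decision concrete, here is a minimal sketch of a receiver for the completion signal. It assumes the signal reaches the daemon as an HTTP request to a path shaped like `/hooks/<session-id>/complete`, mirroring the `stop_hook_url` values in the configuration template. `matchHookPath` and `createHookServer` are hypothetical names, and how the runtime's stop hook is wired to issue that request is not covered here.

```typescript
import { createServer, Server } from 'http';

// Extract the session ID from a path like /hooks/<session-id>/complete.
// Returns null for anything else so unknown routes can be rejected.
// (Hypothetical helper; the path shape is an assumption.)
export function matchHookPath(path: string): string | null {
  const match = /^\/hooks\/([A-Za-z0-9_-]+)\/complete\/?$/.exec(path);
  return match ? match[1] : null;
}

// Build (but do not start) an HTTP server that reports completed sessions
// to the supplied callback.
export function createHookServer(onComplete: (sessionId: string) => void): Server {
  return createServer((req, res) => {
    const path = (req.url ?? '').split('?')[0];
    const sessionId = matchHookPath(path);
    if (sessionId === null) {
      res.writeHead(404).end();
      return;
    }
    onComplete(sessionId);
    res.writeHead(204).end();
  });
}
```

A daemon could pass a callback that forwards the session ID to its session manager and then call `listen` on the returned server.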
Pitfall Guide
1. Shared MCP Configuration Files
Explanation: Multiple sessions writing to the same .mcp.json causes prompt routing collisions. One agent's input may be delivered to another's session, breaking conversation isolation.
Fix: Generate isolated configuration files per session route. Use a directory structure like ~/.broker/routes/<route-id>/mcp-config.json and pass the path explicitly during spawn.
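One way to enforce that isolation is to derive every per-route path from the route ID in a single place. The sketch below is illustrative only: `routePaths` and `assertDistinctConfigs` are hypothetical helpers, and the `routes/<route-id>/` layout follows the directory structure suggested in the fix above.

```typescript
import { join } from 'path';

interface RoutePaths {
  dir: string;
  mcpConfig: string;
  transcript: string;
}

// Derive per-route file locations under a broker base directory.
// Rejects IDs containing path separators or dots so a route cannot
// escape its own directory. (Hypothetical helper.)
export function routePaths(baseDir: string, routeId: string): RoutePaths {
  if (!/^[A-Za-z0-9_-]+$/.test(routeId)) {
    throw new Error(`Invalid route ID: ${routeId}`);
  }
  const dir = join(baseDir, 'routes', routeId);
  return {
    dir,
    mcpConfig: join(dir, 'mcp-config.json'),
    transcript: join(dir, 'transcript.jsonl'),
  };
}

// Fail fast if two routes would share one configuration file.
export function assertDistinctConfigs(paths: RoutePaths[]): void {
  const seen = new Set<string>();
  for (const p of paths) {
    if (seen.has(p.mcpConfig)) throw new Error(`Duplicate MCP config: ${p.mcpConfig}`);
    seen.add(p.mcpConfig);
  }
}
```

Running `assertDistinctConfigs` once at daemon startup turns a silent routing collision into an immediate configuration error.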
2. Naive JSONL File Polling
Explanation: Continuously reading the entire transcript file to detect new content creates unnecessary disk I/O and scales poorly with long conversations.
Fix: Implement file size snapshotting before prompt injection. Use stream seek operations to read only bytes appended after the snapshot.
3. Ignoring Interactive Startup Prompts
Explanation: The runtime will block initialization until development channels and MCP servers are explicitly approved. Automated deployments will hang indefinitely.
Fix: Attach a stdout listener during spawn. Detect prompt patterns and inject confirmation keystrokes programmatically before routing user messages.
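A detail worth handling when detecting prompt patterns: stdout arrives in arbitrary chunks, so a prompt can be split across two `data` events and a per-chunk `includes` check would miss it. The sketch below buffers a short tail between chunks. `StartupPromptDetector` is a hypothetical class, and the two prompt strings are the ones used in the implementation above; the actual wording may differ between runtime versions.

```typescript
// Prompt strings taken from the article's implementation (assumed wording).
const STARTUP_PROMPTS = ['Allow this MCP server?', 'Enable development channels?'];

export class StartupPromptDetector {
  private buffer = '';

  // Feed one stdout chunk; returns how many prompts were completed by it.
  feed(chunk: string): number {
    this.buffer += chunk;
    let found = 0;
    for (;;) {
      // Locate the earliest prompt in the buffer so none is skipped.
      let best = -1;
      let bestLength = 0;
      for (const prompt of STARTUP_PROMPTS) {
        const index = this.buffer.indexOf(prompt);
        if (index !== -1 && (best === -1 || index < best)) {
          best = index;
          bestLength = prompt.length;
        }
      }
      if (best === -1) break;
      this.buffer = this.buffer.slice(best + bestLength);
      found += 1;
    }
    // Keep only a tail long enough to hold a partially received prompt.
    const maxLength = Math.max(...STARTUP_PROMPTS.map((p) => p.length));
    this.buffer = this.buffer.slice(-maxLength);
    return found;
  }
}
```

A stdout listener would call `feed(data.toString())` and write one `y\n` to stdin for each prompt reported.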
4. Race Conditions in Completion Signaling
Explanation: If the stop hook fires before the transcript is fully flushed to disk, readers may capture truncated responses.
Fix: Implement a small debounce window (50 to 100 ms) after hook reception, or verify JSONL line completeness before parsing. Use atomic file writes where possible.
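The line-completeness check can be sketched as follows, assuming the transcript writer terminates every record with a newline (an assumption about the file format, not something confirmed here). `splitCompleteJsonl` is a hypothetical helper that parses only newline-terminated lines and hands back any trailing fragment for the next read.

```typescript
interface JsonlSplit {
  records: unknown[]; // fully written lines, parsed
  remainder: string;  // trailing partial line, to be retried after the next read
}

// Parse only newline-terminated lines; anything after the last newline
// is treated as still being written.
export function splitCompleteJsonl(chunk: string): JsonlSplit {
  const lastNewline = chunk.lastIndexOf('\n');
  const complete = lastNewline === -1 ? '' : chunk.slice(0, lastNewline);
  const remainder = lastNewline === -1 ? chunk : chunk.slice(lastNewline + 1);
  const records: unknown[] = [];
  for (const line of complete.split('\n')) {
    if (line.trim() === '') continue;
    // A terminated line that fails to parse is a real error, so let it throw.
    records.push(JSON.parse(line));
  }
  return { records, remainder };
}
```

If `remainder` is non-empty after the hook fires, the reader can wait briefly and read again from the same offset instead of returning a truncated response.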
5. Assuming MCP Channels are Production-Stable
Explanation: The --dangerously-load-development-channels flag and MCP stdio routing are experimental. Provider updates may alter channel behavior or remove flags without notice.
Fix: Pin runtime versions, implement fallback routing to direct API calls, and monitor provider changelogs. Abstract the injection layer to allow quick protocol swaps.
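Abstracting the injection layer could look like the sketch below. The `PromptTransport` interface and `withFallback` function are hypothetical; the idea is that the channel-based path and a direct API client both implement the same interface, so a failing primary path degrades to the fallback instead of dropping the message.

```typescript
// Hypothetical abstraction over "send a prompt, get a response".
interface PromptTransport {
  name: string;
  send(sessionId: string, prompt: string): Promise<string>;
}

// Wrap two transports so the second is used whenever the first rejects.
export function withFallback(primary: PromptTransport, fallback: PromptTransport): PromptTransport {
  return {
    name: `${primary.name}->${fallback.name}`,
    async send(sessionId, prompt) {
      try {
        return await primary.send(sessionId, prompt);
      } catch {
        // Primary path failed (for example, an experimental flag was removed).
        return fallback.send(sessionId, prompt);
      }
    },
  };
}
```

Because the fallback path is billed at API rates, it is worth logging every time it is taken so the cost impact stays visible.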
6. Memory Leaks in Persistent Daemons
Explanation: Long-running sessions accumulate context tokens and process memory. Without lifecycle management, daemon memory usage grows unbounded.
Fix: Implement session TTLs, enforce context window limits, and add graceful teardown routines. Rotate sessions periodically or prune inactive threads.
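The TTL part of that fix reduces to a small pure function. In this sketch, `findExpiredSessions` is a hypothetical helper and the caller is assumed to keep a map from session ID to last-activity timestamp in milliseconds; the teardown itself (ending the process, clearing offsets) is left to the daemon.

```typescript
// Return the IDs of sessions that have been idle for longer than ttlMs.
// `lastActive` maps session ID to a last-activity timestamp in milliseconds.
export function findExpiredSessions(
  lastActive: Map<string, number>,
  now: number,
  ttlMs: number,
): string[] {
  const expired: string[] = [];
  lastActive.forEach((timestamp, id) => {
    if (now - timestamp > ttlMs) expired.push(id);
  });
  return expired;
}
```

A periodic timer could call this with `Date.now()` and tear down each returned session before its memory footprint grows further.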
7. Missing Error Boundaries for Stdio Streams
Explanation: Broken pipes or unexpected runtime crashes can leave stdin/stdout streams in a dangling state, causing subsequent injections to fail silently.
Fix: Wrap stream writes in try/catch blocks, validate process exit codes, and implement automatic session recreation on stream failure.
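A guarded write might look like the following sketch. `safeWrite` is a hypothetical helper that checks the stream state before writing and reports failure as a boolean, so the caller can decide to recreate the session. Note that some failures, such as a broken pipe, surface asynchronously as an `error` event on the stream rather than as a thrown exception, so an `error` listener is still needed alongside this check.

```typescript
import { Writable } from 'stream';

// Attempt a write; return false instead of throwing when the stream
// is missing, destroyed, or already ended.
export function safeWrite(stream: Writable | null | undefined, data: string): boolean {
  if (!stream || stream.destroyed || !stream.writable) return false;
  try {
    stream.write(data);
    return true;
  } catch {
    return false;
  }
}
```

When `safeWrite` returns false for a session's stdin, the daemon can mark the session as dead and start a replacement before retrying the injection.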
Production Bundle
Action Checklist
- Isolate MCP configurations per session route to prevent prompt cross-contamination
- Implement file size snapshotting before prompt injection for efficient transcript reading
- Attach stdout drain listeners to auto-accept interactive startup confirmations
- Replace polling-based completion detection with deterministic stop hook signaling
- Add session TTLs and context window limits to prevent unbounded memory growth
- Abstract the injection layer to allow fallback routing if experimental flags change
- Validate JSONL line completeness before parsing to avoid truncated responses
- Monitor process exit codes and implement automatic session recreation on failure
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-frequency multi-agent chat | MCP Channel Injection | Persistent sessions eliminate cold starts; subscription billing caps costs | Low (subscription tier) |
| Enterprise compliance / Audit trails | Direct API Calls | Explicit token metering, structured logging, and provider SLAs | High (API rates) |
| Budget-constrained prototyping | PTY/TUI Scraping | Quick to implement, uses existing subscription | Medium (fragile, high CPU) |
| Production-grade agent infrastructure | MCP Channel Injection + Fallback | Protocol-driven reliability with API fallback for stability | Low-Medium (hybrid) |
Configuration Template
# session-broker.config.yaml
daemon:
  port: 8080
  max_sessions: 50
  session_ttl_minutes: 60
  graceful_shutdown_timeout: 10
sessions:
  - id: "agent-alpha"
    mcp_config: "~/.broker/routes/alpha/mcp-config.json"
    transcript: "~/.broker/routes/alpha/transcript.jsonl"
    auto_accept_prompts: true
    stop_hook_url: "http://localhost:8080/hooks/alpha/complete"
  - id: "agent-beta"
    mcp_config: "~/.broker/routes/beta/mcp-config.json"
    transcript: "~/.broker/routes/beta/transcript.jsonl"
    auto_accept_prompts: true
    stop_hook_url: "http://localhost:8080/hooks/beta/complete"
runtime:
  binary: "claude"
  flags:
    - "--dangerously-load-development-channels"
    - "--mcp-config={{mcp_config}}"
  env:
    CLAUDE_CODE_DISABLE_UPDATE_CHECK: "1"
    NODE_OPTIONS: "--max-old-space-size=2048"
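The `{{mcp_config}}` placeholder in the runtime flags implies a substitution step before each spawn. The sketch below shows one way to do it, assuming placeholders are simple `{{name}}` tokens filled from the matching session entry; `renderFlags` is a hypothetical helper, not part of any existing tool.

```typescript
// Replace {{name}} placeholders in runtime flags with per-session values.
// Throws on an unknown placeholder so a typo cannot reach the spawned process.
export function renderFlags(flags: string[], vars: Record<string, string>): string[] {
  return flags.map((flag) =>
    flag.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
      const value = vars[key];
      if (value === undefined) throw new Error(`No value for placeholder: ${key}`);
      return value;
    }),
  );
}
```

For the `agent-alpha` entry, rendering the template flags with `{ mcp_config: '<resolved path>' }` yields the argument list passed to the runtime binary.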
Quick Start Guide
- Initialize the daemon configuration: Create the YAML template above, adjust session IDs, paths, and hook URLs to match your infrastructure.
- Generate isolated MCP configs: For each session, create a dedicated `mcp-config.json` containing the stdio channel routing rules. Ensure paths do not overlap.
- Deploy the session manager: Run the TypeScript daemon. It will spawn persistent LLM processes, auto-accept startup prompts, and expose HTTP endpoints for prompt injection.
- Integrate with your chat gateway: Point your existing agent router to the daemon's HTTP interface. Replace direct CLI invocations with `POST /inject` calls containing the session ID and prompt payload.
- Monitor completion hooks: Configure your webhook receiver to listen for stop hook signals. Parse the transcript offset, extract the new content, and route it back to your chat platform.
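The steps above leave the `POST /inject` payload format open. As a sketch, assume a JSON body with `sessionId` and `prompt` string fields (both field names are assumptions chosen for illustration); `parseInjectBody` is a hypothetical validator the daemon could run before touching any session.

```typescript
// Assumed request shape for POST /inject (field names are illustrative).
interface InjectRequest {
  sessionId: string;
  prompt: string;
}

// Parse and validate a raw request body; throws with a readable message on bad input.
export function parseInjectBody(raw: string): InjectRequest {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Body is not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Body must be a JSON object');
  }
  const { sessionId, prompt } = parsed as Record<string, unknown>;
  if (typeof sessionId !== 'string' || sessionId === '') {
    throw new Error('sessionId must be a non-empty string');
  }
  if (typeof prompt !== 'string' || prompt === '') {
    throw new Error('prompt must be a non-empty string');
  }
  return { sessionId, prompt };
}
```

Rejecting malformed requests at this boundary keeps bad input from reaching a live session's stdin.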
This architecture transforms interactive LLM usage from a billing liability into a deterministic, subscription-backed messaging pipeline. By leveraging internal development channels and persistent session management, teams can maintain multi-agent conversational infrastructure without API cost escalation or terminal parsing fragility.
