e to a connection ID, socket instance, or browser tab. Generate a cryptographically random session identifier at conversation inception. This ID becomes the primary key for all state storage: message history, token offsets, tool call payloads, and agent metadata. The session ID survives transport failures, page reloads, and device switches.
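A minimal sketch of this step. It assumes a Node.js runtime where node:crypto exports randomUUID (Node 14.17 and later). The helper names createSessionId and stateKeys, and the key layout they produce, are illustrative only.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: mint the session ID once, when the conversation starts.
// randomUUID() returns an RFC 4122 version 4 UUID from a cryptographically
// secure random source.
function createSessionId(): string {
  return randomUUID();
}

// Hypothetical key layout: the session ID, never a socket or tab, keys
// every piece of conversation state.
function stateKeys(sessionId: string) {
  return {
    messages: `session:${sessionId}:messages`,
    sequence: `session:${sessionId}:seq`,
    toolCalls: `session:${sessionId}:tools`,
  };
}
```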
Step 2: Implement Offset-Based Replay with Sequence Tracking
Every message published to the session must carry a monotonically increasing sequence number. The client tracks its lastReceivedSeq. Upon reconnection, the client transmits this offset. The session layer queries the persistent store for all records with sequence > lastReceivedSeq, batches them, and streams them to the client. Deduplication follows from the offset: the client discards any record whose sequence is at or below lastReceivedSeq, and acknowledges receipt by advancing its local offset.
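The client half of this protocol can be sketched as a small tracker. The class name OffsetTracker is hypothetical. Stopping at the first gap, rather than skipping ahead, is an assumption of this sketch. A gap means the server holds records the client never saw, so the safe move is to send a fresh sync instead of advancing past them.

```typescript
interface SeqRecord {
  sequence: number;
  payload: unknown;
}

// Client-side sketch: the client keeps only a pointer (lastReceivedSeq),
// never the payload history.
class OffsetTracker {
  private lastReceivedSeq = 0;

  get offset(): number {
    return this.lastReceivedSeq;
  }

  // Sent to the server after every (re)connect.
  syncRequest(): { type: 'sync'; offset: number } {
    return { type: 'sync', offset: this.lastReceivedSeq };
  }

  // Apply a replayed or live batch. Records at or below the offset are
  // duplicates; a jump past offset + 1 is a gap that needs a fresh sync.
  apply(batch: SeqRecord[]): { applied: SeqRecord[]; gap: boolean } {
    const applied: SeqRecord[] = [];
    const ordered = [...batch].sort((a, b) => a.sequence - b.sequence);
    for (const rec of ordered) {
      if (rec.sequence <= this.lastReceivedSeq) continue; // duplicate
      if (rec.sequence !== this.lastReceivedSeq + 1) return { applied, gap: true };
      applied.push(rec);
      this.lastReceivedSeq = rec.sequence; // advancing the offset acknowledges receipt
    }
    return { applied, gap: false };
  }
}
```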
Step 3: Build a Transport Adapter with Automatic Fallback
Hardcoding WebSockets creates a single point of failure. Implement a transport abstraction that attempts WebSocket first, degrades to HTTP streaming (SSE or chunked transfer) if the upgrade fails or times out, and falls back to long-polling as a last resort. The adapter must expose a unified interface (send() and onMessage()) so the session layer remains transport-agnostic.
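One way to express the fallback order, assuming each transport exposes a connect() promise like the TransportAdapter in the implementation example. Transport, withTimeout, and connectWithFallback are hypothetical names. The 5-second default timeout is an arbitrary placeholder.

```typescript
// Minimal transport surface, mirroring connect() on TransportAdapter.
interface Transport {
  readonly name: string;
  connect(sessionId: string): Promise<void>;
}

// Reject if p does not settle within ms; always clear the timer.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Try transports in priority order (e.g. WebSocket, SSE, long-polling)
// and return the first one that connects before the timeout.
async function connectWithFallback(
  sessionId: string,
  transports: Transport[],
  timeoutMs = 5_000,
): Promise<Transport> {
  const errors: string[] = [];
  for (const t of transports) {
    try {
      await withTimeout(t.connect(sessionId), timeoutMs);
      return t;
    } catch (err) {
      errors.push(`${t.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`all transports failed (${errors.join('; ')})`);
}
```

A timed-out attempt may still complete in the background. A production adapter should therefore tear down abandoned connections before moving to the next transport.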
Step 4: Decouple Agent Execution from Client Transport
The AI agent should publish state changes to the session layer, not directly to the client socket. This means tool call requests, intermediate reasoning steps, and final responses are all written to the session store before transmission. If the client disconnects, the agent continues executing. When the client reconnects, it replays from its offset and receives the completed tool results without the agent needing to restart or retry.
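A sketch of the decoupling, using a simple in-memory append-only log as a stand-in for the session store. SessionLog, runAgentStep, and the runTool callback are hypothetical. The agent's only dependency is the log, never a socket.

```typescript
type EventType = 'token' | 'tool_request' | 'tool_result';

interface LogRecord {
  sequence: number;
  type: EventType;
  payload: unknown;
}

// Hypothetical append-only session log. The agent writes here and clients
// read from here; neither side holds a reference to the other.
class SessionLog {
  private records: LogRecord[] = [];

  append(type: EventType, payload: unknown): LogRecord {
    const rec = { sequence: this.records.length + 1, type, payload };
    this.records.push(rec);
    return rec;
  }

  readFrom(offset: number): LogRecord[] {
    return this.records.filter((r) => r.sequence > offset);
  }
}

// The agent step only knows about the log, so a client disconnect cannot
// interrupt it. runTool stands in for any asynchronous tool call.
async function runAgentStep(
  log: SessionLog,
  runTool: (args: unknown) => Promise<unknown>,
  args: unknown,
): Promise<void> {
  log.append('tool_request', args);
  const result = await runTool(args);
  log.append('tool_result', result);
}
```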
Implementation Example (TypeScript)
```typescript
import { EventEmitter } from 'events';

interface SessionMessage {
  id: string;
  sequence: number;
  type: 'token' | 'tool_request' | 'tool_result' | 'context_update';
  payload: unknown;
  timestamp: number;
}

interface TransportAdapter {
  connect(sessionId: string): Promise<void>;
  send(data: string): void;
  onMessage(callback: (raw: string) => void): void;
  disconnect(): void;
}

export class DurableSessionManager extends EventEmitter {
  private sessionId: string;
  // In-memory cache of published records, keyed by sequence number.
  // A background worker should flush these to a durable store.
  private messageStore: Map<number, SessionMessage> = new Map();
  private currentSequence = 0;
  private transport: TransportAdapter;
  // ReturnType<typeof setInterval> works in both Node and browsers.
  private heartbeatInterval: ReturnType<typeof setInterval> | null = null;

  constructor(sessionId: string, transport: TransportAdapter) {
    super();
    this.sessionId = sessionId;
    this.transport = transport;
  }

  async initialize(): Promise<void> {
    await this.transport.connect(this.sessionId);
    this.transport.onMessage((raw) => this.handleIncoming(raw));
    this.startHeartbeat();
  }

  // Assign the next sequence number, cache the record, then transmit it.
  publish(message: Omit<SessionMessage, 'sequence' | 'timestamp'>): void {
    const seq = ++this.currentSequence;
    const record: SessionMessage = {
      ...message,
      sequence: seq,
      timestamp: Date.now(),
    };
    this.messageStore.set(seq, record);
    this.transport.send(JSON.stringify(record));
  }

  // Return every cached record with sequence > clientOffset, in order.
  replayFromOffset(clientOffset: number): SessionMessage[] {
    const missed: SessionMessage[] = [];
    for (let seq = clientOffset + 1; seq <= this.currentSequence; seq++) {
      const msg = this.messageStore.get(seq);
      if (msg) missed.push(msg);
    }
    return missed;
  }

  private handleIncoming(raw: string): void {
    let data: { type?: string; offset?: unknown } | null = null;
    try {
      data = JSON.parse(raw);
    } catch {
      return; // Malformed payload: ignore (or log)
    }
    if (!data || data.type === 'heartbeat_ack') return;
    // A reconnecting client sends { type: 'sync', offset: lastReceivedSeq }.
    if (data.type === 'sync' && typeof data.offset === 'number') {
      for (const msg of this.replayFromOffset(data.offset)) {
        this.transport.send(JSON.stringify(msg));
      }
      return;
    }
    this.emit('message', data);
  }

  private startHeartbeat(): void {
    this.heartbeatInterval = setInterval(() => {
      this.transport.send(JSON.stringify({ type: 'heartbeat', ts: Date.now() }));
    }, 25_000); // 25% of a 100s proxy idle timeout
  }

  destroy(): void {
    if (this.heartbeatInterval) clearInterval(this.heartbeatInterval);
    this.heartbeatInterval = null;
    this.transport.disconnect();
    this.messageStore.clear();
  }
}
```
Architecture Rationale
- Why sequence numbers over timestamps? Timestamps suffer from clock skew and lack strict ordering guarantees. Monotonic integers provide deterministic replay and simplify deduplication.
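- Why store messages in-memory before persistence? In-memory caching reduces latency for active sessions. A background worker should asynchronously flush records to a durable store (Redis, PostgreSQL, or a managed message broker) to survive server restarts. Records that have not yet been flushed are lost if the process crashes, so the flush interval bounds the potential loss window.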
- Why decouple transport from session logic? It enables protocol fallback without rewriting business logic. The session manager only cares about publishing and replaying state, not whether the wire is WebSocket, HTTP/2, or long-polling.
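The write-behind pattern from the second bullet might look like the following. DurableStore and WriteBehindBuffer are hypothetical names. The retry policy, which re-queues a failed batch ahead of newer records, is one reasonable choice among several.

```typescript
interface StoredMessage {
  sequence: number;
  payload: unknown;
}

// Hypothetical durable-store interface (e.g. backed by Redis or PostgreSQL).
interface DurableStore {
  saveBatch(sessionId: string, batch: StoredMessage[]): Promise<void>;
}

// Write-behind buffer: publish() is synchronous and in-memory; flush()
// persists everything pending in a single batch.
class WriteBehindBuffer {
  private pending: StoredMessage[] = [];
  private sessionId: string;
  private store: DurableStore;

  constructor(sessionId: string, store: DurableStore) {
    this.sessionId = sessionId;
    this.store = store;
  }

  publish(msg: StoredMessage): void {
    this.pending.push(msg);
  }

  // Returns the number of records persisted by this call.
  async flush(): Promise<number> {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    try {
      await this.store.saveBatch(this.sessionId, batch);
    } catch (err) {
      // Put the batch back in front so the next flush retries it in order.
      this.pending = batch.concat(this.pending);
      throw err;
    }
    return batch.length;
  }
}
```

Drive flush() from a timer or job queue. The shorter the cadence, the fewer records a crash can lose.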
Pitfall Guide
1. Assuming WebSocket Ping/Pong Preserves Application State
Explanation: The WebSocket protocol includes ping/pong control frames for keep-alive, but they only prove that the connection is alive. They carry no conversation history, tool payloads, or agent context, and nothing they convey survives a reconnect.
Fix: Treat ping/pong strictly as transport health checks. Maintain a separate application-level state store keyed by session ID.
2. Relying on Client-Side Buffers for Catch-Up
Explanation: Storing missed messages in browser memory or localStorage fails during page reloads, tab crashes, or storage quota limits. It also creates synchronization drift across devices.
Fix: Server-side offset tracking is mandatory. The client should only hold a pointer (lastReceivedSeq), not the payload history.
3. Hardcoding Proxy Timeouts Without Heartbeat Alignment
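Explanation: Raising an ALB idle timeout to 3600 seconds does not help if another hop in the path, such as Cloudflare at 100 seconds, times out first. A heartbeat every 90 seconds leaves only 10 seconds of headroom against that limit. A single delayed or dropped heartbeat then lets the connection die silently.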
Fix: Align heartbeat intervals to 25% of the lowest timeout threshold. For Cloudflare's 100s limit, send pings every 20-25 seconds. Monitor heartbeat failure rates in observability dashboards.
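The alignment rule reduces to one line of arithmetic. The hop names and values below are examples: a Cloudflare-proxied path in front of an ALB and Nginx, both raised to 3600 seconds. heartbeatIntervalMs is a hypothetical helper.

```typescript
// Example idle timeouts (ms) for each hop between client and server.
const hopTimeoutsMs: Record<string, number> = {
  cloudflare: 100_000,
  alb: 3_600_000,
  nginx: 3_600_000,
};

// The tightest hop decides. Pinging at 25% of the lowest timeout leaves
// room for a few delayed or lost heartbeats before that hop drops the socket.
function heartbeatIntervalMs(timeouts: Record<string, number>, ratio = 0.25): number {
  const lowest = Math.min(...Object.values(timeouts));
  return Math.floor(lowest * ratio);
}
```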
4. Ignoring Idempotency During Offset Replay
Explanation: Network flakiness can cause the client to request the same offset range twice, or the server to duplicate messages during a reconnect race condition.
Fix: Implement idempotency keys on the client side. When replaying, check if a message ID already exists in the local conversation store before rendering. Use SET operations with TTL in Redis for deduplication windows.
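A sketch of a deduplication window, written as an in-process stand-in for the Redis approach. Redis's SET command with the NX and EX options writes a key only if it is absent and expires it after a TTL, and DedupWindow imitates that behaviour. The class and method names are hypothetical. The optional now parameter exists only to make expiry testable.

```typescript
// In-process stand-in for SET <messageId> 1 NX EX <ttl>: the first sighting
// of an ID wins until the entry expires.
class DedupWindow {
  private seenUntil = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time an ID is seen inside the window.
  firstSeen(messageId: string, now: number = Date.now()): boolean {
    const expiry = this.seenUntil.get(messageId);
    if (expiry !== undefined && expiry > now) return false; // duplicate
    this.seenUntil.set(messageId, now + this.ttlMs);
    return true;
  }

  // Drop expired entries so memory stays bounded.
  sweep(now: number = Date.now()): void {
    for (const [id, expiry] of this.seenUntil) {
      if (expiry <= now) this.seenUntil.delete(id);
    }
  }

  get size(): number {
    return this.seenUntil.size;
  }
}
```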
5. Binding Session State to Connection Lifecycle
Explanation: Tying conversation data to the socket instance means state is garbage-collected when the connection closes. This is the root cause of the "reconnect but lose context" bug.
Fix: Decouple state from transport. Use a session registry that outlives individual connections. Connections should be ephemeral subscribers to a persistent session.
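A minimal registry illustrating the fix. It uses an in-memory map, whereas a real deployment would back this with the persistent session store. SessionRegistry, attach(), and detach() are hypothetical names.

```typescript
interface SessionState {
  history: string[];
  connections: Set<string>; // IDs of currently attached transports
}

// Hypothetical registry: sessions are keyed by session ID and are not
// deleted when a connection closes. Expiry would be a separate policy.
class SessionRegistry {
  private sessions = new Map<string, SessionState>();

  attach(sessionId: string, connectionId: string): SessionState {
    let state = this.sessions.get(sessionId);
    if (!state) {
      state = { history: [], connections: new Set() };
      this.sessions.set(sessionId, state);
    }
    state.connections.add(connectionId);
    return state;
  }

  // Closing a socket only detaches it; the history is untouched.
  detach(sessionId: string, connectionId: string): void {
    this.sessions.get(sessionId)?.connections.delete(connectionId);
  }
}
```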
6. Overlooking Multi-Device Synchronization
Explanation: Users expect to start a conversation on desktop and continue on mobile. If state is scoped to a single tab or browser profile, cross-device continuity breaks.
Fix: Authenticate sessions via user tokens or device-agnostic session IDs. Allow multiple active transports to subscribe to the same session ID. Broadcast state changes to all active subscribers.
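A sketch of fan-out across devices, using an in-memory log per session. BroadcastSession is a hypothetical name. Each subscriber passes the offset it last saw, replays what it missed, and then receives live publishes alongside every other device. Replay and registration happen in the same synchronous call, so no publish can slip between them. An asynchronous store would need an equivalent guarantee.

```typescript
interface Envelope {
  sequence: number;
  payload: unknown;
}

type Deliver = (msg: Envelope) => void;

// Hypothetical fan-out: every device subscribed to a session ID receives
// every publish; a late joiner first replays from its own offset.
class BroadcastSession {
  private log: Envelope[] = [];
  private subscribers = new Set<Deliver>();

  // Returns an unsubscribe function for when that device disconnects.
  subscribe(fromOffset: number, deliver: Deliver): () => void {
    for (const msg of this.log) {
      if (msg.sequence > fromOffset) deliver(msg);
    }
    this.subscribers.add(deliver);
    return () => {
      this.subscribers.delete(deliver);
    };
  }

  publish(payload: unknown): Envelope {
    const msg = { sequence: this.log.length + 1, payload };
    this.log.push(msg);
    for (const deliver of this.subscribers) deliver(msg);
    return msg;
  }
}
```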
7. Losing Tool Results When the Client Disconnects Mid-Execution
Explanation: When an agent executes a tool, the result often arrives asynchronously. If the client disconnects during execution, the result has no delivery path.
Fix: Persist tool call metadata (request ID, status, payload) in the session store. When the result arrives, publish it to the session. Reconnecting clients replay it naturally via offset catch-up.
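A sketch of the tool-call ledger this fix describes. ToolCallLedger, its status values, and the publish callback are assumptions of this example; the callback would append to the session log. finish() ignores results for unknown or already-settled requests, so a late duplicate result cannot overwrite the first one.

```typescript
type ToolStatus = 'pending' | 'completed' | 'failed';

interface ToolCallRecord {
  requestId: string;
  tool: string;
  status: ToolStatus;
  payload: unknown;
}

// Hypothetical session-scoped ledger. Every status change is also
// published to the session, so a reconnecting client replays it.
class ToolCallLedger {
  private calls = new Map<string, ToolCallRecord>();
  private publish: (event: ToolCallRecord) => void;

  constructor(publish: (event: ToolCallRecord) => void) {
    this.publish = publish;
  }

  start(requestId: string, tool: string, args: unknown): void {
    const rec: ToolCallRecord = { requestId, tool, status: 'pending', payload: args };
    this.calls.set(requestId, rec);
    this.publish(rec);
  }

  // Called whenever the async result lands, whether or not a client is connected.
  finish(requestId: string, result: unknown, ok = true): void {
    const prev = this.calls.get(requestId);
    if (!prev || prev.status !== 'pending') return; // unknown or already settled
    const rec: ToolCallRecord = { ...prev, status: ok ? 'completed' : 'failed', payload: result };
    this.calls.set(requestId, rec);
    this.publish(rec);
  }

  pending(): ToolCallRecord[] {
    return [...this.calls.values()].filter((c) => c.status === 'pending');
  }
}
```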
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple request-response chatbot | SSE or standard HTTP streaming | Unidirectional, lower infrastructure overhead, easier deployment | Low |
| Agentic workflow with tool calls & mid-stream cancellation | WebSockets + Durable Session Layer | Bidirectional signaling required; state persistence prevents context loss | Medium |
| High-traffic consumer app with mobile handoffs | Managed real-time platform with offset replay | Eliminates custom buffer management; handles fallback & multi-device natively | High (SaaS) |
| Internal enterprise tool with controlled network | Self-built session layer + Redis | Full control over state schema; predictable infrastructure costs | Low-Medium |
| Multi-tenant SaaS requiring cross-device sync | Session-per-user architecture with broadcast | Guarantees continuity across tabs/devices; simplifies auth integration | Medium |
Configuration Template
```nginx
# Nginx WebSocket & AI Chat Proxy Configuration
location /ai-session/ {
    proxy_pass http://backend_cluster;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Extend read timeout for long-running agent tasks
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;

    # Disable buffering for real-time token streaming
    proxy_buffering off;
    proxy_cache off;
}
```
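```hcl
# AWS ALB Listener Rule (Terraform)
resource "aws_lb_listener_rule" "ai_chat_ws" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ai_chat.arn
  }

  # Each condition block holds exactly one condition type; all blocks must match.
  condition {
    http_request_method {
      values = ["GET"]
    }
  }

  condition {
    http_header {
      http_header_name = "Upgrade"
      values           = ["websocket"]
    }
  }
}

resource "aws_lb_target_group" "ai_chat" {
  name     = "ai-chat-ws"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    interval            = 30
    path                = "/health"
    timeout             = 10
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}

# Critical: override the 60s default. idle_timeout is an attribute of the
# load balancer that owns aws_lb_listener.https, not of the target group.
resource "aws_lb" "ai_chat" {
  # ... existing name, subnets, and security group arguments ...
  load_balancer_type = "application"
  idle_timeout       = 3600
}
```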
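```typescript
// Heartbeat & Timeout Alignment Strategy
// Assumes a transport that emits lifecycle events and can switch to a
// fallback protocol; these members are illustrative, not part of TransportAdapter.
interface EventedTransport {
  send(data: string): void;
  on(event: 'connect' | 'disconnect', cb: () => void): void;
  on(event: 'error', cb: (err: { code?: string }) => void): void;
  switchToFallback(): void;
}

const PROXY_TIMEOUT_MS = 100_000; // Cloudflare Free/Pro limit
const HEARTBEAT_INTERVAL_MS = Math.floor(PROXY_TIMEOUT_MS * 0.25); // 25s

function configureTransport(adapter: EventedTransport): void {
  let timer: ReturnType<typeof setInterval> | null = null;
  const stop = () => {
    if (timer) clearInterval(timer);
    timer = null;
  };

  adapter.on('connect', () => {
    stop(); // avoid stacking intervals across reconnects
    timer = setInterval(() => {
      adapter.send(JSON.stringify({ type: 'ping', ts: Date.now() }));
    }, HEARTBEAT_INTERVAL_MS);
  });
  adapter.on('disconnect', stop);
  adapter.on('error', (err: { code?: string }) => {
    if (err.code === 'TIMEOUT') {
      stop();
      adapter.switchToFallback(); // trigger graceful fallback or reconnect
    }
  });
}
```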
Quick Start Guide
- Initialize Session Registry: Create a service that generates session IDs and maintains a message store (a Redis LIST or a PostgreSQL table with a sequence column). Expose publish() and replayFrom(offset) endpoints.
- Build Transport Adapter: Implement a client-side class that attempts a WebSocket connection. If it fails or times out, automatically switch to HTTP chunked streaming or long-polling. Expose unified send() and onMessage() methods.
- Implement Offset Tracking: On the client, store lastReceivedSeq in memory. On reconnect, send { type: 'sync', offset: lastReceivedSeq } to the server. Apply returned messages to the UI, skipping duplicates via message ID.
- Deploy Heartbeat & Timeout Alignment: Configure server-side proxies to allow 3600s idle time. Set client heartbeats to 25s. Monitor connection stability via the metrics reconnect_count, offset_gap_size, and heartbeat_failure_rate.
- Validate Multi-Device Continuity: Open the same session ID on two browsers. Trigger a tool call on one device. Verify the second device receives the tool result via offset replay without manual refresh.
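The three metrics from the heartbeat step could be collected with simple counters such as the following. ConnectionMetrics is hypothetical. Reporting only the maximum gap, rather than a histogram, is a simplification for brevity.

```typescript
// Hypothetical in-process counters; a real deployment would export these
// to its metrics backend.
class ConnectionMetrics {
  reconnectCount = 0;
  private heartbeatsSent = 0;
  private heartbeatsFailed = 0;
  private gapSizes: number[] = [];

  recordReconnect(clientOffset: number, serverSequence: number): void {
    this.reconnectCount++;
    // offset_gap_size: how many records the client must replay.
    this.gapSizes.push(Math.max(0, serverSequence - clientOffset));
  }

  recordHeartbeat(acked: boolean): void {
    this.heartbeatsSent++;
    if (!acked) this.heartbeatsFailed++;
  }

  snapshot() {
    return {
      reconnect_count: this.reconnectCount,
      offset_gap_size_max: this.gapSizes.length ? Math.max(...this.gapSizes) : 0,
      heartbeat_failure_rate: this.heartbeatsSent ? this.heartbeatsFailed / this.heartbeatsSent : 0,
    };
  }
}
```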