ulti-step workflows involving dozens of tool interactions.
Core Solution
Architecture: Atomic Message Window
The solution requires a context manager that understands the structure of tool interactions. The core algorithm identifies tool_use and tool_result blocks by their unique IDs, groups them into pairs, and treats these pairs as indivisible units during truncation.
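To make the pairing concrete, the sketch below shows how a tool call and its result reference each other by ID. The block shapes follow the Anthropic Messages API convention (a tool_use block carries id, a tool_result block carries tool_use_id). The helper name groupPairsById, the ToolPair type, and the sample IDs are assumptions made for illustration, not part of any library.

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }
  | { type: 'tool_result'; tool_use_id: string; content: string };

interface ChatMessage {
  role: 'user' | 'assistant';
  content: string | Block[];
}

interface ToolPair {
  useIndex: number;           // index of the assistant message holding the call
  resultIndex: number | null; // index of the user message holding the result, if any
}

// Hypothetical helper: group tool_use / tool_result blocks by their shared ID.
// A single pass is enough here because a result always appears after its call.
function groupPairsById(messages: ChatMessage[]): Map<string, ToolPair> {
  const pairs = new Map<string, ToolPair>();
  messages.forEach((msg, index) => {
    if (typeof msg.content === 'string') return;
    for (const block of msg.content) {
      if (block.type === 'tool_use') {
        pairs.set(block.id, { useIndex: index, resultIndex: null });
      } else if (block.type === 'tool_result') {
        const pair = pairs.get(block.tool_use_id);
        if (pair) pair.resultIndex = index;
      }
    }
  });
  return pairs;
}

const sampleHistory: ChatMessage[] = [
  { role: 'user', content: 'What is the weather in Oslo?' },
  {
    role: 'assistant',
    content: [{ type: 'tool_use', id: 'toolu_01', name: 'get_weather', input: { city: 'Oslo' } }],
  },
  {
    role: 'user',
    content: [{ type: 'tool_result', tool_use_id: 'toolu_01', content: '4 degrees C, light rain' }],
  },
];

// Messages 1 and 2 form one indivisible unit for truncation purposes.
const samplePairs = groupPairsById(sampleHistory);
```

Keying by ID rather than by position means the grouping still holds when unrelated text messages sit between other turns.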
Implementation Strategy
- Pair Detection: Scan the message list to map tool_use IDs to their message indices. Scan again to match tool_result blocks to these IDs.
- Atomic Grouping: Create a set of paired indices. Each pair represents a tool call and its result.
- Sliding Window Calculation: Determine the number of messages to drop based on the configured maximum.
- Constraint-Aware Dropping: Iterate from the oldest messages. If a message is part of a pair, check if the partner is also within the drop zone. Drop both or neither. If the partner is in the safe zone, retain the current message to preserve the pair.
- System Preservation: Always exclude system messages from truncation logic.
TypeScript Implementation
The following TypeScript implementation demonstrates the atomic trimming logic. This code is designed for integration into agent loops using the Anthropic API or similar tool-augmented models.
export interface ToolBlock {
  type: 'tool_use' | 'tool_result';
  id?: string;          // set on tool_use blocks
  tool_use_id?: string; // set on tool_result blocks
}

export interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string | ToolBlock[];
}

export interface WindowConfig {
  maxMessages: number;
  preserveSystem: boolean;
  minRecentKeep: number;
}

export class AtomicMessageWindow {
  private config: WindowConfig;

  constructor(config: WindowConfig) {
    this.config = config;
  }

  trim(messages: Message[]): Message[] {
    if (messages.length <= this.config.maxMessages) {
      return [...messages]; // always hand back a new array
    }
    const pairs = this.detectToolPairs(messages);
    const dropCount = messages.length - this.config.maxMessages;
    // Messages at or after this index are protected recent history
    const safeBoundary = messages.length - this.config.minRecentKeep;
    const dropIndices = new Set<number>();

    // Identify indices to drop, oldest first
    for (let i = 0; i < safeBoundary && dropIndices.size < dropCount; i++) {
      // Already dropped together with its partner
      if (dropIndices.has(i)) continue;
      // Skip system messages if configured
      if (this.config.preserveSystem && messages[i].role === 'system') continue;

      // Compare against undefined: a partner at index 0 is a valid (falsy) value
      const partnerIndex = pairs.get(i);
      if (partnerIndex === undefined) {
        // Independent message; safe to drop
        dropIndices.add(i);
        continue;
      }
      // Partner must be older than the safe boundary to be droppable
      const partnerInDropZone = partnerIndex < safeBoundary;
      if (partnerInDropZone) {
        // Drop both members of the pair
        dropIndices.add(i);
        dropIndices.add(partnerIndex);
      }
      // Otherwise the partner is protected; keep this message to preserve the pair
    }

    // Construct trimmed list
    return messages.filter((_, index) => !dropIndices.has(index));
  }

  // Maps each paired message index to its partner index (both directions).
  // Note: this stores a single partner per message index.
  private detectToolPairs(messages: Message[]): Map<number, number> {
    const toolUseMap = new Map<string, number>();
    const pairs = new Map<number, number>();

    // First pass: index tool_use blocks
    messages.forEach((msg, index) => {
      if (msg.role === 'assistant' && Array.isArray(msg.content)) {
        msg.content.forEach(block => {
          if (block.type === 'tool_use' && block.id) {
            toolUseMap.set(block.id, index);
          }
        });
      }
    });

    // Second pass: match tool_result blocks
    messages.forEach((msg, index) => {
      if (msg.role === 'user' && Array.isArray(msg.content)) {
        msg.content.forEach(block => {
          if (block.type === 'tool_result' && block.tool_use_id) {
            const useIndex = toolUseMap.get(block.tool_use_id);
            if (useIndex !== undefined) {
              pairs.set(index, useIndex);
              pairs.set(useIndex, index);
            }
          }
        });
      }
    });
    return pairs;
  }
}
Rationale for Design Choices
- TypeScript Interfaces: Explicit interfaces (ToolBlock, Message) give compile-time checking when working with API responses and catch mistakes when accessing nested properties. They do not validate data at runtime, so malformed payloads still need a runtime check.
- Atomic Drop Logic: The algorithm checks partnerInDropZone before dropping a pair. If a tool call is old but its result is recent (or vice versa), the pair is retained. This prevents the "straddling pair" issue where one half is dropped and the other kept.
- Configurable Constraints: preserveSystem and minRecentKeep provide flexibility. System prompts must always be retained because they define agent behavior. minRecentKeep ensures the agent retains immediate context, which is critical for coherence even if token limits are not strictly enforced.
- Deterministic Performance: The algorithm runs in O(N) time, where N is the total number of content blocks across all messages (at least the message count). This is efficient enough to run on every turn without noticeably affecting latency.
Pitfall Guide
1. The Orphaned Result Trap
- Explanation: Dropping a tool_use message while retaining its tool_result. The API sees a result with no corresponding call.
- Fix: Always enforce atomic dropping. If a result is dropped, its call must also be dropped. The pair detection logic must be bidirectional.
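A cheap guard against this trap is to validate the trimmed list before sending it. The following is a minimal sketch under the assumption that blocks use the id / tool_use_id fields described earlier; findOrphans and OrphanReport are hypothetical names introduced here.

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string };

interface ChatMessage {
  role: 'user' | 'assistant';
  content: string | Block[];
}

interface OrphanReport {
  orphanedResults: string[]; // tool_result IDs with no tool_use in the list
  orphanedCalls: string[];   // tool_use IDs with no tool_result in the list
}

// Hypothetical validator: collect every call ID and result ID, then diff them.
// A call in the final message may legitimately still be awaiting its result.
function findOrphans(messages: ChatMessage[]): OrphanReport {
  const callIds = new Set<string>();
  const resultIds = new Set<string>();
  for (const msg of messages) {
    if (typeof msg.content === 'string') continue;
    for (const block of msg.content) {
      if (block.type === 'tool_use') callIds.add(block.id);
      if (block.type === 'tool_result') resultIds.add(block.tool_use_id);
    }
  }
  return {
    orphanedResults: Array.from(resultIds).filter(id => !callIds.has(id)),
    orphanedCalls: Array.from(callIds).filter(id => !resultIds.has(id)),
  };
}

// What a naive trim can leave behind: the result survived, the call did not.
const naivelyTrimmed: ChatMessage[] = [
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu_01' }] },
  { role: 'assistant', content: 'Done.' },
];

const orphanReport = findOrphans(naivelyTrimmed);
```

Here orphanReport.orphanedResults contains 'toolu_01', which is the condition to treat as a trimming bug.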
2. Token Blindness
- Explanation: Trimming by message count does not guarantee token limits are respected. A single message can contain thousands of tokens.
- Fix: Combine atomic trimming with a token estimator. Use a library like prompt-token-counter to estimate tokens. If the estimate exceeds a threshold (e.g., 80% of context window), trigger trimming even if message count is within bounds.
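The trigger logic can be sketched without committing to a particular tokenizer. The example below assumes a crude heuristic of roughly four characters per token, which is only an approximation and not how any real tokenizer works; estimateTokens and shouldTrim are hypothetical helpers, and a proper token counter should replace the heuristic in practice.

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'system';
  content: string | object[];
}

// ASSUMPTION: rough heuristic, NOT a real tokenizer (about 4 characters per token).
const CHARS_PER_TOKEN = 4;

function estimateTokens(messages: Msg[]): number {
  let chars = 0;
  for (const msg of messages) {
    chars += typeof msg.content === 'string'
      ? msg.content.length
      : JSON.stringify(msg.content).length; // structured blocks: measure serialized size
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Trim when either the message count or the estimated token budget is exceeded.
function shouldTrim(
  messages: Msg[],
  maxMessages: number,
  maxTokens: number,
  tokenThreshold: number, // e.g. 0.8 for 80% of the context window
): boolean {
  const overCount = messages.length > maxMessages;
  const overTokens = estimateTokens(messages) > maxTokens * tokenThreshold;
  return overCount || overTokens;
}

// One very long message: well under the count limit, but heavy in tokens.
const heavyHistory: Msg[] = [{ role: 'user', content: 'a'.repeat(4000) }];
const needsTrim = shouldTrim(heavyHistory, 25, 1000, 0.8);
```

With these numbers the estimate is 1000 tokens against a trigger of 800, so needsTrim is true even though the list holds a single message.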
3. System Message Ejection
- Explanation: Naive algorithms may drop the system message if it is the oldest entry. This changes agent behavior or causes API errors.
- Fix: Always exclude system messages from the drop set. Implement a preserveSystem flag that locks the system message index.
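An alternative to locking the index is to keep system content out of the trimmable list altogether. Some APIs, the Anthropic Messages API among them, take the system prompt as a separate top-level parameter rather than as a message role, which makes this split natural. The helper below is a hypothetical sketch of that approach, not a replacement for the preserveSystem flag.

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

interface SplitHistory {
  system: string;      // all system content, joined in original order
  conversation: Msg[]; // everything else; only this part is ever trimmed
}

// Hypothetical helper: separate system content from the trimmable conversation.
function splitSystem(messages: Msg[]): SplitHistory {
  const system = messages
    .filter(m => m.role === 'system')
    .map(m => m.content)
    .join('\n\n');
  const conversation = messages.filter(m => m.role !== 'system');
  return { system, conversation };
}

const mixedHistory: Msg[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello!' },
];

const splitResult = splitSystem(mixedHistory);
```

Because the trimmer never sees the system content, no drop rule can eject it by accident.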
4. Batch Tool Result Mismatch
- Explanation: Some agents send multiple tool results in a single user message. If the algorithm only checks the first block, it may miss pairs.
- Fix: Iterate over all content blocks in a message. The pair detection must handle arrays of blocks, not just single blocks.
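When one message carries several tool blocks, a single-partner map can lose information. A sketch of one possible generalization follows: each message index maps to the set of every message it is paired with. The function name detectPartnerSets and the sample IDs are assumptions for illustration.

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string };

interface ChatMessage {
  role: 'user' | 'assistant';
  content: string | Block[];
}

// Map each message index to the set of ALL message indices it is paired with.
function detectPartnerSets(messages: ChatMessage[]): Map<number, Set<number>> {
  const useIndexById = new Map<string, number>();
  const partners = new Map<number, Set<number>>();

  const link = (a: number, b: number): void => {
    const set = partners.get(a) ?? new Set<number>();
    set.add(b);
    partners.set(a, set);
  };

  messages.forEach((msg, index) => {
    if (typeof msg.content === 'string') return;
    // Visit every block, not just the first one
    for (const block of msg.content) {
      if (block.type === 'tool_use') {
        useIndexById.set(block.id, index);
      } else if (block.type === 'tool_result') {
        const useIndex = useIndexById.get(block.tool_use_id);
        if (useIndex !== undefined) {
          link(index, useIndex);
          link(useIndex, index);
        }
      }
    }
  });
  return partners;
}

// The assistant message starts with text, so a first-block-only check would miss both calls.
const batchHistory: ChatMessage[] = [
  { role: 'user', content: 'Compare the weather in Oslo and Rome.' },
  {
    role: 'assistant',
    content: [
      { type: 'text', text: 'Checking both cities.' },
      { type: 'tool_use', id: 'toolu_a' },
      { type: 'tool_use', id: 'toolu_b' },
    ],
  },
  {
    role: 'user',
    content: [
      { type: 'tool_result', tool_use_id: 'toolu_a' },
      { type: 'tool_result', tool_use_id: 'toolu_b' },
    ],
  },
];

const batchPartners = detectPartnerSets(batchHistory);
```

A trimmer built on this map would drop a message only when every index in its partner set is droppable too.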
5. State Mutation
- Explanation: Modifying the message array in-place during trimming can cause race conditions in async loops or unexpected side effects.
- Fix: Always return a new array from the trim function. Use filter or spread operators to create immutable copies.
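TypeScript's readonly types can enforce this at compile time. The sketch below uses a hypothetical dropByIndex helper whose input is typed as a readonly array, so calls such as splice or push on it would not compile, and filter supplies the fresh copy.

```typescript
interface Msg {
  role: string;
  content: string;
}

// Typing the input as readonly turns in-place mutation into a compile error.
function dropByIndex(messages: readonly Msg[], drop: ReadonlySet<number>): Msg[] {
  // filter always allocates a new array; the caller's list is left untouched
  return messages.filter((_, index) => !drop.has(index));
}

// Object.freeze adds a runtime guard on top of the compile-time one.
const frozenHistory: readonly Msg[] = Object.freeze([
  { role: 'user', content: 'a' },
  { role: 'assistant', content: 'b' },
  { role: 'user', content: 'c' },
]);

const trimmedCopy = dropByIndex(frozenHistory, new Set([0]));
```

Note that the copy is shallow: the array is new, but the message objects inside it are shared with the original history.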
6. Straddling Pair Overflow
- Explanation: When a pair straddles the boundary, retaining both may result in the message count exceeding maxMessages by one or two.
- Fix: Accept this as a trade-off. It is better to slightly exceed the message count than to break a pair. Document this behavior. The window size is a soft limit for pairs, not a hard limit.
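The overflow is easy to see with a small worked example. The sketch below uses a contiguous-cut variant (everything before a cut index is dropped) rather than a skip-and-continue loop, purely because it makes the arithmetic visible; adjustCut is a hypothetical helper, and the pair map is assumed to hold both directions of each pair.

```typescript
// pairs maps a message index to its partner index (both directions).
function adjustCut(cut: number, pairs: ReadonlyMap<number, number>): number {
  let adjusted = cut;
  let moved = true;
  // Repeat until stable: moving the cut can expose another straddling pair.
  while (moved) {
    moved = false;
    pairs.forEach((partner, index) => {
      // A pair straddles the cut when one half would be dropped and the other kept.
      if (index < adjusted && partner >= adjusted) {
        adjusted = index; // move the cut back so the older half is kept too
        moved = true;
      }
    });
  }
  return adjusted;
}

const totalMessages = 10;
const maxMessages = 6;
const naiveCut = totalMessages - maxMessages; // 4: drop indices 0..3

// tool_use at index 3, tool_result at index 4: the pair straddles the naive cut.
const straddlingPairs = new Map<number, number>([[3, 4], [4, 3]]);

const softCut = adjustCut(naiveCut, straddlingPairs);
const keptCount = totalMessages - softCut;
```

The cut moves from 4 to 3, so 7 messages are kept against a limit of 6: the pair survives and the window overshoots by one.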
7. Ignoring minRecentKeep
- Explanation: Dropping all old messages can cause the agent to lose track of the current task if the recent history is too short.
- Fix: Implement a minRecentKeep parameter. Always retain the last N messages regardless of pair status. This ensures the agent has sufficient immediate context.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Short-Running Agent (<10 turns) | No Trimming | Context limits unlikely to be reached. Overhead unnecessary. | Low |
| Long-Running Agent with Tools | Atomic Trimming | Prevents API errors from split pairs. Maintains tool integrity. | Low |
| Token-Heavy Messages | Atomic + Token Counter | Message count is insufficient proxy. Token estimation ensures limits. | Medium |
| Critical Context Retention | Summarization Fallback | Preserves semantics of dropped messages via LLM summary. | High |
| Stateful Multi-User Chat | Atomic + Rotate | Combines atomic trimming with user-specific history rotation. | Medium |
Configuration Template
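// window.config.ts
import { WindowConfig } from './atomic-window';

export const defaultWindowConfig: WindowConfig = {
  maxMessages: 25,
  preserveSystem: true,
  minRecentKeep: 6,
};

// WindowConfig has no token fields, so the token-aware variant extends it
export interface TokenAwareWindowConfig extends WindowConfig {
  maxTokens: number;      // context budget in tokens
  tokenThreshold: number; // fraction of maxTokens that triggers trimming
}

export const tokenAwareConfig: TokenAwareWindowConfig = {
  maxMessages: 40,
  preserveSystem: true,
  minRecentKeep: 8,
  // Additional token threshold config
  maxTokens: 160000,
  tokenThreshold: 0.8,
};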
Quick Start Guide
- Initialize Window: Create an instance of AtomicMessageWindow with your configuration.
const window = new AtomicMessageWindow({
  maxMessages: 30,
  preserveSystem: true,
  minRecentKeep: 4,
});
- Accumulate Messages: Append user and assistant messages to your history array as the agent runs.
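- Trim Before Call: Invoke window.trim(messages) immediately before sending the request to the API.
const trimmedMessages = window.trim(history);
// Note: Anthropic's Messages API takes the system prompt as the top-level `system`
// parameter, so pass any system entry there rather than inside `messages`.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  messages: trimmedMessages,
  // ... other params
});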
- Handle Tool Results: Process tool calls and append results. The atomic logic will handle pair integrity automatically on the next trim.
- Monitor: Log trim events to verify pairs are preserved and context bounds are respected. Adjust maxMessages based on observed performance.
Production Tip: For agents running in production, wrap the trim logic in a try-catch block and log any anomalies. If the API returns a pair-split error despite atomic trimming, it may indicate a non-standard message format or a bug in pair detection. Implement a fallback mechanism that resets context or alerts the operations team. Regularly review trim statistics to optimize window parameters for your specific workload.
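The wrapper described in the tip might look like the sketch below. It accepts any trim function, reports statistics through a callback, and on failure falls back to an untrimmed copy of the history. That fallback is only one assumption among several reasonable choices (resetting context or alerting are others), and safeTrim and TrimStats are hypothetical names.

```typescript
interface Msg {
  role: string;
  content: unknown;
}

interface TrimStats {
  before: number;  // message count going in
  after: number;   // message count coming out
  dropped: number; // before - after
  failed: boolean; // true when the trim function threw
}

// Wrap any trim function; never let a trimming bug crash the agent loop.
function safeTrim(
  trim: (messages: Msg[]) => Msg[],
  messages: Msg[],
  onStats: (stats: TrimStats) => void,
): Msg[] {
  try {
    const trimmed = trim(messages);
    onStats({
      before: messages.length,
      after: trimmed.length,
      dropped: messages.length - trimmed.length,
      failed: false,
    });
    return trimmed;
  } catch {
    // ASSUMED fallback: send the untrimmed history and flag the anomaly
    onStats({ before: messages.length, after: messages.length, dropped: 0, failed: true });
    return messages.slice();
  }
}

const trimLog: TrimStats[] = [];
const demoHistory: Msg[] = [
  { role: 'user', content: 'a' },
  { role: 'assistant', content: 'b' },
];

// Stand-in trim function that drops the oldest message.
const safeResult = safeTrim(m => m.slice(1), demoHistory, s => trimLog.push(s));
```

In a real loop the onStats callback would feed a logger or metrics client, which gives the trim statistics the tip recommends reviewing.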