tool-result-cache-rs: LRU Memoization for Agent Tool Calls in Rust

By Codcompass Team·2026-05-26·6 min read

Deterministic Tool Memoization: Reducing LLM Agent Latency and API Costs in Rust

Current Situation Analysis

Modern LLM agents operate as stateful loops, dynamically selecting tools based on context. While this flexibility enables complex reasoning, it introduces a critical inefficiency: redundant tool invocation. Agents frequently re-derive information or re-fetch static data across multiple turns, batch items, or retry cycles. Without a memoization layer, every tool call translates to a network request, database query, or external API hit, regardless of whether the inputs and expected outputs are identical to previous executions.

This redundancy is often overlooked because developers prioritize prompt engineering and model selection while neglecting the "plumbing" costs of tool execution. The consequences are measurable and severe:

Quota Exhaustion: External APIs with rate limits or daily quotas are consumed by duplicate requests. A batch job processing 200 entities might trigger 200 API calls, even if only 30 unique data points are required.
Latency Tax: Network round-trip times accumulate linearly with redundant calls. An agent loop that could resolve in milliseconds via cache hits instead suffers hundreds of milliseconds of unnecessary I/O.
Cost Inflation: Commercial tool providers charge per invocation. Redundant calls directly inflate operational costs without adding value.

Empirical analysis of agent workflows reveals that redundancy rates often exceed 80% in low-cardinality scenarios. For example, a research agent tasked with geocoding company headquarters may encounter "San Francisco" dozens of times. Without memoization, the agent issues a distinct API request for each occurrence, burning quota and increasing latency for identical results.

WOW Moment: Key Findings

Implementing an in-process LRU (Least Recently Used) memoization layer reduces the number of external tool executions from O(N), where N is the total number of calls, to O(U), where U is the number of unique inputs, as long as the cache can hold all U results and none expire mid-run. The following comparison illustrates the impact on a representative batch workload:

Strategy	Total API Requests	Unique Data Points	Effective Quota Usage	Avg Latency Impact
Naive Execution	200	30	100% (Quota Exhausted)	High (Network RTT per call)
LRU Memoization	30	30	15%	Low (Cache Hit ~0ms)
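The percentages in the table follow from simple arithmetic on the 200-call, 30-unique workload. The sketch below spells that out; the function names are illustrative, and it assumes the cache is large enough to hold every unique result and that nothing expires during the run.

```rust
/// Fraction of calls that are duplicates: (total - unique) / total.
fn redundancy_rate(total_calls: u32, unique_inputs: u32) -> f64 {
    if total_calls == 0 {
        return 0.0;
    }
    f64::from(total_calls.saturating_sub(unique_inputs)) / f64::from(total_calls)
}

/// Fraction of the naive request volume that still reaches the external API
/// when every duplicate is served from the cache.
fn quota_usage(total_calls: u32, unique_inputs: u32) -> f64 {
    if total_calls == 0 {
        return 0.0;
    }
    f64::from(unique_inputs.min(total_calls)) / f64::from(total_calls)
}
```

For the workload above, `redundancy_rate(200, 30)` is 0.85 and `quota_usage(200, 30)` is 0.15, which matches the 15% figure in the table.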

Why this matters: Memoization decouples agent logic from infrastructure constraints. For idempotent tools, agents can retry, backtrack, and explore multiple paths without paying for repeated calls. The cache absorbs duplicate requests, so external services see only unique queries. This enables larger batch sizes, longer agent lifespans, and more predictable cost models.

Core Solution

The optimal approach for Rust-based agents is an in-process, deterministic memoization store built on three pillars:

LRU Eviction: Caps memory usage by discarding least-recently accessed entries when capacity is reached.
Canonical Hashing: Generates stable cache keys by hashing the tool name and normalized arguments, ensuring argument order independence.
Lazy TTL: Validates expiration at read time, avoiding background sweep overhead while guaranteeing staleness bounds.
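To make the canonical hashing pillar concrete, here is a minimal, std-only sketch. It is not the library's implementation: it assumes flat string arguments instead of arbitrary JSON, and it uses `DefaultHasher` as a stand-in for SHA-256 so that the example needs no external crates. The names `canonical_args` and `cache_key` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Render arguments as a JSON-like string with keys sorted lexicographically,
/// so the caller's argument order does not affect the result.
fn canonical_args(args: &[(&str, &str)]) -> String {
    let mut sorted: Vec<(&str, &str)> = args.to_vec();
    sorted.sort(); // tuples sort by key first, then by value
    let body: Vec<String> = sorted
        .iter()
        .map(|(k, v)| format!("{:?}:{:?}", k, v)) // Debug formatting adds quotes and escapes
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Hash the tool name together with the canonical arguments.
/// DefaultHasher is a std-only stand-in for SHA-256 in this sketch.
fn cache_key(tool: &str, args: &[(&str, &str)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    tool.hash(&mut hasher);
    canonical_args(args).hash(&mut hasher);
    hasher.finish()
}
```

With this sketch, `[("city", "Austin"), ("state", "TX")]` and `[("state", "TX"), ("city", "Austin")]` both canonicalize to `{"city":"Austin","state":"TX"}` and therefore share a key. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a key that must outlive the process would need a fixed algorithm such as SHA-256.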

Architecture Decisions

Hand-Rolled LRU: Pulling in external LRU crates introduces transitive dependencies that may conflict with allocation strategies or unsafe code policies. A minimal LRU implementation (~250 lines) using a HashMap and doubly linked list provides O(1) operations with zero external dependencies beyond serde_json and sha2.
SHA-256 Keying: Arguments are serialized to canonical JSON (keys sorted lexicographically) before hashing. This ensures {"city": "Austin", "state": "TX"} and {"state": "TX", "city": "Austin"} produce identical keys. SHA-256 provides collision resistance suitable for production workloads.
Lazy Expiration: Entries are not proactively swept. When get is called, the entry's timestamp is checked. If expired, the entry is removed and None is returned. This avoids timer overhead and is sufficient for agent loops where entries are accessed frequently; an expired entry that is never read again simply stays in memory until LRU eviction reclaims it.
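The interaction between LRU eviction and lazy expiration can be sketched in a few dozen lines. This is a simplified illustration, not the ~250-line implementation described above: instead of a doubly linked list it stamps each entry with a logical clock and scans for the oldest stamp on eviction, which is O(n) rather than O(1). The type `LruTtlCache` and its methods are hypothetical, keys are plain `u64` values, and the current time is passed in explicitly so expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry<V> {
    value: V,
    inserted_at: Instant,
    last_used: u64, // logical clock tick of the most recent access
}

/// Hypothetical minimal store: LRU eviction plus TTL checked lazily on read.
struct LruTtlCache<V> {
    capacity: usize,
    ttl: Duration,
    tick: u64,
    entries: HashMap<u64, Entry<V>>,
}

impl<V: Clone> LruTtlCache<V> {
    fn new(capacity: usize, ttl: Duration) -> Self {
        Self { capacity, ttl, tick: 0, entries: HashMap::new() }
    }

    fn get(&mut self, key: u64, now: Instant) -> Option<V> {
        let expired = match self.entries.get(&key) {
            Some(e) => now.duration_since(e.inserted_at) >= self.ttl,
            None => return None,
        };
        if expired {
            // Lazy expiration: the stale entry is dropped only when it is read.
            self.entries.remove(&key);
            return None;
        }
        self.tick += 1;
        let entry = self.entries.get_mut(&key)?;
        entry.last_used = self.tick; // mark as most recently used
        Some(entry.value.clone())
    }

    fn insert(&mut self, key: u64, value: V, now: Instant) {
        if self.capacity == 0 {
            return;
        }
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            // O(n) scan for the least recently used key; a linked list avoids this.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(k, _)| *k);
            if let Some(k) = victim {
                self.entries.remove(&k);
            }
        }
        self.tick += 1;
        self.entries
            .insert(key, Entry { value, inserted_at: now, last_used: self.tick });
    }
}
```

Because expiry uses `>=`, a TTL of zero makes every read a miss, which is one way to express "caching disabled" for a store built like this.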

Implementation Example

The following example demonstrates a memoization wrapper for a location resolution tool. Note the use of a builder pattern for configuration and a canonical key generator.

```
use serde_json::json;
use std::time::Duration;

// Assume MemoStore is the rewritten library. LocationData, AgentError, and
// fetch_location_from_api are defined elsewhere in the agent.
fn build_store() -> MemoStore {
    MemoStore::builder()
        .capacity(1024)
        .ttl(Duration::from_secs(600))
        .build()
}

// The store is passed in explicitly: a free function cannot capture a local `let`.
async fn resolve_location(store: &MemoStore, city: &str) -> Result<LocationData, AgentError> {
    let args = json!({"city": city});
    let key = store.compute_key("resolve_location", &args);

    // Fast path: check the cache
    if let Some(cached) = store.lookup(&key) {
        return Ok(cached);
    }

    // Slow path: execute the tool. On error, `?` returns early and nothing is cached.
    let result = fetch_location_from_api(&args).await?;

    // Store the result for future calls
    store.insert(key, result.clone());
    Ok(result)
}
```

For ergonomic integration, a macro or wrapper function can abstract the lookup-insert pattern:

```
macro_rules! memoize_tool {
    ($store:expr, $tool_name:expr, $args:expr, $impl:expr) => {{
        let key = $store.compute_key($tool_name, $args);
        match $store.lookup(&key) {
            // Cache hit: $impl is never evaluated
            Some(val) => Ok(val),
            // Cache miss: run the tool; only successful results are cached
            None => match $impl {
                Ok(result) => {
                    $store.insert(key, result.clone());
                    Ok(result)
                }
                Err(e) => Err(e),
            },
        }
    }};
}

// Usage (inside an async fn that returns a Result)
let args = json!({"city": "Austin"});
let coords = memoize_tool!(
    store,
    "resolve_location",
    &args,
    fetch_location_from_api(&args).await
)?;
```

Why These Choices?

Builder Pattern: Allows granular configuration (capacity, TTL, eviction policy) and lets new options be added later without breaking existing callers.
Canonical JSON: Eliminates cache fragmentation caused by argument reordering. LLMs often generate JSON with non-deterministic key order; canonicalization ensures hits.
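Async Compatibility: The store must be safe to share across tasks. A std::sync::Mutex is sufficient when the lock is held only for the lookup or insert itself and never across an .await; tokio::sync::Mutex is needed only when the guard must stay alive across an await point. Either way, the tool call itself should run outside the lock so contention stays minimal.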
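One way to keep lock hold times short, whichever mutex type is chosen, is to take the lock twice for brief moments and run the tool in between. The following is a synchronous, std-only sketch of that shape; `SharedStore` and `get_or_compute` are hypothetical names, and the string-to-string map stands in for the real store.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared store: a plain map behind a mutex.
type SharedStore = Arc<Mutex<HashMap<String, String>>>;

/// Look up `key`; on a miss, run `tool` WITHOUT holding the lock,
/// then store the result.
fn get_or_compute<F>(store: &SharedStore, key: &str, tool: F) -> String
where
    F: FnOnce() -> String,
{
    // Critical section 1: a quick read. The guard is dropped at the end of this block.
    {
        let map = store.lock().unwrap();
        if let Some(hit) = map.get(key) {
            return hit.clone();
        }
    }

    // Slow path runs unlocked, so other callers can use the cache meanwhile.
    let result = tool();

    // Critical section 2: a quick write.
    store.lock().unwrap().insert(key.to_string(), result.clone());
    result
}
```

The trade-off in this design is that two callers missing on the same key at the same time may both run the tool. For idempotent, read-only tools that costs one extra call and nothing else.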

Pitfall Guide

Pitfall	Explanation	Fix
Caching Non-Idempotent Tools	Memoizing tools with side effects (e.g., `send_email`, `create_record`) silently drops actions: a repeated call returns the cached result of the first call, and the side effect the agent asked for never happens.	Audit all tools for idempotency. Only cache read-only or idempotent operations. Use idempotency keys for write operations.
High-Cardinality Arguments	If every tool call has unique arguments (e.g., `query_database` with distinct SQL), the cache hit rate approaches 0%. The overhead of hashing and storage outweighs benefits.	Monitor hit rates. Skip caching for high-cardinality tools. Use parameterized queries or result set caching instead.
Stale Data in Time-Sensitive Tools	Caching tools that require fresh data (e.g., `get_weather`, `stock_price`) with a long TTL returns outdated information.	Set TTL to zero or disable caching for real-time tools. Use short TTLs (seconds) if slight staleness is acceptable.
Cross-Process Assumptions	In-process caches are isolated to the runtime. Distributed agents or multi-worker deployments cannot share cache state.	Use a distributed cache (Redis, Memcached) for cross-process scenarios. The in-process cache can serve as a local L1 layer.
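Lock Contention in Async Loops	Holding a lock across an `.await` (for example, while the tool call is in flight) serializes every task that touches the cache; with `std::sync::Mutex` it can also block an executor thread.	Lock only around the lookup and the insert, and run the tool call unlocked. `std::sync::Mutex` is fine for such short critical sections; use `tokio::sync::Mutex` or `RwLock` if the guard must be held across an `.await`.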
Memory Bloat from Large Results	Caching large payloads (e.g., full document text) can exhaust memory quickly, even with LRU eviction.	Limit cache capacity based on result size. Serialize results to compact formats. Evict large entries aggressively.
Serialization Drift	Changes in argument structure or tool signature can invalidate cache keys, causing silent misses.	Version cache keys or include schema hashes. Test cache stability after tool updates.
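Several of these fixes reduce to deciding, per tool, whether caching is allowed at all and for how long. A hedged sketch of such a policy table follows; the `CachePolicy` enum and both functions are hypothetical, the tool names are the examples used in this article, and the 24-hour geocoding TTL is an illustrative value.

```rust
use std::time::Duration;

/// How a given tool's results may be cached.
#[derive(Debug, PartialEq)]
enum CachePolicy {
    /// Never cache: side effects, real-time data, or a near-zero hit rate.
    Bypass,
    /// Cache results for at most this long.
    Ttl(Duration),
}

/// Illustrative allow-list. Anything not explicitly listed is bypassed,
/// so a newly added tool is never cached by accident.
fn policy_for(tool: &str) -> CachePolicy {
    match tool {
        "resolve_location" => CachePolicy::Ttl(Duration::from_secs(24 * 60 * 60)),
        "send_email" | "create_record" => CachePolicy::Bypass, // non-idempotent
        "get_weather" | "stock_price" => CachePolicy::Bypass,  // time-sensitive
        "query_database" => CachePolicy::Bypass,               // high cardinality
        _ => CachePolicy::Bypass,
    }
}

/// Prefix the key material with a schema version so that a changed tool
/// signature produces fresh keys instead of reusing old entries.
fn versioned_key_material(tool: &str, schema_version: u32, canonical_args: &str) -> String {
    format!("{tool}@v{schema_version}:{canonical_args}")
}
```

Defaulting unknown tools to `Bypass` is the conservative choice: a missed caching opportunity costs an API call, while a wrongly cached write costs correctness.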

Production Bundle

Action Checklist

Audit Tool Idempotency: Review all agent tools. Mark each as idempotent or non-idempotent. Only enable caching for idempotent tools.
Define TTL Policies: Assign TTLs based on data freshness requirements. Static config: 1 hour. Geocoding: 24 hours. Weather: 0 seconds.
Monitor Hit/Miss Ratios: Instrument the cache to track hits, misses, and evictions. Alert if hit rate drops below 50% for cached tools.
Validate Canonical Serialization: Ensure argument objects are serialized with sorted keys. Test with reordered JSON to confirm key stability.
Set Capacity Limits: Configure LRU capacity based on expected unique inputs and available memory. Start with 1024 entries and tune based on eviction rates.
Test Duplicate Inputs: Simulate agent loops with repeated tool calls. Verify cache hits reduce API requests as expected.
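Handle Async Locking: Never hold the cache lock while a tool call is in flight. Use tokio::sync::Mutex only if a guard must live across an .await; otherwise std::sync::Mutex with short critical sections is enough. Profile lock contention under load.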
Implement Graceful Degradation: If the cache fails (e.g., serialization error), fall back to direct tool execution without crashing the agent.
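The monitoring item in this checklist can be sketched as a small counter struct with a hit-rate check. The field names mirror the `hits`, `misses`, and `evictions` values printed later in the Quick Start Guide, but the struct itself, `hit_rate`, `should_alert`, and the `min_lookups` guard are assumptions made for illustration.

```rust
/// Hypothetical counters a cache could update on each lookup and eviction.
#[derive(Debug, Default)]
struct CacheStats {
    hits: u64,
    misses: u64,
    evictions: u64,
}

impl CacheStats {
    /// Hit rate in [0.0, 1.0]; None until at least one lookup has happened.
    fn hit_rate(&self) -> Option<f64> {
        let lookups = self.hits + self.misses;
        if lookups == 0 {
            None
        } else {
            Some(self.hits as f64 / lookups as f64)
        }
    }

    /// True when enough lookups have been seen to be meaningful and the
    /// hit rate is below `threshold` (e.g. 0.5 for the 50% rule above).
    fn should_alert(&self, threshold: f64, min_lookups: u64) -> bool {
        let lookups = self.hits + self.misses;
        lookups >= min_lookups && self.hit_rate().map_or(false, |r| r < threshold)
    }
}
```

The `min_lookups` guard keeps a cold cache from triggering alerts: the first few calls are always misses, so a low hit rate only means something after a reasonable number of lookups.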

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local Agent with Low Cardinality	In-Process LRU	Zero network overhead, simple setup, high hit rate.	Cuts API calls in proportion to the redundancy rate (e.g., 85% when 30 of 200 inputs are unique).
Batch Job with Static Data	In-Process LRU + Disk Persistence	Results persist across runs, avoiding recomputation.	Eliminates redundant API calls entirely.
Distributed Multi-Worker Agent	Redis + In-Process L1	Shared state across workers, local cache for speed.	Moderate infrastructure cost, high efficiency.
Real-Time Data Tool	No Cache or TTL=0	Freshness is critical; cache would return stale data.	No cost savings, but prevents errors.
High-Cardinality Query Tool	No Cache	Hit rate near 0%; cache overhead wastes resources.	No impact; avoids unnecessary hashing.

Configuration Template

```
use std::time::Duration;
// Assumes these types are exported at the crate root alongside MemoStore.
use tool_memo_rs::{EvictionPolicy, HashAlgorithm, MemoConfig, MemoStore};

let config = MemoConfig {
    capacity: 2048,
    default_ttl: Duration::from_secs(3600),
    eviction_policy: EvictionPolicy::LRU,
    key_hasher: HashAlgorithm::SHA256,
    canonical_json: true,
};

let store = MemoStore::new(config);

// Per-tool TTL override
store.set_tool_ttl("get_weather", Duration::ZERO); // real-time data: never served from cache
store.set_tool_ttl("resolve_location", Duration::from_secs(86400)); // geocoding: 24 hours
```

Quick Start Guide

Add Dependency: Include the memoization crate in Cargo.toml.
```
[dependencies]
tool-memo-rs = "0.2"
serde_json = "1"
```

Initialize Store: Create a MemoStore with capacity and TTL.

```
let store = MemoStore::builder()
    .capacity(1024)
    .ttl(Duration::from_secs(600))
    .build();
```

Wrap Tool Calls: Use the lookup-insert pattern or macro to wrap idempotent tools.
```
let result = memoize_tool!(store, "my_tool", &args, my_tool_impl(&args))?;
```

Monitor: Log cache stats periodically to verify hit rates and adjust capacity/TTL.

```
let stats = store.stats();
println!("Hits: {}, Misses: {}, Evictions: {}", stats.hits, stats.misses, stats.evictions);
```

Deploy: Run the agent. Redundant calls will be absorbed by the cache, reducing latency and API usage.
