trait that exposes only what the LLM requires, while hiding execution mechanics.
```rust
use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
pub trait AutonomousTool: Send + Sync {
    /// Identifier exposed to the model for routing
    fn identifier(&self) -> &'static str;

    /// Natural language description for model selection
    fn capability(&self) -> &'static str;

    /// JSON Schema defining expected input structure
    fn parameter_schema(&self) -> Value;

    /// Actual execution logic, isolated from model context
    async fn invoke(&self, payload: Value, runtime: &SandboxContext) -> ToolOutcome;
}

pub struct SandboxContext {
    pub workspace_root: std::path::PathBuf,
    pub memory_limit_mb: u64,
    pub cpu_fuel: u64,
}

pub struct ToolOutcome {
    pub success: bool,
    pub output: String,
    pub diagnostics: Option<String>,
}
```
Rationale: The model only interacts with identifier, capability, and parameter_schema. It never sees invoke or SandboxContext. This asymmetric information design keeps runtime details out of the prompt, which narrows the surface for prompt injection through tool metadata and keeps the orchestration layer decoupled from runtime specifics.
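To illustrate the split, here is a minimal sketch of how an orchestration layer might assemble the model-facing manifest. It uses a simplified, dependency-free stand-in for the metadata half of the trait: `ToolMetadata`, `ReadFileTool`, `ToolManifestEntry`, and `build_manifest` are hypothetical names, and the schema is carried as raw JSON text rather than `serde_json::Value`.

```rust
/// Simplified stand-in for the model-facing half of `AutonomousTool`.
/// (Assumption: the real trait returns `serde_json::Value` for the schema.)
pub trait ToolMetadata {
    fn identifier(&self) -> &'static str;
    fn capability(&self) -> &'static str;
    fn parameter_schema(&self) -> String;
}

/// Hypothetical example tool used only for illustration.
pub struct ReadFileTool;

impl ToolMetadata for ReadFileTool {
    fn identifier(&self) -> &'static str {
        "read_file"
    }

    fn capability(&self) -> &'static str {
        "Read a UTF-8 text file from the session workspace."
    }

    fn parameter_schema(&self) -> String {
        r#"{"type":"object","properties":{"path":{"type":"string"}},"required":["path"]}"#
            .to_string()
    }
}

/// Everything the model is allowed to see about a tool.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolManifestEntry {
    pub name: String,
    pub description: String,
    pub parameters: String,
}

/// Builds the manifest from metadata only; execution logic is never touched.
pub fn build_manifest(tools: &[Box<dyn ToolMetadata>]) -> Vec<ToolManifestEntry> {
    tools
        .iter()
        .map(|tool| ToolManifestEntry {
            name: tool.identifier().to_string(),
            description: tool.capability().to_string(),
            parameters: tool.parameter_schema(),
        })
        .collect()
}
```

Because `build_manifest` only accepts the metadata view, nothing about sandbox paths or resource budgets can reach the prompt by accident.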
Step 2: Wire the Communication Channels (Outer → Middle Boundary)
User interactions require low-latency commands, real-time streaming, and idempotent history retrieval. Mixing these concerns on a single protocol creates bottlenecks. We separate them using Axum for REST and a dedicated WebSocket handler for streaming.
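```rust
use axum::{
    extract::WebSocketUpgrade,
    routing::{get, post},
    Router,
};
use tower_http::cors::CorsLayer;

// `create_session`, `fetch_history`, `parse_command`, `spawn_reasoning_loop`,
// `terminate_session`, and `AgentCommand` are defined elsewhere in the gateway.

pub fn build_gateway_router() -> Router {
    Router::new()
        .route("/api/v1/sessions", post(create_session))
        // `:id` is the axum 0.7 path syntax; axum 0.8 writes this segment as `{id}`
        .route("/api/v1/sessions/:id/history", get(fetch_history))
        .route("/ws/agent", get(handle_agent_stream))
        // Permissive CORS is convenient in development; restrict origins in production
        .layer(CorsLayer::permissive())
}

async fn handle_agent_stream(ws: WebSocketUpgrade) -> axum::response::Response {
    ws.on_upgrade(|mut socket| async move {
        // Read frames until the client disconnects or a receive error occurs
        while let Some(Ok(msg)) = socket.recv().await {
            if let Some(cmd) = parse_command(msg) {
                match cmd {
                    AgentCommand::Execute { prompt, session_id } => {
                        // Awaited inline: a Cancel sent mid-run is not read
                        // from the socket until this call returns.
                        spawn_reasoning_loop(prompt, session_id, &mut socket).await;
                    }
                    AgentCommand::Cancel { session_id } => {
                        terminate_session(session_id).await;
                    }
                }
            }
        }
    })
}
```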
Rationale: REST handles request/response operations: session creation via POST, and history retrieval via GET, which is idempotent and cacheable. WebSocket carries bidirectional commands and server-to-client streaming events over one connection. This separation avoids long-polling overhead and lets the UI receive incremental token output without holding an HTTP request open.
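The handler above relies on a command parser that is not shown. As a rough sketch of what it could look like, the following parses a text frame into an `AgentCommand`. The wire format (`execute <session_id> <prompt>` and `cancel <session_id>`) is purely an assumption for illustration, and this version takes a `&str` rather than an axum `Message`; a real gateway would more likely decode JSON.

```rust
#[derive(Debug, PartialEq)]
pub enum AgentCommand {
    Execute { prompt: String, session_id: String },
    Cancel { session_id: String },
}

/// Parses one text frame using a hypothetical space-separated wire format:
///   "execute <session_id> <prompt...>"  or  "cancel <session_id>"
/// Returns `None` for anything malformed so the caller can ignore the frame.
pub fn parse_command(frame: &str) -> Option<AgentCommand> {
    // At most three parts: verb, session id, and the rest of the line as the prompt
    let mut parts = frame.trim().splitn(3, ' ');
    let verb = parts.next()?;
    let session_id = parts.next().filter(|s| !s.is_empty())?.to_string();

    match verb {
        "execute" => {
            let prompt = parts.next()?.trim();
            if prompt.is_empty() {
                return None;
            }
            Some(AgentCommand::Execute {
                prompt: prompt.to_string(),
                session_id,
            })
        }
        "cancel" => {
            // Reject trailing garbage after the session id
            if parts.next().is_some() {
                return None;
            }
            Some(AgentCommand::Cancel { session_id })
        }
        _ => None,
    }
}
```

Returning `Option` matches how the handler uses the parser: unrecognized frames are dropped instead of tearing down the connection.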
Step 3: Sandbox the Execution Engine (Middle → Inner Boundary)
The execution engine must enforce strict boundaries before loading any module. We use Wasmtime with WASI, applying resource limits and path mapping at initialization.
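```rust
// Sketch written against the wasmtime / wasmtime-wasi 24.x API (an assumption);
// the WASI builder and linker paths differ in other releases.
use wasmtime::{Config, Engine, Linker, Module, Store, StoreLimits, StoreLimitsBuilder};
use wasmtime_wasi::preview1::{self, WasiP1Ctx};
use wasmtime_wasi::{DirPerms, FilePerms, WasiCtxBuilder};

/// Per-invocation store data: WASI state plus resource limits.
struct SandboxState {
    wasi: WasiP1Ctx,
    limits: StoreLimits,
}

pub struct WasmRuntime {
    engine: Engine,
}

impl WasmRuntime {
    pub fn new() -> Self {
        let mut config = Config::new();
        config.consume_fuel(true); // enable CPU metering; the budget is set per store
        config.cranelift_opt_level(wasmtime::OptLevel::Speed);
        Self {
            engine: Engine::new(&config).expect("failed to initialize Wasmtime engine"),
        }
    }

    /// Synchronous on purpose: call it from a blocking thread, not the async reactor.
    pub fn execute_tool(
        &self,
        wasm_bytes: &[u8],
        input_json: &str,
        workspace: &std::path::Path,
    ) -> Result<String, Box<dyn std::error::Error>> {
        let module = Module::from_binary(&self.engine, wasm_bytes)?;

        // Expose only the workspace, mounted at /sandbox_root inside the guest
        let mut builder = WasiCtxBuilder::new();
        builder
            .preopened_dir(workspace, "/sandbox_root", DirPerms::all(), FilePerms::all())?
            .inherit_stdio();
        let wasi = builder.build_p1();

        // Memory is capped through a store limiter; Config has no memory-size setter
        let limits = StoreLimitsBuilder::new()
            .memory_size(256 * 1024 * 1024) // 256 MiB hard limit
            .build();
        let mut store = Store::new(&self.engine, SandboxState { wasi, limits });
        store.limiter(|state| &mut state.limits);
        store.set_fuel(10_000_000)?; // CPU instruction metering

        // Register the WASI imports before instantiating the module
        let mut linker: Linker<SandboxState> = Linker::new(&self.engine);
        preview1::add_to_linker_sync(&mut linker, |state: &mut SandboxState| &mut state.wasi)?;
        let instance = linker.instantiate(&mut store, &module)?;

        let run_func = instance.get_func(&mut store, "run").ok_or("Missing entrypoint")?;
        let run = run_func.typed::<(i32, i32), i32>(&store)?;

        // Allocate memory, write input JSON, invoke, read output
        // ... (memory management omitted for brevity)
        let _ = (run, input_json);
        Ok("Execution completed".to_string())
    }
}
```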
Rationale: Fuel metering prevents infinite loops. Memory limits prevent heap exhaustion. WASI preopen restricts filesystem access to a single mapped directory. Cranelift optimization ensures near-native performance without sacrificing safety. This configuration creates a deterministic execution environment where the agent can operate autonomously without risking host stability.
Pitfall Guide
1. Unbounded WASI Preopen Paths
Explanation: Mapping the entire host filesystem or using relative paths without validation allows tools to escape the sandbox and access sensitive directories.
Fix: Always use absolute, chroot-style mappings. Validate that all requested paths resolve within the designated workspace root before passing them to the WASI builder.
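A minimal sketch of the validation step described in this fix, using only the standard library. The function name `resolve_in_workspace` is hypothetical. It assumes the requested path already exists, because `canonicalize` fails on missing paths; validating a path that is about to be created would need to canonicalize its parent instead.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `requested` against `workspace_root` and rejects anything that
/// ends up outside the root after symlinks and `..` segments are resolved.
pub fn resolve_in_workspace(workspace_root: &Path, requested: &Path) -> io::Result<PathBuf> {
    let root = workspace_root.canonicalize()?;

    // Note: joining an absolute `requested` replaces `root` entirely,
    // which is exactly why the prefix check below is required.
    let candidate = root.join(requested).canonicalize()?;

    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the workspace root",
        ))
    }
}
```

Both sides are canonicalized before comparison, so a symlink inside the workspace that points elsewhere on the host is rejected as well.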
2. Blocking the Async Executor with Heavy WASM Calls
Explanation: Wasmtime function calls are synchronous unless the engine is configured with async support. Calling them directly on the async runtime blocks its worker thread, causing WebSocket timeouts and UI freezes.
Fix: Wrap WASM execution in tokio::task::spawn_blocking or use a dedicated thread pool. Return control to the async runtime immediately after scheduling.
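The dedicated-thread variant of this fix can be sketched with the standard library alone. `BlockingWorker` and its methods are hypothetical names; in a Tokio application, `tokio::task::spawn_blocking` gives the same shape with less code.

```rust
use std::sync::mpsc;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// A single background thread that runs blocking jobs in submission order.
pub struct BlockingWorker {
    sender: mpsc::Sender<Job>,
}

impl BlockingWorker {
    pub fn spawn() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // The loop ends once every sender has been dropped
            for job in receiver {
                job();
            }
        });
        Self { sender }
    }

    /// Schedules `work` and returns immediately with a receiver for its result.
    pub fn submit<T, F>(&self, work: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (result_tx, result_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; that is not an error here
            let _ = result_tx.send(work());
        });
        self.sender.send(job).expect("worker thread has stopped");
        result_rx
    }
}
```

This sketch does not isolate panics: a job that panics takes the worker thread down, so a production version would catch the panic or respawn the thread.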
3. Leaking Implementation Details into Tool Descriptions
Explanation: Injecting implementation details, error codes, or internal state into the tool description confuses the model and increases token consumption.
Fix: Keep capability descriptions concise and action-oriented. Move technical constraints to parameter_schema validation and runtime error handling.
4. Ignoring Fuel and Memory Limits
Explanation: Running WASM modules without metering allows a single tool call to consume 100% CPU or allocate gigabytes of memory, crashing the host process.
Fix: Always enable consume_fuel(true) and cap linear memory with a per-store limiter (for example StoreLimitsBuilder::memory_size). Implement graceful trap handling that converts fuel exhaustion into a structured ToolOutcome error.
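One way to structure that conversion is sketched below. `SandboxFailure` and `outcome_from_result` are hypothetical: the enum stands in for whatever classification the runtime layer derives from a Wasmtime trap, and `ToolOutcome` is repeated from Step 1 so the example is self-contained.

```rust
pub struct ToolOutcome {
    pub success: bool,
    pub output: String,
    pub diagnostics: Option<String>,
}

/// Hypothetical classification of sandbox failures. A real runtime layer
/// would derive these from the error returned by the WASM call.
#[derive(Debug)]
pub enum SandboxFailure {
    FuelExhausted { budget: u64 },
    MemoryLimitExceeded { limit_mb: u64 },
    Trap(String),
}

/// Converts a raw execution result into the structured outcome the
/// orchestration layer reports back to the model.
pub fn outcome_from_result(result: Result<String, SandboxFailure>) -> ToolOutcome {
    match result {
        Ok(output) => ToolOutcome {
            success: true,
            output,
            diagnostics: None,
        },
        Err(failure) => {
            let diagnostics = match failure {
                SandboxFailure::FuelExhausted { budget } => {
                    format!("fuel budget of {budget} units exhausted")
                }
                SandboxFailure::MemoryLimitExceeded { limit_mb } => {
                    format!("memory limit of {limit_mb} MB exceeded")
                }
                SandboxFailure::Trap(message) => format!("module trapped: {message}"),
            };
            ToolOutcome {
                success: false,
                output: String::new(),
                diagnostics: Some(diagnostics),
            }
        }
    }
}
```

Keeping the failure text in `diagnostics` rather than `output` lets the orchestration layer decide how much of it the model should see.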
5. Mixing Streaming and Stateful REST Endpoints
Explanation: Attempting to return streaming token outputs via standard HTTP responses forces long-polling or chunked encoding that breaks caching and increases latency.
Fix: Reserve REST for idempotent operations (history, config, session management). Use WebSocket exclusively for real-time command dispatch and incremental output streaming.
6. Failing to Cache Compiled WASM Modules
Explanation: Parsing and compiling .wasm binaries on every invocation adds 50–100ms of overhead per tool call, degrading user experience during multi-turn reasoning.
Fix: Compile each module once at startup and reuse it, or precompile to Wasmtime's .cwasm format (Engine::precompile_module, loaded back with Module::deserialize). Store compiled artifacts in memory or on disk with versioned keys.
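The in-memory half of this fix can be sketched as a content-keyed cache. `ModuleCache` is a hypothetical helper that is generic over the compiled artifact, so the example runs without Wasmtime; in practice `M` would be `wasmtime::Module` and the closure would call `Module::from_binary`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Caches compiled artifacts keyed by a hash of the module bytes.
pub struct ModuleCache<M> {
    entries: Mutex<HashMap<u64, Arc<M>>>,
}

impl<M> ModuleCache<M> {
    pub fn new() -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached artifact for `wasm_bytes`, compiling it on first use.
    /// Simplification: the lock is held while compiling, so concurrent callers wait.
    pub fn get_or_compile<E>(
        &self,
        wasm_bytes: &[u8],
        compile: impl FnOnce(&[u8]) -> Result<M, E>,
    ) -> Result<Arc<M>, E> {
        let key = content_key(wasm_bytes);
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(&key) {
            return Ok(Arc::clone(hit));
        }
        let compiled = Arc::new(compile(wasm_bytes)?);
        entries.insert(key, Arc::clone(&compiled));
        Ok(compiled)
    }
}

/// 64-bit content hash. Collisions are possible in principle; a production
/// cache would more likely key on a cryptographic digest plus a version tag.
fn content_key(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}
```

Failed compilations are not cached, so a corrupt upload can be retried after it is replaced.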
7. Bypassing the Tool Trait Contract
Explanation: Allowing arbitrary async functions to bypass the trait contract leads to inconsistent error handling, missing schema validation, and unpredictable model routing.
Fix: Enforce the trait at the registry level. Validate all inputs against parameter_schema before invocation. Standardize ToolOutcome structure across all implementations.
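A rough sketch of registry-level enforcement follows. To stay dependency-free it makes two loud simplifications: the schema is reduced to a list of required field names, and payloads are string maps instead of JSON values. `RegisteredTool`, `ToolRegistry`, and `EchoTool` are hypothetical names.

```rust
use std::collections::HashMap;

pub struct ToolOutcome {
    pub success: bool,
    pub output: String,
    pub diagnostics: Option<String>,
}

/// Simplified, synchronous stand-in for the tool trait.
pub trait RegisteredTool {
    fn identifier(&self) -> &'static str;
    /// Stand-in for `parameter_schema`: the names of required fields.
    fn required_fields(&self) -> &'static [&'static str];
    fn invoke(&self, payload: &HashMap<String, String>) -> ToolOutcome;
}

pub struct ToolRegistry {
    tools: HashMap<&'static str, Box<dyn RegisteredTool>>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    pub fn register(&mut self, tool: Box<dyn RegisteredTool>) {
        self.tools.insert(tool.identifier(), tool);
    }

    /// The only path to `invoke`: unknown tools and invalid payloads are
    /// rejected here, with the same outcome shape as a runtime failure.
    pub fn dispatch(&self, name: &str, payload: &HashMap<String, String>) -> ToolOutcome {
        let tool = match self.tools.get(name) {
            Some(tool) => tool,
            None => return failure(format!("unknown tool: {name}")),
        };
        for field in tool.required_fields() {
            if !payload.contains_key(*field) {
                return failure(format!("missing required field: {field}"));
            }
        }
        tool.invoke(payload)
    }
}

fn failure(message: String) -> ToolOutcome {
    ToolOutcome {
        success: false,
        output: String::new(),
        diagnostics: Some(message),
    }
}

/// Hypothetical tool that returns its `text` argument unchanged.
pub struct EchoTool;

impl RegisteredTool for EchoTool {
    fn identifier(&self) -> &'static str {
        "echo"
    }
    fn required_fields(&self) -> &'static [&'static str] {
        &["text"]
    }
    fn invoke(&self, payload: &HashMap<String, String>) -> ToolOutcome {
        ToolOutcome {
            success: true,
            output: payload["text"].clone(),
            diagnostics: None,
        }
    }
}
```

Because validation runs before `invoke`, individual tools can index required fields directly without repeating the checks.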
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-frequency autonomous coding | WASM Sandbox + Cranelift Cache | Sub-10ms cold start, deterministic isolation, hot-swappable tools | Low compute overhead, moderate dev complexity |
| Enterprise compliance / air-gapped | Containerized Execution | Full OS isolation, auditable image manifests, network egress control | High memory footprint, slower scaling |
| Rapid prototyping / internal tools | Direct Process Spawn | Zero setup, native library access, simple debugging | High security risk, no resource limits |
| Multi-tenant SaaS deployment | WASM Sandbox + Per-Session Fuel | Strict tenant isolation, predictable billing, no cross-contamination | Requires careful quota management |
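
For the per-session fuel row, quota management might look like the following sketch. `FuelLedger` is a hypothetical helper: it tracks how much of a session's total budget has been spent and decides how much fuel the next invocation may receive. The assumption is that the caller passes the grant to the store before the call and records actual consumption afterwards.

```rust
use std::collections::HashMap;

/// Tracks fuel spent per session against a fixed per-session budget.
pub struct FuelLedger {
    per_session_budget: u64,
    spent: HashMap<String, u64>,
}

impl FuelLedger {
    pub fn new(per_session_budget: u64) -> Self {
        Self {
            per_session_budget,
            spent: HashMap::new(),
        }
    }

    /// Fuel the session has not used yet.
    pub fn remaining(&self, session_id: &str) -> u64 {
        let spent = self.spent.get(session_id).copied().unwrap_or(0);
        self.per_session_budget.saturating_sub(spent)
    }

    /// Fuel to grant the next invocation: the per-call cap or whatever is
    /// left, whichever is smaller. `None` means the session is out of budget.
    pub fn grant(&self, session_id: &str, per_call_cap: u64) -> Option<u64> {
        let available = self.remaining(session_id).min(per_call_cap);
        if available == 0 {
            None
        } else {
            Some(available)
        }
    }

    /// Records what an invocation actually consumed.
    pub fn record(&mut self, session_id: &str, consumed: u64) {
        let entry = self.spent.entry(session_id.to_string()).or_insert(0);
        *entry = entry.saturating_add(consumed);
    }
}
```

Sessions are accounted independently, so one tenant exhausting its budget does not affect the fuel available to another.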
Configuration Template
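```rust
// production_runtime.rs
// Sketch written against the wasmtime / wasmtime-wasi 24.x API (an assumption);
// adjust the WASI imports for other releases.
use std::path::PathBuf;
use wasmtime::{Config, Engine, StoreLimits, StoreLimitsBuilder};
use wasmtime_wasi::preview1::WasiP1Ctx;
use wasmtime_wasi::{DirPerms, FilePerms, WasiCtxBuilder};

pub struct ProductionSandboxConfig {
    pub workspace: PathBuf,
    pub memory_mb: u64,
    /// Applied per invocation with `store.set_fuel(fuel_budget)`
    pub fuel_budget: u64,
    /// Not enforced by this template; apply it wherever the host performs network calls
    pub network_whitelist: Vec<String>,
}

impl ProductionSandboxConfig {
    pub fn to_engine(&self) -> Engine {
        let mut cfg = Config::new();
        cfg.consume_fuel(true);
        cfg.cranelift_opt_level(wasmtime::OptLevel::Speed);
        cfg.wasm_multi_memory(true);
        Engine::new(&cfg).expect("Failed to initialize Wasmtime engine")
    }

    /// Memory cap, installed per store via `store.limiter(...)`
    pub fn to_limits(&self) -> StoreLimits {
        StoreLimitsBuilder::new()
            .memory_size((self.memory_mb as usize) * 1024 * 1024)
            .build()
    }

    pub fn to_wasi(&self) -> WasiP1Ctx {
        let mut builder = WasiCtxBuilder::new();
        builder
            .preopened_dir(&self.workspace, "/workspace", DirPerms::all(), FilePerms::all())
            .expect("Invalid workspace path")
            .inherit_stdio();
        builder.build_p1()
    }
}

// Usage in orchestration layer
fn build_session_sandbox() -> (Engine, StoreLimits, WasiP1Ctx) {
    let config = ProductionSandboxConfig {
        workspace: PathBuf::from("/var/agent/sessions/abc123"),
        memory_mb: 128,
        fuel_budget: 5_000_000,
        network_whitelist: vec!["api.openai.com".into()],
    };
    let engine = config.to_engine();
    let wasi_ctx = config.to_wasi();
    (engine, config.to_limits(), wasi_ctx)
}
```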
Quick Start Guide
- Initialize the workspace: Create an isolated directory for each agent session. Set strict filesystem permissions and map it as the WASI preopen root.
- Compile tool modules: Use cargo build --target wasm32-wasi (named wasm32-wasip1 on newer Rust toolchains) to generate .wasm artifacts. Pre-compile them to .cwasm with the wasmtime compile command.
- Wire the gateway: Deploy an Axum server with separate REST and WebSocket routes. Configure CORS, rate limiting, and authentication middleware.
- Launch the reasoning loop: Initialize the LLM provider, inject the tool registry, and start the turn-based execution cycle. Monitor fuel consumption and memory usage per invocation.
- Validate end-to-end: Send a test command via WebSocket. Verify that the model selects the correct tool, the sandbox enforces boundaries, and the UI receives streaming output without blocking.