A Query Engine for the Agents

By Codcompass Team · 2026-05-28 · 7 min read

Beyond SQL: Architecting Client-Side Analytics for AI Agent Workloads

Current Situation Analysis

Production telemetry is undergoing a structural shift. Unstructured agent narratives—reasoning chains, chat transcripts, tool-use traces, and model output logs—now outpace structured metrics in both volume and velocity. Engineering teams need to interrogate this data to understand failure modes, optimize prompt strategies, and audit agent behavior. Traditional SQL engines, however, hit a fundamental wall: they lack native pathways to evaluate semantic content. You cannot filter for "where the agent exhibited confusion" or "summarize tool-call failures" using standard relational operators without injecting a model into the query execution path.

This gap is frequently overlooked because data engineering tooling has been optimized for server-side, batch-oriented lakehouses. Platforms like Spark, Trino, and managed cloud warehouses assume heavy JVM runtimes, persistent network connections, and dedicated compute clusters. They do not align with the execution model of modern AI-native applications. Client-side agents (Cursor, Claude Code, browser-based copilots, and per-turn sandbox environments) operate inside constrained JavaScript runtimes where spinning up a WASM-heavy analytical engine or maintaining a persistent backend connection introduces unacceptable latency and bundle bloat.

The constraint is threefold. First, the analytics layer must be JS-native to drop directly into the host process without cross-process serialization. Second, the distribution footprint must remain minimal to support cold-start scenarios and ephemeral agent sandboxes. Third, the execution engine must interleave traditional analytic operators (filters, sorts, aggregations) with asynchronous, model-based interpretation. Without lazy evaluation and async-native scheduling, expensive LLM calls will either block the event loop or burn tokens on irrelevant rows. The industry lacks a runtime that satisfies all three simultaneously, forcing teams to either ship telemetry to remote warehouses (introducing latency and privacy friction) or write brittle, imperative scripts that cannot scale.

WOW Moment: Key Findings

The performance gap between traditional WASM analytical engines and async-native JavaScript runtimes becomes stark when queries require model-in-the-loop evaluation. By decoupling predicate evaluation from row materialization and scheduling LLM calls as non-blocking async UDFs, a client-side engine can cut both latency and token spend by large factors:

Approach	Filter-Bounded Latency (vs. baseline)	Sort-Bounded Latency (vs. baseline)	Token Cost, 10-Task Suite (% of baseline)
Traditional WASM Engine (DuckDB-WASM)	Baseline	Baseline	100%
Async-Native JS Engine (Hyperparam Stack)	300x faster	192x faster	~33%

This finding matters because it shows that semantic analytics do not necessarily require server-side infrastructure. The 300x improvement on filter-bounded queries stems from async UDF scheduling: the engine defers LLM evaluation until downstream operators actually request the result, rather than evaluating every row upfront. The 192x gain on sort-bounded queries comes from streaming partial results and applying top-K logic before invoking expensive model calls. Completing a ten-task agent analyst suite at roughly two-thirds lower token cost shows that token budgeting is no longer a theoretical concern; it is an architectural property of the query engine itself. Teams can run real-time semantic audits directly in the client process, eliminating round-trips to remote warehouses while maintaining strict cost controls.

Core Solution

Building a client-side analytics pipeline for agent telemetry requires three architectural layers: a storage connector that reads columnar formats directly from object storage, an async execution scheduler that interleaves SQL operators with model calls, and a lazy evaluation engine that materializes cells only when demanded.

Step 1: Initialize the Storage Layer

The engine must read Apache Iceberg metadata and stream Parquet files without loading whole files into memory. This requires a JS-native connector that handles snapshot isolation, manifest pruning, and byte-range requests.

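import { IcebergConnector } from '@hyperparam/icebird';
import { ParquetReader } from '@hyperparam/hyparquet';

// Iceberg connector: resolves table metadata and snapshots, prunes manifests,
// and reads data files from object storage.
const storage = new IcebergConnector({
  endpoint: 'https://s3.us-east-1.amazonaws.com',
  bucket: 'agent-telemetry-prod',
  tablePath: 'warehouse/agent_traces',
  // 'latest' is fine for exploration; pin an explicit snapshot ID for
  // production analytics (see pitfall 3 below).
  snapshotId: 'latest',
  credentials: process.env.AWS_CREDENTIALS
});

// Streaming Parquet reader with bounded request concurrency and read-ahead.
const reader = new ParquetReader({
  maxConcurrency: 4,
  prefetchRows: 256,
  compression: 'snappy'
});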
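The byte-range behavior can be illustrated without the packages above. The sketch below is not the icebird or hyparquet API; it relies only on the Parquet file layout, which ends with the encoded file metadata, a 4-byte little-endian metadata length, and the magic bytes `PAR1`. `RangeReader`, `fetchParquetFooter`, and `httpRangeReader` are hypothetical names.

```typescript
// Returns bytes [start, end) of a remote object. Kept abstract so the same
// logic works over HTTP, a local file, or an in-memory buffer.
type RangeReader = (start: number, end: number) => Promise<Uint8Array>;

const MAGIC = [0x50, 0x41, 0x52, 0x31]; // "PAR1"

function hasMagic(bytes: Uint8Array, offset: number): boolean {
  return MAGIC.every((b, i) => bytes[offset + i] === b);
}

// Fetches the still-encoded Parquet footer with two small reads: the fixed
// 8-byte trailer, then exactly the metadata region the trailer points to.
async function fetchParquetFooter(fileSize: number, read: RangeReader): Promise<Uint8Array> {
  if (fileSize < 12) throw new Error("too small to be a Parquet file");
  const trailer = await read(fileSize - 8, fileSize);
  if (!hasMagic(trailer, 4)) throw new Error("missing PAR1 trailer");
  const view = new DataView(trailer.buffer, trailer.byteOffset, trailer.byteLength);
  const footerLength = view.getUint32(0, true); // little-endian uint32
  const footerStart = fileSize - 8 - footerLength;
  if (footerStart < 4) throw new Error("footer length out of range");
  return read(footerStart, fileSize - 8);
}

// HTTP implementation using a Range request (the Range end offset is inclusive).
function httpRangeReader(url: string): RangeReader {
  return async (start, end) => {
    const res = await fetch(url, { headers: { Range: `bytes=${start}-${end - 1}` } });
    if (res.status !== 206) throw new Error(`expected 206 Partial Content, got ${res.status}`);
    return new Uint8Array(await res.arrayBuffer());
  };
}
```

The file size is usually known in advance, for example from the `file_size_in_bytes` field of an Iceberg manifest entry, so a reader can decide which row groups and column chunks to fetch before downloading any data pages.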

Step 2: Define Async Semantic Predicates

Instead of blocking the event loop, LLM-based filters are registered as async UDFs. The engine treats them as deferred computations, scheduling them only when the query planner determines they are necessary.

import { SemanticEngine } from '@hyperparam/squirreling';

const semantic = new SemanticEngine({
  provider: 'openai',
  model: 'gpt-4o-mini',
  maxConcurrency: 8,
  tokenBudget: 50000
});

const isConfusionTrace = semantic.defineAsyncUdf({
  name: 'detect_confusion',
  prompt: 'Analyze the reasoning chain. Return true if the agent shows hesitation, contradictory tool calls, or recovery attempts.',
  inputField: 'reasoning_chain',
  cacheTtl: 3600
});
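The `cacheTtl` option suggests that UDF results are memoized per input. One way such caching could work is sketched below; it is an assumption about the behavior, not squirreling's implementation, and `memoizeWithTtl` is a hypothetical name. It also shares one in-flight promise per input, so duplicate reasoning chains in a batch trigger a single model call. Note that this sketch takes milliseconds, whereas `cacheTtl: 3600` above is presumably in seconds.

```typescript
// Hypothetical helper, not the squirreling API.
type AsyncPredicate = (input: string) => Promise<boolean>;

interface CacheEntry {
  value: Promise<boolean>;
  expiresAt: number;
}

// Memoizes an async predicate by input text for `ttlMs` milliseconds. Concurrent
// calls with the same input share one in-flight promise, so duplicates cost one call.
function memoizeWithTtl(
  fn: AsyncPredicate,
  ttlMs: number,
  now: () => number = Date.now,
): AsyncPredicate {
  const cache = new Map<string, CacheEntry>();
  return (input) => {
    const hit = cache.get(input);
    if (hit !== undefined && hit.expiresAt > now()) return hit.value;
    const value = fn(input);
    cache.set(input, { value, expiresAt: now() + ttlMs });
    // Failed calls are evicted so the next request retries instead of caching the error.
    value.catch(() => {
      if (cache.get(input)?.value === value) cache.delete(input);
    });
    return value;
  };
}
```

Expired entries are only replaced when the same input is requested again; a long-lived process would also sweep them periodically.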

Step 3: Execute Lazy Query Pipeline

The query planner combines traditional filters with async UDFs. Downstream operators (like LIMIT or TOP_K) trigger upstream evaluation, ensuring expensive cells fire only when required.

async function* analyzeAgentFailures() {
  const stream = storage.scanTable({
    columns: ['trace_id', 'reasoning_chain', 'tool_calls', 'timestamp'],
    filters: [
      { field: 'timestamp', operator: '>=', value: '2024-01-01T00:00:00Z' },
      { field: 'status', operator: '!=', value: 'completed' }
    ]
  });

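  // The model-backed predicate only sees rows that passed the pushdown filters above.
  const semanticStream = semantic.applyAsyncFilter(stream, isConfusionTrace);

  // Keep the 50 most recent matching traces; sort on a projected column.
  const sorted = semanticStream.sortBy({
    field: 'timestamp',
    direction: 'desc',
    limit: 50
  });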

  for await (const batch of sorted) {
    yield batch;
  }
}
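A sort with `limit: 50` only has to retain 50 rows at any moment, which keeps sort-bounded queries cheap in memory. A bounded top-K over an async stream can be sketched as follows; `topK` is a hypothetical helper, not the squirreling `sortBy` implementation, and the comparator convention (negative means `a` ranks first) is an assumption of this sketch.

```typescript
// Hypothetical helper, not the squirreling sortBy implementation.
// `compare(a, b) < 0` means `a` ranks ahead of `b`.
async function topK<T>(
  rows: AsyncIterable<T>,
  k: number,
  compare: (a: T, b: T) => number,
): Promise<T[]> {
  if (k <= 0) return [];
  const best: T[] = []; // kept sorted, best first, never longer than k
  for await (const row of rows) {
    // Once full, a row that does not beat the current worst is dropped immediately.
    if (best.length === k && compare(row, best[k - 1]) >= 0) continue;
    // Binary search for the insertion point, after any equal rows (stable).
    let lo = 0;
    let hi = best.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (compare(best[mid], row) <= 0) lo = mid + 1;
      else hi = mid;
    }
    best.splice(lo, 0, row);
    if (best.length > k) best.pop();
  }
  return best;
}
```

Sorted insertion costs O(K) per accepted row, which is fine for K = 50; a binary heap is the usual choice when K is large.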

Architecture Decisions & Rationale

Why async-native execution? JavaScript runs application code on a single-threaded event loop, so any synchronous wait stalls every other task in the process. LLM calls are network I/O and inherently asynchronous. By treating model invocations as scheduled tasks rather than synchronous functions, the engine avoids event loop starvation and can stream tokens from several calls concurrently. This matches the runtime reality of modern AI applications.

Why lazy evaluation? Engines that evaluate UDFs eagerly call them on every row that reaches the operator, even when a later LIMIT discards most of the results. When those UDFs are LLM calls, this burns tokens on irrelevant data. The Hyperparam stack uses a demand-driven execution model: downstream operators (like LIMIT 50) signal upstream nodes to stop evaluating once the threshold is met. Expensive cells remain dormant until explicitly requested.
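The demand-driven model can be illustrated with plain async generators, which are pull-based: an upstream stage does no work until a downstream consumer requests the next row. This is a minimal sketch, not the Hyperparam planner; `scan`, `asyncFilter`, and `limit` are hypothetical operator names, and a counter stands in for the LLM.

```typescript
// Hypothetical pull-based operators, not the Hyperparam planner. Each stage
// does work only when the stage after it asks for the next row.
async function* scan<T>(rows: Iterable<T>): AsyncGenerator<T> {
  for (const row of rows) yield row;
}

async function* asyncFilter<T>(
  src: AsyncIterable<T>,
  pred: (row: T) => Promise<boolean>,
): AsyncGenerator<T> {
  for await (const row of src) {
    if (await pred(row)) yield row;
  }
}

async function* limit<T>(src: AsyncIterable<T>, n: number): AsyncGenerator<T> {
  if (n <= 0) return;
  let emitted = 0;
  for await (const row of src) {
    yield row;
    // Leaving the loop calls return() on `src`, which closes every upstream stage.
    if (++emitted >= n) return;
  }
}
```

With 1,000 rows, a predicate that matches every tenth row, and a limit of 5, the stand-in model is called 41 times (rows 0 through 40) rather than 1,000. This version evaluates one row at a time; a production scheduler would combine it with bounded concurrency, accepting a few extra calls in exchange for throughput.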

Why keep the bundle under 70 KB? Client-side agents run in ephemeral contexts: browser tabs, IDE extensions, or per-turn sandboxes. A heavy runtime increases cold-start latency and memory pressure. By splitting functionality into three focused libraries (storage reading, async scheduling, semantic execution), the engine stays lightweight while still covering storage, scheduling, and model calls.

Why Iceberg + Parquet? Parquet provides columnar compression and per-row-group column statistics that enable predicate pushdown. Iceberg adds snapshot isolation, schema evolution, and partition pruning. Together, they enable safe, incremental reads from object storage without requiring a dedicated metadata server. The JS connector handles manifest resolution and byte-range fetching natively.

Pitfall Guide

1. Synchronous LLM Blocking in the Event Loop

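Explanation: Developers often wrap LLM calls in synchronous-looking helpers or fire them without concurrency controls. A genuinely synchronous call (such as a synchronous XMLHttpRequest) blocks the main thread, causing UI freezes or agent timeouts; unbounded calls launched all at once with Promise.all instead flood the provider and trip rate limits. Fix: Always use async generators or promise pools. Limit concurrency with a semaphore pattern and stream responses incrementally.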
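A counting semaphore is enough to implement the fix above. The class below is a minimal sketch; `Semaphore` and its `run` method are hypothetical names, not part of the Hyperparam packages. Wrapping each model call in `sem.run(...)` caps in-flight requests (compare `maxConcurrency: 8` in Step 2) while the event loop stays free for other work.

```typescript
// Hypothetical counting semaphore for capping concurrent model calls.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (permits < 1) throw new RangeError("permits must be >= 1");
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // transfer the permit directly to the oldest waiter
    else this.available++;
  }

  // Runs `fn` while holding a permit; the permit is released even if `fn` throws.
  async run<T>(fn: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}
```

A typical call site is `Promise.all(traces.map((t) => sem.run(() => callModel(t))))`, where `callModel` stands in for a provider request.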

2. Over-Fetching Parquet Files Without Predicate Pushdown

Explanation: Reading entire Parquet files when only a few columns or rows are needed wastes bandwidth and memory. Fix: Configure column pruning and row group filtering at the connector level. Verify that Iceberg manifest pruning eliminates irrelevant partitions before streaming begins.
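Row-group filtering relies on the min/max statistics Parquet writes for each column chunk. The sketch below shows only the skip decision, under the assumption that statistics have already been decoded into plain numbers (for example, timestamps as epoch milliseconds); `RowGroupMeta`, `canSkip`, and `pruneRowGroups` are hypothetical names.

```typescript
// Hypothetical, already-decoded per-row-group statistics.
interface ColumnStats {
  min: number;
  max: number;
}
interface RowGroupMeta {
  index: number;
  numRows: number;
  stats: Record<string, ColumnStats | undefined>;
}
interface Predicate {
  field: string;
  op: ">=" | "<" | "=";
  value: number;
}

// True only when the statistics prove that no row in the group can match.
function canSkip(rg: RowGroupMeta, p: Predicate): boolean {
  const s = rg.stats[p.field];
  if (s === undefined) return false; // no statistics: the group must be read
  switch (p.op) {
    case ">=": return s.max < p.value; // every value is below the bound
    case "<": return s.min >= p.value; // every value is at or above the bound
    case "=": return p.value < s.min || p.value > s.max; // outside [min, max]
    default: return false; // unknown operator: be conservative and read
  }
}

// Predicates are AND-ed: one predicate ruling a group out is enough to skip it.
function pruneRowGroups(groups: RowGroupMeta[], preds: Predicate[]): RowGroupMeta[] {
  return groups.filter((rg) => !preds.some((p) => canSkip(rg, p)));
}
```

Iceberg manifests carry similar per-file lower and upper bounds, so the same test can drop whole data files before any Parquet footer is fetched.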

3. Ignoring Iceberg Snapshot Isolation

Explanation: Reading from a moving table without specifying a snapshot ID can return inconsistent or partially written data, especially in high-throughput agent environments. Fix: Always pin queries to a specific snapshotId or use timeTravel parameters. Implement retry logic with snapshot validation for long-running analytics jobs.
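Pinning plus retry with snapshot validation can be sketched generically. This is a minimal sketch assuming a catalog that can report the current snapshot ID and whether a given snapshot still exists; `SnapshotSource`, `RetryOptions`, and `queryPinned` are hypothetical names. The `maxRetries` and `retryDelayMs` options mirror the names used in the configuration template.

```typescript
// Hypothetical catalog interface; real connector APIs will differ.
interface SnapshotSource {
  currentSnapshotId(): Promise<string>;
  snapshotExists(id: string): Promise<boolean>;
}

interface RetryOptions {
  maxRetries: number;
  retryDelayMs: number;
}

// Runs `run` against one snapshot ID per attempt, so every read in an attempt
// sees the same table state. Retries only when that snapshot has disappeared
// (for example, expired mid-query); any other error is rethrown immediately.
async function queryPinned<T>(
  source: SnapshotSource,
  run: (snapshotId: string) => Promise<T>,
  opts: RetryOptions,
): Promise<{ snapshotId: string; result: T }> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    const snapshotId = await source.currentSnapshotId();
    try {
      return { snapshotId, result: await run(snapshotId) };
    } catch (err) {
      if (await source.snapshotExists(snapshotId)) throw err;
      lastError = err;
      if (attempt < opts.maxRetries) {
        await new Promise((resolve) => setTimeout(resolve, opts.retryDelayMs));
      }
    }
  }
  throw lastError;
}
```

When results must be reproducible, such as an audit re-run, pass a fixed snapshot ID to `run` instead of resolving the current one.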

4. Token Waste from Unbounded Semantic Filtering

Explanation: Applying LLM-based filters to every row without downstream limits makes token consumption grow linearly with table size, with no upper bound. Fix: Use lazy evaluation to defer semantic calls until after traditional filters and LIMIT clauses are applied. Implement token budgets and early-exit conditions in the async UDF scheduler.
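A token budget with early exit might look like the following. It is a minimal sketch under two assumptions: the classifier reports actual token usage per call, and a rough four-characters-per-token estimate is good enough to reserve budget up front. `TokenBudget`, `estimateTokens`, and `budgetedFilter` are hypothetical names, not the `tokenBudget` option of the article's engine.

```typescript
// Hypothetical budget tracker: reserve an estimate before each call, then settle.
class TokenBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get remaining(): number {
    return this.limit - this.spent;
  }

  // Returns false when the estimate would exceed the budget (caller should stop).
  tryReserve(estimate: number): boolean {
    if (this.spent + estimate > this.limit) return false;
    this.spent += estimate;
    return true;
  }

  // Swap the estimate for the usage the provider actually reported.
  settle(estimate: number, actual: number): void {
    this.spent += actual - estimate;
  }
}

// Rough heuristic (about 4 characters per token for English); use a real tokenizer in production.
function estimateTokens(text: string, promptTokens: number): number {
  return promptTokens + Math.ceil(text.length / 4);
}

// Semantic filter that stops pulling rows as soon as the budget cannot cover the next call.
async function* budgetedFilter<T>(
  rows: AsyncIterable<T>,
  getText: (row: T) => string,
  classify: (text: string) => Promise<{ match: boolean; tokensUsed: number }>,
  budget: TokenBudget,
  promptTokens: number,
): AsyncGenerator<T> {
  for await (const row of rows) {
    const text = getText(row);
    const estimate = estimateTokens(text, promptTokens);
    if (!budget.tryReserve(estimate)) return; // early exit: budget exhausted
    const { match, tokensUsed } = await classify(text);
    budget.settle(estimate, tokensUsed);
    if (match) yield row;
  }
}
```

Placed after the cheap filters and before the LIMIT, this stage bounds worst-case spend even when the limit is never reached.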

5. Memory Leaks from Streaming Large Agent Logs

Explanation: Accumulating streamed rows in memory instead of processing them incrementally leads to heap exhaustion in constrained JS runtimes. Fix: Process data in fixed-size batches. Use backpressure-aware streams that pause upstream fetching when downstream consumers lag.
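Fixed-size batching with pull-based backpressure fits in a single async generator. `inBatches` is a hypothetical helper; a batch size of 64 mirrors `batchSize` in the configuration template.

```typescript
// Groups a row stream into arrays of at most `size` rows. The generator is
// suspended at `yield` until the consumer asks for the next batch, so it never
// pulls from upstream while the previous batch is still being processed.
async function* inBatches<T>(rows: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  if (size < 1) throw new RangeError("size must be >= 1");
  let batch: T[] = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length === size) {
      yield batch;
      batch = []; // drop the reference so the previous batch can be collected
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}
```

Because the consumer's `for await` loop requests the next batch only after finishing the current one, this stage never reads more than one batch ahead of the consumer.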

6. Assuming DuckDB-WASM Scales for Async UDFs

Explanation: WASM engines excel at CPU-bound operations but struggle with I/O-bound async scheduling. They lack native event loop integration and often serialize async calls, negating concurrency benefits. Fix: Reserve WASM engines for pure numeric/columnar workloads. Use JS-native async schedulers for model-in-the-loop queries.

7. Schema Drift Breaking Columnar Reads

Explanation: Agent telemetry schemas evolve rapidly. Adding or renaming fields without Iceberg schema evolution support causes Parquet read failures. Fix: Enable Iceberg's schema evolution features. Use column mapping by ID rather than by name. Validate schema compatibility before deploying new agent versions.
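Resolving columns by field ID instead of by name can be sketched in a few lines. `Field` and `resolveProjection` are hypothetical names; the idea rests on Iceberg assigning each column a permanent integer ID, which is also written as `field_id` in the Parquet schema of its data files. A column renamed from `reasoning` to `reasoning_chain` still resolves, and a column added after the file was written maps to `null`, to be read as nulls.

```typescript
// Iceberg gives every column a permanent integer field ID; renames change only the name.
interface Field {
  id: number;
  name: string;
}

// Maps each requested column (named per the current table schema) to its index
// in an older data file's schema, matching by field ID. Columns that did not
// exist when the file was written map to null.
function resolveProjection(
  current: Field[],
  fileSchema: Field[],
  wanted: string[],
): Map<string, number | null> {
  const idByName = new Map(current.map((f) => [f.name, f.id] as const));
  const fileIndexById = new Map(fileSchema.map((f, i) => [f.id, i] as const));
  const out = new Map<string, number | null>();
  for (const name of wanted) {
    const id = idByName.get(name);
    if (id === undefined) throw new Error(`unknown column: ${name}`);
    out.set(name, fileIndexById.get(id) ?? null);
  }
  return out;
}
```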

Production Bundle

Action Checklist

Pin all analytics queries to explicit Iceberg snapshot IDs to guarantee consistency
Configure column pruning and row group filtering at the storage connector level
Implement async UDF schedulers with concurrency limits and token budgets
Use lazy evaluation to defer semantic filters until after traditional predicates and limits
Process streamed Parquet data in fixed-size batches with backpressure controls
Monitor event loop lag and async queue depth during peak agent activity
Validate schema evolution compatibility before deploying new telemetry formats

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Real-time semantic audit in browser/IDE	Async-Native JS Engine (Hyperparam)	Runs in-process with no warehouse round-trip; lazy evaluation minimizes token spend	Low (client-side compute + LLM tokens)
Batch historical analysis across millions of traces	Server-Side Spark/Trino	Optimized for distributed CPU workloads, handles massive scale efficiently	High (cloud compute + egress)
Numeric aggregations without model calls	DuckDB-WASM	Fast columnar execution, mature SQL dialect, low memory footprint	Low (WASM overhead only)
Ephemeral agent sandbox with cold-start constraints	Async-Native JS Engine	<70KB bundle, instant initialization, no external dependencies	Minimal (cold-start optimized)

Configuration Template

// analytics.config.ts
export const analyticsConfig = {
  storage: {
    type: 'iceberg',
    endpoint: process.env.OBJECT_STORAGE_ENDPOINT,
    bucket: process.env.TELEMETRY_BUCKET,
    tablePath: 'warehouse/agent_traces',
    // 'latest' suits exploration; pin an explicit snapshot ID in production (see pitfall 3).
    snapshotStrategy: 'latest',
    maxRetries: 3,
    retryDelayMs: 200
  },
  parquet: {
    maxConcurrency: 4,
    prefetchRows: 256,
    compression: 'snappy',
    columnPruning: true,
    rowGroupFiltering: true
  },
  semantic: {
    provider: 'openai',
    model: 'gpt-4o-mini',
    maxConcurrency: 8,
    tokenBudget: 50000,
    cacheTtl: 3600,
    fallbackModel: 'gpt-3.5-turbo'
  },
  execution: {
    lazyEvaluation: true,
    backpressureThreshold: 1024,
    batchSize: 64,
    eventLoopMonitor: true
  }
};

Quick Start Guide

Install the runtime packages: Add @hyperparam/icebird, @hyperparam/hyparquet, and @hyperparam/squirreling to your project dependencies.
Configure storage credentials: Set environment variables for your object storage endpoint, bucket, and authentication keys.
Define your first async UDF: Register a semantic filter with the defineAsyncUdf() method of a SemanticEngine instance, supplying a prompt template and input field mapping.
Execute a lazy query: Chain traditional filters, async predicates, and a LIMIT clause. Iterate over the async generator to process results incrementally.
Monitor token usage and event loop health: Enable built-in telemetry to track async queue depth, token consumption, and backpressure events during runtime.
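If a runtime does not expose event loop metrics directly, a timer-drift probe is a portable approximation (Node.js also provides `monitorEventLoopDelay` in `perf_hooks`). The sketch below is a hypothetical helper, `startLagMonitor`, not part of the Hyperparam packages.

```typescript
// Measures how late a repeating timer fires. Sustained lag well above the
// interval means synchronous work (for example, decoding a large batch on the
// main thread) is holding the event loop.
function startLagMonitor(intervalMs = 20): { stop: () => void; maxLagMs: () => number } {
  let maxLag = 0;
  let expected = Date.now() + intervalMs;
  let timer: ReturnType<typeof setTimeout>;
  const tick = () => {
    const now = Date.now();
    maxLag = Math.max(maxLag, now - expected); // how late this tick fired
    expected = now + intervalMs;
    timer = setTimeout(tick, intervalMs);
  };
  timer = setTimeout(tick, intervalMs);
  return { stop: () => clearTimeout(timer), maxLagMs: () => maxLag };
}
```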
