
Building Sentinel: A WAF for AI Agents with Genkit

By Codcompass Team · 8 min read

Runtime Guardrails for LLM Agents: Implementing a Policy-Driven Security Middleware

Current Situation Analysis

Modern AI agent architectures operate in fundamentally open environments. Unlike traditional web applications that validate HTTP payloads against strict schemas, agents consume unbounded text from users, external APIs, vector memory stores, and tool outputs. Every interaction surface is a potential attack vector. Prompt injection, context poisoning, and tool misuse are not rare edge cases; they are expected behaviors in systems designed to interpret and execute natural language instructions.

The industry has largely overlooked this gap because security efforts have historically concentrated on model selection, prompt engineering, and infrastructure hardening. Traditional Web Application Firewalls (WAFs) protect network boundaries, but they cannot inspect semantic intent or intermediate agent states. Consequently, production agents frequently execute malicious instructions embedded in retrieved documents, exfiltrate sensitive context through tool calls, or bypass system prompts through carefully crafted user inputs.

Deterministic pattern analysis with weighted scoring has emerged as a reliable mitigation strategy. By assigning risk weights to known attack signatures, teams can establish predictable thresholds that map directly to enforcement actions. Empirical scoring models demonstrate that injection directives typically carry a base weight of ~30 points, hidden unicode or whitespace manipulation adds ~20 points, encoded payload blobs contribute ~35-40 points, and explicit data exfiltration attempts trigger ~80 points. These weights align with graduated enforcement tiers: 0-20 (ALLOW), 21-50 (WARN), 51-80 (SANITIZE), and 81-100 (BLOCK). This approach provides deterministic security without sacrificing the flexibility required for dynamic agent workflows.
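The weight-and-tier model above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical `scorePayload` and `resolveAction` helpers; the weights and thresholds mirror the scoring model described in the text, not any fixed library API.

```typescript
// Graduated enforcement tiers from the scoring model above.
type EnforcementAction = 'ALLOW' | 'WARN' | 'SANITIZE' | 'BLOCK';

interface DetectedSignal {
  category: string;
  weight: number;
}

// Sum signal weights, capped at 100 to stay on the tier scale.
function scorePayload(signals: DetectedSignal[]): number {
  return Math.min(100, signals.reduce((sum, s) => sum + s.weight, 0));
}

// Map a score onto the 0-20 / 21-50 / 51-80 / 81-100 tiers.
function resolveAction(score: number): EnforcementAction {
  if (score <= 20) return 'ALLOW';
  if (score <= 50) return 'WARN';
  if (score <= 80) return 'SANITIZE';
  return 'BLOCK';
}

// Example: an injection directive (30) plus an encoded blob (35)
// totals 65, which lands in the SANITIZE tier.
const score = scorePayload([
  { category: 'injection_directive', weight: 30 },
  { category: 'encoded_payload', weight: 35 },
]);
console.log(score, resolveAction(score)); // 65 SANITIZE
```

Because the mapping is a pure function of the summed weights, the same input always yields the same enforcement action, which is what makes the approach deterministic.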

WOW Moment: Key Findings

The following comparison highlights why a deterministic, scored middleware outperforms naive security approaches in production agent environments.

| Approach | Latency Overhead | Predictability | False Positive Rate | Remediation Capability |
| --- | --- | --- | --- | --- |
| Traditional WAF | <5ms | High | Low | Network-level only |
| LLM-as-Judge | 800-2000ms | Medium | High | Context-aware but non-deterministic |
| Deterministic Middleware | 15-40ms | High | Low-Medium | Full pipeline interception + sanitization |

Why this matters: Agent runtimes require sub-50ms security checks to maintain conversational fluidity. LLM-based judges introduce unacceptable latency and hallucination risks, while traditional WAFs lack semantic visibility. A deterministic middleware with weighted scoring delivers millisecond-level enforcement, predictable policy application, and graduated remediation (sanitization instead of hard blocks). This enables teams to ship secure agents without degrading user experience or incurring prohibitive inference costs.

Core Solution

Building a production-ready security middleware requires four interconnected components: a scoring engine, a policy resolver, a sanitization pipeline, and a tool execution guard. The architecture intercepts every data surface before it reaches the model or downstream systems.
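Of the four components, the tool execution guard is the easiest to sketch in isolation: it wraps each tool so its serialized arguments are inspected before the tool runs. The `guardTool` and `inspectPayload` names below are hypothetical, not part of the Genkit API, and the placeholder check stands in for the full scoring engine.

```typescript
type ToolFn<A, R> = (args: A) => Promise<R>;

// Placeholder check; a real guard would delegate to the scoring
// engine and policy resolver rather than a single regex.
function inspectPayload(payload: string): 'ALLOW' | 'BLOCK' {
  return /\bexfiltrate\b|\bcurl\b|\bwget\b/i.test(payload) ? 'BLOCK' : 'ALLOW';
}

// Wrap any tool so every invocation passes through inspection first.
function guardTool<A, R>(name: string, tool: ToolFn<A, R>): ToolFn<A, R> {
  return async (args: A) => {
    const verdict = inspectPayload(JSON.stringify(args));
    if (verdict === 'BLOCK') {
      throw new Error(`Tool "${name}" blocked by security middleware`);
    }
    return tool(args);
  };
}
```

Wrapping tools at registration time, rather than at each call site, guarantees that no code path can reach a tool without crossing the guard.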

Step 1: Define the Interception Interface

Agents process five distinct data surfaces: user prompts, system instructions, memory retrievals, tool arguments, and model outputs. All of them should pass through a single, unified evaluation contract.

interface SecuritySurface {
  type: 'user_prompt' | 'system_prompt' | 'memory_context' | 'tool_args' | 'model_output';
  payload: string;
  metadata?: Record<string, unknown>;
}

interface RiskSignal {
  category: string;
  weight: number;
}
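A signature table shows how these contracts might be used in practice. This is a sketch, not the article's full scanner: the `SIGNATURES` patterns and the `evaluateSurface` name are illustrative assumptions, with weights echoing the scoring model from earlier, and the Step 1 interfaces are restated so the snippet stands alone.

```typescript
// Restated from Step 1 so this sketch is self-contained.
interface SecuritySurface {
  type: 'user_prompt' | 'system_prompt' | 'memory_context' | 'tool_args' | 'model_output';
  payload: string;
  metadata?: Record<string, unknown>;
}

interface RiskSignal {
  category: string;
  weight: number;
}

// Hypothetical signature table; weights follow the earlier scoring model.
const SIGNATURES: Array<{ category: string; pattern: RegExp; weight: number }> = [
  { category: 'injection_directive', pattern: /ignore (all |any )?(previous|prior) instructions/i, weight: 30 },
  { category: 'hidden_unicode', pattern: /[\u200b-\u200f\u2060\ufeff]/, weight: 20 },
  { category: 'encoded_payload', pattern: /\b[A-Za-z0-9+/]{80,}={0,2}\b/, weight: 35 },
  { category: 'data_exfiltration', pattern: /\b(send|post|upload)\b.*\b(api[_ ]?key|password|secret)s?\b/i, weight: 80 },
];

// Run every signature against a surface's payload and emit one
// RiskSignal per match; the caller sums the weights into a score.
function evaluateSurface(surface: SecuritySurface): RiskSignal[] {
  return SIGNATURES
    .filter(sig => sig.pattern.test(surface.payload))
    .map(sig => ({ category: sig.category, weight: sig.weight }));
}
```

Keeping detection (signals) separate from enforcement (tiers) lets each surface type carry its own signature set without changing the evaluation contract.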
