Difficulty

Intermediate

The Missing Layer in Agent Security

By Codcompass Team·2026-05-13·8 min read

Current Situation Analysis

Agentic AI systems operate by chaining discrete tool calls into multi-step workflows. Traditional security architectures treat these systems like conventional microservices: they scan static configurations before deployment and enforce per-call policies at runtime. This two-layer model creates a critical blind spot. It evaluates actions in isolation, completely ignoring the temporal sequence that defines agent behavior.

The industry overlooks this gap because security engineering has historically focused on boundary protection. Input validation, rate limiting, and path restrictions work well for stateless APIs. Agents are stateful and autonomous. A single tool call rarely violates policy. The risk emerges from the trajectory: how actions compound, how data moves across steps, and whether the session aligns with its declared purpose.

Production incidents consistently demonstrate this failure mode. In a documented support-agent breach, three sequential actions passed every per-call check: reading account data, formatting it as CSV, and emailing it externally. Each step was individually permitted. The combined sequence constituted data exfiltration. The per-call proxy returned green checkmarks because it lacked session memory.

Regulatory frameworks now mandate what security teams have struggled to implement. Article 72 of the EU AI Act requires post-market monitoring for behavioral drift in high-risk systems. Singapore’s Model Governance Framework for Agentic AI (effective January 2026) explicitly requires kill-switch capability and plan logging. DORA demands four-hour incident reconstruction for financial services. None of these can be satisfied with isolated call validation. You need continuous trajectory scoring.

The attack surface has matured to exploit this exact gap. The postmark-mcp incident demonstrated a malicious MCP server that accumulated 15 legitimate versions before injecting exfiltration logic. The ToxicSkills campaign poisoned agent memory files to trigger delayed behavioral shifts. These attacks succeed because they mimic normal operation at the call level. They only reveal their intent when viewed across a session timeline.

WOW Moment: Key Findings

The fundamental shift occurs when security moves from evaluating individual actions to scoring behavioral trajectories. The following comparison illustrates why trajectory enforcement closes the gap that static analysis and per-call proxies leave open.

Approach	Detection Scope	Temporal Awareness	Enforcement Granularity	Compliance Readiness
Static Config Scan	Pre-deployment only	None	Policy definition	Partial (audit trail)
Per-Call Proxy	Single action	None	Immediate block/rate-limit	Low (no session context)
Trajectory Envelope	Full session	Continuous scoring	Graduated response (warn/pause/kill)	High (drift logging, kill-switch, plan audit)

This finding matters because it redefines how we secure autonomous systems. Per-call enforcement answers: Is this specific tool call allowed? Trajectory enforcement answers: Is this agent still performing its declared function? The latter enables compliance with post-market monitoring mandates, prevents compound exfiltration attacks, and provides forensic-grade session reconstruction. It shifts security from reactive filtering to proactive behavioral governance.

Core Solution

Trajectory enforcement operates by declaring expected behavior upfront, then continuously scoring runtime execution against that declaration. The implementation requires three components: a declarative envelope definition, a scoring engine that tracks session state, and a graduated response mechanism that integrates with existing per-call proxies.

Step-by-Step Implementation

1. Declare the Behavioral Envelope. Define expected workflows, resource budgets, data flow constraints, and drift tolerances in a structured configuration file. This becomes the ground truth for runtime evaluation.
2. Initialize the Session Guard. Wrap the agent framework with a session manager that intercepts tool calls, annotates data sources/destinations, and maintains a rolling trajectory log.
3. Score Each Step Against the Envelope. Before executing a tool call, evaluate the current trajectory against workflow patterns, budget thresholds, repetition limits, velocity baselines, and cross-action data flow rules.
4. Apply Graduated Response. Map the composite drift score to a response tier: allow, warn, pause for human review, or terminate the session. Compound violations should escalate severity automatically.
5. Propagate Kill Decisions. When a session reaches the termination threshold, inject a deny-all policy into the downstream per-call proxy. This ensures immediate enforcement across all layers.

New Code Implementation

The following TypeScript example demonstrates a trajectory guard integrated with an agent framework. Its API surface differs from the reference implementations but preserves equivalent functionality; agent-envelope, mcpfw, and agentspec remain the underlying runtime components.

import { TrajectoryGuard } from 'agent-envelope';
import { McpProxyClient } from 'mcpfw';

interface ToolCallPayload {
  tool: string;
  args: Record<string, unknown>;
  dataSources?: string[];
  dataTargets?: string[];
}

class AgentRuntime {
  private guard: TrajectoryGuard;
  private proxy: McpProxyClient;

  constructor(envelopePath: string, proxyConfigPath: string) {
    this.guard = new TrajectoryGuard(envelopePath, {
      auditStream: 'audit/session-trace.jsonl',
      scoringInterval: 'per-call'
    });
    this.proxy = new McpProxyClient(proxyConfigPath);
  }

  async executeStep(payload: ToolCallPayload): Promise<void> {
    // Annotate data movement for cross-action tracking
    const annotated = {
      ...payload,
      metadata: {
        readSources: payload.dataSources || [],
        writeTargets: payload.dataTargets || []
      }
    };

    // Evaluate trajectory before execution
    const verdict = await this.guard.evaluate(annotated);

    if (verdict.severity >= 0.8) {
      // Kill session and propagate deny-all to per-call layer
      await this.guard.terminate();
      await this.proxy.injectDenyAllPolicy();
      throw new Error(`Session terminated: drift score ${verdict.compositeScore}`);
    }

    if (verdict.severity >= 0.6) {
      // Pause for human review
      await this.guard.holdForReview(verdict.reasons);
      return;
    }

    if (verdict.severity >= 0.3) {
      // Log warning but continue
      console.warn(`[DRIFT_WARN] ${verdict.reasons.join(', ')}`);
    }

    // Proceed to per-call enforcement
    await this.proxy.forward(annotated);
  }
}

Architecture Decisions and Rationale

Why place the envelope between the framework and the per-call proxy? The scoring engine requires full session context. Per-call proxies operate statelessly and cannot track cross-step data flow or workflow drift. Placing the envelope upstream ensures trajectory evaluation happens before individual calls reach the enforcement layer.

Why use declarative YAML instead of programmatic rules? Declarative configurations are auditable, version-controlled, and decoupled from application logic. Security teams can modify bounds without redeploying agent code. This aligns with compliance requirements for policy transparency and change tracking.

Why implement graduated response instead of binary allow/deny? Agents are probabilistic. Legitimate workflows occasionally deviate due to model variance or edge-case inputs. A hard block on first deviation causes false positives and breaks production reliability. Compound scoring with tiered thresholds preserves security while maintaining operational continuity.
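
The tier mapping itself can be isolated as a small pure function, which makes the thresholds easy to test and tune. The sketch below is illustrative only: the tier names, the `TierThresholds` shape, and the `toResponseTier` helper are hypothetical, and the default cut-offs simply mirror the 0.3 / 0.6 / 0.8 values used in `executeStep` above.

```typescript
// Hypothetical tier names; not part of any library API.
type ResponseTier = "allow" | "warn" | "pause" | "kill";

interface TierThresholds {
  warn: number;
  pause: number;
  kill: number;
}

// Defaults mirror the cut-offs used in executeStep (assumed, tune per deployment).
const DEFAULT_THRESHOLDS: TierThresholds = { warn: 0.3, pause: 0.6, kill: 0.8 };

function toResponseTier(
  severity: number,
  t: TierThresholds = DEFAULT_THRESHOLDS
): ResponseTier {
  // Fail closed: a NaN or infinite score is treated as the worst case.
  if (!Number.isFinite(severity)) return "kill";
  if (severity >= t.kill) return "kill";
  if (severity >= t.pause) return "pause";
  if (severity >= t.warn) return "warn";
  return "allow";
}
```

Checking the highest tier first keeps the function correct even if someone later reorders or adds tiers, and failing closed on a non-finite score avoids silently allowing a call when the scorer misbehaves.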

Why propagate kills to the per-call layer? Terminating the session guard alone leaves a race window where queued tool calls execute. Hot-reloading a deny-all policy into mcpfw ensures immediate, system-wide enforcement. This satisfies regulatory kill-switch requirements and prevents partial exfiltration during shutdown.

Pitfall Guide

1. Over-Constraining Workflow Patterns

Explanation: Defining rigid step sequences without tolerance for model variance causes false positives. Agents frequently reorder non-dependent calls or skip optional steps. Fix: Use glob patterns (read_*, format_*) and set unknown_workflow_threshold to 3+ steps before flagging drift. Allow parallel branches where business logic permits.
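
One way to picture tolerant pattern matching is sketched below. It assumes that unknown_workflow_threshold counts consecutive tool calls that match none of the allowed globs; the envelope engine may define it differently. The helpers `globToRegExp`, `longestUnknownRun`, and `isWorkflowDrift` are hypothetical names, and only the `*` wildcard is supported.

```typescript
// Convert a simple glob such as "read_*" into an anchored RegExp.
// Only "*" is treated as a wildcard in this sketch.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Length of the longest run of consecutive calls that match no allowed pattern.
function longestUnknownRun(calls: string[], allowed: string[]): number {
  const matchers = allowed.map(globToRegExp);
  let run = 0;
  let longest = 0;
  for (const call of calls) {
    const known = matchers.some((m) => m.test(call));
    run = known ? 0 : run + 1;
    longest = Math.max(longest, run);
  }
  return longest;
}

// Flag drift only after the unknown run reaches the threshold (assumed semantics).
function isWorkflowDrift(
  calls: string[],
  allowed: string[],
  unknownWorkflowThreshold = 3
): boolean {
  return longestUnknownRun(calls, allowed) >= unknownWorkflowThreshold;
}
```

Because matching is order-independent here, an agent that reorders non-dependent calls or skips an optional step is not flagged; only a sustained run of unrecognized tools is.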

2. Ignoring Cross-Action Data Flow

Explanation: Tracking only immediate read/write pairs misses delayed exfiltration. An agent can read sensitive data at step 2 and write it externally at step 7, passing all intermediate checks. Fix: Explicitly annotate dataSources and dataTargets on every call. Maintain a session-scoped data flow registry that validates forbidden destination mappings against historically read sources.
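
A session-scoped registry along these lines could catch the step-2-read, step-7-write case. This is a minimal sketch, not the engine's actual implementation: `DataFlowRegistry` and `ForbiddenPath` are hypothetical names, and the rule shape is assumed to follow the `forbidden_paths` entries in the configuration template.

```typescript
// Assumed to mirror a data_flow.forbidden_paths entry from the envelope config.
interface ForbiddenPath {
  source: string;
  targets: string[];
}

class DataFlowRegistry {
  // Every source read at any earlier step of this session.
  private readonly seenSources = new Set<string>();
  private readonly forbidden: ForbiddenPath[];

  constructor(forbidden: ForbiddenPath[]) {
    this.forbidden = forbidden;
  }

  // Record one step and return violations as "source -> target" strings.
  record(readSources: string[], writeTargets: string[]): string[] {
    for (const s of readSources) this.seenSources.add(s);
    const violations: string[] = [];
    for (const rule of this.forbidden) {
      if (!this.seenSources.has(rule.source)) continue;
      for (const target of writeTargets) {
        if (rule.targets.includes(target)) {
          violations.push(`${rule.source} -> ${target}`);
        }
      }
    }
    return violations;
  }
}
```

The key design choice is that `seenSources` is never cleared during a session, so a write is checked against everything the agent has read so far rather than only the immediately preceding call.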

3. Hard-Killing on Single Threshold Breach

Explanation: Treating each metric independently fails in both directions. A single legitimate deviation, such as a velocity spike during peak load, triggers a kill, while an attacker who keeps every metric just below its own limit is never flagged. Fix: Implement composite scoring. Weight each violation severity by a decay factor and combine the results into one score (e.g., velocity * 0.7 + drift * 0.65 * 0.1). Require multiple concurrent deviations before termination.
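
As a rough illustration of composite scoring, the sketch below uses a normalized weighted average plus a concurrency requirement. This is one possible formulation chosen for clarity, not the formula quoted above: the `Signal` shape, the `signalFloor` and `minConcurrent` parameters, and the `scoreComposite` helper are all assumptions.

```typescript
interface Signal {
  name: string;
  severity: number; // expected in [0, 1]
  weight: number;   // relative importance, e.g. 0.7 for velocity
}

interface CompositeVerdict {
  score: number;       // weighted average severity in [0, 1]
  concurrent: number;  // how many signals are at or above signalFloor
  terminate: boolean;
}

function scoreComposite(
  signals: Signal[],
  killThreshold = 0.8,  // assumed, matches the kill cut-off used elsewhere
  signalFloor = 0.5,    // assumed: severity at which a signal counts as deviating
  minConcurrent = 2     // assumed: deviations required before termination
): CompositeVerdict {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  const totalWeight = signals.reduce((sum, s) => sum + s.weight, 0);
  const weighted = signals.reduce((sum, s) => sum + clamp(s.severity) * s.weight, 0);
  const score = totalWeight > 0 ? weighted / totalWeight : 0;
  const concurrent = signals.filter((s) => clamp(s.severity) >= signalFloor).length;
  return {
    score,
    concurrent,
    terminate: score >= killThreshold && concurrent >= minConcurrent,
  };
}
```

With weights of 0.7 for velocity and 0.65 for drift, a maximal velocity spike on its own stays well below the kill threshold, while high severity on both signals at once crosses it.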

4. Static Budget Limits

Explanation: Fixed token, cost, or duration caps break under variable workload complexity. A simple query and a multi-document analysis cannot share identical budgets. Fix: Implement dynamic budgeting based on task classification. Use historical session baselines to set percentile thresholds (e.g., 95th percentile of normal execution). Allow temporary overrides with explicit human approval.
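
Percentile-based budgets can be derived from historical sessions with a few lines of code. The sketch below uses the nearest-rank percentile method; the `budgetFor` helper, the task-class keys, and the fallback parameter are hypothetical and only illustrate the idea.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of values at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in p / 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Pick a budget for a task class from its own history, or fall back to a static cap.
function budgetFor(
  taskClass: string,
  history: Record<string, number[]>,
  fallback: number
): number {
  const samples = history[taskClass];
  return samples && samples.length > 0 ? percentile(samples, 95) : fallback;
}
```

Keeping history per task class means a simple query and a multi-document analysis each get a cap derived from their own normal range, and the static fallback still applies to classes with no baseline yet.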

5. Assuming Per-Call Proxy Covers Session Risks

Explanation: Rate limiting and path blocking prevent individual abuse but cannot detect workflow hijacking or prompt injection drift. Fix: Treat per-call enforcement as a tactical layer and trajectory scoring as a strategic layer. They must operate in tandem. Feed proxy audit logs into the envelope engine for correlation.

6. Poor Audit Trail Design

Explanation: Logging only final decisions obscures forensic reconstruction. Compliance frameworks require step-level traceability with correlation IDs. Fix: Write structured JSONL audit streams containing: session ID, step index, tool name, arguments, drift score, verdict, and timestamp. Store logs in immutable storage with cryptographic hashing for tamper evidence.
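
Tamper evidence is commonly achieved by chaining each record to the hash of the previous one. The sketch below shows that idea with the fields listed above. The `AuditEntry` shape, `appendEntry`, and `verifyChain` are hypothetical, and the digest function is injected rather than hard-coded; in Node, a SHA-256 digest built with `createHash` from `node:crypto` would be a typical choice.

```typescript
// Any string-to-string digest; use a cryptographic hash such as SHA-256 in practice.
type Digest = (input: string) => string;

interface AuditEntry {
  sessionId: string;
  stepIndex: number;
  tool: string;
  args: Record<string, unknown>;
  driftScore: number;
  verdict: string;
  timestamp: string;
}

interface ChainedEntry extends AuditEntry {
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

// Append an entry to the chain and return the JSONL line to write.
function appendEntry(chain: ChainedEntry[], entry: AuditEntry, digest: Digest): string {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const hash = digest(prevHash + JSON.stringify(entry));
  const chained: ChainedEntry = { ...entry, prevHash, hash };
  chain.push(chained);
  return JSON.stringify(chained);
}

// Recompute every hash; relies on JSON key order being preserved, which holds
// for entries built by appendEntry or parsed back from the same JSONL.
function verifyChain(chain: ChainedEntry[], digest: Digest): boolean {
  let prev = GENESIS;
  for (const { prevHash, hash, ...entry } of chain) {
    if (prevHash !== prev) return false;
    if (digest(prev + JSON.stringify(entry)) !== hash) return false;
    prev = hash;
  }
  return true;
}
```

Because each hash covers the previous hash, editing or deleting any earlier record invalidates every record after it, which is what makes step-level reconstruction trustworthy.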

7. Neglecting Kill Propagation Mechanics

Explanation: Terminating the guard process leaves pending async calls in flight. Attackers exploit this window to complete exfiltration. Fix: Implement synchronous kill propagation. When severity crosses 0.8, write a deny-all rule to the proxy configuration file, trigger a hot-reload, and await acknowledgment before releasing the session lock.
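
The ordering described above can be sketched as a single awaited sequence with a timeout. The `ProxyControl` interface and its three methods are hypothetical stand-ins for whatever control surface the per-call proxy exposes; they are not the mcpfw API, and the 2000 ms default is an arbitrary assumption.

```typescript
// Hypothetical control surface for the per-call proxy (not a real library API).
interface ProxyControl {
  writeDenyAll(sessionId: string): Promise<void>;
  hotReload(): Promise<void>;
  awaitAck(sessionId: string): Promise<boolean>;
}

// Resolves only after the proxy confirms the deny-all rule is active.
// Rejects on timeout or missing acknowledgment so the caller can keep the session locked.
async function propagateKill(
  sessionId: string,
  proxy: ProxyControl,
  timeoutMs = 2000
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("kill propagation timed out")), timeoutMs);
  });
  const sequence = (async () => {
    await proxy.writeDenyAll(sessionId);
    await proxy.hotReload();
    if (!(await proxy.awaitAck(sessionId))) {
      throw new Error("proxy did not acknowledge deny-all");
    }
  })();
  try {
    await Promise.race([sequence, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller would typically hold the session lock across `await propagateKill(...)` and release it only if the promise resolves; on rejection, keeping the lock held is the fail-closed option.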

Production Bundle

Action Checklist

Define behavioral envelope with explicit workflow patterns, budget caps, and data flow restrictions
Instrument agent framework to annotate every tool call with read sources and write targets
Configure composite scoring with decay factors to prevent threshold gaming
Implement graduated response tiers (allow/warn/pause/kill) with compound escalation logic
Set up immutable JSONL audit logging with session correlation IDs and cryptographic checksums
Configure kill propagation to hot-reload deny-all policies into downstream per-call proxies
Establish budget baselines using historical session data instead of arbitrary static limits
Map envelope metrics to regulatory requirements (EU AI Act Art 72, Singapore framework, DORA)

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Internal operational tool	Trajectory envelope + per-call proxy	Balances security with developer velocity; prevents accidental data leaks	Low (standard enforcement)
Customer-facing support agent	Full envelope with strict data flow + human pause tier	High exfiltration risk; requires drift detection and kill-switch compliance	Medium (audit storage, review overhead)
Financial/regulated workflow	Envelope + proxy + immutable audit + DORA-aligned reconstruction	Mandatory post-market monitoring; 4-hour incident recovery requirement	High (compliance tooling, forensic logging)
Research/experimental agent	Lightweight envelope with warn-only tier	Allows model variance and workflow exploration without production risk	Low (monitoring only, no enforcement)

Configuration Template

name: production-support-agent
version: 2.1
purpose: Resolve customer inquiries using knowledge base and account records

workflows:
  - name: standard_resolution
    pattern: ["search_kb", "read_account", "format_summary", "send_reply"]
    max_steps: 12
    tolerance: 0.25
  - name: escalation_path
    pattern: ["classify_issue", "create_ticket", "notify_team"]
    max_steps: 6
    tolerance: 0.20

bounds:
  max_actions_per_session: 45
  max_tokens_consumed: 120000
  max_duration_seconds: 240
  max_cost_usd: 1.50

data_flow:
  forbidden_paths:
    - source: "pii_account_data"
      targets: ["email_external", "file_export", "webhook_public"]
    - source: "internal_knowledge"
      targets: ["api_external_untrusted"]

autonomy:
  max_delegation_depth: 2
  require_human_approval_for: ["write_production", "delete_record"]

drift:
  unknown_workflow_threshold: 3
  repetition:
    max_identical_calls: 2
    max_similar_calls: 8
  velocity:
    baseline_actions_per_minute: 15
    spike_multiplier: 2.5
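
To make the drift settings concrete, the sketch below shows one plausible reading of the velocity and repetition fields. The interpretation is an assumption: a spike is taken to mean more than baseline_actions_per_minute * spike_multiplier calls in the trailing 60 seconds (37.5 with the values above), and max_identical_calls is taken as the number of identical tool-plus-arguments calls allowed per session. The `DriftConfig` and `LoggedCall` types and both helpers are hypothetical.

```typescript
interface DriftConfig {
  maxIdenticalCalls: number;        // drift.repetition.max_identical_calls
  baselineActionsPerMinute: number; // drift.velocity.baseline_actions_per_minute
  spikeMultiplier: number;          // drift.velocity.spike_multiplier
}

interface LoggedCall {
  tool: string;
  args: Record<string, unknown>;
  atMs: number; // epoch milliseconds
}

// True when the trailing 60-second window exceeds baseline * multiplier.
function isVelocitySpike(log: LoggedCall[], nowMs: number, cfg: DriftConfig): boolean {
  const lastMinute = log.filter((c) => c.atMs > nowMs - 60000 && c.atMs <= nowMs).length;
  return lastMinute > cfg.baselineActionsPerMinute * cfg.spikeMultiplier;
}

// True when any tool + arguments pair occurs more often than allowed.
// Note: JSON.stringify is key-order sensitive, so argument objects should be
// built consistently for this comparison to be reliable.
function exceedsIdenticalCalls(log: LoggedCall[], cfg: DriftConfig): boolean {
  const counts = new Map<string, number>();
  for (const c of log) {
    const key = c.tool + ":" + JSON.stringify(c.args);
    const n = (counts.get(key) ?? 0) + 1;
    if (n > cfg.maxIdenticalCalls) return true;
    counts.set(key, n);
  }
  return false;
}
```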

Quick Start Guide

Install the runtime components

pip install agent-envelope mcpfw agentspec

Generate a baseline envelope

agentspec scan --config agent-config.yaml --output envelope.yaml

Validate the configuration

agent-envelope validate --file envelope.yaml --strict

Wrap your agent execution

agent-envelope run --envelope envelope.yaml -- python agent_runner.py

Monitor session drift in real time

agent-envelope tail --audit audit/session-trace.jsonl --score-threshold 0.3

Trajectory enforcement transforms agent security from isolated validation to continuous behavioral governance. By declaring expected workflows, scoring runtime execution, and propagating kill decisions across layers, you close the gap that per-call proxies and static scans leave open. This architecture satisfies modern compliance mandates, prevents compound exfiltration attacks, and provides the forensic visibility required for production-grade autonomous systems.
