# The Missing Layer in Agent Security
## Current Situation Analysis
Agentic AI systems operate by chaining discrete tool calls into multi-step workflows. Traditional security architectures treat these systems like conventional microservices: they scan static configurations before deployment and enforce per-call policies at runtime. This two-layer model creates a critical blind spot. It evaluates actions in isolation, completely ignoring the temporal sequence that defines agent behavior.
The industry overlooks this gap because security engineering has historically focused on boundary protection. Input validation, rate limiting, and path restrictions work well for stateless APIs. Agents are stateful and autonomous. A single tool call rarely violates policy. The risk emerges from the trajectory: how actions compound, how data moves across steps, and whether the session aligns with its declared purpose.
Production incidents consistently demonstrate this failure mode. In a documented support-agent breach, three sequential actions passed every per-call check: reading account data, formatting it as CSV, and emailing it externally. Each step was individually permitted. The combined sequence constituted data exfiltration. The per-call proxy returned green checkmarks because it lacked session memory.
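The gap can be illustrated with a minimal sketch. The tool names, the `pii_` source prefix, and the `external` destination prefix below are assumptions made for illustration, not details taken from the incident or from any particular proxy's API.

```typescript
// Hypothetical tool calls modeled on the support-agent scenario.
type Call = { tool: string; reads?: string[]; writesTo?: string };

const ALLOWED_TOOLS = new Set(["read_account", "format_csv", "send_email"]);

// Per-call view: each call is judged alone, with no memory of earlier steps.
function perCallAllowed(call: Call): boolean {
  return ALLOWED_TOOLS.has(call.tool);
}

// Session view: remember which sensitive sources were read, then flag any
// later write to an external destination.
function sessionViolations(calls: Call[]): string[] {
  const sensitiveRead = new Set<string>();
  const violations: string[] = [];
  calls.forEach((call, step) => {
    for (const src of call.reads ?? []) {
      if (src.startsWith("pii_")) sensitiveRead.add(src);
    }
    if (call.writesTo?.startsWith("external") && sensitiveRead.size > 0) {
      const sources = Array.from(sensitiveRead).join(",");
      violations.push(`step ${step}: ${sources} -> ${call.writesTo}`);
    }
  });
  return violations;
}

const session: Call[] = [
  { tool: "read_account", reads: ["pii_account_data"] },
  { tool: "format_csv" },
  { tool: "send_email", writesTo: "external_email" },
];

const everyCallPasses = session.every(perCallAllowed);
const flagged = sessionViolations(session);
console.log(everyCallPasses, flagged);
```

Every call passes the per-call check, while the session view flags the final step because it remembers the earlier read.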
Regulatory frameworks now mandate what security teams have struggled to implement. Article 72 of the EU AI Act requires providers of high-risk systems to run post-market monitoring, which for agents means watching for behavioral drift. Singapore’s Model Governance Framework for Agentic AI (effective January 2026) explicitly requires kill-switch capability and plan logging. DORA gives financial entities as little as four hours to file an initial report once an incident is classified as major, which presupposes rapid incident reconstruction. None of these can be satisfied with isolated call validation. You need continuous trajectory scoring.
The attack surface has matured to exploit this exact gap. The postmark-mcp incident demonstrated a malicious MCP server that shipped 15 legitimate versions before injecting exfiltration logic. The ToxicSkills campaign poisoned agent memory files to trigger delayed behavioral shifts. These attacks succeed because they mimic normal operation at the call level. They reveal their intent only when viewed across a session timeline.
## WOW Moment: Key Findings
The fundamental shift occurs when security moves from evaluating individual actions to scoring behavioral trajectories. The following comparison illustrates why trajectory enforcement closes the gap that static analysis and per-call proxies leave open.
| Approach | Detection Scope | Temporal Awareness | Enforcement Granularity | Compliance Readiness |
|---|---|---|---|---|
| Static Config Scan | Pre-deployment only | None | Policy definition | Partial (audit trail) |
| Per-Call Proxy | Single action | None | Immediate block/rate-limit | Low (no session context) |
| Trajectory Envelope | Full session | Continuous scoring | Graduated response (warn/pause/kill) | High (drift logging, kill-switch, plan audit) |
This matters because it changes the question security asks. Per-call enforcement answers "Is this specific tool call allowed?" Trajectory enforcement answers "Is this agent still performing its declared function?" The latter enables compliance with post-market monitoring mandates, prevents compound exfiltration attacks, and provides forensic-grade session reconstruction. It shifts security from reactive filtering to proactive behavioral governance.
## Core Solution
Trajectory enforcement operates by declaring expected behavior upfront, then continuously scoring runtime execution against that declaration. The implementation requires three components: a declarative envelope definition, a scoring engine that tracks session state, and a graduated response mechanism that integrates with existing per-call proxies.
### Step-by-Step Implementation
1. **Declare the Behavioral Envelope.** Define expected workflows, resource budgets, data flow constraints, and drift tolerances in a structured configuration file. This becomes the ground truth for runtime evaluation.
2. **Initialize the Session Guard.** Wrap the agent framework with a session manager that intercepts tool calls, annotates data sources/destinations, and maintains a rolling trajectory log.
3. **Score Each Step Against the Envelope.** Before executing a tool call, evaluate the current trajectory against workflow patterns, budget thresholds, repetition limits, velocity baselines, and cross-action data flow rules.
4. **Apply Graduated Response.** Map the composite drift score to a response tier: allow, warn, pause for human review, or terminate the session. Compound violations should escalate severity automatically.
5. **Propagate Kill Decisions.** When a session reaches the termination threshold, inject a deny-all policy into the downstream per-call proxy. This ensures immediate enforcement across all layers.
### New Code Implementation
The following TypeScript example demonstrates a trajectory guard integrated with an agent framework. Its API surface differs from the reference implementations but preserves equivalent functionality; `agent-envelope`, `mcpfw`, and `agentspec` remain the underlying runtime components.
```typescript
import { TrajectoryGuard } from 'agent-envelope';
import { McpProxyClient } from 'mcpfw';

interface ToolCallPayload {
  tool: string;
  args: Record<string, unknown>;
  dataSources?: string[];
  dataTargets?: string[];
}

class AgentRuntime {
  private guard: TrajectoryGuard;
  private proxy: McpProxyClient;

  constructor(envelopePath: string, proxyConfigPath: string) {
    this.guard = new TrajectoryGuard(envelopePath, {
      auditStream: 'audit/session-trace.jsonl',
      scoringInterval: 'per-call'
    });
    this.proxy = new McpProxyClient(proxyConfigPath);
  }

  async executeStep(payload: ToolCallPayload): Promise<void> {
    // Annotate data movement for cross-action tracking
    const annotated = {
      ...payload,
      metadata: {
        readSources: payload.dataSources || [],
        writeTargets: payload.dataTargets || []
      }
    };

    // Evaluate trajectory before execution
    const verdict = await this.guard.evaluate(annotated);

    if (verdict.severity >= 0.8) {
      // Kill session and propagate deny-all to per-call layer
      await this.guard.terminate();
      await this.proxy.injectDenyAllPolicy();
      throw new Error(`Session terminated: drift score ${verdict.compositeScore}`);
    }

    if (verdict.severity >= 0.6) {
      // Pause for human review
      await this.guard.holdForReview(verdict.reasons);
      return;
    }

    if (verdict.severity >= 0.3) {
      // Log warning but continue
      console.warn(`[DRIFT_WARN] ${verdict.reasons.join(', ')}`);
    }

    // Proceed to per-call enforcement
    await this.proxy.forward(annotated);
  }
}
```

### Architecture Decisions and Rationale
**Why place the envelope between the framework and the per-call proxy?**
The scoring engine requires full session context. Per-call proxies operate statelessly and cannot track cross-step data flow or workflow drift. Placing the envelope upstream ensures trajectory evaluation happens before individual calls reach the enforcement layer.
**Why use declarative YAML instead of programmatic rules?**
Declarative configurations are auditable, version-controlled, and decoupled from application logic. Security teams can modify bounds without redeploying agent code. This aligns with compliance requirements for policy transparency and change tracking.
**Why implement graduated response instead of binary allow/deny?**
Agents are probabilistic. Legitimate workflows occasionally deviate due to model variance or edge-case inputs. A hard block on first deviation causes false positives and breaks production reliability. Compound scoring with tiered thresholds preserves security while maintaining operational continuity.
**Why propagate kills to the per-call layer?**
Terminating the session guard alone leaves a race window where queued tool calls execute. Hot-reloading a deny-all policy into `mcpfw` ensures immediate, system-wide enforcement. This satisfies regulatory kill-switch requirements and prevents partial exfiltration during shutdown.
## Pitfall Guide
### 1. Over-Constraining Workflow Patterns
**Explanation:** Defining rigid step sequences without tolerance for model variance causes false positives. Agents frequently reorder non-dependent calls or skip optional steps.
**Fix:** Use glob patterns (`read_*`, `format_*`) and set `unknown_workflow_threshold` to 3+ steps before flagging drift. Allow parallel branches where business logic permits.
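
One way to read this fix is sketched below. It assumes that `unknown_workflow_threshold` counts consecutive tool calls matching none of the declared globs; the actual semantics in a given runtime may differ, and the helper names `globToRegExp` and `firstDriftStep` are hypothetical.

```typescript
// Convert a simple glob such as "read_*" into an anchored regular expression.
// Regex metacharacters other than "*" are escaped first.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Returns the index of the step at which drift is flagged, or -1 if the
// sequence never accumulates `unknownThreshold` consecutive unknown calls.
function firstDriftStep(
  tools: string[],
  allowedGlobs: string[],
  unknownThreshold: number
): number {
  const patterns = allowedGlobs.map(globToRegExp);
  let consecutiveUnknown = 0;
  for (let i = 0; i < tools.length; i++) {
    const known = patterns.some((p) => p.test(tools[i]));
    consecutiveUnknown = known ? 0 : consecutiveUnknown + 1;
    if (consecutiveUnknown >= unknownThreshold) return i;
  }
  return -1;
}

const globs = ["search_*", "read_*", "format_*", "send_reply"];

// One stray call in an otherwise normal session is tolerated.
const normal = firstDriftStep(
  ["search_kb", "read_account", "lookup_weather", "format_summary", "send_reply"],
  globs,
  3
);

// Three unknown calls in a row trip the threshold at the third one.
const drifting = firstDriftStep(
  ["search_kb", "dump_table", "zip_files", "upload_blob"],
  globs,
  3
);
console.log(normal, drifting);
```

Resetting the counter on every known call keeps occasional model variance from accumulating into a false positive over a long session.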
### 2. Ignoring Cross-Action Data Flow
**Explanation:** Tracking only immediate read/write pairs misses delayed exfiltration. An agent can read sensitive data at step 2 and write it externally at step 7, passing all intermediate checks.
**Fix:** Explicitly annotate `dataSources` and `dataTargets` on every call. Maintain a session-scoped data flow registry that validates forbidden destination mappings against historically read sources.
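
A session-scoped registry along these lines might look like the following sketch. The class name `DataFlowRegistry` and its `check` method are hypothetical; the source and target labels are borrowed from the configuration template in this article.

```typescript
type ForbiddenPath = { source: string; targets: string[] };

class DataFlowRegistry {
  private readonly forbidden: ForbiddenPath[];
  // Every source read so far in this session, regardless of how long ago.
  private readonly sourcesRead = new Set<string>();

  constructor(forbidden: ForbiddenPath[]) {
    this.forbidden = forbidden;
  }

  // Record this call's reads, then check its writes against everything
  // read earlier in the session. Returns the violated paths, if any.
  check(dataSources: string[], dataTargets: string[]): string[] {
    dataSources.forEach((s) => this.sourcesRead.add(s));
    const violations: string[] = [];
    for (const rule of this.forbidden) {
      if (!this.sourcesRead.has(rule.source)) continue;
      for (const target of dataTargets) {
        if (rule.targets.includes(target)) {
          violations.push(`${rule.source} -> ${target}`);
        }
      }
    }
    return violations;
  }
}

const registry = new DataFlowRegistry([
  { source: "pii_account_data", targets: ["email_external", "file_export", "webhook_public"] },
]);

const atRead = registry.check(["pii_account_data"], []);   // early read
const atFormat = registry.check([], ["local_scratch"]);    // harmless intermediate step
const atSend = registry.check([], ["email_external"]);     // much later external write
console.log(atRead, atFormat, atSend);
```

This is deliberately conservative: once a sensitive source has been read, any later write to a forbidden target is flagged, even if the written payload is not provably derived from that source.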
### 3. Hard-Killing on Single Threshold Breach
**Explanation:** Treating each metric independently fails in both directions. A single breach, such as a velocity spike during peak load, may be legitimate, while an attacker can keep every metric just below its individual limit.
**Fix:** Implement composite scoring: weight each violation severity by a decay factor (e.g., `velocity * 0.7 + drift * 0.65 * 0.1`) and require multiple concurrent deviations before termination.
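
The sketch below shows one possible composite scorer. It swaps in a noisy-OR combination for illustration rather than reproducing the formula quoted above; the weights 0.7 and 0.65 are borrowed from that example, the 0.3/0.6/0.8 tiers mirror the thresholds in this article's runtime example, and the rule that a kill needs at least two active signals is an assumption.

```typescript
// severity and weight are both expected to lie in [0, 1].
type Signal = { name: string; severity: number; weight: number };
type Tier = "allow" | "warn" | "pause" | "kill";

// Noisy-OR: each weighted signal independently "fails to alarm" with
// probability (1 - severity * weight); the composite is the complement.
function compositeScore(signals: Signal[]): number {
  return 1 - signals.reduce((acc, s) => acc * (1 - s.severity * s.weight), 1);
}

function decide(signals: Signal[]): { score: number; tier: Tier } {
  const score = compositeScore(signals);
  // Count signals that are individually noticeable (hypothetical cutoff).
  const concurrent = signals.filter((s) => s.severity * s.weight >= 0.3).length;
  let tier: Tier = "allow";
  if (score >= 0.3) tier = "warn";
  if (score >= 0.6) tier = "pause";
  // Termination requires both a high composite and multiple deviations.
  if (score >= 0.8 && concurrent >= 2) tier = "kill";
  return { score, tier };
}

// A velocity spike on its own only warns.
const spikeOnly = decide([{ name: "velocity", severity: 0.6, weight: 0.7 }]);

// The same kind of spike combined with workflow drift escalates to a kill.
const compound = decide([
  { name: "velocity", severity: 0.9, weight: 0.7 },
  { name: "drift", severity: 0.8, weight: 0.65 },
]);
console.log(spikeOnly.tier, compound.tier);
```

Under these assumptions no single metric can terminate a session by itself, yet several moderate deviations compound quickly, which makes staying just under each individual limit less useful to an attacker.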
### 4. Static Budget Limits
**Explanation:** Fixed token, cost, or duration caps break under variable workload complexity. A simple query and a multi-document analysis cannot share identical budgets.
**Fix:** Implement dynamic budgeting based on task classification. Use historical session baselines to set percentile thresholds (e.g., 95th percentile of normal execution). Allow temporary overrides with explicit human approval.
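
As a rough sketch of percentile-based budgeting, the code below derives a token cap per task class from historical sessions using the nearest-rank method. The task-class names, the synthetic history, and the 1.5x override factor are all hypothetical.

```typescript
// Nearest-rank percentile: the smallest value with at least p% of the
// samples at or below it. Expects 0 < p <= 100.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no baseline data for this task class");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

// Synthetic history: tokens consumed per past session, keyed by task class.
const history: Record<string, number[]> = {
  simple_query: Array.from({ length: 20 }, (_, i) => 1000 + i * 100),
  multi_document_analysis: Array.from({ length: 20 }, (_, i) => 20000 + i * 2000),
};

// Budget = 95th percentile of normal sessions, optionally widened after an
// explicit human approval.
function tokenBudget(taskClass: string, overrideApproved = false, overrideFactor = 1.5): number {
  const base = percentile(history[taskClass] ?? [], 95);
  return overrideApproved ? Math.ceil(base * overrideFactor) : base;
}

const simpleBudget = tokenBudget("simple_query");
const analysisBudget = tokenBudget("multi_document_analysis");
const approvedBudget = tokenBudget("simple_query", true);
console.log(simpleBudget, analysisBudget, approvedBudget);
```

An unknown task class throws instead of falling back to a default, so an unclassified session cannot silently inherit a generous budget.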
### 5. Assuming Per-Call Proxy Covers Session Risks
**Explanation:** Rate limiting and path blocking prevent individual abuse but cannot detect workflow hijacking or drift induced by prompt injection.
**Fix:** Treat per-call enforcement as a tactical layer and trajectory scoring as a strategic layer. They must operate in tandem. Feed proxy audit logs into the envelope engine for correlation.
### 6. Poor Audit Trail Design
**Explanation:** Logging only final decisions obscures forensic reconstruction. Compliance frameworks require step-level traceability with correlation IDs.
**Fix:** Write structured JSONL audit streams containing: session ID, step index, tool name, arguments, drift score, verdict, and timestamp. Store logs in immutable storage with cryptographic hashing for tamper evidence.
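
A hash-chained JSONL stream is one way to get tamper evidence. In the sketch below, an in-memory array stands in for the audit file, the record shape follows the fields listed above, and the chaining scheme (each hash covers the previous hash plus the serialized record) is an assumption rather than a prescribed format. It uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

interface AuditRecord {
  sessionId: string;
  stepIndex: number;
  tool: string;
  args: Record<string, unknown>;
  driftScore: number;
  verdict: "allow" | "warn" | "pause" | "kill";
  timestamp: string;
}

type ChainedRecord = AuditRecord & { prevHash: string; hash: string };

const GENESIS = "0".repeat(64);

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Append one JSONL line whose hash covers the record and the previous hash.
function appendRecord(lines: string[], record: AuditRecord): void {
  const prevHash =
    lines.length > 0 ? (JSON.parse(lines[lines.length - 1]) as ChainedRecord).hash : GENESIS;
  const hash = sha256(prevHash + JSON.stringify(record));
  lines.push(JSON.stringify({ ...record, prevHash, hash }));
}

// Recompute every hash; any edited, removed, or reordered line breaks the chain.
function verifyChain(lines: string[]): boolean {
  let prevHash = GENESIS;
  for (const line of lines) {
    const { hash, prevHash: claimedPrev, ...record } = JSON.parse(line) as ChainedRecord;
    if (claimedPrev !== prevHash) return false;
    if (sha256(prevHash + JSON.stringify(record)) !== hash) return false;
    prevHash = hash;
  }
  return true;
}

const log: string[] = [];
appendRecord(log, {
  sessionId: "s-1", stepIndex: 0, tool: "read_account", args: { id: 42 },
  driftScore: 0.1, verdict: "allow", timestamp: "2026-01-01T00:00:00Z",
});
appendRecord(log, {
  sessionId: "s-1", stepIndex: 1, tool: "send_reply", args: {},
  driftScore: 0.2, verdict: "allow", timestamp: "2026-01-01T00:00:05Z",
});

const intact = verifyChain(log);
const tampered = [log[0].replace("read_account", "read_other"), log[1]];
const tamperDetected = !verifyChain(tampered);
console.log(intact, tamperDetected);
```

Chaining only makes tampering detectable; it still needs to be paired with write-once storage or an externally anchored head hash so that the whole chain cannot simply be regenerated.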
### 7. Neglecting Kill Propagation Mechanics
**Explanation:** Terminating the guard process leaves pending async calls in flight. Attackers exploit this window to complete exfiltration.
**Fix:** Implement synchronous kill propagation. When severity crosses 0.8, write a deny-all rule to the proxy configuration file, trigger a hot-reload, and await acknowledgment before releasing the session lock.
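
The ordering this fix describes can be sketched as follows. `ProxyControl` and `SessionLock` are hypothetical interfaces introduced for illustration; they do not describe the real control surface of `mcpfw` or any other proxy.

```typescript
// Hypothetical deny-all policy shape and proxy control surface.
type DenyAllPolicy = { defaultAction: "deny"; rules: string[] };

interface ProxyControl {
  writePolicy(policy: DenyAllPolicy): Promise<void>;
  // Triggers a hot-reload and resolves true once the proxy confirms the
  // new policy is active, or false on timeout.
  reloadAndAck(timeoutMs: number): Promise<boolean>;
}

// While the lock is held, no queued tool call may be dispatched.
class SessionLock {
  private held = true;
  release(): void {
    this.held = false;
  }
  get isHeld(): boolean {
    return this.held;
  }
}

async function killSession(
  proxy: ProxyControl,
  lock: SessionLock,
  timeoutMs = 2000
): Promise<void> {
  // 1. Persist the deny-all rule before anything else can run.
  await proxy.writePolicy({ defaultAction: "deny", rules: [] });
  // 2. Hot-reload and wait for the proxy to confirm enforcement.
  const acked = await proxy.reloadAndAck(timeoutMs);
  if (!acked) {
    // Fail closed: keep the lock so in-flight calls stay blocked.
    throw new Error("proxy did not acknowledge deny-all; session lock retained");
  }
  // 3. Only now is it safe to release the session lock.
  lock.release();
}
```

The key design choice is failing closed: if the acknowledgment never arrives, the lock stays held and the caller gets an error to escalate, rather than the session quietly unwinding while calls are still in flight.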
## Production Bundle
### Action Checklist
- [ ] Define behavioral envelope with explicit workflow patterns, budget caps, and data flow restrictions
- [ ] Instrument agent framework to annotate every tool call with read sources and write targets
- [ ] Configure composite scoring with decay factors to prevent threshold gaming
- [ ] Implement graduated response tiers (allow/warn/pause/kill) with compound escalation logic
- [ ] Set up immutable JSONL audit logging with session correlation IDs and cryptographic checksums
- [ ] Configure kill propagation to hot-reload deny-all policies into downstream per-call proxies
- [ ] Establish budget baselines using historical session data instead of arbitrary static limits
- [ ] Map envelope metrics to regulatory requirements (EU AI Act Art 72, Singapore framework, DORA)
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|----------|---------------------|-----|-------------|
| Internal operational tool | Trajectory envelope + per-call proxy | Balances security with developer velocity; prevents accidental data leaks | Low (standard enforcement) |
| Customer-facing support agent | Full envelope with strict data flow + human pause tier | High exfiltration risk; requires drift detection and kill-switch compliance | Medium (audit storage, review overhead) |
| Financial/regulated workflow | Envelope + proxy + immutable audit + DORA-aligned reconstruction | Mandatory post-market monitoring; DORA four-hour initial incident-reporting window | High (compliance tooling, forensic logging) |
| Research/experimental agent | Lightweight envelope with warn-only tier | Allows model variance and workflow exploration without production risk | Low (monitoring only, no enforcement) |
### Configuration Template
```yaml
name: production-support-agent
version: 2.1
purpose: Resolve customer inquiries using knowledge base and account records

workflows:
  - name: standard_resolution
    pattern: ["search_kb", "read_account", "format_summary", "send_reply"]
    max_steps: 12
    tolerance: 0.25
  - name: escalation_path
    pattern: ["classify_issue", "create_ticket", "notify_team"]
    max_steps: 6
    tolerance: 0.20

bounds:
  max_actions_per_session: 45
  max_tokens_consumed: 120000
  max_duration_seconds: 240
  max_cost_usd: 1.50

data_flow:
  forbidden_paths:
    - source: "pii_account_data"
      targets: ["email_external", "file_export", "webhook_public"]
    - source: "internal_knowledge"
      targets: ["api_external_untrusted"]

autonomy:
  max_delegation_depth: 2
  require_human_approval_for: ["write_production", "delete_record"]

drift:
  unknown_workflow_threshold: 3
  repetition:
    max_identical_calls: 2
    max_similar_calls: 8
  velocity:
    baseline_actions_per_minute: 15
    spike_multiplier: 2.5
```

### Quick Start Guide
1. Install the runtime components: `pip install agent-envelope mcpfw agentspec`
2. Generate a baseline envelope: `agentspec scan --config agent-config.yaml --output envelope.yaml`
3. Validate the configuration: `agent-envelope validate --file envelope.yaml --strict`
4. Wrap your agent execution: `agent-envelope run --envelope envelope.yaml -- python agent_runner.py`
5. Monitor session drift in real time: `agent-envelope tail --audit audit/session-trace.jsonl --score-threshold 0.3`

Trajectory enforcement transforms agent security from isolated validation to continuous behavioral governance. By declaring expected workflows, scoring runtime execution, and propagating kill decisions across layers, you close the gap that per-call proxies and static scans leave open. This architecture satisfies modern compliance mandates, prevents compound exfiltration attacks, and provides the forensic visibility required for production-grade autonomous systems.
