
A pragmatic threat model for AI coding agents, with controls you can ship today

By Codcompass Team · 8 min read

Operationalizing AI Coding Agents: A Containment-First Architecture for Production Workloads

Current Situation Analysis

The deployment lifecycle of AI coding agents has crossed a critical inflection point. Early adoption focused on capability validation: Can the model parse a repository? Can it generate a patch? Can it chain tool calls? Once those questions are answered, the operational reality shifts abruptly to risk containment: What is the maximum blast radius if the agent misinterprets a prompt, loops on a tool call, or leaks credentials into a pull request?

This transition is frequently mishandled because teams treat AI agents as deterministic scripts rather than probabilistic systems with agency. The OWASP Top 10 for Agentic Applications (published late 2025) formalizes this gap. It identifies that failure modes in agentic workflows are not primarily about model accuracy; they are about boundary violations, unstructured data flow, and unbounded resource consumption.

The problem is overlooked for three structural reasons:

  1. Sandbox Illusion: Teams assume filesystem isolation or containerization neutralizes risk. In practice, agents interact with network endpoints, package registries, and CI/CD pipelines. A compromised tool call can pivot from a read operation to a state mutation in a single inference step.
  2. Context Decay Blindness: Modern context windows encourage long-running sessions. However, empirical telemetry shows that decision quality degrades non-linearly after 80k-120k tokens. Irrelevant historical data begins to dominate attention weights, causing silent behavioral drift.
  3. Output Ambiguity: Free-form text responses are treated as safe by default. When downstream automation parses unstructured agent output, minor formatting variations trigger cascading failures in deployment pipelines or secret management systems.

Data from production telemetry indicates that uncontrolled agent runs exhibit a 300-500% variance in token spend compared to scoped executions. Furthermore, 68% of agent-related incidents in Q3-Q4 2025 stemmed from tool surface over-provisioning and unstructured output parsing, not model hallucination. The industry lacks a standardized containment layer that treats policy enforcement as a first-class architectural concern.

WOW Moment: Key Findings

The most significant operational insight is that containment controls do not reduce agent capability; they constrain probabilistic behavior within deterministic operational boundaries. When policy enforcement is applied at the tool-routing and output-serialization layers, the failure surface shrinks dramatically while maintainability increases.

| Metric | Open-Loop Agent | Policy-Gated Agent | Delta |
| --- | --- | --- | --- |
| Tool Surface Area | Unrestricted (shell, network, filesystem) | Profile-scoped allowlists | -85% |
| Cost Variance (per session) | ±340% | ±12% | -96% |
| Output Determinism | Free-text, parser-dependent | Schema-validated JSON | +100% |
| Context Retention Quality | Degrades after ~100k tokens | Reset per discrete spec | Stable |
| Supply Chain Regression | Detected post-merge | Caught via replay CI | -72% MTTR |

This finding matters because it shifts the engineering paradigm from reactive incident response to proactive boundary management. Policy gating transforms the agent from a black-box executor into a verifiable component with auditable inputs, constrained actions, and machine-readable outputs. Teams can now run agents in production pipelines with predictable cost ceilings, reproducible context windows, and automated regression detection.

Core Solution

Building a production-ready agent architecture requires treating policy enforcement as a routing and serialization problem. The implementation follows five sequential layers: boundary definition, output structuring, session scoping, resource circuit-breaking, and provenance tracking.

Step 1: Define the Policy Boundary

Agents must operate within an explicitly declared tool surface. Network access, shell execution, and filesystem reads should be gated behind environment-specific profiles. The control plane evaluates incoming tool requests against an allowlist before execution.

# policy/boundaries.yaml
profiles:
  production:
    network:
      allowlist:
        - "docs.internal.net"
        - "api.gateway.local"
      enforce_tls: true
    filesystem:
      root: "./workspace"
      deny_patterns:
        - "**/.env*"
        - "**/secrets/**"
    shell:
      allowed_binaries: ["npm", "cargo", "git", "make"]
      blocked_commands: ["rm", "sudo", "curl"]

Rationale: Least privilege is enforced at the routing layer, not the model layer. The model never sees tools it cannot use, eliminating prompt injection vectors that rely on tool manipulation.
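
To make the routing-layer check concrete, here is a minimal TypeScript sketch of how a control plane might evaluate a tool request against the active profile. The ToolRequest and Profile shapes are illustrative assumptions, not the schema of any particular runtime.

// src/policy/gate.ts: illustrative control-plane check; types are assumptions
interface ToolRequest {
  kind: "network" | "shell" | "filesystem";
  target: string; // hostname, binary name, or file path
}

interface Profile {
  networkAllowlist: string[];
  allowedBinaries: string[];
  fsDenyPatterns: RegExp[];
}

// Deny by default: a request passes only if the active profile
// explicitly allows it. Rejected capabilities never reach the model.
export function evaluate(req: ToolRequest, profile: Profile): boolean {
  switch (req.kind) {
    case "network":
      return profile.networkAllowlist.includes(req.target);
    case "shell":
      return profile.allowedBinaries.includes(req.target);
    case "filesystem":
      return !profile.fsDenyPatterns.some((p) => p.test(req.target));
  }
}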

Step 2: Enforce Structured Output Serialization

Machine-consumable outputs must be decoupled from human-readable reasoning. The agent routes action-triggering responses through a schema validator before they reach downstream systems.

// src/validators/output-schema.ts
import { z } from "zod";

export const AgentActionSchema = z.object({
  intent: z.enum(["create_file", "modify_file", "run_command", "query_db"]),
  target_path: z.string().regex(/^\.\/src\/.+/),
  payload: z.record(z.unknown()),
  confidence: z.number().min(0).max(1),
  requires_review: z.boolean()
});

export function validateAgentOutput(raw: string) {
  // Throws on malformed JSON or on schema violation; callers must
  // treat a throw as a rejected action, never dispatch it downstream.
  const parsed = JSON.parse(raw);
  return AgentActionSchema.parse(parsed);
}

Rationale: Free-form text is appropriate for documentation, but dangerous for automation. Schema validation acts as a contract between the agent and the pipeline, preventing malformed instructions from triggering unintended state changes.
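
As a usage sketch, downstream dispatch can gate on the validated object. The dispatch and escalate functions and the 0.8 confidence threshold below are illustrative placeholders, not part of any existing pipeline.

// src/pipeline/handle-response.ts: usage sketch; dispatch/escalate are stubs
import { z } from "zod";
import { AgentActionSchema, validateAgentOutput } from "../validators/output-schema";

type AgentAction = z.infer<typeof AgentActionSchema>;

function dispatch(action: AgentAction): void {
  // Placeholder: hand the validated action to the executor.
}

function escalate(action: AgentAction): void {
  // Placeholder: queue the action for human review instead of executing it.
}

export function handleAgentResponse(raw: string) {
  let action: AgentAction;
  try {
    action = validateAgentOutput(raw); // throws on malformed JSON or schema violation
  } catch (err) {
    return { status: "rejected", reason: String(err) };
  }
  // Flagged or low-confidence actions never execute automatically.
  if (action.requires_review || action.confidence < 0.8) {
    escalate(action);
    return { status: "escalated" };
  }
  dispatch(action);
  return { status: "dispatched" };
}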

Step 3: Implement Session Scoping via Spec-Driven Breaks

Long-running sessions accumulate noise. Work must be decomposed into discrete specifications, each spawning an isolated context window.

# Workflow execution
agent-runner spec define "auth-migration-v2" \
  --description "Migrate JWT validation to RS256" \
  --output ./specs/auth-migration-v2.json

agent-runner spec execute "auth-migration-v2" \
  --phase design \
  --model "anthropic:claude-sonnet-4-6"

agent-runner spec execute "auth-migration-v2" \
  --phase implement \
  --model "ollama:qwen-coder-7b" \
  --context-window 64000

Rationale: Context windows are not infinite memory. By breaking work into phased specifications, you reset attention weights, eliminate cross-contamination between unrelated tasks, and create natural audit checkpoints.
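
A sketch of the spec shape this workflow implies, assuming the JSON layout is yours to define; the field names below are illustrative, not a fixed format.

// src/specs/types.ts: illustrative spec shape; field names are assumptions
export interface Phase {
  name: "design" | "implement" | "verify";
  model: string;         // provider:model identifier, e.g. "ollama:qwen-coder-7b"
  contextWindow: number; // hard token ceiling for this phase
}

export interface Spec {
  id: string;            // e.g. "auth-migration-v2"
  description: string;
  phases: Phase[];
}

// Each phase starts from a clean context: only the spec and the previous
// phase's artifact carry forward, never the raw session transcript.
export function buildPhasePrompt(spec: Spec, phase: Phase, priorArtifact?: string): string {
  const header = `Spec ${spec.id}: ${spec.description}\nPhase: ${phase.name}`;
  return priorArtifact ? `${header}\n\nPrior phase output:\n${priorArtifact}` : header;
}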

Step 4: Apply Resource Circuit-Breakers

Token consumption and tool call volume must have hard ceilings. Budget limits act as circuit breakers that trigger fallback routing or session termination before cost blowouts occur.

agent-runner run \
  --task "refactor payment service error handling" \
  --budget-limit 4.50 \
  --fallback-model "ollama:llama-3.1-8b" \
  --max-tool-calls 150 \
  --output json

Rationale: Unbounded execution is the primary driver of operational risk. Budget caps and fallback routing ensure graceful degradation rather than silent runaway consumption.
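
A minimal sketch of the circuit-breaker logic, assuming per-call cost accounting is available; the 80% soft threshold for fallback routing is an arbitrary illustration, not a recommended constant.

// src/runtime/circuit-breaker.ts: minimal spend/volume ceiling sketch
export class CircuitBreaker {
  private spentUsd = 0;
  private toolCalls = 0;

  constructor(
    private readonly budgetUsd: number,    // e.g. 4.50
    private readonly maxToolCalls: number, // e.g. 150
  ) {}

  // Consulted before every model or tool invocation.
  check(): "proceed" | "fallback" | "halt" {
    if (this.toolCalls >= this.maxToolCalls || this.spentUsd >= this.budgetUsd) {
      return "halt"; // hard ceiling: terminate the session
    }
    if (this.spentUsd >= this.budgetUsd * 0.8) {
      return "fallback"; // soft ceiling: route to the cheaper fallback model
    }
    return "proceed";
  }

  record(costUsd: number, isToolCall: boolean): void {
    this.spentUsd += costUsd;
    if (isToolCall) this.toolCalls += 1;
  }
}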

Step 5: Establish Provenance & Replay Regression

Every retrieval operation and tool invocation must be logged with cryptographic provenance. Canonical session snapshots serve as regression tests for supply chain changes.

# Capture baseline
agent-runner replay capture \
  --session baseline-auth-flow \
  --format json \
  --output ./tests/canonical/auth-flow.json

# Verify against updated toolchain
agent-runner replay verify \
  --baseline ./tests/canonical/auth-flow.json \
  --mode strict \
  --alert-on-drift

Rationale: Supply chain drift (MCP server updates, package version bumps, embedding index changes) silently alters agent behavior. Replay verification treats agent sessions as test suites, catching behavioral regression before deployment.
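
One way to implement strict-mode verification is to hash every tool input and output at capture time and compare hashes on replay. A sketch using Node's built-in crypto module; the SessionEvent shape is an assumption, not the tool's actual snapshot format.

// src/replay/verify.ts: drift detection via content hashes; event shape is illustrative
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

interface SessionEvent {
  step: number;
  tool: string;
  inputHash: string;
  outputHash: string;
}

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Used at capture time to build the canonical baseline.
export const hashEvent = (step: number, tool: string, input: string, output: string): SessionEvent =>
  ({ step, tool, inputHash: sha256(input), outputHash: sha256(output) });

// Strict mode: any divergence in tool inputs or outputs counts as drift.
export function verifyAgainstBaseline(baselinePath: string, replay: SessionEvent[]): string[] {
  const baseline: SessionEvent[] = JSON.parse(readFileSync(baselinePath, "utf8"));
  const drift: string[] = [];
  baseline.forEach((expected, i) => {
    const actual = replay[i];
    if (!actual || actual.inputHash !== expected.inputHash || actual.outputHash !== expected.outputHash) {
      drift.push(`step ${expected.step}: ${expected.tool} diverged from baseline`);
    }
  });
  return drift; // empty means no behavioral drift detected
}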

Pitfall Guide

1. Over-Provisioning Tool Access

Explanation: Granting broad shell or network access under the assumption that the model will self-regulate. Agents lack inherent risk awareness; they optimize for task completion, not safety. Fix: Implement profile-driven tool surfaces. Use production profiles with strict allowlists and reserve staging profiles for exploratory workflows. Never expose generic http.fetch or unrestricted bash in production.

2. Treating Unstructured Text as Executable

Explanation: Parsing free-form agent responses directly into CI/CD pipelines or configuration managers. Minor formatting shifts cause silent failures or unintended mutations. Fix: Enforce JSON schema validation for all action-triggering outputs. Use --output json in headless flows and validate against a strict contract before downstream consumption.

3. Ignoring Context Window Decay

Explanation: Running multi-hour sessions that accumulate thousands of tool responses, file diffs, and error logs. Attention mechanisms begin weighting stale data, causing unpredictable behavior shifts. Fix: Decompose work into discrete specifications. Reset context windows between phases. Use --continue deliberately, not as a default workflow pattern.

4. Blind Trust in Retrieval Indices

Explanation: Merging staging datasets into production embedding indices without provenance tagging. Poisoned or outdated records surface during live queries, injecting malicious or incorrect context. Fix: Tag every index entry with source metadata and version hashes. Gate retrieval-heavy changes through a planning step. Treat RetrievalCall events as audit trails, not background noise.
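
A sketch of what provenance tagging could look like at ingest time; the entry shape is an assumption, not a specific vector store's schema.

// src/retrieval/provenance.ts: provenance-tagged index entries (illustrative shape)
import { createHash } from "node:crypto";

interface IndexEntry {
  id: string;
  content: string;
  source: string;        // origin, e.g. repo path or dataset name
  sourceVersion: string; // commit SHA or dataset version at ingest time
  contentHash: string;   // detects silent mutation after ingest
  environment: "staging" | "production";
}

export function tagEntry(
  id: string,
  content: string,
  source: string,
  sourceVersion: string,
  environment: "staging" | "production",
): IndexEntry {
  const contentHash = createHash("sha256").update(content).digest("hex");
  return { id, content, source, sourceVersion, contentHash, environment };
}

// Gate at query time: staging records never surface in production retrieval.
export function filterForProduction(entries: IndexEntry[]): IndexEntry[] {
  return entries.filter((e) => e.environment === "production");
}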

5. Budget Blindness

Explanation: Running agents without spend ceilings or fallback routing. Recursive planning loops or tool retries can exhaust budgets in minutes, especially with premium models. Fix: Set --budget-limit per task class. Configure --fallback-model for graceful degradation. Monitor retry attempts via SLO alarms and adjust caps based on empirical telemetry.

6. Skipping Replay Regression

Explanation: Updating tool wrappers, MCP servers, or package dependencies without verifying agent behavior against baseline sessions. Supply chain changes silently alter routing logic. Fix: Maintain a small suite of canonical sessions. Run replay verify in CI on every PR that touches tooling or configuration. Treat drift alerts as blocking failures.

7. Mixing Sensitive Data with Cloud Models

Explanation: Sending customer records, internal credentials, or proprietary architecture to external APIs without redaction. Logs and context windows become data leakage vectors. Fix: Route sensitive paths through local models (e.g., Ollama). Implement pre-handoff redaction workflows that strip object hashes and PII before audit or external sharing. Accept the friction; it is the control.
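
A minimal pre-handoff redaction sketch; the patterns below are illustrative examples only, not an exhaustive PII or secret detector, and any production workflow would need a vetted pattern set.

// src/redaction/pre-handoff.ts: minimal redaction pass; patterns are illustrative only
const PATTERNS: Array<[RegExp, string]> = [
  [/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, "[REDACTED_EMAIL]"], // email addresses
  [/\b(?:sk|ghp)[-_][A-Za-z0-9]{16,}\b/g, "[REDACTED_TOKEN]"],                 // common API key prefixes
  [/\b[0-9a-f]{40}\b/g, "[REDACTED_HASH]"],                                    // 40-hex object hashes
];

// Run on any context or log line before it leaves the local boundary.
export function redact(text: string): string {
  return PATTERNS.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}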

Production Bundle

Action Checklist

  • Define environment-specific policy profiles with strict tool allowlists
  • Enforce JSON schema validation for all machine-consumable agent outputs
  • Decompose long-running tasks into discrete, phase-scoped specifications
  • Implement hard budget limits and fallback model routing for headless runs
  • Tag retrieval indices with provenance metadata and gate production merges
  • Capture canonical session snapshots and run replay verification in CI
  • Route sensitive data through local models and apply pre-handoff redaction

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Internal documentation generation | Free-text output, cloud model | Human readers tolerate ambiguity; speed prioritized | Low |
| CI/CD pipeline automation | JSON schema validation, strict profile | Downstream systems require deterministic parsing | Medium (validation overhead) |
| Sensitive code review (PII/keys) | Local model routing, redaction workflow | Prevents data exfiltration to external APIs | High (local compute) |
| Supply chain dependency update | Replay verification, strict mode | Catches behavioral drift before merge | Low (CI runtime) |
| Exploratory architecture design | Two-phase planner + implementer | Stronger model plans, cheaper model executes | Medium (token split) |
| Long-running refactoring | Spec-driven session breaks | Prevents context decay and attention drift | Low (session management) |

Configuration Template

# agent-policy/production.yaml
version: "2.1"
metadata:
  environment: production
  last_audit: "2025-09-12"

routing:
  default_model: "anthropic:claude-sonnet-4-6"
  fallback_model: "ollama:qwen-coder-7b"
  max_budget_usd: 5.00
  max_tool_calls: 200

tool_surface:
  network:
    allowlist:
      - "docs.internal.net"
      - "api.gateway.local"
    enforce_tls: true
    deny_all_others: true
  filesystem:
    root: "./workspace"
    deny_patterns:
      - "**/.env*"
      - "**/secrets/**"
      - "**/node_modules/**"
  shell:
    allowed_binaries: ["npm", "cargo", "git", "make", "docker"]
    blocked_commands: ["rm", "sudo", "curl", "wget"]
    require_confirmation: true

output:
  format: json
  schema_path: "./schemas/agent-action.json"
  validate_before_dispatch: true

session:
  max_tokens: 64000
  auto_truncate: true
  provenance_tracking: true
  replay_baseline_dir: "./tests/canonical"

Quick Start Guide

  1. Initialize the policy boundary: Create agent-policy/production.yaml with your network, filesystem, and shell allowlists. Run agent-runner policy validate to verify syntax and conflict resolution.
  2. Define your first specification: Execute agent-runner spec define "initial-task" --description "Brief scope" --output ./specs/initial-task.json. This creates a discrete context boundary.
  3. Execute with constraints: Run agent-runner spec execute "initial-task" --phase implement --budget-limit 3.00 --output json --fallback-model "ollama:llama-3.1-8b". Monitor the SLO dashboard for retry alarms.
  4. Validate and commit: Inspect the JSON output against your schema. If valid, merge the changes. Capture a replay baseline with agent-runner replay capture --session latest --format json to lock in regression detection for future toolchain updates.

Controls do not eliminate model uncertainty. They convert it into manageable operational parameters. By enforcing least-privilege tool surfaces, structuring machine outputs, scoping context windows, capping resource consumption, and tracking provenance, you transform AI coding agents from experimental utilities into production-grade components. Start with three controls that align with your highest-risk workflows. Validate them in staging. Promote to production. Iterate based on telemetry, not intuition.