Architecting Safe AI Agents for Authorization Policies: The Compiler-as-Judge Pattern

Current Situation Analysis

The adoption of agentic coding tools has shifted from experimental to operational across engineering teams. Tools like Claude Code, Cursor, and GitHub Copilot Workspace now operate with direct filesystem access and shell execution capabilities. This transforms them from passive autocomplete assistants into autonomous engineers capable of reading, modifying, and executing code across entire repositories.

The industry pain point emerges when these agents are applied to security-critical domains, particularly Policy-as-Code (PAC). Authorization systems like Cerbos, Open Policy Agent, or Casbin rely on declarative configuration files to enforce tenant isolation, role-based access control (RBAC), and attribute-based access control (ABAC). Unlike application logic, where a bug typically results in a failed feature or a stack trace, a misconfigured authorization policy silently degrades security posture. A single indentation shift in a YAML principal definition, an incorrectly scoped action array, or a misplaced effect: deny can expose sensitive data across tenant boundaries without triggering runtime exceptions.

This problem is systematically overlooked because most AI agent workflows optimize for velocity. Default permission models prioritize uninterrupted execution, assuming that syntactic correctness equates to functional safety. In reality, compilers and linters only validate structure and schema compliance. They cannot verify business intent, cross-tenant isolation guarantees, or least-privilege alignment. When developers grant agents broad execution rights and rely on single-shot generation for security policies, they introduce a high-probability failure mode: syntactically valid but semantically dangerous configurations that pass local validation but fail in production audits.

The gap between agent capability and security assurance requires a structural shift. Instead of treating the LLM as the authoritative editor, engineering teams must decouple generation from validation. The compiler must become the immutable gatekeeper, and the agent must operate within strict iteration budgets and permission boundaries.

WOW Moment: Key Findings

The fundamental insight driving secure agentic workflows is that security posture and error convergence differ sharply depending on the architectural pattern used. The table below compares direct LLM editing, a linter-assisted workflow, and a compiler-validated loop.

Approach	Security Posture	Validation Mechanism	Convergence Rate	Human Oversight Required
Direct LLM Generation	Low	Model confidence + manual review	40-60%	High (semantic + syntax)
LLM + Static Linter	Medium	Schema validation + lint rules	70-80%	Medium (semantic only)
LLM + Compiler-Validated Loop	High	Real compiler in isolated runtime	92-98%	Low (semantic gate only)

The compiler-validated loop pattern reduces the number of configurations that look correct but fail validation by forcing the agent to reconcile its output against a production-mirroring validation engine. Instead of guessing whether a policy bundle is correct, the agent receives deterministic error output, applies targeted corrections, and iterates until the compiler exits cleanly. This shifts human review from syntax debugging to business logic verification, reducing cognitive load and audit risk.

Core Solution

Implementing a secure agentic workflow for authorization policies requires three architectural layers: workspace isolation, containerized validation, and a deterministic orchestration loop. The following implementation demonstrates how to structure this pattern using TypeScript and Docker, replacing ad-hoc prompting with a repeatable engineering control.

Step 1: Isolate the Policy Workspace

Agents must never operate on a monolithic repository when handling security configurations. Create a dedicated directory structure that separates policy definitions from application code:

/policies
  /cerbos
    /roles
    /tenants
    /audit
  /tests
    /policy-scenarios.yaml

This boundary enables precise filesystem scoping and prevents accidental cross-contamination between service logic and authorization rules.
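One way to enforce this boundary in tooling is a path check that rejects anything resolving outside the policy root. The sketch below assumes a Node.js runtime; `isInsideWorkspace` is a hypothetical helper name, not part of any tool mentioned here.

```typescript
import * as path from 'node:path';

// Hypothetical helper: returns true only when `candidate` resolves to a
// location strictly inside `root`. Symlinks are not resolved here.
export function isInsideWorkspace(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolvedCandidate = path.resolve(resolvedRoot, candidate);
  const relative = path.relative(resolvedRoot, resolvedCandidate);
  // A relative path that climbs out of the root, or is absolute, lies outside it.
  const escapes =
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative);
  return relative !== '' && !escapes;
}
```

A wrapper around the agent's file-write tool could call this before every write and refuse anything that returns false. Because symlinks are not followed, a stricter variant would resolve real paths first.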

Step 2: Containerize the Validation Engine

Running cerbos compile directly on the host introduces environment drift. A version-pinned container image keeps the validation runtime aligned with production. The following TypeScript orchestrator manages the compilation cycle:

import { spawnSync } from 'child_process';
import * as path from 'path';

interface ValidationCycle {
  policyPath: string;
  maxRetries: number;
  containerImage: string; // pin to the Cerbos version that runs in production
}

interface ValidationResult {
  success: boolean;
  output: string;
  attempts: number;
}

// Called after each failed compile so the agent can revise the policies
// before the next attempt.
type CorrectionHandler = (compilerOutput: string, attempt: number) => Promise<void>;

export class PolicyValidator {
  constructor(private config: ValidationCycle) {}

  async runValidationLoop(onFailure?: CorrectionHandler): Promise<ValidationResult> {
    let attempts = 0;
    let lastOutput = '';
    // Resolve to an absolute path so Docker treats it as a bind mount
    // rather than a named volume.
    const mount = `${path.resolve(this.config.policyPath)}:/policies`;

    while (attempts < this.config.maxRetries) {
      attempts++;
      // The Cerbos image's entrypoint is the cerbos binary, so only the
      // subcommand is passed. An argument array avoids shell interpolation.
      const result = spawnSync(
        'docker',
        ['run', '--rm', '-v', mount, this.config.containerImage, 'compile', '/policies'],
        { encoding: 'utf-8' }
      );

      if (result.status === 0) {
        return { success: true, output: result.stdout, attempts };
      }

      lastOutput =
        result.stderr || result.stdout || result.error?.message || 'Unknown compilation failure';
      console.warn(`[Attempt ${attempts}] Validation failed:\n${lastOutput}`);

      // Without a correction between attempts, a retry would fail identically.
      if (onFailure && attempts < this.config.maxRetries) {
        await onFailure(lastOutput, attempts);
      }
    }
    return { success: false, output: lastOutput, attempts };
  }
}

This orchestrator enforces a hard retry limit, captures the compiler's output, and prevents infinite execution loops. The agent consumes the output field to understand exactly which policy file, line, or schema constraint failed.

Step 3: Implement the Orchestration Loop

The agent does not decide when a policy is "good enough." It submits changes, triggers the validator, and receives deterministic feedback. The loop follows this sequence:

1. Agent generates or modifies a policy file in /policies/cerbos
2. Orchestrator mounts the directory into the Cerbos container
3. cerbos compile executes and returns exit code 0 (success) or non-zero (failure)
4. On failure, the orchestrator parses the stderr output and injects it back into the agent's context
5. Agent applies targeted corrections based on compiler diagnostics
6. Loop repeats until success or budget exhaustion
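The sequence above can be sketched as a small loop with the compiler and the agent injected as functions. This is an illustrative sketch, not the orchestrator from Step 2: `runCompilerLoop`, `CompileResult`, and `LoopOutcome` are hypothetical names, and the callbacks are synchronous for brevity even though a real agent call would likely be asynchronous.

```typescript
// Result of one compiler invocation: exit code plus captured diagnostics.
interface CompileResult {
  exitCode: number;
  diagnostics: string;
}

interface LoopOutcome {
  success: boolean;
  attempts: number;
  lastDiagnostics: string;
}

// Hypothetical loop: `compile` wraps the containerized compiler and `revise`
// hands diagnostics to the agent so it can edit the policy files.
export function runCompilerLoop(
  compile: () => CompileResult,
  revise: (diagnostics: string) => void,
  maxAttempts: number,
): LoopOutcome {
  let lastDiagnostics = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = compile();
    if (result.exitCode === 0) {
      return { success: true, attempts: attempt, lastDiagnostics: '' };
    }
    lastDiagnostics = result.diagnostics;
    // Only ask for a revision when another attempt remains in the budget.
    if (attempt < maxAttempts) {
      revise(lastDiagnostics);
    }
  }
  return { success: false, attempts: maxAttempts, lastDiagnostics };
}
```

Keeping the exit condition inside the loop, rather than inside the agent's prompt, is what makes the compiler the judge: the agent never gets to declare success.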

Step 4: Enforce Agent Permission Boundaries

The validation loop is ineffective if the agent retains unrestricted shell access. Claude Code's permission model must be explicitly constrained:

Filesystem Scope: Restrict read/write access to /policies and /tests directories only
Command Allowlist: Permit only the containerized cerbos compile invocation (which also runs policy tests), git diff, and git status
Execution Mode: Use conservative approval mode for policy directories, requiring explicit confirmation before any shell invocation
Test Protection: Mount test directories as read-only to prevent agents from deleting failing scenarios to satisfy compilation

This layered approach ensures that even if the agent hallucinates or receives ambiguous instructions, it cannot bypass the compiler gate or modify unrelated infrastructure.
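If the orchestrator itself brokers shell access, a command allowlist can be approximated with a prefix check over the argument vector. The sketch below is an assumption about how such a gate might look; `ALLOWED_PREFIXES` and `isCommandAllowed` are hypothetical and do not reflect how Claude Code evaluates permissions internally.

```typescript
// Hypothetical allowlist: each entry is the argv prefix a command must start with.
const ALLOWED_PREFIXES: readonly (readonly string[])[] = [
  ['docker', 'run', '--rm'],
  ['git', 'diff'],
  ['git', 'status'],
];

// Shell metacharacters that could chain or substitute extra commands.
const SHELL_METACHARACTERS = /[;&|`$<>\n]/;

export function isCommandAllowed(argv: readonly string[]): boolean {
  if (argv.length === 0) return false;
  // Reject any argument that could smuggle in a second command.
  if (argv.some((arg) => SHELL_METACHARACTERS.test(arg))) return false;
  return ALLOWED_PREFIXES.some((prefix) =>
    prefix.every((token, index) => argv[index] === token),
  );
}
```

The `docker run --rm` prefix is still broad, since it would accept any image. A stricter variant would match the full argument list, including the pinned image name.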

Pitfall Guide

1. Semantic Blindness

Explanation: The compiler validates syntax and schema compliance but cannot verify business intent. A policy may compile cleanly while granting admin access to public tenants. Fix: Implement a mandatory human review gate for all policy changes. Use policy tests or a simulation tool such as the Cerbos Playground to verify access scenarios before merging.

2. Infinite Retry Loops

Explanation: Agents can enter recursive correction cycles when compiler errors are ambiguous or when the underlying policy structure conflicts with the prompt. Fix: Enforce strict iteration budgets. Limit retries to 3 attempts per error class. Implement early exit on repeated identical error signatures.
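A per-error-class budget with early exit on repeated signatures could look like the following sketch. The normalization rule (replacing digits and collapsing whitespace) is an assumption chosen so that shifting line numbers do not look like new errors; `errorSignature` and `RetryBudget` are hypothetical names.

```typescript
// Normalizes compiler output so the same underlying error compares equal
// across attempts, even when line or column numbers shift.
export function errorSignature(diagnostics: string): string {
  return diagnostics
    .replace(/\d+/g, '#')
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase();
}

export class RetryBudget {
  private readonly seen = new Map<string, number>();
  private readonly maxPerSignature: number;

  constructor(maxPerSignature: number) {
    this.maxPerSignature = maxPerSignature;
  }

  // Records a failure and reports whether another attempt is still allowed
  // for this error class.
  recordFailure(diagnostics: string): boolean {
    const signature = errorSignature(diagnostics);
    const count = (this.seen.get(signature) ?? 0) + 1;
    this.seen.set(signature, count);
    return count < this.maxPerSignature;
  }
}
```

With `new RetryBudget(3)`, the third failure carrying the same signature returns false and the loop should stop and escalate to a human.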

3. Test Deletion to Satisfy Validation

Explanation: Agents may remove failing test cases instead of correcting the policy, artificially inflating pass rates while degrading coverage. Fix: Mount test directories as read-only in the validation container. Configure the orchestrator to fail the loop if test file modifications are detected.
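One way to detect test tampering is to snapshot the test directory before the agent runs and compare afterwards. The sketch below assumes a Node.js runtime and hashes file contents with SHA-256; `snapshotDirectory` and `changedFiles` are hypothetical helpers.

```typescript
import { createHash } from 'node:crypto';
import * as fs from 'node:fs';
import * as path from 'node:path';

// Maps each file under `dir` (by relative path) to a SHA-256 digest of its contents.
export function snapshotDirectory(dir: string): Map<string, string> {
  const snapshot = new Map<string, string>();
  const walk = (current: string): void => {
    for (const entry of fs.readdirSync(current, { withFileTypes: true })) {
      const fullPath = path.join(current, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath);
      } else if (entry.isFile()) {
        const digest = createHash('sha256').update(fs.readFileSync(fullPath)).digest('hex');
        snapshot.set(path.relative(dir, fullPath), digest);
      }
    }
  };
  walk(dir);
  return snapshot;
}

// Lists files that were added, removed, or changed between two snapshots.
export function changedFiles(before: Map<string, string>, after: Map<string, string>): string[] {
  const changed = new Set<string>();
  for (const [file, digest] of before) {
    if (after.get(file) !== digest) changed.add(file);
  }
  for (const file of after.keys()) {
    if (!before.has(file)) changed.add(file);
  }
  return [...changed].sort();
}
```

The orchestrator would take one snapshot at the start of the loop and fail the run if `changedFiles` returns anything at the end. This complements the read-only mount, which only protects the container's view of the files.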

4. Overly Broad Filesystem Access

Explanation: Granting agents repository-wide access increases the blast radius of misconfigurations. Agents may inadvertently modify service routing, database schemas, or deployment manifests while editing policies. Fix: Use workspace scoping. Run agents in isolated worktrees or dedicated branches, and add explicit deny rules for non-policy directories; a .gitignore only affects what Git tracks and does not limit what an agent can read or write.

5. Ignoring Compiler Exit Codes

Explanation: Treating errors as warnings or suppressing stderr output masks critical validation failures. Agents may proceed with partially compiled bundles. Fix: Check exit codes explicitly and treat any non-zero status as a failure. Map recurring compiler diagnostics to structured remediation paths. Fail the loop immediately on schema violations or unresolved references.
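Explicit exit-code handling can be sketched with Node's `spawnSync`, which reports the exit status, the terminating signal, and any spawn error separately. `runAndClassify` and `CommandOutcome` are hypothetical names; the three outcome kinds are an assumption about what an orchestrator would want to distinguish.

```typescript
import { spawnSync } from 'node:child_process';

type CommandOutcome =
  | { kind: 'passed'; stdout: string }
  | { kind: 'failed'; exitCode: number; stderr: string }
  | { kind: 'not-run'; reason: string };

// Runs a command without a shell and classifies the result explicitly,
// so a missing binary is never mistaken for a validation failure or a pass.
export function runAndClassify(command: string, args: string[]): CommandOutcome {
  const result = spawnSync(command, args, { encoding: 'utf-8' });
  if (result.error) {
    return { kind: 'not-run', reason: result.error.message };
  }
  if (result.status === null) {
    return { kind: 'not-run', reason: `terminated by signal ${result.signal ?? 'unknown'}` };
  }
  if (result.status !== 0) {
    return { kind: 'failed', exitCode: result.status, stderr: result.stderr };
  }
  return { kind: 'passed', stdout: result.stdout };
}
```

Only the `passed` outcome should let the loop exit successfully. A `not-run` outcome (for example, Docker not installed) points at the environment rather than the policy, so feeding it back to the agent as a policy error would be misleading.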

6. Mixing Policy Domains

Explanation: Combining authentication, billing, audit, and tenant isolation policies in a single bundle creates coupling that complicates validation and increases merge conflicts. Fix: Separate policies by domain. Run targeted validation cycles per directory. Use policy inheritance patterns to reduce duplication while maintaining isolation.

7. Prompt Ambiguity Propagation

Explanation: Vague instructions like "restrict access to sensitive data" translate to inconsistent policy implementations. The agent may over-restrict or under-restrict based on interpretation. Fix: Require structured requirement templates before generation. Include explicit principal definitions, action lists, resource scopes, and effect declarations in the prompt context.
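A structured requirement template can be modeled as a typed record that is checked for completeness before any prompt is sent. The field names below (`resource`, `principals`, `actions`, `effect`, `tenantScope`) are assumptions for illustration and would need to match the team's own policy vocabulary.

```typescript
type Effect = 'allow' | 'deny';

interface PolicyRequirement {
  resource: string;
  principals: string[];
  actions: string[];
  effect: Effect;
  tenantScope: string;
}

// Returns the names of fields that are absent or empty, so generation can be
// refused until the requirement is fully specified.
export function missingFields(req: Partial<PolicyRequirement>): string[] {
  const missing: string[] = [];
  if (!req.resource?.trim()) missing.push('resource');
  if (!req.principals?.length) missing.push('principals');
  if (!req.actions?.length) missing.push('actions');
  if (req.effect !== 'allow' && req.effect !== 'deny') missing.push('effect');
  if (!req.tenantScope?.trim()) missing.push('tenantScope');
  return missing;
}

// Renders a complete requirement as plain text for the agent's prompt context.
export function renderRequirement(req: PolicyRequirement): string {
  return [
    `Resource: ${req.resource}`,
    `Principals: ${req.principals.join(', ')}`,
    `Actions: ${req.actions.join(', ')}`,
    `Effect: ${req.effect}`,
    `Tenant scope: ${req.tenantScope}`,
  ].join('\n');
}
```

A request such as "restrict access to sensitive data" would fail `missingFields` on every field, which forces the requester to name the resource, principals, and actions before the agent writes anything.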

Production Bundle

Action Checklist

Isolate policy workspace: Create dedicated /policies and /tests directories with strict ownership
Containerize validation: Package Cerbos compiler in a version-pinned Docker image matching production
Configure iteration budgets: Set max retries to 3 per error class, implement early exit on identical failures
Harden agent permissions: Restrict filesystem scope, allowlist shell commands, enable conservative approval mode
Protect test artifacts: Mount test directories as read-only, enforce coverage thresholds before merge
Implement semantic review gate: Require human approval for business logic verification, use policy simulation tools
Add structured logging: Capture validation cycles, error signatures, and convergence metrics for audit trails
Integrate with CI/CD: Run compiler validation in pull request checks, block merges on non-zero exit codes
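For the structured logging item, a minimal sketch is one JSON line per validation attempt plus a helper that derives a convergence metric from the recorded runs. The `CycleRecord` shape and both function names are hypothetical; the metric here is simply the share of runs whose final attempt exited with code 0.

```typescript
interface CycleRecord {
  timestamp: string; // ISO 8601, supplied by the caller
  attempt: number;
  exitCode: number;
  errorSignature: string | null;
}

// Serializes one attempt as a single JSON line, suitable for an append-only log.
export function toLogLine(record: CycleRecord): string {
  return JSON.stringify(record);
}

// Fraction of runs whose final attempt exited cleanly. Each run is the ordered
// list of attempts recorded for one validation loop.
export function convergenceRate(runs: CycleRecord[][]): number {
  if (runs.length === 0) return 0;
  const converged = runs.filter((run) => {
    const last = run[run.length - 1];
    return last !== undefined && last.exitCode === 0;
  }).length;
  return converged / runs.length;
}
```

Measuring this on a team's own runs gives a grounded replacement for any headline convergence figure.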

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Rapid prototyping / internal tools	Direct LLM generation + manual review	Low compliance requirements, fast iteration needed	Low engineering overhead, higher audit risk
Mid-tier SaaS / multi-tenant apps	LLM + static linter + compiler loop	Balanced security and velocity, predictable validation	Moderate setup cost, reduced incident response time
Enterprise / regulated industries	Compiler-validated loop + strict permissions + mandatory human gate	Zero-trust compliance, audit readiness, tenant isolation guarantees	High initial configuration, lowest long-term risk exposure
Legacy policy migration	Manual refactoring + targeted agent assistance	Complex inheritance, undocumented business rules, high breakage risk	High time investment, prevents silent security degradation

Configuration Template

Claude Code Permission Configuration (.claude/settings.json). This template uses allow/deny permission rules; verify the rule syntax against the current Claude Code documentation before relying on it. Iteration budgets and read-only test mounts are enforced by the orchestrator script and the container, not by this file:

{
  "permissions": {
    "allow": [
      "Read(./policies/**)",
      "Edit(./policies/**)",
      "Read(./tests/**)",
      "Read(./scripts/validate-policy.sh)",
      "Bash(bash scripts/validate-policy.sh:*)",
      "Bash(git diff:*)",
      "Bash(git status)"
    ],
    "deny": [
      "Edit(./tests/**)",
      "Read(./src/**)",
      "Read(./infra/**)",
      "Read(./.env*)"
    ]
  }
}

Validation Orchestrator Script (scripts/validate-policy.sh):

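#!/usr/bin/env bash
set -euo pipefail

# Resolve to absolute paths so Docker treats them as bind mounts.
POLICY_DIR="$(cd "${1:-./policies}" && pwd)"
TEST_DIR="$(cd "${2:-./tests}" && pwd)"
# Override with a pinned tag that matches the Cerbos version in production.
CERBOS_IMAGE="${CERBOS_IMAGE:-ghcr.io/cerbos/cerbos:latest}"
MAX_RETRIES=3

echo "🔍 Starting policy validation cycle..."
echo "📁 Policy directory: $POLICY_DIR"
echo "🧪 Test directory: $TEST_DIR"

for i in $(seq 1 "$MAX_RETRIES"); do
  echo "▶️ Attempt $i/$MAX_RETRIES"

  # The image entrypoint is the cerbos binary, so only the subcommand is passed.
  # Step 1: compile only (tests skipped) to surface syntax and schema errors.
  if docker run --rm -v "$POLICY_DIR":/policies:ro "$CERBOS_IMAGE" compile --skip-tests /policies; then
    echo "✅ Compilation successful."
    # Step 2: compile again with the test suite mounted read-only.
    if docker run --rm -v "$POLICY_DIR":/policies:ro -v "$TEST_DIR":/tests:ro "$CERBOS_IMAGE" compile --tests=/tests /policies; then
      echo "✅ All policy tests passed."
      exit 0
    else
      echo "❌ Test suite failed. Halting loop."
      exit 1
    fi
  else
    echo "⚠️ Compilation failed. Agent will receive error output for correction."
    if [ "$i" -eq "$MAX_RETRIES" ]; then
      echo "🛑 Max retries reached. Manual intervention required."
      exit 1
    fi
    # Hook point: hand the compiler output to the agent and wait for its
    # correction here; without an edit, the next attempt fails identically.
  fi
done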

Quick Start Guide

1. Initialize the workspace: Create /policies/cerbos and /tests directories. Add a baseline policy file and a corresponding test scenario.
2. Containerize the compiler: Pull the Cerbos Docker image, replacing latest with the tag that matches your production version. Verify that the image runs with docker run --rm ghcr.io/cerbos/cerbos:latest --version.
3. Configure agent boundaries: Apply the .claude/settings.json template. Restrict filesystem access to policy directories and allowlist only compilation commands.
4. Run the first validation cycle: Execute bash scripts/validate-policy.sh. Observe the compiler output, agent corrections, and convergence behavior. Adjust iteration budgets based on error complexity.
5. Integrate with version control: Add the validation script to your pre-commit hooks and CI pipeline. Block merges on non-zero exit codes and enforce human review for semantic approval.

This pattern transforms AI agents from uncontrolled editors into disciplined drafting assistants. By anchoring validation to production-mirroring compilers, enforcing strict permission boundaries, and implementing deterministic iteration budgets, engineering teams can safely leverage agentic workflows for authorization policies without compromising security posture or audit compliance.
