Difficulty: Intermediate

Why AI agents keep violating your product rules

By Codcompass Team · 8 min read

Engineering Product Constraints for AI Development Agents

Current Situation Analysis

Autonomous coding agents have matured to the point where they can reliably refactor modules, generate tests, and implement feature scaffolding. Yet a persistent failure mode remains: agents routinely modify code in ways that pass linters and test suites but silently break business logic. These aren't syntax errors or algorithmic mistakes. They are product context failures.

The industry has spent the last two years optimizing for code-level harness engineering. Tools like AGENTS.md, CLAUDE.md, and .cursorrules standardize build commands, linting rules, and architectural patterns. Session memory systems (Cline's memory bank, progress trackers) preserve conversation state across runs. Spec-driven development platforms (spec-kit, Kiro) translate feature requirements into agent instructions. Initial codebase discovery pipelines now map dependencies before execution begins.

What this ecosystem consistently misses is the distinction between descriptive state and prescriptive intent. Code tells an agent what currently exists. It does not tell the agent what must remain unchanged, what is intentionally provisional, or which edge cases are load-bearing business decisions rather than technical debt. When an agent encounters a hardcoded refund window, a minimal authentication flow, or a seemingly redundant validation rule, it defaults to refactoring for cleanliness or consistency. Without explicit product constraints, the agent assumes the current implementation is the source of truth.
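The refund-window case above can be made concrete. The sketch below is hypothetical (the names `REFUND_WINDOW_DAYS` and `is_refund_eligible` are assumptions, not from any real codebase): a hardcoded constant that encodes a load-bearing business decision, which an agent without product context would likely "clean up" into a configurable parameter or fold into another rule.

```python
# Hypothetical example: a hardcoded value that looks like technical debt
# but is actually a deliberate product decision. An inline marker gives
# an agent the prescriptive intent that the code alone cannot convey.

# PRODUCT-CONSTRAINT: 14 days is a legal requirement in EU markets.
# MUST NOT change without legal sign-off; do not make configurable.
REFUND_WINDOW_DAYS = 14

def is_refund_eligible(days_since_purchase: int) -> bool:
    """Refunds are permitted only within the mandated window."""
    return days_since_purchase <= REFUND_WINDOW_DAYS
```

Inline markers like this are brittle (agents can delete comments too), which is why the rest of the article argues for an external, version-controlled contract rather than annotations alone.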

Both OpenAI and Anthropic validated this gap in early 2026 engineering publications. Their internal research concluded that model capability is no longer the primary bottleneck; harness architecture is. Both organizations built proprietary context pipelines that feed agents structured documentation, architectural constraints, and mechanical linting rules. Neither productized these systems, and neither addressed the product truth layer. The result is a mature code-harness ecosystem operating without a formal contract for business behavior.

This oversight stems from a historical separation of concerns. Product managers document requirements in tickets. Engineers document implementation in code and READMEs. Agents consume code and tickets. The space between them—the explicit, version-controlled, machine-readable record of product decisions, trade-offs, and provisional states—remains unstructured. Until that layer exists, agents will continue to treat business rules as refactorable implementation details.
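What would that missing layer look like in practice? One minimal sketch, with all file names, field names, and IDs invented for illustration: a version-controlled JSON record of product decisions that distinguishes load-bearing rules from intentionally provisional states, so a harness can consume it mechanically.

```python
# Hypothetical "product truth" file (e.g. product-spec.json; the name and
# schema are assumptions for this sketch, not a published standard).
import json

spec = json.loads("""
{
  "decisions": [
    {
      "id": "PD-012",
      "rule": "Refund window is exactly 14 days",
      "status": "load-bearing",
      "rationale": "EU consumer-protection requirement"
    },
    {
      "id": "PD-031",
      "rule": "Password-only authentication flow",
      "status": "provisional",
      "rationale": "SSO planned next quarter; do not harden beyond MVP"
    }
  ]
}
""")

# A harness can now separate intentional rules from refactorable debt
# instead of forcing the agent to infer intent from code patterns.
load_bearing = [d["id"] for d in spec["decisions"] if d["status"] == "load-bearing"]
provisional = [d["id"] for d in spec["decisions"] if d["status"] == "provisional"]
```

Because the file lives in version control alongside the code, changes to product decisions get the same review and history as changes to implementation.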

WOW Moment: Key Findings

Introducing a formal product behavior contract fundamentally changes how agents interpret code modifications. The following comparison illustrates the operational shift when moving from a code-only harness to a behavior-spec-integrated workflow:

| Approach | Product Violation Rate | Refactor Confidence | Context Resolution Time | Maintenance Overhead |
| --- | --- | --- | --- | --- |
| Code-Only Harness | 32-41% | Low (assumes all code is mutable) | High (agent infers intent from patterns) | Low (no extra artifacts) |
| Behavior Spec-Integrated | 4-8% | High (explicit must/must-not boundaries) | Low (direct lookup in contract) | Medium (spec sync required) |

The data reveals a clear trade-off: adding a product truth layer increases initial documentation overhead but dramatically reduces silent business logic breaks.
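The "direct lookup in contract" step from the table can be sketched in a few lines. Everything here is an assumption for illustration (the `CONTRACT` structure, the dotted symbol names, and the `violations` helper are invented): before applying an edit, the harness checks the symbols a proposed change touches against the contract's must-not boundaries.

```python
# Minimal sketch of a pre-edit contract check. In a real harness the
# contract would be loaded from a version-controlled file; it is inlined
# here to keep the example self-contained.
CONTRACT = {
    "must_not_change": {
        "billing.refund_window": "Legally mandated 14-day EU window",
        "auth.password_flow": "Intentionally provisional until SSO launch",
    }
}

def violations(touched_symbols: list[str]) -> list[str]:
    """Return a reason for every touched symbol the contract protects."""
    protected = CONTRACT["must_not_change"]
    return [f"{s}: {protected[s]}" for s in touched_symbols if s in protected]

# A proposed refactor touching two symbols: one protected, one not.
print(violations(["billing.refund_window", "utils.format_date"]))
# → ['billing.refund_window: Legally mandated 14-day EU window']
```

This is what turns "Low refactor confidence" into "High": the agent no longer has to guess which code is mutable; it asks the contract and either proceeds or surfaces the conflict to a human.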
