
Six Principles for Agent Systems That Don't Hallucinate

By Codcompass Team · 8 min read

Architecting Deterministic LLM Workflows: Beyond Prompt Engineering

Current Situation Analysis

The industry has moved past treating large language models as conversational interfaces. Modern engineering teams are deploying agentic systems that execute structured, multi-step tasks: automated compliance audits, data migration pipelines, infrastructure provisioning, and continuous integration validation. The gap between a working prototype and a production-grade agent system is no longer measured in model capability or token budgets. It is measured in architectural discipline.

Three failure modes consistently derail agentic deployments:

  1. Hallucination Drift – The model generates plausible but factually incorrect outputs when operating outside explicit boundaries.
  2. Non-Reproducibility – Identical inputs yield divergent outputs across runs, making debugging impossible and eroding team trust.
  3. Knowledge Amnesia – Every execution starts from a blank slate. Mistakes, discoveries, and domain patterns are discarded after each run, forcing the system to relearn the same constraints repeatedly.

These issues are rarely solved by tweaking system prompts or switching model providers. They stem from treating agents as stateless completion engines rather than components in a deterministic workflow. Tutorial-driven development emphasizes single-turn interactions, ignoring the compounding effects of multi-run execution. Real-world iteration data demonstrates that switching from ephemeral prompt chains to structured, stateful architectures can improve first-pass success rates from ~14% to 95% within a handful of runs. The improvement does not come from the model getting smarter; it comes from the system retaining context, enforcing contracts, and accumulating execution artifacts.

When teams ignore architectural layering, they pay for it in three ways: escalating context window costs, untraceable failure modes, and brittle pipelines that collapse under minor schema changes. The solution is not better prompting. It is deliberate system design.

WOW Moment: Key Findings

The transition from demo-quality to production-quality agents is quantifiable. The table below contrasts a typical ephemeral prompt chain against a structured stateful architecture across three critical operational metrics.

| Approach | Reproducibility | Context Cost | Knowledge Retention |
|---|---|---|---|
| Ephemeral Prompt Chain | Low (±30% variance across runs) | High (full context reload per execution) | None (state discarded after completion) |
| Structured Stateful System | High (±5% variance with deterministic seeding) | Low (cached artifacts reduce token spend by 60–80%) | Cumulative (artifacts compound across runs) |

This finding matters because it shifts the engineering focus from model selection to workflow determinism. When execution state persists between phases, the system stops treating every run as a fresh experiment. Instead, it operates as a compounding engine: early runs populate discovery artifacts, mid-range runs refine execution plans, and later runs leverage accumulated knowledge to skip redundant processing. The result is a pipeline that becomes cheaper, faster, and more accurate with each iteration. This is the foundation of production-ready agentic systems.

Core Solution

Building a deterministic agent architecture requires treating the language model as one component in a larger execution graph. The system must enforce boundaries, isolate cognitive modes, persist state, and maintain a curated knowledge layer. Below is a step-by-step implementation using TypeScript, followed by architectural rationale.

Step 1: Define and Load an Explicit Contract

The contract is a static boundary document that defines operational rules, scope limits, and environmental assumptions. It is loaded once per pipeline initialization and injected into every role's context.

import { readFileSync } from 'fs';
import { join } from 'path';

export interface OperationalContract {
  scope: string[];
  exclusions: string[];
  executionRules: string[];
  environmentAssumptions: string[];
}

// Minimal parser (assumes "## Scope"-style section headings, as in the
// sample contract below): bullets under each heading become list items.
function parseContractMarkdown(raw: string): OperationalContract {
  const sections: Record<string, string[]> = {};
  let key = '';
  for (const line of raw.split('\n')) {
    const h = line.match(/^##\s+(.+)/);
    if (h) sections[(key = h[1].trim())] = [];
    else if (key && line.trim().startsWith('- ')) sections[key].push(line.trim().slice(2));
  }
  return {
    scope: sections['Scope'] ?? [],
    exclusions: sections['Exclusions'] ?? [],
    executionRules: sections['Execution Rules'] ?? [],
    environmentAssumptions: sections['Environment Assumptions'] ?? [],
  };
}

export function loadContract(contractPath: string): OperationalContract {
  const raw = readFileSync(join(contractPath, 'OPERATIONAL_BOUNDARIES.md'), 'utf-8');
  return parseContractMarkdown(raw);
}

Why this works: Without a contract, agents resolve ambiguity using stochastic sampling. A contract replaces guesswork with deterministic constraints. It also serves as a single source of truth for scope changes, preventing prompt drift across roles.
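
For reference, a hypothetical OPERATIONAL_BOUNDARIES.md skeleton matching the section layout the loader above expects. The specific rules are illustrative only; write your own for your domain.

## Scope
- Migrate user-service database schemas only
- Touch files under src/ and migrations/

## Exclusions
- Never modify production credentials or CI secrets

## Execution Rules
- Every schema change ships with a reversible down-migration

## Environment Assumptions
- Node 20+, PostgreSQL 15, migrations run via CI only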

Step 2: Implement Role-Based Execution Graph

Complex tasks require distinct cognitive modes. A single agent cannot simultaneously explore broadly, plan strategically, execute precisely, and validate rigorously. Role separation assigns dedicated tools, context windows, and instructions to each phase.

export type AgentRole = 'scanner' | 'planner' | 'executor' | 'validator';

export interface RoleContext {
  role: AgentRole;
  tools: string[];
  contract: OperationalContract;
  inputArtifacts: Record<string, unknown>;
}

// buildRolePrompt, llmClient, generateDeterministicSeed, and
// validateAndFormatOutput are pipeline helpers defined elsewhere.
export async function executeRole(roleCtx: RoleContext): Promise<ExecutionArtifact> {
  const systemPrompt = buildRolePrompt(roleCtx);
  // Serialize upstream artifacts into user messages; a Record cannot be
  // spread directly into a messages array.
  const artifactMessages = Object.entries(roleCtx.inputArtifacts).map(([name, data]) => ({
    role: 'user',
    content: `[artifact:${name}]\n${JSON.stringify(data)}`,
  }));
  const response = await llmClient.complete({
    model: 'production-grade-model',
    messages: [{ role: 'system', content: systemPrompt }, ...artifactMessages],
    // The validator runs coldest to minimize variance; other roles allow mild sampling.
    temperature: roleCtx.role === 'validator' ? 0.1 : 0.3,
    seed: generateDeterministicSeed(roleCtx.role),
  });
  return validateAndFormatOutput(response, roleCtx.role);
}

Why this works: Role separation prevents context pollution and instruction contradiction. The scanner focuses on discovery, the planner on strategy, the executor on implementation, and the validator on verification. Each role operates with a tailored toolset and temperature profile, reducing hallucination risk and improving output consistency.
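
To make the handoff between roles concrete, here is a minimal orchestration sketch. It assumes the executeRole function above and the persistArtifact helper introduced in Step 3; toolsFor and its tool names are hypothetical placeholders for your own tool registry.

const ROLE_ORDER: AgentRole[] = ['scanner', 'planner', 'executor', 'validator'];

// Hypothetical lookup: map each role to its permitted tools.
function toolsFor(role: AgentRole): string[] {
  const toolMap: Record<AgentRole, string[]> = {
    scanner: ['read_file', 'list_dir'],
    planner: ['read_artifact'],
    executor: ['read_file', 'write_file'],
    validator: ['read_artifact', 'run_checks'],
  };
  return toolMap[role];
}

export async function runPipeline(contract: OperationalContract, stateDir: string): Promise<void> {
  const artifacts: Record<string, unknown> = {};
  for (const role of ROLE_ORDER) {
    // Each role sees only the contract plus upstream artifacts, never raw chat history.
    const output = await executeRole({ role, tools: toolsFor(role), contract, inputArtifacts: artifacts });
    await persistArtifact(output, stateDir);
    artifacts[output.phase] = output.data;
  }
}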

Step 3: Persist State Between Phases

Execution artifacts must survive between runs. In-memory state is ephemeral and untraceable. Disk-based artifacts enable idempotent execution, rollback capabilities, and cross-run analysis.

import { existsSync, mkdirSync, writeFileSync, readFileSync } from 'fs';
import { stat } from 'fs/promises';

export interface ExecutionArtifact {
  phase: string;
  timestamp: number;
  data: Record<string, unknown>;
  checksum: string;
}

export async function persistArtifact(artifact: ExecutionArtifact, outputPath: string): Promise<void> {
  mkdirSync(outputPath, { recursive: true }); // ensure the state directory exists
  const filePath = `${outputPath}/${artifact.phase}-cache.json`;
  writeFileSync(filePath, JSON.stringify(artifact, null, 2));
}

export async function loadArtifact(phase: string, outputPath: string): Promise<ExecutionArtifact | null> {
  const filePath = `${outputPath}/${phase}-cache.json`;
  if (!existsSync(filePath)) return null;

  // Freshness gate: treat artifacts older than the TTL as stale.
  const stats = await stat(filePath);
  const isFresh = Date.now() - stats.mtimeMs < 86_400_000; // 24h TTL
  if (!isFresh) return null;

  return JSON.parse(readFileSync(filePath, 'utf-8'));
}

Why this works: Persistent state enables idempotent skip logic. If a discovery phase artifact is still fresh, the system bypasses redundant scanning. It also creates an audit trail: every phase writes structured output that downstream roles can consume, and engineers can inspect without parsing raw LLM responses.
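
A sketch of that skip logic, assuming the loadArtifact and persistArtifact functions above; runPhase stands in for whatever work the phase actually performs.

export async function phaseWithCache(
  phase: string,
  stateDir: string,
  runPhase: () => Promise<ExecutionArtifact>
): Promise<ExecutionArtifact> {
  const cached = await loadArtifact(phase, stateDir);
  if (cached) {
    console.log(`[skip] ${phase}: fresh artifact reused`); // log skip decisions for auditability
    return cached;
  }
  const result = await runPhase();
  await persistArtifact(result, stateDir);
  return result;
}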

Step 4: Maintain a Curated Knowledge Layer

Domain knowledge should live outside the execution code. A curated knowledge base (KB) of Markdown and YAML files provides deterministic context injection. Unlike vector stores, a flat KB guarantees that critical constraints are either fully present or explicitly missing.

import { readFile } from 'fs/promises';
import { glob } from 'glob';

// matchesRoleScope (sketched below) decides whether a KB file is relevant to the role.
export async function injectKnowledgeBase(role: AgentRole, kbPath: string): Promise<string[]> {
  const files = await glob(`${kbPath}/**/*.md`);
  return Promise.all(files.filter((f) => matchesRoleScope(f, role)).map((f) => readFile(f, 'utf-8')));
}

Why this works: Curated KBs avoid embedding drift and retrieval cutoffs. When platform patterns change, you update a single YAML file rather than retraining vectors or rewriting prompts. The KB travels across projects, making it a portable architectural component rather than a hardcoded dependency.
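
One possible implementation of the matchesRoleScope helper referenced above, assuming a directory-per-role KB convention (kb/shared/ for every role, kb/&lt;role&gt;/ for role-specific docs); the convention itself is illustrative, not prescribed.

function matchesRoleScope(filePath: string, role: AgentRole): boolean {
  // Shared docs go to every role; role-named directories gate the rest.
  return filePath.includes('/shared/') || filePath.includes(`/${role}/`);
}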

Step 5: Close the Learning Loop

Production agents must improve across runs. A validation phase compares execution output against expected states, logs discrepancies, and feeds corrections back into the KB or execution plan.

export interface ValidationReport {
  passRate: number;
  failurePatterns: string[];
}

// updateKnowledgeBase (sketched below) and logExecutionMetrics are pipeline helpers.
export async function closeLearningLoop(
  executionResult: ExecutionArtifact,
  validationReport: ValidationReport
): Promise<void> {
  if (validationReport.passRate < 0.95) {
    await updateKnowledgeBase(validationReport.failurePatterns);
    await logExecutionMetrics(executionResult, validationReport);
  }
}

Why this works: Closed-loop learning transforms single-run experiments into compounding systems. Failure patterns are extracted, categorized, and stored as explicit constraints. Subsequent runs automatically inherit these corrections, reducing repeat errors and accelerating convergence toward production stability.
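
A minimal sketch of what updateKnowledgeBase might do, assuming failure patterns arrive as plain strings and the KB path from the configuration template below; a real implementation would also deduplicate and categorize entries.

import { appendFileSync } from 'fs';

async function updateKnowledgeBase(failurePatterns: string[]): Promise<void> {
  const stamp = new Date().toISOString();
  for (const pattern of failurePatterns) {
    // Append each failure as an explicit, dated constraint that future runs inherit.
    appendFileSync('./kb/project-specific/learned-constraints.md', `- [${stamp}] ${pattern}\n`);
  }
}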

Pitfall Guide

1. The God Agent Anti-Pattern

Explanation: Combining discovery, planning, execution, and validation into a single prompt creates instruction contradiction and context bloat. The model receives conflicting directives (e.g., "explore freely" vs "follow strict schema"), degrading output quality. Fix: Enforce strict role separation. Each phase gets its own prompt template, toolset, and context window. Validate outputs before passing them downstream.

2. Embedding-First Knowledge Retrieval

Explanation: Defaulting to vector similarity search for domain knowledge introduces retrieval uncertainty. Critical constraints can fall below the top-k threshold, causing silent hallucinations. Fix: Start with a flat, curated KB. Only migrate to RAG when context exceeds ~200k tokens, sources are uncurated, or history-driven similarity retrieval is explicitly required.

3. In-Memory State Assumption

Explanation: Storing execution artifacts in RAM or session variables makes pipelines non-restartable and untraceable. A single crash destroys hours of discovery work. Fix: Persist all phase outputs to disk using structured formats (JSON for state, JSONL for event logs). Implement modification-time checks to enable idempotent skips.
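
A minimal JSONL event-log sketch along those lines: one JSON object per line keeps the log append-only and crash-safe. The file name is illustrative.

import { appendFileSync } from 'fs';

export function logEvent(stateDir: string, event: Record<string, unknown>): void {
  // One self-contained JSON object per line; earlier entries are never rewritten.
  appendFileSync(`${stateDir}/events.jsonl`, JSON.stringify({ ts: Date.now(), ...event }) + '\n');
}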

4. Implicit Contract Drift

Explanation: Relying on undocumented assumptions or scattered prompt instructions causes boundary ambiguity. Agents resolve unstated rules differently across runs, breaking reproducibility. Fix: Maintain a single OPERATIONAL_BOUNDARIES.md file. Load it explicitly at pipeline initialization. Version-control it alongside code. Reject any execution that cannot reference the active contract.

5. Ignoring Idempotency Gates

Explanation: Re-running discovery phases on unchanged codebases wastes context tokens and introduces stochastic variance into stable data. Fix: Implement artifact freshness checks. Skip phases when cached outputs are within a defined TTL. Log skip decisions for auditability.

6. Mixing Execution and Validation Contexts

Explanation: Allowing the executor to also validate its own output creates confirmation bias. The model optimizes for self-consistency rather than external correctness. Fix: Isolate validation into a dedicated role with read-only access to execution artifacts. Use deterministic scoring functions and schema validation before accepting outputs.
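
A sketch of such a gate: a hand-rolled type guard that schema-checks an executor artifact before acceptance. A schema library such as Zod would be the production choice; this version just keeps the example dependency-free.

export function isValidExecutionArtifact(candidate: unknown): candidate is ExecutionArtifact {
  const a = candidate as ExecutionArtifact;
  return (
    typeof candidate === 'object' && candidate !== null &&
    typeof a.phase === 'string' &&
    typeof a.timestamp === 'number' &&
    typeof a.checksum === 'string' &&
    typeof a.data === 'object' && a.data !== null
  );
}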

7. Unbounded Context Windows

Explanation: Feeding entire codebases or full KBs into every prompt inflates costs and dilutes attention. The model struggles to prioritize relevant constraints. Fix: Implement context budgeting. Load only role-relevant KB sections. Use hierarchical summarization for large artifacts. Enforce token limits with explicit truncation policies.
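
A simple budgeting sketch, assuming a crude characters-per-token heuristic rather than an exact tokenizer; the truncation policy here drops whole documents once the budget is spent.

export function packContext(docs: string[], maxTokens: number): string[] {
  const packed: string[] = [];
  let spent = 0;
  for (const doc of docs) {
    const estimate = Math.ceil(doc.length / 4); // rough ~4 chars/token heuristic
    if (spent + estimate > maxTokens) break;    // explicit truncation: drop whole docs past budget
    packed.push(doc);
    spent += estimate;
  }
  return packed;
}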

Production Bundle

Action Checklist

  • Define operational contract: Document scope, exclusions, execution rules, and environment assumptions in a single Markdown file.
  • Separate roles: Create distinct agents for scanning, planning, execution, and validation with isolated toolsets and prompts.
  • Persist artifacts: Write phase outputs to disk using structured JSON/JSONL. Implement TTL-based freshness checks.
  • Build curated KB: Store domain patterns, framework constraints, and platform gotchas in version-controlled Markdown/YAML files.
  • Implement validation gate: Route execution outputs through a dedicated validator role before accepting results.
  • Enable closed-loop learning: Log failure patterns, update KB constraints, and track pass rates across runs.
  • Enforce context budgeting: Load only role-relevant knowledge. Apply deterministic seeding and temperature controls per phase.

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small domain, frequent schema changes | Flat curated KB + explicit contract | Deterministic injection prevents retrieval drift; easy to version | Low (predictable token spend) |
| Large unstructured sources, history-driven queries | RAG with hybrid keyword/vector retrieval | Flat KB exceeds context limits; similarity search captures prior cases | Medium-High (embedding compute + retrieval overhead) |
| Single-agent prototype | Role separation + artifact persistence | Prevents prompt bloat; enables restartability and audit trails | Low (refactor cost offset by reduced debugging) |
| Multi-team pipeline | Centralized KB + shared contract | Ensures consistent boundaries across teams; prevents knowledge silos | Medium (initial documentation overhead) |

Configuration Template

{
  "pipeline": {
    "contractPath": "./config/OPERATIONAL_BOUNDARIES.md",
    "artifactDir": "./.agent-state",
    "knowledgeBase": {
      "local": "./kb/project-specific",
      "global": "./kb/platform-patterns",
      "maxContextTokens": 120000
    },
    "roles": {
      "scanner": { "model": "fast-context-model", "temperature": 0.2, "seed": true },
      "planner": { "model": "reasoning-model", "temperature": 0.1, "seed": true },
      "executor": { "model": "code-model", "temperature": 0.3, "seed": true },
      "validator": { "model": "evaluation-model", "temperature": 0.0, "seed": true }
    },
    "idempotency": {
      "enabled": true,
      "ttlHours": 24,
      "skipOnFresh": true
    }
  }
}

Quick Start Guide

  1. Initialize the contract: Create OPERATIONAL_BOUNDARIES.md with scope limits, execution rules, and environment assumptions. Commit it to version control.
  2. Set up artifact storage: Create a .agent-state directory. Configure the pipeline to write phase outputs as JSON files with modification timestamps.
  3. Deploy the knowledge layer: Populate ./kb/ with Markdown/YAML files covering framework constraints, platform patterns, and domain rules. Ensure they are loaded explicitly, not via embeddings.
  4. Wire the execution graph: Implement role separation with isolated prompts and toolsets. Route outputs through a validation phase before accepting results.
  5. Enable compounding: Log execution metrics, extract failure patterns, and update the KB after each run. Monitor pass rates and context spend to verify architectural ROI.