
How to Code Review AI-Generated Code: What Needs Human Eyes vs. What Doesn't

By Codcompass Team · 9 min read

The Assumption Audit: Reviewing LLM-Generated Code for Production

Current Situation Analysis

The velocity of software delivery has shifted dramatically. Teams are now shipping features authored primarily by large language models (LLMs) at a pace that traditional code review processes cannot sustain. The friction isn't coming from syntax errors or type mismatches. Modern LLMs are exceptionally reliable at producing valid TypeScript, satisfying linters, and implementing the exact happy path requested in a prompt. The breakdown occurs at runtime, where unspoken assumptions surface as silent failures, data leaks, or brittle dependencies.

Traditional code review was designed for human-authored code. When a developer writes a function, they make explicit trade-offs: they choose a caching strategy, they decide how to handle missing data, they structure error boundaries. A reviewer can ask, "Why did you choose this approach?" and get a reasoned answer. LLM-generated code lacks this intentional architecture. It pattern-matches against training corpora, stitching together syntactically valid fragments that assume ideal conditions. The code compiles. The tests pass for the nominal case. But underneath, the logic carries hidden state, magic values, swallowed exceptions, and tangled I/O.

This gap is frequently overlooked because standard PR checklists focus on style, naming conventions, and obvious logic bugs. Teams miss the semantic fragility that only manifests under load, during partial failures, or when requirements evolve. Engineering organizations adopting AI pair programmers commonly report a 30–40% increase in commit velocity, paired with a measurable rise in edge-case defects when reviews remain syntax-focused. The problem isn't the AI's output quality; it's that the review lens is misaligned. You cannot audit assumptions by checking for missing semicolons.

WOW Moment: Key Findings

Shifting from syntax-first reviews to assumption-centric audits fundamentally changes defect detection and long-term maintenance costs. The following comparison illustrates the operational impact of adopting an assumption-audit workflow versus traditional PR review practices.

| Review Approach | Runtime Failure Detection | Refactoring Friction | Test Coverage Gaps | Reviewer Cognitive Load |
| --- | --- | --- | --- | --- |
| Syntax-First | Low (misses edge cases) | High (magic values) | High (tangled I/O) | High (guessing intent) |
| Assumption-Centric | High (explicit boundaries) | Low (config-driven) | Low (seam injection) | Low (checklist-driven) |

This finding matters because it decouples review velocity from code complexity. When reviewers stop hunting for typos and start mapping assumption boundaries, they catch defects before they reach staging. The assumption-centric approach also forces architectural discipline: pure logic gets extracted, external contracts get validated, and state gets scoped. The result is code that doesn't just run correctly today, but degrades predictably tomorrow.

Core Solution

Auditing LLM-generated code requires a systematic workflow that isolates assumptions, enforces explicit boundaries, and guarantees testability. The following implementation demonstrates how to transform a fragile AI-generated handler into a production-ready module using TypeScript.

Step 1: Isolate Pure Logic from I/O

LLMs frequently fuse database queries, business calculations, and side effects into a single function. This makes unit testing impossible and obscures the actual decision-making process.

Architecture Decision: Extract all deterministic calculations into pure functions. Pass data in, return results out. Keep I/O at the edge of the module.

// ❌ AI-generated: Logic, I/O, and side effects fused
export async function fulfillShipment(shipmentId: string) {
  const shipment = await db.selectFrom('shipments')
    .selectAll()
    .where('id', '=', shipmentId)
    .executeTakeFirst();

  // Assumes the row exists: executeTakeFirst can return undefined
  const totalWeight = shipment.items.reduce((acc, item) =>
    acc + item.weight * item.quantity, 0
  );

  // Magic values: the weight threshold and both rates are hardcoded
  const shippingCost = totalWeight > 50 ? totalWeight * 2.5 : totalWeight * 3.0;

  await db.updateTable('shipments')
    .set({ status: 'processing', cost: shippingCost })
    .where('id', '=', shipmentId)
    .execute();

  await notificationService.send(shipment.trackingEmail, `Shipment ${shipmentId} ready`);
}

// ✅ Assumption-audited: Pure logic extracted, I/O separated
export function calculateShippingCost(totalWeight: number): number {
  const threshold = SHIPPING_CONFIG.WEIGHT_THRESHOLD;
  const heavyRate = SHIPPING_CONFIG.HEAVY_RATE;
  const standardRate = SHIPPING_CONFIG.STANDARD_RATE;
  
  return totalWeight > threshold 
    ? totalWeight * heavyRate 
    : totalWeight * standardRate;
}

export async function fulfillShipment(shipmentId: string) {
  const shipment = await shipmentRepository.findById(shipmentId);
  if (!shipment) throw new NotFoundError(`Shipment ${shipmentId} missing`);

  const totalWeight = shipment.items.reduce((acc, item) => 
    acc + item.weight * item.quantity, 0
  );
  
  const shippingCost = calculateShippingCost(totalWeight);

  await shipmentRepository.updateStatus(shipmentId, 'processing', shippingCost);
  await notificationBroker.dispatch(shipment.trackingEmail, `Shipment ${shipmentId} ready`);
}

Why this works: The calculation is now deterministic, trivially testable, and immune to database latency or network failures. The handler only orchestrates I/O, making failure points explicit.
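
Because the calculation is now pure, its unit test needs no database or network mocks. A minimal sketch, assuming Vitest as the runner, a hypothetical module path, and a SHIPPING_CONFIG of { WEIGHT_THRESHOLD: 50, HEAVY_RATE: 2.5, STANDARD_RATE: 3.0 } (matching the literals in the original snippet):

import { describe, expect, it } from 'vitest';
import { calculateShippingCost } from './shipping'; // hypothetical path

describe('calculateShippingCost', () => {
  it('applies the standard rate at or below the threshold', () => {
    expect(calculateShippingCost(50)).toBe(150); // 50 * 3.0
  });

  it('applies the heavy rate above the threshold', () => {
    expect(calculateShippingCost(51)).toBe(127.5); // 51 * 2.5
  });
});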

Step 2: Externalize Configuration & Magic Values

LLMs embed literal values directly into business logic because they mirror the prompt or training examples. These become technical debt the moment requirements shift.

Architecture Decision: Centralize all tunable values in a typed configuration object. Reference constants, never literals.

// ❌ AI-generated: Hardcoded limits and status strings
export async function getActiveInventory(warehouseId: string) {
  const results = await db.selectFrom('inventory')
    .where('warehouse_id', '=', warehouseId)
    .where('status', '=', 'available')
    .limit(25)
    .execute();
    
  return {
    items: results,
    // Duplicated literal: must stay in sync with .limit(25) above
    hasMore: results.length === 25
  };
}

// ✅ Assumption-audited: Configuration-driven, single source of truth
export const INVENTORY_CONFIG = {
  PAGE_SIZE: 25,
  AVAILABLE_STATUS: 'available',
  MAX_RETRIES: 3,
} as const;

export async function getActiveInventory(warehouseId: string) {
  const results = await db.selectFrom('inventory')
    .where('warehouse_id', '=', warehouseId)
    .where('status', '=', INVENTORY_CONFIG.AVAILABLE_STATUS)
    .limit(INVENTORY_CONFIG.PAGE_SIZE)
    .execute();

  return {
    items: results,
    hasMore: results.length === INVENTORY_CONFIG.PAGE_SIZE
  };
}


Why this works: Changing pagination or status enums requires a single edit. The hasMore logic stays synchronized with the limit, eliminating silent pagination breaks.
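
If the hasMore rule itself deserves a unit test, the pairing can be lifted into a pure helper in the spirit of Step 1 (buildPage is a hypothetical name, not part of the snippet above):

// Pure pagination rule: testable without touching the database
export function buildPage<T>(items: T[], pageSize: number) {
  return { items, hasMore: items.length === pageSize };
}

// In the handler: return buildPage(results, INVENTORY_CONFIG.PAGE_SIZE);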

Step 3: Define Explicit Error Boundaries

LLMs frequently wrap operations in try/catch blocks that log and return undefined or empty arrays. This masks failures and breaks downstream contracts.

Architecture Decision: Fail fast at the boundary. Let errors propagate to the caller, which decides whether to retry, fall back, or abort.

// ❌ AI-generated: Silent failure swallowing
export async function syncWarehouseMetrics(warehouseId: string) {
  try {
    const metrics = await metricsApi.fetch(warehouseId);
    await db.insertInto('warehouse_metrics').values(metrics).execute();
  } catch (err) {
    // Swallowed error: callers cannot distinguish success from failure
    console.warn('Sync failed, skipping');
    return;
  }
}

// ✅ Assumption-audited: Explicit error propagation with intent
export async function syncWarehouseMetrics(warehouseId: string): Promise<void> {
  const metrics = await metricsApi.fetch(warehouseId);
  if (!metrics) throw new DataUnavailableError(`Metrics missing for ${warehouseId}`);
  
  await db.insertInto('warehouse_metrics').values(metrics).execute();
}

// Caller decides handling strategy
async function runSyncJob(warehouseId: string) {
  try {
    await syncWarehouseMetrics(warehouseId);
  } catch (error) {
    if (error instanceof DataUnavailableError) {
      await alertingService.notify('metrics-sync', error);
      return; // Graceful skip
    }
    throw error; // Critical failure
  }
}

Why this works: The system no longer hides failures. Observability tools capture explicit error types, and callers implement deliberate retry or fallback logic.
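
The typed errors in these snippets are ordinary Error subclasses. A minimal definition matching the names used above (the exact shape is an assumption):

// Typed errors let callers branch with instanceof and give
// observability tooling a stable name to group on
export class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'NotFoundError';
  }
}

export class DataUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'DataUnavailableError';
  }
}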

Step 4: Scope State to Execution Context

LLMs often declare module-level variables for caching or counters. In serverless or edge runtimes, these persist across invocations, causing data leakage and race conditions.

Architecture Decision: Never store mutable state at module scope. Pass context objects or use request-scoped storage.

// ❌ AI-generated: Module-level cache leaks across requests
const rateLimiter = new Map<string, number>();

export function checkRateLimit(clientId: string): boolean {
  const current = rateLimiter.get(clientId) ?? 0;
  if (current >= 100) return false;
  rateLimiter.set(clientId, current + 1);
  return true;
}

// ✅ Assumption-audited: Request-scoped state injection
export function checkRateLimit(
  clientId: string, 
  context: { counters: Map<string, number> }
): boolean {
  const current = context.counters.get(clientId) ?? 0;
  if (current >= RATE_LIMIT_CONFIG.MAX_REQUESTS) return false;
  context.counters.set(clientId, current + 1);
  return true;
}

Why this works: State lifecycle matches request lifecycle. Cold starts in serverless environments no longer inherit stale data, and concurrent executions remain isolated.
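
Injection also makes the state controllable in tests, since every case starts from a known-empty store. A minimal sketch (RATE_LIMIT_CONFIG is the hypothetical config module the snippet references):

export const RATE_LIMIT_CONFIG = { MAX_REQUESTS: 100 } as const;

// Fresh, caller-owned state: nothing leaks in from previous runs
const context = { counters: new Map<string, number>() };

for (let i = 0; i < RATE_LIMIT_CONFIG.MAX_REQUESTS; i++) {
  checkRateLimit('client-42', context); // first 100 calls pass
}

console.assert(
  checkRateLimit('client-42', context) === false,
  'request over the limit should be rejected'
);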

Pitfall Guide

1. Module-Level State Leakage

Explanation: LLMs generate top-level let or const variables for caching, counters, or singletons. In containerized or serverless runtimes, these survive across function invocations, causing cross-request data contamination.

Fix: Replace module state with dependency injection. Pass a context object containing mutable stores, or use a dedicated caching layer (Redis, in-memory with TTL) that respects request boundaries.

2. Implicit Dependency Contracts

Explanation: AI code assumes external APIs, databases, or third-party services return exactly the shape documented in training data. It rarely validates nullability, missing fields, or schema drift.

Fix: Implement runtime validation at the I/O boundary. Use Zod or custom type guards to assert shapes before processing. Treat every external response as unknown until validated.
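
A sketch of that boundary validation using Zod (the WarehouseMetricsSchema shape and the endpoint are hypothetical):

import { z } from 'zod';

// Treat the external response as unknown until its shape is proven
const WarehouseMetricsSchema = z.object({
  warehouseId: z.string(),
  throughput: z.number(),
  updatedAt: z.string().optional(), // schema drift: upstream may omit this field
});

type WarehouseMetrics = z.infer<typeof WarehouseMetricsSchema>;

export async function fetchWarehouseMetrics(warehouseId: string): Promise<WarehouseMetrics> {
  const response = await fetch(`/api/metrics/${warehouseId}`);
  const raw: unknown = await response.json();
  return WarehouseMetricsSchema.parse(raw); // throws ZodError on mismatch
}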

3. Silent Failure Propagation

Explanation: Catch blocks that log to console and return undefined or empty arrays mask runtime failures. Downstream code receives malformed data, triggering cryptic errors hours later.

Fix: Enforce explicit error contracts. Either re-throw typed errors, return a Result<T, E> union type, or require the caller to handle failure states. Never swallow exceptions without a documented fallback strategy.
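
If your team prefers returns over throws, a minimal Result union looks like this (one common convention, not a prescribed API; trySyncMetrics wraps the Step 3 handler):

export type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

export async function trySyncMetrics(warehouseId: string): Promise<Result<void, Error>> {
  try {
    await syncWarehouseMetrics(warehouseId);
    return { ok: true, value: undefined };
  } catch (error) {
    return { ok: false, error: error instanceof Error ? error : new Error(String(error)) };
  }
}

// The caller is forced to acknowledge the failure branch:
// const result = await trySyncMetrics('wh-1');
// if (!result.ok) { /* retry, alert, or abort */ }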

4. Hardcoded Business Rules

Explanation: Magic numbers, status strings, and timeout values embedded in logic create hidden coupling. When requirements change, developers update one location but miss the implicit comparison elsewhere.

Fix: Centralize all tunable values in a typed configuration module. Reference constants exclusively. Add lint rules to flag numeric/string literals inside business logic functions.

5. Monolithic Handler Functions

Explanation: AI frequently produces functions that fetch data, calculate results, update databases, and trigger notifications in a single block. This structure prevents unit testing and obscures failure points.

Fix: Apply the seam pattern. Extract pure calculations into testable functions. Keep handlers focused on orchestration. Use dependency injection to mock I/O during tests.

6. Over-Reliance on Non-Null Assertions

Explanation: To satisfy TypeScript's strict mode, LLMs append ! to potentially null values. This silences the compiler while guaranteeing runtime crashes when data is missing.

Fix: Remove all non-null assertions. Replace with explicit null checks, optional chaining, or early returns. Treat ! as a code smell that requires architectural correction, not a compiler workaround.
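
A before/after sketch of the same lookup (the User shape is illustrative; NotFoundError matches the error class used in Step 1):

interface User { email: string }
declare const users: Map<string, User>;

// ❌ Compiles under strict mode, crashes at runtime when the user is absent:
// const email = users.get(id)!.email;

// ✅ The missing case is handled where it can actually occur
function getEmail(id: string): string {
  const user = users.get(id);
  if (!user) throw new NotFoundError(`User ${id} missing`);
  return user.email;
}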

7. Missing Idempotency Guards

Explanation: AI-generated handlers assume single execution. In distributed systems, retries, message queue redeliveries, or network timeouts cause duplicate processing, leading to double charges or corrupted state.

Fix: Implement idempotency keys at the API boundary. Use database constraints or distributed locks to prevent duplicate processing. Design handlers to be safely re-executable without side effects.
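
A sketch of one such guard, assuming a unique constraint on an idempotency_key column (the payments table and the isUniqueViolation helper are hypothetical):

export async function recordPayment(idempotencyKey: string, amount: number): Promise<void> {
  try {
    // A unique index on payments.idempotency_key makes re-execution safe
    await db.insertInto('payments')
      .values({ idempotency_key: idempotencyKey, amount })
      .execute();
  } catch (error) {
    if (isUniqueViolation(error)) return; // duplicate delivery: already processed
    throw error; // real failure: propagate, per Step 3
  }
}

// isUniqueViolation is driver-specific, e.g. checking for
// Postgres error code '23505' (unique_violation)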

Production Bundle

Action Checklist

  • Extract pure logic: Move all calculations and transformations out of I/O handlers into deterministic functions.
  • Centralize configuration: Replace all magic numbers and strings with typed constants in a dedicated config module.
  • Validate external contracts: Add runtime type guards or schema validation at every network/database boundary.
  • Enforce error propagation: Remove silent catch blocks; require explicit error handling or typed result returns.
  • Scope mutable state: Eliminate module-level variables; inject request-scoped context or use external caching.
  • Add idempotency: Implement deduplication keys or database constraints for all write operations.
  • Verify test seams: Ensure every business rule can be tested in isolation without mocking databases or networks.

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Rapid Prototype / MVP | Syntax-first review + manual assumption scan | Speed prioritized; assumptions documented in PR comments | Low initial cost, high refactoring risk later |
| Production Microservice | Assumption-centric audit + contract validation | Predictable failure modes, easier debugging, lower incident rate | Moderate review overhead, significantly lower MTTR |
| Edge / Serverless Function | Strict state scoping + idempotency guards | Prevents cross-request leakage and duplicate processing | Higher architectural complexity, eliminates data corruption |
| Legacy Migration | Gradual seam injection + config extraction | Allows incremental refactoring without blocking delivery | Phased cost, reduces technical debt accumulation |

Configuration Template

// src/config/audit-rules.ts
export const AI_CODE_REVIEW_CONFIG = {
  enforcePureLogic: true,
  forbidModuleState: true,
  requireExplicitErrors: true,
  validateExternalContracts: true,
  idempotencyRequired: true,
  maxFunctionComplexity: 15,
  allowedNonNullablePatterns: [] as string[],
} as const;

// src/types/review-context.ts
export interface ReviewContext {
  requestId: string;
  counters: Map<string, number>;
  cache: Map<string, unknown>;
  logger: { warn: (msg: string, meta?: Record<string, unknown>) => void };
}

// src/utils/contract-validator.ts
import { z } from 'zod';

export function validateContract<T>(schema: z.ZodType<T>, data: unknown): T {
  const result = schema.safeParse(data);
  if (!result.success) {
    throw new ContractValidationError('External data mismatch', result.error.errors);
  }
  return result.data;
}

export class ContractValidationError extends Error {
  constructor(message: string, public readonly details: unknown[]) {
    super(message);
    this.name = 'ContractValidationError';
  }
}

Quick Start Guide

  1. Install validation dependencies: Add zod for runtime contract validation and configure your linter to flag non-null assertions and module-level mutable state.
  2. Create the config module: Copy the AI_CODE_REVIEW_CONFIG template into your project. Set enforcement flags to true for production environments.
  3. Refactor one handler: Pick a recently merged AI-generated function. Extract its calculation logic into a pure function, replace literals with constants, and add explicit error propagation.
  4. Add a review checklist: Pin the Action Checklist to your repository's PR template. Require reviewers to verify each assumption boundary before approving.
  5. Integrate CI checks: Add a pre-commit hook or GitHub Action that scans for console-only catch blocks (console.warn or console.error with no re-throw), module-level let declarations, and unvalidated external API calls. Fail the build on violations.