TDD with AI: Claude Writes Tests First, Then the Implementation

By Codcompass Team · 8 min read

Specification-First Engineering: Automating TDD Workflows with Claude Code

Current Situation Analysis

Test-Driven Development (TDD) remains the gold standard for software reliability, yet adoption rates in production environments remain stubbornly low. The barrier is rarely philosophical; it is cognitive friction. The traditional Red-Green-Refactor cycle demands that developers define interfaces, anticipate edge cases, and establish module contracts before any implementation logic exists. This requires holding a complete mental model of the system's behavior while staring at a blank file—a state that induces "analysis paralysis" and breaks momentum.

Consequently, most teams default to a code-first approach, writing implementation logic and retrofitting tests afterward. This creates a feedback loop where tests validate what the code does rather than what it should do. The result is test suites that lack coverage of boundary conditions, brittle assertions tied to internal implementation details, and a high risk of regression during refactoring.

LLM-powered tools like Claude Code fundamentally alter this equation by offloading the cognitive load of specification generation. When integrated correctly, AI transforms TDD from a discipline exercise into a fluid workflow. The developer provides the intent; the AI generates the executable specification. This shifts the developer's role from "writer of boilerplate tests" to "architect of contracts," enabling rigorous test-first practices without the traditional startup cost.

WOW Moment: Key Findings

The divergence between AI-assisted code-first and AI-assisted specification-first workflows is measurable across critical engineering metrics. The following comparison highlights why specification-first yields superior long-term outcomes, even though it adds slight upfront overhead.

| Strategy | Spec Fidelity | Refactoring Risk | API Cohesion | Edge Case Coverage | Long-term Velocity |
| --- | --- | --- | --- | --- | --- |
| AI-Code-First | Low | High | Organic/Loose | Reactive (misses hidden paths) | Degrades over time |
| AI-Spec-First | High | Low | Contract-Driven | Proactive (defined by requirements) | Accelerates with scale |

Why this matters: In code-first workflows, AI generates tests that mirror the implementation structure. If the implementation has a flaw, the test often replicates the same assumption. In specification-first workflows, the test file acts as an immutable contract. The AI implementation must conform to the behavior defined in the spec, catching logic errors before they are codified. This approach decouples the API design from the implementation details, making refactoring safe and enabling parallel development streams.

Core Solution

The specification-first workflow with Claude Code follows a four-phase cycle. This section demonstrates the pattern using a RateLimiter module with sliding-window semantics, a common backend component requiring precise handling of time, concurrency, and state.

Phase 1: Contract Generation

The developer defines the requirements and constraints. Claude generates the test suite, which serves as the executable specification. The implementation file does not exist yet.

Prompt Strategy:

"Generate a test suite for a SlidingWindowRateLimiter. Requirements:

  • Constructor accepts maxRequests (number) and windowMs (number).
  • Method isAllowed(userId: string): boolean.
  • Sliding window logic: requests expire based on timestamp.
  • Edge Cases: Boundary conditions at exact limits, window expiration, concurrent requests for same user, distinct users isolation.
  • Framework: Vitest.
  • Constraint: Do not write the implementation. Only the test file."

Generated Specification (rate-limiter.spec.ts):

import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { RateLimiter } from './rate-limiter';

describe('RateLimiter', () => {
  beforeEach(() => {
    vi.useFakeTimers();
  });

  afterEach(() => {
    vi.useRealTimers();
  });

  it('allows requests within the defined limit', () => {
    const limiter = new RateLimiter({ maxRequests: 3, windowMs: 1000 });
    
    expect(limiter.isAllowed('user-a')).toBe(true);
    expect(limiter.isAllowed('user-a')).toBe(true);
    expect(limiter.isAllowed('user-a')).toBe(true);
  });

  it('denies requests exceeding the limit within the window', () => {
    const limiter = new RateLimiter({ maxRequests: 2, windowMs: 1000 });
    
    limiter.isAllowed('user-b');
    limiter.isAllowed('user-b');
    
    expect(limiter.isAllowed('user-b')).toBe(false);
  });

  it('resets allowance after window expiration', () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 500 });
    
    limiter.isAllowed('user-c');
    expect(limiter.isAllowed('user-c')).toBe(false);
    
    vi.advanceTimersByTime(501);
    
    expect(limiter.isAllowed('user-c')).toBe(true);
  });

  it('isolates limits between distinct users', () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 1000 });
    
    limiter.isAllowed('user-d');
    
    expect(limiter.isAllowed('user-e')).toBe(true);
  });

  it('handles partial window expiration correctly', () => {
    const limiter = new RateLimiter({ maxRequests: 2, windowMs: 1000 });

    limiter.isAllowed('user-f');      // t = 0
    vi.advanceTimersByTime(600);
    limiter.isAllowed('user-f');      // t = 600
    vi.advanceTimersByTime(500);      // t = 1100: the first request has expired

    // First request expired, second is still active. Limit should allow one more.
    expect(limiter.isAllowed('user-f')).toBe(true);
    expect(limiter.isAllowed('user-f')).toBe(false);
  });
});

Rationale: The tests lock in the behavior. Notice the explicit handling of time via vi.useFakeTimers(), which is critical for deterministic testing of time-based logic. The spec covers isolation, expiration, and partial windows. This file is now the source of truth.

Phase 2: Minimal Implementation

With the contract established, the developer prompts Claude to generate the implementation. The constraint is strict: the code must satisfy the tests with minimal complexity.

Prompt Strategy:

"Implement the RateLimiter class in rate-limiter.ts to satisfy all tests in

rate-limiter.spec.ts.

  • Use a Map to store request timestamps per user.
  • Implement sliding window pruning on each check.
  • Minimal code. No extra features.
  • Ensure type safety."

Generated Implementation (rate-limiter.ts):

interface RateLimiterConfig {
  maxRequests: number;
  windowMs: number;
}

export class RateLimiter {
  private config: RateLimiterConfig;
  private buckets: Map<string, number[]>;

  constructor(config: RateLimiterConfig) {
    this.config = config;
    this.buckets = new Map();
  }

  isAllowed(userId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.config.windowMs;
    
    let timestamps = this.buckets.get(userId) || [];
    
    // Prune expired timestamps
    timestamps = timestamps.filter(ts => ts > windowStart);
    
    if (timestamps.length < this.config.maxRequests) {
      timestamps.push(now);
      this.buckets.set(userId, timestamps);
      return true;
    }
    
    this.buckets.set(userId, timestamps);
    return false;
  }
}

Rationale: The implementation is derived directly from the expectations. The filter operation handles the sliding-window logic required by the expiration tests. The structure is simple and readable. Running the test suite at this stage should yield a 100% pass rate.
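
For context, here is a minimal usage sketch of the resulting module. The handleRequest wrapper, its policy values, and the error handling are illustrative assumptions, not part of the generated code:

import { RateLimiter } from './rate-limiter';

// Hypothetical policy: 100 requests per user per minute.
const limiter = new RateLimiter({ maxRequests: 100, windowMs: 60_000 });

function handleRequest(userId: string): void {
  if (!limiter.isAllowed(userId)) {
    // Illustrative rejection; an HTTP server would respond with 429.
    throw new Error('Rate limit exceeded');
  }
  // ... process the request
}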

Phase 3: Refactoring Under Coverage

Once the tests pass, the developer can safely refactor. The test suite acts as a safety net, ensuring behavior remains invariant.

Prompt Strategy:

"Refactor rate-limiter.ts to optimize performance.

  • Extract timestamp pruning into a private helper method.
  • Consider using a more efficient data structure if applicable, or optimize the array operations.
  • Ensure all tests in rate-limiter.spec.ts continue to pass without modification."

Refactored Implementation:

// ... RateLimiterConfig interface unchanged ...

export class RateLimiter {
  private config: RateLimiterConfig;
  private buckets: Map<string, number[]>;

  constructor(config: RateLimiterConfig) {
    this.config = config;
    this.buckets = new Map();
  }

  isAllowed(userId: string): boolean {
    const now = Date.now();
    const validTimestamps = this.pruneExpired(userId, now);
    
    if (validTimestamps.length < this.config.maxRequests) {
      validTimestamps.push(now);
      this.buckets.set(userId, validTimestamps);
      return true;
    }
    
    // Persist the pruned list even on denial so expired entries are released.
    this.buckets.set(userId, validTimestamps);
    return false;
  }

  private pruneExpired(userId: string, now: number): number[] {
    const windowStart = now - this.config.windowMs;
    const current = this.buckets.get(userId) || [];
    // Optimization: find index of first valid timestamp to slice
    const firstValidIndex = current.findIndex(ts => ts > windowStart);
    return firstValidIndex === -1 ? [] : current.slice(firstValidIndex);
  }
}

Rationale: The refactoring introduces a pruneExpired helper and optimizes the array operation using findIndex and slice: because timestamps are appended in chronological order, the scan can stop at the first unexpired entry instead of testing every element as filter does. Because the tests are behavior-bound, this internal restructuring is verified instantly.

Phase 4: Extension via New Specs

Adding features follows the same pattern. New requirements generate new tests first.

Example Extension: Support for burst allowance.

  1. Add test: it('allows burst requests up to burstLimit', ...) (see the sketch after this list).
  2. Run tests (fail).
  3. Prompt: "Update implementation to support burstLimit configuration without breaking existing tests."
  4. Run tests (pass).
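
A hedged sketch of step 1, assuming burstLimit is a new optional field on the constructor config; the name and semantics here are illustrative, and your real requirements define them. Added inside the existing describe block in rate-limiter.spec.ts, it fails until the implementation is updated:

  it('allows burst requests up to burstLimit', () => {
    // Assumed semantics (hypothetical): up to `burstLimit` requests may land
    // in quick succession, even though the steady-state cap is `maxRequests`.
    const limiter = new RateLimiter({ maxRequests: 2, windowMs: 1000, burstLimit: 4 });

    for (let i = 0; i < 4; i++) {
      expect(limiter.isAllowed('user-g')).toBe(true);
    }

    // The fifth request exceeds the burst allowance within the window.
    expect(limiter.isAllowed('user-g')).toBe(false);
  });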

Pitfall Guide

Integrating AI into TDD workflows introduces specific risks. The following pitfalls are derived from production experience with LLM-augmented development.

  1. The Implementation Leakage Trap

    • Explanation: AI may generate tests that assert on internal state or private methods rather than public behavior. This couples tests to implementation, making refactoring impossible.
    • Fix: Explicitly instruct the AI: "Tests must only interact with the public API. Do not access private properties or methods." Review generated tests for assertions on internal structures (a contrast example follows this list).
  2. Determinism Drift

    • Explanation: AI might write tests relying on Date.now() or random values without mocking, leading to flaky tests that pass locally but fail in CI.
    • Fix: Include mocking instructions in the prompt: "Use fake timers for all time-dependent logic. Mock external dependencies." Run the suite multiple times to check for flakiness.
  3. The "Happy Path" Bias

    • Explanation: LLMs tend to prioritize standard use cases. Edge cases like null inputs, empty collections, or concurrency races may be omitted unless explicitly requested.
    • Fix: In the specification prompt, include a dedicated section: "Edge Cases: [List specific scenarios or 'Analyze input types and suggest boundary conditions']." Review the generated spec for coverage of error paths.
  4. Context Window Saturation

    • Explanation: As the test suite grows, the context window may fill, causing the AI to lose track of earlier requirements or generate incomplete implementations.
    • Fix: Modularize specifications. Split large modules into focused test files. Use prompt chaining: "Here is the current spec. Add tests for feature X."
  5. Blind Trust in Generated Code

    • Explanation: AI-generated implementation may contain subtle logic errors, security vulnerabilities, or performance anti-patterns that still pass the tests.
    • Fix: Treat AI output as a draft. Perform code review on generated implementations. Verify algorithmic complexity and security constraints manually.
  6. Prompt Drift and Versioning

    • Explanation: Requirements evolve, but the test spec may not be updated consistently, leading to a divergence between the spec and the actual business needs.
    • Fix: Version control the test files alongside implementation. Treat the test suite as a living document. Update the spec immediately when requirements change.
  7. Over-Specification

    • Explanation: Writing tests for every trivial detail can slow development and create maintenance overhead.
    • Fix: Apply the "Testing Trophy" principle. Focus on integration and behavior tests. Avoid unit testing trivial getters/setters or pure utility functions unless they contain complex logic.
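
Returning to Pitfall 1, the following contrast sketch shows the difference in practice. The cast through any exists only to illustrate the anti-pattern; buckets is the private field from the implementation above:

import { describe, expect, it } from 'vitest';
import { RateLimiter } from './rate-limiter';

describe('assertion styles', () => {
  it('implementation-coupled (avoid)', () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 1000 });
    limiter.isAllowed('user-a');
    // Brittle: reaches into the private `buckets` field. Swapping the
    // storage structure breaks this test even though behavior is unchanged.
    expect((limiter as any).buckets.get('user-a')).toHaveLength(1);
  });

  it('behavior-coupled (prefer)', () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 1000 });
    limiter.isAllowed('user-a');
    // Robust: asserts only what callers can observe via the public API.
    expect(limiter.isAllowed('user-a')).toBe(false);
  });
});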

Production Bundle

Action Checklist

  • Define Interface Contract: Draft the input/output types and behavior requirements before prompting.
  • Generate Spec with Edge Cases: Prompt Claude to create tests, explicitly requesting boundary conditions and error handling.
  • Review Specification: Manually verify the generated tests cover all requirements and use proper mocking.
  • Generate Implementation: Prompt Claude to implement the module strictly against the test contract.
  • Execute Test Suite: Run tests locally. Ensure 100% pass rate. Investigate any failures.
  • Refactor Safely: Use AI to refactor implementation, relying on tests to preserve behavior.
  • Commit Together: Commit spec and implementation in the same PR to maintain consistency.
  • CI Integration: Ensure the CI pipeline runs the full test suite on every push (a minimal Vitest config sketch follows).
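
As a hedged sketch of that last item, a minimal vitest.config.ts ensures CI picks up every spec; the src/**/*.spec.ts glob is an assumption about project layout:

// vitest.config.ts (minimal sketch; adjust the glob to your repository)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Run every specification file so the contract is enforced on each push.
    include: ['src/**/*.spec.ts'],
  },
});

CI then only needs to execute npx vitest run, which exits non-zero when any spec fails.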

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| New Microservice | AI-Spec-First | Establishes clear contracts early; reduces integration bugs; enables parallel frontend/backend work. | Higher initial cost; lower long-term maintenance. |
| Legacy Refactor | AI-Characterization Tests | Generate tests for existing behavior first to create a safety net before refactoring. | Medium cost; prevents regression during migration. |
| Rapid Prototype | AI-Code-First | Speed is priority; tests can be added later if the prototype matures. | Lower initial cost; high risk if prototype becomes production. |
| Critical Security Module | AI-Spec-First + Manual Review | Rigorous spec ensures all threat models are tested; manual review catches AI blind spots. | High cost; essential for risk mitigation. |

Configuration Template

Use this structured prompt template for consistent specification generation.

# Role
You are a senior test engineer. Generate a comprehensive test suite for the following module.

# Module Context
- Name: [Module Name]
- Purpose: [Brief description of functionality]
- Dependencies: [List external dependencies]

# Requirements
- Input: [Types, examples, constraints]
- Output: [Types, structure, error conditions]
- Behavior: [Key business rules]

# Edge Cases
- [Specific edge case 1]
- [Specific edge case 2]
- Analyze input types for additional boundary conditions.

# Constraints
- Framework: [Vitest/Jest/Pytest]
- Mocking: Mock all external dependencies. Use fake timers for time logic.
- Scope: Tests must only use the public API.
- Output: Provide only the test file code. Do not write implementation.

Quick Start Guide

  1. Initialize Environment: Install Claude Code and your test framework (e.g., npm install -D vitest).
  2. Create Spec File: Create module.spec.ts. Run the configuration template prompt in Claude Code.
  3. Review & Run: Inspect the generated tests. Run npx vitest. Expect failures (Red phase).
  4. Generate Impl: Prompt Claude: "Implement module.ts to pass all tests in module.spec.ts."
  5. Verify: Run tests again. All should pass (Green phase). Commit and proceed.