Why green CI doesn't mean your system works

The Phantom Run Paradox: Detecting Silent Duplication in E2E Test Pipelines

Current Situation Analysis

Engineering teams often treat a green CI badge as definitive proof of system health. During a technology migration, that trust becomes a liability. The failure mode is the Phantom Run Paradox: the test runner executes duplicated workloads, inflating runtime and cost while the pass rate stays perfect. It shows up most readily in codebases moving from JavaScript to TypeScript, where legacy artifacts coexist with migrated code.

This problem is systematically overlooked because it mimics expected behavior. When a team migrates to TypeScript, a slight increase in CI duration is anticipated due to compilation overhead. Engineers often attribute runtime creep to this "normal" migration tax rather than investigating the test runner's discovery mechanism. Consequently, silent duplication persists for weeks, draining CI minutes and slowing feedback loops without triggering a single alert.

One documented migration shows how severe this drift can be. The test suite reported 240 passing tests, yet the actual unique test count was only 120. The runner was ingesting both .spec.js and .spec.ts files simultaneously. The result was a 2.0x multiplier on execution time and CI cost, with zero warnings emitted by the runner. The pipeline confirmed that every test passed, but not that the right set of tests had run.

WOW Moment: Key Findings

The core insight is that test count integrity is a stronger health signal than pass rate. A green build with duplicated tests provides false confidence and wastes resources. The following comparison illustrates the impact of explicit configuration versus default runner behavior during a migration phase.

Migration Strategy	CI Runtime	Detected Test Count	Actual Unique Tests	CI Cost Multiplier	Feedback Latency
Default Runner Config	100%	240	120	2.0x	High (Artificial)
Explicit `testMatch` + Sentinel	50%	120	120	1.0x	Baseline
Legacy Cleanup Only	50%	120	120	1.0x	Baseline

Why this matters: The default configuration creates a "Phantom Run": the runner reports a healthy suite while the engineering team pays for double the compute. In the scenario above, locking the test discovery pattern halves CI runtime and cost and restores baseline feedback latency, and a test count sentinel keeps it that way. More importantly, the combination catches the "silent regression" where tests disappear or duplicate without alerting the pipeline.

Core Solution

Resolving the Phantom Run Paradox requires a two-layer defense: strict test discovery configuration and runtime validation of test counts. This solution assumes a Playwright-based TypeScript migration but applies to any test runner with glob-based file discovery.

Architecture Decisions

Explicit testMatch Regex: Relying on default globs is unsafe during migrations. We must define a strict regular expression that excludes legacy extensions and non-test TypeScript files (e.g., helpers, types).
Test Count Sentinel: A post-run validation script that parses the test report and fails the build if the test count deviates from the expected baseline. This catches scenarios where tests are accidentally excluded or configuration errors result in zero tests.
Artifact Isolation: The configuration should separate test artifacts from source code to prevent accidental ingestion of application files.

Implementation

1. Lock Test Discovery

Update playwright.config.ts to enforce a strict pattern. This prevents the runner from picking up .js files or TypeScript files that do not match the test naming convention.

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  
  // STRICT DISCOVERY: Only match files ending in .e2e.ts
  // This excludes .js legacy files and .ts helper modules.
  testMatch: /.*\.e2e\.ts$/,
  
  reporter: [
    ['list'],
    ['json', { outputFile: 'test-results/report.json' }]
  ],
  
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry',
  },
  
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});

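Rationale: The regex /.*\.e2e\.ts$/ ensures that only files explicitly marked as E2E tests are executed, which removes the risk of running a test twice when it exists in both .js and .ts form. The pattern assumes migrated tests are named *.e2e.ts. If yours still use *.spec.ts, either rename them first or use /.*\.spec\.ts$/ instead; otherwise the pattern matches nothing.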
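Before committing the pattern, it can help to run it against a handful of file names. The sketch below does that; the file names are hypothetical and only the regex is taken from the config above.

```typescript
// Pattern copied from the config above.
const TEST_MATCH = /.*\.e2e\.ts$/;

// Hypothetical file names, for illustration only.
const candidateFiles: string[] = [
  'e2e/login.e2e.ts',    // migrated and renamed test
  'e2e/login.spec.js',   // legacy test
  'e2e/login.spec.ts',   // migrated but not yet renamed
  'e2e/helpers/auth.ts', // helper module
  'e2e/types.d.ts',      // type declarations
];

// Keep only the files the pattern would treat as tests.
function selectTestFiles(files: string[], pattern: RegExp): string[] {
  return files.filter((file) => pattern.test(file));
}

const selectedFiles = selectTestFiles(candidateFiles, TEST_MATCH);
console.log(selectedFiles); // [ 'e2e/login.e2e.ts' ]
```

Note that e2e/login.spec.ts is not selected: a migrated test that has not been renamed is skipped just like the legacy .js file.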

2. Implement Test Count Sentinel

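Add a validation script that runs after the test suite. This script reads the JSON report and enforces a lower and an upper bound on the test count, so that both missing and duplicated tests fail the build.

// scripts/validate-test-count.ts
import * as fs from 'fs';
import * as path from 'path';

const REPORT_PATH = path.resolve('test-results/report.json');
const MIN_EXPECTED_TESTS = 115; // Baseline of 120 minus roughly 5% tolerance for dynamic tests
const MAX_EXPECTED_TESTS = 125; // Baseline of 120 plus the same tolerance; catches duplication

interface ReportSuite {
  specs?: unknown[];
  suites?: ReportSuite[];
}

// Count specs recursively: describe blocks appear as nested suites in the report.
function countSpecs(suites: ReportSuite[] = []): number {
  return suites.reduce(
    (acc, suite) => acc + (suite.specs?.length ?? 0) + countSpecs(suite.suites),
    0
  );
}

function validateTestCount(): void {
  if (!fs.existsSync(REPORT_PATH)) {
    console.error('FATAL: Test report not found. Runner may have failed silently or config is invalid.');
    process.exit(1);
  }

  try {
    const raw = fs.readFileSync(REPORT_PATH, 'utf-8');
    const report = JSON.parse(raw);
    const totalTests = countSpecs(report.suites);

    if (totalTests < MIN_EXPECTED_TESTS) {
      console.error(`VALIDATION FAILED: Expected at least ${MIN_EXPECTED_TESTS} tests, found ${totalTests}.`);
      console.error('Possible causes: testMatch misconfiguration, missing files, or runner error.');
      process.exit(1);
    }

    if (totalTests > MAX_EXPECTED_TESTS) {
      console.error(`VALIDATION FAILED: Expected at most ${MAX_EXPECTED_TESTS} tests, found ${totalTests}.`);
      console.error('Possible causes: duplicated discovery (.js and .ts copies of one spec) or a stale baseline.');
      process.exit(1);
    }

    console.log(`PASS: Test count ${totalTests} is within [${MIN_EXPECTED_TESTS}, ${MAX_EXPECTED_TESTS}].`);
  } catch (error) {
    console.error('FATAL: Failed to parse test report.', error);
    process.exit(1);
  }
}

validateTestCount();

Rationale: This sentinel acts as a circuit breaker. If a configuration change accidentally excludes tests, or a migration step removes files without updating the runner, the count falls below the lower bound and the build fails. If the runner starts picking up duplicate files, the count exceeds the upper bound and the build fails as well. This prevents both the "green build with zero tests" scenario and the silent duplication described earlier, which is just as dangerous.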
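A count check signals that the suite changed size, but not which tests are doubled. As a complement, the following sketch groups specs from the report by file stem and title and lists those that appear under more than one file extension. The `ReportSpec` and `ReportSuite` interfaces are an assumed minimal slice of the JSON report (a `title` and `file` per spec, with nested `suites`), and the sample data is hypothetical.

```typescript
// Assumed minimal slice of the JSON report; adjust to your reporter's actual output.
interface ReportSpec { title: string; file: string; }
interface ReportSuite { title: string; specs?: ReportSpec[]; suites?: ReportSuite[]; }

// Walk nested suites (describe blocks) and collect every spec.
function collectSpecs(suites: ReportSuite[]): ReportSpec[] {
  const specs: ReportSpec[] = [];
  for (const suite of suites) {
    specs.push(...(suite.specs ?? []));
    specs.push(...collectSpecs(suite.suites ?? []));
  }
  return specs;
}

// Map "file stem :: title" to the distinct files that define it,
// keeping only entries defined in more than one file.
function findDuplicatedSpecs(suites: ReportSuite[]): Record<string, string[]> {
  const filesByKey: Record<string, string[]> = {};
  for (const spec of collectSpecs(suites)) {
    const stem = spec.file.replace(/\.[cm]?[jt]s$/, ''); // drop .js/.ts/.mjs/.cjs/.mts/.cts
    const key = `${stem} :: ${spec.title}`;
    if (!filesByKey[key]) filesByKey[key] = [];
    if (filesByKey[key].indexOf(spec.file) === -1) filesByKey[key].push(spec.file);
  }
  const duplicated: Record<string, string[]> = {};
  for (const key of Object.keys(filesByKey)) {
    if (filesByKey[key].length > 1) duplicated[key] = filesByKey[key];
  }
  return duplicated;
}

// Hypothetical report: login exists as both .js and .ts, cart only as .ts.
const sampleSuites: ReportSuite[] = [
  { title: 'login.spec.js', specs: [{ title: 'logs in', file: 'login.spec.js' }] },
  { title: 'login.spec.ts', suites: [
    { title: 'auth', specs: [{ title: 'logs in', file: 'login.spec.ts' }] },
  ] },
  { title: 'cart.spec.ts', specs: [{ title: 'adds item', file: 'cart.spec.ts' }] },
];

const duplicatedSpecs = findDuplicatedSpecs(sampleSuites);
console.log(duplicatedSpecs);
```

Matching on title alone is a heuristic: a test renamed during migration would not be flagged, so this supplements the count bounds rather than replacing them.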

3. Pipeline Integration

Update the CI workflow to execute the sentinel after the test run.

# .github/workflows/e2e.yml
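- name: Run E2E Tests
  # Reporters come from playwright.config.ts. Passing --reporter=json here would
  # override them and print the report to stdout instead of writing report.json.
  run: npx playwright test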

- name: Validate Test Count
  run: npx ts-node scripts/validate-test-count.ts

Pitfall Guide

1. The Compilation Excuse

Explanation: Assuming that increased CI runtime is solely due to TypeScript compilation overhead. Fix: Profile test count alongside runtime. If the reported test count grew with the runtime but nobody added tests, investigate runner discovery before blaming compilation. If the count is stable, compilation or per-test overhead is the more likely cause.
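The comparison can be sketched as a small helper that looks at runtime growth and test count growth together. The 10% threshold, the type names, and the runtime figures are assumptions chosen for illustration; only the 120 and 240 test counts come from the scenario above.

```typescript
interface RunSnapshot {
  testCount: number;
  runtimeSeconds: number;
}

type Diagnosis =
  | 'no-significant-change'
  | 'count-grew-check-discovery'   // more tests ran: duplication or genuinely new tests
  | 'count-stable-check-overhead'; // same tests, slower: compilation or per-test cost

// Compare two runs; `threshold` is the relative growth treated as significant (assumed 10%).
function diagnoseRuntimeIncrease(
  before: RunSnapshot,
  after: RunSnapshot,
  threshold: number = 0.1,
): Diagnosis {
  const runtimeGrowth = after.runtimeSeconds / before.runtimeSeconds - 1;
  const countGrowth = after.testCount / before.testCount - 1;
  if (runtimeGrowth <= threshold) return 'no-significant-change';
  return countGrowth > threshold
    ? 'count-grew-check-discovery'
    : 'count-stable-check-overhead';
}

// Hypothetical runtimes: the suite doubles in both count and duration.
const phantomRun = diagnoseRuntimeIncrease(
  { testCount: 120, runtimeSeconds: 600 },
  { testCount: 240, runtimeSeconds: 1200 },
);
console.log(phantomRun); // count-grew-check-discovery
```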

2. Default Glob Reliance

Explanation: Trusting the test runner's default file discovery patterns during a migration. Fix: Always define the discovery pattern explicitly (testMatch in Playwright; testMatch or testRegex in Jest). Defaults are designed for convenience, not migration safety.

3. Delete-Before-Lock Strategy

Explanation: Removing legacy .js files before locking the testMatch pattern. Fix: Lock the configuration first, verify the test count, then remove legacy artifacts. This ensures the runner is not silently falling back to defaults during the transition.

4. Ignoring Test Count Drift

Explanation: Failing to monitor the number of tests executed over time. Fix: Implement a test count sentinel and track the metric in your CI dashboard. Alert on deviations greater than 5%.

5. Mixed Extensions in `testDir`

Explanation: Keeping .js and .ts test files in the same directory without strict filtering. Fix: Use strict regex patterns or separate directories for legacy and modern tests. Prefer regex patterns for flexibility.
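To audit a directory for mixed extensions, one option is to list the legacy .js files that already have a .ts sibling with the same stem. The sketch below takes a list of paths as input so it stays independent of how the files are enumerated; the helper name and the sample paths are hypothetical.

```typescript
// Returns legacy .js files that have a .ts sibling with the same path stem.
function findLegacyTwins(files: string[]): string[] {
  const tsStems: Record<string, boolean> = {};
  for (const file of files) {
    if (/\.ts$/.test(file)) tsStems[file.replace(/\.ts$/, '')] = true;
  }
  return files.filter(
    (file) => /\.js$/.test(file) && tsStems[file.replace(/\.js$/, '')] === true,
  );
}

// Hypothetical directory listing.
const listedFiles = [
  'e2e/login.spec.js',
  'e2e/login.spec.ts',
  'e2e/cart.spec.ts',
  'e2e/legacy-only.spec.js',
];

const legacyTwins = findLegacyTwins(listedFiles);
console.log(legacyTwins); // [ 'e2e/login.spec.js' ]
```

Files such as e2e/legacy-only.spec.js, which have no .ts counterpart, are deliberately not reported: they still need migrating rather than deleting.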

6. Pass Rate as Sole Health Metric

Explanation: Relying on 100% pass rate to indicate a healthy suite. Fix: Pass rate is meaningless if tests are duplicated or missing. Combine pass rate with test count and coverage metrics.

7. No Artifact Cleanup in CI

Explanation: CI environments retaining stale artifacts from previous runs. Fix: Ensure CI jobs start with a clean workspace or use fresh clones. Stale files can cause unexpected runner behavior.

Production Bundle

Action Checklist

Define explicit testMatch regex in playwright.config.ts to exclude legacy extensions.
Add a test count sentinel script to validate the number of executed tests.
Baseline the expected test count in your CI dashboard and set alert thresholds.
Audit testDir for mixed file extensions and remove legacy artifacts after config lock.
Implement a "Fail on Zero Tests" guard in the pipeline configuration (with Playwright, this includes not passing --pass-with-no-tests in CI).
Review migration strategy: rename files to match new patterns before deleting old ones.
Monitor CI runtime and test count trends weekly to detect silent regressions.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
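Active JS/TS Migration	`testMatch: /.*\.ts$/`	Excludes legacy .js files immediately and prevents duplication; it also matches helper .ts files, so narrow it once test naming settles.	Low config cost; removes the duplicated half of the run (about 50% in the case above).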
Stable TS Codebase	`testMatch: /.*\.spec\.ts$/`	Prevents accidental inclusion of helpers or types.	Zero runtime cost; improves reliability.
High-Flakiness Suite	`testMatch` + Retry Config	Focuses on stability; retries only valid tests.	Higher compute for retries; better signal.
Multi-Project Monorepo	`testMatch` per project	Ensures each package runs only its own tests.	Reduces cross-contamination; optimizes parallelism.
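For the monorepo row, a minimal sketch of per-project discovery follows. The project names and package paths are hypothetical; the point is that `testDir` and `testMatch` are set on each project so that each one only sees its own files.

```typescript
// playwright.config.ts (monorepo sketch; names and paths are hypothetical)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'web',
      testDir: './packages/web/e2e',
      testMatch: /.*\.e2e\.ts$/,
    },
    {
      name: 'admin',
      testDir: './packages/admin/e2e',
      testMatch: /.*\.e2e\.ts$/,
    },
  ],
});
```

A single project can then be run with npx playwright test --project=web, which keeps its test count easy to baseline separately.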

Configuration Template

Copy this template for a robust Playwright configuration with strict discovery and reporting.

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  
  // Strict pattern: Only .e2e.ts files
  testMatch: /.*\.e2e\.ts$/,
  
  // Reporters for CI and local debugging
  reporter: process.env.CI 
    ? [['json', { outputFile: 'test-results/report.json' }], ['github']]
    : [['list'], ['html', { open: 'never' }]],
  
  // Global settings
  timeout: 30000,
  expect: { timeout: 5000 },
  
  // Retry logic for flaky tests
  retries: process.env.CI ? 2 : 0,
  
  // Single worker on CI for stable, reproducible runs; default parallelism locally
  workers: process.env.CI ? 1 : undefined,
  
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});

Quick Start Guide

Update Config: Add testMatch: /.*\.e2e\.ts$/ to your playwright.config.ts, renaming migrated tests to *.e2e.ts first or adjusting the pattern to your naming convention.
Verify Count: Run npx playwright test --list to confirm the runner detects only the expected tests.
Add Sentinel: Create scripts/validate-test-count.ts and add it to your CI pipeline after the test step.
Commit & Monitor: Push changes and verify that CI runtime drops and the test count matches your baseline.
Clean Up: Once validated, remove legacy .js test files to finalize the migration.
