Why green CI doesn't mean your system works
The Phantom Run Paradox: Detecting Silent Duplication in E2E Test Pipelines
Current Situation Analysis
Engineering teams frequently treat a green CI badge as the definitive proof of system health. However, during technology migrations, this trust becomes a critical liability. The industry pain point is the Phantom Run Paradox: a state where the test runner executes duplicated workloads, inflating runtime and costs while maintaining a perfect pass rate. This issue is pervasive in ecosystems transitioning from JavaScript to TypeScript, where legacy artifacts coexist with modernized code.
This problem is systematically overlooked because it mimics expected behavior. When a team migrates to TypeScript, a slight increase in CI duration is anticipated due to compilation overhead. Engineers often attribute runtime creep to this "normal" migration tax rather than investigating the test runner's discovery mechanism. Consequently, silent duplication persists for weeks, draining CI minutes and slowing feedback loops without triggering a single alert.
Data from migration case studies reveals the severity of this drift. In one documented scenario, a test suite reported 240 passing tests, yet the actual unique test count was only 120. The runner was ingesting both .spec.js and .spec.ts files simultaneously. The result was a 2.0x multiplier on execution time and CI costs, with zero warnings emitted by the runner. The pipeline validated execution, but it failed to validate correctness.
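The duplication described above can be spotted before the runner ever starts. The following is a minimal sketch, assuming test files follow the `.spec.js` / `.spec.ts` naming from the scenario; `findDuplicateSpecs` is a hypothetical helper, not part of any test runner.

```typescript
// Hypothetical helper (not a runner API): report every spec "stem" that exists
// as both a .spec.js and a .spec.ts file in the given list of paths.
function findDuplicateSpecs(files: string[]): string[] {
  const extensionsByStem: Record<string, string[]> = {};
  for (const file of files) {
    const match = /^(.*\.spec)\.(js|ts)$/.exec(file);
    if (match === null) continue; // not a spec file, ignore it
    const stem = match[1];
    const ext = match[2];
    const seen = extensionsByStem[stem] || [];
    if (seen.indexOf(ext) === -1) seen.push(ext);
    extensionsByStem[stem] = seen;
  }
  // Only two extensions are possible, so length 2 means both copies exist.
  return Object.keys(extensionsByStem)
    .filter((stem) => extensionsByStem[stem].length === 2)
    .sort();
}

// Example with made-up paths: only login exists in both formats.
const duplicated = findDuplicateSpecs([
  "e2e/login.spec.js",
  "e2e/login.spec.ts",
  "e2e/cart.spec.ts",
  "e2e/helpers/auth.ts",
]);
console.log(duplicated);
```

Any non-empty result means the runner is likely to execute the same test twice under a permissive discovery pattern.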
WOW Moment: Key Findings
The core insight is that test count integrity is a stronger health signal than pass rate. A green build with duplicated tests provides false confidence and wastes resources. The following comparison illustrates the impact of explicit configuration versus default runner behavior during a migration phase.
| Migration Strategy | CI Runtime | Detected Test Count | Actual Unique Tests | CI Cost Multiplier | Feedback Latency |
|---|---|---|---|---|---|
| Default Runner Config | 100% | 240 | 120 | 2.0x | High (Artificial) |
| Explicit testMatch + Sentinel | 50% | 120 | 120 | 1.0x | Baseline |
| Legacy Cleanup Only | 50% | 120 | 120 | 1.0x | Baseline |
Why this matters: The default configuration creates a "Phantom Run" where the runner reports a healthy suite, but the engineering team pays for double the compute. In the scenario above, locking the test discovery pattern and adding a test count sentinel halves CI runtime and cost and returns feedback latency to its baseline. More importantly, this approach prevents the "silent regression" where tests disappear or duplicate without alerting the pipeline.
Core Solution
Resolving the Phantom Run Paradox requires a two-layer defense: strict test discovery configuration and runtime validation of test counts. This solution assumes a Playwright-based TypeScript migration but applies to any test runner with glob-based file discovery.
Architecture Decisions
- Explicit testMatch Regex: Relying on default globs is unsafe during migrations. We must define a strict regular expression that excludes legacy extensions and non-test TypeScript files (e.g., helpers, types).
- Test Count Sentinel: A post-run validation script that parses the test report and fails the build if the test count deviates from the expected baseline. This catches scenarios where tests are accidentally excluded or configuration errors result in zero tests.
- Artifact Isolation: The configuration should separate test artifacts from source code to prevent accidental ingestion of application files.
Implementation
1. Lock Test Discovery
Update playwright.config.ts to enforce a strict pattern. This prevents the runner from picking up .js files or TypeScript files that do not match the test naming convention.
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  // STRICT DISCOVERY: Only match files ending in .e2e.ts
  // This excludes .js legacy files and .ts helper modules.
  testMatch: /.*\.e2e\.ts$/,
  reporter: [
    ['list'],
    ['json', { outputFile: 'test-results/report.json' }]
  ],
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});
Rationale: The regex /.*\.e2e\.ts$/ ensures that only files explicitly marked as E2E tests are executed. This immediately neutralizes the risk of duplicating tests that exist in both .js and .ts formats.
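To see what the strict pattern accepts, it can be run against a handful of file names. This is an illustrative sketch only: the file names are invented, and `selectTestFiles` is a hypothetical helper that mimics pattern filtering rather than calling the runner itself.

```typescript
// Hypothetical helper: keep only the paths that the discovery pattern accepts.
function selectTestFiles(files: string[], pattern: RegExp): string[] {
  return files.filter((file) => pattern.test(file));
}

// The strict pattern from the configuration above.
const strictPattern = /.*\.e2e\.ts$/;

// Invented file names covering the cases discussed in the text.
const candidates = [
  "e2e/checkout.e2e.ts", // migrated test: should match
  "e2e/checkout.e2e.js", // legacy copy: should be ignored
  "e2e/checkout.spec.js", // legacy naming: should be ignored
  "e2e/helpers/auth.ts", // helper module: should be ignored
  "e2e/types/e2e.d.ts", // type declarations: should be ignored
];

const selected = selectTestFiles(candidates, strictPattern);
console.log(selected);
```

Only the migrated `.e2e.ts` file survives the filter, which is the property the configuration relies on.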
2. Implement Test Count Sentinel
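Add a validation script that runs after the test suite. This script reads the JSON report and fails the build when the test count falls outside the expected range.
// scripts/validate-test-count.ts
import fs from 'fs';
import path from 'path';

const REPORT_PATH = path.resolve('test-results/report.json');
const MIN_EXPECTED_TESTS = 115; // Baseline of 120, minus roughly 5% tolerance for dynamic tests
const MAX_EXPECTED_TESTS = 125; // Assumed upper bound (baseline plus roughly 5%); a doubled suite of 240 exceeds it

// Minimal shape of a suite in the JSON report: describe blocks nest as child suites.
interface ReportSuite {
  specs: unknown[];
  suites?: ReportSuite[];
}

// Count specs recursively so tests inside describe blocks are included.
function countSpecs(suites: ReportSuite[]): number {
  return suites.reduce(
    (acc, suite) => acc + suite.specs.length + countSpecs(suite.suites ?? []),
    0
  );
}

function validateTestCount(): void {
  if (!fs.existsSync(REPORT_PATH)) {
    console.error('FATAL: Test report not found. Runner may have failed silently or config is invalid.');
    process.exit(1);
  }
  try {
    const raw = fs.readFileSync(REPORT_PATH, 'utf-8');
    const report = JSON.parse(raw);
    const totalTests = countSpecs(report.suites ?? []);
    if (totalTests < MIN_EXPECTED_TESTS) {
      console.error(`VALIDATION FAILED: Expected at least ${MIN_EXPECTED_TESTS} tests, found ${totalTests}.`);
      console.error('Possible causes: testMatch misconfiguration, missing files, or runner error.');
      process.exit(1);
    }
    if (totalTests > MAX_EXPECTED_TESTS) {
      console.error(`VALIDATION FAILED: Expected at most ${MAX_EXPECTED_TESTS} tests, found ${totalTests}.`);
      console.error('Possible causes: duplicated discovery (legacy .js next to .ts) or an outdated baseline.');
      process.exit(1);
    }
    console.log(`PASS: Test count ${totalTests} is within ${MIN_EXPECTED_TESTS}-${MAX_EXPECTED_TESTS}.`);
  } catch (error) {
    console.error('FATAL: Failed to parse test report.', error);
    process.exit(1);
  }
}

validateTestCount();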
Rationale: This sentinel acts as a circuit breaker. If a configuration change accidentally excludes all tests, or if a migration step removes files without updating the runner, the build fails immediately. This prevents the "green build with zero tests" scenario, which is as dangerous as duplication.
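A count check flags that something is off, but not which tests ran twice. The sketch below goes one step further and lists the duplicated tests by comparing titles across files that differ only in extension. It assumes each spec entry in the JSON report carries `title` and `file` fields and that describe blocks appear as nested `suites`; the interfaces and the `duplicatedSpecKeys` helper are illustrative, not an official API.

```typescript
// Assumed minimal shape of the JSON report entries used here.
interface ReportSpec {
  title: string;
  file: string;
}
interface ReportSuite {
  title: string;
  specs: ReportSpec[];
  suites?: ReportSuite[];
}

// Flatten nested suites (describe blocks) into a single list of specs.
function collectSpecs(suites: ReportSuite[]): ReportSpec[] {
  let specs: ReportSpec[] = [];
  for (const suite of suites) {
    specs = specs.concat(suite.specs, collectSpecs(suite.suites || []));
  }
  return specs;
}

// Hypothetical helper: a spec is "duplicated" when the same title is reported
// from two files whose paths differ only by the .js/.ts extension.
function duplicatedSpecKeys(suites: ReportSuite[]): string[] {
  const filesByKey: Record<string, string[]> = {};
  for (const spec of collectSpecs(suites)) {
    const key = spec.file.replace(/\.(js|ts)$/, "") + " > " + spec.title;
    const files = filesByKey[key] || [];
    if (files.indexOf(spec.file) === -1) files.push(spec.file);
    filesByKey[key] = files;
  }
  return Object.keys(filesByKey)
    .filter((key) => filesByKey[key].length > 1)
    .sort();
}

// Made-up report fragment: "logs in" exists in both the .js and .ts copy.
const sampleSuites: ReportSuite[] = [
  { title: "login.e2e.js", specs: [{ title: "logs in", file: "login.e2e.js" }] },
  {
    title: "login.e2e.ts",
    specs: [],
    suites: [{ title: "auth", specs: [{ title: "logs in", file: "login.e2e.ts" }] }],
  },
  { title: "cart.e2e.ts", specs: [{ title: "adds item", file: "cart.e2e.ts" }] },
];
console.log(duplicatedSpecKeys(sampleSuites));
```

Printing the offending keys in the CI log points reviewers at the exact legacy files to remove.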
3. Pipeline Integration
Update the CI workflow to execute the sentinel after the test run.
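# .github/workflows/e2e.yml
- name: Run E2E Tests
  # No --reporter flag here: a CLI reporter overrides the reporters in
  # playwright.config.ts, so test-results/report.json would not be written.
  run: npx playwright test
- name: Validate Test Count
  run: npx ts-node scripts/validate-test-count.ts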
Pitfall Guide
1. The Compilation Excuse
Explanation: Assuming that increased CI runtime is solely due to TypeScript compilation overhead.
Fix: Profile test count alongside runtime. If the reported test count rises without new tests being written, suspect duplicated discovery and inspect the runner configuration before blaming compilation.
2. Default Glob Reliance
Explanation: Trusting the test runner's default file discovery patterns during a migration.
Fix: Always define the discovery pattern explicitly (testMatch in Playwright; testMatch or testRegex in Jest). Defaults are designed for convenience, not migration safety.
3. Delete-Before-Lock Strategy
Explanation: Removing legacy .js files before locking the testMatch pattern.
Fix: Lock the configuration first, verify the test count, then remove legacy artifacts. This ensures the runner is not silently falling back to defaults during the transition.
4. Ignoring Test Count Drift
Explanation: Failing to monitor the number of tests executed over time.
Fix: Implement a test count sentinel and track the metric in your CI dashboard. Alert on deviations greater than 5%.
5. Mixed Extensions in testDir
Explanation: Keeping .js and .ts test files in the same directory without strict filtering.
Fix: Use strict regex patterns or separate directories for legacy and modern tests. Prefer regex patterns for flexibility.
6. Pass Rate as Sole Health Metric
Explanation: Relying on 100% pass rate to indicate a healthy suite.
Fix: Pass rate is meaningless if tests are duplicated or missing. Combine pass rate with test count and coverage metrics.
7. No Artifact Cleanup in CI
Explanation: CI environments retaining stale artifacts from previous runs.
Fix: Ensure CI jobs start with a clean workspace or use fresh clones. Stale files can cause unexpected runner behavior.
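The 5% drift alert from pitfall 4 reduces to a small calculation. The sketch below is one way to express it, assuming the baseline is stored somewhere the pipeline can read; `driftRatio` and `exceedsDrift` are hypothetical names, and the 0.05 default simply mirrors the threshold mentioned above.

```typescript
// Relative deviation of the observed test count from the baseline (0.5 = 50%).
function driftRatio(baseline: number, actual: number): number {
  if (baseline <= 0) {
    throw new Error("baseline must be a positive test count");
  }
  return Math.abs(actual - baseline) / baseline;
}

// True when the deviation is larger than the tolerance (default: 5%).
function exceedsDrift(baseline: number, actual: number, tolerance: number = 0.05): boolean {
  return driftRatio(baseline, actual) > tolerance;
}

// Using the numbers from the migration scenario: baseline 120 unique tests.
console.log(exceedsDrift(120, 240)); // duplicated suite: drift of 100%
console.log(exceedsDrift(120, 118)); // two tests removed: drift under 2%
console.log(exceedsDrift(120, 0)); // nothing discovered: drift of 100%
```

Because the check is symmetric, it raises the alert for duplicated suites and for suites that quietly lost tests.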
Production Bundle
Action Checklist
- Define explicit testMatch regex in playwright.config.ts to exclude legacy extensions.
- Add a test count sentinel script to validate the number of executed tests.
- Baseline the expected test count in your CI dashboard and set alert thresholds.
- Audit testDir for mixed file extensions and remove legacy artifacts after config lock.
- Implement a "Fail on Zero Tests" guard in the pipeline configuration.
- Review migration strategy: rename files to match new patterns before deleting old ones.
- Monitor CI runtime and test count trends weekly to detect silent regressions.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Active JS/TS Migration | testMatch: /.*\.ts$/ | Isolates new code immediately; prevents duplication. | Low config cost; reduces runtime by 50%. |
| Stable TS Codebase | testMatch: /.*\.spec\.ts$/ | Prevents accidental inclusion of helpers or types. | Zero runtime cost; improves reliability. |
| High-Flakiness Suite | testMatch + Retry Config | Focuses on stability; retries only valid tests. | Higher compute for retries; better signal. |
| Multi-Project Monorepo | testMatch per project | Ensures each package runs only its own tests. | Reduces cross-contamination; optimizes parallelism. |
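For the monorepo row, one option is to generate the per-project entries instead of writing them by hand. The sketch below assumes a `packages/<name>/e2e` layout and the `.e2e.ts` naming used earlier; both the layout and the `buildProjects` helper are assumptions for illustration. The `name`, `testDir`, and `testMatch` fields correspond to per-project options in Playwright's configuration.

```typescript
// Shape of the per-project fields this sketch fills in.
interface ProjectEntry {
  name: string;
  testDir: string;
  testMatch: RegExp;
}

// Hypothetical helper: one project per package, each limited to its own
// e2e directory and to files ending in .e2e.ts.
function buildProjects(packages: string[]): ProjectEntry[] {
  return packages.map((pkg) => ({
    name: pkg,
    testDir: "./packages/" + pkg + "/e2e", // assumed monorepo layout
    testMatch: /.*\.e2e\.ts$/,
  }));
}

// Invented package names for illustration.
const projects = buildProjects(["web", "admin"]);
for (const project of projects) {
  console.log(project.name + " -> " + project.testDir);
}
```

The resulting array could then be passed as the `projects` value in `playwright.config.ts`, alongside the browser settings each project needs.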
Configuration Template
Copy this template for a robust Playwright configuration with strict discovery and reporting.
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  // Strict pattern: Only .e2e.ts files
  testMatch: /.*\.e2e\.ts$/,
  // Reporters for CI and local debugging
  reporter: process.env.CI
    ? [['json', { outputFile: 'test-results/report.json' }], ['github']]
    : [['list'], ['html', { open: 'never' }]],
  // Global settings
  timeout: 30000,
  expect: { timeout: 5000 },
  // Retry logic for flaky tests
  retries: process.env.CI ? 2 : 0,
  // Workers: a single worker on CI (no parallelism); default worker count locally
  workers: process.env.CI ? 1 : undefined,
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});
Quick Start Guide
- Update Config: Add testMatch: /.*\.e2e\.ts$/ to your playwright.config.ts.
- Verify Count: Run npx playwright test --list to confirm the runner detects only the expected tests.
- Add Sentinel: Create scripts/validate-test-count.ts and add it to your CI pipeline after the test step.
- Commit & Monitor: Push changes and verify that CI runtime drops and the test count matches your baseline.
- Clean Up: Once validated, remove legacy .js test files to finalize the migration.
