
How I Used AI to Fix Our E2E Test Architecture

By Codcompass Team · 4 min read

Current Situation Analysis

The project inherited a Playwright E2E test suite comprising 38 spec files, ~165 tests, and ~14,000 lines of test infrastructure. Initial local execution revealed a critical failure mode: only 8 of 130 non-skipped tests passed, a 6% pass rate. Paradoxically, CI pipelines reported green builds. Investigation showed that CI ran with workers: 1, which executed tests one at a time and masked concurrency issues, environment pollution, and race conditions. All of these surfaced immediately when the suite ran locally with multiple workers.
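A CI/local split like this is easy to create by accident. Playwright's generated playwright.config.ts template contains the workers line below; whether this suite's config used exactly this form is an assumption.

```typescript
// playwright.config.ts: a sketch of the pattern that can hide concurrency bugs.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // On CI: a single worker, so tests never overlap and shared state never collides.
  // Locally: undefined lets Playwright choose a worker count from the machine's CPUs,
  // so tests that depend on running alone start to fail.
  workers: process.env.CI ? 1 : undefined,
});
```

Running npx playwright test --workers=1 locally reproduces the CI behavior, while passing a higher --workers value in CI can expose the failures that a single worker hides.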

Traditional refactoring approaches failed due to several compounding factors:

  • Lack of Domain Context: Zero initial knowledge of custom wrappers, architectural decisions, or test isolation boundaries made manual migration error-prone and slow.
  • Boilerplate Overload: Heavy reliance on repeated beforeAll/afterAll blocks and manual try/finally cleanup caused redundant API calls (15 per file) and fragile state management.
  • Serial Cascade Failures: Tests configured in serial mode caused a single failure to block all subsequent tests in a describe block, inflating 4 actual failures into 57 phantom "did not run" results.
  • AI Without Guardrails: Unstructured AI prompts produced inconsistent quality, outdated pattern suggestions, and hallucinations when applied directly to complex test architectures.
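The 4-into-57 inflation follows from how Playwright treats a serial group: once one test fails, the remaining tests in that describe block are skipped and reported as "did not run". The functions below are a simplified model of that reporting behavior in plain TypeScript. They do not call Playwright, and the group sizes in the example are made up.

```typescript
type Status = 'passed' | 'failed' | 'did not run';

// Model one serial describe block: after the first failure, later tests are not executed,
// regardless of whether they would have passed on their own.
function runSerialGroup(wouldPass: boolean[]): Status[] {
  const statuses: Status[] = [];
  let blocked = false;
  for (const ok of wouldPass) {
    if (blocked) {
      statuses.push('did not run');
    } else if (ok) {
      statuses.push('passed');
    } else {
      statuses.push('failed');
      blocked = true;
    }
  }
  return statuses;
}

// Tally statuses across several serial groups.
function summarize(groups: boolean[][]): Record<Status, number> {
  const counts: Record<Status, number> = { passed: 0, failed: 0, 'did not run': 0 };
  for (const group of groups) {
    for (const status of runSerialGroup(group)) counts[status] += 1;
  }
  return counts;
}

// Hypothetical suite: two serial groups, each with one genuine failure near the top.
const groups: boolean[][] = [
  [true, false, true, true, true, true, true, true], // fails at test 2 of 8
  [false, true, true, true, true, true],             // fails at test 1 of 6
];
const summary = summarize(groups);
console.log(summary);
```

Here two real failures produce 11 "did not run" results, even though every one of those 11 tests would pass if executed.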

WOW Moment: Key Findings

By implementing a structured AI collaboration workflow combined with Playwright-native architectural patterns, we achieved measurable reductions in infrastructure overhead and flakiness. The best results came from combining three things: worker-scoped fixtures for expensive setup, micro-frontend (MFE)-scoped project splitting for parallelism, and a strict 7-step AI validation skill.
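MFE-scoped project splitting can be expressed in playwright.config.ts roughly as follows. This is a minimal sketch: the MFE names, the e2e/&lt;mfe&gt; directory layout, the .serial.spec.ts naming convention, and the worker cap of 4 are all hypothetical, not the suite's actual values.

```typescript
// playwright.config.ts: one project per micro-frontend, plus a dedicated serial project.
import { defineConfig } from '@playwright/test';

// Hypothetical MFE names; each is assumed to own a directory under ./e2e.
const mfes = ['billing', 'settings', 'dashboard'];

export default defineConfig({
  fullyParallel: true,
  // Cap concurrency so shared backend resources are not overwhelmed (illustrative value).
  workers: 4,
  projects: [
    // Each MFE runs as its own project, so its specs can be run and reported in isolation.
    ...mfes.map((name) => ({
      name,
      testDir: `./e2e/${name}`,
      testIgnore: '**/*.serial.spec.ts',
    })),
    // Specs that genuinely need ordering are isolated here and run without file-level parallelism.
    {
      name: 'serial',
      testDir: './e2e',
      testMatch: '**/*.serial.spec.ts',
      fullyParallel: false,
    },
  ],
});
```

With this layout, npx playwright test --project=billing runs a single MFE's tests, and serial specs no longer share a project with the parallel ones.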

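| Approach                | API Calls/File | Setup Lines (UI) | Setup/Cleanup Lines (API) |
|-------------------------|----------------|------------------|---------------------------|
| Traditional Boilerplate | 15             | 8                | 15                        |
| AI-Assisted Fixtures    | 7              | 3                | 3                         |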
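The headline percentages follow from the table by ordinary percentage change, (before - after) / before × 100. The same formula gives 62.5% for UI setup lines, a figure the article does not quote but which follows from the table.

```typescript
// Values copied from the table; reductionPct() is ordinary percentage change.
const before = { apiCallsPerFile: 15, setupLinesUi: 8, setupCleanupLinesApi: 15 };
const after = { apiCallsPerFile: 7, setupLinesUi: 3, setupCleanupLinesApi: 3 };

function reductionPct(from: number, to: number): number {
  return ((from - to) / from) * 100;
}

const apiCallReduction = reductionPct(before.apiCallsPerFile, after.apiCallsPerFile);
const uiSetupReduction = reductionPct(before.setupLinesUi, after.setupLinesUi);
const apiCleanupReduction = reductionPct(before.setupCleanupLinesApi, after.setupCleanupLinesApi);

// Prints: 53.3 62.5 80.0
console.log(apiCallReduction.toFixed(1), uiSetupReduction.toFixed(1), apiCleanupReduction.toFixed(1));
```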

Key Findings:

  • 53% API Call Reduction: Worker-scoped fixtures shared across files eliminated redundant getUser(), createOrg(), and createProject() calls.
  • 80% Cleanup Line Reduction: Native Playwright fixture teardown replaced manual try/finally blocks across 15 files.
  • ~1,000 Lines of Boilerplate Removed: Direct Playwright locator calls replaced the legacy Actions wrapper.
  • Serial Mode Stabilization: Dedicated projects, increased beforeAll timeouts (30s → 60s), and worker capping eliminated cascade failures.
  • AI Skill Consistency: The pw-test-improvement skill enforced identical baseline testing, benchmarking, and documentation standards across all 33 migration tasks.
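The fixture pattern behind the first two findings can be sketched as below. It assumes a REST backend: the /orgs and /projects endpoints, the API_BASE_URL environment variable, and the Org and Project shapes are hypothetical stand-ins for the suite's own createOrg() and createProject() helpers. The Playwright pieces (test.extend, scope: 'worker', the fixture timeout option, request.newContext) are real APIs.

```typescript
// fixtures.ts: worker-scoped setup with teardown handled by the fixture itself.
import { test as base, request, type APIRequestContext } from '@playwright/test';

// Hypothetical resource shapes returned by the backend.
type Org = { id: string };
type Project = { id: string; orgId: string };

// The second type parameter of extend() declares worker-scoped fixtures.
type WorkerFixtures = {
  api: APIRequestContext;
  org: Org;
  project: Project;
};

export const test = base.extend<{}, WorkerFixtures>({
  // One API context per worker process, disposed when the worker shuts down.
  api: [
    async ({}, use) => {
      const ctx = await request.newContext({ baseURL: process.env.API_BASE_URL });
      await use(ctx);
      await ctx.dispose();
    },
    { scope: 'worker' },
  ],

  // Created once per worker and shared by every test that worker runs, across spec files.
  // The code after use() is the teardown that replaces manual try/finally blocks.
  org: [
    async ({ api }, use, workerInfo) => {
      const res = await api.post('/orgs', {
        data: { name: `e2e-org-w${workerInfo.workerIndex}` },
      });
      if (!res.ok()) throw new Error(`createOrg failed: ${res.status()}`);
      const org = (await res.json()) as Org;
      await use(org);
      await api.delete(`/orgs/${org.id}`);
    },
    // Slow setup gets its own timeout instead of inflating every test's timeout.
    { scope: 'worker', timeout: 60_000 },
  ],

  // Depends on org, so Playwright tears it down before org is deleted.
  project: [
    async ({ api, org }, use) => {
      const res = await api.post(`/orgs/${org.id}/projects`, { data: { name: 'e2e-project' } });
      if (!res.ok()) throw new Error(`createProject failed: ${res.status()}`);
      const project = (await res.json()) as Project;
      await use(project);
      await api.delete(`/projects/${project.id}`);
    },
    { scope: 'worker' },
  ],
});

export { expect } from '@playwright/test';
```

A spec then imports test from './fixtures' instead of '@playwright/test' and simply requests the resource, for example test('opens project', async ({ page, project }) => { await page.goto(`/projects/${project.id}`); }). Because the fixture is worker-scoped, the org and project are created once per worker rather than once per file.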

Core Solution

The refactoring followed a tracer bullet methodology (8 phased slices) validated by a dependency graph, enabling safe parallel execution of up to 4 AI sessions. Each bullet proved a thin end-to-end slice before expanding.
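The article does not show its dependency graph. As an illustration only, the sketch below groups eight hypothetical slices (slice-1 to slice-8, with made-up dependencies) into waves: a slice becomes ready once all of its dependencies are complete, and no wave holds more than four slices, mirroring the cap of 4 concurrent AI sessions.

```typescript
// slice -> list of slices it depends on
type Graph = Record<string, string[]>;

// Greedy wave scheduling: each wave contains only slices whose dependencies finished
// in earlier waves, capped at maxParallel slices per wave.
function scheduleWaves(graph: Graph, maxParallel: number): string[][] {
  const done = new Set<string>();
  const remaining = new Set(Object.keys(graph));
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const ready = [...remaining]
      .filter((slice) => (graph[slice] ?? []).every((dep) => done.has(dep)))
      .sort()
      .slice(0, maxParallel);
    // Nothing ready while work remains means a cycle or an undeclared dependency.
    if (ready.length === 0) throw new Error('cycle or missing dependency in slice graph');
    for (const slice of ready) {
      remaining.delete(slice);
      done.add(slice);
    }
    waves.push(ready);
  }
  return waves;
}

// Hypothetical graph: one baseline slice, one shared-infrastructure slice,
// five independent migration slices, and a final slice that needs all of them.
const slices: Graph = {
  'slice-1': [],
  'slice-2': ['slice-1'],
  'slice-3': ['slice-2'],
  'slice-4': ['slice-2'],
  'slice-5': ['slice-2'],
  'slice-6': ['slice-2'],
  'slice-7': ['slice-2'],
  'slice-8': ['slice-3', 'slice-4', 'slice-5', 'slice-6', 'slice-7'],
};

const waves = scheduleWaves(slices, 4);
console.log(waves);
```

Waves act as barriers, which keeps each step easy to verify but can leave sessions idle (slice-7 runs alone in this example); a work queue that starts a slice as soon as a session frees up would pack the work more tightly.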
