How I Used AI to Fix Our E2E Test Architecture

By Codcompass Team·2026-05-07·4 min read

Current Situation Analysis

The project inherited a Playwright E2E test suite comprising 38 spec files, ~165 tests, and ~14,000 lines of test infrastructure. The first local run exposed a critical failure mode: only 8 of 130 non-skipped tests passed, a 6% pass rate. Paradoxically, CI pipelines reported green builds. Investigation showed that CI ran with `workers: 1`, which masked the concurrency issues, environment pollution, and race conditions that surfaced as soon as the suite ran locally with multiple workers.
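This kind of masking usually comes from a CI-conditional worker count. Below is a minimal `playwright.config.ts` sketch. It assumes the common `process.env.CI` convention, because the project's actual config is not shown:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // One worker on CI: spec files never run concurrently there, so shared-state
  // bugs stay hidden. Locally, `undefined` lets Playwright pick its default
  // (half the logical CPU cores), which exposes them.
  workers: process.env.CI ? 1 : undefined,
});
```

The `--workers` CLI flag overrides this setting. `npx playwright test --workers=4` reproduces the parallel failure mode on demand, and `--workers=1` reproduces the CI behavior.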

Traditional refactoring approaches failed due to several compounding factors:

Lack of Domain Context: We started with zero knowledge of the custom wrappers, architectural decisions, and test isolation boundaries, which made manual migration slow and error-prone.
Boilerplate Overload: Heavy reliance on repeated beforeAll/afterAll blocks and manual try/finally cleanup caused redundant API calls (15 per file) and fragile state management.
Serial Cascade Failures: Tests configured in serial mode caused a single failure to block all subsequent tests in a describe block, inflating 4 actual failures into 57 phantom "did not run" results.
AI Without Guardrails: Unstructured AI prompts produced inconsistent quality, outdated pattern suggestions, and hallucinations when applied directly to complex test architectures.
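The cascade is easy to underestimate. The TypeScript sketch below models it with hypothetical helpers (`runSerialGroup`, `tally`) rather than with Playwright itself. Once a test in a serial group fails, every later test in that group is reported as "did not run", so a single real failure can hide many results:

```typescript
// Possible outcomes for one test in a serial group.
type Outcome = 'passed' | 'failed' | 'did not run';

// Models serial mode: tests run in order, and after the first failure the
// remaining tests in the group are not executed at all.
function runSerialGroup(willPass: boolean[]): Outcome[] {
  const outcomes: Outcome[] = [];
  let blocked = false;
  for (const passes of willPass) {
    if (blocked) {
      outcomes.push('did not run');
    } else if (passes) {
      outcomes.push('passed');
    } else {
      outcomes.push('failed');
      blocked = true;
    }
  }
  return outcomes;
}

// Counts outcomes across several independent serial groups.
function tally(groups: boolean[][]): Record<Outcome, number> {
  const counts: Record<Outcome, number> = { passed: 0, failed: 0, 'did not run': 0 };
  for (const group of groups) {
    for (const outcome of runSerialGroup(group)) counts[outcome]++;
  }
  return counts;
}

// Illustrative input: one failure at the top of a 10-test group
// yields 1 failure and 9 "did not run" results.
const example = tally([[false, ...new Array<boolean>(9).fill(true)]]);
console.log(example);
```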

WOW Moment: Key Findings

By implementing a structured AI collaboration workflow alongside Playwright-native architectural patterns, we achieved measurable reductions in infrastructure overhead and flakiness. The sweet spot was the combination of worker-scoped fixtures for expensive setup, projects split along micro-frontend (MFE) boundaries for parallelism, and a strict 7-step AI validation skill.

| Approach | API calls per file | Setup lines (UI) | Setup/cleanup lines (API) |
|---|---|---|---|
| Traditional boilerplate | 15 | 8 | 15 |
| AI-assisted fixtures | 7 | 3 | 3 |

Key Findings:

53% API Call Reduction (15 → 7 per file): Worker-scoped fixtures shared across files eliminated redundant getUser(), createOrg(), and createProject() calls.
80% Cleanup Line Reduction (15 → 3 in API specs): Native Playwright fixture teardown replaced manual try/finally blocks across 15 files.
~1,000 Lines of Boilerplate Removed: Direct Playwright locator calls replaced the legacy Actions wrapper.
Serial Mode Stabilization: Dedicated projects, increased timeouts (30s → 60s for beforeAll), and worker capping eliminated cascade failures.
AI Skill Consistency: The pw-test-improvement skill enforced identical baseline testing, benchmarking, and documentation standards across all 33 migration tasks.
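The two headline percentages follow directly from the comparison table. Here is a small worked check (the `reductionPct` helper is purely illustrative):

```typescript
// Percentage reduction from a "before" value to an "after" value.
function reductionPct(before: number, after: number): number {
  if (before === 0) throw new Error('before must be non-zero');
  return ((before - after) / before) * 100;
}

// Figures from the comparison table.
const apiCallCut = reductionPct(15, 7);    // 8 / 15 ≈ 53.3%
const apiCleanupCut = reductionPct(15, 3); // 12 / 15 = 80%

console.log(`API calls per file: -${apiCallCut.toFixed(0)}%`);         // API calls per file: -53%
console.log(`API setup/cleanup lines: -${apiCleanupCut.toFixed(0)}%`); // API setup/cleanup lines: -80%
```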

Core Solution

The refactoring followed a tracer bullet methodology (8 phased slices) validated by a dependency graph, enabling safe parallel execution of up to 4 AI sessions. Each bullet proved a thin end-to-end slice before expanding.
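The article does not explain how its dependency graph was turned into a schedule. One plausible approach is Kahn-style layering with a concurrency cap. In the sketch below, each wave contains only slices whose dependencies finished in earlier waves, and no wave exceeds 4 parallel sessions. The slice names and the `planWaves` helper are hypothetical:

```typescript
// A hypothetical migration slice and the slices it depends on.
interface Slice {
  id: string;
  dependsOn: string[];
}

// Groups slices into waves: every slice in a wave has all of its dependencies
// finished in earlier waves, and no wave exceeds `maxParallel` sessions.
function planWaves(slices: Slice[], maxParallel: number): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  while (done.size < slices.length) {
    const ready = slices
      .filter((s) => !done.has(s.id) && s.dependsOn.every((d) => done.has(d)))
      .slice(0, maxParallel)
      .map((s) => s.id);
    if (ready.length === 0) {
      throw new Error('Dependency cycle or unknown dependency');
    }
    ready.forEach((id) => done.add(id));
    waves.push(ready);
  }
  return waves;
}

// Eight illustrative slices; the real project's slices are not listed.
const slices: Slice[] = [
  { id: 'fixtures-core', dependsOn: [] },
  { id: 'mfe-projects', dependsOn: ['fixtures-core'] },
  { id: 'org-specs', dependsOn: ['fixtures-core'] },
  { id: 'project-specs', dependsOn: ['fixtures-core'] },
  { id: 'app-specs', dependsOn: ['fixtures-core'] },
  { id: 'subscription-specs', dependsOn: ['fixtures-core'] },
  { id: 'serial-specs', dependsOn: ['mfe-projects'] },
  { id: 'ci-workflow', dependsOn: ['mfe-projects', 'serial-specs'] },
];

const waves = planWaves(slices, 4);
console.log(waves);
```

Capping each wave at 4 keeps the number of concurrent AI sessions bounded. The graph still guarantees that no session starts on a slice before the slices it builds on have landed.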


1. Fixture Architecture Overhaul

Replaced imperative beforeAll/afterAll blocks with Playwright fixtures, strictly differentiating between two scopes:

Worker-scoped ({ scope: 'worker' }): Initialized once per worker, shared across tests. Used for expensive, immutable setup (orgs, projects).
Test-scoped (default): Fresh instance per test. Used for mutable data to prevent cross-test pollution.
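Below is a minimal sketch of the two scopes using `test.extend`. The `Org`/`Project` types are hypothetical, and in-memory stand-ins replace the suite's real `createOrg()`/`createProject()` helpers, which the article does not show. The code needs the Playwright test runner, so it is written as a spec file:

```typescript
import { test as base, expect } from '@playwright/test';

type Org = { id: string };
type Project = { id: string; orgId: string };

// Hypothetical stand-ins for the suite's real API helpers; replace them with
// calls to the backend under test.
let nextId = 0;
async function createOrg(): Promise<Org> {
  return { id: `org-${++nextId}` };
}
async function createProject(orgId: string): Promise<Project> {
  return { id: `project-${++nextId}`, orgId };
}
async function deleteResource(_id: string): Promise<void> {}

type TestFixtures = { project: Project };
type WorkerFixtures = { org: Org };

export const test = base.extend<TestFixtures, WorkerFixtures>({
  // Worker scope: created once per worker process and reused by every test
  // (in any file) that this worker runs; torn down when the worker exits.
  org: [
    async ({}, use) => {
      const org = await createOrg();
      await use(org);
      await deleteResource(org.id);
    },
    { scope: 'worker' },
  ],
  // Test scope (default): a fresh project per test, so mutations never leak.
  // Teardown after `use` runs even if the test fails, replacing try/finally.
  project: async ({ org }, use) => {
    const project = await createProject(org.id);
    await use(project);
    await deleteResource(project.id);
  },
});

test('project belongs to the shared org', async ({ org, project }) => {
  expect(project.orgId).toBe(org.id);
});
```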

2. MFE-Scoped Project Structure

Split the monolithic Playwright project into 7 isolated projects aligned with Micro-Frontend (MFE) boundaries. Each project declares a Setup dependency, enabling targeted execution, grouped HTML reports, and independent parallelism tuning.

```typescript
// playwright.config.ts (excerpt): one project per MFE, each gated on the Setup project
projects: [
  { name: 'Applications',  testDir: 'apps/ui/applications/e2e',  dependencies: ['Setup'] },
  { name: 'Organizations', testDir: 'apps/ui/organizations/e2e', dependencies: ['Setup'] },
  { name: 'Projects',      testDir: 'apps/ui/projects/e2e',      dependencies: ['Setup'] },
  // ... Subscriptions, Host, User Profile
],
```

3. Serial Cascade Mitigation

Heavy specs using serial mode were extracted into dedicated projects. Timeouts were increased to 60s for beforeAll, workers were capped to prevent API rate-limiting, and worker-scoped fixtures ensured shared setup without state leakage.
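Here is a sketch of one such heavy spec. It assumes the suite's real setup would replace the `Promise.resolve` stand-in. Serial mode is declared per file, and calling `test.setTimeout()` inside `beforeAll` raises that hook's timeout from the 30s default to 60s. The worker cap itself belongs in the config, either as the top-level `workers` option or as the `--workers` CLI flag:

```typescript
import { test, expect } from '@playwright/test';

// Serial mode: tests run in order in one worker; a failure skips the rest.
test.describe.configure({ mode: 'serial' });

// Hypothetical shared state, kept file-local (beforeAll) so tests in other
// files cannot observe or mutate it.
let orgId: string;

test.beforeAll(async () => {
  // Raise this hook's timeout from the 30s default to 60s.
  test.setTimeout(60_000);
  orgId = await Promise.resolve('org-under-test'); // stand-in for slow API setup
});

test('creates a subscription', async () => {
  expect(orgId).toBeTruthy();
});

test('upgrades the subscription', async () => {
  expect(orgId).toBe('org-under-test');
});
```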

4. AI Skill Implementation (`pw-test-improvement`)

A strict 7-step validation pipeline was embedded into the AI workflow:

Identify: Select one item from the implementation tracker.
Baseline: Run affected tests 3× pre-change; record pass rate & timing.
Fix: Apply changes using embedded Playwright best practices (locator priority, anti-pattern avoidance).
Test: Run 3× post-change; all must pass.
Compare: Document before/after benchmarks.
Update: Mark tracker item complete.
Commit: Generate structured PR description only upon explicit approval.
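The gates in the pipeline above can be expressed as plain checks. The following sketch is a hypothetical encoding, not the skill itself (which is an AI workflow definition the article does not reproduce). It summarizes repeated runs, enforces the "3 post-change runs, all green" rule, and only allows a commit with explicit approval:

```typescript
// One recorded run of the affected tests (hypothetical shape).
interface RunResult {
  passed: number;
  total: number;
  durationMs: number;
}

// Aggregated benchmark over repeated runs.
interface Benchmark {
  passRate: number;      // mean fraction of passing tests per run
  avgDurationMs: number; // mean wall-clock time per run
}

// Steps 2 and 4: summarize repeated runs (3 before, 3 after the change).
function summarize(runs: RunResult[]): Benchmark {
  if (runs.length === 0) throw new Error('no runs recorded');
  const passRate = runs.reduce((sum, r) => sum + r.passed / r.total, 0) / runs.length;
  const avgDurationMs = runs.reduce((sum, r) => sum + r.durationMs, 0) / runs.length;
  return { passRate, avgDurationMs };
}

// Step 4 gate: at least `requiredRuns` post-change runs, every one fully green.
function postChangeGatePasses(runs: RunResult[], requiredRuns = 3): boolean {
  return runs.length >= requiredRuns && runs.every((r) => r.passed === r.total);
}

// Step 5: a one-line before/after comparison for the tracker or PR description.
function compare(before: Benchmark, after: Benchmark): string {
  const pct = (x: number) => `${(x * 100).toFixed(0)}%`;
  const sec = (ms: number) => `${(ms / 1000).toFixed(1)}s`;
  return `pass rate ${pct(before.passRate)} -> ${pct(after.passRate)}, ` +
    `avg ${sec(before.avgDurationMs)} -> ${sec(after.avgDurationMs)}`;
}

// Step 7 gate: nothing is committed without explicit human approval.
function mayCommit(gatePassed: boolean, humanApproved: boolean): boolean {
  return gatePassed && humanApproved;
}

// Illustrative numbers only.
const baselineRuns: RunResult[] = [
  { passed: 7, total: 10, durationMs: 42_000 },
  { passed: 8, total: 10, durationMs: 40_000 },
  { passed: 6, total: 10, durationMs: 44_000 },
];
const afterRuns: RunResult[] = [
  { passed: 10, total: 10, durationMs: 30_000 },
  { passed: 10, total: 10, durationMs: 30_000 },
  { passed: 10, total: 10, durationMs: 30_000 },
];

console.log(compare(summarize(baselineRuns), summarize(afterRuns)));
// pass rate 70% -> 100%, avg 42.0s -> 30.0s
console.log(mayCommit(postChangeGatePasses(afterRuns), false)); // false
```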

The skill enforced Playwright CLI execution, captured raw results, and cross-referenced official documentation to prevent outdated pattern adoption.
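One way to capture raw CLI results is Playwright's built-in JSON reporter, which prints to stdout. For example, run `npx playwright test --reporter=json > results.json` once for each of the three runs. The sketch below reduces that output to benchmark numbers. The `summarizeReport` helper is hypothetical, and the `stats` field names are assumed from recent Playwright versions:

```typescript
// Subset of the top-level `stats` block in Playwright's JSON reporter output.
// Field names follow recent Playwright versions; check them against yours.
interface JsonReportStats {
  expected: number;   // passed as expected
  unexpected: number; // failed
  flaky: number;      // failed first, passed on retry
  skipped: number;
  duration: number;   // milliseconds
}

interface RunSummary {
  passRate: number;   // expected / (expected + unexpected + flaky)
  durationMs: number;
  clean: boolean;     // no failures and no flaky results
}

// Turns raw reporter output into the numbers a benchmark table needs.
function summarizeReport(rawJson: string): RunSummary {
  const { stats } = JSON.parse(rawJson) as { stats: JsonReportStats };
  const executed = stats.expected + stats.unexpected + stats.flaky;
  return {
    passRate: executed === 0 ? 0 : stats.expected / executed,
    durationMs: stats.duration,
    clean: stats.unexpected === 0 && stats.flaky === 0,
  };
}

// Illustrative report fragment, not real output from the project.
const raw = JSON.stringify({
  stats: { expected: 9, unexpected: 0, flaky: 1, skipped: 2, duration: 45_000 },
});
console.log(summarizeReport(raw));
```

This sketch treats flaky results as not clean, which matches the step 4 rule that every post-change run must pass outright.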

Pitfall Guide

CI Green ≠ Local Stability: Single-worker CI masks concurrency bugs, race conditions, and environment pollution. Always validate locally with more than one worker before trusting pipeline results.
Serial Mode Cascade Failures: One failure in serial blocks all subsequent tests, creating phantom failures. Mitigate by splitting heavy specs, increasing beforeAll timeouts, capping workers, and avoiding shared mutable state.
Over-Fixturing: Not everything belongs in a fixture. Worker-scoped fixtures share state across files, which pollutes serial tests that require per-file isolation. Use beforeAll when setup must be specific to a single file.
Teardown Projects in Shared CI Environments: Cleanup projects that run against shared environments interfere with parallel pipelines. Restrict teardown projects to dedicated ephemeral environments, or move cleanup into post-run scripts.
Unstructured AI Prompts: AI excels at applying known patterns, not inventing them. Without a strict process (like the 7-step skill), AI produces inconsistent quality, hallucinations, and architectural drift.
Ignoring Documentation as Ground Truth: AI training data lags behind framework updates. Always cross-reference suggestions with official Playwright docs; treat AI output as a draft, not a specification.
Unattended AI Sessions: Running AI without oversight leads to silent regressions. Cap parallel sessions (max 4), maintain active human review, and verify every benchmark against actual CLI output.
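For the teardown pitfall, one option is to register the cleanup project only where the environment is disposable. The sketch below uses Playwright's project-level `teardown` option. The `E2E_EPHEMERAL_ENV` variable, the `Cleanup` project, and the file-name patterns are hypothetical:

```typescript
import { defineConfig } from '@playwright/test';

// Hypothetical flag: set only by pipelines that provision a throwaway
// environment that no other pipeline reads from or writes to.
const ephemeral = process.env.E2E_EPHEMERAL_ENV === 'true';

export default defineConfig({
  projects: [
    {
      name: 'Setup',
      testMatch: /.*\.setup\.ts/,
      // Link the cleanup project only in ephemeral environments.
      ...(ephemeral ? { teardown: 'Cleanup' } : {}),
    },
    // Omit the cleanup project entirely elsewhere, so a plain
    // `npx playwright test` against a shared environment never runs it.
    ...(ephemeral ? [{ name: 'Cleanup', testMatch: /.*\.teardown\.ts/ }] : []),
    { name: 'Organizations', testDir: 'apps/ui/organizations/e2e', dependencies: ['Setup'] },
  ],
});
```

The `Cleanup` project is dropped entirely outside ephemeral environments, not just unlinked. Any project listed in `projects` runs on a plain invocation, so an unlinked project would still run.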

Deliverables

📘 Playwright E2E Refactoring Blueprint: A phased tracer-bullet architecture map detailing dependency graphs, fixture scope guidelines, MFE project splitting strategy, and serial mode mitigation patterns. Includes decision matrices for worker vs test-scoped fixtures and API vs UI migration paths.
✅ AI-Assisted Test Migration Checklist: A 7-step validation workflow covering baseline execution, anti-pattern verification, documentation cross-checking, benchmark comparison, and structured commit generation. Designed for repeatable, auditable AI-human collaboration.
⚙️ Configuration Templates: Production-ready Playwright config snippets for MFE-scoped projects, worker-scoped fixture definitions, CI workflow adjustments for parallel execution, and teardown isolation patterns. Ready to drop into existing playwright.config.ts implementations.
