I built a Vite plugin that uses AI to author Playwright tests, then gets out of the way

Current Situation Analysis

The modern E2E testing landscape faces a structural contradiction: teams want AI to accelerate test creation, but most AI testing tools embed the model directly into the execution pipeline. This creates a runtime dependency where tests are essentially dynamic prompts re-evaluated on every CI run. The industry treats this as a feature, but in practice, it introduces non-determinism into a domain that requires absolute predictability.

The core issue isn't that AI lacks browser automation capabilities. It's that model drift, temperature variations, and context window shifts cause identical prompts to produce divergent outputs over time. A test that passes on Tuesday might fail on Friday because the underlying model updated, the response formatting shifted, or the selector extraction logic changed. Teams lose the ability to diff test changes, debugging failures becomes an exercise in prompt archaeology, and CI costs scale linearly with test volume.

Data from CI/CD telemetry platforms consistently shows that AI-in-the-loop E2E suites experience 15–30% flakiness rates, compared to 2–5% for deterministic script-based suites. The friction isn't technical capability; it's architectural. When AI remains in the execution path, you sacrifice version control, auditability, and cost predictability. The industry has conflated test generation with test execution, treating them as a single continuous process rather than two distinct phases with different requirements.

WOW Moment: Key Findings

Shifting AI from a runtime executor to a design-time author fundamentally changes the test lifecycle. By generating static artifacts instead of dynamic prompts, teams regain deterministic execution, Git-native diffing, and near-zero CI overhead.

| Approach | Flakiness Rate | CI Execution Cost | Debuggability | Version Control Compatibility |
| --- | --- | --- | --- | --- |
| AI-in-the-Loop Runtime | 18–35% | High (per-run API calls) | Low (prompt logs only) | Poor (no code diff) |
| AI-as-Author Static Artifact | <2% | Near-zero (one-time generation) | High (standard `.spec.ts` diffs) | Excellent (Git-native) |

This finding matters because it decouples test authoring from test execution. The AI acts as a senior QA engineer who writes the test, reviews it, and hands off the artifact. Once the file is saved, the model is completely out of the loop. The test runs on standard Playwright infrastructure, benefits from existing CI caching, and can be reviewed, modified, or versioned like any other source file. The architectural shift transforms AI from a recurring cost center into a productivity multiplier.

Core Solution

The architecture rests on three pillars: browser state preservation, strict AI sandboxing, and deterministic artifact generation. Each component addresses a specific failure mode in traditional AI testing workflows.

Step 1: Bridge to the Active Development Browser

Instead of launching isolated headless instances, the plugin connects to an already-running Chrome session via the Chrome DevTools Protocol (CDP). This preserves authentication tokens, session cookies, and UI state that developers have already configured. The connection targets the dev server URL, filters active tabs, and attaches to the matching context.

Why this choice: Headless browser initialization adds 2–4 seconds of overhead per test and strips away real-world session state. CDP eliminates redundant login flows, reduces setup complexity, and ensures the AI operates in the exact environment developers use for debugging.
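The tab-matching step can be sketched as a pure function over the target list that Chrome serves at http://localhost:9222/json/list when remote debugging is enabled. This is a minimal sketch, not the plugin's actual code: the `CdpTarget` interface covers only the fields used here, and `findDevServerTab` is a hypothetical helper name.

```typescript
// Subset of the fields Chrome returns for each entry in /json/list.
interface CdpTarget {
  id: string;
  type: string; // "page", "service_worker", "iframe", ...
  url: string;
  webSocketDebuggerUrl?: string;
}

// Hypothetical helper: pick the first attachable page tab whose origin
// matches the dev server. Returns undefined when no tab qualifies.
function findDevServerTab(
  targets: CdpTarget[],
  devServerUrl: string
): CdpTarget | undefined {
  const devOrigin = new URL(devServerUrl).origin;
  return targets.find((target) => {
    if (target.type !== "page" || !target.webSocketDebuggerUrl) return false;
    try {
      return new URL(target.url).origin === devOrigin;
    } catch {
      return false; // unparsable URL, skip this target
    }
  });
}
```

Comparing origins rather than string prefixes avoids matching a neighbouring port such as 51730 when the dev server runs on 5173. Once a tab is chosen, Playwright's `chromium.connectOverCDP('http://localhost:9222')` is one way to attach to the running browser.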

Step 2: Enforce a Strict AI Sandbox

The AI agent is granted access to a single tool namespace: the Playwright MCP server. Filesystem read/write, shell execution, and network fetch capabilities are explicitly blocked. A hard budget ceiling (typically ~$0.10 per session) prevents runaway token consumption. The mental model is a highly capable intern who can interact with the UI but cannot modify source code, access environment variables, or make external requests.

Why this choice: Unrestricted AI access introduces security and stability risks. Constraining the agent to Playwright-specific tools ensures it can only drive the browser and generate test code. Budget caps and step limits prevent infinite retry loops on broken selectors or conditional UI states.
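One way to picture the sandbox is a gate that every requested tool call must pass before it is forwarded to the MCP server. The sketch below is an assumption about how such a gate could look, not the plugin's implementation: `SandboxPolicy`, `ToolCall`, and `createSandboxGate` are hypothetical names, and the per-call cost estimate is assumed to be supplied by the caller.

```typescript
interface SandboxPolicy {
  allowedToolPrefix: string; // e.g. "playwright."
  budgetUsd: number; // hard spend ceiling for the session
  maxSteps: number; // hard cap on tool calls
}

interface ToolCall {
  tool: string; // fully qualified tool name requested by the agent
  costUsd: number; // estimated cost of the model turn that produced it
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// Hypothetical gate: returns an authorize() function that tracks spend and
// step count for one session and rejects anything outside the policy.
function createSandboxGate(policy: SandboxPolicy) {
  let spentUsd = 0;
  let steps = 0;

  return function authorize(call: ToolCall): Verdict {
    if (!call.tool.startsWith(policy.allowedToolPrefix)) {
      return { allowed: false, reason: `tool not allowlisted: ${call.tool}` };
    }
    if (steps + 1 > policy.maxSteps) {
      return { allowed: false, reason: "step limit reached" };
    }
    if (spentUsd + call.costUsd > policy.budgetUsd) {
      return { allowed: false, reason: "session budget exceeded" };
    }
    steps += 1;
    spentUsd += call.costUsd;
    return { allowed: true };
  };
}
```

Rejected calls do not count against the budget or the step limit, so a single blocked `Bash` request does not shorten an otherwise valid session.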

Step 3: Generate and Export Deterministic Artifacts

When the AI completes the interaction flow, it outputs a standard Playwright test file. The artifact contains no runtime dependencies on the authoring tool. It uses native Playwright APIs, standard assertion patterns, and conventional file structure. The generation process can produce three distinct output formats depending on the workflow requirement:

Executable Spec (*.spec.ts): Standard Playwright test for CI pipelines.
Replay Skill (SKILL.md): Structured step definition for future automated replays during development.
Compliance Export (*.csv): Xray-compatible test case format for Jira integration.

Why this choice: Static artifacts ensure long-term maintainability. The generated test runs identically on any machine with Playwright installed, regardless of whether the AI tool is present. This eliminates vendor lock-in and preserves the ability to manually refine selectors or assertions post-generation.
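To make the "static artifact" idea concrete, here is a rough sketch of turning a recorded interaction into spec source text. The `RecordedStep` format and the `renderSpec` function are assumptions made for illustration; the plugin's real recording format is not described in this article. The emitted calls (`getByRole`, `getByLabel`, `getByText`, `toBeVisible`) are standard Playwright APIs.

```typescript
// Hypothetical recording format: one entry per action the agent performed.
type RecordedStep =
  | { kind: "goto"; url: string }
  | { kind: "click"; role: string; name: string }
  | { kind: "fill"; label: string; value: string }
  | { kind: "expectVisible"; text: string };

// JSON.stringify yields a valid, escaped string literal for the output file.
const q = (s: string): string => JSON.stringify(s);

function renderStep(step: RecordedStep): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${q(step.url)});`;
    case "click":
      return `await page.getByRole(${q(step.role)}, { name: ${q(step.name)} }).click();`;
    case "fill":
      return `await page.getByLabel(${q(step.label)}).fill(${q(step.value)});`;
    case "expectVisible":
      return `await expect(page.getByText(${q(step.text)})).toBeVisible();`;
  }
}

// Produces the full text of a .spec.ts file. The only import in the output
// is @playwright/test, so the artifact has no link back to the authoring tool.
function renderSpec(title: string, steps: RecordedStep[]): string {
  const body = steps.map((s) => `  ${renderStep(s)}`).join("\n");
  return [
    `import { test, expect } from '@playwright/test';`,
    ``,
    `test(${q(title)}, async ({ page }) => {`,
    body,
    `});`,
    ``,
  ].join("\n");
}
```

Because the output is plain source text, it can be written to the output directory, reviewed in a pull request, and edited by hand like any other test.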

Implementation Example

Vite Plugin Configuration

import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import { e2eAuthorBridge } from 'vite-e2e-author';

export default defineConfig({
  plugins: [
    react(),
    e2eAuthorBridge({
      outputDirectory: 'tests/e2e/generated',
      agentExecutable: 'claude',
      sessionBudgetLimit: 0.15,
      allowedMcpTools: ['playwright.click', 'playwright.fill', 'playwright.expect'],
      cdpPort: 9222
    })
  ]
});

Generated Playwright Test Artifact

import { test, expect } from '@playwright/test';

test('verify subscription upgrade flow', async ({ page }) => {
  await page.goto('http://localhost:5173/account/billing');
  
  await page.getByRole('combobox', { name: 'Plan Tier' }).selectOption('pro');
  await page.getByRole('checkbox', { name: 'Accept terms' }).check();
  await page.getByRole('button', { name: 'Confirm Upgrade' }).click();
  
  await expect(page.getByText('Upgrade Successful')).toBeVisible({ timeout: 4000 });
  await expect(page.getByRole('heading', { name: 'Pro Plan' })).toHaveText('Pro Plan');
});

The generated code follows Playwright best practices: role-based locators, explicit timeouts, and clear assertion chains. No imports reference the authoring plugin. The test is fully self-contained and CI-ready.

Pitfall Guide

1. CDP Port Collision

Explanation: Multiple development tools, browser extensions, or parallel Vite instances compete for the same debugging port. The AI agent fails to attach, resulting in silent connection drops.

Fix: Explicitly configure --remote-debugging-port=9222 in your launch script. Verify availability via chrome://inspect before starting the dev server. Implement a port fallback mechanism in the plugin configuration.

2. Over-Permissive Tool Allowlisting

Explanation: Granting the AI access to Read, Write, or Bash tools allows it to modify source files, leak environment variables, or execute arbitrary commands.

Fix: Allowlist only playwright.* namespace tools. Run the agent in a restricted execution context with no filesystem or network permissions. Validate each tool call the model requests through a middleware layer before forwarding it to the MCP server.

3. Prompt Ambiguity in Conditional Flows

Explanation: Vague instructions cause the AI to guess navigation paths or skip validation steps. Complex forms with conditional reveals often trigger retry loops.

Fix: Structure prompts with explicit step boundaries and expected outcomes. Example: "Navigate to /checkout, fill shipping details, select express delivery, and verify the order summary displays $12.50." Provide context hints for dynamic UI elements.

4. Session State Loss on SPA Navigation

Explanation: Single-page application routing or hard page reloads can sever the CDP connection. The AI loses track of the current context and fails to resume.

Fix: Implement automatic CDP reconnection logic with session state serialization. Use Playwright's storageState to persist cookies and local storage across navigation events. Add a heartbeat check to detect and recover from dropped connections.
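A heartbeat-plus-reattach loop might look like the following. This is a sketch under stated assumptions: the `Connection` interface, `ensureConnected`, and `backoffDelayMs` are hypothetical names, and the `connect` callback stands in for whatever actually reattaches over CDP.

```typescript
// Hypothetical connection handle; a real one would wrap the CDP session.
interface Connection {
  isAlive(): Promise<boolean>;
}

interface ReconnectOptions {
  connect: () => Promise<Connection>; // how to (re)attach
  maxAttempts: number;
  baseDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real waiting
}

// Delay before the next attempt: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, capMs = 10000): number {
  return Math.min(capMs, baseDelayMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Heartbeat check: return the current connection if it still responds,
// otherwise try to reattach, waiting longer between successive attempts.
async function ensureConnected(
  current: Connection | undefined,
  opts: ReconnectOptions
): Promise<Connection> {
  if (current && (await current.isAlive().catch(() => false))) {
    return current;
  }
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await opts.connect();
    } catch (error) {
      lastError = error;
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, opts.baseDelayMs));
      }
    }
  }
  throw new Error(
    `reconnect failed after ${opts.maxAttempts} attempts: ${String(lastError)}`
  );
}
```

Calling `ensureConnected` before each agent action keeps the check cheap when the connection is healthy and bounds the recovery time when it is not.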

5. Ignoring Human Judgment Triggers

Explanation: AI struggles with compliance checkboxes, CAPTCHA alternatives, or business-rule validations that require contextual understanding. The agent stalls or generates incorrect assertions.

Fix: Build a pause-and-resume interface that surfaces validation errors to the developer. Allow manual intervention for judgment calls while preserving the AI's progress. Document known edge cases where human review is mandatory.

6. Budget Bleed on Infinite Retries

Explanation: When a selector fails or a network request hangs, the AI may retry indefinitely, consuming tokens and exceeding cost limits.

Fix: Enforce strict step limits (e.g., max 15 actions per session). Implement exponential backoff for failed interactions. Add a circuit breaker that halts execution after 3 consecutive failures and prompts for manual review.
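The step limit and circuit breaker can be combined in one small state tracker. The sketch below assumes the limits quoted in this section (15 actions, 3 consecutive failures) as defaults; `createSessionBreaker` and `BreakerState` are hypothetical names, not part of the plugin.

```typescript
type BreakerState =
  | { status: "running" }
  | { status: "halted"; reason: string };

// Hypothetical session guard: counts actions and consecutive failures and
// halts once either limit is reached. A success resets the failure streak.
function createSessionBreaker(maxSteps = 15, maxConsecutiveFailures = 3) {
  let steps = 0;
  let consecutiveFailures = 0;
  let state: BreakerState = { status: "running" };

  return {
    // Record the outcome of one agent action and return the updated state.
    record(succeeded: boolean): BreakerState {
      if (state.status === "halted") return state;
      steps += 1;
      consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1;
      if (consecutiveFailures >= maxConsecutiveFailures) {
        state = {
          status: "halted",
          reason: "too many consecutive failures; manual review needed",
        };
      } else if (steps >= maxSteps) {
        state = { status: "halted", reason: "step limit reached" };
      }
      return state;
    },
  };
}
```

Once halted, the breaker stays halted for the rest of the session, which is the point at which the tool would hand control back to the developer.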

7. False Confidence in Generated Selectors

Explanation: The AI may extract brittle locators like nth-child(2) or dynamically generated IDs that break on UI updates.

Fix: Review every generated test with a focus on selector stability. Prioritize getByRole, getByLabel, and data-testid patterns. Add a linting step that flags non-semantic locators before committing to the repository.
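A linting step of this kind can start as a handful of regular expressions run over the generated spec source. The rule set below is illustrative rather than exhaustive, and `lintLocators`, `LocatorFinding`, and the rule names are hypothetical.

```typescript
interface LocatorFinding {
  line: number; // 1-based line number in the spec source
  rule: string;
  text: string;
}

// Hypothetical rule set covering the brittle patterns named in this pitfall.
const BRITTLE_PATTERNS: { rule: string; pattern: RegExp }[] = [
  // CSS positional selectors such as li:nth-child(2)
  { rule: "positional-css", pattern: /:nth-(child|of-type)\(/ },
  // Index-based Playwright chaining such as .nth(3)
  { rule: "positional-api", pattern: /\.nth\(\d+\)/ },
  // IDs ending in a long digit run, which often indicates generated values
  { rule: "generated-id", pattern: /#[\w-]*\d{4,}/ },
];

// Scan spec source line by line and report every rule that matches.
function lintLocators(specSource: string): LocatorFinding[] {
  const findings: LocatorFinding[] = [];
  specSource.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of BRITTLE_PATTERNS) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return findings;
}
```

A pre-commit hook or CI job could fail when `lintLocators` returns a non-empty list, leaving the final call on each finding to a human reviewer.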

Production Bundle

Action Checklist

Configure Chrome launch flags: Ensure --remote-debugging-port=9222 is set and verify port availability before starting the dev server.
Restrict AI tool access: Whitelist only playwright.* MCP tools and block filesystem, shell, and network capabilities.
Set session budget caps: Enforce a hard limit (e.g., $0.15/session) and implement step limits to prevent runaway token consumption.
Implement CDP reconnection logic: Add heartbeat checks and automatic reattachment to handle SPA navigation or temporary connection drops.
Review generated selectors: Validate that output uses role-based or data-testid locators instead of brittle index-based or dynamic selectors.
Integrate with CI pipeline: Commit generated .spec.ts files to version control and run them via standard npx playwright test without AI dependencies.
Establish human review gates: Create a workflow for manual validation of compliance flows, conditional UI, and business-rule assertions.

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Rapid prototyping / new feature validation | AI-as-Author Static Artifact | Fast generation, preserves dev session state, zero CI overhead | Low (one-time generation cost) |
| Regression suite maintenance | Traditional Playwright Codegen + Manual Refinement | Deterministic, version-controlled, no AI dependency | Near-zero (developer time only) |
| Compliance-heavy / regulated workflows | AI Authoring + Mandatory Human Review | AI handles repetitive steps, humans validate business rules and legal requirements | Medium (review overhead + generation cost) |
| High-security / production CI | Static Playwright Tests Only | Eliminates AI runtime risk, ensures auditability, meets compliance standards | Zero (infrastructure only) |

Configuration Template

// vite.config.ts
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import { e2eAuthorBridge } from 'vite-e2e-author';

export default defineConfig({
  plugins: [
    react(),
    e2eAuthorBridge({
      outputDirectory: 'tests/e2e/generated',
      agentExecutable: process.env.AI_AGENT_CLI || 'claude',
      sessionBudgetLimit: 0.15,
      maxStepsPerSession: 12,
      allowedMcpTools: [
        'playwright.click',
        'playwright.fill',
        'playwright.selectOption',
        'playwright.check',
        'playwright.expect',
        'playwright.goto',
        'playwright.waitForSelector'
      ],
      cdpPort: 9222,
      autoReconnect: true,
      reconnectionAttempts: 3,
      exportFormats: ['spec', 'skill', 'csv']
    })
  ]
});

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: 'html',
  use: {
    baseURL: 'http://localhost:5173',
    trace: 'on-first-retry',
    storageState: 'tests/e2e/.auth/user.json'
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] }
    }
  ]
});

Quick Start Guide

Launch Chrome in debug mode: Run google-chrome --remote-debugging-port=9222 (or equivalent for your OS) and navigate to your dev server URL.
Install and configure the plugin: Add the Vite plugin to your configuration file, set the output directory, and define your AI agent executable and budget limits.
Start the development server: Run npm run dev. The plugin will detect the active CDP connection and inject the authoring interface into your application.
Generate your first test: Use the floating widget to describe the user flow. Review the generated Playwright code, verify selector stability, and save the artifact to your test directory.
Run in CI: Commit the generated .spec.ts file and execute it using standard Playwright commands. No AI runtime or API keys are required for execution.
