I built a Vite plugin that uses AI to author Playwright tests, then gets out of the way
Current Situation Analysis
The modern E2E testing landscape faces a structural contradiction: teams want AI to accelerate test creation, but most AI testing tools embed the model directly into the execution pipeline. This creates a runtime dependency where tests are essentially dynamic prompts re-evaluated on every CI run. The industry treats this as a feature, but in practice, it introduces non-determinism into a domain that requires absolute predictability.
The core issue isn't that AI lacks browser automation capabilities. It's that model drift, temperature variations, and context window shifts cause identical prompts to produce divergent outputs over time. A test that passes on Tuesday might fail on Friday because the underlying model updated, the response formatting shifted, or the selector extraction logic changed. Teams lose the ability to diff test changes, debugging failures becomes an exercise in prompt archaeology, and CI costs scale linearly with test volume.
Data from CI/CD telemetry platforms consistently shows that AI-in-the-loop E2E suites experience 15–30% flakiness rates, compared to 2–5% for deterministic script-based suites. The friction isn't technical capability; it's architectural. When AI remains in the execution path, you sacrifice version control, auditability, and cost predictability. The industry has conflated test generation with test execution, treating them as a single continuous process rather than two distinct phases with different requirements.
WOW Moment: Key Findings
Shifting AI from a runtime executor to a design-time author fundamentally changes the test lifecycle. By generating static artifacts instead of dynamic prompts, teams regain deterministic execution, Git-native diffing, and near-zero CI overhead.
| Approach | Flakiness Rate | CI Execution Cost | Debuggability | Version Control Compatibility |
|---|---|---|---|---|
| AI-in-the-Loop Runtime | 18–35% | High (per-run API calls) | Low (prompt logs only) | Poor (no code diff) |
| AI-as-Author Static Artifact | <2% | Near-zero (one-time generation) | High (standard .spec.ts diffs) | Excellent (Git-native) |
This finding matters because it decouples test authoring from test execution. The AI acts as a senior QA engineer who writes the test, reviews it, and hands off the artifact. Once the file is saved, the model is completely out of the loop. The test runs on standard Playwright infrastructure, benefits from existing CI caching, and can be reviewed, modified, or versioned like any other source file. The architectural shift transforms AI from a recurring cost center into a productivity multiplier.
Core Solution
The architecture rests on three pillars: browser state preservation, strict AI sandboxing, and deterministic artifact generation. Each component addresses a specific failure mode in traditional AI testing workflows.
Step 1: Bridge to the Active Development Browser
Instead of launching isolated headless instances, the plugin connects to an already-running Chrome session via the Chrome DevTools Protocol (CDP). This preserves authentication tokens, session cookies, and UI state that developers have already configured. The connection targets the dev server URL, filters active tabs, and attaches to the matching context.
Why this choice: Headless browser initialization adds 2–4 seconds of overhead per test and strips away real-world session state. CDP eliminates redundant login flows, reduces setup complexity, and ensures the AI operates in the exact environment developers use for debugging.
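The tab-filtering part of this step can be sketched in a few lines. Chrome started with a remote debugging port serves a JSON list of open targets at `http://localhost:<port>/json/list`; the helper below picks the page target that belongs to the dev server. The names `CdpTarget`, `belongsToOrigin`, and `pickDevServerTarget` are hypothetical and not taken from the plugin. Treat this as a minimal sketch of the idea, not the plugin's actual implementation.

```typescript
// Subset of the fields Chrome returns for each entry of /json/list.
interface CdpTarget {
  id: string;
  type: string; // "page", "service_worker", "iframe", ...
  url: string;
  webSocketDebuggerUrl?: string;
}

// True when `url` lives under `origin`. The boundary check keeps
// http://localhost:5173 from matching http://localhost:51730.
function belongsToOrigin(url: string, origin: string): boolean {
  if (!url.startsWith(origin)) return false;
  const next = url.charAt(origin.length);
  return next === '' || next === '/' || next === '?' || next === '#';
}

// Hypothetical helper: choose the first attachable tab served by the dev server.
function pickDevServerTarget(
  targets: CdpTarget[],
  devServerUrl: string
): CdpTarget | undefined {
  const origin = devServerUrl.replace(/\/+$/, ''); // drop trailing slashes
  return targets.find(
    (t) =>
      t.type === 'page' &&
      t.webSocketDebuggerUrl !== undefined &&
      belongsToOrigin(t.url, origin)
  );
}
```

Once a target is chosen, attaching to the running browser is typically done with Playwright's `chromium.connectOverCDP('http://localhost:9222')`, which reuses the existing contexts and their cookies instead of launching a fresh instance.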
Step 2: Enforce a Strict AI Sandbox
The AI agent is granted access to a single tool namespace: the Playwright MCP server. Filesystem read/write, shell execution, and network fetch capabilities are explicitly blocked. A hard budget ceiling (typically ~$0.10 per session) prevents runaway token consumption. The mental model is a highly capable intern who can interact with the UI but cannot modify source code, access environment variables, or make external requests.
Why this choice: Unrestricted AI access introduces security and stability risks. Constraining the agent to Playwright-specific tools ensures it can only drive the browser and generate test code. Budget caps and step limits prevent infinite retry loops on broken selectors or conditional UI states.
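One way to picture the sandbox is as a gate that every tool call must pass before it is executed. The sketch below checks a call against an allowlist and a session budget. `ToolCall`, `SandboxPolicy`, and `authorizeToolCall` are hypothetical names introduced for illustration, and the per-call cost estimate is an assumption; the tool names mirror the ones used in the configuration in this article.

```typescript
interface ToolCall {
  tool: string; // e.g. "playwright.click"
  estimatedCostUsd: number; // assumed to be supplied by the caller
}

interface SandboxPolicy {
  allowedTools: readonly string[];
  budgetLimitUsd: number;
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// Hypothetical gate: deny anything off the allowlist or over budget.
function authorizeToolCall(
  call: ToolCall,
  spentUsd: number,
  policy: SandboxPolicy
): Verdict {
  if (!policy.allowedTools.includes(call.tool)) {
    return { allowed: false, reason: `tool "${call.tool}" is not on the allowlist` };
  }
  if (spentUsd + call.estimatedCostUsd > policy.budgetLimitUsd) {
    return { allowed: false, reason: 'session budget would be exceeded' };
  }
  return { allowed: true };
}
```

Denying by default means a newly introduced tool stays blocked until someone adds it to the list on purpose.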
Step 3: Generate and Export Deterministic Artifacts
When the AI completes the interaction flow, it outputs a standard Playwright test file. The artifact contains no runtime dependencies on the authoring tool. It uses native Playwright APIs, standard assertion patterns, and conventional file structure. The generation process can produce three distinct output formats depending on the workflow requirement:
- Executable Spec (*.spec.ts): Standard Playwright test for CI pipelines.
- Replay Skill (SKILL.md): Structured step definition for future automated replays during development.
- Compliance Export (*.csv): Xray-compatible test case format for Jira integration.
Why this choice: Static artifacts ensure long-term maintainability. The generated test runs identically on any machine with Playwright installed, regardless of whether the AI tool is present. This eliminates vendor lock-in and preserves the ability to manually refine selectors or assertions post-generation.
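To make "deterministic artifact" concrete, here is a rough sketch of how a recorded interaction could be rendered into spec source. The `RecordedStep` shape and the `renderStep` and `renderSpec` functions are assumptions made for this example; the article does not describe the plugin's internal step format. The emitted code uses only standard Playwright calls (`page.goto`, `getByRole`, `getByText`, `toBeVisible`).

```typescript
// Hypothetical record of what the agent did in the browser.
type RecordedStep =
  | { kind: 'goto'; url: string }
  | { kind: 'click'; role: string; name: string }
  | { kind: 'fill'; role: string; name: string; value: string }
  | { kind: 'expectVisible'; text: string };

// JSON.stringify yields a safely escaped, double-quoted string literal.
const q = (s: string): string => JSON.stringify(s);

function renderStep(step: RecordedStep): string {
  switch (step.kind) {
    case 'goto':
      return `await page.goto(${q(step.url)});`;
    case 'click':
      return `await page.getByRole(${q(step.role)}, { name: ${q(step.name)} }).click();`;
    case 'fill':
      return `await page.getByRole(${q(step.role)}, { name: ${q(step.name)} }).fill(${q(step.value)});`;
    case 'expectVisible':
      return `await expect(page.getByText(${q(step.text)})).toBeVisible();`;
  }
}

// Produces the full text of a self-contained .spec.ts file.
function renderSpec(title: string, steps: RecordedStep[]): string {
  const body = steps.map((s) => `  ${renderStep(s)}`).join('\n');
  return [
    "import { test, expect } from '@playwright/test';",
    '',
    `test(${q(title)}, async ({ page }) => {`,
    body,
    '});',
    '',
  ].join('\n');
}
```

Because the output is plain text built from the same inputs every time, regenerating a spec from an unchanged recording gives a byte-identical file, which is what makes Git diffs meaningful.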
Implementation Example
Vite Plugin Configuration
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import { e2eAuthorBridge } from 'vite-e2e-author';

export default defineConfig({
  plugins: [
    react(),
    e2eAuthorBridge({
      outputDirectory: 'tests/e2e/generated',
      agentExecutable: 'claude',
      sessionBudgetLimit: 0.15,
      allowedMcpTools: ['playwright.click', 'playwright.fill', 'playwright.expect'],
      cdpPort: 9222
    })
  ]
});
Generated Playwright Test Artifact
import { test, expect } from '@playwright/test';

test('verify subscription upgrade flow', async ({ page }) => {
  await page.goto('http://localhost:5173/account/billing');
  await page.getByRole('combobox', { name: 'Plan Tier' }).selectOption('pro');
  await page.getByRole('checkbox', { name: 'Accept terms' }).check();
  await page.getByRole('button', { name: 'Confirm Upgrade' }).click();
  await expect(page.getByText('Upgrade Successful')).toBeVisible({ timeout: 4000 });
  await expect(page.getByRole('heading', { name: 'Pro Plan' })).toHaveText('Pro Plan');
});
The generated code follows Playwright best practices: role-based locators, explicit timeouts, and clear assertion chains. No imports reference the authoring plugin. The test is fully self-contained and CI-ready.
Pitfall Guide
1. CDP Port Collision
Explanation: Multiple development tools, browser extensions, or parallel Vite instances compete for the same debugging port. The AI agent fails to attach, resulting in silent connection drops.
Fix: Explicitly configure --remote-debugging-port=9222 in your launch script. Verify availability via chrome://inspect before starting the dev server. Implement a port fallback mechanism in the plugin configuration.
2. Over-Permissive Tool Allowlisting
Explanation: Granting the AI access to Read, Write, or Bash MCP tools allows it to modify source files, leak environment variables, or execute arbitrary commands.
Fix: Strictly whitelist only playwright.* namespace tools. Run the agent in a restricted execution context with no filesystem or network permissions. Validate tool calls through a middleware layer before forwarding to the model.
3. Prompt Ambiguity in Conditional Flows
Explanation: Vague instructions cause the AI to guess navigation paths or skip validation steps. Complex forms with conditional reveals often trigger retry loops.
Fix: Structure prompts with explicit step boundaries and expected outcomes. Example: "Navigate to /checkout, fill shipping details, select express delivery, and verify the order summary displays $12.50." Provide context hints for dynamic UI elements.
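A structured prompt of this kind can be assembled from data rather than typed freehand. The sketch below is one possible layout, not a format the plugin requires; `FlowStep` and `buildAuthoringPrompt` are hypothetical names.

```typescript
interface FlowStep {
  action: string;
  expected?: string; // the observable outcome to verify, if any
}

// Builds a numbered prompt with explicit step boundaries and expected outcomes.
function buildAuthoringPrompt(
  goal: string,
  steps: FlowStep[],
  hints: string[] = []
): string {
  const lines: string[] = [`Goal: ${goal}`, 'Steps:'];
  steps.forEach((step, i) => {
    lines.push(`${i + 1}. ${step.action}`);
    if (step.expected) lines.push(`   Expected: ${step.expected}`);
  });
  if (hints.length > 0) {
    lines.push('Hints:');
    hints.forEach((hint) => lines.push(`- ${hint}`));
  }
  return lines.join('\n');
}
```

Keeping each step on its own numbered line makes it easier to see which instruction the agent was following when it stalled.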
4. Session State Loss on SPA Navigation
Explanation: Single-page application routing or hard page reloads can sever the CDP connection. The AI loses track of the current context and fails to resume.
Fix: Implement automatic CDP reconnection logic with session state serialization. Use Playwright's storageState to persist cookies and local storage across navigation events. Add a heartbeat check to detect and recover from dropped connections.
5. Ignoring Human Judgment Triggers
Explanation: AI struggles with compliance checkboxes, CAPTCHA alternatives, or business-rule validations that require contextual understanding. The agent stalls or generates incorrect assertions.
Fix: Build a pause-and-resume interface that surfaces validation errors to the developer. Allow manual intervention for judgment calls while preserving the AI's progress. Document known edge cases where human review is mandatory.
6. Budget Bleed on Infinite Retries
Explanation: When a selector fails or a network request hangs, the AI may retry indefinitely, consuming tokens and exceeding cost limits.
Fix: Enforce strict step limits (e.g., max 15 actions per session). Implement exponential backoff for failed interactions. Add a circuit breaker that halts execution after 3 consecutive failures and prompts for manual review.
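The three safeguards in this fix (step limit, exponential backoff, circuit breaker) fit into one small state machine. The following is a minimal sketch under assumed names (`GuardLimits`, `GuardDecision`, `SessionGuard`); the limits mirror the example numbers above, and the base delay is an arbitrary illustration value.

```typescript
interface GuardLimits {
  maxSteps: number; // e.g. 15 actions per session
  maxConsecutiveFailures: number; // e.g. 3
  baseBackoffMs: number; // delay before the first retry
}

type GuardDecision =
  | { action: 'continue' }
  | { action: 'retry'; delayMs: number }
  | { action: 'halt'; reason: string };

class SessionGuard {
  private readonly limits: GuardLimits;
  private steps = 0;
  private consecutiveFailures = 0;

  constructor(limits: GuardLimits) {
    this.limits = limits;
  }

  // Call once per attempted action with its outcome.
  record(succeeded: boolean): GuardDecision {
    this.steps += 1;
    this.consecutiveFailures = succeeded ? 0 : this.consecutiveFailures + 1;

    if (this.consecutiveFailures >= this.limits.maxConsecutiveFailures) {
      return { action: 'halt', reason: 'circuit breaker: too many consecutive failures' };
    }
    if (this.steps >= this.limits.maxSteps) {
      return { action: 'halt', reason: 'step limit reached' };
    }
    if (!succeeded) {
      // Delay doubles with each consecutive failure: base, 2x base, 4x base, ...
      const delayMs = this.limits.baseBackoffMs * 2 ** (this.consecutiveFailures - 1);
      return { action: 'retry', delayMs };
    }
    return { action: 'continue' };
  }
}
```

Failed attempts count toward the step limit here, so a session that keeps failing intermittently still terminates even if it never hits three failures in a row.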
7. False Confidence in Generated Selectors
Explanation: The AI may extract brittle locators like nth-child(2) or dynamically generated IDs that break on UI updates.
Fix: Post-generation review focusing on selector stability. Prioritize getByRole, getByLabel, and data-testid patterns. Add a linting step that flags non-semantic locators before committing to the repository.
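A linting step like this does not need a full parser to be useful. The sketch below scans generated spec source line by line for a few brittle patterns. The rule names and regular expressions are illustrative assumptions, not an exhaustive or official rule set, and `lintSelectors` is a hypothetical helper.

```typescript
interface LintFinding {
  line: number; // 1-based line number in the spec source
  rule: string;
  text: string;
}

// Illustrative patterns only; tune them to your codebase.
const BRITTLE_PATTERNS: { rule: string; pattern: RegExp }[] = [
  { rule: 'positional-css', pattern: /:nth-(child|of-type)\(/ },
  { rule: 'xpath', pattern: /locator\(\s*['"`](xpath=|\/\/)/ },
  // An id selector ending in three or more digits is likely auto-generated.
  { rule: 'generated-id', pattern: /locator\(\s*['"`]#[\w-]*\d{3,}/ },
];

function lintSelectors(source: string): LintFinding[] {
  const findings: LintFinding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of BRITTLE_PATTERNS) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return findings;
}
```

Running this before commit and failing on any finding gives reviewers a short, concrete list of locators to replace with `getByRole`, `getByLabel`, or `getByTestId`.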
Production Bundle
Action Checklist
- Configure Chrome launch flags: Ensure --remote-debugging-port=9222 is set and verify port availability before starting the dev server.
- Restrict AI tool access: Whitelist only playwright.* MCP tools and block filesystem, shell, and network capabilities.
- Set session budget caps: Enforce a hard limit (e.g., $0.15/session) and implement step limits to prevent runaway token consumption.
- Implement CDP reconnection logic: Add heartbeat checks and automatic reattachment to handle SPA navigation or temporary connection drops.
- Review generated selectors: Validate that output uses role-based or data-testid locators instead of brittle index-based or dynamic selectors.
- Integrate with CI pipeline: Commit generated .spec.ts files to version control and run them via standard npx playwright test without AI dependencies.
- Establish human review gates: Create a workflow for manual validation of compliance flows, conditional UI, and business-rule assertions.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Rapid prototyping / new feature validation | AI-as-Author Static Artifact | Fast generation, preserves dev session state, zero CI overhead | Low (one-time generation cost) |
| Regression suite maintenance | Traditional Playwright Codegen + Manual Refinement | Deterministic, version-controlled, no AI dependency | Near-zero (developer time only) |
| Compliance-heavy / regulated workflows | AI Authoring + Mandatory Human Review | AI handles repetitive steps, humans validate business rules and legal requirements | Medium (review overhead + generation cost) |
| High-security / production CI | Static Playwright Tests Only | Eliminates AI runtime risk, ensures auditability, meets compliance standards | Zero (infrastructure only) |
Configuration Template
// vite.config.ts
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import { e2eAuthorBridge } from 'vite-e2e-author';

export default defineConfig({
  plugins: [
    react(),
    e2eAuthorBridge({
      outputDirectory: 'tests/e2e/generated',
      agentExecutable: process.env.AI_AGENT_CLI || 'claude',
      sessionBudgetLimit: 0.15,
      maxStepsPerSession: 12,
      allowedMcpTools: [
        'playwright.click',
        'playwright.fill',
        'playwright.selectOption',
        'playwright.check',
        'playwright.expect',
        'playwright.goto',
        'playwright.waitForSelector'
      ],
      cdpPort: 9222,
      autoReconnect: true,
      reconnectionAttempts: 3,
      exportFormats: ['spec', 'skill', 'csv']
    })
  ]
});
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: 'html',
  use: {
    baseURL: 'http://localhost:5173',
    trace: 'on-first-retry',
    storageState: 'tests/e2e/.auth/user.json'
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] }
    }
  ]
});
Quick Start Guide
- Launch Chrome in debug mode: Run google-chrome --remote-debugging-port=9222 (or equivalent for your OS) and navigate to your dev server URL.
- Install and configure the plugin: Add the Vite plugin to your configuration file, set the output directory, and define your AI agent executable and budget limits.
- Start the development server: Run npm run dev. The plugin will detect the active CDP connection and inject the authoring interface into your application.
- Generate your first test: Use the floating widget to describe the user flow. Review the generated Playwright code, verify selector stability, and save the artifact to your test directory.
- Run in CI: Commit the generated .spec.ts file and execute it using standard Playwright commands. No AI runtime or API keys are required for execution.
