Open Claude Design: A Weekend Harness Built on Atomic

Current Situation Analysis

Chat-based AI design tools like Anthropic's Claude Design present a conversational UX that masks a largely deterministic, multi-stage pipeline. Teams that try to replicate this capability often fall into one failure mode: monolithic agent reconstruction. Instead of treating existing coding agents as composable primitives, they rebuild tool loops, permission models, and sub-agent dispatch systems from scratch. The result duplicates what the agent already provides, couples the code to a single provider, and makes porting to other agents difficult.

Traditional methods fail because they:

Treat the agent framework as the product rather than a harnessable component
Rely on heavy DSLs or YAML graph declarations that obscure orchestration logic
Lack deterministic control over refinement loops, leading to unbounded compute spend
Use text-only critique mechanisms, so agents reason about the output they expect rather than the rendered result and gloss over visual mistakes instead of correcting them
Fail to separate headless (cost/speed-optimized) stages from visible (quality/interaction-optimized) stages, resulting in inefficient model routing

WOW Moment: Key Findings

Approach	Setup Time	Cross-Agent Compatibility	Headless Cost Reduction	Refinement Accuracy	Orchestration LoC
Monolithic Agent Rebuild	14–21 days	No (provider-locked)	~0%	Text-only (hallucination-prone)	~2,000+
Thin Harness (Atomic SDK)	3 days	Yes (Claude/Copilot/opencode)	~65–70%	Visual grounding + structured critique	~500 per provider

Key Findings:

Orchestration over Reconstruction: Plain TypeScript orchestration (Promise.all, early-exit loops) outperforms custom DSLs in flexibility and debuggability.
Model Routing as a Knob: Pinning headless analyzers to Sonnet while reserving Opus for visible/refinement stages yields significant cost savings without degrading output quality.
Visual Grounding Closes the Loop: Pairing creative passes with Playwright screenshot validation eliminates text-only hallucination drift in refinement cycles.
Prompt Engineering > Model Swapping: Adjusting prompt functions (buildDesignLocatorPrompt, etc.) has a higher impact on output taste and framework alignment than swapping model capacity.
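The visual-grounding finding can be illustrated with a minimal sketch: capture one screenshot of the rendered design, then run the critics in parallel against that same image. Capture, Critic, and critiqueWithScreenshot are hypothetical names used only for illustration and are not part of the Atomic SDK; the capture function is injected so that the sketch does not assume any particular Playwright setup.

```typescript
// Hypothetical types for illustration; none of these names come from the Atomic SDK.
type CritiqueInput = { htmlPath: string; screenshotPath: string };
type Capture = (htmlPath: string) => Promise<string>; // resolves to a screenshot path
type Critic = (input: CritiqueInput) => Promise<string[]>; // resolves to findings

async function critiqueWithScreenshot(
  htmlPath: string,
  capture: Capture,
  critics: Critic[],
): Promise<string[]> {
  // Render once, before any critique, so every critic inspects the same actual output.
  const screenshotPath = await capture(htmlPath);

  // Critics are independent of each other, so they can run in parallel.
  const findings = await Promise.all(
    critics.map((critic) => critic({ htmlPath, screenshotPath })),
  );

  // Merge the per-critic finding lists into one list, preserving critic order.
  return ([] as string[]).concat(...findings);
}
```

In a real workflow, capture could wrap a Playwright screenshot call and each critic could be a headless stage whose prompt references the screenshot path. Both are assumptions here; the source only states that creative passes are paired with Playwright screenshot validation.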

Core Solution

The pipeline consists of five deterministic phases:

Design System Onboarding: Parallel headless fan-out (codebase locator, analyzer, pattern finder) → HIL approval
Import: URL/file/codebase capture (headless)
Generation: First design version (visible)
Refinement Loop: ≤5 iterations with HIL + parallel critique + screenshot validation
Export + Handoff: Claude Code / Copilot CLI / opencode bundle

Headless stages run on Sonnet with bypassPermissions for cost/speed (Claude provider only; Copilot/opencode inherit orchestrator model). Visible stages inherit the orchestrator model (Opus) and surface to the user. The refinement loop implements a bounded HIL cycle with early exit on signal phrases ("approved", "ship it", "done").
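The bounded HIL cycle described above can be sketched in plain TypeScript. This is a minimal illustration, not the workflow's actual code: askUser and refine are hypothetical stand-ins for the human-in-the-loop prompt and the refinement stage, and whole-phrase matching on the reply is an assumption about how the signal phrases are detected.

```typescript
// Hypothetical sketch; askUser and refine are stand-ins, not Atomic SDK functions.
const SIGNAL_PHRASES = ["approved", "ship it", "done"];
const MAX_ITERATIONS = 5;

// Matches a signal phrase as a whole word or phrase, case-insensitively.
// Assumption: simple matching only; a reply like "not done yet" would still match.
const SIGNAL_PATTERN = new RegExp("\\b(?:" + SIGNAL_PHRASES.join("|") + ")\\b", "i");

function isApproval(reply: string): boolean {
  return SIGNAL_PATTERN.test(reply);
}

interface LoopResult {
  design: string;
  iterations: number; // refinement passes actually run
  approved: boolean; // false when the iteration cap was reached first
}

async function runRefinementLoop(
  initialDesign: string,
  askUser: (design: string) => Promise<string>,
  refine: (design: string, feedback: string) => Promise<string>,
  maxIterations: number = MAX_ITERATIONS,
): Promise<LoopResult> {
  let design = initialDesign;
  for (let i = 0; i < maxIterations; i++) {
    const feedback = await askUser(design);
    // Early exit: stop as soon as the user signals approval.
    if (isApproval(feedback)) {
      return { design, iterations: i, approved: true };
    }
    design = await refine(design, feedback);
  }
  // The cap bounds spend even if the user never approves.
  return { design, iterations: maxIterations, approved: false };
}
```

Returning the approved flag lets the caller distinguish an explicit sign-off from a loop that simply ran out of iterations, which matters for deciding whether to proceed to export.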

The architecture relies on ctx.stage as a thin wrapper around s.session.query, calling into native agent capabilities rather than reimplementing them. Model selection and prompt instructions act as independent tuning knobs:

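// Assumes `root` (the codebase root) and HEADLESS_OPTS (the headless model and
// permission options) are defined earlier in the workflow.
// Layer 1: three headless agents analyze the codebase in parallel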
const [locator, analyzer, patterns] = await Promise.all([
  ctx.stage(
    { name: "ds-locator", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignLocatorPrompt({ root }),
      { agent: "codebase-locator", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-analyzer", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignAnalyzerPrompt({ root }),
      { agent: "codebase-analyzer", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-patterns", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignPatternPrompt({ root }),
      { agent: "codebase-pattern-finder", ...HEADLESS_OPTS },
    ),
  ),
]);

// Layer 2: visible agent reviews the findings with the user
await ctx.stage(
  { name: "design-system-builder" },
  {}, {},
  async (s) => s.session.query(
    buildDesignSystemBuilderPrompt({
      root,
      locatorOutput: locator.result,
      analyzerOutput: analyzer.result,
      patternsOutput: patterns.result,
    }),
  ),
);


Architecture Decisions:

No DSL/YAML: Orchestration uses native TypeScript control flow. ctx.stage manages session lifecycle; the iteration cap keeps the refinement loop bounded, and early breaks on signal phrases end it as soon as the user approves.
Minimum Toolset per Stage: Headless analyzers get bypassPermissions; visible stages inherit Opus; refinement loops use AskUserQuestion. Each stage sees only required capabilities.
Cross-Agent Porting: Since the only abstraction over the agent is s.session.query(...), porting is mechanical. The Copilot CLI and opencode providers reuse the same five-phase topology, handling only provider-specific message formats (SessionEvent[]).

CLI Surface: Identical invocation across providers:

atomic workflow -n open-claude-design -a claude --prompt "Landing page for a dev tool"
atomic workflow -n open-claude-design -a copilot --prompt "Landing page for a dev tool"
atomic workflow -n open-claude-design -a opencode --prompt "Landing page for a dev tool"

Pitfall Guide

Rebuilding the Agent Instead of Harnessing It: Do not reimplement tool loops, permission models, or sub-agent dispatch. Use s.session.query to delegate to native agent capabilities. Rebuilding creates tight coupling and maintenance overhead.
Hardcoding Headless Model Selection: Treat model routing as a configurable constant (HEADLESS_OPTS), not a fixed implementation detail. Swap between Sonnet and Opus based on codebase complexity without refactoring orchestration logic.
Relying on Text-Only Refinement Loops: Without visual grounding, critique sub-agents reason about the output they expect and miss rendering errors they never see. Always pair creative passes with screenshot capture (e.g., Playwright CLI) so the agent inspects actual output, not predicted output.
Over-Abstracting Orchestration with DSLs/YAML: Plain TypeScript (Promise.all, for loops, early break) is more transparent and debuggable than graph declarations. Custom orchestration DSLs obscure control flow and hinder cross-provider porting.
Prioritizing Model Swaps Over Prompt Tuning: Swapping models changes only model capacity. Adjusting prompt functions (buildDesignLocatorPrompt, critique templates, etc.) dials in taste, framework conventions, and output structure. Tune prompts first; change models only when capacity is the bottleneck.
Missing Early-Exit Signal Handling: Bounded refinement loops require explicit completion phrases ("approved", "ship it", "done"). Without early-exit logic, loops run to maximum iterations, wasting tokens and delaying handoff.
Ignoring Provider-Specific Model Knobs: The Claude Agent SDK allows per-stage model pinning; Copilot CLI and opencode do not. Failing to account for this difference causes headless stages to inherit expensive orchestrator models unnecessarily. Implement provider-aware fallbacks.
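One way to implement the provider-aware fallback from the last pitfall is to derive the headless options from the provider instead of hardcoding them. The sketch below is an assumption-heavy illustration: the shape of HeadlessOpts, the field names model and permissionMode, the "sonnet" default, and the headlessOptsFor helper are hypothetical and are not taken from the Atomic SDK.

```typescript
// Hypothetical sketch; the option shape and helper name are assumptions.
type Provider = "claude" | "copilot" | "opencode";

interface HeadlessOpts {
  model?: string; // per-stage model pin, where the provider supports it
  permissionMode?: "bypassPermissions"; // skip permission prompts in headless stages
}

function headlessOptsFor(provider: Provider, model: string = "sonnet"): HeadlessOpts {
  if (provider === "claude") {
    // Per the article, only the Claude provider supports per-stage pinning.
    return { model, permissionMode: "bypassPermissions" };
  }
  // Copilot CLI and opencode inherit the orchestrator model, so pass no
  // overrides rather than options the provider cannot honor.
  return {};
}

// Usage: build the options once, then spread them into each headless query.
const HEADLESS_OPTS = headlessOptsFor("claude");
```

Keeping the model name as a parameter preserves the routing knob: switching the headless analyzers to a larger model for a complex codebase is a one-argument change that leaves the orchestration code untouched.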

Deliverables

📘 Workflow Architecture Blueprint: Complete phase topology, HIL routing logic, and cross-agent porting strategy (sourced from src/sdk/workflows/builtin/open-claude-design)
✅ Pipeline Validation Checklist: Phase-by-phase verification steps, signal phrase coverage, screenshot grounding validation, and early-exit condition testing
⚙️ Configuration Templates:
- HEADLESS_OPTS variants (Sonnet pinning, Opus fallback, permission bypass toggles)
- ctx.stage scaffolding for parallel fan-out and bounded refinement loops
- Prompt function templates (buildDesignLocatorPrompt, buildDesignAnalyzerPrompt, critique/refinement prompts)
- Playwright screenshot integration config for visual grounding validation