← Back to Blog
AI/ML · 2026-05-07 · 42 min read

Open Claude Design: A Weekend Harness Built on Atomic

By Mixture of Experts

Current Situation Analysis

Chat-based AI design tools like Anthropic's Claude Design present a conversational UX that masks a highly deterministic, multi-stage pipeline. Organizations attempting to replicate this capability typically fall into a critical failure mode: monolithic agent reconstruction. Instead of treating existing coding agents as composable primitives, teams rebuild tool loops, permission models, and sub-agent dispatch systems from scratch. This approach introduces massive technical debt, tight provider coupling, and poor cross-agent compatibility.

Traditional methods fail because they:

  • Treat the agent framework as the product rather than a harnessable component
  • Rely on heavy DSLs or YAML graph declarations that obscure orchestration logic
  • Lack deterministic control over refinement loops, leading to unbounded compute spend
  • Use text-only critique mechanisms that let agents reason about predicted rather than rendered output, so visual mistakes carry forward uncorrected
  • Fail to separate headless (cost/speed-optimized) stages from visible (quality/interaction-optimized) stages, resulting in inefficient model routing

WOW Moment: Key Findings

| Approach | Setup Time | Cross-Agent Compatibility | Headless Cost Reduction | Refinement Accuracy | Orchestration LoC |
|---|---|---|---|---|---|
| Monolithic Agent Rebuild | 14–21 days | No (provider-locked) | ~0% | Text-only (hallucination-prone) | ~2,000+ |
| Thin Harness (Atomic SDK) | 3 days | Yes (Claude/Copilot/opencode) | ~65–70% | Visual grounding + structured critique | ~500 per provider |

Key Findings:

  • Orchestration over Reconstruction: Plain TypeScript orchestration (Promise.all, early-exit loops) outperforms custom DSLs in flexibility and debuggability.
  • Model Routing as a Knob: Pinning headless analyzers to Sonnet while reserving Opus for visible/refinement stages yields significant cost savings without degrading output quality.
  • Visual Grounding Closes the Loop: Pairing creative passes with Playwright screenshot validation eliminates text-only hallucination drift in refinement cycles.
  • Prompt Engineering > Model Swapping: Adjusting prompt functions (buildDesignLocatorPrompt, etc.) has a higher impact on output taste and framework alignment than swapping model capacity.

Core Solution

The pipeline consists of five deterministic phases:

  1. Design System Onboarding: Parallel headless fan-out (codebase locator, analyzer, pattern finder) → HIL approval
  2. Import: URL/file/codebase capture (headless)
  3. Generation: First design version (visible)
  4. Refinement Loop: ≤5 iterations with HIL + parallel critique + screenshot validation
  5. Export + Handoff: Claude Code / Copilot CLI / opencode bundle

Headless stages run on Sonnet with bypassPermissions for cost/speed (Claude provider only; Copilot/opencode inherit orchestrator model). Visible stages inherit the orchestrator model (Opus) and surface to the user. The refinement loop implements a bounded HIL cycle with early exit on signal phrases ("approved", "ship it", "done").
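
The bounded loop with early exit can be sketched in plain TypeScript. This is a minimal illustration, not the workflow's actual code: `isApproval`, `runRefinementLoop`, and the injected `askUser` and `refine` callbacks are hypothetical names, and only the three signal phrases and the five-iteration cap come from the description above. Matching is deliberately naive, so a reply such as "not done yet" would also count as approval; a real implementation would likely need stricter handling.

```typescript
// Hypothetical sketch: none of these names come from the Atomic SDK.
const SIGNAL_PHRASES = ["approved", "ship it", "done"]; // phrases quoted in the article
const MAX_ITERATIONS = 5; // the article's stated upper bound

// True when the reply contains one of the completion phrases as whole words.
function isApproval(reply: string): boolean {
  const normalized = reply.toLowerCase();
  return SIGNAL_PHRASES.some((phrase) =>
    new RegExp(`\\b${phrase}\\b`).test(normalized),
  );
}

interface RefinementResult {
  iterations: number;
  approved: boolean;
}

// askUser and refine are injected so the loop stays independent of any provider.
async function runRefinementLoop(
  askUser: (iteration: number) => Promise<string>,
  refine: (feedback: string, iteration: number) => Promise<void>,
): Promise<RefinementResult> {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const feedback = await askUser(i);
    if (isApproval(feedback)) {
      // Early exit: no further model calls once the user signs off.
      return { iterations: i, approved: true };
    }
    await refine(feedback, i);
  }
  // Cap reached without an explicit approval.
  return { iterations: MAX_ITERATIONS, approved: false };
}
```

Keeping the cap and the phrase list as constants means both can be tuned without touching the loop body.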

The architecture relies on ctx.stage as a thin wrapper around s.session.query, calling into native agent capabilities rather than reimplementing them. Model selection and prompt instructions act as independent tuning knobs:

// Layer 1: three headless agents analyze the codebase in parallel
const [locator, analyzer, patterns] = await Promise.all([
  ctx.stage(
    { name: "ds-locator", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignLocatorPrompt({ root }),
      { agent: "codebase-locator", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-analyzer", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignAnalyzerPrompt({ root }),
      { agent: "codebase-analyzer", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-patterns", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignPatternPrompt({ root }),
      { agent: "codebase-pattern-finder", ...HEADLESS_OPTS },
    ),
  ),
]);

// Layer 2: visible agent reviews the findings with the user
await ctx.stage(
  { name: "design-system-builder" },
  {}, {},
  async (s) => s.session.query(
    buildDesignSystemBuilderPrompt({
      root,
      locatorOutput: locator.result,
      analyzerOutput: analyzer.result,
      patternsOutput: patterns.result,
    }),
  ),
);
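
The excerpt spreads HEADLESS_OPTS into each headless query without showing its definition. One plausible shape is sketched below, assuming the query options accept a model name and a permission mode. The field names `model` and `permissionMode`, the `Provider` type, and the `headlessOptsFor` helper are assumptions for illustration, not confirmed Atomic SDK API. The only details taken from the article are Sonnet pinning, bypassPermissions, and the fact that per-stage pinning applies to the Claude provider alone.

```typescript
// Hypothetical sketch: option names are assumptions, not confirmed Atomic SDK fields.
type Provider = "claude" | "copilot" | "opencode";

interface HeadlessOpts {
  model?: string; // assumed option name for per-stage model pinning
  permissionMode?: string; // assumed option name; the article only names "bypassPermissions"
}

// Per the article, only the Claude provider supports per-stage pinning.
// Other providers get an empty object, so the stage inherits the orchestrator model.
function headlessOptsFor(provider: Provider, model = "sonnet"): HeadlessOpts {
  if (provider === "claude") {
    return { model, permissionMode: "bypassPermissions" };
  }
  return {};
}

// Spreading an empty object is a no-op, so call sites stay identical across providers.
const HEADLESS_OPTS = headlessOptsFor("claude");
```

Passing a different `model` argument is the single change needed to move the headless analyzers to a larger model for a complex codebase.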

Architecture Decisions:

  • No DSL/YAML: Orchestration uses native TypeScript control flow. ctx.stage manages session lifecycle; early breaks on signal phrases prevent infinite loops.
  • Minimum Toolset per Stage: Headless analyzers get bypassPermissions; visible stages inherit Opus; refinement loops use AskUserQuestion. Each stage sees only required capabilities.
  • Cross-Agent Porting: Since the only abstraction over the agent is s.session.query(...), porting is mechanical. The Copilot CLI and opencode providers reuse the same five-phase topology, handling only provider-specific message formats (SessionEvent[]).
  • CLI Surface: Identical invocation across providers:
    atomic workflow -n open-claude-design -a claude --prompt "Landing page for a dev tool"
    atomic workflow -n open-claude-design -a copilot --prompt "Landing page for a dev tool"
    atomic workflow -n open-claude-design -a opencode --prompt "Landing page for a dev tool"
    

Pitfall Guide

  1. Rebuilding the Agent Instead of Harnessing It: Do not reimplement tool loops, permission models, or sub-agent dispatch. Use s.session.query to delegate to native agent capabilities. Rebuilding creates tight coupling and maintenance overhead.
  2. Hardcoding Headless Model Selection: Treat model routing as a configurable constant (HEADLESS_OPTS), not a fixed implementation detail. Swap between Sonnet and Opus based on codebase complexity without refactoring orchestration logic.
  3. Relying on Text-Only Refinement Loops: Without visual grounding, critique sub-agents review the output they expect rather than the output that rendered, so rendering errors survive each pass. Always pair creative passes with screenshot capture (e.g., Playwright CLI) so the agent inspects actual output, not predicted output.
  4. Over-Abstracting Orchestration with DSLs/YAML: Plain TypeScript (Promise.all, for loops, early break) is more transparent and debuggable than graph declarations. Custom orchestration DSLs obscure control flow and hinder cross-provider porting.
  5. Prioritizing Model Swaps Over Prompt Tuning: Swapping models only changes model capacity. Adjusting prompt functions (buildDesignLocatorPrompt, critique templates, etc.) dials in taste, framework conventions, and output structure. Always tune prompts first; change models only when capacity is the bottleneck.
  6. Missing Early-Exit Signal Handling: Bounded refinement loops require explicit completion phrases ("approved", "ship it", "done"). Without early-exit logic, loops run to maximum iterations, wasting tokens and delaying handoff.
  7. Ignoring Provider-Specific Model Knobs: The Claude Agent SDK allows per-stage model pinning; Copilot CLI and opencode do not. Failing to account for this difference causes headless stages to inherit expensive orchestrator models unnecessarily. Implement provider-aware fallbacks.
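
Pitfall 3 can be made concrete with a small sketch of the screenshot step. It assumes the Playwright CLI is installed and that the generated design is served at a URL. `buildScreenshotArgs` and `buildCritiquePrompt` are hypothetical helpers, and the prompt wording is illustrative only. The `playwright screenshot` subcommand and its `--full-page` flag are part of the Playwright CLI.

```typescript
// Hypothetical helpers; only the `playwright screenshot` CLI and its --full-page flag are real.
interface ScreenshotRequest {
  url: string; // where the rendered design is served, e.g. a local dev server
  outPath: string; // PNG file the critique stage will read
  fullPage?: boolean; // defaults to true so below-the-fold issues are captured
}

// Builds argv for `npx playwright screenshot [--full-page] <url> <filename>`.
function buildScreenshotArgs(req: ScreenshotRequest): string[] {
  const args = ["playwright", "screenshot"];
  if (req.fullPage ?? true) args.push("--full-page");
  args.push(req.url, req.outPath);
  return args;
}

// Builds a critique prompt that points the agent at the captured image
// instead of letting it reason about what it expects the page to look like.
function buildCritiquePrompt(outPath: string, iteration: number): string {
  return [
    `Iteration ${iteration}: open the screenshot at ${outPath} before commenting.`,
    "List only issues visible in the image (layout, contrast, spacing, overflow).",
    "For each issue, name the element and the concrete change to make.",
  ].join("\n");
}
```

The returned array is meant to be handed to a process launcher, for example `execFile("npx", args)` from `node:child_process`, before the critique stage runs.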

Deliverables

  • 📘 Workflow Architecture Blueprint: Complete phase topology, HIL routing logic, and cross-agent porting strategy (sourced from src/sdk/workflows/builtin/open-claude-design)
  • ✅ Pipeline Validation Checklist: Phase-by-phase verification steps, signal phrase coverage, screenshot grounding validation, and early-exit condition testing
  • ⚙️ Configuration Templates:
    • HEADLESS_OPTS variants (Sonnet pinning, Opus fallback, permission bypass toggles)
    • ctx.stage scaffolding for parallel fan-out and bounded refinement loops
    • Prompt function templates (buildDesignLocatorPrompt, buildDesignAnalyzerPrompt, critique/refinement prompts)
    • Playwright screenshot integration config for visual grounding validation