Open Claude Design: A Weekend Harness Built on Atomic
Current Situation Analysis
Chat-based AI design tools like Anthropic's Claude Design present a conversational UX that masks a highly deterministic, multi-stage pipeline. Organizations attempting to replicate this capability typically fall into a critical failure mode: monolithic agent reconstruction. Instead of treating existing coding agents as composable primitives, teams rebuild tool loops, permission models, and sub-agent dispatch systems from scratch. This approach introduces massive technical debt, tight provider coupling, and poor cross-agent compatibility.
Traditional methods fail because they:
- Treat the agent framework as the product rather than a harnessable component
- Rely on heavy DSLs or YAML graph declarations that obscure orchestration logic
- Lack deterministic control over refinement loops, leading to unbounded compute spend
- Use text-only critique mechanisms that cause agents to hallucinate past visual mistakes instead of correcting them
- Fail to separate headless (cost/speed-optimized) stages from visible (quality/interaction-optimized) stages, resulting in inefficient model routing
WOW Moment: Key Findings
| Approach | Setup Time | Cross-Agent Compatibility | Headless Cost Reduction | Refinement Accuracy | Orchestration LoC |
|---|---|---|---|---|---|
| Monolithic Agent Rebuild | 14–21 days | No (provider-locked) | ~0% | Text-only (hallucination-prone) | ~2,000+ |
| Thin Harness (Atomic SDK) | 3 days | Yes (Claude/Copilot/opencode) | ~65–70% | Visual grounding + structured critique | ~500 per provider |
Key Findings:
- Orchestration over Reconstruction: Plain TypeScript orchestration (Promise.all, early-exit loops) outperforms custom DSLs in flexibility and debuggability.
- Model Routing as a Knob: Pinning headless analyzers to Sonnet while reserving Opus for visible/refinement stages yields significant cost savings without degrading output quality.
- Visual Grounding Closes the Loop: Pairing creative passes with Playwright screenshot validation eliminates text-only hallucination drift in refinement cycles.
- Prompt Engineering > Model Swapping: Adjusting prompt functions (buildDesignLocatorPrompt, etc.) has a higher impact on output taste and framework alignment than swapping model capacity.
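The visual-grounding finding can be sketched with the Playwright CLI's `screenshot` command. The helper names (`screenshotArgs`, `buildVisualCritiquePrompt`), the file path, and the prompt wording are illustrative assumptions rather than anything taken from the workflow's source. The sketch only shows how a rendered page could become a file that a critique stage is told to inspect.

```typescript
// Hypothetical options for one screenshot capture; names are illustrative.
interface ShotOptions {
  url: string; // page to render, e.g. a local preview server
  outFile: string; // where the PNG is written
  fullPage?: boolean; // capture the whole scrollable page
  viewport?: { width: number; height: number };
}

// Build the argument list for:
//   npx playwright screenshot [options] <url> <filename>
function screenshotArgs(opts: ShotOptions): string[] {
  const args = ["playwright", "screenshot"];
  if (opts.fullPage) args.push("--full-page");
  if (opts.viewport) {
    args.push("--viewport-size", `${opts.viewport.width},${opts.viewport.height}`);
  }
  args.push(opts.url, opts.outFile);
  return args;
}

// Hypothetical critique prompt that points the agent at the rendered file
// instead of asking it to imagine what its own markup looks like.
function buildVisualCritiquePrompt(shotPath: string, brief: string): string {
  return [
    `Design brief: ${brief}`,
    `Open the rendered screenshot at ${shotPath} and critique what is actually on screen.`,
    "List concrete visual defects (spacing, contrast, overflow) before proposing changes.",
  ].join("\n");
}
```

The argument list can be handed to `spawnSync("npx", args)` from `node:child_process` once the page is being served. Keeping the builder pure means the capture step can be unit-tested without launching a browser.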
Core Solution
The pipeline consists of five deterministic phases:
- Design System Onboarding: Parallel headless fan-out (codebase locator, analyzer, pattern finder) → human-in-the-loop (HIL) approval
- Import: URL/file/codebase capture (headless)
- Generation: First design version (visible)
- Refinement Loop: ≤5 iterations with HIL + parallel critique + screenshot validation
- Export + Handoff: Claude Code / Copilot CLI / opencode bundle
Headless stages run on Sonnet with bypassPermissions for cost/speed (Claude provider only; Copilot/opencode inherit orchestrator model). Visible stages inherit the orchestrator model (Opus) and surface to the user. The refinement loop implements a bounded HIL cycle with early exit on signal phrases ("approved", "ship it", "done").
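A minimal sketch of the bounded refinement cycle described above, assuming the user's feedback arrives as a plain string. The signal phrases and the five-iteration cap come from the article. The hook names (`askUser`, `refine`) and the exact-match rule are assumptions made for illustration. The matcher is deliberately conservative: feedback like "not done yet" or "done, but fix the header" does not count as approval.

```typescript
// Signal phrases and cap are from the article; the matching rule is an assumption.
const APPROVAL_PHRASES = ["approved", "ship it", "done"];
const MAX_ITERATIONS = 5;

// True only when the whole message, minus case and punctuation, is a signal phrase.
function isApproval(feedback: string): boolean {
  const normalized = feedback
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return APPROVAL_PHRASES.includes(normalized);
}

// Hypothetical hooks: in a real workflow these would wrap agent stages.
interface RefineHooks<D> {
  askUser: (design: D, iteration: number) => Promise<string>; // HIL feedback
  refine: (design: D, feedback: string) => Promise<D>; // critique + revision pass
}

interface RefineOutcome<D> {
  design: D;
  iterations: number; // number of refine passes actually run
  approved: boolean;
}

async function refinementLoop<D>(initial: D, hooks: RefineHooks<D>): Promise<RefineOutcome<D>> {
  let design = initial;
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const feedback = await hooks.askUser(design, i);
    // Early exit: stop spending tokens as soon as the user signs off.
    if (isApproval(feedback)) {
      return { design, iterations: i - 1, approved: true };
    }
    design = await hooks.refine(design, feedback);
  }
  // Cap reached without an explicit approval.
  return { design, iterations: MAX_ITERATIONS, approved: false };
}
```

Returning `approved: false` at the cap, rather than throwing, lets the caller decide whether to hand off the last version or ask the user how to proceed.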
The architecture relies on ctx.stage as a thin wrapper around s.session.query, calling into native agent capabilities rather than reimplementing them. Model selection and prompt instructions act as independent tuning knobs:
// Layer 1: three headless agents analyze the codebase in parallel
const [locator, analyzer, patterns] = await Promise.all([
  ctx.stage(
    { name: "ds-locator", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignLocatorPrompt({ root }),
      { agent: "codebase-locator", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-analyzer", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignAnalyzerPrompt({ root }),
      { agent: "codebase-analyzer", ...HEADLESS_OPTS },
    ),
  ),
  ctx.stage(
    { name: "ds-patterns", headless: true },
    {}, {},
    async (s) => s.session.query(
      buildDesignPatternPrompt({ root }),
      { agent: "codebase-pattern-finder", ...HEADLESS_OPTS },
    ),
  ),
]);

// Layer 2: visible agent reviews the findings with the user
await ctx.stage(
  { name: "design-system-builder" },
  {}, {},
  async (s) => s.session.query(
    buildDesignSystemBuilderPrompt({
      root,
      locatorOutput: locator.result,
      analyzerOutput: analyzer.result,
      patternsOutput: patterns.result,
    }),
  ),
);
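Because the orchestration is plain TypeScript, the fan-out can be exercised against a fake session without invoking any agent. The sketch below is a toy stand-in, not Atomic's implementation: `makeFakeCtx`, `fanOut`, and every type here are hypothetical, and the two `{}` arguments are passed through untouched because the article does not describe them. It mirrors only the call shape used above, where `ctx.stage(...)` resolves to an object with a `result` field.

```typescript
// Toy stand-ins; the real Atomic types are not shown in the article.
type QueryFn = (prompt: string, opts?: object) => Promise<string>;
interface StageMeta { name: string; headless?: boolean }
interface StageSession { session: { query: QueryFn } }
interface StageResult<T> { name: string; result: T }

// Build a fake context whose stages all talk to the supplied query function.
function makeFakeCtx(query: QueryFn) {
  const started: string[] = []; // records the order in which stages were opened
  return {
    started,
    async stage<T>(
      meta: StageMeta,
      _first: object, // meaning not described in the article; ignored here
      _second: object, // meaning not described in the article; ignored here
      run: (s: StageSession) => Promise<T>,
    ): Promise<StageResult<T>> {
      started.push(meta.name);
      const result = await run({ session: { query } });
      return { name: meta.name, result };
    },
  };
}

// Same topology as Layer 1: three headless stages launched together.
async function fanOut(ctx: ReturnType<typeof makeFakeCtx>, root: string): Promise<string[]> {
  const names = ["ds-locator", "ds-analyzer", "ds-patterns"];
  const stages = await Promise.all(
    names.map((name) =>
      ctx.stage({ name, headless: true }, {}, {}, (s) =>
        s.session.query(`${name}: analyze ${root}`),
      ),
    ),
  );
  // Promise.all preserves input order, so results line up with `names`.
  return stages.map((st) => st.result);
}
```

A test can pass a canned `query` function and assert on the prompts each stage sent, which keeps prompt and topology changes reviewable without spending tokens.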
Architecture Decisions:
- No DSL/YAML: Orchestration uses native TypeScript control flow. ctx.stage manages session lifecycle; early breaks on signal phrases prevent infinite loops.
- Minimum Toolset per Stage: Headless analyzers get bypassPermissions; visible stages inherit Opus; refinement loops use AskUserQuestion. Each stage sees only required capabilities.
- Cross-Agent Porting: Since the only abstraction over the agent is s.session.query(...), porting is mechanical. The Copilot CLI and opencode providers reuse the same five-phase topology, handling only provider-specific message formats (SessionEvent[]).
- CLI Surface: Identical invocation across providers:
    atomic workflow -n open-claude-design -a claude --prompt "Landing page for a dev tool"
    atomic workflow -n open-claude-design -a copilot --prompt "Landing page for a dev tool"
    atomic workflow -n open-claude-design -a opencode --prompt "Landing page for a dev tool"
Pitfall Guide
- Rebuilding the Agent Instead of Harnessing It: Do not reimplement tool loops, permission models, or sub-agent dispatch. Use s.session.query to delegate to native agent capabilities. Rebuilding creates tight coupling and maintenance overhead.
- Hardcoding Headless Model Selection: Treat model routing as a configurable constant (HEADLESS_OPTS), not a fixed implementation detail. Swap between Sonnet and Opus based on codebase complexity without refactoring orchestration logic.
- Relying on Text-Only Refinement Loops: Without visual grounding, critique sub-agents hallucinate past rendering errors. Always pair creative passes with screenshot capture (e.g., Playwright CLI) so the agent inspects actual output, not predicted output.
- Over-Abstracting Orchestration with DSLs/YAML: Plain TypeScript (Promise.all, for loops, early break) is more transparent and debuggable than graph declarations. Custom orchestration DSLs obscure control flow and hinder cross-provider porting.
- Prioritizing Model Swaps Over Prompt Tuning: Swapping models only changes compute capacity. Adjusting prompt functions (buildDesignLocatorPrompt, critique templates, etc.) dials in taste, framework conventions, and output structure. Always tune prompts first; change models only when capacity is the bottleneck.
- Missing Early-Exit Signal Handling: Bounded refinement loops require explicit completion phrases ("approved", "ship it", "done"). Without early-exit logic, loops run to maximum iterations, wasting tokens and delaying handoff.
- Ignoring Provider-Specific Model Knobs: The Claude Agent SDK allows per-stage model pinning; Copilot CLI and opencode do not. Failing to account for this difference causes headless stages to inherit expensive orchestrator models unnecessarily. Implement provider-aware fallbacks.
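One way to make the provider-aware fallback from the last pitfall explicit is a small routing helper. This is a sketch under stated assumptions: the `HeadlessOpts` field names, the `"sonnet"` placeholder string, and `planHeadlessRouting` itself are hypothetical and not taken from the workflow's source. The only facts it encodes are the ones stated above, namely that the Claude provider supports per-stage pinning and the other two inherit the orchestrator model.

```typescript
type Provider = "claude" | "copilot" | "opencode";

// Hypothetical per-stage options; field names are assumptions for this sketch.
interface HeadlessOpts {
  model?: string;
  permissionMode?: "bypassPermissions";
}

interface RoutingPlan {
  opts: HeadlessOpts; // what gets spread into the stage's query options
  pinned: boolean; // false means the stage inherits the orchestrator model
  note: string; // human-readable summary, useful in logs
}

// Decide headless options per provider instead of hardcoding one constant.
function planHeadlessRouting(provider: Provider, headlessModel = "sonnet"): RoutingPlan {
  if (provider === "claude") {
    return {
      opts: { model: headlessModel, permissionMode: "bypassPermissions" },
      pinned: true,
      note: `headless stages pinned to ${headlessModel}`,
    };
  }
  // No per-stage knob on this provider: send nothing and surface the fact.
  return {
    opts: {},
    pinned: false,
    note: `${provider} has no per-stage pin; headless stages inherit the orchestrator model`,
  };
}
```

Logging `note` at startup makes the cost implication visible when a run on Copilot CLI or opencode is quietly using the orchestrator model for every headless stage.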
Deliverables
- Workflow Architecture Blueprint: Complete phase topology, HIL routing logic, and cross-agent porting strategy (sourced from src/sdk/workflows/builtin/open-claude-design)
- Pipeline Validation Checklist: Phase-by-phase verification steps, signal phrase coverage, screenshot grounding validation, and early-exit condition testing
- Configuration Templates:
  - HEADLESS_OPTS variants (Sonnet pinning, Opus fallback, permission bypass toggles)
  - ctx.stage scaffolding for parallel fan-out and bounded refinement loops
  - Prompt function templates (buildDesignLocatorPrompt, buildDesignAnalyzerPrompt, critique/refinement prompts)
  - Playwright screenshot integration config for visual grounding validation
