arch notes"),
("edit", "Review for accuracy, clarity, and completeness"),
]
context = ""
for step_name, prompt in pipeline_steps:
    result = query(f"{prompt}\n\nContext:\n{context}")
    # Validation gate: reject empty or suspiciously short output before it feeds the next step
    if not result.final_message or len(result.final_message) < 50:
        raise ValueError(f"Pipeline step '{step_name}' produced insufficient output")
    context += f"\n## {step_name} output:\n{result.final_message}"
**Why it works:** Predictable cost, easy to eval each step independently, low latency overhead. Perfect for tasks that naturally decompose into linear stages.
**Where it breaks:** No parallelism. If any step is slow, the whole pipeline is slow. If any step fails, the whole pipeline stalls. Mitigation: add retry logic and timeout circuit breakers at each stage.
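One way to apply that mitigation is a per-stage wrapper with a timeout and bounded retries using exponential backoff. This is a minimal sketch that assumes each step is an async callable. `run_stage` and its parameters are hypothetical names, not part of any SDK, and the stub `flaky_step` stands in for a real agent call.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional

async def run_stage(
    step: Callable[[], Awaitable[str]],
    *,
    max_attempts: int = 3,
    timeout_s: float = 30.0,
    base_delay_s: float = 0.5,
) -> str:
    """Run one pipeline stage with a per-attempt timeout and bounded retries."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Per-attempt timeout: a hung step cannot stall the whole pipeline
            return await asyncio.wait_for(step(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            # Only transient errors are retried; anything else propagates immediately
            last_error = exc
            if attempt == max_attempts:
                break
            # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 10% random jitter
            delay = base_delay_s * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"stage failed after {max_attempts} attempts") from last_error

# Usage: a stub step that fails twice with a transient error, then succeeds
calls = 0

async def flaky_step() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")
    return "step output"

result = asyncio.run(run_stage(flaky_step, base_delay_s=0.01))
print(result, calls)  # step output 3
```

Capping attempts and raising afterwards keeps a persistent failure from becoming an infinite loop. The caller decides whether to fall back or escalate.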
### 3\. Fan-Out / Parallel Specialists
Multiple agents work on the same task simultaneously, then results are merged.
```python
# Fan-out pattern: parallel execution with aggregation
# Simplified sketch: Agent, query(..., agents=...) and .final_message mirror the
# article's pseudocode; check the Claude Agent SDK reference for the actual
# query() and subagent signatures.
import asyncio
from claude_agent_sdk import Agent, query

security_scanner = Agent(
    name="security-scanner",
    description="Security vulnerability analysis",
)
style_checker = Agent(
    name="style-checker",
    description="Code style and best practices review",
)
test_coverage = Agent(
    name="test-coverage",
    description="Test coverage analysis and gap identification",
)

async def main() -> str:
    # Run all three in parallel
    results = await asyncio.gather(
        query("Scan for security vulnerabilities in this codebase", agents=[security_scanner]),
        query("Review code style and best practices", agents=[style_checker]),
        query("Analyze test coverage gaps", agents=[test_coverage]),
    )
    # Merge results: each subagent returns only its final message, so the
    # parent context sees 3 summaries, not 3x full conversation histories
    return f"""
Security: {results[0].final_message}
Style: {results[1].final_message}
Coverage: {results[2].final_message}
"""

# Top-level await is not valid in a plain script, so run the coroutine explicitly
merged_report = asyncio.run(main())
```
**Why it works:** The specialists run concurrently, so wall-clock time tracks the slowest one instead of the sum of all three. For tasks where independent perspectives add value, the output also improves: code review through multiple lenses is genuinely better.
**Where it breaks:** Cost scales linearly with agent count. Debate patterns run ~2.5× single-model cost. Mitigation: only fan out when the task genuinely benefits from multiple perspectives.
### 4\. Debate / Negotiator
Two agents negotiate until they agree. Proposer + critic. Buyer + seller. The smallest useful "multi-agent" pattern.
**Why it works:** Forces reasoning depth without the cost explosion of larger swarms. Two heads are genuinely better than one, and the coordination overhead stays manageable.
**Where it breaks:** It can loop forever if neither agent concedes. Mitigation: set a maximum round count and fall back to a forced resolution strategy when the cap is reached.
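A bounded proposer/critic loop can be sketched without any framework. In this sketch, `propose` and `critique` are hypothetical stand-ins for the two agent calls, and `Verdict`/`DebateResult` are illustrative types. The loop stops when the critic accepts or when `max_rounds` runs out. The forced resolution used here (return the last proposal flagged `agreed=False` so the caller can route it to human review) is an assumption, not a prescribed strategy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    accepted: bool
    feedback: str

@dataclass
class DebateResult:
    answer: str
    rounds: int
    agreed: bool  # False means the round cap forced the resolution

def debate(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> DebateResult:
    feedback: Optional[str] = None
    proposal = ""
    for round_no in range(1, max_rounds + 1):
        # The proposer revises using the critic's last feedback (None on round 1)
        proposal = propose(task, feedback)
        verdict = critique(task, proposal)
        if verdict.accepted:
            return DebateResult(proposal, round_no, agreed=True)
        feedback = verdict.feedback
    # Round cap reached: force a resolution instead of looping forever
    return DebateResult(proposal, max_rounds, agreed=False)

# Toy stubs: the critic accepts once the proposal mentions a rollback
def propose(task: str, feedback: Optional[str]) -> str:
    return f"{task} plan" + (" with rollback" if feedback else "")

def critique(task: str, proposal: str) -> Verdict:
    ok = "rollback" in proposal
    return Verdict(ok, "" if ok else "add a rollback step")

print(debate("deploy", propose, critique))
# DebateResult(answer='deploy plan with rollback', rounds=2, agreed=True)
```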
### 5\. Swarm (Large-Scale Parallel)
Kimi K2.6 runs 300-agent swarms for complex research. Each agent works independently on a subtask, coordinating through shared state or a message bus.
**Why it works:** Unmatched throughput for massive parallel tasks like comprehensive research reviews or large-scale data processing.
**Where it breaks:** Debugging is a nightmare. When 300 agents run simultaneously, tracing which agent introduced an error is like finding a needle in a haystack of needles. Cost is astronomical, so a swarm is only viable when the task value justifies it.
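The coordination described above, with independent workers pulling subtasks from a shared channel, can be sketched with `asyncio.Queue` standing in for the message bus. Tagging every result with the worker's id is one cheap way to keep "which agent produced this?" answerable. `work_on` is a hypothetical placeholder for a real agent call, and the swarm is scaled down to 4 workers and 10 subtasks for illustration.

```python
import asyncio
from typing import List, Tuple

Result = Tuple[int, str, str]  # (agent_id, subtask, output)

async def work_on(subtask: str) -> str:
    # Hypothetical placeholder for a real agent call
    await asyncio.sleep(0)
    return subtask.upper()

async def worker(agent_id: int, queue: "asyncio.Queue[str]", results: List[Result]) -> None:
    while True:
        subtask = await queue.get()
        try:
            output = await work_on(subtask)
            # Tag every result with the agent that produced it, for tracing
            results.append((agent_id, subtask, output))
        finally:
            queue.task_done()  # mark done even on failure so join() cannot hang on this item

async def run_swarm(subtasks: List[str], n_agents: int) -> List[Result]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    for subtask in subtasks:
        queue.put_nowait(subtask)
    results: List[Result] = []
    workers = [asyncio.create_task(worker(i, queue, results)) for i in range(n_agents)]
    await queue.join()  # returns once every subtask has been marked done
    for w in workers:
        w.cancel()  # workers loop forever, so stop them once the queue drains
    await asyncio.gather(*workers, return_exceptions=True)
    return results

results = asyncio.run(run_swarm([f"topic-{i}" for i in range(10)], n_agents=4))
```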
## The Cascade Problem: Why Agents Break at Scale
An inventory management agent hallucinated a SKU. The item didn't exist. The agent returned it as verified stock with a price, quantity, and warehouse location. That output passed to three downstream agents. Each treated it as legitimate data. Within two hours, the hallucinated item appeared in purchase orders, shipping manifests, and customer-facing inventory pages.
This is what Tian Pan calls the cascade problem. It is not a model failure or a prompt failure but a systems failure that unit tests structurally cannot catch, because unit tests execute in isolation by design.
The question testing asks is: "does this agent produce correct output given this input?" The question production asks is: "what happens when 100 copies of this agent run simultaneously against the same database, filesystem, and external APIs?"
These are different questions. The gap between them is where cascades live.
**Three cascade mechanisms to guard against:**
1. **TOCTOU (time-of-check to time-of-use) races**: Two agents read the same "next unprocessed item" before either marks it done, so the same task gets processed twice.
2. **Retry amplification**: An agent fails and retries, the retry fails, and a failure handler one layer up launches its own retries. Because each layer retries independently, attempts multiply: three attempts at each of two layers turn a single transient error into nine requests.
3. **Shared state corruption**: Two agents update the same config file. The last writer wins, and the other agent's changes are silently lost.
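The standard fix for the first mechanism is to make the check and the claim a single atomic operation instead of a read followed by a write. Here is a minimal sketch using the standard-library `sqlite3` module with a hypothetical `tasks` table. It interleaves two "agents" sequentially in one process to simulate the race: both see the same pending item, but only one conditional `UPDATE` succeeds. The same pattern applies when the agents share a real database.

```python
import sqlite3
from typing import Optional

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL, owner TEXT)")
    conn.executemany("INSERT INTO tasks (id, status) VALUES (?, 'pending')", [(1,), (2,)])
    conn.commit()
    return conn

def next_pending(conn: sqlite3.Connection) -> Optional[int]:
    # The "time of check": a plain read that any number of agents can perform at once
    row = conn.execute("SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
    return row[0] if row else None

def try_claim(conn: sqlite3.Connection, task_id: int, agent: str) -> bool:
    # Check and set in ONE statement: the UPDATE matches only if the row is still pending
    cur = conn.execute(
        "UPDATE tasks SET status = 'claimed', owner = ? WHERE id = ? AND status = 'pending'",
        (agent, task_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means another agent got there first

conn = setup()
seen_a = next_pending(conn)  # both agents see task 1 as the next unprocessed item...
seen_b = next_pending(conn)
claimed_a = try_claim(conn, seen_a, "agent-a")  # ...but only one conditional claim wins
claimed_b = try_claim(conn, seen_b, "agent-b")
print(seen_a, seen_b, claimed_a, claimed_b)  # 1 1 True False
```

The losing agent simply asks for the next pending item again instead of processing a duplicate.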
## Subagents: When Isolation Is the Feature
The Claude Agent SDK's subagent system addresses the cascade problem directly through context isolation. Each subagent runs in its own fresh conversation: intermediate tool calls stay inside the subagent, and only the final message returns to the parent.
From the official docs: "A research subagent can read 40 files, evaluate them, and return a 200-word summary. The parent never sees the 40 files."
This isn't just about token efficiency. It's about blast radius containment. When a subagent goes wrong (hallucinates, loops, or produces garbage), the damage is contained to that subagent's context window. The parent sees only the final output, which it can validate before passing downstream.
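For the inventory example earlier, a parent-side validation gate might look like the sketch below. The `CATALOG` set, the `StockClaim` shape, and `validate_subagent_output` are all hypothetical, and the sketch assumes the subagent was asked to reply in JSON. The point is that the parent checks a subagent's final message against a source of truth before anything downstream consumes it.

```python
import json
from dataclasses import dataclass

# Hypothetical source of truth the parent trusts (e.g., loaded from the inventory database)
CATALOG = {"SKU-1001", "SKU-1002"}

@dataclass(frozen=True)
class StockClaim:
    sku: str
    quantity: int
    warehouse: str

def validate_subagent_output(final_message: str) -> StockClaim:
    """Parse a subagent's final message; raise ValueError unless it is well-formed and catalog-backed."""
    try:
        data = json.loads(final_message)  # json.JSONDecodeError is a ValueError subclass
        claim = StockClaim(str(data["sku"]), int(data["quantity"]), str(data["warehouse"]))
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed subagent output: {exc!r}") from exc
    if claim.sku not in CATALOG:
        # A hallucinated SKU stops here instead of reaching purchase orders and manifests
        raise ValueError(f"unknown SKU from subagent: {claim.sku}")
    if claim.quantity < 0:
        raise ValueError(f"negative quantity for {claim.sku}")
    return claim

ok = validate_subagent_output('{"sku": "SKU-1001", "quantity": 12, "warehouse": "WH-3"}')
try:
    validate_subagent_output('{"sku": "SKU-9999", "quantity": 40, "warehouse": "WH-2"}')
except ValueError as err:
    print(f"rejected: {err}")  # rejected: unknown SKU from subagent: SKU-9999
```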
**Key rule**: spawn a subagent when the task involves more information than the parent needs to remember. Handle inline when the task is short and the parent will reference the output repeatedly.
Anthropic's own multi-agent research system beat single-agent Claude Opus 4 by 90.2% on its internal research eval, but Anthropic also reports that multi-agent systems use roughly 15× more tokens than chat interactions. Subagents are not free. They are a quality lever you pull when the task value justifies the spend.
## Production Checklist: What Actually Keeps Agents Alive
Based on production deployments and failure analysis:
**Before deployment:**
- \[ \] Each agent has a clear input/output contract (JSON schema validation)
- \[ \] Timeout circuit breakers on every agent call
- \[ \] Retry logic with exponential backoff, not infinite loops
- \[ \] Idempotency keys on all state-mutating operations
- \[ \] Health checks that verify the agent can complete a simple task
**During operation:**
- \[ \] Structured logging of every agent's final output (not just errors)
- \[ \] Cost monitoring per agent, with alerts at 2x baseline
- \[ \] Deduplication on shared state writes
- \[ \] Circuit breakers that fail fast when downstream services degrade
**When things break:**
- \[ \] Fallback to a simpler agent or human review, not infinite retry
- \[ \] Rollback mechanism for state mutations
- \[ \] Alerting on cascade indicators (retry rate > baseline, duplicate outputs)
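Two items on that list, idempotency keys and deduplication on shared state writes, can be sketched together. The caller generates a key once per logical operation and reuses it on every retry, and the store applies each key at most once. The in-memory `ReservationStore` is hypothetical. A production system would persist the seen keys in the same transaction as the write itself.

```python
import uuid
from typing import Dict

class ReservationStore:
    """Hypothetical in-memory store that applies each idempotency key at most once."""

    def __init__(self) -> None:
        self.reserved: Dict[str, int] = {}
        self._seen: Dict[str, int] = {}  # idempotency key -> result of the first application

    def reserve(self, idempotency_key: str, sku: str, qty: int) -> int:
        if idempotency_key in self._seen:
            # Retried or duplicated call: return the original result, skip the second write
            return self._seen[idempotency_key]
        self.reserved[sku] = self.reserved.get(sku, 0) + qty
        self._seen[idempotency_key] = self.reserved[sku]
        return self.reserved[sku]

store = ReservationStore()
key = str(uuid.uuid4())  # generated once per logical operation, reused on every retry
for _ in range(3):  # the original call plus two retries
    store.reserve(key, "SKU-1001", 5)
print(store.reserved)  # {'SKU-1001': 5}
store.reserve(str(uuid.uuid4()), "SKU-1001", 5)  # a genuinely new operation gets a new key
print(store.reserved)  # {'SKU-1001': 10}
```

With keys in place, retry amplification still costs requests, but it no longer multiplies side effects.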
## The Decision Tree
Choosing an orchestration pattern isn't a style preference. It determines your cost structure, your failure surface, and which frameworks support what you need:
```
Is the task naturally sequential?
├── Yes → Pipeline
└── No
    ├── Does the task benefit from multiple perspectives?
    │   ├── Yes, 2 perspectives → Debate/Negotiator
    │   └── Yes, 3+ perspectives → Fan-Out
    └── No
        ├── Is the task decomposable into clear subtasks?
        │   ├── Yes → Supervisor + Specialists
        │   └── No → Single agent (don't over-engineer)
        └── Massive scale (100+ agents)? → Swarm
```
**Default choice**: Supervisor + Specialists. It's the 96.3% success rate pattern for a reason. Start here. Add complexity only when the task demands it and the data supports it.
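If you want the choice recorded alongside each task's config, the same tree can be encoded as a small function. The `TaskProfile` fields are hypothetical inputs that mirror the tree's questions. The tree lists the swarm branch last. This sketch checks scale before decomposability so that branch is reachable, which is an interpretation rather than something the tree states.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    sequential: bool        # Is the task naturally sequential?
    perspectives: int       # Independent perspectives that add value (0 or 1 means none)
    decomposable: bool      # Can it be split into clear subtasks?
    agents_needed: int = 1  # Rough number of agents the task would require

def choose_pattern(t: TaskProfile) -> str:
    if t.sequential:
        return "Pipeline"
    if t.perspectives == 2:
        return "Debate/Negotiator"
    if t.perspectives >= 3:
        return "Fan-Out"
    # Interpretation: test for massive scale first so the Swarm branch is reachable
    if t.agents_needed >= 100:
        return "Swarm"
    if t.decomposable:
        return "Supervisor + Specialists"
    return "Single agent"

print(choose_pattern(TaskProfile(sequential=False, perspectives=0, decomposable=True)))
# Supervisor + Specialists
```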
* * *
## Sources
- Abemon, "AI Agent Orchestration: 96% Success Rate with Supervisor Pattern" (2026)
- Balys Kriksciunas, "Multi-Agent Orchestration Infrastructure: Lessons from Production" (TURION.AI, 2026)
- Ranjan Kumar, "Multi-Agent Pipeline Orchestration and Failure Propagation: Designing for Blast Radius" (2026)
- Tian Pan, "The Cascade Problem: Why Agent Side Effects Explode at Scale" (2026)
- Digital Applied, "Multi-Agent Orchestration: 5 Patterns That Work in 2026" (2026)
- Anthropic, Claude Agent SDK Subagents Documentation (2026)
- Growth Engineer, "How to Build Subagents with the Claude Agent SDK" (2026)