ypical scopes include correctness, security exposure, test coverage, and performance. Isolation prevents attention bleed and forces adversarial reasoning.
Step 2: Build the Dispatch Layer
Agents run in parallel. The orchestrator clones the diff context, injects scope-specific instructions, and spawns independent Claude Code sessions. Parallel execution minimizes wall-clock latency while maintaining strict prompt boundaries.
Step 3: Implement Consolidation
Raw agent outputs are noisy. A consolidation layer deduplicates findings, weights severity, resolves contradictions, and formats a single comment thread. This transforms fragmented reports into an actionable checklist.
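The final formatting step can be sketched as follows. This is a minimal illustration, not the article's implementation: `formatChecklist` and `SEVERITY_ORDER` are hypothetical names, and the finding shape simply mirrors the fields described in this guide.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface ReviewFinding {
  agentId: string;
  file: string;
  line: number;
  severity: Severity;
  description: string;
  suggestion?: string;
}

// Lower rank sorts first.
const SEVERITY_ORDER: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Render consolidated findings as a Markdown task list, most severe first.
function formatChecklist(findings: ReviewFinding[]): string {
  const sorted = findings
    .slice()
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
  const lines = sorted.map(f => {
    const base = `- [ ] **${f.severity}** ${f.file}:${f.line} (${f.agentId}): ${f.description}`;
    // Suggestions, when present, become a nested bullet under the finding.
    return f.suggestion ? `${base}\n  - Suggestion: ${f.suggestion}` : base;
  });
  return lines.join('\n');
}
```

Each finding becomes one unchecked task, so a reviewer can tick items off directly in the PR comment.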
Architecture Implementation (TypeScript)
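import { spawn } from 'child_process';
import { readFileSync } from 'fs';

type Severity = 'critical' | 'high' | 'medium' | 'low';

interface AgentConfig {
  id: string;
  scope: 'correctness' | 'security' | 'performance' | 'testing';
  systemPrompt: string;
  maxTokens: number;
}

interface ReviewFinding {
  agentId: string;
  file: string;
  line: number;
  severity: Severity;
  description: string;
  suggestion?: string;
}

// Lower rank sorts first.
const SEVERITY_ORDER: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

class ReviewOrchestrator {
  private agents: AgentConfig[];
  private diffPath: string;

  constructor(agents: AgentConfig[], diffPath: string) {
    this.agents = agents;
    this.diffPath = diffPath;
  }

  async dispatch(): Promise<ReviewFinding[]> {
    const diffContent = readFileSync(this.diffPath, 'utf-8');
    // allSettled: one failed agent must not discard the other agents' findings.
    const settled = await Promise.allSettled(
      this.agents.map(agent => this.runAgent(agent, diffContent))
    );
    const findings: ReviewFinding[] = [];
    settled.forEach((result, i) => {
      if (result.status === 'fulfilled') findings.push(...result.value);
      else console.error(`Agent ${this.agents[i].id} failed:`, result.reason);
    });
    return this.consolidate(findings);
  }

  private runAgent(agent: AgentConfig, diff: string): Promise<ReviewFinding[]> {
    return new Promise((resolve, reject) => {
      // Assumes a Claude Code CLI with print mode (-p) and --system-prompt;
      // flag names vary between versions, so confirm with `claude --help`.
      // agent.maxTokens is not passed: no output-token flag is assumed here,
      // so apply that budget through whatever limit your CLI version exposes.
      const claudeProcess = spawn('claude', [
        '-p', 'Review the diff supplied on stdin.',
        '--system-prompt', agent.systemPrompt
      ]);

      let output = '';
      claudeProcess.stdout.on('data', chunk => (output += chunk.toString()));
      // Without an 'error' listener, a missing binary crashes the whole process.
      claudeProcess.on('error', reject);
      claudeProcess.stdin.on('error', reject);
      claudeProcess.on('close', code => {
        if (code === 0) resolve(this.parseAgentOutput(agent.id, output));
        else reject(new Error(`Agent ${agent.id} failed with exit code ${code}`));
      });

      // Pipe the diff through stdin; large diffs can exceed argv length limits.
      claudeProcess.stdin.end(diff);
    });
  }

  private parseAgentOutput(agentId: string, raw: string): ReviewFinding[] {
    // Agents are expected to print JSON shaped like { "findings": [...] }.
    try {
      const parsed = JSON.parse(raw);
      if (!Array.isArray(parsed.findings)) return [];
      return parsed.findings.map((f: Omit<ReviewFinding, 'agentId'>) => ({ ...f, agentId }));
    } catch {
      // Unparseable output yields no findings; log it so the gap is visible.
      console.error(`Agent ${agentId} returned output that is not valid JSON`);
      return [];
    }
  }

  private consolidate(findings: ReviewFinding[]): ReviewFinding[] {
    // Dedup key is file/line/severity. Findings that share a key always share
    // a severity, so the first one reported is kept.
    const dedupMap = new Map<string, ReviewFinding>();
    for (const f of findings) {
      const key = `${f.file}:${f.line}:${f.severity}`;
      if (!dedupMap.has(key)) dedupMap.set(key, f);
    }
    return Array.from(dedupMap.values()).sort(
      (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]
    );
  }
}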
Architecture Rationale
- Parallel Dispatch: Sequential agent execution compounds latency. Parallel spawning bounds wall-clock time by the slowest agent rather than by the sum of all agents.
- CLI/Runtime Binding: Running through Claude Code preserves local context. Agents can execute grep, read package.json, or query MCP tools for dependency graphs. Direct API calls lose this environmental awareness.
- Consolidation Layer: Raw LLM output is unstructured. Deduplication by file/line/severity prevents comment spam. Severity weighting ensures critical findings surface first.
- Scope Isolation: Narrow system prompts force the model into adversarial or analytical modes. A security agent instructed to "enumerate attack vectors" behaves differently than a general reviewer told to "check for issues."
Pitfall Guide
1. Prompt Contamination
Explanation: Agents bleed into each other's scope when system prompts overlap or context windows share unfiltered diff data. A performance agent starts commenting on security flaws, diluting its focus.
Fix: Enforce strict system prompt boundaries. Pass only the diff subset relevant to each scope. Use JSON schema validation on agent output to reject out-of-scope findings.
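A minimal sketch of output validation, assuming each agent is asked to self-report a `category` field on every finding. That field, and the names `isScopedFinding` and `filterInScope`, are hypothetical. The hand-written type guard stands in for a full JSON Schema validator.

```typescript
type Scope = 'correctness' | 'security' | 'performance' | 'testing';

interface ScopedFinding {
  file: string;
  line: number;
  severity: 'critical' | 'high' | 'medium' | 'low';
  description: string;
  category: Scope; // hypothetical: which scope the agent claims the finding belongs to
}

const SEVERITIES: string[] = ['critical', 'high', 'medium', 'low'];
const SCOPES: string[] = ['correctness', 'security', 'performance', 'testing'];

// Runtime shape check for one parsed finding.
function isScopedFinding(value: unknown): value is ScopedFinding {
  if (typeof value !== 'object' || value === null) return false;
  const { file, line, severity, description, category } = value as Record<string, unknown>;
  return (
    typeof file === 'string' &&
    typeof line === 'number' && line % 1 === 0 && line > 0 &&
    typeof severity === 'string' && SEVERITIES.indexOf(severity) !== -1 &&
    typeof description === 'string' &&
    typeof category === 'string' && SCOPES.indexOf(category) !== -1
  );
}

// Keep only well-formed findings that match the agent's assigned scope.
function filterInScope(raw: unknown[], scope: Scope): ScopedFinding[] {
  return raw.filter(isScopedFinding).filter(f => f.category === scope);
}
```

Dropped findings are worth counting: a high rejection rate for one agent usually points to an overlapping system prompt.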
2. Unbounded Token Spend
Explanation: Running five agents on a 50-line PR burns tokens on ceremony. Each agent carries a fixed context overhead, so on small diffs cost scales with agent count rather than diff size. Without gating, budgets explode on trivial changes.
Fix: Implement size-based routing. Use git diff --stat to calculate changed lines. Route PRs under 150 lines to a single fast model. Trigger multi-agent orchestration only above threshold.
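The routing rule can be sketched like this. The sketch parses `git diff --numstat`, the machine-readable sibling of `--stat`, because its tab-separated rows are easier to sum. `countChangedLines` and `chooseRoute` are hypothetical names, and the default threshold of 150 is taken from the text above.

```typescript
type Route = 'single-model' | 'multi-agent';

// Sum added + deleted lines from `git diff --numstat` output.
// Each row is "<added>\t<deleted>\t<path>"; binary files report "-" for both counts.
function countChangedLines(numstat: string): number {
  let total = 0;
  for (const row of numstat.split('\n')) {
    const cols = row.split('\t');
    if (cols.length < 3) continue; // blank or malformed row
    const added = Number(cols[0]);
    const deleted = Number(cols[1]);
    if (!isNaN(added) && !isNaN(deleted)) total += added + deleted;
  }
  return total;
}

// PRs below the threshold go to a single fast model; the rest get the full fan-out.
function chooseRoute(changedLines: number, threshold: number = 150): Route {
  return changedLines < threshold ? 'single-model' : 'multi-agent';
}
```

Binary files are skipped rather than counted, so a PR that only swaps an image does not trigger the multi-agent path.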
3. Context Cache Neglect
Explanation: Each agent re-reads repository context from scratch. Without prompt caching, identical baseline code is tokenized once per agent, so with three or four agents input costs rise roughly 3-4x.
Fix: Make sure prompt caching is active, and keep the shared context (README.md, architecture docs, shared type definitions) identical and at the start of every agent's prompt so the cached prefix can be reused. Cache hits lower per-agent input costs.
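Cache reuse generally depends on the start of each prompt being identical across agents. The sketch below only shows how to keep that shared prefix byte-identical. It does not call any caching API, and whether your runtime exposes cache controls is an assumption to verify. `ContextDoc`, `buildSharedPrefix`, and `buildAgentPrompt` are hypothetical names.

```typescript
interface ContextDoc {
  path: string;
  content: string;
}

interface AgentPrompt {
  sharedPrefix: string; // identical for every agent, so a cache can reuse it
  scopedSuffix: string; // differs per agent
}

// Sort by path so the prefix does not depend on the order files were read in.
function buildSharedPrefix(docs: ContextDoc[]): string {
  return docs
    .slice()
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map(d => `### ${d.path}\n${d.content}`)
    .join('\n\n');
}

// Shared context first, scope-specific instructions and the diff last.
function buildAgentPrompt(docs: ContextDoc[], scopeInstructions: string, diff: string): AgentPrompt {
  return {
    sharedPrefix: buildSharedPrefix(docs),
    scopedSuffix: `${scopeInstructions}\n\n${diff}`,
  };
}
```

Anything that varies per agent or per run (timestamps, agent ids, scope text) belongs in the suffix; a single differing character early in the prompt defeats prefix reuse.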
4. Verdict Automation
Explanation: Teams treat consolidated output as a final approval gate. AI lacks product context, business logic awareness, and architectural intent. Automating merges based on AI review alone invites production incidents.
Fix: Design output as a mechanical checklist. Require human sign-off for architecture, naming, and business alignment. AI surfaces; humans decide.
5. Shallow Adversarial Scoping
Explanation: Security agents default to matching known-bad patterns (eval, unsafe-inline). They miss logic flaws, IDOR vulnerabilities, and race conditions because their prompts lack threat-modeling structure.
Fix: Structure security prompts around attack vectors: input validation, privilege escalation, data leakage, state manipulation. Force enumeration before conclusion.
6. Merge Logic Conflicts
Explanation: Two agents flag the same line with contradictory suggestions. Without resolution logic, developers receive noise instead of direction.
Fix: Implement conflict resolution in the consolidation layer. Prioritize critical severity. If severity matches, prefer the agent with higher historical accuracy. Log contradictions for human review.
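One way to sketch this resolution step, under stated assumptions: findings are grouped by file and line, the most severe finding wins (a generalization of "prioritize critical"), ties go to the agent with the higher historical accuracy, and any group with differing suggestions is returned for logging. The `accuracy` map and all names here are hypothetical.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  agentId: string;
  file: string;
  line: number;
  severity: Severity;
  suggestion?: string;
}

interface Resolution {
  kept: Finding[];             // one winner per file:line
  contradictions: Finding[][]; // groups whose suggestions disagree, for human review
}

const RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function resolveConflicts(findings: Finding[], accuracy: Record<string, number>): Resolution {
  // Group findings that target the same file and line.
  const groups: Record<string, Finding[]> = {};
  for (const f of findings) {
    const key = `${f.file}:${f.line}`;
    (groups[key] = groups[key] || []).push(f);
  }

  const kept: Finding[] = [];
  const contradictions: Finding[][] = [];
  for (const key of Object.keys(groups)) {
    const group = groups[key];
    // Most severe first; on a tie, the historically more accurate agent first.
    const winner = group.slice().sort((a, b) =>
      RANK[a.severity] - RANK[b.severity] ||
      (accuracy[b.agentId] ?? 0) - (accuracy[a.agentId] ?? 0)
    )[0];
    kept.push(winner);

    const suggestions = group
      .map(f => f.suggestion)
      .filter((s): s is string => s !== undefined);
    const distinct = suggestions.filter((s, i) => suggestions.indexOf(s) === i);
    if (distinct.length > 1) contradictions.push(group);
  }
  return { kept, contradictions };
}
```

Agents missing from the accuracy map default to 0, so a new agent never outranks an established one on a tie.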
7. CI/CD Bottlenecks
Explanation: Multi-agent review blocks PR merges while waiting for all agents to finish. Slow agents or API rate limits stall pipelines.
Fix: Decouple review from merge gates. Post findings as PR comments asynchronously. Use status checks only for critical security agents. Allow developers to address findings without blocking CI.
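The gating rule can be expressed as a small pure function. This is a sketch: the conclusion names are modeled loosely on CI check conclusions, and `gateConclusion` and the `blockingAgents` parameter are hypothetical.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';
type CheckConclusion = 'failure' | 'neutral';

interface GateFinding {
  agentId: string;
  severity: Severity;
}

// Only critical findings from designated agents block the merge.
// Everything else is reported as a comment and leaves the check neutral.
function gateConclusion(
  findings: GateFinding[],
  blockingAgents: string[] = ['security']
): CheckConclusion {
  const blocking = findings.some(
    f => f.severity === 'critical' && blockingAgents.indexOf(f.agentId) !== -1
  );
  return blocking ? 'failure' : 'neutral';
}
```

Because the function ignores agents that have not reported yet, it can be evaluated as soon as the security agent finishes, without waiting for slower scopes.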
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Trivial PR (<50 lines) | Single fast model or manual merge | Orchestration overhead exceeds value | Minimal |
| Medium refactor (150-400 lines) | Multi-agent with correctness + testing scopes | Catches regression patterns without full suite | Moderate |
| Large feature (>500 lines) | Full multi-agent orchestration | Attention dilution makes single-pass unreliable | High |
| Security-critical module | Dedicated security agent + human audit | AI misses business-logic flaws; human verifies intent | High + human time |
| Dependency bump | Automated lint + single model | Structural analysis unnecessary for version updates | Low |
Configuration Template
review_orchestrator:
  size_threshold: 200
  cache_enabled: true
  cache_ttl_minutes: 30
  token_budget_per_agent: 8000
  parallel_limit: 4
  agents:
    - id: correctness
      scope: correctness
      system_prompt: |
        Analyze the diff for logical errors, type mismatches, and control flow issues.
        Focus on edge cases, null propagation, and state mutations.
        Return findings as JSON with file, line, severity, description, suggestion.
      max_tokens: 6000
    - id: security
      scope: security
      system_prompt: |
        Evaluate the diff for injection vectors, privilege escalation, data exposure,
        and insecure deserialization. Enumerate attack paths before concluding.
        Return findings as JSON with file, line, severity, description, suggestion.
      max_tokens: 7000
    - id: performance
      scope: performance
      system_prompt: |
        Identify N+1 queries, unnecessary allocations, synchronous blocking calls,
        and algorithmic inefficiencies. Suggest concrete optimizations.
        Return findings as JSON with file, line, severity, description, suggestion.
      max_tokens: 5000
  consolidation:
    dedup_strategy: file_line_severity
    severity_priority: [critical, high, medium, low]
    conflict_resolution: prefer_critical_or_log
    output_format: markdown_checklist
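A configuration like the template above is worth checking before any agent is spawned. The sketch below validates an already-parsed config object (YAML parsing is left to whichever library you use). The interfaces cover only the fields it checks, `validateConfig` is a hypothetical name, and the rule that `max_tokens` must fit within `token_budget_per_agent` is an assumption about how those two settings relate.

```typescript
interface AgentEntry {
  id: string;
  scope: string;
  system_prompt: string;
  max_tokens: number;
}

interface OrchestratorConfig {
  size_threshold: number;
  token_budget_per_agent: number;
  parallel_limit: number;
  agents: AgentEntry[];
}

// Returns human-readable problems; an empty list means the config passed.
function validateConfig(config: OrchestratorConfig): string[] {
  const problems: string[] = [];
  if (config.parallel_limit < 1) problems.push('parallel_limit must be at least 1');
  if (config.size_threshold < 0) problems.push('size_threshold must not be negative');
  if (config.agents.length === 0) problems.push('at least one agent is required');

  // Prototype-free object so ids such as "constructor" are not false duplicates.
  const seen: Record<string, boolean> = Object.create(null);
  for (const agent of config.agents) {
    if (seen[agent.id]) problems.push(`duplicate agent id: ${agent.id}`);
    seen[agent.id] = true;
    // Assumption: max_tokens is meant to fit inside token_budget_per_agent.
    if (agent.max_tokens > config.token_budget_per_agent) {
      problems.push(
        `${agent.id}: max_tokens ${agent.max_tokens} exceeds budget ${config.token_budget_per_agent}`
      );
    }
    if (agent.system_prompt.trim() === '') problems.push(`${agent.id}: system_prompt is empty`);
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a CI step report every misconfiguration in one run.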
Quick Start Guide
- Install the runtime: Ensure Claude Code CLI is installed and authenticated. Verify MCP servers and local tooling are accessible.
- Configure scopes: Copy the configuration template. Adjust size_threshold, token_budget_per_agent, and system prompts to match your codebase conventions.
- Initialize the orchestrator: Run node review-orchestrator.js --diff pr-142.diff --config review-config.yaml. The script spawns parallel agents, waits for completion, and outputs a consolidated markdown checklist.
- Integrate with CI: Add a GitHub Action or GitLab CI step that triggers the orchestrator on PR creation. Configure it to post findings as comments rather than blocking status checks.
- Validate and iterate: Review the first 10 runs. Check token logs, verify deduplication accuracy, and refine system prompts based on false positive patterns. Adjust thresholds as your team adopts the workflow.