familiarity, risk).
2. Topology Selector: Maps task shape to a topology (Solo, Hierarchical, Parallel-Specialized).
3. Harness Pruner: Constructs the execution environment by removing irrelevant tools and context.
4. Model Tier Router: Assigns sub-tasks to models based on difficulty and cost constraints.
5. Verification Gates: Inserts deterministic checks between phases to prevent drift.
Implementation (TypeScript)
The following implementation demonstrates a router that enforces hierarchical decomposition for macro tasks, prunes the harness to reduce noise, and applies model tiering.
// Core Types
type TaskScope = 'micro' | 'macro';
type Parallelism = 'sequential' | 'concurrent';
type Familiarity = 'known' | 'unknown';
type RiskLevel = 'low' | 'high';

interface TaskProfile {
  scope: TaskScope;
  parallelism: Parallelism;
  familiarity: Familiarity;
  risk: RiskLevel;
  description: string;
}

type AgentTopology = 'solo' | 'hierarchical' | 'parallel-specialized';
type ModelTier = 'economy' | 'workhorse' | 'frontier';

interface HarnessConfig {
  tools: string[];
  contextLimit: number;
  deterministicChecks: boolean;
}
// 1. Topology Selection Logic
function selectTopology(profile: TaskProfile): AgentTopology {
  // Experts on familiar code should avoid heavy AI to prevent review traps
  if (profile.familiarity === 'known' && profile.scope === 'micro') {
    return 'solo';
  }
  // Parallelism requires specialized agents to avoid context collision
  if (profile.parallelism === 'concurrent') {
    return 'parallel-specialized';
  }
  // Macro tasks need hierarchy to prevent flat-plan drift
  if (profile.scope === 'macro') {
    return 'hierarchical';
  }
  return 'solo';
}
// 2. Harness Pruning Strategy
// Reduces noise by removing tools irrelevant to the task shape
function buildHarness(topology: AgentTopology, profile: TaskProfile): HarnessConfig {
  const baseTools = ['file_read', 'file_write', 'terminal_exec', 'search', 'test_runner'];
  // Economy tier tasks don't need complex search or test tools
  const isEconomy = profile.familiarity === 'known' && profile.risk === 'low';

  let tools = baseTools;
  if (isEconomy) {
    tools = tools.filter(t => !['search', 'test_runner'].includes(t));
  }
  // Parallel topologies require strict file locking tools
  if (topology === 'parallel-specialized') {
    tools = [...tools, 'file_lock', 'diff_merge'];
  }

  return {
    tools,
    contextLimit: topology === 'hierarchical' ? 4096 : 8192, // Hierarchy allows smaller context windows
    deterministicChecks: true, // Always enforce checks in production
  };
}
// 3. Model Tier Routing
// Routes based on ambiguity and risk, not just task type
function routeToModelTier(taskType: string, profile: TaskProfile): ModelTier {
  // Normalize once so both the keyword scan and the exact match are case-insensitive
  const normalized = taskType.toLowerCase();
  const ambiguityKeywords = ['architecture', 'refactor', 'security', 'debug', 'ambiguous'];
  const isAmbiguous = ambiguityKeywords.some(k => normalized.includes(k));

  if (profile.risk === 'high' || isAmbiguous) {
    return 'frontier';
  }
  if (['boilerplate', 'formatting', 'docs', 'search'].includes(normalized)) {
    return 'economy';
  }
  return 'workhorse';
}
// 4. Hierarchical Decomposition Generator
// Enforces Goal -> Milestone -> Interface -> File -> Gate structure
function generateHierarchy(goal: string): Record<string, any> {
  return {
    goal,
    milestones: [
      {
        id: 'm1',
        description: 'Define interfaces and contracts',
        verification: 'Interface signature review',
        subtasks: [
          { type: 'spec', tier: 'frontier' },
          { type: 'review', tier: 'workhorse' }
        ]
      },
      {
        id: 'm2',
        description: 'Implement core logic',
        verification: 'Unit test pass rate > 90%',
        subtasks: [
          { type: 'implementation', tier: 'workhorse' },
          { type: 'test_generation', tier: 'workhorse' }
        ]
      }
    ],
    gates: ['spec_approved', 'tests_passing', 'diff_reviewed']
  };
}
// Orchestrator Example
async function executeTask(task: TaskProfile) {
  const topology = selectTopology(task);
  const harness = buildHarness(topology, task);

  console.log(`Selected Topology: ${topology}`);
  console.log(`Harness Tools: ${harness.tools.join(', ')}`);

  if (topology === 'hierarchical') {
    const plan = generateHierarchy(task.description);
    // Execute milestones sequentially with verification gates
    for (const milestone of plan.milestones) {
      const tier = routeToModelTier(milestone.description, task);
      console.log(`Executing ${milestone.id} with ${tier} model`);
      // Run subtasks, then verify
      // if (!verify(milestone.verification)) throw new Error('Gate failed');
    }
  }
}
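The orchestrator leaves the `verify` call commented out and undefined. One way to fill that gap is sketched below, assuming gates are plain synchronous predicates over a milestone's recorded results. `MilestoneResult`, `GateCheck`, `defaultGates`, `runGates`, and `assertGates` are hypothetical names introduced for illustration; they are not part of the router above.

```typescript
// Hypothetical result shape for one milestone (an assumption, not from the router above).
interface MilestoneResult {
  milestoneId: string;
  testsRun: number;
  testsPassed: number;
  specApproved: boolean;
}

// A gate is a named, deterministic predicate: no model calls, no randomness.
interface GateCheck {
  name: string;
  check: (result: MilestoneResult) => boolean;
}

// Assumed gates, loosely mirroring the 'spec_approved' and 'tests_passing' gate names.
const defaultGates: GateCheck[] = [
  { name: 'spec_approved', check: r => r.specApproved },
  {
    name: 'tests_passing',
    // Mirrors the "pass rate > 90%" wording; running zero tests counts as a failure.
    check: r => r.testsRun > 0 && r.testsPassed / r.testsRun > 0.9,
  },
];

// Returns the names of failed gates; an empty array means the milestone may proceed.
function runGates(result: MilestoneResult, gates: GateCheck[] = defaultGates): string[] {
  return gates.filter(g => !g.check(result)).map(g => g.name);
}

// Throws if any gate fails, so drift is stopped at the milestone boundary.
function assertGates(result: MilestoneResult, gates: GateCheck[] = defaultGates): void {
  const failed = runGates(result, gates);
  if (failed.length > 0) {
    throw new Error(`Gate failed for ${result.milestoneId}: ${failed.join(', ')}`);
  }
}
```

Returning the list of failed gate names, rather than a bare boolean, keeps the failure reason available for logging before the orchestrator aborts.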
Architecture Decisions
- Hierarchical Decomposition: Flat plans fail on long-horizon tasks because the agent cannot maintain coherence across a large sequence of steps. The hierarchy limits compounding error by breaking work into milestones with explicit interfaces. Context-grounding hooks before each stage and validation hooks after improve judged quality and pass rates, as evidenced by Spec Kit Agents.
- Harness Pruning: Adding tools and context does not improve performance; it increases noise. The buildHarness function removes tools like search and test_runner for economy tasks where they are unnecessary. This aligns with findings that harness design can cause 6x variance. Orchestration logic should be deterministic outside the model where possible.
- Model Tiering: Token consumption varies wildly, and higher spend does not guarantee accuracy. The router assigns economy models to boilerplate, formatting, and search. Workhorse models handle standard implementation. Frontier models are reserved for ambiguity, security, and final review. This measured escalation controls costs while preserving quality where it matters.
- Verification Gates: Specs define intent; tests define success. The hierarchy includes verification steps (e.g., interface review, test pass rates) between milestones. This ensures that drift is detected early. A spec alone does not make agents brilliant; it makes failures visible earlier.
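The idea of context-grounding hooks before a stage and validation hooks after it can be sketched as a small wrapper. This is a minimal illustration under stated assumptions, not the Spec Kit Agents API: `StageContext`, `GroundingHook`, `ValidationHook`, and `runStage` are hypothetical names, and the hooks are synchronous for brevity.

```typescript
// Hypothetical shapes; the text names the hook idea but does not define an API.
interface StageContext {
  stage: string;
  grounding: string[]; // facts gathered before the stage runs
}

// A grounding hook returns facts to add to the context (e.g. repo conventions).
type GroundingHook = (ctx: StageContext) => string[];

// A validation hook returns a problem description, or null when the output is acceptable.
type ValidationHook<T> = (output: T, ctx: StageContext) => string | null;

function runStage<T>(
  stage: string,
  before: GroundingHook[],
  work: (ctx: StageContext) => T,
  after: ValidationHook<T>[],
): T {
  const ctx: StageContext = { stage, grounding: [] };

  // Grounding runs first so the work step starts from gathered facts.
  for (const hook of before) {
    ctx.grounding.push(...hook(ctx));
  }

  const output = work(ctx);

  // Validation runs deterministically on the output, outside the model.
  const problems = after
    .map(hook => hook(output, ctx))
    .filter((p): p is string => p !== null);

  if (problems.length > 0) {
    throw new Error(`Stage "${stage}" failed validation: ${problems.join('; ')}`);
  }
  return output;
}
```

In a real pipeline the `work` callback would wrap the model call, while both hook lists stay ordinary code that can be unit-tested on its own.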
Pitfall Guide
1. Flat Plan Collapse
- Explanation: Attempting to execute a long-horizon task with a single flat list of steps causes the agent to lose context and drift. Compounding errors accumulate, leading to incoherent output.
- Fix: Enforce hierarchical decomposition. Structure work as Goal → Milestones → Interfaces → File-level tasks → Verification gates. Use context-grounding hooks before each stage.
2. The Expert Review Trap
- Explanation: Experienced developers working on familiar codebases can be slowed down by AI. The METR study showed a 19% increase in completion time because the cost of reviewing and correcting plausible AI output exceeded generation savings.
- Fix: For experts on known code, scope AI to narrow support tasks: search, test scaffolds, migration drafts, and review checklists. Avoid full-code generation for precise edits in familiar systems.
3. Harness Bloat
- Explanation: Providing agents with excessive tools, irrelevant context, or complex orchestration logic degrades performance. The model may hallucinate tool usage or become confused by noise.
- Fix: Audit the harness rigorously. Remove tools the agent does not need. Keep irrelevant context out of the prompt. Place orchestration logic where the model can understand it, or move it to deterministic code.
4. Parallel Context Collision
- Explanation: Fanning out parallel agents over tasks that share mutable context leads to conflicts. Agents may overwrite each other's changes or operate on stale state.
- Fix: Only use parallel agents when tasks are genuinely independent. Implement explicit handoffs with changed files, task intent, and known risks. Use file locking or read-only sharing mechanisms.
5. Frontier Model Default
- Explanation: Routing every step to the most expensive model inflates costs without improving accuracy. Agentic tasks can consume far more tokens than simple chat, and higher token spend does not reliably correlate with higher accuracy.
- Fix: Implement model tiering. Route easy steps (search, boilerplate, formatting) to cheaper models. Reserve frontier models for ambiguity, architecture, security, and review.
6. Spec-Test Misalignment
- Explanation: Writing a spec that defines constraints but no tests, or tests that do not reflect the spec, leads to drift. Specs keep agents from hallucinating APIs and ignoring repo conventions; tests encode executable signals of success.
- Fix: Pair specs with tests. The spec should name behavior, constraints, non-goals, and acceptance criteria. Tests should prove the change works. Use a final review pass to read the diff against the original spec.
7. Review Cost Blindness
- Explanation: Focusing solely on generation speed ignores the cost of review. AI can produce plausible code that requires significant cleanup, especially in production environments.
- Fix: Include deterministic checks between phases. Use specialized reviewer agents to catch different bug classes before human review. Assess the review cost before enabling AI for a task.
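For the Parallel Context Collision pitfall above, "genuinely independent" can be checked before fan-out rather than discovered afterwards. The sketch below assumes each sub-task declares up front which files it reads and writes. `SubTask`, `findConflicts`, and `canFanOut` are hypothetical names, and a shared read-only file is treated as safe.

```typescript
// Hypothetical descriptor: the files a sub-task declares it will read and write.
interface SubTask {
  id: string;
  reads: string[];
  writes: string[];
}

// Two tasks conflict when either one writes a file the other reads or writes.
function conflicts(a: SubTask, b: SubTask): boolean {
  const aTouches = [...a.reads, ...a.writes];
  const bTouches = [...b.reads, ...b.writes];
  return a.writes.some(f => bTouches.includes(f)) || b.writes.some(f => aTouches.includes(f));
}

// Returns every conflicting pair of task ids, in input order.
function findConflicts(tasks: SubTask[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  tasks.forEach((a, i) => {
    tasks.slice(i + 1).forEach(b => {
      if (conflicts(a, b)) pairs.push([a.id, b.id]);
    });
  });
  return pairs;
}

// Fan out only when no pair of tasks shares mutable state.
function canFanOut(tasks: SubTask[]): boolean {
  return findConflicts(tasks).length === 0;
}
```

When `canFanOut` returns false, the conflicting pairs indicate which tasks to run sequentially or to guard with a file lock.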
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Solo Dev, Small Project | Single Strong Agent | Lower overhead, faster iteration for sequential work. | Low |
| Expert on Familiar Code | Minimal AI / Scaffolds | Avoids review trap; AI can slow down experts by 19%. | Low |
| Large Feature, Parallelizable | Planner + Parallel Implementers | Real parallelism with scoped context; reduces latency. | Medium |
| Production Code Review | Specialized Reviewer Agents | Fresh passes catch different bug classes; improves F1. | Medium |
| Long-Horizon Project | Hierarchical Decomposition | Prevents flat-plan drift; limits compounding error. | Medium |
| Security-Sensitive Change | Frontier Model + Reviewer | Handles ambiguity and risk; ensures correctness. | High |
| Cost-Sensitive Pipeline | Model Tiering | Spends frontier tokens only where needed; controls variance. | Low/Medium |
Configuration Template
Use this YAML template to configure a task-shape router for your agentic pipeline. Adjust tiers and tools based on your specific model availability and cost constraints.
task_router:
  topologies:
    solo:
      max_context: 8192
      tools: [file_read, file_write, terminal_exec]
      model_tier: workhorse
    hierarchical:
      max_context: 4096
      tools: [file_read, file_write, search]
      model_tier: workhorse
      decomposition:
        levels: [goal, milestone, interface, file, gate]
        verification: true
    parallel_specialized:
      max_context: 8192
      tools: [file_read, file_write, file_lock, diff_merge]
      model_tier: workhorse
      handoffs:
        explicit: true
        include: [changed_files, task_intent, tests_run, risks]
  model_tiers:
    economy:
      models: [local-7b, cheap-api]
      tasks: [search, summarization, boilerplate, formatting, docs]
    workhorse:
      models: [mid-tier-api]
      tasks: [implementation, test_generation, refactors]
    frontier:
      models: [top-tier-api]
      tasks: [architecture, debugging, security, review, ambiguity]
  harness_pruning:
    remove_unused_tools: true
    context_relevance_filter: true
    deterministic_orchestration: true
  verification:
    gates_between_phases: true
    spec_test_pairing: true
    diff_review_against_spec: true
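Before wiring a config like this into a pipeline, it may help to check it for internal consistency. The sketch below assumes the YAML has already been parsed into a plain object by whatever YAML library the project uses, and that the object passed in is the value under `task_router`. `RouterConfig`, `TopologyConfig`, `TierConfig`, and `validateRouterConfig` are hypothetical names covering only a subset of the template's keys.

```typescript
// Hypothetical types mirroring a subset of the template; other keys are ignored here.
interface TopologyConfig {
  max_context: number;
  tools: string[];
  model_tier: string;
}

interface TierConfig {
  models: string[];
  tasks: string[];
}

interface RouterConfig {
  topologies: Record<string, TopologyConfig>;
  model_tiers: Record<string, TierConfig>;
}

// Returns human-readable problems; an empty array means the config is consistent.
function validateRouterConfig(cfg: RouterConfig): string[] {
  const errors: string[] = [];
  const tierNames = Object.keys(cfg.model_tiers);

  for (const [name, topo] of Object.entries(cfg.topologies)) {
    // Every topology must point at a tier that is actually defined.
    if (!tierNames.includes(topo.model_tier)) {
      errors.push(`topology "${name}" references unknown tier "${topo.model_tier}"`);
    }
    if (topo.max_context <= 0) {
      errors.push(`topology "${name}" has a non-positive max_context`);
    }
  }

  // A task listed under two tiers would make routing ambiguous.
  const owner = new Map<string, string>();
  for (const [tier, tierCfg] of Object.entries(cfg.model_tiers)) {
    for (const task of tierCfg.tasks) {
      const previous = owner.get(task);
      if (previous !== undefined) {
        errors.push(`task "${task}" is assigned to both "${previous}" and "${tier}"`);
      } else {
        owner.set(task, tier);
      }
    }
  }
  return errors;
}
```

Running this check at startup keeps a typo in a tier name from surfacing later as a silent fallback to the wrong model.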
Quick Start Guide
- Define Task Profile: Create a TaskProfile object for your request, specifying scope, parallelism, familiarity, and risk.
- Run Router: Execute selectTopology and buildHarness to determine the topology and prune the environment.
- Generate Hierarchy: For macro tasks, call generateHierarchy to create milestones with verification gates.
- Execute with Tiering: Route sub-tasks using routeToModelTier. Run milestones sequentially, verifying gates between steps.
- Review and Iterate: Inspect the output against the spec. If verification fails, analyze the harness and topology before escalating model tier.
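The last step, checking the harness and topology before escalating the model tier, can be written as an explicit decision order. The sketch below is one possible encoding, not a prescribed policy. `FailureReport`, `NextStep`, and `nextStep` are hypothetical names, and the two diagnostic signals (unused tools, shared-state conflicts) are assumed stand-ins for whatever a real pipeline can measure.

```typescript
type ModelTier = 'economy' | 'workhorse' | 'frontier';

// Next tier up for each tier; null means there is nothing left to escalate to.
const nextTier: Record<ModelTier, ModelTier | null> = {
  economy: 'workhorse',
  workhorse: 'frontier',
  frontier: null,
};

// Hypothetical diagnosis of a failed gate (field names are assumptions).
interface FailureReport {
  unusedTools: string[];        // tools exposed in the harness but never called
  sharedStateConflict: boolean; // parallel agents touched the same files
  currentTier: ModelTier;
}

type NextStep =
  | { kind: 'prune_harness'; remove: string[] }
  | { kind: 'change_topology' }
  | { kind: 'escalate'; to: ModelTier }
  | { kind: 'human_review' };

// Cheapest fixes first: harness, then topology, then model tier.
function nextStep(report: FailureReport): NextStep {
  if (report.unusedTools.length > 0) {
    return { kind: 'prune_harness', remove: report.unusedTools };
  }
  if (report.sharedStateConflict) {
    return { kind: 'change_topology' };
  }
  const higher = nextTier[report.currentTier];
  if (higher !== null) {
    return { kind: 'escalate', to: higher };
  }
  // Assumed fallback: once the frontier tier has failed, hand off to a person.
  return { kind: 'human_review' };
}
```

Encoding the order in code keeps a more expensive model from becoming the default response to every failed gate.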