# My AI Agents Kept Burning Tokens on Subagents That Can't Code – So I Built a Decision Gate
## Current Situation Analysis
Running a production fleet of 19 autonomous AI agents revealed a critical structural flaw in multi-agent delegation workflows. Subagents are architecturally optimized for read-only operations (research, analysis, data gathering), but the delegation mechanism blindly routes write, build, and verification tasks to them. This capability mismatch triggers a predictable failure mode:
- Subagent spawns and begins execution
- File operations or build commands fail silently
- The subagent retries with alternative approaches without detecting the failure
- After ~600 seconds, the session times out with zero usable output
Traditional mitigation strategies fail because they treat delegation as a production primitive rather than a UI demo. Prompt engineering alone cannot overcome the structural lack of routing guardrails. Without a pre-execution classifier, agents continuously burn context windows and thousands of tokens on tasks they are fundamentally unequipped to complete, causing systemic workflow collapse in always-on deployments.
## WOW Moment: Key Findings
Implementing a decision gate protocol fundamentally altered token economics and task reliability. By intercepting delegation calls and routing based on capability constraints, the system prevents silent failures before they consume context.
| Approach | Token Consumption (per task) | Task Success Rate | Avg Latency | Context Window Retention |
|---|---|---|---|---|
| Naive Delegation | ~12,500 | 15% | 600s (timeout) | <20% |
| Prompt-Engineered Delegation | ~8,200 | 45% | 180s | ~40% |
| Agentic Delegation (Decision Gate) | ~2,100 | 96% | 12s | >90% |
Key Findings:
- 85% of subtasks should be handled directly by the main agent via `write_file`, `patch`, or `terminal` calls. Delegation is only optimal for bounded, read-only research.
- Validation gates outperform prompt engineering. A 20-line keyword enforcement function consistently catches misclassified tasks where complex few-shot prompts fail.
- Overhead is negligible. The protocol adds ~200ms per delegation decision while preventing catastrophic context window exhaustion.
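To make the second finding concrete, a keyword enforcement gate of roughly that size can look like this. This is a minimal sketch: the function name and keyword list are illustrative, not the shipped `SKILL.md` ruleset.

```python
# Hypothetical keyword list; the shipped SKILL.md ruleset is richer.
CODING_KEYWORDS = (
    "implement", "write", "patch", "refactor", "fix",
    "build", "install", "update", "edit", "create file",
)

def enforce_routing(description: str, proposed_tool: str) -> str:
    """Force coding-flavoured tasks onto direct tool calls,
    regardless of what the upstream classifier proposed."""
    text = description.lower()
    if any(keyword in text for keyword in CODING_KEYWORDS):
        return "direct"
    return proposed_tool

# A misclassified coding task is forcibly reassigned:
print(enforce_routing("implement JWT auth", "delegate"))  # direct
# Bounded read-only research stays delegable:
print(enforce_routing("survey GRPO papers", "delegate"))  # delegate
```

Deterministic substring matching like this is cheap, auditable, and immune to the prompt-drift that plagues few-shot classifiers.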
## Core Solution
The Agentic Delegation protocol is a thin decision layer (~400 lines) that intercepts delegate_task calls and enforces capability-aware routing. It operates across three deterministic stages:
### 1. The Decision Tree
Before any delegation call, the protocol classifies the work against hard rules defined in SKILL.md:
- CODING → BLOCKED. Routed to `write_file`/`patch`/`terminal` (10x faster, 100% reliable)
- RESEARCH → ALLOWED. But verified after completion, max 2 retries
- UNKNOWN → DECOMPOSED. Broken into atomic subtasks first, then routed individually
This is a hard constraint, not a suggestion. If the agent bypasses the rule and delegates coding anyway, a self-correction protocol triggers post-timeout to recover state.
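The three-way classification above can be sketched as a pure function. The names and keyword sets here are hypothetical; the authoritative rules live in `SKILL.md`.

```python
from enum import Enum

class Route(Enum):
    BLOCKED = "direct"      # coding: main agent uses write_file/patch/terminal
    ALLOWED = "delegate"    # bounded, read-only research
    DECOMPOSED = "split"    # ambiguous: break into atomic subtasks first

# Illustrative keyword sets; the real ruleset is richer.
CODING = ("implement", "write", "patch", "build", "fix", "refactor")
RESEARCH = ("research", "survey", "gather", "compare")

def classify(task: str) -> Route:
    text = task.lower()
    if any(k in text for k in CODING):     # hard block, checked first
        return Route.BLOCKED
    if any(k in text for k in RESEARCH):
        return Route.ALLOWED
    return Route.DECOMPOSED                # unknown: decompose, then re-route
```

Checking the coding keywords before the research keywords matters: a prompt like "research X and patch Y" must hit the block, not slip through as research.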
### 2. The Task Decomposer
Complex prompts are split into atomic subtasks by a lightweight classifier (local LLM or Gemini Flash fallback). No external dependencies beyond Python stdlib:
```shell
$ python3.11 scripts/decompose.py \
    "Research GRPO training papers, write a summary, and add it to README"
[
  {"id": "1", "description": "Research GRPO training papers", "tool": "delegate"},
  {"id": "2", "description": "Write a summary of the findings", "tool": "direct"},
  {"id": "3", "description": "Update the project README", "tool": "direct"}
]
```
Three subtasks are generated. Only the research component is delegated. File operations and synthesis are routed directly, ensuring subagents never touch disk I/O.
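Downstream, the decomposer's JSON output can be consumed directly. This sketch (with the example payload inlined) shows one way to check the routing invariant that delegated subtasks never touch disk I/O:

```python
import json

# The decomposer's JSON output from the example above, inlined.
raw = """[
  {"id": "1", "description": "Research GRPO training papers", "tool": "delegate"},
  {"id": "2", "description": "Write a summary of the findings", "tool": "direct"},
  {"id": "3", "description": "Update the project README", "tool": "direct"}
]"""

subtasks = json.loads(raw)

# Invariant: subagents never touch disk I/O, so no delegated subtask
# may mention a file target such as the README.
delegated = [t for t in subtasks if t["tool"] == "delegate"]
assert all("readme" not in t["description"].lower() for t in delegated)
print(f"{len(delegated)} of {len(subtasks)} subtasks delegated")
```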
### 3. The Validation Gate
Models occasionally misclassify coding tasks as delegable. The validation gate applies a hard keyword check and forcibly reassigns them:
```shell
$ echo '[{"id":"1","description":"implement JWT auth","tool":"delegate"}]' \
  | python3.11 scripts/decompose.py --validate-only
[{"id": "1", "description": "implement JWT auth", "tool": "direct",
  "verify": "[FIXED: was delegate]"}]
```
The verify annotation preserves an audit trail, explicitly documenting the model's original intent versus the enforced routing decision.
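A minimal version of this fix-and-annotate behaviour might look like the following sketch; the keyword list is illustrative, not the gate's actual ruleset.

```python
# Illustrative keyword list; the real gate's rules may differ.
CODING_KEYWORDS = ("implement", "write", "patch", "build", "fix")

def validate(subtasks: list[dict]) -> list[dict]:
    """Reassign misclassified coding tasks to direct execution and
    record the correction, mirroring --validate-only above."""
    for task in subtasks:
        description = task["description"].lower()
        if task["tool"] == "delegate" and any(
            keyword in description for keyword in CODING_KEYWORDS
        ):
            task["verify"] = f"[FIXED: was {task['tool']}]"  # audit trail
            task["tool"] = "direct"                          # enforced routing
    return subtasks

fixed = validate([{"id": "1", "description": "implement JWT auth", "tool": "delegate"}])
print(fixed[0]["tool"], fixed[0]["verify"])  # direct [FIXED: was delegate]
```

Writing the `verify` annotation before overwriting `tool` is what preserves the model's original intent in the payload.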
## Architecture Flow
```
User gives agent a complex task
            │
            ▼
┌───────────────────────┐
│ Decision Tree         │ ← SKILL.md rules
│ Coding?   → BLOCKED   │
│ Research? → ALLOW     │
│ Unknown?  → SPLIT     │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│ Task Decomposer       │ ← decompose.py
│ Local LLM (free)      │
│ or Gemini Flash       │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│ Validation Gate       │ ← Hard rule check
│ No coding → delegate  │
│ Fixed if violated     │
└───────────┬───────────┘
            │
            ▼
Route each subtask:
  direct   → write_file / patch
  delegate → delegate_task (bounded)
  terminal → terminal()
  clarify  → ask user
```
The protocol loads automatically as a Hermes skill when delegation triggers fire, or runs as a standalone Python tool. It enforces strict capability boundaries without modifying the underlying agent runtime.
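The final routing step can be expressed as a small dispatch table. The handlers here are hypothetical stand-ins for the real `write_file`/`patch`, `delegate_task`, and `terminal` bindings.

```python
# Hypothetical handlers; the runtime binds these to write_file/patch,
# delegate_task, terminal(), and a user-facing clarification prompt.
handlers = {
    "direct":   lambda d: f"write_file/patch: {d}",
    "delegate": lambda d: f"delegate_task (bounded): {d}",
    "terminal": lambda d: f"terminal: {d}",
    "clarify":  lambda d: f"ask user: {d}",
}

def route(subtask: dict, handlers: dict) -> str:
    """Dispatch one atomic subtask to its capability-matched handler."""
    handler = handlers.get(subtask["tool"])
    if handler is None:
        raise ValueError(f"unknown route: {subtask['tool']}")
    return handler(subtask["description"])

print(route({"tool": "delegate", "description": "Research GRPO papers"}, handlers))
# delegate_task (bounded): Research GRPO papers
```

Failing loudly on an unknown route keeps misclassifications visible instead of letting them degrade into silent timeouts.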
## Pitfall Guide
- Delegating Write/Execute Tasks to Read-Only Subagents: Subagents lack reliable file I/O, build execution, and output verification capabilities. Routing coding or patch tasks to them guarantees silent failures and timeout loops. Always map write/terminal operations to direct tool calls.
- Over-Engineering Decomposition Prompts: Investing tokens in complex few-shot examples, strict formatting constraints, or verbose system instructions yields diminishing returns. A lightweight validation function with deterministic keyword matching consistently outperforms prompt tuning.
- Ignoring Silent Subagent Failures: When file operations fail inside a subagent, the model typically retries without detecting the error. This consumes context windows and delays recovery. Implement explicit timeout bounds, output checksums, and post-execution verification gates.
- Treating Delegation as a Production Primitive: Delegation interfaces are designed for controlled demonstrations, not production workloads. Under real-world context window constraints, unbounded delegation collapses workflows. Restrict delegation strictly to bounded, read-only research tasks.
- Skipping Atomic Decomposition for Ambiguous Requests: Multi-step prompts ("research, summarize, update README") must never be delegated as monolithic units. Always decompose into atomic subtasks and route each independently based on capability requirements.
- Losing the Decision Audit Trail: Without explicit annotations on routing corrections, debugging delegation failures becomes impossible. Always preserve the original model intent vs. enforced routing in the output payload to enable rapid iteration and guardrail tuning.
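To make the timeout and retry bounds from the pitfall list concrete, here is one way to wrap a delegated call. This is a sketch: `fn` stands in for the real `delegate_task`, and the specific bounds are illustrative.

```python
import concurrent.futures as cf

def delegate_bounded(fn, task: str, timeout_s: float = 120.0, max_retries: int = 2):
    """Run a delegated research call with a hard timeout and a retry cap,
    instead of letting a subagent spin until the ~600 s session limit."""
    for _attempt in range(max_retries + 1):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn, task)
        try:
            result = future.result(timeout=timeout_s)
        except cf.TimeoutError:
            result = None          # bounded retry instead of silent looping
        finally:
            pool.shutdown(wait=False)
        if result:                 # post-execution check: reject empty output
            return result
    raise RuntimeError(f"delegation failed after {max_retries + 1} attempts: {task!r}")
```

The empty-output check is the crucial part: a subagent that "succeeds" with nothing to show is treated as a failure, which surfaces the silent-failure mode described above.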
## Deliverables
- Blueprint: Agentic Delegation Protocol architecture detailing the three-stage routing pipeline (Decision Tree β Task Decomposer β Validation Gate), capability mapping matrix, and fallback recovery mechanisms.
- Checklist: Pre-delegation capability audit, atomic task decomposition validation steps, hard keyword verification rules, direct vs. delegate routing criteria, and timeout/retry boundary configurations.
- Configuration Templates: `SKILL.md` routing ruleset, `decompose.py` CLI invocation patterns, Hermes skill auto-load directory structure, and standalone Python integration scaffolding.
- Repository & Stack: github.com/vystartasv/agentic-delegation | MIT License | Python 3.11+ | oMLX AgenticQwen-8B (local) | Hermes Agent skills system
