nd a budget router that selects executors based on remaining allocation.
Step 1: Failure Classification Architecture
Traditional retry policies count attempts. They cannot distinguish between a network timeout and a guardrail rejection. The solution is a FailureResolver that maps every exception to an execution category. The resolver evaluates the exception type, extracts metadata like Retry-After headers, and returns a classification enum. Unknown exceptions defer to a default resolver, preserving backward compatibility.
import java.io.IOException;
import java.util.concurrent.TimeoutException;

public enum ExecutionCategory {
    TRANSIENT,
    PERMANENT,
    BUDGET_EXHAUSTED
}

@FunctionalInterface
public interface FailureResolver {
    // Returns null when this resolver cannot classify the cause.
    ExecutionCategory resolve(Throwable cause);
}

public final class DefaultFailureResolver implements FailureResolver {
    @Override
    public ExecutionCategory resolve(Throwable cause) {
        // Network-level faults are worth retrying.
        if (cause instanceof IOException || cause instanceof TimeoutException) {
            return ExecutionCategory.TRANSIENT;
        }
        // Rate limits clear after a cooldown, so they are retryable too.
        if (cause instanceof ProviderRateLimitException) {
            return ExecutionCategory.TRANSIENT;
        }
        // Retrying cannot fix a policy or schema rejection.
        if (cause instanceof GuardrailViolationException || cause instanceof InvalidSchemaException) {
            return ExecutionCategory.PERMANENT;
        }
        if (cause instanceof QuotaExhaustedException) {
            return ExecutionCategory.BUDGET_EXHAUSTED;
        }
        return null; // Defer to fallback resolver
    }
}
Composing resolvers uses a chain-of-responsibility pattern. Custom domain rules wrap the default resolver, returning null when they cannot classify the error.
import java.util.List;

public final class CompositeFailureResolver implements FailureResolver {
    private final List<FailureResolver> chain;

    public CompositeFailureResolver(FailureResolver... resolvers) {
        this.chain = List.of(resolvers); // Immutable copy; rejects null resolvers
    }

    @Override
    public ExecutionCategory resolve(Throwable cause) {
        // The first resolver that returns a non-null category wins.
        for (FailureResolver resolver : chain) {
            ExecutionCategory result = resolver.resolve(cause);
            if (result != null) return result;
        }
        return ExecutionCategory.TRANSIENT; // Safe default
    }
}
The retry engine consumes this classification. TRANSIENT triggers backoff with jitter. PERMANENT halts immediately. BUDGET_EXHAUSTED emits an interrupt event for human approval. This separation ensures the graph module remains decoupled from provider-specific SDKs, maintaining zero compile-time dependencies on Spring AI or web frameworks.
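The dispatch described above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the engine's actual implementation: `RetryEngine`, `Outcome`, and the `delayMillisForAttempt` function are hypothetical names, and the enum and resolver interface are redeclared only so the block compiles on its own.

```java
import java.util.concurrent.Callable;
import java.util.function.IntToLongFunction;

enum ExecutionCategory { TRANSIENT, PERMANENT, BUDGET_EXHAUSTED }

interface FailureResolver {
    ExecutionCategory resolve(Throwable cause);
}

/** Hypothetical result type; the article does not define one. */
enum Outcome { SUCCEEDED, HALTED, INTERRUPT_REQUESTED, RETRIES_EXHAUSTED }

final class RetryEngine {
    private final FailureResolver resolver;
    private final int maxAttempts;
    private final IntToLongFunction delayMillisForAttempt; // assumed backoff hook

    RetryEngine(FailureResolver resolver, int maxAttempts, IntToLongFunction delayMillisForAttempt) {
        this.resolver = resolver;
        this.maxAttempts = maxAttempts;
        this.delayMillisForAttempt = delayMillisForAttempt;
    }

    Outcome run(Callable<?> task) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                task.call();
                return Outcome.SUCCEEDED;
            } catch (Exception e) {
                ExecutionCategory category = resolver.resolve(e);
                if (category == ExecutionCategory.PERMANENT) {
                    return Outcome.HALTED; // retrying cannot help
                }
                if (category == ExecutionCategory.BUDGET_EXHAUSTED) {
                    return Outcome.INTERRUPT_REQUESTED; // hand off for human approval
                }
                // TRANSIENT (or an unclassified null): wait, then try again.
                boolean lastAttempt = attempt == maxAttempts - 1;
                if (!lastAttempt && !sleep(delayMillisForAttempt.applyAsLong(attempt))) {
                    return Outcome.RETRIES_EXHAUSTED; // thread interrupted while waiting
                }
            }
        }
        return Outcome.RETRIES_EXHAUSTED;
    }

    private static boolean sleep(long millis) {
        try {
            Thread.sleep(Math.max(0L, millis));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

Passing the delay as a function keeps the loop independent of any particular backoff formula, so a jittered or header-driven delay can be plugged in without touching the classification logic.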
Step 2: Budget-Gated Routing Architecture
Budget policies traditionally act as hard caps. Upgrading them to routing controllers requires exposing remaining allocation as a readable state. A BudgetTracker maintains run-level limits and provides O(1) access to remaining funds. A CostThresholdRouter compares remaining budget against a degradation threshold, selecting executors deterministically.
import java.math.BigDecimal;

public interface BudgetTracker {
    BigDecimal getRemaining();
    void consume(BigDecimal amount);
    boolean isBelowThreshold(BigDecimal threshold);
}

public final class CostThresholdRouter {
    private final BudgetTracker tracker;
    private final BigDecimal degradationThreshold;
    private final String primaryExecutor;
    private final String fallbackExecutor;

    public CostThresholdRouter(BudgetTracker tracker,
                               BigDecimal threshold,
                               String primary,
                               String fallback) {
        this.tracker = tracker;
        this.degradationThreshold = threshold;
        this.primaryExecutor = primary;
        this.fallbackExecutor = fallback;
    }

    // Reads live budget state on every call; no caching, no scoring.
    public String selectExecutor() {
        return tracker.isBelowThreshold(degradationThreshold)
                ? fallbackExecutor
                : primaryExecutor;
    }
}
The coordinator binds executors to the router. When a node executes, it queries the router, which reads the live budget counter: no LLM call, no complexity scoring, no heuristic guessing. The moment remaining funds drop below the threshold, routing switches deterministically. This is cheaper than ex-ante classification because reading in-memory state consumes no tokens and adds negligible latency, while complexity scoring spends tokens and adds latency before any work starts.
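The router depends on a `BudgetTracker` implementation that is not shown here. As a minimal sketch, an in-memory tracker could look like the following. `InMemoryBudgetTracker` is a hypothetical name, the clamping at zero is an assumption, and the interface is redeclared only to keep the block self-contained.

```java
import java.math.BigDecimal;

interface BudgetTracker {
    BigDecimal getRemaining();
    void consume(BigDecimal amount);
    boolean isBelowThreshold(BigDecimal threshold);
}

/** Hypothetical in-memory tracker; synchronized so concurrent nodes see one consistent counter. */
final class InMemoryBudgetTracker implements BudgetTracker {
    private BigDecimal remaining;

    InMemoryBudgetTracker(BigDecimal limit) {
        if (limit.signum() < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.remaining = limit;
    }

    @Override
    public synchronized BigDecimal getRemaining() {
        return remaining;
    }

    @Override
    public synchronized void consume(BigDecimal amount) {
        if (amount.signum() < 0) {
            throw new IllegalArgumentException("amount must be non-negative");
        }
        // Assumption: overspend clamps at zero rather than going negative.
        remaining = remaining.subtract(amount).max(BigDecimal.ZERO);
    }

    @Override
    public synchronized boolean isBelowThreshold(BigDecimal threshold) {
        // A single field comparison: constant time, no tokens spent.
        return remaining.compareTo(threshold) < 0;
    }
}
```

With a 5.00 limit and a 1.00 threshold, the check stays false until spending pushes the remaining balance under 1.00, at which point a router reading this tracker would switch to the fallback executor on its next call.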
Architecture Rationale
- Classification over counting: Attempt limits are blind. Category-driven routing respects error semantics and provider signals.
- State reading over heuristic scoring: Deterministic routing eliminates the chicken-and-egg problem of spending tokens to decide how to spend tokens.
- Composable resolvers: Domain-specific rules wrap defaults, preserving backward compatibility while enabling fine-grained control.
- Zero framework coupling: The execution engine operates on interfaces, keeping the graph module lightweight and provider-agnostic.
Pitfall Guide
1. Ignoring Retry-After Headers
Explanation: Blind exponential backoff violates server cooldown hints, causing repeated 429 responses during peak load.
Fix: Parse the Retry-After header from rate-limit exceptions. Override computed backoff with the explicit delay. Implement a header extractor in the failure resolver.
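A header extractor along these lines is one way to implement the fix. It is a sketch under assumptions: `RetryAfterParser` and its method names are hypothetical, and how the raw header string is pulled out of a rate-limit exception depends on the provider SDK, so that part is left out. The HTTP specification allows `Retry-After` to carry either a number of seconds or an HTTP date, so the sketch handles both.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class RetryAfterParser {
    private RetryAfterParser() {}

    /** Parses a Retry-After value; empty when the header is missing or malformed. */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.isBlank()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        // Form 1: delay in whole seconds, e.g. "120".
        if (value.chars().allMatch(c -> c >= '0' && c <= '9')) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(value)));
            } catch (NumberFormatException e) {
                return Optional.empty(); // digits only, but too large for a long
            }
        }
        // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        try {
            Instant retryAt = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration delay = Duration.between(now, retryAt);
            return Optional.of(delay.isNegative() ? Duration.ZERO : delay);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** The server's explicit delay wins; otherwise keep the computed backoff. */
    static Duration effectiveDelay(Duration computedBackoff, String headerValue, Instant now) {
        return parse(headerValue, now).orElse(computedBackoff);
    }
}
```

Taking `now` as a parameter rather than reading the system clock keeps the date branch deterministic in tests.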
2. Budget Scope Leakage
Explanation: Applying run-level budget limits to session-level routing causes premature degradation. A single expensive node exhausts the budget, forcing fallback for subsequent lightweight tasks.
Fix: Align budget scope with execution context. Use hierarchical limits: session cap, run cap, node cap. Reset counters per run, not per session.
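The hierarchical limits can be sketched as a single object that checks a charge against all three levels before committing it. `HierarchicalBudget` and its methods are hypothetical names chosen for illustration; the rule that a rejected charge leaves all counters untouched is an assumption.

```java
import java.math.BigDecimal;

final class HierarchicalBudget {
    private final BigDecimal runCap;
    private final BigDecimal nodeCap;
    private BigDecimal sessionRemaining;
    private BigDecimal runRemaining;

    HierarchicalBudget(BigDecimal sessionCap, BigDecimal runCap, BigDecimal nodeCap) {
        this.runCap = runCap;
        this.nodeCap = nodeCap;
        this.sessionRemaining = sessionCap;
        this.runRemaining = runCap.min(sessionCap);
    }

    /** Charges one node call; returns false without charging if any level would be exceeded. */
    synchronized boolean tryConsume(BigDecimal nodeCost) {
        if (nodeCost.compareTo(nodeCap) > 0
                || nodeCost.compareTo(runRemaining) > 0
                || nodeCost.compareTo(sessionRemaining) > 0) {
            return false;
        }
        runRemaining = runRemaining.subtract(nodeCost);
        sessionRemaining = sessionRemaining.subtract(nodeCost);
        return true;
    }

    /** Starts a new run: the run counter resets, the session counter keeps accumulating. */
    synchronized void startRun() {
        runRemaining = runCap.min(sessionRemaining);
    }

    synchronized BigDecimal runRemaining() {
        return runRemaining;
    }

    synchronized BigDecimal sessionRemaining() {
        return sessionRemaining;
    }
}
```

Because the node cap bounds any single charge, one expensive node cannot drain the run budget in a single step, and resetting only the run counter keeps the session total honest across runs.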
3. Hardcoding Fallback Thresholds
Explanation: Static dollar amounts ignore token price volatility and model updates. A $1.00 threshold may be too aggressive or too lenient depending on current pricing.
Fix: Derive thresholds from the cost estimator rather than fixed dollar amounts. Express the threshold as a percentage of the run's total budget (e.g., 20%) so it scales with the limit. Reconcile thresholds with provider rate cards quarterly.
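One way to avoid a hardcoded dollar amount is to compute the threshold from the run limit at configuration time. The helper below is a sketch: `DegradationThreshold` is a hypothetical name, and it assumes the percentage is taken against the run limit so the threshold stays fixed for the duration of a run.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DegradationThreshold {
    private DegradationThreshold() {}

    /** Returns runLimit * fraction, where fraction is in [0, 1] (0.20 means 20%). */
    static BigDecimal fractionOf(BigDecimal runLimit, BigDecimal fraction) {
        if (fraction.signum() < 0 || fraction.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        // Four decimal places is an arbitrary choice for sub-cent precision.
        return runLimit.multiply(fraction).setScale(4, RoundingMode.HALF_UP);
    }
}
```

For a 5.00 run limit and a 0.20 fraction this yields 1.00, and raising the run limit moves the threshold with it.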
4. Over-Classifying Transient Errors
Explanation: Treating network blips, DNS failures, or temporary provider outages as permanent halts progress unnecessarily.
Fix: Default to TRANSIENT for I/O and timeout exceptions. Reserve PERMANENT for explicit guardrail violations and schema mismatches, and route quota exhaustion to BUDGET_EXHAUSTED so it raises an interrupt instead of a hard stop. Use explicit markers rather than inference.
5. Silent Budget Exhaustion
Explanation: Failing without surfacing an interrupt request leaves the system in an undefined state. Downstream nodes receive null responses or throw unhandled exceptions.
Fix: Emit structured InterruptRequest events when budget exhaustion occurs. Include remaining allocation, consumed amount, and approval workflow reference. Integrate with human-in-the-loop systems for budget top-ups.
6. Estimator Drift
Explanation: The cost meter diverges from actual provider billing due to untracked overhead, prompt caching, or rate changes. Budget routing becomes inaccurate.
Fix: Reconcile the estimator with provider invoices weekly. Add a 5-10% buffer to account for untracked costs. Implement drift detection alerts when meter variance exceeds 15%.
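The reconciliation check reduces to a relative-difference calculation. The sketch below assumes drift is measured as |metered − invoiced| / invoiced; `DriftMonitor` and its method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class DriftMonitor {
    private final BigDecimal alertThreshold; // e.g. 0.15 for 15%

    DriftMonitor(BigDecimal alertThreshold) {
        this.alertThreshold = alertThreshold;
    }

    /** Relative drift between the internal meter and the provider invoice. */
    static BigDecimal relativeDrift(BigDecimal metered, BigDecimal invoiced) {
        if (invoiced.signum() <= 0) {
            throw new IllegalArgumentException("invoiced amount must be positive");
        }
        return metered.subtract(invoiced).abs().divide(invoiced, MathContext.DECIMAL64);
    }

    /** True when drift strictly exceeds the configured threshold. */
    boolean shouldAlert(BigDecimal metered, BigDecimal invoiced) {
        return relativeDrift(metered, invoiced).compareTo(alertThreshold) > 0;
    }

    /** Inflates a raw estimate by a safety buffer, e.g. 0.05 for 5%. */
    static BigDecimal withBuffer(BigDecimal estimate, BigDecimal buffer) {
        return estimate.multiply(BigDecimal.ONE.add(buffer));
    }
}
```

A meter reading of 80 against an invoice of 100 gives a drift of 0.2, which would trip a 15% alert, while a reading of 95 would not.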
7. Missing Jitter in Backoff
Explanation: Synchronized retries across multiple agents cause thundering herd effects, overwhelming provider endpoints.
Fix: Apply randomized jitter to exponential delays. Use a multiplier range (e.g., 0.5x to 1.5x) on the base delay. Ensure jitter is per-agent, not global.
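The multiplier approach can be sketched as follows. `JitteredBackoff` is a hypothetical class, and the injectable random source exists only so the behavior can be tested deterministically. Each agent would hold its own instance, which keeps the jitter per-agent.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

final class JitteredBackoff {
    private final Duration baseDelay;
    private final Duration maxDelay;
    private final DoubleSupplier unitRandom; // must return values in [0, 1)

    JitteredBackoff(Duration baseDelay, Duration maxDelay) {
        this(baseDelay, maxDelay, () -> ThreadLocalRandom.current().nextDouble());
    }

    JitteredBackoff(Duration baseDelay, Duration maxDelay, DoubleSupplier unitRandom) {
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
        this.unitRandom = unitRandom;
    }

    /** Delay for a zero-based attempt: base * 2^attempt, capped, then scaled by 0.5x to 1.5x. */
    Duration delayFor(int attempt) {
        int shift = Math.min(Math.max(attempt, 0), 20); // bound the shift to avoid overflow
        long exponential = Math.min(baseDelay.toMillis() << shift, maxDelay.toMillis());
        double multiplier = 0.5 + unitRandom.getAsDouble(); // in [0.5, 1.5)
        // Jitter is applied after the cap, so the result can reach 1.5x maxDelay.
        return Duration.ofMillis((long) (exponential * multiplier));
    }
}
```

Two agents that fail at the same instant will draw different multipliers, so their retries spread across the window instead of landing together.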
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume batch processing | Budget-gated routing with 20% degradation threshold | Prevents mid-run exhaustion, maintains throughput | Predictable spend, avoids cliff-edge failures |
| Interactive chat sessions | Reason-aware retries with Retry-After parsing | Respects rate limits, reduces latency spikes | Lower retry overhead, fewer 429 loops |
| Mixed workload (heavy + light nodes) | Hierarchical budget limits + dynamic threshold routing | Aligns allocation with node complexity | Optimizes token usage, prevents over-provisioning |
| Strict compliance environments | Permanent classification for guardrail violations | Halts execution on policy breaches immediately | Eliminates wasted attempts, ensures audit trails |
| Multi-provider routing | Composite failure resolver + estimator reconciliation | Handles provider-specific error semantics | Reduces drift, maintains routing accuracy |
Configuration Template
// 1. Define cost estimator and tracker
CostEstimator estimator = new TokenBasedEstimator(Map.of(
    "premium-model", new BigDecimal("0.03"),
    "fallback-model", new BigDecimal("0.005")
));
BudgetTracker tracker = BudgetTracker.hierarchical(
    new BigDecimal("5.00"), // Run limit
    estimator
);

// 2. Compose failure resolver
FailureResolver domainResolver = cause -> {
    if (cause instanceof QuotaExhaustedException)
        return ExecutionCategory.BUDGET_EXHAUSTED;
    if (cause instanceof GuardrailViolationException)
        return ExecutionCategory.PERMANENT;
    return null; // Defer to default
};
FailureResolver resolver = new CompositeFailureResolver(
    domainResolver,
    new DefaultFailureResolver()
);

// 3. Configure threshold router
CostThresholdRouter router = new CostThresholdRouter(
    tracker,
    new BigDecimal("1.00"), // Degradation threshold
    "premium-executor",
    "fallback-executor"
);

// 4. Bind coordinator
ExecutionCoordinator coordinator = ExecutionCoordinator.builder()
    .registerExecutor("premium-executor", premiumAgent)
    .registerExecutor("fallback-executor", fallbackAgent)
    .withRouter(router)
    .withFailureResolver(resolver)
    .withRetryPolicy(RetryPolicy.exponential(3, Duration.ofSeconds(1))
        .withJitter(0.5))
    .build();
Quick Start Guide
- Add orchestration dependency: Include the agent execution module in your build configuration. Ensure the graph module remains decoupled from provider SDKs.
- Define cost estimator: Implement a token-based estimator that maps model identifiers to per-token pricing. Add a 5% buffer for untracked overhead.
- Wire policies: Initialize the budget tracker, compose the failure resolver, and configure the threshold router. Bind executors to the coordinator.
- Run with stubs: Execute the graph using deterministic mocks. Verify that permanent failures halt immediately, rate limits respect Retry-After, and routing degrades when budget drops below the threshold.
- Monitor drift: Log budget consumption per run. Reconcile with provider invoices weekly. Adjust thresholds and estimator rates as pricing evolves.