inst delivered code.
The practical implication is clear: a tiered routing architecture can reduce token costs by 40β60% while maintaining or improving deliverable quality. The bottleneck is no longer model intelligence; it is pipeline design, harness compatibility, and stage-specific prompt engineering.
Core Solution
Building a cost-efficient AI development pipeline requires decoupling model selection from task execution. Instead of hardcoding a single model per workflow, implement a dynamic router that evaluates task metadata, complexity thresholds, and cost tolerances before dispatching requests.
Step 1: Define Task Taxonomy and Complexity Scoring
Classify workflow stages into discrete task types with explicit complexity metrics. Complexity should factor in context window requirements, tool-use frequency, and constraint strictness.
export enum TaskCategory {
ARCHITECTURE = 'architecture',
UX_DESIGN = 'ux_design',
PLANNING = 'planning',
IMPLEMENTATION = 'implementation',
REVIEW = 'review'
}
export interface TaskDefinition {
id: string;
category: TaskCategory;
complexityScore: number; // 1-10 scale
requiresToolUse: boolean;
outputFormat: 'markdown' | 'json' | 'code';
budgetTier: 'standard' | 'premium';
}
Step 2: Implement Tiered Model Mapping
Map task categories to appropriate model tiers based on empirical performance data. Mid-tier models handle specification and planning efficiently. Flagship models reserve capacity for complex implementation logic and strict review validation.
export enum ModelTier {
MID = 'mid',
FLAGSHIP = 'flagship'
}
export const TIER_ROUTING_MAP: Record<TaskCategory, ModelTier> = {
[TaskCategory.ARCHITECTURE]: ModelTier.MID,
[TaskCategory.UX_DESIGN]: ModelTier.MID,
[TaskCategory.PLANNING]: ModelTier.MID,
[TaskCategory.IMPLEMENTATION]: ModelTier.FLAGSHIP,
[TaskCategory.REVIEW]: ModelTier.FLAGSHIP
};
Step 3: Build the Orchestration Router
The router evaluates task definitions, applies routing rules, and injects stage-specific system prompts. It also handles harness selection and output validation.
export class WorkflowOrchestrator {
private harnessRegistry: Map<ModelTier, string>;
constructor() {
this.harnessRegistry = new Map([
[ModelTier.MID, 'gemini-cli'],
[ModelTier.FLAGSHIP, 'claude-code']
]);
}
public async executeTask(task: TaskDefinition): Promise<TaskResult> {
const targetTier = TIER_ROUTING_MAP[task.category];
const harness = this.harnessRegistry.get(targetTier) ?? 'default';
const systemPrompt = this.generateStagePrompt(task);
const executionConfig = {
model: this.resolveModel(targetTier),
harness,
temperature: task.category === TaskCategory.REVIEW ? 0.1 : 0.3,
maxTokens: this.calculateTokenBudget(task)
};
const rawOutput = await this.invokeModel(systemPrompt, task, executionConfig);
return this.validateAndFormat(rawOutput, task.outputFormat);
}
private generateStagePrompt(task: TaskDefinition): string {
const base = `You are operating in the ${task.category} phase.`;
const constraints = task.category === TaskCategory.ARCHITECTURE
? 'Output must include explicit schema definitions, environment setup steps, and security considerations.'
: task.category === TaskCategory.UX_DESIGN
? 'Include responsive breakpoints, accessibility annotations, and component-level interaction states.'
: 'Maintain strict alignment with prior artifacts and avoid speculative features.';
return `${base} ${constraints}`;
}
private resolveModel(tier: ModelTier): string {
return tier === ModelTier.MID ? 'gemini-3.5-flash' : 'claude-sonnet-4.6';
}
private calculateTokenBudget(task: TaskDefinition): number {
const base = task.complexityScore * 1500;
return task.requiresToolUse ? base * 1.4 : base;
}
private async invokeModel(prompt: string, task: TaskDefinition, config: any): Promise<string> {
// Abstracted model invocation layer
// Handles API calls, retry logic, and harness-specific formatting
return ''; // Placeholder for actual implementation
}
private validateAndFormat(raw: string, format: string): TaskResult {
// Schema validation, markdown parsing, or code linting
return { success: true, payload: raw };
}
}
Architecture Decisions and Rationale
- Stage-Specific Routing: Early phases (architecture, UX, planning) prioritize structured output and constraint adherence. Mid-tier models handle these efficiently at lower cost. Implementation and review require deeper context retention and stricter compliance checking, justifying flagship routing.
- Harness Decoupling: Model capability does not guarantee harness reliability. The router explicitly maps tiers to compatible execution environments, reducing approval friction and environment setup delays observed in benchmark runs.
- Dynamic Token Budgeting: Token allocation scales with complexity and tool-use requirements. This prevents over-provisioning for simple tasks while ensuring sufficient context for complex implementation phases.
- Temperature Control: Review phases use lower temperature (0.1) to minimize hallucination and enforce strict spec compliance. Planning and UX phases use moderate temperature (0.3) to allow creative exploration within constraints.
Pitfall Guide
1. Ignoring Harness Friction
Explanation: Teams often attribute output quality solely to model intelligence, overlooking that harness compatibility dictates execution smoothness. Gemini-based CLI tools introduced repeated permission prompts and environment setup delays, while Claude Code demonstrated more reliable tool chaining.
Fix: Abstract harness selection behind the router. Implement approval automation flags and pre-warm environment configurations before task dispatch.
2. Single-Model Pipeline Fallacy
Explanation: Routing all stages to one model creates cost bloat and underutilizes model strengths. Flagship models waste capacity on boilerplate generation, while mid-tier models struggle with strict review validation.
Fix: Implement tiered routing with explicit stage mapping. Reserve premium models for implementation complexity and compliance validation.
3. Overlooking Early-Phase Spec Granularity
Explanation: Weak architecture or UX specifications cascade into implementation errors. Mid-tier models can produce excellent specs, but only when prompted with explicit structural constraints.
Fix: Enforce structured output schemas for early phases. Require explicit tables for decisions, environment setup steps, and component interaction states.
4. Misjudging Total Cost of Ownership
Explanation: Teams focus on per-token pricing rather than total workflow cost. A cheaper model that produces ambiguous specs increases rework, negating initial savings.
Fix: Track cost per deliverable, not cost per token. Factor in review cycles, bug fixes, and harness friction when calculating ROI.
5. Neglecting Environment & Build Path Validation
Explanation: AI-generated code often assumes idealized environments. Missing .gitignore files, broken build paths, and unresolved dependencies cause deployment failures.
Fix: Inject environment validation steps into the planning phase. Require explicit local-run instructions and dependency resolution checks before implementation dispatch.
6. Assuming Reviewer Models Can Fix Architectural Gaps
Explanation: Review stages cannot compensate for missing architectural decisions or ambiguous UX specifications. Lower-tier models consistently failed review when upstream artifacts lacked operational completeness.
Fix: Implement gate checks between phases. Block implementation dispatch until architecture and UX specs pass structural validation.
7. Skipping Structured Output Constraints
Explanation: Unconstrained model outputs produce inconsistent formatting, making downstream parsing and validation unreliable.
Fix: Mandate JSON or strict markdown schemas for all stages. Use schema validation middleware before accepting model outputs.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Rapid prototyping / MVP scaffolding | Mid-tier routing for all stages | Speed and cost efficiency outweigh strict compliance needs | β 50-60% token spend |
| Production-grade application | Tiered routing (mid for spec/planning, flagship for impl/review) | Balances cost with implementation reliability and compliance | β 30-40% vs full-flagship |
| Complex data pipeline / algorithmic logic | Flagship routing for implementation & review | Requires deep context retention and edge-case handling | β 20-30% but reduces rework |
| UI-heavy / responsive applications | Mid-tier for UX/design, flagship for implementation | UI layout benefits from structured mid-tier output; complex interactions need flagship precision | β 25% with maintained fidelity |
Configuration Template
# pipeline-config.yaml
orchestrator:
default_harness: "claude-code"
fallback_harness: "gemini-cli"
max_retries: 3
timeout_seconds: 120
stages:
architecture:
model_tier: "mid"
model_name: "gemini-3.5-flash"
temperature: 0.2
output_schema: "architecture_spec.json"
required_fields: ["stack_decisions", "security_considerations", "env_setup"]
ux_design:
model_tier: "mid"
model_name: "gemini-3.5-flash"
temperature: 0.3
output_schema: "ux_spec.json"
required_fields: ["responsive_breakpoints", "accessibility_notes", "component_states"]
planning:
model_tier: "mid"
model_name: "gemini-3.5-flash"
temperature: 0.2
output_schema: "backlog.json"
chunking_strategy: "vertical_slices"
implementation:
model_tier: "flagship"
model_name: "claude-sonnet-4.6"
temperature: 0.1
tool_use: true
build_validation: true
review:
model_tier: "flagship"
model_name: "claude-sonnet-4.6"
temperature: 0.05
compliance_check: "strict"
cross_reference: ["architecture", "ux_design", "planning"]
cost_controls:
token_budget_multiplier: 1.2
rework_threshold: 0.15
alert_on_overrun: true
Quick Start Guide
- Initialize the router: Clone the orchestration repository, install dependencies (
npm install), and configure API credentials in .env.
- Load pipeline config: Place
pipeline-config.yaml in the project root. Verify stage mappings and model tiers align with your deliverable requirements.
- Define your first task: Create a
TaskDefinition object specifying category, complexity, and output format. Pass it to WorkflowOrchestrator.executeTask().
- Validate outputs: Run the built-in schema validator against architecture and UX phases. Confirm required fields are populated before proceeding to implementation.
- Monitor metrics: Enable cost tracking and rework logging. Adjust temperature and token budgets based on actual deliverable quality and phase gate pass rates.
Deploying a tiered routing architecture transforms AI-assisted development from a cost center into a predictable, scalable engineering workflow. The models are capable; the pipeline design determines the outcome.