routing intent, a strict input schema, a sequential execution pipeline, and validation fixtures. The schema must be compiled to a format the router can consume natively.
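import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
// ExecutionContext, WorkflowResult, mockContext, and deepEqual are assumed to be defined elsewhere in the project.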
interface WorkflowBlueprint<T extends z.ZodTypeAny> {
id: string;
routingIntent: string;
inputSchema: T;
executionPipeline: (args: z.infer<T>, context: ExecutionContext) => Promise<WorkflowResult>;
validationFixtures: Array<{ input: z.infer<T>; expected: WorkflowResult }>;
}
function createWorkflow<T extends z.ZodTypeAny>(blueprint: WorkflowBlueprint<T>) {
return {
...blueprint,
serializedSchema: zodToJsonSchema(blueprint.inputSchema, { strictUnions: true }),
validate: async () => {
for (const fixture of blueprint.validationFixtures) {
const result = await blueprint.executionPipeline(fixture.input, mockContext());
if (!deepEqual(result, fixture.expected)) {
throw new Error(`Fixture mismatch for ${blueprint.id}`);
}
}
return true;
}
};
}
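The `validate` method calls `deepEqual`, which the snippet does not define. A minimal, dependency-free sketch of such a helper follows. It is an assumption about what `deepEqual` does (plain JSON-like data only, no `Date`, `Map`, or cyclic values), not the original implementation. On Node.js, `isDeepStrictEqual` from `node:util` is a ready-made alternative.

```typescript
// Hypothetical helper: structural equality for JSON-like values.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  // An array never equals a plain object, even with matching keys.
  if (Array.isArray(a) !== Array.isArray(b)) return false;

  const aRec = a as Record<string, unknown>;
  const bRec = b as Record<string, unknown>;
  const aKeys = Object.keys(aRec);
  const bKeys = Object.keys(bRec);
  if (aKeys.length !== bKeys.length) return false;

  // Key order is ignored; every key of `a` must exist in `b` with an equal value.
  return aKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(bRec, key) && deepEqual(aRec[key], bRec[key])
  );
}
```

Because key order is ignored, a recorded fixture still matches when the handler builds its result object in a different order.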
Step 2: Implement the Deterministic Handler
The execution pipeline contains the actual tool calls. This is ordinary, testable code. No LLM is invoked during execution. The sequence is fixed, typed, and version-controlled.
const triageRepositoryWorkflow = createWorkflow({
id: 'repo_triage_v1',
routingIntent: 'analyze recent pull requests and assign labels',
inputSchema: z.object({
repositoryOwner: z.string(),
repositoryName: z.string(),
prCount: z.number().default(5),
labelStrategy: z.enum(['semantic', 'conventional', 'manual']).default('semantic')
}),
executionPipeline: async ({ repositoryOwner, repositoryName, prCount, labelStrategy }, ctx) => {
const prs = await ctx.github.fetchPullRequests(repositoryOwner, repositoryName, prCount);
const triaged = await Promise.all(
prs.map(async (pr) => {
const diff = await ctx.github.fetchDiff(pr.number);
const classification = await ctx.classifier.score(diff, { strategy: labelStrategy });
return ctx.github.applyLabels(pr.number, classification.tags);
})
);
return { processed: triaged.length, results: triaged };
},
validationFixtures: [/* recorded fixtures */]
});
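Because the handler is plain code, it can be exercised without a model or network. The sketch below restates the same call sequence as a standalone function and runs it against an in-memory stub. The `TriageContext` interface and `stubContext` are assumptions inferred from the calls the handler makes; they are not a real GitHub client API.

```typescript
// Assumed shapes, inferred from the handler's calls; hypothetical, not a real client API.
interface PullRequest { number: number }
interface LabelResult { pr: number; tags: string[] }
interface TriageContext {
  github: {
    fetchPullRequests(owner: string, repo: string, count: number): Promise<PullRequest[]>;
    fetchDiff(prNumber: number): Promise<string>;
    applyLabels(prNumber: number, tags: string[]): Promise<LabelResult>;
  };
  classifier: {
    score(diff: string, opts: { strategy: string }): Promise<{ tags: string[] }>;
  };
}

interface TriageArgs { owner: string; repo: string; prCount: number; labelStrategy: string }

// Same fetch -> classify -> label sequence as the pipeline, extracted for testing.
async function triage(args: TriageArgs, ctx: TriageContext) {
  const prs = await ctx.github.fetchPullRequests(args.owner, args.repo, args.prCount);
  const results = await Promise.all(
    prs.map(async (pr) => {
      const diff = await ctx.github.fetchDiff(pr.number);
      const classification = await ctx.classifier.score(diff, { strategy: args.labelStrategy });
      return ctx.github.applyLabels(pr.number, classification.tags);
    })
  );
  return { processed: results.length, results };
}

// Hypothetical in-memory stub: no network, fully deterministic.
function stubContext(): TriageContext {
  const allPrs: PullRequest[] = [{ number: 1 }, { number: 2 }, { number: 3 }];
  return {
    github: {
      fetchPullRequests: async (_owner, _repo, count) => allPrs.slice(0, count),
      fetchDiff: async (prNumber) => `diff for #${prNumber}`,
      applyLabels: async (prNumber, tags) => ({ pr: prNumber, tags }),
    },
    classifier: {
      // The stub simply echoes the strategy as the only tag.
      score: async (_diff, opts) => ({ tags: [opts.strategy] }),
    },
  };
}
```

A stub like this is also a plausible starting point for `mockContext()`, and its outputs can be recorded as validation fixtures.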
Step 3: Runtime Intent Routing
At execution time, the local model receives the user request and the compiled schema registry. It outputs exactly one tool invocation with extracted arguments. No chain-of-thought, no multi-turn planning.
const router = new LocalIntentRouter({
model: 'qwen2.5:7b-instruct-q4_K_M',
registry: [triageRepositoryWorkflow, /* other macros */]
});
const userRequest = 'Check the last 3 PRs in acme/checkout and tag them using conventional commits';
const routingResult = await router.resolve(userRequest);
// Output: { tool: 'repo_triage_v1', args: { repositoryOwner: 'acme', repositoryName: 'checkout', prCount: 3, labelStrategy: 'conventional' } }
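The routing result is still model output, so it is worth checking before dispatch. The sketch below assumes a `RoutingResult` shape matching the comment above and uses a hand-rolled `ArgSpec` table as a stand-in for the compiled schema. Both are hypothetical; in a real setup, calling `safeParse` on the macro's Zod schema would likely do this job.

```typescript
// Hypothetical stand-in for a compiled schema: one rule per argument.
interface ArgSpec { type: 'string' | 'number'; required: boolean; allowed?: string[] }
type ToolSpec = Record<string, ArgSpec>;

interface RoutingResult { tool: string; args: Record<string, unknown> }

const toolSpecs: Record<string, ToolSpec> = {
  repo_triage_v1: {
    repositoryOwner: { type: 'string', required: true },
    repositoryName: { type: 'string', required: true },
    prCount: { type: 'number', required: false },
    labelStrategy: { type: 'string', required: false, allowed: ['semantic', 'conventional', 'manual'] },
  },
};

// Returns a list of problems; an empty list means the call is safe to dispatch.
function checkRoutingResult(result: RoutingResult): string[] {
  const spec = toolSpecs[result.tool];
  if (!spec) return [`unknown tool: ${result.tool}`];

  const problems: string[] = [];
  for (const name of Object.keys(spec)) {
    const rule = spec[name];
    const value = result.args[name];
    if (value === undefined) {
      if (rule.required) problems.push(`missing argument: ${name}`);
      continue;
    }
    if (typeof value !== rule.type) {
      problems.push(`wrong type for ${name}`);
    } else if (rule.allowed && rule.allowed.indexOf(value as string) === -1) {
      problems.push(`unexpected value for ${name}`);
    }
  }
  // Guessed parameter names show up here as arguments the spec does not know.
  for (const name of Object.keys(result.args)) {
    if (!(name in spec)) problems.push(`unexpected argument: ${name}`);
  }
  return problems;
}
```

Rejecting unknown argument names explicitly turns a silent mismatch (for example `owner` instead of `repositoryOwner`) into a visible routing failure.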
Architecture Decisions & Rationale
- Schema Compilation: Zod schemas do not automatically serialize to JSON Schema in a way LLM parsers expect. Explicit compilation via zodToJsonSchema with strict union handling prevents the model from guessing parameter names. This single fix accounts for the 41% accuracy jump in routing benchmarks.
- Single-Turn Execution: Workflows must be encoded as complete sequences. Splitting a pipeline into multiple router turns reintroduces runtime reasoning, which defeats the pattern. Composition is handled by creating a new macro that chains existing handlers, not by chaining router calls.
- Deterministic Handlers: Tool sequences are written in standard TypeScript. This enables unit testing, mocking, and CI validation. The LLM never touches the execution path, eliminating hallucination during runtime.
- Intent Matching over Semantic Search: The router uses structured intent strings matched against the request, not vector similarity. This reduces false positives and ensures predictable routing behavior.
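The composition rule in the Single-Turn Execution bullet can be sketched as a small combinator: two handlers are joined into one, and only the combined handler is registered as a macro. The `Handler` type, `chain`, and the two child handlers are hypothetical names used for illustration.

```typescript
// A handler takes typed input and resolves to typed output.
type Handler<I, O> = (input: I) => Promise<O>;

// Join two handlers into one. The router still sees a single macro,
// so sequencing stays in code rather than in model reasoning.
function chain<A, B, C>(first: Handler<A, B>, second: Handler<B, C>): Handler<A, C> {
  return async (input) => second(await first(input));
}

// Hypothetical child handlers with canned data.
const fetchOpenPrs: Handler<{ repo: string }, number[]> = async ({ repo }) =>
  repo === 'acme/checkout' ? [101, 102, 103] : [];

const summarize: Handler<number[], { processed: number }> = async (prs) => ({
  processed: prs.length,
});

// The parent macro's handler: one router decision, two deterministic steps.
const fetchAndSummarize = chain(fetchOpenPrs, summarize);
```

The type parameters make a mismatch between the first handler's output and the second handler's input a compile-time error.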
Pitfall Guide
1. Schema Serialization Drift
Explanation: Relying on implicit schema conversion causes the router to receive a generic {type: "object"} definition. The model then guesses parameter names, leading to silent argument mismatches.
Fix: Always compile schemas to JSON Schema at definition time. Validate the serialized output against the LLM's expected format before deployment.
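A pre-deployment check for this pitfall might look like the sketch below. It assumes the router consumes a JSON Schema object with a `properties` map; `JsonSchemaLike` and `missingProperties` are hypothetical names.

```typescript
// Minimal view of the serialized schema; only the fields this check reads.
interface JsonSchemaLike {
  type?: string;
  properties?: Record<string, unknown>;
}

// Returns the expected parameter names that the serialized schema fails to declare.
// A generic { type: "object" } schema declares none, so every name is reported.
function missingProperties(schema: JsonSchemaLike, expected: string[]): string[] {
  const props = schema.properties ?? {};
  return expected.filter((name) => !(name in props));
}
```

Running this against each macro's serialized schema in CI, with the names taken from the macro's input schema, catches the drift before the router ever sees it.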
2. Over-Granular Macro Splitting
Explanation: Breaking a single workflow into multiple router turns (e.g., fetch → extract → score as separate calls) forces the model to plan at runtime. This reintroduces the exact reasoning bottleneck the pattern aims to eliminate.
Fix: Encode end-to-end sequences as single macros. If workflows need composition, create a parent macro whose handler calls the child handlers in a fixed order, so the router still makes exactly one decision.
3. Missing Failure Contracts
Explanation: Handlers assume success paths. When a downstream API returns 429 or malformed data, the macro crashes without a structured recovery path, leaving the router in an undefined state.
Fix: Define explicit error states in the macro contract. Implement retry policies, circuit breakers, and fallback routing to a frontier model when local execution fails beyond a threshold.
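One way to make the failure contract explicit is a tagged union of failure kinds plus a pure function that decides what happens next. Everything here is an illustrative assumption: the failure kinds, the `MAX_ATTEMPTS` value, and the backoff constants. The `frontier-api` provider name mirrors the configuration template.

```typescript
// Hypothetical failure kinds a handler may report instead of throwing.
type Failure =
  | { kind: 'rate_limited'; retryAfterMs: number }
  | { kind: 'unavailable' }
  | { kind: 'malformed_response'; detail: string };

type NextAction =
  | { type: 'retry'; delayMs: number }
  | { type: 'fallback'; provider: string }
  | { type: 'fail'; reason: string };

const MAX_ATTEMPTS = 3; // assumption: local attempts allowed before falling back

// Pure decision function: given a failure and the attempt number (1-based),
// choose between retrying, falling back, and failing outright.
function nextAction(failure: Failure, attempt: number): NextAction {
  if (attempt >= MAX_ATTEMPTS) {
    return { type: 'fallback', provider: 'frontier-api' };
  }
  switch (failure.kind) {
    case 'rate_limited':
      // Honor the delay the upstream API asked for (e.g. after a 429).
      return { type: 'retry', delayMs: failure.retryAfterMs };
    case 'unavailable':
      // Exponential backoff: 1000 ms after attempt 1, 2000 ms after attempt 2.
      return { type: 'retry', delayMs: Math.pow(2, attempt) * 500 };
    case 'malformed_response':
      // Retrying will not fix bad data; surface it as a structured failure.
      return { type: 'fail', reason: failure.detail };
  }
}
```

Keeping the decision pure makes the recovery policy unit-testable in the same way as the handlers themselves.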
4. Applying to Exploratory or Novel Tasks
Explanation: The pattern requires repetitive, well-defined surfaces. Applying it to open-ended debugging, creative generation, or rapidly changing third-party UIs creates maintenance overhead that exceeds the routing benefit.
Fix: Implement hybrid routing. Configure the router to delegate to a frontier API when confidence scores fall below a threshold or when the request matches a "novel" intent category.
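The delegation rule can be sketched as a single function. This assumes the router exposes a confidence score between 0 and 1 and a reserved `novel` tool id for the catch-all intent; the source does not specify how confidence is computed, so both are hypothetical. The default threshold of 0.72 mirrors the configuration template.

```typescript
// Hypothetical router output: the chosen tool plus a confidence in [0, 1].
interface RoutedCall { tool: string; confidence: number }

type Route =
  | { target: 'local'; tool: string }
  | { target: 'frontier'; reason: string };

const NOVEL_TOOL_ID = 'novel'; // assumption: reserved id for the "novel" intent category

// Delegate to the frontier API on a novel intent or low confidence;
// otherwise run the matched macro locally.
function chooseRoute(call: RoutedCall, threshold = 0.72): Route {
  if (call.tool === NOVEL_TOOL_ID) {
    return { target: 'frontier', reason: 'novel intent' };
  }
  if (call.confidence < threshold) {
    return { target: 'frontier', reason: 'low confidence' };
  }
  return { target: 'local', tool: call.tool };
}
```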
5. Skipping the Distillation Gate
Explanation: Teams manually write macros but never enforce encoding of ad-hoc tool sequences. The macro library stagnates while session logs accumulate uncompiled workflows, creating technical debt.
Fix: Wire a CI hook that scans session logs for raw tool call sequences. Fail the build if unencoded workflows exceed a configurable threshold. Auto-suggest macro definitions from log patterns.
6. Underestimating the Model Floor
Explanation: Running the router on models below 7B parameters (especially non-instruct variants) causes high false-positive routing and argument extraction failures. The failure detector fires more often than successful routing.
Fix: Maintain a 7B+ instruct-tuned baseline for routing. Quantization to 4-bit is acceptable, but architecture and instruction tuning are non-negotiable for reliable intent classification.
7. Ignoring Versioning and Schema Evolution
Explanation: Updating a macro's input schema without versioning breaks existing router caches and causes silent argument mapping failures in production.
Fix: Version macros explicitly (v1, v2). Implement schema migration handlers and deprecation warnings. Route legacy requests to archived macro versions until clients update.
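A minimal sketch of explicit versioning follows. The archived `repo_triage_v1` stays resolvable with a deprecation warning, and a migration handler upgrades its arguments. The `repo_triage_v2` macro and its renamed `limit` parameter are invented purely for illustration.

```typescript
type Args = Record<string, unknown>;

interface MacroVersion { id: string; status: 'active' | 'archived' }

const versions: Record<string, MacroVersion> = {
  repo_triage_v1: { id: 'repo_triage_v1', status: 'archived' },
  repo_triage_v2: { id: 'repo_triage_v2', status: 'active' }, // hypothetical successor
};

// Legacy ids keep resolving to the archived version, with a warning attached.
function resolveVersion(id: string): { macro: MacroVersion; warning?: string } {
  const macro = versions[id];
  if (!macro) throw new Error(`unknown macro version: ${id}`);
  return macro.status === 'archived'
    ? { macro, warning: `${id} is archived; migrate to a newer version` }
    : { macro };
}

// Hypothetical migration handler: v2 is assumed to rename prCount to limit.
function migrateV1ToV2(args: Args): { id: string; args: Args } {
  const { prCount, ...rest } = args;
  return { id: 'repo_triage_v2', args: { ...rest, limit: prCount } };
}
```

Unknown ids throw instead of silently mapping to the nearest version, which keeps argument mismatches from reaching production unnoticed.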
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume repetitive tasks (triage, labeling, extraction) | Compiled Macro Routing | Deterministic execution, predictable latency, local deployment | ~$0.15 per 10k runs |
| Novel reasoning or creative generation | Frontier API Routing | Requires multi-step planning and unstructured surface handling | ~$150 per 10k runs |
| Air-gapped or strict data residency environments | Compiled Macro Routing | Zero external network egress, full control over execution path | Infrastructure only |
| Exploratory debugging or rapidly changing APIs | Local Agent Reasoning | Flexibility outweighs reliability; macro encoding overhead too high | ~$1.20 per 10k runs |
| Mixed workload with 80% routine / 20% novel | Hybrid Routing | Macros handle routine; frontier handles exceptions; cost optimized | ~$35 per 10k runs |
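As a sanity check on the hybrid row, the per-10k figures for macro and frontier routing can be blended by workload share. This is a naive estimate using only the numbers in the matrix; it ignores any overhead such as a failed local attempt before fallback.

```typescript
// USD per 10k runs, taken from the decision matrix above.
const COST_PER_10K = { macro: 0.15, frontier: 150 };

// Weighted average cost when `routineShare` of requests (0..1) hit macros
// and the remainder go to the frontier API.
function blendedCostPer10k(routineShare: number): number {
  return routineShare * COST_PER_10K.macro + (1 - routineShare) * COST_PER_10K.frontier;
}

const estimate = blendedCostPer10k(0.8); // 0.8 * 0.15 + 0.2 * 150
console.log(estimate.toFixed(2));
```

For an 80/20 split this comes to about $30 per 10k runs, a little under the matrix's ~$35. The source does not break that figure down, so the gap may reflect overhead this estimate leaves out.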
Configuration Template
// router.config.ts
import { createRouter, compileSchemas, loadMacros } from '@internal/workflow-engine';
import { qwen7bInstruct } from '@internal/model-registry';
export const productionRouter = createRouter({
model: qwen7bInstruct,
schemas: compileSchemas(loadMacros('./workflows')),
fallback: {
enabled: true,
threshold: 0.72,
provider: 'frontier-api',
maxRetries: 1
},
telemetry: {
logRoutingDecisions: true,
captureArgumentDrift: true,
exportInterval: '5m'
}
});
// ci-gate.ts
import { scanSessionLogs, suggestMacro } from '@internal/distillation-gate';
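// maxUnencoded is the configurable threshold; the default of 0 fails the build on any unencoded workflow.
export async function enforceEncoding(maxUnencoded = 0) {
const unencoded = await scanSessionLogs({ window: '24h' });
if (unencoded.length > maxUnencoded) {
console.error(`Found ${unencoded.length} unencoded workflows (threshold: ${maxUnencoded}).`);
unencoded.forEach(workflow => {
console.warn(suggestMacro(workflow.toolCalls));
});
process.exit(1);
}
}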
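The internals of `scanSessionLogs` are not shown. One plausible core is to count how often the same raw tool-call sequence recurs across sessions and flag the repeats as macro candidates. The `SessionLog` shape and `findRepeatedSequences` below are assumptions for illustration, not the `@internal/distillation-gate` API.

```typescript
// Hypothetical log record: the ordered tool names one session invoked directly.
interface SessionLog { sessionId: string; toolCalls: string[] }

interface SequenceCount { sequence: string[]; count: number }

const SEPARATOR = ' > '; // assumes tool names never contain this string

// Group sessions by their exact tool-call sequence and return the sequences
// seen at least `minOccurrences` times.
function findRepeatedSequences(logs: SessionLog[], minOccurrences: number): SequenceCount[] {
  const counts: Record<string, number> = {};
  for (const log of logs) {
    if (log.toolCalls.length < 2) continue; // a single call is not a workflow
    const key = log.toolCalls.join(SEPARATOR);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return Object.keys(counts)
    .filter((key) => (counts[key] ?? 0) >= minOccurrences)
    .map((key) => ({ sequence: key.split(SEPARATOR), count: counts[key] ?? 0 }));
}
```

Exact-sequence matching is deliberately strict. It misses near-duplicates, but every candidate it reports corresponds to a fixed sequence that can be encoded as a macro directly.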
Quick Start Guide
- Initialize the Macro Registry: Create a workflows/ directory. Define your first macro using createWorkflow, specifying intent, schema, handler, and validation fixtures.
- Compile Schemas: Run the schema compiler to generate JSON Schema artifacts. Verify the output matches your LLM router's expected format.
- Deploy the Router: Instantiate the local intent router with your compiled registry and a 7B+ instruct-tuned model. Test with sample requests to verify single-turn tool invocation.
- Wire the CI Gate: Add the session log scanner to your pipeline. Configure it to fail builds when unencoded tool sequences exceed your threshold.
- Monitor & Iterate: Track routing confidence scores and argument drift. When fallbacks occur, encode the workflow into a new macro and retire the fallback path.