Building this pipeline requires three coordinated components: the routing proxy, the agent runtime, and the configuration bridge. The architecture prioritizes statelessness, sandbox isolation, and deterministic replay.
Step 1: Deploy the Routing Proxy
Lynkr runs as a Node.js service that exposes both Anthropic Messages and OpenAI Chat Completions interfaces. It sits between the agent and upstream providers. The proxy initializes a SQLite FTS5 database for long-term memory, loads AST parsers for supported languages, and registers provider backends.
// routing-engine.config.ts
import { createRouter, defineTiers, attachGraphify } from '@lynkr/core';

const tierStrategy = defineTiers({
  simple: { providers: ['ollama/qwen2.5-coder', 'openrouter/deepseek-chat'] },
  medium: { providers: ['openrouter/claude-sonnet-4.5', 'azure/gpt-4o'] },
  complex: { providers: ['vertex/gemini-2.0-flash', 'bedrock/anthropic.claude'] },
  reasoning: { providers: ['openai/o3-mini', 'anthropic/claude-opus-4'] }
});

const router = createRouter({
  port: 8081,
  tiers: tierStrategy,
  analysis: attachGraphify({
    languages: ['typescript', 'python', 'rust', 'go', 'java'],
    metrics: ['cyclomatic_complexity', 'dependency_depth', 'module_cohesion']
  }),
  telemetry: {
    storage: 'sqlite://./lynkr.db',
    metricsEndpoint: '/metrics',
    circuitBreaker: { threshold: 0.85, recovery: 'half-open' }
  }
});

export default router;
Why this structure: Separating tier definitions from provider registration allows hot-reloading without service restarts. Graphify attaches at initialization, ensuring every incoming request is parsed for structural complexity before routing. The circuit breaker prevents cascade failures when upstream providers throttle or degrade.
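The circuit-breaker behavior described above can be sketched as a small state machine. This is a minimal illustration, not Lynkr's implementation: the `CircuitBreaker` class, the rolling-window size, the probe interval, and the reading of `threshold: 0.85` as a failure rate are all assumptions made for the example. Only the threshold value and the `half-open` recovery mode come from the config.

```typescript
// Hypothetical sketch of a failure-rate circuit breaker with half-open recovery.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private outcomes: boolean[] = []; // rolling window; true means the call failed
  private openedAt = 0;
  private readonly threshold: number; // assumed: failure rate that trips the breaker
  private readonly windowSize: number; // assumed: number of recent calls considered
  private readonly probeIntervalMs: number; // assumed: wait before probing again

  constructor(threshold: number, windowSize: number, probeIntervalMs: number) {
    this.threshold = threshold;
    this.windowSize = windowSize;
    this.probeIntervalMs = probeIntervalMs;
  }

  /** Returns true if a request may be sent to the provider at time `now` (ms). */
  allow(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.probeIntervalMs) {
      this.state = "half-open"; // probe traffic is allowed until an outcome is recorded
    }
    return this.state !== "open";
  }

  /** Records the outcome of a request that was allowed through. */
  record(failed: boolean, now: number): void {
    if (this.state === "half-open") {
      // The probe decides: success closes the breaker, failure reopens it.
      if (failed) {
        this.trip(now);
      } else {
        this.state = "closed";
        this.outcomes = [];
      }
      return;
    }
    this.outcomes.push(failed);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
    const failures = this.outcomes.filter(Boolean).length;
    const windowFull = this.outcomes.length === this.windowSize;
    if (windowFull && failures / this.windowSize >= this.threshold) {
      this.trip(now);
    }
  }

  current(): BreakerState {
    return this.state;
  }

  private trip(now: number): void {
    this.state = "open";
    this.openedAt = now;
    this.outcomes = [];
  }
}
```

Waiting for a full window before tripping avoids opening the breaker on a single early failure; a production breaker would likely also track latency and per-provider state.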
Step 2: Configure the Agent Runtime
OpenHands operates through an event-sourced architecture. The V1 SDK splits the system into SDK, Tools, Workspace, and Server packages. Mutable context lives in a single ConversationState object, while actions and observations are immutable Pydantic events. This design enables deterministic replay, pause/resume, and full audit trails.
The agent connects to the routing proxy via LiteLLM. Environment variables point to the local proxy endpoint, and the runtime handles provider abstraction automatically.
# openhands_runtime.env
LITELLM_BASE_URL=http://localhost:8081/v1
LITELLM_API_KEY=internal-routing-token
LITELLM_MODEL=auto-route
LITELLM_TIMEOUT=30
LITELLM_MAX_RETRIES=2
SANDBOX_RUNTIME=docker
SANDBOX_IMAGE_TAG=source-hash
SANDBOX_ISOLATION=cow-overlay
SECURITY_ANALYZER_LEVEL=medium
SKILLS_DIR=.openhands/microagents
Why this structure: LiteLLM standardizes the dispatch interface, allowing OpenHands to remain provider-agnostic. The auto-route model identifier signals the proxy to evaluate the request rather than forwarding it blindly. Sandbox isolation uses copy-on-write overlays to prevent host contamination while maintaining fast iteration cycles. The security analyzer scores each tool call before execution and holds high-risk operations until a human confirms them.
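The score-then-gate flow for tool calls can be illustrated with a small sketch. Everything here is hypothetical and is not the OpenHands API: the `ToolCall` shape, the regex rules in `scoreRisk`, and the `confirmAt` parameter are stand-ins chosen for the example. A real analyzer would score calls with far richer signals than pattern matching.

```typescript
// Hypothetical sketch: score a tool call, then decide whether it needs confirmation.
type Risk = "low" | "medium" | "high";
type Decision = "execute" | "await_confirmation";

interface ToolCall {
  tool: string;
  command: string;
}

const RISK_ORDER: Record<Risk, number> = { low: 0, medium: 1, high: 2 };

// Assumed rule set for illustration only.
function scoreRisk(call: ToolCall): Risk {
  if (/\brm\s+-rf\b|\bsudo\b|curl[^|]*\|\s*(sh|bash)\b/.test(call.command)) {
    return "high";
  }
  if (/\bgit\s+push\b|\bnpm\s+publish\b/.test(call.command)) {
    return "medium";
  }
  return "low";
}

// Calls at or above `confirmAt` are held for a human; the rest run immediately.
function gate(call: ToolCall, confirmAt: Risk): Decision {
  const risk = scoreRisk(call);
  return RISK_ORDER[risk] >= RISK_ORDER[confirmAt] ? "await_confirmation" : "execute";
}
```

With `confirmAt` set to `"high"`, a `git push` runs unattended while `rm -rf` waits for approval; lowering it to `"medium"` queues both.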
Step 3: Bridge Configuration and Skill Injection
Skills (formerly microagents) provide domain-specific context without bloating every prompt. They activate conditionally based on conversation keywords. The routing proxy complements this by compressing historical context using a sliding window and a SHA-256-keyed LRU cache.
// skill-router.bridge.ts
import { SkillRegistry, ContextCompressor } from '@openhands/sdk';

const registry = new SkillRegistry({
  baseDir: '.openhands/microagents',
  triggerMode: 'keyword',
  maxConcurrent: 3
});

registry.register({
  id: 'frontend-guidelines',
  keywords: ['react', 'component', 'css', 'ui'],
  payload: 'frontend.md'
});

registry.register({
  id: 'migration-patterns',
  keywords: ['schema', 'migration', 'database', 'sql'],
  payload: 'migrations.md'
});

const compressor = new ContextCompressor({
  strategy: 'sliding-window',
  cacheKey: 'sha256',
  maxTokens: 12000,
  deduplication: true
});

export { registry, compressor };
Why this structure: Conditional skill loading prevents context window exhaustion. The compressor maintains conversation coherence while discarding redundant observations. SHA-256 cache keys let identical prompt content reuse its compressed representation instead of being compressed again on every request.
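To make keyword-triggered activation concrete, here is a minimal sketch of how a registry might decide which skills to load for a message. The `activateSkills` function and its ranking rule (most keyword hits first) are assumptions for illustration, not the SDK's actual matching logic; the skill definitions mirror the two registered above.

```typescript
// Hypothetical sketch of keyword-triggered skill selection with a concurrency cap.
interface Skill {
  id: string;
  keywords: string[];
  payload: string;
}

const skills: Skill[] = [
  { id: "frontend-guidelines", keywords: ["react", "component", "css", "ui"], payload: "frontend.md" },
  { id: "migration-patterns", keywords: ["schema", "migration", "database", "sql"], payload: "migrations.md" },
];

function activateSkills(message: string, available: Skill[], maxConcurrent: number): Skill[] {
  // Match on whole words so a short keyword like "ui" does not fire on "build".
  const words = new Set(message.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  return available
    .map((skill) => ({
      skill,
      hits: skill.keywords.filter((k) => words.has(k.toLowerCase())).length,
    }))
    .filter((match) => match.hits > 0)
    .sort((a, b) => b.hits - a.hits) // strongest match first
    .slice(0, maxConcurrent)
    .map((match) => match.skill);
}
```

A message such as "Add a migration for the users schema" would load only `migration-patterns`, so the frontend guidelines never enter the prompt.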
Architecture Decisions and Rationale
- Event-Sourced State: Modeling the agent as a pure function from event history to next event eliminates hidden state. Every action is replayable, enabling deterministic debugging and session forking.
- AST-Based Routing: Heuristics based on message length or keyword matching misjudge difficulty on complex codebases. Graphify measures structural complexity (cyclomatic complexity, dependency depth, module cohesion), so the chosen tier reflects how hard the change actually is.
- Sandbox Isolation: Direct host execution introduces security risks and environment drift. Containerized runtimes with copy-on-write overlays guarantee reproducible execution and clean teardown.
- Provider Abstraction: LiteLLM decouples the agent from provider-specific SDKs. Adding a new model requires zero code changes, only configuration updates in the routing proxy.
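The event-sourced model in the first bullet can be sketched as a pure fold over an append-only log. The event shapes and the `ConversationView` summary below are hypothetical simplifications for illustration; the real SDK uses Pydantic event classes and a richer ConversationState.

```typescript
// Hypothetical sketch: derive state purely from an immutable event log.
type AgentEvent =
  | { kind: "action"; id: number; tool: string; input: string }
  | { kind: "observation"; actionId: number; output: string; ok: boolean };

interface ConversationView {
  pending: number[]; // action ids still waiting for an observation
  completed: number;
  failed: number;
}

// Pure reducer: returns a new view and never mutates its inputs.
function applyEvent(view: ConversationView, event: AgentEvent): ConversationView {
  if (event.kind === "action") {
    return { ...view, pending: [...view.pending, event.id] };
  }
  return {
    pending: view.pending.filter((id) => id !== event.actionId),
    completed: view.completed + (event.ok ? 1 : 0),
    failed: view.failed + (event.ok ? 0 : 1),
  };
}

// Replaying a prefix of the log reconstructs the state at that point in time.
function replay(events: readonly AgentEvent[], upTo: number = events.length): ConversationView {
  const empty: ConversationView = { pending: [], completed: 0, failed: 0 };
  return events.slice(0, upTo).reduce<ConversationView>(applyEvent, empty);
}

const log: AgentEvent[] = [
  { kind: "action", id: 1, tool: "bash", input: "npm test" },
  { kind: "observation", actionId: 1, output: "ok", ok: true },
  { kind: "action", id: 2, tool: "edit", input: "src/app.ts" },
  { kind: "observation", actionId: 2, output: "lint error", ok: false },
  { kind: "action", id: 3, tool: "bash", input: "npm run lint" },
];
```

Because `replay` depends only on the log, calling `replay(log, 3)` recovers the state just before the failed edit, which is the point a forked session would resume from.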
Pitfall Guide
1. Unbounded Execution Loops
Explanation: Autonomous agents can enter recursive debugging cycles when tests fail repeatedly or file modifications trigger linting errors. Without intervention, token consumption escalates rapidly.
Fix: Implement token budgets and iteration caps at the proxy layer. Configure circuit breakers that pause sessions after 15 consecutive failed actions, requiring manual review or context reset.
2. Sandbox State Drift
Explanation: Bind mounts and named volumes can accumulate stale artifacts across sessions. Subsequent runs may execute against outdated dependencies or cached build outputs.
Fix: Enforce ephemeral containers with explicit volume initialization. Use copy-on-write overlay modes and validate dependency hashes before execution. Clean up orphaned images weekly.
3. Routing Tier Misalignment
Explanation: Graphify thresholds may misclassify straightforward refactors as complex operations, routing them to expensive reasoning models unnecessarily.
Fix: Calibrate routing weights using historical telemetry. Implement A/B routing for edge cases and adjust complexity scores based on actual execution outcomes. Log misrouted requests for threshold tuning.
4. Context Window Saturation
Explanation: Loading all skills simultaneously or retaining full conversation history exhausts the context window, degrading model performance and increasing latency.
Fix: Enforce keyword-triggered skill activation. Apply sliding-window compression with relevance scoring. Prune observations that don't contribute to the current task objective.
5. Cache Invalidation Staleness
Explanation: SHA-256 LRU caching may serve compressed prompts based on outdated file states, causing the agent to operate on stale code references.
Fix: Tie cache keys to git commit hashes or file modification timestamps. Implement TTL-based expiration for cached contexts. Invalidate cache entries when dependency graphs change.
6. Security Analyzer False Positives
Explanation: The LLMSecurityAnalyzer may block legitimate file writes or command executions, halting productive sessions unnecessarily.
Fix: Maintain an allowlist for known-safe operations. Tune severity thresholds based on repository sensitivity. Implement a confirmation queue for medium-risk actions rather than hard blocks.
7. Provider Rate Limiting
Explanation: Bursty agent behavior can trigger upstream rate limits, causing request failures and session interruptions.
Fix: Configure request queuing and exponential backoff at the proxy. Distribute load across multiple provider endpoints. Monitor P95 latency and trigger fallback routing when thresholds are breached.
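For the rate-limiting pitfall, the backoff half of the fix can be sketched as follows. This is a generic exponential backoff with full jitter, not code taken from the proxy: the function names, the 500 ms base, and the 30 s cap are illustrative assumptions.

```typescript
// Hypothetical sketch: exponential backoff with full jitter around a provider call.

// Delay before retry number `attempt` (0-based): a random value in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Runs `op`, retrying up to `maxRetries` times with a jittered delay between attempts.
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // budget exhausted: surface the error
      await sleep(backoffDelayMs(attempt, 500, 30000));
    }
  }
}
```

Jitter matters for bursty agents: without it, parallel sessions that hit the same limit retry in lockstep and collide again. A fuller version would retry only on throttling responses and honor any retry hint the provider returns.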
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local Development | LocalRuntime + Ollama routing | Fast iteration, zero cloud costs, immediate feedback | Near-zero |
| CI/CD Pipeline | RemoteRuntime + OpenRouter fallback | Scalable, reproducible, handles burst workloads | Moderate ($0.02–$0.08/session) |
| Enterprise Fleet | Kubernetes + Bedrock/Vertex + Lynkr | Centralized routing, RBAC, audit trails, load shedding | High upfront, 60%+ savings vs direct API |
| Security-Sensitive Repo | Docker sandbox + SecurityAnalyzer HIGH + Local models | Isolation, no external data exfiltration, compliance | Infrastructure-heavy, token costs minimal |
Configuration Template
# lynkr-routing.yaml
proxy:
  port: 8081
  api_compat: [anthropic_messages, openai_chat]
  health_check: /v1/admin/health
routing:
  tiers:
    simple:
      providers: [ollama/qwen2.5-coder:7b, openrouter/deepseek-chat]
      max_complexity: 0.35
    medium:
      providers: [openrouter/claude-sonnet-4.5, azure/gpt-4o]
      max_complexity: 0.65
    complex:
      providers: [vertex/gemini-2.0-flash, bedrock/anthropic.claude]
      max_complexity: 0.85
    reasoning:
      providers: [openai/o3-mini, anthropic/claude-opus-4]
      max_complexity: 1.0
analysis:
  graphify:
    enabled: true
    languages: [typescript, python, rust, go, java, csharp]
    metrics: [cyclomatic_complexity, dependency_depth, module_cohesion, blast_radius]
optimization:
  pipeline:
    - smart_tool_selection
    - code_mode_meta_tools
    - distill_compression
    - sha256_lru_cache
    - memory_dedup
    - sliding_window_history
    - ml_headroom_sidecar
memory:
  storage: sqlite://./lynkr.db
  scoring: [surprise, recency, relevance]
  injection: context_window_slice
telemetry:
  metrics: /metrics
  circuit_breaker:
    threshold: 0.85
    recovery: half-open
    probe_interval: 30s
reload: POST /v1/admin/reload
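One way to read the `max_complexity` values in the template is as inclusive upper bounds on a normalized complexity score between 0 and 1. The sketch below follows that reading. The `selectTier` and `pickProvider` helpers are hypothetical, as is the assumption that provider order doubles as fallback order; the tier names, thresholds, and providers are copied from the template.

```typescript
// Hypothetical sketch: map a complexity score to a tier, then pick a healthy provider.
interface Tier {
  name: string;
  maxComplexity: number; // assumed: inclusive upper bound on a 0..1 score
  providers: string[]; // assumed: listed in fallback order
}

const tiers: Tier[] = [
  { name: "simple", maxComplexity: 0.35, providers: ["ollama/qwen2.5-coder:7b", "openrouter/deepseek-chat"] },
  { name: "medium", maxComplexity: 0.65, providers: ["openrouter/claude-sonnet-4.5", "azure/gpt-4o"] },
  { name: "complex", maxComplexity: 0.85, providers: ["vertex/gemini-2.0-flash", "bedrock/anthropic.claude"] },
  { name: "reasoning", maxComplexity: 1.0, providers: ["openai/o3-mini", "anthropic/claude-opus-4"] },
];

// Returns the cheapest tier whose bound covers the score.
function selectTier(score: number, available: Tier[]): Tier {
  if (!(score >= 0 && score <= 1)) {
    throw new RangeError(`complexity score out of range: ${score}`);
  }
  const sorted = [...available].sort((a, b) => a.maxComplexity - b.maxComplexity);
  const match = sorted.find((tier) => score <= tier.maxComplexity);
  if (!match) {
    throw new Error(`no tier covers score ${score}`);
  }
  return match;
}

// Returns the first provider in the tier that the health check accepts, if any.
function pickProvider(tier: Tier, isHealthy: (provider: string) => boolean): string | undefined {
  return tier.providers.find(isHealthy);
}
```

Under these assumptions a score of 0.36 lands in `medium`, and if `openrouter/claude-sonnet-4.5` is marked unhealthy the request falls back to `azure/gpt-4o`.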
Quick Start Guide
- Initialize the Proxy: Run `docker run -d -p 8081:8081 -v ./lynkr.db:/app/data/lynkr.db lynkr/proxy:latest`. Verify health at http://localhost:8081/v1/admin/health.
- Configure OpenHands: Export the LiteLLM environment variables pointing to `http://localhost:8081/v1`. Set LITELLM_MODEL=auto-route.
- Launch the Agent: Execute `openhands run --runtime docker --skills-dir .openhands/microagents`. The agent will automatically route requests through the proxy.
- Validate Routing: Monitor `/metrics` for tier distribution and P95 latency. Check lynkr.db for routing telemetry and quality scores. Adjust Graphify thresholds if simple tasks escalate unnecessarily.
- Secure the Loop: Enable `SECURITY_ANALYZER_LEVEL=medium` in OpenHands. Configure token budgets in the proxy config. Test with a known issue to verify sandbox isolation and skill activation.