Run OpenAI Codex CLI on Claude, Gemini, or Llama — in 50 lines of C#
Decoupling Agent CLIs from Inference Engines: A Responses API Gateway Pattern
Current Situation Analysis
The modern developer toolchain is rapidly converging on a single protocol: OpenAI's Responses API. Introduced as the successor to Chat Completions, it has become a de facto standard for agentic workflows, offering native support for tool execution, reasoning traces, and structured output streaming. However, this standardization comes with a hidden cost: protocol-level vendor lock-in.
OpenAI's Codex CLI exemplifies this shift. As of early 2026, the tool dropped Chat Completions support entirely, routing all interactions through the Responses API. The WireApi enum in the underlying codebase now contains a single variant: Responses. This architectural decision means that any developer wishing to use the Codex UX with alternative models (self-hosted Llama instances, Anthropic's Claude, or Google's Gemini) faces a hard boundary. The CLI does not speak the older protocol, and it can only talk to an endpoint that implements the Responses API.
This problem is frequently misunderstood. Many teams assume that because a CLI is branded by a specific vendor, it is cryptographically or architecturally bound to that vendor's models. In reality, the Responses API is a transparent HTTP/SSE specification. The lock-in is purely contractual and implementation-level, not cryptographic. Developers overlook the fact that the protocol is just a structured stream of JSON events. If you can stand up a lightweight gateway that accepts Responses API payloads, translates them to a vendor-neutral chat abstraction, and streams the results back in the exact SSE format the CLI expects, the inference engine becomes completely interchangeable.
The industry pain point is clear: powerful agent UXs are trapped behind proprietary wire formats. The solution is not to rewrite the CLI, but to build a protocol translation layer that sits between the tool and the model provider.
WOW Moment: Key Findings
The critical insight is that the Responses API is a documented HTTP/SSE streaming contract, not a proprietary black box. The only state the server has to keep is the conversation history referenced by previous_response_id. By implementing a thin gateway that maps POST /v1/responses to a vendor-agnostic IChatClient interface, you decouple the execution environment from the inference provider.
| Approach | Model Flexibility | Protocol Compliance | Setup Overhead | Cost Efficiency |
|---|---|---|---|---|
| Native OpenAI Endpoint | OpenAI only | Full | Zero | High (vendor pricing) |
| Chat Completions Wrapper | High | Broken (Codex dropped support) | Medium | Variable |
| Responses API Gateway | Unlimited (Claude, Gemini, Llama, etc.) | Full | Low | Optimized (provider-agnostic) |
This finding matters because it transforms a vendor-specific CLI into a generic agent orchestrator. You retain the sophisticated tool loop (shell, apply_patch, plan tracking) while routing inference to the most cost-effective or capable model available. The gateway acts as a protocol adapter, preserving the UX while eliminating inference lock-in.
Core Solution
Building a Responses API gateway requires three architectural components: an HTTP router that accepts the Responses payload, a state manager for conversation chaining, and an SSE emitter that reconstructs the exact event sequence the CLI expects. We will use Microsoft.Extensions.AI as the vendor-neutral abstraction layer, allowing any backend to plug into the same routing logic.
Step 1: Define the Gateway Router
The gateway exposes a single endpoint: POST /v1/responses. It must also serve a model catalog to prevent metadata fallback warnings. We'll use ASP.NET Core Minimal APIs for lightweight routing.

    using System.Collections.Concurrent;
    using System.Text.Json;
    using System.Text.Json.Serialization;
    using Microsoft.Extensions.AI;

    var builder = WebApplication.CreateBuilder();
    builder.Services.AddLogging();
    // An IChatClient must be registered on builder.Services before Build() (see Step 3).
    var app = builder.Build();

    // In-memory state for turn chaining. ConcurrentDictionary because requests can overlap.
    var conversationStore = new ConcurrentDictionary<string, IList<ChatMessage>>();

    app.MapPost("/v1/responses", async (HttpRequest request, IChatClient chatClient) =>
    {
        // ResponsesPayload is the gateway's own request DTO.
        var payload = await JsonSerializer.DeserializeAsync<ResponsesPayload>(request.Body);
        if (payload is null) return Results.BadRequest("Invalid payload");

        // Reconstruct conversation history if previous_response_id is provided
        var history = new List<ChatMessage>();
        if (!string.IsNullOrEmpty(payload.PreviousResponseId) &&
            conversationStore.TryGetValue(payload.PreviousResponseId, out var storedHistory))
        {
            history.AddRange(storedHistory);
        }

        // Map incoming messages to ChatMessage; ChatRole wraps the role string ("user", "assistant", ...)
        foreach (var msg in payload.Input)
        {
            history.Add(new ChatMessage(new ChatRole(msg.Role), msg.Content));
        }

        // Register Codex tools as declaration-only schemas (no local handlers).
        // Assumes a Microsoft.Extensions.AI version that ships AIFunctionFactory.CreateDeclaration,
        // and that t.Function.Parameters is a JsonElement holding the JSON schema.
        var toolDefinitions = payload.Tools
            .Select(t => (AITool)AIFunctionFactory.CreateDeclaration(
                t.Function.Name, t.Function.Description, t.Function.Parameters))
            .ToList();

        // Generate unique response ID for chaining
        var responseId = Guid.NewGuid().ToString("N");
        conversationStore[responseId] = history;

        // Stream response back to client
        await StreamResponsesAsync(chatClient, history, toolDefinitions, responseId, request.HttpContext.Response);

        // The body has already been streamed, so return a no-op result instead of Results.Ok().
        return Results.Empty;
    });

    app.MapGet("/v1/models", () => Results.Json(ModelCatalogRegistry.GetCatalog()));
    app.Run("http://localhost:8080");

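The handler reads a ResponsesPayload type whose definition is not part of the listing. A minimal sketch of what it could look like is below. The class and property names are taken from the handler; the JSON field names (previous_response_id, input, tools, and the nested function object) and the plain-string content are assumptions made for illustration, and real Responses requests can carry richer input items than this models.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical request DTOs: only the fields the handler actually touches.
public sealed class ResponsesPayload
{
    [JsonPropertyName("model")] public string? Model { get; init; }
    [JsonPropertyName("previous_response_id")] public string? PreviousResponseId { get; init; }
    [JsonPropertyName("input")] public List<InputMessage> Input { get; init; } = new();
    [JsonPropertyName("tools")] public List<ToolDefinition> Tools { get; init; } = new();
}

public sealed class InputMessage
{
    [JsonPropertyName("role")] public string Role { get; init; } = "user";
    // Assumption: content arrives as a plain string.
    [JsonPropertyName("content")] public string Content { get; init; } = "";
}

public sealed class ToolDefinition
{
    [JsonPropertyName("type")] public string Type { get; init; } = "function";
    [JsonPropertyName("function")] public FunctionDefinition Function { get; init; } = new();
}

public sealed class FunctionDefinition
{
    [JsonPropertyName("name")] public string Name { get; init; } = "";
    [JsonPropertyName("description")] public string? Description { get; init; }
    // Kept as raw JSON so the schema is forwarded untouched.
    [JsonPropertyName("parameters")] public JsonElement Parameters { get; init; }
}
```

Defaulting Input and Tools to empty lists keeps the handler's foreach and Select calls safe when a client omits those fields.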
Step 2: Implement SSE Streaming Translation
The Responses API expects a strict sequence of Server-Sent Events. The gateway must translate ChatResponseUpdate objects from IChatClient into the exact event types Codex parses.

    async Task StreamResponsesAsync(IChatClient client, IList<ChatMessage> history, IList<AITool> tools, string responseId, HttpResponse response)
    {
        response.ContentType = "text/event-stream";
        response.Headers.CacheControl = "no-cache";

        // 1. Emit creation event
        await EmitSse(response, "response.created", new { id = responseId, status = "created" });
        // 2. Emit in-progress event
        await EmitSse(response, "response.in_progress", new { id = responseId });
        // 3. Stream output items
        var outputItemId = Guid.NewGuid().ToString("N");
        await EmitSse(response, "response.output_item.added", new { output_index = 0, item = new { id = outputItemId, type = "message" } });

        var fullResponse = new List<string>();
        await foreach (var update in client.GetStreamingResponseAsync(history, new ChatOptions { Tools = tools }))
        {
            // An update can carry several content parts, so handle all of them.
            foreach (var content in update.Contents)
            {
                if (content is TextContent text && !string.IsNullOrEmpty(text.Text))
                {
                    await EmitSse(response, "response.output_text.delta", new { delta = text.Text });
                    fullResponse.Add(text.Text);
                }
                else if (content is FunctionCallContent funcCall)
                {
                    // Arguments is a dictionary; serialize it, since ToString() would only print the type name.
                    await EmitSse(response, "response.function_call_arguments.delta", new {
                        call_id = funcCall.CallId,
                        arguments = JsonSerializer.Serialize(funcCall.Arguments)
                    });
                }
            }
        }

        // Record the assistant turn so previous_response_id chaining includes it.
        if (fullResponse.Count > 0)
        {
            history.Add(new ChatMessage(ChatRole.Assistant, string.Concat(fullResponse)));
        }

        // 4. Emit completion events
        await EmitSse(response, "response.output_item.done", new { output_index = 0, item = new { id = outputItemId, type = "message" } });
        await EmitSse(response, "response.completed", new { id = responseId, status = "completed" });
        // No "[DONE]" frame: that sentinel is a Chat Completions convention, and the
        // Responses stream ends with response.completed.
    }

    // Payloads here are simplified. In OpenAI's published event schema each data
    // object also carries a "type" field that repeats the event name.
    async Task EmitSse(HttpResponse response, string eventType, object data)
    {
        var json = JsonSerializer.Serialize(data);
        await response.WriteAsync($"event: {eventType}\ndata: {json}\n\n");
        await response.Body.FlushAsync();
    }

Step 3: Configure the Vendor-Neutral Client
The gateway remains agnostic to the underlying model. You inject any IChatClient implementation at startup. For OpenRouter, Anthropic, or local Ollama instances, the routing logic remains identical.
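
    // Example: OpenRouter configuration (an OpenAI-compatible endpoint).
    // AsIChatClient() comes from the Microsoft.Extensions.AI.OpenAI package.
    var endpoint = new Uri("https://openrouter.ai/api/v1");
    var modelId = Environment.GetEnvironmentVariable("INFERENCE_MODEL") ?? "anthropic/claude-3.5-sonnet";
    // Fail fast when the key is missing instead of passing null to ApiKeyCredential.
    var apiKey = Environment.GetEnvironmentVariable("INFERENCE_API_KEY")
        ?? throw new InvalidOperationException("INFERENCE_API_KEY is not set");

    IChatClient chatClient = new OpenAI.Chat.ChatClient(
            modelId,
            new System.ClientModel.ApiKeyCredential(apiKey),
            new OpenAI.OpenAIClientOptions { Endpoint = endpoint })
        .AsIChatClient();

    // This registration has to run before builder.Build() in Step 1.
    builder.Services.AddSingleton<IChatClient>(chatClient);
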
Architecture Rationale
- IChatClient Abstraction: Decouples protocol translation from inference logic. Switching providers requires zero changes to the SSE streaming or routing code.
- Passthrough Tool Schemas: Codex expects to execute tools locally. The gateway forwards tool definitions as raw JSON schemas without implementing handlers, allowing the CLI to manage execution while the model generates the calls.
- Bounded State Dictionary: previous_response_id chaining is handled via an in-memory dictionary. This avoids database overhead while maintaining turn continuity. In production, this should be swapped for a distributed cache (Redis) if scaling horizontally.
- Strict SSE State Machine: The event sequence (created → in_progress → output_item.added → delta → completed) mirrors the exact parsing expectations of Rust-based CLI clients. Deviating from this order causes silent deserialization failures.
Pitfall Guide
1. UTF-8 BOM Corruption in JSON Catalogs
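Explanation: In .NET, Encoding.UTF8 carries a Byte Order Mark preamble (EF BB BF), so APIs such as File.WriteAllText(path, json, Encoding.UTF8) or a StreamWriter created with that encoding write the BOM at the start of the file. RFC 8259 says JSON generators must not add a BOM, and strict parsers (like Rust's serde_json, which Codex uses) fail on BOM-prefixed input.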
Fix: Always serialize with new UTF8Encoding(encoderShouldEmitUTF8Identifier: false). Validate generated files with jq . catalog.json before deployment.
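As a small illustration of that fix, the sketch below writes a catalog file with a BOM-free encoder and includes a helper that checks the first three bytes. CatalogWriter and its method names are hypothetical; only UTF8Encoding and the File APIs come from the framework.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for emitting catalog.json without a byte order mark.
public static class CatalogWriter
{
    // UTF-8 encoder whose preamble is empty, so no EF BB BF is written.
    private static readonly UTF8Encoding Utf8NoBom =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

    public static void Write(string path, string json) =>
        File.WriteAllText(path, json, Utf8NoBom);

    // True when the file begins with the UTF-8 BOM bytes.
    public static bool StartsWithBom(string path)
    {
        var bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

Calling StartsWithBom in a startup check or a unit test catches a regression if someone later swaps the encoder back to Encoding.UTF8.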
2. Context Window Mismatch & Silent Truncation
Explanation: Declaring a context_window larger than the model actually supports causes the CLI to send oversized prompts. The inference provider then either rejects them or silently truncates them, which degrades output quality without an explicit error.
Fix: Dynamically fetch model metadata from the provider's API or maintain a versioned registry. Never hardcode limits. Implement a pre-flight validation that compares requested token counts against the declared window.
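A pre-flight check along those lines might look like the sketch below. It is deliberately crude: the four-characters-per-token estimate is an assumption for illustration, not a real tokenizer, and ContextPreflight, EstimateTokens, and Fits are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical pre-flight guard run before forwarding a request to the provider.
public static class ContextPreflight
{
    // Assumption: roughly 4 characters per token. Swap in the provider's
    // tokenizer when an exact count matters.
    public static int EstimateTokens(string text) => (text.Length + 3) / 4;

    // True when the estimated prompt plus the tokens reserved for the reply
    // fit inside the declared context window.
    public static bool Fits(IEnumerable<string> messages, int contextWindow, int reservedForOutput)
    {
        long total = 0;
        foreach (var message in messages)
        {
            total += EstimateTokens(message);
        }
        return total + reservedForOutput <= contextWindow;
    }
}
```

When Fits returns false, the gateway can answer with an explicit error rather than letting the provider truncate silently.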
3. Ignoring previous_response_id State Management
Explanation: The Responses API relies on previous_response_id to chain turns without resending full history. Dropping this ID breaks conversation continuity, forcing the model to lose context.
Fix: Implement a bounded LRU cache for conversation history. Map previous_response_id to the accumulated IList<ChatMessage>. Evict entries older than a configurable TTL to prevent memory leaks.
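One way to sketch such a cache is a dictionary plus a linked list ordered by recency, with a timestamp per entry for the TTL. BoundedHistoryCache is a hypothetical name, and the injectable clock exists only to make expiry testable; a production gateway might prefer an existing cache library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache with per-entry TTL for conversation histories.
public sealed class BoundedHistoryCache<T> where T : class
{
    private readonly int _capacity;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;
    private readonly object _gate = new object();
    // Most recently used entry sits at the front of the list.
    private readonly LinkedList<(string Key, T Value, DateTimeOffset Stored)> _order = new();
    private readonly Dictionary<string, LinkedListNode<(string Key, T Value, DateTimeOffset Stored)>> _index = new();

    public BoundedHistoryCache(int capacity, TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    public void Set(string key, T value)
    {
        lock (_gate)
        {
            if (_index.Remove(key, out var existing)) _order.Remove(existing);
            _index[key] = _order.AddFirst((key, value, _now()));
            // Evict least recently used entries once over capacity.
            while (_index.Count > _capacity)
            {
                var oldest = _order.Last!;
                _order.RemoveLast();
                _index.Remove(oldest.Value.Key);
            }
        }
    }

    public bool TryGet(string key, out T? value)
    {
        lock (_gate)
        {
            if (_index.TryGetValue(key, out var node))
            {
                if (_now() - node.Value.Stored <= _ttl)
                {
                    // Move to the front to mark it as most recently used.
                    _order.Remove(node);
                    _order.AddFirst(node);
                    value = node.Value.Value;
                    return true;
                }
                // Expired: drop it.
                _order.Remove(node);
                _index.Remove(key);
            }
            value = null;
            return false;
        }
    }
}
```

In the gateway, an instance such as BoundedHistoryCache<IList<ChatMessage>> could stand in for the plain dictionary, with the response ID as the key.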
4. Tool Schema Serialization Drift
Explanation: Codex expects strict JSON Schema drafts (typically draft-07). Minor deviations in type definitions, required fields, or additionalProperties flags cause the CLI to reject tool calls.
Fix: Use a schema validation library to verify tool definitions before transmission. Log schema diffs during development. Never manually construct JSON schema strings; use strongly-typed builders.
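Short of pulling in a full validator, a structural sanity check can catch the most common drift during development. The sketch below is not JSON Schema validation; it only confirms that the top-level type is "object" and that every name in required is declared under properties. ToolSchemaCheck is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical structural check for a tool's "parameters" schema.
public static class ToolSchemaCheck
{
    // Returns a list of problems; an empty list means the basic checks passed.
    public static List<string> Validate(JsonElement parameters)
    {
        var problems = new List<string>();
        if (parameters.ValueKind != JsonValueKind.Object)
        {
            problems.Add("schema must be a JSON object");
            return problems;
        }

        if (!parameters.TryGetProperty("type", out var type) ||
            type.ValueKind != JsonValueKind.String || type.GetString() != "object")
        {
            problems.Add("top-level \"type\" must be \"object\"");
        }

        var hasProps = parameters.TryGetProperty("properties", out var props) &&
                       props.ValueKind == JsonValueKind.Object;

        if (parameters.TryGetProperty("required", out var required))
        {
            if (required.ValueKind != JsonValueKind.Array)
            {
                problems.Add("\"required\" must be an array");
            }
            else
            {
                foreach (var name in required.EnumerateArray())
                {
                    var declared = name.ValueKind == JsonValueKind.String && hasProps &&
                                   props.TryGetProperty(name.GetString()!, out _);
                    if (!declared) problems.Add($"required field {name} is not declared in \"properties\"");
                }
            }
        }
        return problems;
    }
}
```

Logging the returned problems next to the tool name gives a cheap diff-style signal before the schema ever reaches the model.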
5. SSE Event Ordering Violations
Explanation: The CLI parses events sequentially. Emitting response.completed before response.output_item.done, or interleaving deltas incorrectly, breaks the state machine.
Fix: Implement a strict event emitter wrapper that enforces state transitions. Use a queue-based approach to guarantee FIFO delivery. Add integration tests that replay exact event sequences against a mock parser.
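The emitter wrapper can be as small as a transition table that throws on an illegal event. The sketch below encodes the order used by this gateway's streaming code; ResponseEventSequencer and StreamState are hypothetical names, and the allowed transitions are an assumption based on that order rather than a statement of what every client accepts.

```csharp
using System;

public enum StreamState { NotStarted, Created, InProgress, ItemOpen, ItemDone, Completed }

// Hypothetical guard: call Advance before writing each SSE event.
public sealed class ResponseEventSequencer
{
    public StreamState State { get; private set; } = StreamState.NotStarted;

    // Moves to the next state, or throws if the event is out of order.
    public void Advance(string eventType)
    {
        State = (State, eventType) switch
        {
            (StreamState.NotStarted, "response.created") => StreamState.Created,
            (StreamState.Created, "response.in_progress") => StreamState.InProgress,
            (StreamState.InProgress or StreamState.ItemDone, "response.output_item.added") => StreamState.ItemOpen,
            (StreamState.ItemOpen, "response.output_text.delta") => StreamState.ItemOpen,
            (StreamState.ItemOpen, "response.function_call_arguments.delta") => StreamState.ItemOpen,
            (StreamState.ItemOpen, "response.output_item.done") => StreamState.ItemDone,
            (StreamState.ItemDone, "response.completed") => StreamState.Completed,
            _ => throw new InvalidOperationException($"'{eventType}' is not valid in state {State}")
        };
    }
}
```

Because the guard throws before anything is written, an ordering bug surfaces as a server-side exception in tests instead of a silent parse failure in the CLI.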
6. Catalog File Replacement vs. Extension
Explanation: The model_catalog_json configuration key replaces the CLI's bundled catalog entirely. Omitting built-in models (like gpt-5-codex) breaks fallback behavior for users who expect them.
Fix: Merge your custom entries with the official catalog JSON before writing to disk. Maintain a base template and inject custom slugs programmatically. Document this behavior clearly for team adoption.
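A merge step could be sketched with System.Text.Json.Nodes as follows. It assumes both files share the shape of the catalog.json in this article (a top-level models array whose entries have a slug), and it uses JsonNode.DeepClone, which requires .NET 8 or later. CatalogMerger is a hypothetical helper.

```csharp
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical merge of a base catalog with custom entries.
public static class CatalogMerger
{
    // Keeps every base model, except where a custom entry reuses the same slug,
    // then appends all custom models.
    public static string Merge(string baseJson, string customJson)
    {
        var baseModels = JsonNode.Parse(baseJson)!["models"]!.AsArray();
        var customModels = JsonNode.Parse(customJson)!["models"]!.AsArray();

        var customSlugs = customModels
            .Select(m => m!["slug"]!.GetValue<string>())
            .ToHashSet();

        var merged = new JsonArray();
        foreach (var model in baseModels)
        {
            // DeepClone because a node cannot belong to two parents.
            if (!customSlugs.Contains(model!["slug"]!.GetValue<string>()))
                merged.Add(model.DeepClone());
        }
        foreach (var model in customModels)
        {
            merged.Add(model!.DeepClone());
        }

        var root = new JsonObject { ["models"] = merged };
        return root.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
    }
}
```

Letting a custom entry override a base entry with the same slug is a design choice in this sketch; rejecting duplicates outright would be an equally reasonable policy.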
7. Environment Variable Scope Leakage
Explanation: Storing API keys in global shell environments risks accidental exposure to child processes or CI pipelines. It also complicates multi-tenant setups.
Fix: Use scoped process environments or .env files loaded at gateway startup. Implement a configuration validator that fails fast if required keys are missing. Rotate keys via secret management tools (HashiCorp Vault, AWS Secrets Manager) in production.
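The fail-fast validator mentioned in the fix can be a few lines run at gateway startup. GatewayConfig and RequireEnvironment are hypothetical names; the variable names in the usage note are the ones used elsewhere in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical startup check for required environment variables.
public static class GatewayConfig
{
    // Returns the requested variables, or throws one error listing everything missing.
    public static IReadOnlyDictionary<string, string> RequireEnvironment(params string[] names)
    {
        var missing = names
            .Where(n => string.IsNullOrWhiteSpace(Environment.GetEnvironmentVariable(n)))
            .ToList();

        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Missing required environment variables: " + string.Join(", ", missing));
        }

        return names.ToDictionary(n => n, n => Environment.GetEnvironmentVariable(n)!);
    }
}
```

A call such as GatewayConfig.RequireEnvironment("INFERENCE_API_KEY", "INFERENCE_MODEL") before building the host reports every missing key in one message instead of failing on the first request.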
Production Bundle
Action Checklist
- Validate JSON catalog files with strict parsers before deployment
- Implement bounded LRU cache for previous_response_id state management
- Enforce strict SSE event ordering via a state machine wrapper
- Dynamically resolve context windows from provider metadata APIs
- Merge custom model entries with official catalog to preserve fallbacks
- Scope API keys to process environments; avoid global shell exports
- Add schema validation middleware for all tool definitions
- Configure connection pooling and retry logic for streaming endpoints
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local Development / Prototyping | In-memory dictionary + file-based catalog | Zero infrastructure overhead; fast iteration | None |
| Multi-Model Routing (Claude/Gemini/Llama) | OpenRouter or unified aggregator gateway | Single API key; automatic fallback routing | Moderate (aggregator markup) |
| High-Throughput CI/CD Pipelines | Distributed cache (Redis) + stateless gateway | Horizontal scaling; shared conversation state | Low (infrastructure cost) |
| Strict Compliance / Air-Gapped | Self-hosted Ollama + local catalog | No external network calls; full data control | High (hardware/ops) |
Configuration Template
gateway-config.toml

    model = "custom-aggregator-slug"
    model_provider = "local-gateway"
    model_catalog_json = "./catalog.json"

    [model_providers.local-gateway]
    name = "Vendor-Neutral Responses Gateway"
    base_url = "http://localhost:8080/v1"
    wire_api = "responses"
    env_key = "GATEWAY_AUTH_TOKEN"
    stream_idle_timeout_ms = 300000

catalog.json

    {
      "models": [
        {
          "slug": "custom-aggregator-slug",
          "display_name": "Aggregator (Claude/Gemini/Llama)",
          "description": "Vendor-neutral inference gateway",
          "supported_reasoning_levels": [],
          "shell_type": "default",
          "visibility": "list",
          "supported_in_api": true,
          "priority": 50,
          "availability_nux": null,
          "upgrade": null,
          "base_instructions": "",
          "supports_reasoning_summaries": false,
          "support_verbosity": false,
          "default_verbosity": null,
          "apply_patch_tool_type": "freeform",
          "truncation_policy": { "mode": "tokens", "limit": 8192 },
          "supports_parallel_tool_calls": true,
          "context_window": 200000,
          "max_context_window": 200000,
          "auto_compact_token_limit": 180000,
          "effective_context_window_percent": 95,
          "experimental_supported_tools": []
        }
      ]
    }

Quick Start Guide
- Initialize the Gateway: Create a new ASP.NET Core minimal API project. Install Microsoft.Extensions.AI and your preferred provider SDK (e.g., OpenAI, Anthropic, or Ollama).
- Configure Environment Variables: Set INFERENCE_API_KEY and INFERENCE_MODEL in your shell or .env file. Ensure the model slug matches your provider's routing format.
- Generate Catalog & Config: Run the gateway startup routine to emit catalog.json and gateway-config.toml into a dedicated directory. Verify JSON validity with jq . catalog.json.
- Launch & Connect: Start the gateway (dotnet run). In a separate terminal, set CODEX_HOME to your config directory and GATEWAY_AUTH_TOKEN to any non-empty string. Execute codex to begin routing through the gateway.
