lity declaration.
use async_trait::async_trait;
use futures::stream::Stream;
use std::pin::Pin;

#[async_trait]
pub trait ModelGateway: Send + Sync {
    fn vendor_id(&self) -> &str;
    fn display_name(&self) -> &str;

    async fn complete(
        &self,
        payload: UnifiedInferenceRequest,
    ) -> Result<UnifiedInferenceResponse, GatewayError>;

    async fn stream_complete(
        &self,
        payload: UnifiedInferenceRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamChunk, GatewayError>> + Send>>,
        GatewayError,
    >;

    async fn discover_models(&self) -> Result<Vec<ModelMetadata>, GatewayError>;
    async fn verify_connectivity(&self) -> Result<HealthStatus, GatewayError>;
    fn supported_features(&self) -> FeatureFlags;
}
Why this structure? Separating complete and stream_complete keeps one-shot and incremental delivery on distinct code paths, so backpressure handling stays confined to the streaming path and each provider can optimize its transport layer independently. Declaring supported_features upfront enables runtime capability negotiation, so invalid requests are rejected before they hit the network.
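The capability negotiation described above can be sketched as a plain comparison between what a request needs and what a gateway declares. This is a minimal sketch, not the article's actual definition: the fields of FeatureFlags (other than reasoning_supported, which appears later in the pitfall guide) and the RequestNeeds type are assumptions made for illustration.

```rust
/// Hypothetical capability set. The article does not show the real
/// definition of `FeatureFlags`, so these fields are assumptions.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct FeatureFlags {
    pub streaming: bool,
    pub tool_calling: bool,
    pub reasoning_supported: bool,
}

/// Hypothetical summary of what a normalized request requires.
#[derive(Debug, Clone, Copy, Default)]
pub struct RequestNeeds {
    pub wants_stream: bool,
    pub uses_tools: bool,
    pub uses_reasoning: bool,
}

/// Returns the name of every capability the request needs but the gateway lacks.
/// An empty result means the request is safe to dispatch.
pub fn missing_capabilities(needs: RequestNeeds, flags: FeatureFlags) -> Vec<&'static str> {
    let mut missing = Vec::new();
    if needs.wants_stream && !flags.streaming {
        missing.push("streaming");
    }
    if needs.uses_tools && !flags.tool_calling {
        missing.push("tool_calling");
    }
    if needs.uses_reasoning && !flags.reasoning_supported {
        missing.push("reasoning");
    }
    missing
}
```

A router could call missing_capabilities with the result of supported_features() and either reject the request or pick another gateway when the list is non-empty.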
Vendor-agnostic types eliminate format coupling. The request structure centralizes prompt injection, tool schemas, and reasoning budgets. The response structure standardizes content blocks, termination signals, and telemetry.
pub struct UnifiedInferenceRequest {
    pub target_model: String,
    pub conversation_history: Vec<DialogueTurn>,
    pub system_instruction: Option<InstructionBlock>,
    pub tool_schemas: Vec<ToolSpecification>,
    pub max_output_tokens: u32,
    pub sampling_temperature: Option<f32>,
    pub reasoning_budget: Option<ReasoningConfig>,
    pub vendor_overrides: serde_json::Value,
}

pub struct UnifiedInferenceResponse {
    pub request_id: String,
    pub generated_content: Vec<ContentSegment>,
    pub termination_signal: TerminationReason,
    pub token_telemetry: UsageMetrics,
    pub resolved_model: String,
}
Why this structure? vendor_overrides preserves provider-specific parameters without polluting the core schema. reasoning_budget abstracts divergent thinking configurations (Anthropic's budget_tokens, Google's thinkingBudget, OpenAI's reasoning_effort) into a single normalized field. The router inspects supported_features() before injecting reasoning parameters, preventing 400 errors on unsupported models.
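To make the reasoning_budget abstraction concrete, here is a rough sketch of mapping one normalized budget onto the three vendor keys named above. The Vendor enum, the shape of ReasoningConfig, and the token thresholds used to pick an OpenAI effort level are all assumptions; the real payloads also nest these keys inside vendor-specific objects, which this sketch leaves out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vendor {
    OpenAI,
    Anthropic,
    Google,
}

/// Hypothetical normalized setting: a single token budget.
#[derive(Debug, Clone, Copy)]
pub struct ReasoningConfig {
    pub budget_tokens: u32,
}

/// Returns the vendor-specific key and its JSON-encoded value,
/// or `None` when the target model does not support reasoning.
pub fn reasoning_param(
    vendor: Vendor,
    cfg: ReasoningConfig,
    reasoning_supported: bool,
) -> Option<(&'static str, String)> {
    if !reasoning_supported {
        // Strip the field entirely instead of sending an unsupported parameter.
        return None;
    }
    Some(match vendor {
        Vendor::Anthropic => ("budget_tokens", cfg.budget_tokens.to_string()),
        Vendor::Google => ("thinkingBudget", cfg.budget_tokens.to_string()),
        Vendor::OpenAI => {
            // OpenAI takes a qualitative level, not a token count.
            // These thresholds are arbitrary and only for illustration.
            let effort = if cfg.budget_tokens < 2_000 {
                "low"
            } else if cfg.budget_tokens < 10_000 {
                "medium"
            } else {
                "high"
            };
            ("reasoning_effort", format!("\"{}\"", effort))
        }
    })
}
```

Keeping the capability check inside the mapping function means no caller can forget it, at the cost of threading the flag through every call.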
The translator converts normalized types into vendor-specific JSON payloads. OpenAI requires system prompts embedded in the message array, tool definitions wrapped in a function object, and tool results delivered as separate role: "tool" messages.
impl OpenAITranslator {
    fn serialize_conversation(
        history: &[DialogueTurn],
        instruction: Option<&InstructionBlock>,
    ) -> Vec<serde_json::Value> {
        let mut formatted = Vec::new();

        if let Some(sys) = instruction {
            formatted.push(serde_json::json!({
                "role": "system",
                "content": sys.raw_text
            }));
        }

        for turn in history {
            match turn.participant {
                Participant::User => {
                    Self::flatten_user_turn(&mut formatted, &turn.segments);
                }
                Participant::Assistant => {
                    let (text_payload, tool_invocations) =
                        Self::extract_assistant_segments(&turn.segments);
                    formatted.push(serde_json::json!({
                        "role": "assistant",
                        "content": text_payload,
                        "tool_calls": tool_invocations
                    }));
                }
            }
        }

        formatted
    }

    fn normalize_tool_definitions(schemas: &[ToolSpecification]) -> Vec<serde_json::Value> {
        schemas.iter().map(|spec| {
            serde_json::json!({
                "type": "function",
                "function": {
                    "name": spec.identifier,
                    "description": spec.human_readable_desc,
                    "parameters": spec.json_schema
                }
            })
        }).collect()
    }
}
Why this structure? The translator isolates vendor-specific serialization logic. flatten_user_turn handles the OpenAI requirement that tool results must appear as discrete role: "tool" messages, not inline content blocks. This prevents schema validation failures during multi-turn tool use.
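The article does not show flatten_user_turn itself, so the following is only a sketch of how such a helper might split a user turn. The Segment and WireMessage types are hypothetical stand-ins (the real code pushes serde_json::Value items), chosen so the example has no external dependencies.

```rust
/// Hypothetical segment type; the article's real definition is not shown.
#[derive(Debug, Clone)]
pub enum Segment {
    Text(String),
    ToolResult { call_id: String, output: String },
}

/// Simplified stand-in for one OpenAI chat message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WireMessage {
    pub role: &'static str,
    pub content: String,
    pub tool_call_id: Option<String>,
}

/// Emits each tool result as its own `role: "tool"` message, then any
/// remaining text as a single `role: "user"` message.
pub fn flatten_user_turn(out: &mut Vec<WireMessage>, segments: &[Segment]) {
    let mut text_parts: Vec<&str> = Vec::new();
    for segment in segments {
        match segment {
            Segment::ToolResult { call_id, output } => out.push(WireMessage {
                role: "tool",
                content: output.clone(),
                tool_call_id: Some(call_id.clone()),
            }),
            Segment::Text(text) => text_parts.push(text.as_str()),
        }
    }
    if !text_parts.is_empty() {
        out.push(WireMessage {
            role: "user",
            content: text_parts.join("\n"),
            tool_call_id: None,
        });
    }
}
```

Tool results are emitted before the user text so that they directly follow the assistant message that requested them.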
Step 4: Route and Resolve
The router maintains a registry of active gateways and resolves requests based on vendor identifiers. It applies capability checks before dispatching.
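use std::sync::Arc;

pub struct GatewayRouter {
    endpoints: std::collections::HashMap<String, Arc<dyn ModelGateway>>,
    primary_vendor: String,
}

impl GatewayRouter {
    // Constructor called by bootstrap_router in the configuration template.
    pub fn new(primary_vendor: &str) -> Self {
        Self {
            endpoints: std::collections::HashMap::new(),
            primary_vendor: primary_vendor.to_string(),
        }
    }

    pub fn resolve(&self, vendor_key: &str) -> Option<Arc<dyn ModelGateway>> {
        self.endpoints.get(vendor_key).cloned()
    }

    pub fn register(&mut self, gateway: Arc<dyn ModelGateway>) {
        self.endpoints.insert(gateway.vendor_id().to_string(), gateway);
    }
}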
Why this structure? Runtime registration enables dynamic provider loading without recompilation. The router acts as a single entry point, allowing middleware (logging, rate limiting, fallback routing) to be applied uniformly across all AI interactions.
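As one example of the fallback routing mentioned above, the sketch below tries the primary vendor first and then walks a fallback list until it finds a healthy gateway. It uses a reduced, synchronous Gateway trait and a StaticGateway test double, both hypothetical, so that it compiles without async_trait; the is_healthy method is likewise an assumption standing in for verify_connectivity.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Reduced, synchronous stand-in for the article's `ModelGateway` trait.
pub trait Gateway: Send + Sync {
    fn vendor_id(&self) -> &str;
    fn is_healthy(&self) -> bool;
}

/// Test double with a fixed health state.
pub struct StaticGateway {
    pub id: &'static str,
    pub healthy: bool,
}

impl Gateway for StaticGateway {
    fn vendor_id(&self) -> &str {
        self.id
    }
    fn is_healthy(&self) -> bool {
        self.healthy
    }
}

pub struct Router {
    endpoints: HashMap<String, Arc<dyn Gateway>>,
    primary_vendor: String,
}

impl Router {
    pub fn new(primary_vendor: &str) -> Self {
        Self {
            endpoints: HashMap::new(),
            primary_vendor: primary_vendor.to_string(),
        }
    }

    pub fn register(&mut self, gateway: Arc<dyn Gateway>) {
        self.endpoints.insert(gateway.vendor_id().to_string(), gateway);
    }

    /// Returns the first registered, healthy gateway, checking the primary
    /// vendor first and then each fallback in the order given.
    pub fn resolve_with_fallback(&self, fallbacks: &[&str]) -> Option<Arc<dyn Gateway>> {
        std::iter::once(self.primary_vendor.as_str())
            .chain(fallbacks.iter().copied())
            .filter_map(|key| self.endpoints.get(key))
            .find(|gateway| gateway.is_healthy())
            .cloned()
    }
}
```

Unregistered vendors in the fallback list are skipped silently, which keeps configuration and registration loosely coupled.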
Pitfall Guide
1. Role Enumeration Mismatch
Explanation: Google Gemini uses model instead of assistant for AI-generated turns. Directly passing assistant triggers a 400 validation error.
Fix: Implement a role normalization map in the translator. Convert Participant::Assistant to model when routing to Google, and preserve assistant for OpenAI/Anthropic. Never hardcode role strings in business logic.
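A role normalization map can be as small as one exhaustive match. The sketch below assumes a Vendor enum and a two-variant Participant enum; neither definition is shown in the article, so treat both as illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Participant {
    User,
    Assistant,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vendor {
    OpenAI,
    Anthropic,
    Google,
}

/// Maps a normalized participant to the role string each vendor expects.
/// The match is exhaustive, so adding a vendor or participant forces an update here.
pub fn wire_role(vendor: Vendor, participant: Participant) -> &'static str {
    match (vendor, participant) {
        (_, Participant::User) => "user",
        (Vendor::Google, Participant::Assistant) => "model",
        (Vendor::OpenAI | Vendor::Anthropic, Participant::Assistant) => "assistant",
    }
}
```

Because the strings live in one function inside the translator layer, business logic only ever handles the Participant enum.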
2. Tool Call ID & Argument Format Mismatch
Explanation: Anthropic generates IDs like toolu_01Bx... which may contain characters rejected by OpenAI's strict schema. Additionally, OpenAI expects tool arguments as JSON strings, while Anthropic and Google use native objects.
Fix: Sanitize IDs by stripping non-alphanumeric prefixes and enforcing length limits. Serialize arguments to strings for OpenAI, and parse them back to objects when normalizing responses. Validate IDs against each vendor's regex before dispatch.
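The ID sanitization step might look like the sketch below. The allowed character set (ASCII letters, digits, underscore, hyphen) and the caller-supplied length limit are assumptions, not documented vendor rules, so check each vendor's actual constraints before relying on them. Argument serialization is left out because it needs a JSON library.

```rust
/// Keeps only ASCII letters, digits, '_' and '-', then truncates to `max_len`.
/// The allowed set is an assumption made for illustration.
pub fn sanitize_tool_call_id(raw: &str, max_len: usize) -> String {
    raw.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .take(max_len)
        .collect()
}

/// Reports whether an ID already satisfies the same rule, so callers can
/// skip rewriting (and the remapping it requires) when nothing would change.
pub fn is_valid_tool_call_id(id: &str, max_len: usize) -> bool {
    !id.is_empty() && sanitize_tool_call_id(id, max_len) == id
}
```

Sanitizing can make two distinct IDs collide, so a translator that rewrites IDs should keep a map from rewritten to original values for the duration of the conversation.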
3. Authentication & Endpoint Divergence
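Explanation: OpenAI uses an Authorization: Bearer <key> header. Anthropic expects the key in an x-api-key header alongside an anthropic-version header. Google Gemini accepts the API key as a URL query parameter (?key=) or an x-goog-api-key header. Assuming uniform auth breaks request construction.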
Fix: Abstract authentication into a CredentialProvider trait. Each gateway implements its own signing method. Never embed credentials in the normalized request structure; resolve them at the transport layer.
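The CredentialProvider trait is only named in the article, so the shape below is a guess at what it could look like: each provider contributes headers, query parameters, or both, and the transport layer applies them. BearerToken, HeaderKey, QueryKey, and apply_query are hypothetical names introduced for this sketch.

```rust
/// Hypothetical shape for the `CredentialProvider` trait.
pub trait CredentialProvider {
    /// Header name/value pairs to attach to the request.
    fn headers(&self) -> Vec<(String, String)> {
        Vec::new()
    }
    /// Query name/value pairs to append to the URL.
    fn query_params(&self) -> Vec<(String, String)> {
        Vec::new()
    }
}

/// Sends the key as `Authorization: Bearer <token>`.
pub struct BearerToken {
    pub token: String,
}

impl CredentialProvider for BearerToken {
    fn headers(&self) -> Vec<(String, String)> {
        vec![("Authorization".to_string(), format!("Bearer {}", self.token))]
    }
}

/// Sends the key in a vendor-specific header.
pub struct HeaderKey {
    pub header_name: &'static str,
    pub key: String,
}

impl CredentialProvider for HeaderKey {
    fn headers(&self) -> Vec<(String, String)> {
        vec![(self.header_name.to_string(), self.key.clone())]
    }
}

/// Sends the key as a URL query parameter.
pub struct QueryKey {
    pub param_name: &'static str,
    pub key: String,
}

impl CredentialProvider for QueryKey {
    fn query_params(&self) -> Vec<(String, String)> {
        vec![(self.param_name.to_string(), self.key.clone())]
    }
}

/// Appends query credentials to a URL. No percent-encoding is performed,
/// so this sketch assumes keys contain only URL-safe characters.
pub fn apply_query(base_url: &str, creds: &dyn CredentialProvider) -> String {
    let params = creds.query_params();
    if params.is_empty() {
        return base_url.to_string();
    }
    let joined: Vec<String> = params.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
    let separator = if base_url.contains('?') { '&' } else { '?' };
    format!("{}{}{}", base_url, separator, joined.join("&"))
}
```

With this split, the normalized request never carries a credential; the gateway asks its provider for headers and query parameters only when building the HTTP call.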
4. Capability Blind Spots (Reasoning Budgets)
Explanation: Sending a thinking configuration to a model that doesn't support it causes immediate rejection. OpenAI's o-series uses reasoning_effort, Anthropic uses budget_tokens, and Google uses thinkingBudget.
Fix: Query supported_features() before constructing the payload. If reasoning_supported is false, strip the field entirely. Map normalized budgets to vendor-specific keys only after capability verification.
5. Streaming State & Backpressure Leaks
Explanation: Streaming endpoints differ in framing even where the transport is the same. OpenAI and Anthropic both stream server-sent events (SSE), but OpenAI emits bare data: chunks terminated by [DONE], while Anthropic emits typed events such as message_start and content_block_delta. Google exposes a dedicated streamGenerateContent path. Failing to apply backpressure while consuming any of them lets unread chunks accumulate in memory.
Fix: Use async streams with bounded channels. Use StreamExt::ready_chunks() or an equivalent to process chunks in batches. Always attach timeout guards and cancellation tokens to streaming tasks.
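The effect of a bounded channel can be shown without an async runtime. The sketch below swaps in the standard library's sync_channel and OS threads for an async bounded channel (for example tokio::sync::mpsc::channel) purely to stay dependency-free; the function names are hypothetical, and timeouts and cancellation are omitted.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Offers chunks to a bounded channel that nobody is reading and reports how
/// many were accepted before the channel pushed back.
pub fn chunks_accepted_before_backpressure(capacity: usize, offered: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected for the whole loop.
    let (tx, _rx) = sync_channel::<String>(capacity);
    let mut accepted = 0;
    for i in 0..offered {
        match tx.try_send(format!("chunk-{}", i)) {
            Ok(()) => accepted += 1,
            // The buffer is full: a real producer would wait here instead of buffering more.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Relays chunks through a bounded channel. The producer blocks in `send`
/// whenever the buffer is full, so memory use is capped by `capacity`.
pub fn relay(chunks: Vec<String>, capacity: usize) -> Vec<String> {
    let (tx, rx) = sync_channel::<String>(capacity);
    let producer = thread::spawn(move || {
        for chunk in chunks {
            // `send` fails only if the consumer has gone away; stop producing then.
            if tx.send(chunk).is_err() {
                break;
            }
        }
    });
    // The iterator ends once the producer drops its sender.
    let received: Vec<String> = rx.iter().collect();
    producer.join().expect("producer thread panicked");
    received
}
```

The same idea carries over to async code: the producer awaits on a full channel instead of blocking a thread, and the buffer size becomes the upper bound on unread chunks.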
6. Error Mapping & Vendor-Specific Codes
Explanation: Rate limits, context window overflows, and invalid tool schemas return different HTTP status codes and JSON structures across providers. Treating all errors as generic failures obscures root causes.
Fix: Build an error normalization enum that maps vendor codes to unified types (RateLimitExceeded, ContextOverflow, SchemaValidationFailed). Log raw vendor payloads for debugging, but expose only normalized errors to upper layers.
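A normalization function for the three unified types named above could start out like this. The NormalizedError name, the Other fallback variant, and the substring rules are all assumptions for illustration; a production mapping would be driven by each vendor's documented error codes.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NormalizedError {
    RateLimitExceeded,
    ContextOverflow,
    SchemaValidationFailed,
    /// Anything not yet classified keeps its raw details for debugging.
    Other { status: u16, vendor_code: String },
}

/// Maps an HTTP status and a vendor error code to a unified error.
/// The substring checks are illustrative, not an exhaustive vendor mapping.
pub fn normalize_error(status: u16, vendor_code: &str) -> NormalizedError {
    let code = vendor_code.to_ascii_lowercase();
    if status == 429 || code.contains("rate_limit") {
        NormalizedError::RateLimitExceeded
    } else if code.contains("context_length") {
        NormalizedError::ContextOverflow
    } else if code.contains("schema") {
        NormalizedError::SchemaValidationFailed
    } else {
        NormalizedError::Other {
            status,
            vendor_code: vendor_code.to_string(),
        }
    }
}
```

Keeping an Other variant avoids silently forcing unknown failures into the wrong category while the mapping is still incomplete.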
7. Prompt Caching & Context Window Truncation
Explanation: Providers handle context limits differently. Some truncate silently, others return explicit errors. Prompt caching tokens are billed separately and require explicit markers.
Fix: Implement a context manager that tracks token usage against vendor-specific limits. Strip or compress older turns before dispatch. Add provider-specific cache-control markers only when prompt_caching_supported is true.
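The "strip older turns" part of the fix can be sketched as a budgeted walk from the newest turn backwards. The token estimate here is a crude four-characters-per-token heuristic, which is an assumption; a real context manager would use the vendor's tokenizer or reported usage.

```rust
/// Rough token estimate: about four characters per token, rounded up.
/// This heuristic is an assumption, not any vendor's tokenizer.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Returns the longest suffix of `history` whose estimated size fits `budget`.
/// The newest turn is always kept, even if it alone exceeds the budget.
pub fn trim_history(history: &[String], budget: usize) -> &[String] {
    let mut total = 0;
    let mut start = history.len();
    for (index, turn) in history.iter().enumerate().rev() {
        let cost = estimate_tokens(turn);
        let already_kept_one = start < history.len();
        if already_kept_one && total + cost > budget {
            break;
        }
        total += cost;
        start = index;
    }
    &history[start..]
}
```

Returning a slice rather than a new vector leaves the original history intact, so the caller can still summarize or archive the dropped turns.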
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single-model prototype | Direct SDK integration | Minimal boilerplate, fastest time-to-value | Low initial, high scaling cost |
| Multi-model fallback routing | Unified Gateway with capability router | Enables runtime switching without code changes | Medium initial, near-zero marginal cost |
| High-throughput streaming | Gateway + bounded async channels | Prevents memory leaks and backpressure failures | Higher infrastructure, lower failure rate |
| Strict compliance/audit | Gateway + normalized error logging | Centralizes telemetry and vendor-agnostic metrics | Moderate logging overhead, high audit readiness |
| Cost-optimized routing | Gateway + pricing-aware dispatcher | Routes to cheapest capable model dynamically | Requires pricing feed, reduces token spend |
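For the cost-optimized row, a pricing-aware dispatcher reduces to filtering by capability and taking the minimum price. The ModelOffer type and its fields are hypothetical, and the prices would come from whatever pricing feed the deployment maintains.

```rust
/// Hypothetical entry from a pricing feed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelOffer {
    pub vendor: &'static str,
    /// Price in cents per million tokens, as supplied by the pricing feed.
    pub price_per_mtok_cents: u32,
    pub supports_tools: bool,
}

/// Picks the cheapest offer that satisfies the request's capability needs.
pub fn cheapest_capable(offers: &[ModelOffer], needs_tools: bool) -> Option<&ModelOffer> {
    offers
        .iter()
        .filter(|offer| !needs_tools || offer.supports_tools)
        .min_by_key(|offer| offer.price_per_mtok_cents)
}
```

Filtering before comparing prices ensures the dispatcher never trades a working request for a cheaper one that the model cannot serve.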
Configuration Template
// gateway_config.rs
use std::collections::HashMap;
use std::sync::Arc;

pub struct GatewayConfig {
    pub primary_vendor: String,
    pub fallback_chain: Vec<String>,
    pub timeout_ms: u64,
    pub max_retries: u32,
    pub credential_store: HashMap<String, String>,
}

// Implementing the Default trait (instead of an inherent default() method)
// keeps GatewayConfig::default() working and enables ..Default::default().
impl Default for GatewayConfig {
    fn default() -> Self {
        Self {
            primary_vendor: "openai".to_string(),
            fallback_chain: vec!["anthropic".into(), "google".into()],
            timeout_ms: 30_000,
            max_retries: 2,
            credential_store: HashMap::new(),
        }
    }
}

impl GatewayConfig {
    pub fn with_credentials(mut self, vendor: &str, key: &str) -> Self {
        self.credential_store.insert(vendor.to_string(), key.to_string());
        self
    }
}

// Usage in router initialization
pub fn bootstrap_router(config: &GatewayConfig) -> GatewayRouter {
    let mut router = GatewayRouter::new(&config.primary_vendor);
    for (vendor, key) in &config.credential_store {
        // The explicit annotation coerces each concrete gateway into a trait
        // object; without it the match arms have mismatched types.
        let gateway: Arc<dyn ModelGateway> = match vendor.as_str() {
            "openai" => Arc::new(OpenAIGateway::new(key, config.timeout_ms)),
            "anthropic" => Arc::new(AnthropicGateway::new(key, config.timeout_ms)),
            "google" => Arc::new(GoogleGateway::new(key, config.timeout_ms)),
            _ => continue, // Skip vendors without a gateway implementation.
        };
        router.register(gateway);
    }
    router
}
Quick Start Guide
- Define the contract: Implement the ModelGateway trait with async completion, streaming, and capability declaration methods.
- Normalize data structures: Create UnifiedInferenceRequest and UnifiedInferenceResponse with vendor override fields and reasoning budget abstractions.
- Build translators: Implement format converters for each target vendor, handling role mapping, tool schema wrapping, and argument serialization.
- Initialize the router: Register gateways using a configuration struct, apply timeout/retry middleware, and wire capability checks before dispatch.
- Validate with integration tests: Send multi-turn tool-use conversations across all registered vendors, verify role normalization, ID sanitization, and error mapping before production deployment.