Architecting Provider-Agnostic LLM Clients: A Blueprint for Multi-Model Systems

Current Situation Analysis

The modern AI stack is no longer a monolith. Engineering teams routinely juggle multiple foundation models to balance inference costs, latency thresholds, regional compliance, and capability requirements. What begins as a straightforward integration with a single SDK quickly fractures when production realities dictate model rotation, fallback routing, or cost-based arbitration.

The core pain point is SDK coupling. Most developers initialize provider clients directly inside business logic, binding request formatting, authentication, and response parsing to a specific vendor's library. When a second model is introduced, the codebase duplicates inference paths. When a third arrives, the system becomes a maintenance liability. Every API version bump, authentication change, or streaming protocol update must be manually propagated across multiple code branches.

This problem is routinely overlooked during initial development because SDK convenience masks architectural debt. Teams prioritize shipping features over abstraction, assuming a single provider will suffice. The reality is that model economics shift rapidly. A vendor that dominates early development often becomes prohibitively expensive at scale, or introduces rate limits that break production SLAs.

Data from production deployments consistently shows that hardcoded inference layers require refactoring 30–40% of the model interaction code when switching providers. Maintenance overhead scales linearly with each additional vendor, while test coverage drops due to conditional branching. Teams that delay abstraction typically face 2–3 weeks of regression testing and hotfix cycles when forced to migrate mid-project.

WOW Moment: Key Findings

The architectural shift from vendor-specific calls to a unified strategy pattern yields measurable operational improvements. The following comparison isolates the impact of three common implementation approaches across production-critical metrics.

Approach	Code Duplication	Switching Cost	Test Coverage	Maintenance Overhead
Hardcoded SDK	High (per-provider)	Days to weeks	Fragmented	Linear scaling
Conditional Routing	Medium (shared wrapper)	Hours to days	Partial	Quadratic branching
Abstracted Strategy	Low (single interface)	Minutes (config)	Unified	Constant baseline

This finding matters because it decouples business logic from vendor implementation. The abstracted strategy pattern reduces switching cost to a configuration change, enables parallel testing across providers, and establishes a single point for cross-cutting concerns like logging, retry policies, and token accounting. It transforms model selection from a code deployment into an operational decision.

Core Solution

Building a provider-agnostic client requires separating the contract from the implementation. The strategy pattern, combined with dependency injection, provides a clean boundary between application logic and vendor SDKs. Below is a production-ready TypeScript implementation that demonstrates the architecture.

Step 1: Define the Provider Contract

The interface must capture the minimum viable operations shared across vendors. Chat completion and streaming are the universal baselines. Optional parameters are handled through a flexible configuration object rather than rigid method signatures.

export interface LLMRequestConfig {
  temperature?: number;
  maxTokens?: number;
  stopSequences?: string[];
  [key: string]: unknown;
}

export interface LLMResponse {
  content: string;
  model: string;
  usage?: { promptTokens: number; completionTokens: number };
}

export interface LLMProvider {
  readonly name: string;
  generateCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): Promise<LLMResponse>;

  streamCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): AsyncIterable<string>;
}

Step 2: Implement Concrete Adapters

Each vendor receives a dedicated adapter that translates the unified contract into provider-specific calls. This isolates SDK dependencies and authentication logic.

import OpenAI from 'openai';

export class OpenAIAdapter implements LLMProvider {
  readonly name = 'openai';
  private readonly client: OpenAI;
  private readonly defaultModel: string;

  constructor(apiKey: string, model: string = 'gpt-3.5-turbo') {
    this.client = new OpenAI({ apiKey });
    this.defaultModel = model;
  }

  async generateCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): Promise<LLMResponse> {
    const response = await this.client.chat.completions.create({
      model: this.defaultModel,
      messages,
      ...config,
    });

    return {
      content: response.choices[0].message.content ?? '',
      model: response.model,
      usage: response.usage
        ? {
            promptTokens: response.usage.prompt_tokens,
            completionTokens: response.usage.completion_tokens,
          }
        : undefined,
    };
  }

  async *streamCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): AsyncIterable<string> {
    const stream = await this.client.chat.completions.create({
      model: this.defaultModel,
      messages,
      stream: true,
      ...config,
    });

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta?.content;
      if (delta) yield delta;
    }
  }
}

For vendors without official SDKs, a lightweight HTTP adapter maintains the same contract:

export class InterwestAdapter implements LLMProvider {
  readonly name = 'interwest';
  private readonly baseUrl: string;
  private readonly apiKey: string;
  private readonly defaultModel: string;

  constructor(apiKey: string, model: string = 'default') {
    this.apiKey = apiKey;
    this.defaultModel = model;
    this.baseUrl = 'https://ai.interwestinfo.com/api/v1';
  }

  async generateCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): Promise<LLMResponse> {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: this.defaultModel,
        messages,
        ...config,
      }),
    });

    if (!response.ok) {
      throw new Error(`Interwest API error: ${response.status}`);
    }

    const data = await response.json();
    return {
      content: data.choices[0].message.content,
      model: this.defaultModel,
    };
  }

  async *streamCompletion(
    messages: Array<{ role: string; content: string }>,
    config?: LLMRequestConfig
  ): AsyncIterable<string> {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`,
        Accept: 'text/event-stream',
      },
      body: JSON.stringify({
        model: this.defaultModel,
        messages,
        stream: true,
        ...config,
      }),
    });

    if (!response.ok) {
      throw new Error(`Interwest streaming error: ${response.status}`);
    }

    const reader = response.body?.getReader();
    if (!reader) throw new Error('No readable stream available');

    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const payload = line.slice(6).trim();
          if (payload === '[DONE]') continue;
          try {
            const json = JSON.parse(payload);
            if (json.content) yield json.content;
          } catch {
            yield payload;
          }
        }
      }
    }
  }
}

Step 3: Build the Unified Client

The orchestrator receives a provider instance through dependency injection. Business logic interacts exclusively with this client, remaining completely unaware of underlying SDKs.

export class UnifiedLLMClient {
  constructor(private readonly provider: LLMProvider) {}

  async summarize(text: string, config?: LLMRequestConfig): Promise<LLMResponse> {
    const messages = [
      { role: 'system', content: 'You are a precise summarization engine.' },
      { role: 'user', content: `Condense the following text: ${text}` },
    ];
    return this.provider.generateCompletion(messages, config);
  }

  async *streamSummarize(
    text: string,
    config?: LLMRequestConfig
  ): AsyncIterable<string> {
    const messages = [
      { role: 'system', content: 'You are a precise summarization engine.' },
      { role: 'user', content: `Condense the following text: ${text}` },
    ];
    yield* this.provider.streamCompletion(messages, config);
  }

  getProviderName(): string {
    return this.provider.name;
  }
}

Architecture Rationale

Strategy Pattern over Conditional Routing: Injecting a concrete provider eliminates if/else branching. The compiler enforces contract compliance, and swapping providers requires zero logic changes.
Async Iterators for Streaming: AsyncIterable<string> standardizes token delivery across vendors. SSE, NDJSON, and chunked encoding are normalized at the adapter level.
Configuration Object over Rigid Parameters: LLMRequestConfig uses index signatures to pass provider-specific flags (e.g., response_format, top_p) without polluting the base interface.
Explicit Model Naming: The name property enables runtime routing, logging, and cost attribution without reflection or string parsing.

Pitfall Guide

1. The Leaky Abstraction Trap

Explanation: Forcing incompatible features (e.g., function calling, image generation, tool use) into a unified interface creates method signatures that only one provider implements. Downstream code must still check provider types, defeating the abstraction. Fix: Keep the base contract minimal. Create extended interfaces (ToolEnabledProvider, MultimodalProvider) for specialized capabilities. Route advanced workloads to dedicated clients rather than bloating the core abstraction.

2. Centralized Retry Logic

Explanation: Applying uniform retry policies across all providers ignores vendor-specific rate limits, error codes, and backoff requirements. OpenAI may return 429 with a retry-after header, while Interwest uses exponential backoff with different payload structures. Fix: Implement retry strategies inside each adapter. Use provider-specific error parsing and respect retry-after directives. The unified client should only handle network-level failures.

3. Streaming Protocol Mismatch

Explanation: Assuming all vendors deliver tokens identically causes silent data loss. Some use Server-Sent Events, others use newline-delimited JSON, and legacy endpoints may stream raw chunks without delimiters. Fix: Isolate stream parsing within adapters. Implement buffer management for partial reads, validate JSON boundaries, and yield only complete tokens. Add fallback handlers for malformed payloads.

4. Configuration Drift

Explanation: Hardcoding API keys, model names, and timeout values inside adapters makes environment switching impossible. Production deployments require runtime configuration without code changes. Fix: Externalize all vendor parameters into a configuration registry. Load keys from secure vaults, model aliases from environment variables, and timeout thresholds from feature flags. Validate configuration at startup.

5. Ignoring Token Accounting

Explanation: Treating all responses as identical strings obscures cost tracking and quota management. Different providers report usage metrics in varying formats, and some omit them entirely. Fix: Standardize token reporting in LLMResponse. Adapters must normalize vendor payloads into a consistent structure. Implement a middleware layer that logs usage per request for billing and alerting.

6. Over-Abstracting Non-Chat Modalities

Explanation: Applying the chat completion pattern to embedding, image generation, or audio models creates mismatched contracts. These endpoints require different request shapes, response types, and async patterns. Fix: Maintain separate abstraction layers per modality. A TextGenerationProvider, EmbeddingProvider, and MediaGenerationProvider should coexist without forcing a single interface.

7. Missing Fallback Chains

Explanation: Relying on a single provider instance creates single points of failure. When a vendor experiences outages or quota exhaustion, the application halts instead of degrading gracefully. Fix: Implement a fallback router that chains multiple providers. Attempt primary, catch specific errors, and route to secondary. Log fallback events for capacity planning.

Production Bundle

Action Checklist

Define a minimal base contract covering only shared operations (chat, stream)
Implement vendor adapters with isolated SDK dependencies and authentication
Inject providers via dependency injection rather than instantiating inside business logic
Normalize streaming responses using async iterators with buffer management
Externalize all configuration (keys, models, timeouts) into a centralized registry
Add provider-specific retry policies and rate limit handling inside adapters
Implement token usage normalization for cost tracking and quota enforcement
Establish fallback routing for high-availability requirements

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Single provider, stable workload	Direct SDK integration	Minimal abstraction overhead, fastest time-to-market	Lowest initial cost, highest lock-in risk
Cost-driven model rotation	Abstracted strategy + config routing	Enables runtime swapping without redeployment	Moderate dev cost, significant long-term savings
High-availability fallback	Fallback router with multiple adapters	Prevents outage propagation, maintains SLAs	Higher infrastructure cost, improved reliability
Specialized workloads (images, embeddings)	Modality-specific interfaces	Avoids leaky abstractions, preserves vendor features	Higher initial design cost, cleaner maintenance
Rapid prototyping / MVP	Conditional routing wrapper	Faster iteration, easier debugging during validation	Low initial cost, high refactoring debt later

Configuration Template

{
  "providers": {
    "openai": {
      "apiKeyEnv": "OPENAI_API_KEY",
      "defaultModel": "gpt-3.5-turbo",
      "timeoutMs": 15000,
      "retryPolicy": {
        "maxAttempts": 3,
        "backoffBaseMs": 1000,
        "retryableStatusCodes": [429, 500, 502, 503]
      }
    },
    "interwest": {
      "apiKeyEnv": "INTERWEST_API_KEY",
      "defaultModel": "default",
      "baseUrl": "https://ai.interwestinfo.com/api/v1",
      "timeoutMs": 20000,
      "retryPolicy": {
        "maxAttempts": 2,
        "backoffBaseMs": 2000,
        "retryableStatusCodes": [429, 500, 503]
      }
    }
  },
  "routing": {
    "defaultProvider": "openai",
    "fallbackChain": ["openai", "interwest"],
    "costThresholds": {
      "maxPromptCostPerToken": 0.0000015,
      "autoSwitchOnExceed": true
    }
  },
  "observability": {
    "logTokenUsage": true,
    "logProviderLatency": true,
    "metricsEndpoint": "/internal/metrics"
  }
}

Quick Start Guide

Install dependencies: Add openai, typescript, and your HTTP client of choice to your project. Initialize TypeScript with strict mode enabled.
Create the contract: Copy the LLMProvider interface and request/response types into a shared types module.
Build adapters: Implement OpenAIAdapter and InterwestAdapter following the provided structure. Validate each against vendor documentation.
Wire the client: Instantiate UnifiedLLMClient with your chosen provider. Inject configuration from environment variables or a secure vault.
Verify routing: Run a test suite that swaps providers at runtime. Confirm streaming, error handling, and token reporting behave consistently across implementations.

Stop hardcoding AI providers: a generic client approach