I Built a Production AI Layer Inside a Legacy ASP.NET Core App, and It Broke in Ways Tutorials Never Mention
Architecting Resilient AI Integrations in Established .NET Backends
Current Situation Analysis
The modern developer ecosystem heavily favors greenfield AI integration. Tutorials, documentation, and conference talks consistently demonstrate how to wire an LLM endpoint into a fresh repository, a clean dependency graph, and an isolated microservice. This creates a dangerous illusion: that AI integration is primarily an API consumption problem. In reality, introducing generative models into a mature, traffic-bearing backend is an architectural discipline problem.
The core pain point is boundary enforcement. When a large language model enters an existing system, it brings non-deterministic latency, variable cost structures, and unpredictable failure modes. Legacy applications were designed around deterministic contracts, fixed execution paths, and predictable resource consumption. Forcing an LLM into that mold without structural adaptation causes three predictable failures:
- Domain contamination: Business logic becomes tightly coupled to provider-specific SDKs, making model swaps or fallbacks impossible without rewriting core features.
- Cost leakage: Unbounded async chains, missing cancellation propagation, and absent caching strategies cause token consumption to scale linearly with user traffic, often without visibility.
- Deployment fragility: Configuration deserialization mismatches, missing environment variables, and late-stage validation produce cascade failures that only surface under production load.
This problem is consistently overlooked because tutorial architectures treat AI as a feature flag rather than a subsystem. Real production systems require explicit seams, deterministic error envelopes, and startup-time validation. Data from mature deployments shows that without these boundaries, AI endpoints become the most expensive and least testable components in the stack. Conversely, when properly isolated, development spend on gpt-4o-mini can remain under $15 across two months of active iteration, while providing measurable cache hit rates, token accounting, and graceful degradation paths. The difference between experimental and production-ready AI is not the model choice; it is the architectural contract surrounding it.
WOW Moment: Key Findings
The transition from tutorial-style AI wiring to production-grade integration yields measurable improvements across testability, cost control, and deployment resilience. The following comparison isolates the structural differences that determine whether an AI layer survives production traffic or collapses under it.
| Approach | Testability | Cost Visibility | Failure Isolation | Deployment Resilience |
|---|---|---|---|---|
| Tutorial/Greenfield | Low (requires real API calls) | None (raw SDK responses) | Poor (try/catch scattered) | Fragile (late config validation) |
| Production-Ready | High (domain layer mocks provider) | Full (token/cost tracking per feature) | Strong (unified result envelope) | Robust (startup validation + feature flags) |
This finding matters because it shifts AI from a fragile experimental endpoint to a maintainable, observable subsystem. When the provider layer is strictly separated from domain logic, you can swap gpt-4o-mini for an on-premise model, adjust caching strategies, or implement fallback routing without touching business rules. The unified result envelope eliminates scattered error handling, while startup validation prevents silent configuration deserialization from corrupting runtime state. In production, this architecture transforms AI from a cost center into a controlled, auditable capability.
Core Solution
Building a resilient AI layer in an established ASP.NET Core backend requires deliberate structural decisions. The implementation below follows a five-step architecture that enforces boundaries, guarantees observability, and prevents cost leakage.
Step 1: Enforce the Provider/Domain Boundary
The most critical structural decision is separating provider mechanics from domain intent. The provider layer understands HTTP, authentication, and SDK-specific types. The domain layer understands business rules, content styles, and prompt construction. These concerns must never share a class.
public interface IModelGateway
{
Task<ModelExecutionResult<string>> ExecuteAsync(
string systemInstruction,
string userInput,
CancellationToken ct);
}
public interface IProductCopyEngine
{
Task<ModelExecutionResult<CopyDraft>> GenerateAsync(
ProductContext context,
ContentStyle style,
CancellationToken ct);
}
IModelGateway accepts raw strings and returns a raw string. It knows nothing about products, pricing, or marketing. IProductCopyEngine constructs prompts, applies business constraints, and maps the raw output into a domain object. This separation enables unit testing the domain layer with a mock gateway, swapping providers without touching business logic, and isolating SDK upgrades to a single implementation.
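To see the seam pay off, here is a minimal test sketch with a hand-rolled fake; the FakeGateway type and its canned reply are illustrative, not part of the production codebase.
public sealed class FakeGateway : IModelGateway
{
    // Captures the instruction so a test can assert on prompt construction; no HTTP involved.
    public string? CapturedInstruction { get; private set; }

    public Task<ModelExecutionResult<string>> ExecuteAsync(
        string systemInstruction, string userInput, CancellationToken ct)
    {
        CapturedInstruction = systemInstruction;
        return Task.FromResult(
            ModelExecutionResult<string>.Success("Precise, fact-bound copy.", tokens: 12));
    }
}
A test constructs ProductCopyEngine over the fake, calls GenerateAsync, and asserts on both the mapped CopyDraft and CapturedInstruction, with no network traffic and no API key.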
Step 2: Implement the Unified Result Envelope
Raw SDK responses force every caller to handle network failures, rate limits, and deserialization errors. A unified envelope centralizes error handling, caching metadata, and token accounting.
public sealed class ModelExecutionResult<T>
{
public bool IsSuccessful { get; init; }
public T? Payload { get; init; }
public string? DiagnosticMessage { get; init; }
public bool WasServedFromCache { get; init; }
public int ConsumedTokens { get; init; }
public static ModelExecutionResult<T> Success(T data, int tokens = 0, bool cached = false) =>
new() { IsSuccessful = true, Payload = data, ConsumedTokens = tokens, WasServedFromCache = cached };
public static ModelExecutionResult<T> Failure(string reason) =>
new() { IsSuccessful = false, DiagnosticMessage = reason };
}
Static factory methods guarantee invalid states cannot be constructed. Callers above the provider layer never write try/catch. They inspect IsSuccessful and branch accordingly. The same envelope scales from text generation to structured JSON extraction, chat history management, or vector search results.
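As a consumption sketch, a controller action inspects the envelope instead of wrapping the call in try/catch; the route, the 502 status code, and the injected _copyEngine field are illustrative assumptions.
[HttpPost("copy")]
public async Task<IActionResult> Generate(ProductContext context, ContentStyle style)
{
    // HttpContext.RequestAborted lets a client disconnect cancel the model call downstream.
    var result = await _copyEngine.GenerateAsync(context, style, HttpContext.RequestAborted);

    if (!result.IsSuccessful)
        return Problem(detail: result.DiagnosticMessage, statusCode: 502);

    return Ok(new { draft = result.Payload, cached = result.WasServedFromCache, tokens = result.ConsumedTokens });
}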
Step 3: Centralize Prompt Contracts
Prompt engineering belongs in a single, auditable location. Scattering instructions across controllers or inline HTTP calls creates maintenance debt and security risks. System instructions and user input must be constructed separately to prevent prompt injection and enable deterministic testing.
internal static class InstructionTemplateFactory
{
internal static string Resolve(ContentStyle style) => style switch
{
ContentStyle.Authoritative =>
"Act as a senior technical copywriter. Produce precise, fact-bound descriptions. " +
"Do not invent specifications, awards, or third-party claims.",
ContentStyle.Conversational =>
"Adopt a helpful, approachable tone. Focus on user benefits and clarity.",
_ => throw new ArgumentOutOfRangeException(nameof(style), style, null)
};
}
Temperature, system instructions, and token limits serve three distinct purposes. Temperature controls output variance. System instructions enforce behavioral constraints. Token limits enforce budget boundaries. Confusing these controls leads to unpredictable outputs and unbounded costs.
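Putting Steps 1 through 3 together, here is a sketch of the domain engine; the ProductContext members and the CopyDraft shape are assumptions for illustration.
public sealed class ProductCopyEngine : IProductCopyEngine
{
    private readonly IModelGateway _gateway;

    public ProductCopyEngine(IModelGateway gateway) => _gateway = gateway;

    public async Task<ModelExecutionResult<CopyDraft>> GenerateAsync(
        ProductContext context, ContentStyle style, CancellationToken ct)
    {
        // The system instruction comes from the audited resolver; user data never mixes into it.
        var instruction = InstructionTemplateFactory.Resolve(style);
        var userInput = $"Product: {context.Name}. Key attributes: {context.Attributes}.";

        var raw = await _gateway.ExecuteAsync(instruction, userInput, ct);
        if (!raw.IsSuccessful)
            return ModelExecutionResult<CopyDraft>.Failure(raw.DiagnosticMessage ?? "Model call failed.");

        // Execution metadata travels on the envelope; the domain object stays pure.
        return ModelExecutionResult<CopyDraft>.Success(
            new CopyDraft(raw.Payload!), raw.ConsumedTokens, raw.WasServedFromCache);
    }
}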
Step 4: Wire Cancellation and Caching
Asynchronous AI calls must propagate CancellationToken through every layer. A token accepted at the controller but not forwarded to the SDK call creates a silent cost leak. The client disconnects, but the backend continues consuming tokens and billing the account.
Caching operates at the provider layer using deterministic key generation. Identical instruction and input pairs return cached results, reducing latency and cost.
private static string GenerateCacheKey(string instruction, string input) =>
    // SHA-256 (System.Security.Cryptography) is stable across processes and restarts;
    // string.GetHashCode is randomized per process in .NET Core and must not key a cache.
    $"llm:cache:{Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes($"{instruction}\u001F{input}")))}";
The WasServedFromCache flag travels through the result envelope to the presentation layer, enabling UI indicators that verify caching behavior without external telemetry. Caching should be controlled via feature flags, not hardcoded booleans, allowing runtime toggling without redeployment.
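A condensed gateway sketch tying cancellation and caching together, assuming IMemoryCache and the Azure.AI.OpenAI 1.x client surface; the SDK call shape varies by package version, so treat that part as illustrative.
public sealed class AzureOpenAiGateway : IModelGateway
{
    private readonly OpenAIClient _client;
    private readonly IMemoryCache _cache;
    private readonly ModelSettings _settings;

    public AzureOpenAiGateway(OpenAIClient client, IMemoryCache cache, ModelSettings settings)
        => (_client, _cache, _settings) = (client, cache, settings);

    public async Task<ModelExecutionResult<string>> ExecuteAsync(
        string systemInstruction, string userInput, CancellationToken ct)
    {
        var key = GenerateCacheKey(systemInstruction, userInput);
        if (_settings.EnableCaching && _cache.TryGetValue(key, out string? cachedText))
            return ModelExecutionResult<string>.Success(cachedText!, tokens: 0, cached: true);

        try
        {
            var options = new ChatCompletionsOptions(_settings.DeploymentName, new ChatRequestMessage[]
            {
                new ChatRequestSystemMessage(systemInstruction),
                new ChatRequestUserMessage(userInput)
            })
            { MaxTokens = _settings.MaxTokens, Temperature = _settings.Temperature };

            // Forwarding ct means a client disconnect stops token consumption at the provider.
            var response = await _client.GetChatCompletionsAsync(options, ct);
            var text = response.Value.Choices[0].Message.Content;

            if (_settings.EnableCaching)
                _cache.Set(key, text, TimeSpan.FromMinutes(_settings.CacheExpirationMinutes));

            return ModelExecutionResult<string>.Success(text, response.Value.Usage.TotalTokens);
        }
        catch (OperationCanceledException)
        {
            return ModelExecutionResult<string>.Failure("Call cancelled before completion.");
        }
        catch (Exception ex)
        {
            return ModelExecutionResult<string>.Failure(ex.Message);
        }
    }

    // GenerateCacheKey as defined above.
}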
Step 5: Register and Validate at Startup
All AI infrastructure belongs at the application root, not scoped to UI areas or feature folders. Cross-cutting capabilities must be registered through a single extension method to prevent duplicate service instances and lifetime mismatches.
public static class AiInfrastructureExtensions
{
public static IServiceCollection AddModelServices(this IServiceCollection services, IConfiguration config)
{
var settings = config.GetSection("ModelProvider").Get<ModelSettings>()
?? throw new InvalidOperationException("Model configuration is missing or malformed.");
services.AddSingleton(settings);
services.AddScoped<IModelGateway, AzureOpenAiGateway>();
services.AddScoped<IProductCopyEngine, ProductCopyEngine>();
return services;
}
}
Configuration validation must occur at startup. A missing environment variable or malformed JSON should fail fast with a clear diagnostic, not produce a zeroed-out settings object that crashes three layers downstream during request execution.
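The options pattern can push the same guarantee into the framework; a short sketch assuming the Microsoft.Extensions.Options.DataAnnotations package and [Required] attributes on ModelSettings:
services.AddOptions<ModelSettings>()
    .Bind(config.GetSection("ModelProvider"))
    .ValidateDataAnnotations() // enforces [Required] on Endpoint, DeploymentName, ApiKey
    .ValidateOnStart();        // fails host startup instead of the first request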
Pitfall Guide
1. Blurring Provider and Domain Logic
Explanation: Combining SDK calls, prompt construction, and business mapping into a single class creates tight coupling. Swapping providers or adjusting business rules requires rewriting the entire service.
Fix: Enforce a strict seam. The provider handles transport and SDK types. The domain handles prompt assembly, validation, and result mapping. Test the domain with a mock provider.
2. Silent Configuration Deserialization
Explanation: ASP.NET Core's configuration binder silently ignores missing keys or type mismatches, producing default values. This leads to zeroed-out token limits, null API keys, or malformed endpoint URLs that only surface under load.
Fix: Validate configuration immediately after binding. Use ?? throw or explicit null checks. Log startup diagnostics. Fail fast before the first request arrives.
3. Unbounded Async Chains
Explanation: Accepting CancellationToken at the controller but not forwarding it to the SDK call creates cost leakage. Disconnected clients still trigger full API executions.
Fix: Propagate the token through every await. Verify the chain: Controller → Domain Service → Provider → SDK. When wrapping external calls, combine the caller's token with a provider-side timeout via CancellationTokenSource.CreateLinkedTokenSource(ct), as sketched below.
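A minimal sketch of that pattern inside the gateway from Step 4; the 30-second budget is an illustrative value.
// Cancels when either the client disconnects (ct) or the provider-side budget elapses.
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
linkedCts.CancelAfter(TimeSpan.FromSeconds(30));
var response = await _client.GetChatCompletionsAsync(options, linkedCts.Token);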
4. Inheritance-Based Result Models
Explanation: Deriving domain objects from infrastructure wrappers drags metadata like Success, ErrorMessage, and TokensUsed into persistence layers, DTOs, and service boundaries. These concerns belong to the execution envelope, not the domain payload.
Fix: Use composition. The domain object contains only business data. The wrapper contains execution metadata. Return ModelExecutionResult<DomainObject>, not DomainObject : ModelExecutionResult.
5. Treating Temperature as a Safety Mechanism
Explanation: Developers often lower temperature to prevent hallucinations or enforce constraints. Temperature only reduces output variance; it does not enforce factual accuracy or behavioral rules.
Fix: Use system instructions for constraints, temperature for style variance, and token limits for budget control. Audit prompts for explicit negative constraints rather than relying on sampling parameters.
6. Scoping Cross-Cutting AI Services to UI Areas
Explanation: Placing AI services inside MVC Areas or feature folders implies UI-driven boundaries. AI capabilities span search, recommendations, support, and content generation. UI grouping creates artificial coupling and complicates future feature expansion.
Fix: Register AI infrastructure at the application root. Use dependency injection to expose capabilities to any layer. Keep UI routing separate from service architecture.
7. Late-Stage Config Validation
Explanation: Validating API keys, endpoint URLs, or model deployments only during the first request masks configuration errors until production traffic hits. This causes cascade failures and poor developer experience.
Fix: Validate during Program.cs execution. Check endpoint reachability, key format, and model availability. Fail startup with actionable diagnostics. Use health checks to verify runtime connectivity.
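A minimal readiness probe sketch using ASP.NET Core health checks; ModelConnectivityHealthCheck is a hypothetical name, and the one-line ping costs a few tokens per probe, so schedule it accordingly.
public sealed class ModelConnectivityHealthCheck : IHealthCheck
{
    private readonly IModelGateway _gateway;

    public ModelConnectivityHealthCheck(IModelGateway gateway) => _gateway = gateway;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
        // A tiny round trip verifies key, endpoint, and deployment in one probe.
        var result = await _gateway.ExecuteAsync("Reply with OK.", "ping", ct);
        return result.IsSuccessful
            ? HealthCheckResult.Healthy("Model provider reachable.")
            : HealthCheckResult.Unhealthy(result.DiagnosticMessage ?? "Model provider unreachable.");
    }
}

// Registered alongside the other services:
// builder.Services.AddHealthChecks().AddCheck<ModelConnectivityHealthCheck>("model-provider");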
Production Bundle
Action Checklist
- Define provider and domain interfaces with strict responsibility boundaries
- Implement a unified result envelope with static factory methods
- Centralize system instructions in a single, auditable resolver
- Propagate CancellationToken through every async layer
- Wire hash-based caching at the provider layer with feature flag control
- Register all AI services through a single extension method at application root
- Validate configuration and endpoint connectivity during startup
- Instrument token consumption, cost tracking, and cache hit rates per feature
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Early prototyping | gpt-4o-mini with aggressive caching | Fast iteration, predictable costs, sufficient quality for tone/style tasks | ~$0.15 per 1M input tokens; caching reduces effective cost by 60-80% |
| High-stakes factual generation | gpt-4o with strict system constraints | Higher reasoning accuracy, better instruction following, reduced hallucination | ~$2.50 per 1M input tokens; requires tighter token budgeting |
| Internal tooling / low traffic | On-premise or open-weight model via local gateway | Zero API cost, full data control, predictable latency | Infrastructure overhead; requires GPU/CPU provisioning |
| Multi-turn conversational features | Persistent ChatHistory with session-scoped storage | Maintains context, reduces redundant prompt injection, improves UX | Context window costs scale with history length; implement truncation policies (see the sketch below) |
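For that last row, a simple truncation policy sketch; the ChatTurn record and keep-last-N strategy are illustrative assumptions rather than a prescribed design (using System.Linq assumed).
public sealed record ChatTurn(string Role, string Content);

public static class HistoryPolicy
{
    // Keeps system turns plus the most recent exchanges so context-window cost stays bounded.
    public static IReadOnlyList<ChatTurn> Truncate(IReadOnlyList<ChatTurn> history, int maxTurns = 10)
    {
        if (history.Count <= maxTurns) return history;
        var system = history.Where(t => t.Role == "system");
        var recent = history.Skip(history.Count - maxTurns).Where(t => t.Role != "system");
        return system.Concat(recent).ToList();
    }
}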
Configuration Template
{
"ModelProvider": {
"Endpoint": "https://your-resource.openai.azure.com/",
"DeploymentName": "gpt-4o-mini",
"ApiKey": "",
"MaxTokens": 1024,
"Temperature": 0.7,
"EnableCaching": true,
"CacheExpirationMinutes": 60,
"CostTrackingEnabled": true
}
}
Note: ApiKey must be supplied via User Secrets locally and Application Settings in production. Never commit secrets to source control or configuration files.
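A matching settings class for this section; property names must mirror the JSON keys exactly for the binder to populate them, and the [Required] attributes (System.ComponentModel.DataAnnotations) assume the startup validation approach from Step 5.
public sealed class ModelSettings
{
    [Required] public string Endpoint { get; init; } = string.Empty;
    [Required] public string DeploymentName { get; init; } = string.Empty;
    [Required] public string ApiKey { get; init; } = string.Empty;
    public int MaxTokens { get; init; } = 1024;
    public float Temperature { get; init; } = 0.7f;
    public bool EnableCaching { get; init; } = true;
    public int CacheExpirationMinutes { get; init; } = 60;
    public bool CostTrackingEnabled { get; init; } = true;
}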
Quick Start Guide
- Install dependencies: Add Azure.AI.OpenAI and Microsoft.Extensions.Caching.Memory to your project.
- Create the gateway: Implement IModelGateway using OpenAIClient with the ChatCompletions API. Wire CancellationToken and cache lookup.
- Register services: Call builder.Services.AddModelServices(builder.Configuration) in Program.cs. Verify startup validation passes.
- Build the domain engine: Implement IProductCopyEngine to assemble prompts, call the gateway, and map results to domain objects.
- Wire the endpoint: Create a controller action that accepts content parameters, passes HttpContext.RequestAborted as the cancellation token, and returns ModelExecutionResult<T> to the client.
This architecture transforms AI from an experimental API consumer into a production-grade subsystem. By enforcing boundaries, centralizing contracts, and validating early, you gain testability, cost visibility, and deployment resilience. The model choice matters less than the structural discipline surrounding it.
