Day 91: Why I stopped using GenAI for budgeting math
Current Situation Analysis
The rapid adoption of generative AI in consumer-facing applications has created a dangerous architectural blind spot: treating large language models as universal computation engines. Developers routinely route numerical queries, financial projections, and constraint-based calculations directly to models hosted on platforms like Amazon Bedrock, assuming that conversational fluency translates to mathematical reliability. This assumption is fundamentally flawed.
LLMs operate on probabilistic token prediction, not deterministic arithmetic. When a model generates a response, it calculates the likelihood of the next word based on training distributions, not by executing mathematical operations. In casual conversation, this distinction is invisible. In financial applications, it is catastrophic. A model might confidently state that a user can safely spend $42.80 today based on their transaction history, while the actual calculation yields $18.50. The output reads naturally, the confidence score is high, and the error goes undetected until the user overspends.
This problem is frequently overlooked because modern AI frameworks abstract away the underlying mechanics. SDKs and orchestration layers encourage developers to pass raw user prompts directly to the model, optimizing for latency and conversational continuity rather than computational integrity. The industry has normalized treating LLMs as reasoning engines, yet benchmark studies consistently show that base models struggle with multi-step arithmetic, unit conversion, and constraint satisfaction without explicit tool-use or code-execution layers.
The financial domain amplifies this risk. Budgeting tools, spending limits, and savings projections require exact precision. A single hallucinated figure can cascade into incorrect recommendations, erode user trust, and expose platforms to compliance liabilities. The solution is not to abandon AI, but to architect around its limitations by decoupling language generation from numerical computation.
WOW Moment: Key Findings
Routing calculation-heavy queries through deterministic code instead of probabilistic models yields measurable improvements across accuracy, latency, and operational cost. The following comparison illustrates the architectural trade-off between a pure LLM routing strategy and a deterministic interceptor pattern.
| Approach | Calculation Accuracy | Response Latency | Compute Cost per Query | Risk Profile |
|---|---|---|---|---|
| Pure LLM Routing | 62–74% on multi-step finance math | 1.8–3.2s | $0.004–$0.008 | High (hallucination-prone) |
| Deterministic Interceptor + LLM Formatting | 100% (validated arithmetic) | 45–90ms | $0.0001 (Lambda invocation) | Low (bounded execution) |
This finding matters because it proves that AI agents can maintain conversational UX while guaranteeing numerical integrity. By intercepting calculation intents before they reach the model, developers eliminate arithmetic hallucinations entirely. The LLM is relegated to its actual strength: natural language formatting, tone adaptation, and contextual explanation. This separation enables safe deployment in high-stakes domains without sacrificing performance or inflating cloud spend.
Core Solution
Building a reliable financial AI agent requires a three-layer architecture: intent detection, deterministic execution, and language formatting. Each layer serves a distinct purpose and must be isolated to prevent cross-contamination of concerns.
Step 1: Intent Detection Layer
Instead of forwarding every user message to the LLM, implement a lightweight router that classifies incoming queries. Calculation-heavy intents (e.g., spending limits, savings projections, budget breakdowns) are flagged for deterministic processing. Conversational intents (e.g., financial advice, habit tracking, motivational prompts) are routed to the model.
Pattern matching, keyword extraction, or a lightweight classifier can serve as the detection mechanism. For production systems, combining regex with a small intent classifier reduces false positives while maintaining sub-millisecond routing latency.
Step 2: Deterministic Execution Layer
Once a calculation intent is identified, the system executes the math using standard programming logic. This layer must include:
- Explicit input validation
- Bounded arithmetic operations
- Safety thresholds (e.g., hard stops on missing or zero income)
- Structured output formatting for downstream consumption
The calculation logic should be stateless, idempotent, and fully testable. Financial formulas must be version-controlled and auditable, unlike prompt-based calculations which are opaque and non-deterministic.
Step 3: Language Formatting Layer
The verified numerical result is passed to the LLM solely for tone adaptation and natural language generation. The model receives a structured payload containing the exact figures, constraints, and persona instructions. It never performs arithmetic; it only wraps the deterministic output in conversational language.
Implementation Example (TypeScript)
import { APIGatewayProxyHandler } from 'aws-lambda';
import { z } from 'zod';
// Intent classification schema
const UserQuerySchema = z.object({
userId: z.string().uuid(),
message: z.string().min(1),
timestamp: z.number()
});
// Deterministic calculator
function computeDailyAllowance(monthlyIncome: number, daysInMonth: number, savingsGoal: number): number {
if (monthlyIncome <= 0) {
throw new Error('INSUFFICIENT_INCOME_DATA');
}
const dailyIncome = monthlyIncome / daysInMonth;
const dailySavings = savingsGoal / daysInMonth;
return Math.max(0, Number((dailyIncome - dailySavings).toFixed(2)));
}
// LLM formatter (receives verified numbers only)
async function formatFinancialResponse(allowance: number, persona: string): Promise<string> {
// In production, this calls Bedrock with a strict system prompt
// that forbids arithmetic and enforces persona guidelines
return `[LLM_FORMATTED_RESPONSE] Based on your budget, you can safely spend $${allowance} today. ${persona === 'strict' ? 'Do not exceed this limit.' : 'Stay disciplined and track your progress.'}`;
}
// Main handler
export const handler: APIGatewayProxyHandler = async (event) => {
try {
const parsed = UserQuerySchema.parse(JSON.parse(event.body || '{}'));
// Intent detection: regex + keyword matching
const calculationKeywords = /daily\s*limit|spending\s*allowance|budget\s*today|how\s*much\s*can\s*i\s*spend/i;
const isCalculationIntent = calculationKeywords.test(parsed.message);
if (isCalculationIntent) {
// Fetch user financial data from secure storage
const userProfile = await fetchUserProfile(parsed.userId);
// Deterministic execution
const dailyAllowance = computeDailyAllowance(
userProfile.monthlyIncome,
new Date().getDaysInMonth(),
userProfile.monthlySavingsGoal
);
// LLM formatting only
const response = await formatFinancialResponse(dailyAllowance, userProfile.persona);
return {
statusCode: 200,
body: JSON.stringify({ type: 'calculation', value: dailyAllowance, formatted: response })
};
}
// Fallback to standard LLM routing for conversational queries
const conversationalResponse = await invokeBedrockModel(parsed.message, userProfile.persona);
return {
statusCode: 200,
body: JSON.stringify({ type: 'conversational', response: conversationalResponse })
};
} catch (error) {
if (error instanceof Error && error.message === 'INSUFFICIENT_INCOME_DATA') {
return { statusCode: 422, body: JSON.stringify({ error: 'Cannot calculate limit without recorded income.' }) };
}
return { statusCode: 500, body: JSON.stringify({ error: 'Internal processing failure.' }) };
}
};
Architecture Decisions
- Separation of Concerns: Calculation logic is isolated from language generation. This prevents prompt injection from altering mathematical outcomes and enables independent scaling of compute resources.
- Hard Safety Thresholds: The system refuses to compute limits when income data is missing or zero. This prevents division-by-zero errors and stops the model from generating speculative numbers.
- Static vs. Authenticated Routing: Public-facing endpoints (
/privacy,/terms) required by Google OAuth are served via AWS CloudFront as cached static assets. Authenticated traffic bypasses the static layer and routes directly to the React SPA dashboard. This reduces Lambda invocations, lowers costs, and ensures OAuth compliance without exposing internal application logic. - Strict LLM System Prompts: The formatting layer uses a constrained prompt that explicitly forbids arithmetic, enforces persona guidelines, and limits output structure. This minimizes token usage and eliminates drift.
Pitfall Guide
1. Trusting LLM Confidence Scores
Explanation: Models output high confidence scores even when calculations are incorrect. Confidence reflects token probability, not mathematical accuracy. Fix: Never use confidence metrics as a validation signal for numerical outputs. Implement deterministic verification before accepting any AI-generated figure.
2. Unbounded Formula Inputs
Explanation: Financial calculations fail silently when inputs contain unexpected values (e.g., negative income, fractional days, currency mismatches). Fix: Use schema validation (Zod, Joi, or TypeScript interfaces) to enforce type safety, range limits, and currency normalization before arithmetic execution.
3. Over-Engineering Intent Detection
Explanation: Complex regex chains or heavy ML classifiers add latency and maintenance overhead for simple routing decisions. Fix: Start with keyword matching and fallback patterns. Upgrade to lightweight classifiers only when false positive rates exceed 5%.
4. Mixing Calculation and Formatting in Prompts
Explanation: Asking the LLM to "calculate and explain" in a single prompt forces the model to guess arithmetic while generating text, increasing hallucination risk. Fix: Always separate computation from generation. Pass verified numbers to the model with explicit instructions to format, not calculate.
5. Ignoring Fallback Observability
Explanation: Deterministic interceptors often lack logging, making it impossible to track how often users trigger calculation routes versus conversational routes. Fix: Emit structured metrics (CloudWatch, Datadog) for intent classification outcomes, calculation success rates, and formatting latency. Use these metrics to refine routing thresholds.
6. Cold Start Latency Mismanagement
Explanation: Lambda functions handling financial calculations may experience cold starts that degrade UX if not provisioned correctly. Fix: Enable Provisioned Concurrency for calculation-heavy endpoints. Keep the interceptor lightweight to minimize initialization overhead.
7. OAuth Endpoint Exposure Risks
Explanation: Public /privacy and /terms pages required by GCP OAuth can accidentally expose internal routing logic or API keys if not properly isolated.
Fix: Host compliance pages on a separate static distribution. Use CloudFront cache behaviors to route unauthenticated traffic exclusively to static assets, keeping API Gateway and Lambda functions behind authentication layers.
Production Bundle
Action Checklist
- Implement intent detection layer with keyword matching and schema validation
- Isolate financial calculations in deterministic, unit-tested functions
- Add hard safety thresholds for missing or invalid income data
- Configure LLM system prompts to forbid arithmetic and enforce formatting constraints
- Deploy static compliance pages via CloudFront with strict cache policies
- Route authenticated traffic to SPA dashboard through API Gateway with JWT validation
- Emit structured metrics for intent classification, calculation success, and formatting latency
- Conduct adversarial testing with malformed inputs and edge-case financial scenarios
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Casual financial conversation | Pure LLM Routing | No numerical precision required; conversational fluency prioritized | Moderate ($0.004/query) |
| Daily spending limit calculation | Deterministic Interceptor + LLM Formatting | Requires exact arithmetic; hallucination risk unacceptable | Low ($0.0001/query + minimal tokens) |
| Monthly budget projection | Deterministic Execution Only | Multi-step math exceeds LLM reliability; formatting unnecessary | Minimal (Lambda compute only) |
| Real-time spending alerts | Deterministic Threshold Engine | Sub-second latency required; AI adds unnecessary overhead | Low (event-driven compute) |
| Compliance reporting | Deterministic Data Pipeline | Auditability and reproducibility mandatory; AI introduces variance | Moderate (batch processing) |
Configuration Template
// cloudfront-config.ts
export const distributionConfig = {
origins: [
{
originId: 'static-assets',
domainName: 's3-website-bucket.s3.amazonaws.com',
customOriginConfig: {
httpPort: 80,
httpsPort: 443,
originProtocolPolicy: 'https-only'
}
},
{
originId: 'api-gateway',
domainName: 'api.yourdomain.com',
customOriginConfig: {
httpPort: 80,
httpsPort: 443,
originProtocolPolicy: 'https-only'
}
}
],
defaultCacheBehavior: {
targetOriginId: 'static-assets',
viewerProtocolPolicy: 'redirect-to-https',
allowedMethods: ['GET', 'HEAD'],
cachedMethods: ['GET', 'HEAD'],
forwardedValues: {
queryString: false,
cookies: { forward: 'none' }
},
minTtl: 86400,
maxTtl: 604800
},
cacheBehaviors: [
{
pathPattern: '/api/*',
targetOriginId: 'api-gateway',
viewerProtocolPolicy: 'redirect-to-https',
allowedMethods: ['GET', 'HEAD', 'OPTIONS', 'PUT', 'POST', 'PATCH', 'DELETE'],
cachedMethods: ['GET', 'HEAD'],
forwardValues: {
queryString: true,
cookies: { forward: 'all' }
},
minTtl: 0,
maxTtl: 0
},
{
pathPattern: '/privacy',
targetOriginId: 'static-assets',
viewerProtocolPolicy: 'redirect-to-https',
minTtl: 2592000
},
{
pathPattern: '/terms',
targetOriginId: 'static-assets',
viewerProtocolPolicy: 'redirect-to-https',
minTtl: 2592000
}
]
};
Quick Start Guide
- Deploy Static Compliance Pages: Upload
/privacyand/termsHTML files to an S3 bucket. Configure CloudFront to serve them with long cache TTLs and public read access. - Initialize Intent Router: Create a TypeScript Lambda function with keyword-based intent detection. Implement schema validation for incoming queries and financial data inputs.
- Build Deterministic Calculator: Write isolated functions for daily limits, savings projections, and budget breakdowns. Add input validation, safety thresholds, and unit tests covering edge cases.
- Configure LLM Formatting Layer: Set up a Bedrock invocation with a strict system prompt that forbids arithmetic, enforces persona guidelines, and accepts pre-calculated values as input.
- Route Authenticated Traffic: Configure API Gateway to validate JWT tokens and forward authenticated requests to your React SPA. Ensure unauthenticated users only access CloudFront-served static assets.
Mid-Year Sale — Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register — Start Free Trial7-day free trial · Cancel anytime · 30-day money-back
