from the workspace boundary.
- List Price Baseline: Real-time enforcement uses published list rates. Negotiated enterprise discounts are applied during post-hoc accounting. This keeps the hot path fast and deterministic.
- Pre-Execution Interception: Budget checks occur before tool invocation. This prevents runaway loops and expensive sub-agent spawns from executing once the cap has been reached.
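The pre-execution rule can be sketched as a small wrapper that consults a budget check before it ever calls the tool. This is a minimal illustration, not a specific orchestrator's API: `GateDecision`, `DispatchResult`, `guardedDispatch`, and the stubbed checks are all hypothetical names introduced here.

```typescript
// Hypothetical shapes for illustration; not taken from any specific orchestrator API.
interface GateDecision {
  allowed: boolean;
  message: string;
}

type DispatchResult<T> =
  | { status: 'executed'; value: T }
  | { status: 'blocked'; reason: string };

// Consult the budget check first; invoke the tool only when the check allows it.
function guardedDispatch<T>(check: () => GateDecision, tool: () => T): DispatchResult<T> {
  const decision = check();
  if (!decision.allowed) {
    // The tool function is never called on this path, so it incurs no cost.
    return { status: 'blocked', reason: decision.message };
  }
  return { status: 'executed', value: tool() };
}

// Usage with stubbed checks (assumed values).
let toolRuns = 0;
const expensiveTool = (): string => {
  toolRuns += 1;
  return 'page scraped';
};

const underCap = guardedDispatch(() => ({ allowed: true, message: 'Within budget.' }), expensiveTool);
const overCap = guardedDispatch(() => ({ allowed: false, message: 'Budget exhausted.' }), expensiveTool);
```

The second call returns a `blocked` result without running `expensiveTool`, which is the property that distinguishes pre-execution interception from a check made after the tool returns.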
Implementation
The cost governance layer sits between the agent orchestrator and the tool execution engine. It maintains a TenantLedger that tracks cumulative spend per workspace and exposes a BudgetGate middleware.
```typescript
import { readFileSync, writeFileSync, existsSync } from 'fs';
import { join } from 'path';

interface PricingTier {
  inputPerK: number;    // list price per 1,000 input tokens
  outputPerK: number;   // list price per 1,000 output tokens
  toolCallBase: number; // flat list price per tool call
}

interface TenantConfig {
  workspaceId: string;
  monthlyCap: number;
  alertThreshold: number; // percent of monthlyCap (0-100)
  warnThreshold: number;  // percent of monthlyCap (0-100); expected to exceed alertThreshold
}

interface LedgerState {
  cumulativeSpend: number;
  lastUpdated: string; // ISO 8601 timestamp of the last recorded usage
}

type BudgetTier = 'alert' | 'warn' | 'hard_stop' | 'normal';

interface BudgetDecision {
  allowed: boolean;
  tier: BudgetTier;
  message: string;
}

class TenantLedger {
  private statePath: string;
  private pricing: PricingTier;

  constructor(workspaceDir: string, pricing: PricingTier) {
    this.statePath = join(workspaceDir, '.cost_state.json');
    this.pricing = pricing;
  }

  // Converts usage to cost at list rates, persists it, and returns the new total.
  recordUsage(inputTokens: number, outputTokens: number, toolCalls: number): number {
    const inputCost = (inputTokens / 1000) * this.pricing.inputPerK;
    const outputCost = (outputTokens / 1000) * this.pricing.outputPerK;
    const toolCost = toolCalls * this.pricing.toolCallBase;
    const delta = inputCost + outputCost + toolCost;

    const current = this.readState();
    current.cumulativeSpend += delta;
    current.lastUpdated = new Date().toISOString();
    this.writeState(current);
    return current.cumulativeSpend;
  }

  getCumulativeSpend(): number {
    return this.readState().cumulativeSpend;
  }

  private readState(): LedgerState {
    if (!existsSync(this.statePath)) {
      return { cumulativeSpend: 0, lastUpdated: new Date().toISOString() };
    }
    return JSON.parse(readFileSync(this.statePath, 'utf-8')) as LedgerState;
  }

  private writeState(state: LedgerState): void {
    writeFileSync(this.statePath, JSON.stringify(state, null, 2));
  }
}

class BudgetGate {
  private tenants: Map<string, TenantConfig>;
  private ledgers: Map<string, TenantLedger>;
  private pricing: PricingTier;

  constructor(configs: TenantConfig[], pricing: PricingTier) {
    this.tenants = new Map(configs.map(c => [c.workspaceId, c]));
    this.ledgers = new Map();
    this.pricing = pricing; // injected rates, so the gate never defines prices itself
  }

  // toolName is accepted from the dispatch call but is not used in this evaluation.
  evaluate(workspaceId: string, toolName: string): BudgetDecision {
    const config = this.tenants.get(workspaceId);
    if (!config) return { allowed: true, tier: 'normal', message: 'No budget policy attached.' };

    // Create the tenant's ledger lazily on first evaluation.
    if (!this.ledgers.has(workspaceId)) {
      this.ledgers.set(workspaceId, new TenantLedger(`/data/workspaces/${workspaceId}`, this.pricing));
    }
    const ledger = this.ledgers.get(workspaceId)!;
    const currentSpend = ledger.getCumulativeSpend();
    const utilization = currentSpend / config.monthlyCap;

    // Check the strictest tier first: hard_stop, then warn, then alert.
    if (utilization >= 1.0) {
      return { allowed: false, tier: 'hard_stop', message: `Budget exhausted. Cap: $${config.monthlyCap.toFixed(2)}. Current: $${currentSpend.toFixed(2)}` };
    }
    if (utilization >= config.warnThreshold / 100) {
      return { allowed: true, tier: 'warn', message: `Approaching cap. Utilization: ${(utilization * 100).toFixed(1)}%.` };
    }
    if (utilization >= config.alertThreshold / 100) {
      return { allowed: true, tier: 'alert', message: `Budget monitoring active. Utilization: ${(utilization * 100).toFixed(1)}%.` };
    }
    return { allowed: true, tier: 'normal', message: 'Within budget.' };
  }
}
```
Why This Structure Works
The TenantLedger persists state to the workspace directory, so cost data survives process restarts and remains scoped to the tenant. The BudgetGate operates as synchronous middleware: it evaluates utilization against three thresholds (alert, warn, and hard stop) before the orchestrator dispatches a tool. Checking before dispatch closes the window in which an agent could spawn a sub-agent or trigger a browser automation sequence after the budget is already exhausted.
The pricing model uses list rates because they are stable, publicly documented, and sufficient for governance. Negotiated rates introduce latency and complexity into the hot path. Post-hoc reconciliation handles the delta between list and contracted pricing.
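To make the split between the hot path and post-hoc accounting concrete, here is a minimal sketch of a reconciliation step. It assumes the negotiated contract is a single flat discount on list price, which real contracts may not be; `reconcile`, `Reconciliation`, and `discountRate` are hypothetical names, and the figures are illustrative.

```typescript
// Hypothetical reconciliation step; the flat-discount contract shape is an assumption.
interface Reconciliation {
  listSpend: number;       // spend accumulated at list rates by the ledger
  contractedSpend: number; // spend after applying the negotiated discount
  delta: number;           // amount by which the governance estimate overstates billing
}

function reconcile(listSpend: number, discountRate: number): Reconciliation {
  if (discountRate < 0 || discountRate > 1) {
    throw new RangeError('discountRate must be between 0 and 1');
  }
  // Work in integer cents so accounting output does not pick up floating-point drift.
  const listCents = Math.round(listSpend * 100);
  const contractedCents = Math.round(listCents * (1 - discountRate));
  return {
    listSpend: listCents / 100,
    contractedSpend: contractedCents / 100,
    delta: (listCents - contractedCents) / 100,
  };
}

// Month-end run: $480.00 recorded at list rates, 20% negotiated discount (assumed figures).
const monthEnd = reconcile(480, 0.2);
```

Because this runs outside the request path, it can be as slow or as detailed as finance requires without affecting enforcement latency.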
Pitfall Guide
- Request-Level Obsession
  Explanation: Teams attempt to meter every individual API call, assuming granular tracking is required for governance. This introduces distributed tracing overhead, token accounting drift, and complex reconciliation logic.
  Fix: Accept workspace-level aggregation as the governance boundary. Request-level precision is unnecessary for budget enforcement and complicates the runtime.
- Post-Execution Budget Checks
  Explanation: Evaluating spend after a tool completes allows expensive operations to finish even when the budget is exhausted. This defeats the purpose of proactive cost control.
  Fix: Implement pre-execution interception. The budget gate must run before the tool is dispatched, blocking execution when thresholds are breached.
- Hardcoded Pricing Tables
  Explanation: Embedding model rates directly into the enforcement logic requires code deployments whenever providers adjust pricing. This creates operational friction and stale rate cards.
  Fix: Externalize pricing into a versioned configuration file or a lightweight rate service. The enforcement layer should consume rates, not define them.
- Ignoring Context Window Accumulation
  Explanation: Agents often pass growing conversation histories to subsequent calls. Failing to account for cumulative context tokens underestimates spend, especially for long-running sessions.
  Fix: Track total input tokens per session, including system prompts, retrieved documents, and historical turns. The ledger must reflect the full payload, not just the latest message.
- Single Global Budget
  Explanation: Applying one budget across all tenants causes cross-contamination. A high-usage client drains the shared pool, triggering false hard stops for low-usage clients.
  Fix: Namespace budgets by workspace ID. Each tenant receives an independent cap, alert threshold, and warning boundary.
- Treating Estimates as Invoices
  Explanation: Engineering teams sometimes present workspace-level cost aggregates as billing-grade invoices. This creates friction with finance when discrepancies appear due to list vs. contracted rates or unattributed overhead.
  Fix: Clearly separate governance metrics from accounting. Label workspace aggregates as "operational estimates" and route final billing through a separate reconciliation pipeline.
- Missing Grace Periods
  Explanation: Hard stops at 100% utilization can abruptly terminate long-running tasks mid-execution, causing data corruption or incomplete workflows.
  Fix: Implement a grace buffer or checkpoint system. Allow in-flight operations to complete, but block new tool dispatches once the cap is reached.
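The context-accumulation pitfall is easier to see with numbers. The sketch below uses a simplified model, stated as an assumption: every call resends the system prompt plus all prior turns, and each turn adds a fixed number of tokens to the history. The function names and token counts are illustrative only.

```typescript
// Simplified model (assumption): every call resends the system prompt plus all prior turns.
function billedInputTokens(systemTokens: number, turnTokens: number[]): number {
  let history = 0;
  let billed = 0;
  for (const tokens of turnTokens) {
    history += tokens;                // the new message joins the running history
    billed += systemTokens + history; // the full payload is sent on this call
  }
  return billed;
}

// Naive metering that counts only each turn's newest message.
function latestMessageTokens(turnTokens: number[]): number {
  return turnTokens.reduce((sum, t) => sum + t, 0);
}

// Five turns of 400 tokens each with a 500-token system prompt.
const turns = [400, 400, 400, 400, 400];
const full = billedInputTokens(500, turns); // 900 + 1300 + 1700 + 2100 + 2500 = 8500
const naive = latestMessageTokens(turns);   // 5 * 400 = 2000
```

Under these assumptions, the naive count captures less than a quarter of the input tokens actually sent, and the gap widens with every additional turn.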
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup MVP with <10 tenants | Workspace-level aggregation + hard_stop at 100% | Low overhead, fast deployment, sufficient for early validation | Minimal engineering cost; prevents surprise invoices |
| Enterprise multi-tenant platform | Workspace aggregation + negotiated rate reconciliation + grace buffers | Requires auditability, finance alignment, and safe termination of long tasks | Moderate engineering cost; reduces billing disputes by 60-80% |
| Research/Experimental workloads | Workspace aggregation + soft limits + alert-only mode | Prioritizes experimentation continuity over strict enforcement | Higher potential spend; enables rapid iteration without hard blocks |
| High-frequency tool automation | Pre-execution hooks + tool-level pricing tiers | Prevents expensive sub-agent or browser loops from draining budgets | Reduces runaway compute costs by 40-70% in automation-heavy workloads |
Configuration Template
```yaml
# cost-governance.yaml
pricing:
  model_input_per_1k: 0.005
  model_output_per_1k: 0.015
  base_tool_call: 0.002
  browser_automation: 0.05
  sub_agent_spawn: 0.10
policies:
  default:
    monthly_cap: 500.00
    alert_threshold: 70
    warn_threshold: 90
    grace_buffer_seconds: 30
    enforcement_mode: pre_execution
  tenants:
    - workspace_id: "client_alpha"
      monthly_cap: 1200.00
      alert_threshold: 60
      warn_threshold: 85
      allowed_tools: ["search", "code_exec"]
      blocked_tools: ["browser_automation"]
    - workspace_id: "client_beta"
      monthly_cap: 300.00
      alert_threshold: 80
      warn_threshold: 95
      enforcement_mode: post_execution # For legacy compatibility
```
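The template leaves some keys unset per tenant (for example, neither tenant sets `grace_buffer_seconds`). One plausible reading, assumed here rather than stated by the template, is that omitted keys fall back to the default policy. The sketch below shows that merge on objects already parsed from the file by a YAML library of your choice; `Policy`, `TenantOverride`, and `resolvePolicy` are hypothetical names.

```typescript
// Hypothetical types mirroring the template's keys; the fallback rule is an assumption.
interface Policy {
  monthly_cap: number;
  alert_threshold: number;
  warn_threshold: number;
  grace_buffer_seconds: number;
  enforcement_mode: 'pre_execution' | 'post_execution';
}

interface TenantOverride extends Partial<Policy> {
  workspace_id: string;
}

// Tenant values win; anything the tenant omits falls back to the default policy.
function resolvePolicy(defaults: Policy, tenants: TenantOverride[], workspaceId: string): Policy {
  const override = tenants.find(t => t.workspace_id === workspaceId);
  if (!override) return { ...defaults };
  const merged: Policy = {
    monthly_cap: override.monthly_cap ?? defaults.monthly_cap,
    alert_threshold: override.alert_threshold ?? defaults.alert_threshold,
    warn_threshold: override.warn_threshold ?? defaults.warn_threshold,
    grace_buffer_seconds: override.grace_buffer_seconds ?? defaults.grace_buffer_seconds,
    enforcement_mode: override.enforcement_mode ?? defaults.enforcement_mode,
  };
  // Reject configurations where the alert tier could never fire before the warn tier.
  if (merged.alert_threshold >= merged.warn_threshold) {
    throw new Error(`alert_threshold must be below warn_threshold for ${workspaceId}`);
  }
  return merged;
}

// Values copied from the template above.
const defaults: Policy = {
  monthly_cap: 500,
  alert_threshold: 70,
  warn_threshold: 90,
  grace_buffer_seconds: 30,
  enforcement_mode: 'pre_execution',
};
const tenants: TenantOverride[] = [
  { workspace_id: 'client_alpha', monthly_cap: 1200, alert_threshold: 60, warn_threshold: 85 },
  { workspace_id: 'client_beta', monthly_cap: 300, alert_threshold: 80, warn_threshold: 95, enforcement_mode: 'post_execution' },
];

const alpha = resolvePolicy(defaults, tenants, 'client_alpha');
const beta = resolvePolicy(defaults, tenants, 'client_beta');
const unknown = resolvePolicy(defaults, tenants, 'client_gamma');
```

Validating the threshold ordering at load time keeps a misconfigured tenant from silently skipping the alert tier at runtime.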
Quick Start Guide
- Initialize workspace isolation: Create a dedicated directory structure for each tenant. Ensure process boundaries prevent cross-tenant resource access.
- Deploy the ledger service: Run one TenantLedger instance per workspace, pointing to the tenant directory for state persistence.
- Attach the budget gate: Insert the BudgetGate middleware into your agent orchestrator's tool dispatch pipeline. Configure it to read from cost-governance.yaml.
- Validate thresholds: Execute a test session that triggers alert, warn, and hard_stop conditions. Verify that the alert and warn tiers report their messages while still allowing execution, and that hard_stop blocks the dispatch.
- Route metrics to observability: Stream ledger state changes to your monitoring stack. Set up dashboards for utilization trends and threshold breaches.
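The threshold validation step can be rehearsed without any file state. The sketch below is a standalone classifier that mirrors the tier ordering in BudgetGate.evaluate and walks a simulated session through each tier. The `classify` function and the checkpoint spend values are illustrative; the cap and thresholds are the default policy values from the configuration template.

```typescript
type Tier = 'normal' | 'alert' | 'warn' | 'hard_stop';

// Mirrors the tier ordering in BudgetGate.evaluate; thresholds are percentages of the cap.
function classify(spend: number, cap: number, alertPct: number, warnPct: number): Tier {
  const utilization = spend / cap;
  if (utilization >= 1.0) return 'hard_stop';
  if (utilization >= warnPct / 100) return 'warn';
  if (utilization >= alertPct / 100) return 'alert';
  return 'normal';
}

// Simulated session against the default policy (cap 500, alert at 70%, warn at 90%).
const checkpoints = [100, 360, 460, 500];
const observed = checkpoints.map(spend => classify(spend, 500, 70, 90));
// observed is ['normal', 'alert', 'warn', 'hard_stop']

// Only the hard_stop tier should block a dispatch.
const blocked = observed.map(tier => tier === 'hard_stop');
```

A real validation run would drive the same spend levels through the ledger and confirm that the gate's `allowed` flag flips only at the final checkpoint.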