Protecting against token theft

By Codcompass Team·2026-05-31·6 min read

Inference Arbitrage: Defending High-Cost AI Endpoints Against Resale Attacks

Current Situation Analysis

The economics of modern web infrastructure have created a dangerous asymmetry. Standard HTTP requests are virtually free; at scale, providers like Vercel charge approximately $2 per million requests. In contrast, a single prompt to a frontier model can cost $2 or more. This creates a million-fold cost differential that attackers exploit through inference arbitrage.

Inference theft occurs when attackers consume paid AI inference without authorization and resell the capacity at a discount. Because the attacker's marginal cost for inference is zero, they can undercut legitimate pricing while maintaining high margins. This is not merely rate-limit abuse; it is the creation of a black market for stolen compute.

Many engineering teams overlook this threat because traditional web defenses are designed for low-cost attacks. IP rate limiting and authentication walls assume that the cost of bypassing defenses scales with the value of the resource. In inference theft, this assumption breaks. Attackers deploy residential proxy farms with thousands of IPs and automate account creation, rendering per-IP limits ineffective. Furthermore, defenses that verify identity only at session start allow attackers to amortize a single bypass across thousands of inference calls, destroying the defender's cost advantage.

Real-world incidents confirm the severity. On April 12, 2026, a Vercel documentation AI endpoint experienced a traffic spike to 1,300 requests per minute on the Anthropic Claude Haiku 4.5 model. The attack utilized residential proxies to obscure origins, bypassing standard rate limits. The volume represented a potential inference cost run rate exceeding $10,000 per day. Without per-request verification, such attacks can drain budgets rapidly before detection.
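The figures quoted in this section can be sanity-checked with a few lines of arithmetic. The sketch below assumes the 1,300 requests per minute were sustained for a full day and treats the $10,000 run rate as a floor, so the per-request figure it derives is only an implied lower bound, not a published price. All constant names are illustrative.

```typescript
// Figures taken from the text above; names are illustrative only.
const REQUESTS_PER_MINUTE: number = 1300;
const DAILY_RUN_RATE_USD: number = 10000;

// Assumption: the peak rate is sustained for 24 hours.
const requestsPerDay: number = REQUESTS_PER_MINUTE * 60 * 24;

// Implied average inference cost per request at that run rate.
const impliedCostPerRequestUsd: number = DAILY_RUN_RATE_USD / requestsPerDay;

// Cost differential from the opening paragraph:
// about $2 per million plain HTTP requests vs. about $2 for one frontier prompt.
const httpCostPerRequestUsd: number = 2 / 1000000;
const frontierPromptCostUsd: number = 2;
const costRatio: number = frontierPromptCostUsd / httpCostPerRequestUsd;

console.log(`requests per day: ${requestsPerDay}`);
console.log(`implied cost per request: $${impliedCostPerRequestUsd.toFixed(4)}`);
console.log(`cost ratio: about ${Math.round(costRatio)}x`);
```

Under these assumptions the attack amounts to roughly 1.9 million requests per day at about half a cent each, which is why a small model can still produce a five-figure daily bill.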

WOW Moment: Key Findings

The critical insight in defending AI endpoints is the amortization effect. Attackers profit by decoupling the cost of bypassing security from the number of inference calls. Per-request verification re-couples these costs, making the attack economically unviable.

Defense Strategy	Bypass Cost	Calls per Bypass	Attacker Margin	Resale Viability
Session-Based Auth	High (One-time)	Thousands	>95%	High
IP Rate Limits	Low (Proxy rotation)	Hundreds per IP	~80%	Medium
Per-Request Deep Analysis	High (Per-call)	1	Negative	None

Why this matters: When verification runs per request, the attacker must pay the bypass cost for every single inference call. Since inference is the most expensive resource per call, forcing the attacker to solve a challenge for each request destroys their margin. Even a bypass cost of a fraction of a cent can eliminate the profit once it is paid on every call instead of being amortized across thousands of them.
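The amortization effect can be made concrete with a small model. This is a minimal sketch, assuming the attacker's only cost is the bypass (stolen inference is free to them) and that they resell each call at a fixed price. The function name and all dollar figures are hypothetical and chosen only to illustrate the shape of the curve.

```typescript
// Attacker margin per resold call, given a bypass cost spread over N calls.
// Assumption: stolen inference costs the attacker nothing.
function attackerMargin(
  resalePriceUsd: number,
  bypassCostUsd: number,
  callsPerBypass: number
): number {
  if (resalePriceUsd <= 0 || callsPerBypass < 1) {
    throw new RangeError("resalePriceUsd must be > 0 and callsPerBypass >= 1");
  }
  const amortizedBypassUsd = bypassCostUsd / callsPerBypass;
  return (resalePriceUsd - amortizedBypassUsd) / resalePriceUsd;
}

// Hypothetical numbers: resale at $0.003 per call, bypass costs $0.004.
const sessionMargin: number = attackerMargin(0.003, 0.004, 1000); // one bypass, 1,000 calls
const perRequestMargin: number = attackerMargin(0.003, 0.004, 1); // one bypass per call

console.log(`session-based margin: ${(sessionMargin * 100).toFixed(1)}%`);
console.log(`per-request margin: ${(perRequestMargin * 100).toFixed(1)}%`);
```

With these illustrative inputs the session-based margin stays above 99%, while the per-request margin turns negative. The crossover point is simply where the per-call bypass cost exceeds the resale price.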

Core Solution

Defending against inference theft requires a shift from session-level protection to per-request verification using invisible deep analysis. The goal is to leverage cost asymmetry: verification must be cheap for the defender but expensive enough per call to break the attacker's business model.

Architecture Decisions

Per-Request Gate: Verification must occur inside the route handler for every inference call. This prevents amortization.
Invisible Deep Analysis: Traditional CAPTCHAs are ineffective because modern AI models can bypass visual challenges. Invisible solutions powered by client-side machine learning (e.g., Vercel BotID with Kasada deep analysis) distinguish humans from bots without user friction, enabling per-request checks.
Client-Server Coupling: The verification token must be generated client-side and validated server-side. Missing client configuration causes server checks to fail, as headers are not attached to requests.
Adapter Agnosticism: Attackers often wrap victim endpoints in OpenAI/Anthropic-compatible adapters to resell access. The adapter becomes the session boundary for the attacker's users. Per-request verification ensures that even if the attacker authenticates to their own adapter, the underlying call to your API is still scrutinized.

Implementation Strategy

The following example demonstrates a wrapper pattern that enforces per-request verification. The wrapper runs inside the route handler rather than in a separate middleware layer, and it abstracts the verification logic so that it is reusable across multiple AI endpoints.

Server-Side Guard:

// lib/inference-guard.ts
import { checkBotId } from 'botid/server';
import { NextRequest, NextResponse } from 'next/server';

export async function withInferenceProtection(
  request: NextRequest,
  handler: (req: NextRequest) => Promise<NextResponse>
): Promise<NextResponse> {
  // Run deep analysis on every request
  const analysis = await checkBotId();

  if (analysis.isBot) {
    // Block inference theft immediately
    return NextResponse.json(
      { error: 'Inference access denied: automated traffic detected' },
      { status: 403 }
    );
  }

  // Proceed to AI SDK call path
  return handler(request);
}

Route Handler Usage:

// app/api/v1/generate/route.ts
import { withInferenceProtection } from '@/lib/inference-guard';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  return withInferenceProtection(request, async (req) => {
    // Safe to proceed with expensive AI inference.
    // callFrontierModel is a placeholder for your own AI SDK call.
    const response = await callFrontierModel(req);
    return NextResponse.json(response);
  });
}

Client-Side Configuration:

The client must initialize the protection to attach necessary headers. This configuration is critical; without it, the server-side check cannot validate the request.

// instrumentation-client.ts
import { initBotId } from 'botid/client/core';

initBotId({
  protect: [
    { path: '/api/v1/generate', method: 'POST' },
    { path: '/api/v1/chat', method: 'POST' },
  ],
});

Rationale: The wrapper pattern centralizes security logic, reducing the risk of developers forgetting to add checks to new endpoints. Using checkBotId ensures that every call is evaluated against behavioral signals, not just static credentials.

Pitfall Guide

Amortization Trap
- Explanation: Running verification only at login or session start. Attackers bypass once and reuse the session for thousands of calls.
- Fix: Move verification inside the route handler to run on every request.
Proxy Blindness
- Explanation: Relying on IP rate limits. Attackers use residential proxy networks with thousands of IPs, diluting limits to ineffective levels.
- Fix: Use behavioral analysis tools like BotID that evaluate request characteristics, not just IP reputation.
The Adapter Illusion
- Explanation: Assuming that because users authenticate to an attacker's adapter, your endpoint is safe. The adapter proxies calls to your API, masking the true origin.
- Fix: Verify every request hitting your API, regardless of upstream authentication. The adapter is just another client.
Visual CAPTCHA Bypass
- Explanation: Using image-based CAPTCHAs. AI models can solve these challenges automatically, rendering them useless against sophisticated attackers.
- Fix: Deploy invisible deep analysis that uses client-side ML to detect bots without user interaction.
Client Configuration Omission
- Explanation: Implementing server-side checks without configuring the client. The server check fails because the verification headers are never attached.
- Fix: Always pair checkBotId with initBotId configuration for the specific route.
Playground Neglect
- Explanation: Underestimating the risk of AI playgrounds or debug endpoints. These allow maximum prompt control, making stolen calls highly valuable for resale.
- Fix: Treat any endpoint with significant prompt control as high-risk and enforce strict per-request verification.
System Prompt False Security
- Explanation: Believing fixed system prompts prevent abuse. Attackers can jailbreak models or use prompts that work around restrictions, still enabling resale.
- Fix: Verify requests based on traffic patterns, not just prompt content. Jailbreaks do not change the economic incentive for theft.
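The "Client Configuration Omission" pitfall lends itself to an automated check. The sketch below compares a list of server routes that call the guard against the `protect` entries passed to `initBotId`, and reports any route with no client entry. It is a hypothetical helper, not part of BotID: the `findUncoveredRoutes` name, the route lists, and the `/api/v1/playground` path are assumptions, and it only does exact path matching, so pattern-style entries would need extra handling.

```typescript
interface ProtectedRoute {
  path: string;
  method: string;
}

// Returns server-guarded routes that have no matching client-side entry.
// Matching is exact on path and case-insensitive on method.
function findUncoveredRoutes(
  serverGuarded: ProtectedRoute[],
  clientProtect: ProtectedRoute[]
): ProtectedRoute[] {
  const key = (r: ProtectedRoute): string => `${r.method.toUpperCase()} ${r.path}`;
  const covered = new Set<string>(clientProtect.map(key));
  return serverGuarded.filter((r) => !covered.has(key(r)));
}

// Hypothetical inventory: three guarded routes, only two listed on the client.
const serverGuarded: ProtectedRoute[] = [
  { path: "/api/v1/generate", method: "POST" },
  { path: "/api/v1/chat", method: "POST" },
  { path: "/api/v1/playground", method: "POST" },
];
const clientProtect: ProtectedRoute[] = [
  { path: "/api/v1/generate", method: "POST" },
  { path: "/api/v1/chat", method: "POST" },
];

const uncovered: ProtectedRoute[] = findUncoveredRoutes(serverGuarded, clientProtect);
for (const route of uncovered) {
  console.log(`missing client entry: ${route.method} ${route.path}`);
}
```

A check like this could run in CI so that a newly guarded endpoint cannot ship without its client-side entry.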

Production Bundle

Action Checklist

Audit AI Endpoints: Identify all internet-facing endpoints that trigger AI inference.
Classify Risk: Prioritize endpoints with high prompt control (playgrounds) over fixed-prompt bots.
Deploy Per-Request Guard: Implement verification inside route handlers for all high-risk endpoints.
Configure Client Init: Ensure initBotId covers all protected routes to attach verification headers.
Enable Deep Analysis: Use invisible deep analysis rather than visual CAPTCHAs to maintain UX and security.
Monitor Cost Run-Rates: Set alerts for sudden spikes in inference volume or cost.
Test Bypass Scenarios: Simulate proxy rotation and adapter usage to verify defenses hold.
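For the "Monitor Cost Run-Rates" item, one simple approach is to project a daily cost from recent per-minute request counts and compare it with a budget. The sketch below is a minimal illustration under stated assumptions: the function names, the estimated cost per request, and the budget are all hypothetical, and a real deployment would more likely read these values from provider billing or usage metrics.

```typescript
interface RunRateAlert {
  projectedDailyCostUsd: number;
  overBudget: boolean;
}

// Projects a 24-hour cost from a per-minute request rate.
function projectDailyCost(requestsPerMinute: number, estCostPerRequestUsd: number): number {
  return requestsPerMinute * 60 * 24 * estCostPerRequestUsd;
}

// Averages recent per-minute counts and flags a projected budget overrun.
function checkRunRate(
  recentPerMinuteCounts: number[],
  estCostPerRequestUsd: number,
  dailyBudgetUsd: number
): RunRateAlert {
  if (recentPerMinuteCounts.length === 0) {
    return { projectedDailyCostUsd: 0, overBudget: false };
  }
  const total = recentPerMinuteCounts.reduce((sum: number, n: number) => sum + n, 0);
  const average = total / recentPerMinuteCounts.length;
  const projected = projectDailyCost(average, estCostPerRequestUsd);
  return { projectedDailyCostUsd: projected, overBudget: projected > dailyBudgetUsd };
}

// Hypothetical inputs: a spike like the one described earlier, $0.005 per request, $1,000 budget.
const spikeAlert: RunRateAlert = checkRunRate([1300, 1300, 1300], 0.005, 1000);
const quietAlert: RunRateAlert = checkRunRate([10, 20], 0.005, 1000);

console.log(`spike: $${spikeAlert.projectedDailyCostUsd.toFixed(0)}/day, over budget: ${spikeAlert.overBudget}`);
console.log(`quiet: $${quietAlert.projectedDailyCostUsd.toFixed(0)}/day, over budget: ${quietAlert.overBudget}`);
```

A short averaging window reacts faster to spikes at the cost of more false alarms, so the window length is a tuning choice rather than a fixed rule.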

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
AI Playground / High Control	Per-Request BotID Deep Analysis	High resale value; attackers will use proxies and adapters.	Low verification cost vs. high inference savings.
Support Bot / Fixed Prompt	Per-Request BotID + Rate Limit	Lower risk but still vulnerable to jailbreaks and resale.	Moderate verification cost; prevents bulk theft.
Internal Tool / Authenticated	Auth + Session Verification	Low external risk; internal users are trusted.	Minimal overhead; session checks suffice.
Public Demo / Free Tier	Per-Request BotID + Strict Limits	High abuse potential; no revenue to offset theft.	Essential to prevent budget drain.

Configuration Template

Ensure your Next.js configuration includes the required wrapper for BotID to function correctly. This template assumes a standard Next.js setup.

// next.config.ts
import type { NextConfig } from 'next';
// Assumption: the BotID docs describe a withBotId wrapper exported from
// 'botid/next/config'. Verify the import path against the current
// BotID documentation before relying on it.
import { withBotId } from 'botid/next/config';

const nextConfig: NextConfig = {
  // Your existing Next.js configuration goes here.
};

// The wrapper lets BotID inject its client-side scripts and handle
// verification headers correctly.
export default withBotId(nextConfig);

Quick Start Guide

Install BotID: Add the package to your project dependencies.
```
npm install botid
```
Initialize Client: Add initBotId to your client instrumentation file with the paths to protect.
Wrap Routes: Import checkBotId in your AI route handlers and verify every request before calling the AI model.
Verify Setup: Test the endpoint with a bot simulation to ensure requests are blocked and legitimate traffic passes.
Monitor: Check logs for verification results and adjust configurations if false positives occur.
