Your LLM cost estimate is fine. Your rate-limit math is what pages you at 2am.
Every "LLM cost calculator" answers the question nobody gets paged for.
Cost is the easy half. You multiply tokens by a price and you get a number that is wrong by a rounding error. Nobody's on-call rotation has ever blown up because the monthly bill was 8% higher than the spreadsheet said.
What blows up the rotation is a 429 cascade: traffic ramps, one rate-limit dimension saturates, retries pile on, the queue backs up, and the thing that was "fine in the load test" is now a red dashboard. Datadog's 2026 State of AI Engineering puts rate-limit errors at roughly 60% of all errored LLM call spans, millions of them a month, industry-wide. It is the dominant failure mode of LLM apps in production, and almost no planning tool models it.
So I built one that does, and this post is the reasoning behind it: llmcapplanner.vercel.app, a single client-side page, no signup, nothing leaves your browser.
"Tokens per minute" is not one number
Here's the trap. You look at a provider's pricing page, you see a tier with some big TPM number, and you mentally file rate limits as "a thing I have headroom on." Then you scale and discover the limit you actually hit isn't the one you were watching.
Concretely, as of the dated snapshot in the tool (2026-05-15):
Anthropic enforces three independent dimensions: RPM, input tokens/min (ITPM), and output tokens/min (OTPM). The token limits are per model, not per tier. On Tier 4, Claude Opus 4.7 gets ~10M ITPM while Sonnet 4.6 gets ~2M at the same tier. If you sized your capacity off "Anthropic Tier 4" as a single row in a generic table, you sized it off the wrong model's numbers. Output tokens have their own ceiling that's an order of magnitude tighter than input, so a summarization workload and a generation workload at identical RPS bind on completely different dimensions.
OpenAI enforces RPM and TPM (combined in+out), plus per-day RPD/TPD caps that most calculators ignore entirely. A burst pattern that's fine on the per-minute budget can still trip the daily ceiling.
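To make the daily-cap point concrete, here's a minimal sketch of that check. The limit values are hypothetical round numbers for illustration, not any provider's actual figures; verify yours in the dashboard:

```python
def fits_minute_but_trips_day(req_per_min: float, avg_tokens: float,
                              tpm_limit: float, tpd_limit: float,
                              active_minutes: int = 1440) -> bool:
    """True when traffic stays under the per-minute token budget yet still
    blows through the per-day (TPD) ceiling over a full day at that load."""
    tokens_per_min = req_per_min * avg_tokens
    within_minute = tokens_per_min <= tpm_limit
    daily_total = tokens_per_min * active_minutes
    return within_minute and daily_total > tpd_limit

# Hypothetical tier: 2M TPM, 200M TPD. Running at 90% of the minute
# budget looks safe per-minute, but sustained all day it trips the cap.
risky = fits_minute_but_trips_day(600, 3000,
                                  tpm_limit=2_000_000,
                                  tpd_limit=200_000_000)
```

This is the whole trap with daily ceilings: no single minute ever looks alarming, so a per-minute dashboard stays green right up until the day-scale budget runs out.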
The right question isn't "what does this cost" or even "what's my limit." It's: at my projected requests/min and average in/out token shape, which dimension saturates first, and how much headroom is on the others? That's the number that tells you when you'll get paged. The calculator computes exactly that and flags the binding dimension.
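The arithmetic behind "which dimension saturates first" is just a utilization ratio per dimension, with the max ratio binding. A minimal sketch, using illustrative Tier-4-style numbers (not the tool's actual snapshot values):

```python
from dataclasses import dataclass

@dataclass
class Limits:
    rpm: float    # requests/min
    itpm: float   # input tokens/min
    otpm: float   # output tokens/min

def binding_dimension(limits: Limits, req_per_min: float,
                      avg_in: float, avg_out: float) -> dict:
    """Utilization of each rate-limit dimension; the highest one binds first."""
    util = {
        "RPM":  req_per_min / limits.rpm,
        "ITPM": req_per_min * avg_in / limits.itpm,
        "OTPM": req_per_min * avg_out / limits.otpm,
    }
    binding = max(util, key=util.get)
    return {"binding": binding, "utilization": util}

# Illustrative limits only -- verify against your provider dashboard.
tier = Limits(rpm=4000, itpm=2_000_000, otpm=400_000)

# Summarization-shaped traffic: big inputs, small outputs -> ITPM binds.
summarize = binding_dimension(tier, req_per_min=600, avg_in=3000, avg_out=200)

# Generation-shaped traffic at the same RPS: OTPM binds, at 225% of the cap.
generate = binding_dimension(tier, req_per_min=600, avg_in=300, avg_out=1500)
```

Same requests/min, same tier, opposite binding dimensions; that's why a single "TPM" row in a generic table can't answer the capacity question.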
The per-second gotcha that load tests miss
"Per-minute" limits are not enforced per minute. They're enforced in much smaller windows β effectively per-second. A 4,000 RPM limit is not "4,000 requests in any 60s window." It's closer to ceil(4000/60) β 67 req/s, and a one-second burst above that 429s even though your per-minute average is well under the cap.
This is why a load test at steady-state RPS passes and the same system 429s in production the moment traffic gets bursty. The tool surfaces the per-second figure whenever RPM is the binding (or near-binding) dimension, because that's the one that turns a "we're at 70% of our limit" dashboard into an incident.
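Here's a sketch of why the bursty trace fails where the steady one passes, assuming the provider enforces in roughly one-second windows (the exact window size varies by provider and isn't documented everywhere):

```python
import math

def burst_429s(trace_rps: list[int], rpm_limit: int) -> int:
    """Requests rejected under effective per-second enforcement, even when
    the per-minute average is comfortably under the cap."""
    per_sec_cap = math.ceil(rpm_limit / 60)   # 4000 RPM -> 67 req/s
    return sum(max(0, rps - per_sec_cap) for rps in trace_rps)

# A bursty minute averaging 50 req/s (3000 RPM, 75% of a 4000 RPM cap)...
trace = [30] * 55 + [270] * 5   # ...with a 5-second spike.

# The per-minute dashboard says "fine"; per-second enforcement rejects
# everything above 67 req/s during the spike.
rejected = burst_429s(trace, rpm_limit=4000)
```

A steady-state load test at 50 req/s never touches the 67 req/s ceiling, which is exactly why it passes while the same average rate with bursts does not.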
Why this needs a dated snapshot, not a static table
The reason most rate-limit tables you find via search are wrong isn't laziness; it's that the ground truth moves. Model lineups churn (Opus went 4.6 → 4.7 in about a month; GPT-5.4 → 5.5 around the same window), prices get cut, tier limits get revised. A calculator that hard-codes 2024 models and quietly rots is worse than no calculator, because it's confidently wrong.
So the snapshot is explicitly dated, every preset links to the provider's own current rate-limit / pricing doc, and there's a loud "verify in your dashboard; presets change" disclaimer. The maintenance is the product. If you spot a number that's drifted, that's a bug worth reporting.
What it does / doesn't do
- Does: pick provider + model + tier + your req/min + avg in/out tokens → projected cost and the first binding 429 dimension with headroom on each, plus the per-second warning.
- Doesn't: no live API calls, no auth, no account, no token-paste counting. It's deterministic arithmetic over a maintained, dated snapshot. That's the whole point β it's auditable.
It's free and there's nothing to sign up for: llmcapplanner.vercel.app
If you've hit a 429 cascade in prod, I'd genuinely like to hear which dimension bound first for you β that's the input that decides what this tool should model next.
