Difficulty: Intermediate

# Rate Limiting in C#: Don't Let Your API Get Hammered

By Codcompass Team·2026-05-27·5 min read

If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API — no third-party middleware required. This post walks through every knob you can turn.

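> **Prerequisite:** the built-in rate limiter lives in `System.Threading.RateLimiting` and the ASP.NET Core middleware in `Microsoft.AspNetCore.RateLimiting`. Both ship in the box with ASP.NET Core from .NET 7 onwards; a non-web project needs a reference to the `System.Threading.RateLimiting` NuGet package.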

## Why rate limiting matters

Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also plugs a class of denial-of-service vectors that auth alone can't stop.

## The four built-in algorithms

### 1. Fixed window

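Permits N requests per fixed time window (e.g. 100 requests per minute); the counter resets when each window ends. Simple and cheap on memory, but a client that spends its quota at the end of one window and again at the start of the next can push through up to 2× the limit in a short span.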

```csharp
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0   // reject immediately when full
    });
```
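The options above only construct a limiter. As a minimal sketch of how one is consumed outside the middleware, the console program below calls `AttemptAcquire` in a loop and counts the outcomes. The limit of 3 and the loop of 5 are illustrative values, not recommendations, and a console project is assumed to reference the `System.Threading.RateLimiting` package.

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: 3 permits per one-minute window, no queueing.
using var demoLimiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 3,
        Window      = TimeSpan.FromMinutes(1),
        QueueLimit  = 0
    });

int accepted = 0, rejected = 0;
for (int i = 0; i < 5; i++)
{
    // AttemptAcquire never waits: the lease either holds a permit or it does not.
    using RateLimitLease lease = demoLimiter.AttemptAcquire();
    if (lease.IsAcquired) accepted++;
    else rejected++;
}

Console.WriteLine($"accepted={accepted}, rejected={rejected}"); // accepted=3, rejected=2
```

All five calls land inside the same window, so the fourth and fifth are turned away. With a non-zero `QueueLimit`, `AcquireAsync` would wait for the next window instead of failing immediately.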

### 2. Sliding window

Divides the window into segments and tracks usage per segment, so permits come back gradually as old segments expire rather than all at once. Smoother than fixed window: it largely removes the boundary burst at the cost of slightly more memory.

```csharp
var limiter = new SlidingWindowRateLimiter(
    new SlidingWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        SegmentsPerWindow    = 6,     // 10-second granularity
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0
    });
```

### 3. Token bucket

A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.

```csharp
var limiter = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit           = 50,    // max burst
        ReplenishmentPeriod  = TimeSpan.FromSeconds(10),
        TokensPerPeriod      = 10,    // ~1/s average
        AutoReplenishment    = true,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0
    });
```
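To make the burst behaviour concrete, here is a small sketch that drains a deliberately tiny bucket in a tight loop and then reads the retry hint from the first rejected lease. The numbers (5 tokens, 1 token per 10 seconds) are assumptions chosen to keep the output short.

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: a bucket of 5 tokens, refilled by 1 token every 10 seconds.
using var bucket = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit          = 5,
        ReplenishmentPeriod = TimeSpan.FromSeconds(10),
        TokensPerPeriod     = 1,
        AutoReplenishment   = true,
        QueueLimit          = 0
    });

int burst = 0;
TimeSpan? retryHint = null;
for (int i = 0; i < 8; i++)
{
    using RateLimitLease lease = bucket.AttemptAcquire();
    if (lease.IsAcquired)
    {
        burst++; // served from the tokens the bucket started with
    }
    else if (retryHint is null &&
             lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan wait))
    {
        retryHint = wait; // the limiter's own estimate of when a token will be free
    }
}

Console.WriteLine($"burst={burst}");
Console.WriteLine($"retry hint present={retryHint.HasValue}");
```

The bucket starts full, so the loop gets exactly `TokenLimit` requests through before the first rejection. From then on throughput is capped by `TokensPerPeriod` per `ReplenishmentPeriod`.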

### 4. Concurrency limiter

Limits simultaneous in-flight requests rather than request rate. Useful for protecting expensive operations like report generation or ML inference where time-in-system matters more than throughput.

```csharp
var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit          = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 5
    });
```
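Unlike the time-based limiters, a concurrency permit comes back only when its lease is disposed. The sketch below illustrates that with an assumed limit of 2 and no queue; the variable names are hypothetical.

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: at most 2 operations in flight, no queueing.
using var gate = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions { PermitLimit = 2, QueueLimit = 0 });

RateLimitLease first  = gate.AttemptAcquire();
RateLimitLease second = gate.AttemptAcquire();
RateLimitLease third  = gate.AttemptAcquire();   // both slots are taken
bool thirdAcquired = third.IsAcquired;

first.Dispose();                                 // finishing the work frees a slot

RateLimitLease fourth = gate.AttemptAcquire();   // the freed slot is available again
bool fourthAcquired = fourth.IsAcquired;

Console.WriteLine($"third={thirdAcquired}, fourth={fourthAcquired}"); // third=False, fourth=True

second.Dispose();
third.Dispose();
fourth.Dispose();
```

In real code the lease would normally sit in a `using` statement around the expensive operation so the slot is released even when the operation throws.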

* * *

## Wiring it up in ASP.NET Core

Register policies in `Program.cs`, then apply them with the `[EnableRateLimiting]` attribute or inline via `RequireRateLimiting()`.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window      = TimeSpan.FromMinutes(1);
        opt.QueueLimit  = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit          = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod     = 10;
        opt.AutoReplenishment   = true;
    });
});

var app = builder.Build();

// Must come before MapControllers, and after UseRouting if you call that explicitly
app.UseRateLimiter();
```

Apply to a minimal API endpoint or controller action:

```csharp
// Minimal API
app.MapGet("/products", GetProducts)
   .RequireRateLimiting("fixed");

// Controller
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query) { ... }
```

* * *

## Per-user and per-endpoint policies

A single global policy rarely fits real-world needs. Use `AddPolicy` with a partition key derived from the request context:

```csharp
options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
            ?? httpContext.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit          = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod     = 200,
            AutoReplenishment   = true
        }));
```

> **Tip:** prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.
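Partitioning can also be tried outside ASP.NET Core, which makes the per-key isolation easy to see. The following is a sketch under assumed values: plain strings stand in for the partition key that the policy above derives from `HttpContext`, and the limit of 2 per minute is illustrative.

```csharp
using System;
using System.Threading.RateLimiting;

// Each distinct key gets its own fixed-window counter (2 permits per minute here).
using PartitionedRateLimiter<string> perUser =
    PartitionedRateLimiter.Create<string, string>(userId =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: userId,
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 2,
                Window      = TimeSpan.FromMinutes(1),
                QueueLimit  = 0
            }));

// Hypothetical helper: true when the given user still has a permit left.
bool TryRequest(string userId)
{
    using RateLimitLease lease = perUser.AttemptAcquire(userId);
    return lease.IsAcquired;
}

bool alice1 = TryRequest("alice");
bool alice2 = TryRequest("alice");
bool alice3 = TryRequest("alice"); // alice has used up her own quota
bool bob1   = TryRequest("bob");   // bob's partition is untouched

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
// alice: True, True, False; bob: True
```

The same isolation is why a shared IP address as the key is risky: everyone behind it draws from one partition.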

* * *

## Custom rejection responses

By default, the middleware returns `503 Service Unavailable`. The RFC-correct status for rate limiting is `429 Too Many Requests` with a `Retry-After` header:

```csharp
options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    if (context.Lease.TryGetMetadata(
            MetadataName.RetryAfter, out var retryAfter))
    {
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)retryAfter.TotalSeconds).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
```

* * *

## Distributed scenarios & Redis

The built-in limiters are in-process only — each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the `RedisRateLimiting` community library, which wraps the same `RateLimiter` abstraction:

```bash
dotnet add package RedisRateLimiting
```

```csharp
// AddStackExchangeRedisCache only registers IDistributedCache, so register
// the multiplexer itself for the limiter to resolve.
builder.Services.AddSingleton<IConnectionMultiplexer>(_ =>
    ConnectionMultiplexer.Connect(builder.Configuration["Redis:Connection"]!));

// Inside AddRateLimiter(options => { ... })
options.AddPolicy("distributed", httpContext =>
{
    var redis = httpContext.RequestServices
        .GetRequiredService<IConnectionMultiplexer>();

    return RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            ConnectionMultiplexerFactory = () => redis,
            PermitLimit                  = 500,
            Window                       = TimeSpan.FromMinutes(1)
        });
});
```

* * *

## Client-side resilience with Polly

If your code _consumes_ a rate-limited API, use Polly's `RateLimiter` strategy combined with `Retry` to handle 429s gracefully:

```bash
# AddResilienceHandler and HttpRetryStrategyOptions ship in this package, which builds on Polly v8
dotnet add package Microsoft.Extensions.Http.Resilience
```

```csharp
services.AddHttpClient<IProductsClient, ProductsClient>()
    .AddResilienceHandler("products-pipeline", pipeline =>
    {
        // Client-side throttle: stay under 50 requests per 10 seconds
        pipeline.AddRateLimiter(new SlidingWindowRateLimiter(
            new SlidingWindowRateLimiterOptions
            {
                PermitLimit       = 50,
                Window            = TimeSpan.FromSeconds(10),
                SegmentsPerWindow = 5
            }));

        // Retry only 429 responses, backing off exponentially
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay            = TimeSpan.FromSeconds(2),
            BackoffType      = DelayBackoffType.Exponential,
            ShouldHandle     = args => ValueTask.FromResult(
                args.Outcome.Result?.StatusCode ==
                    HttpStatusCode.TooManyRequests)
        });
    });
```
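For a hand-rolled client that does not go through a resilience pipeline, the server's `Retry-After` header can be read straight from the response. The helper below is a hypothetical sketch (`GetRetryDelay` is not a library API); it uses the typed `RetryAfter` header from `System.Net.Http` and falls back to a caller-supplied delay when the header is missing.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: how long to wait before retrying a throttled request.
static TimeSpan GetRetryDelay(HttpResponseMessage response, TimeSpan fallback)
{
    RetryConditionHeaderValue? retryAfter = response.Headers.RetryAfter;

    // Form 1: delta-seconds, e.g. "Retry-After: 7"
    if (retryAfter?.Delta is TimeSpan delta)
        return delta;

    // Form 2: an HTTP date; wait until that instant, never a negative time
    if (retryAfter?.Date is DateTimeOffset date)
    {
        TimeSpan wait = date - DateTimeOffset.UtcNow;
        return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
    }

    return fallback;
}

// Simulated 429 response carrying a 7-second hint
using var throttled = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
throttled.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(7));

// Simulated 429 response without the header
using var bare = new HttpResponseMessage(HttpStatusCode.TooManyRequests);

TimeSpan hinted    = GetRetryDelay(throttled, TimeSpan.FromSeconds(2));
TimeSpan defaulted = GetRetryDelay(bare, TimeSpan.FromSeconds(2));

Console.WriteLine($"{hinted.TotalSeconds}s, {defaulted.TotalSeconds}s"); // 7s, 2s
```

Honouring the server's hint tends to be kinder to the API than a fixed backoff, since the server knows when its own window reopens.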

* * *

## Choosing the right algorithm

| Algorithm | Best for | Watch out for | Memory cost |
| --- | --- | --- | --- |
| Fixed window | Simple quotas, billing tiers | Boundary burst (2× spike) | Very low |
| Sliding window | Smooth public APIs | Segment count × partitions | Low–medium |
| Token bucket | Burst-tolerant consumer APIs | Tuning burst vs average | Low |
| Concurrency | Expensive ops (ML, reports) | Doesn't bound throughput | Very low |

> **Distributed gotcha:** in-process limiters per pod means a cluster of 4 replicas effectively multiplies your limit by 4. Always use a Redis-backed partitioned limiter for multi-replica deployments where correctness matters.
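The multiplication is easy to reproduce in a single process. The sketch below stands in for 4 replicas with 4 independent in-memory limiters and spreads 1,000 requests across them round-robin, roughly what a load balancer would do. The replica count, limit, and request count are illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Threading.RateLimiting;

const int replicas    = 4;
const int perPodLimit = 100;   // the limit you believe you configured

// One independent in-memory limiter per simulated pod.
FixedWindowRateLimiter[] pods = Enumerable.Range(0, replicas)
    .Select(_ => new FixedWindowRateLimiter(
        new FixedWindowRateLimiterOptions
        {
            PermitLimit = perPodLimit,
            Window      = TimeSpan.FromMinutes(1),
            QueueLimit  = 0
        }))
    .ToArray();

int admitted = 0;
for (int request = 0; request < 1000; request++)
{
    // Round-robin routing: each pod only sees its own share of the traffic.
    using RateLimitLease lease = pods[request % replicas].AttemptAcquire();
    if (lease.IsAcquired) admitted++;
}

Console.WriteLine($"admitted={admitted}"); // admitted=400, not 100

foreach (FixedWindowRateLimiter pod in pods) pod.Dispose();
```

Dividing the per-pod limit by the replica count is a common workaround, but it drifts whenever the cluster scales or traffic is unevenly balanced, which is where shared state in Redis earns its keep.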

* * *

## Wrapping up

.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the full spectrum from simple quotas to burst-tolerant consumer clients. Add Redis for distributed enforcement, Polly for client-side resilience, and always return `429` with a `Retry-After` header — your API consumers will thank you.

Questions or patterns I missed? Drop them in the comments.
