miter = new TokenBucketRateLimiter(
new TokenBucketRateLimiterOptions
{
TokenLimit = 50, // max burst
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 10, // ~1/s average
AutoReplenishment = true,
QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
QueueLimit = 0
});
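To make the "max burst" comment concrete, here is a minimal sketch that fires 60 back-to-back requests at a limiter with the same numbers. It assumes `AutoReplenishment = false` (unlike the configuration above) so that no tokens arrive mid-run and the result does not depend on timing. The variable names are illustrative, and outside ASP.NET Core the `System.Threading.RateLimiting` NuGet package is needed.

```csharp
using System;
using System.Threading.RateLimiting;

// Same shape as the limiter above, but with replenishment switched off so the
// bucket is never refilled while the demo runs.
using var bucket = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit = 50,
        ReplenishmentPeriod = TimeSpan.FromSeconds(10),
        TokensPerPeriod = 10,
        AutoReplenishment = false,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });

int accepted = 0;
int rejected = 0;

// The bucket starts full, so the first 50 requests drain it.
for (int i = 0; i < 60; i++)
{
    using RateLimitLease lease = bucket.AttemptAcquire(permitCount: 1);
    if (lease.IsAcquired) accepted++; else rejected++;
}

Console.WriteLine($"accepted={accepted}, rejected={rejected}"); // accepted=50, rejected=10
```

Once the burst allowance is spent, callers are held to the refill rate, which is what the `~1/s average` comment refers to.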
### 4. Concurrency limiter
Limits the number of simultaneous in-flight requests rather than the request rate. Useful for protecting expensive operations like report generation or ML inference, where each request holds resources for a long time and the number in flight matters more than the arrival rate.
```csharp
var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 5
    });
```
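The difference from the rate-based limiters is that a permit comes back as soon as the work holding it finishes. A small sketch of that behavior, assuming a deliberately tiny `PermitLimit` of 2 and no queue (both chosen only to keep the demo short):

```csharp
using System;
using System.Threading.RateLimiting;

using var gate = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit = 2,   // small on purpose so the demo is easy to follow
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0     // reject immediately instead of queueing
    });

RateLimitLease first = gate.AttemptAcquire();
RateLimitLease second = gate.AttemptAcquire();
RateLimitLease third = gate.AttemptAcquire();   // both permits are in use

bool thirdAcquired = third.IsAcquired;

// Finishing a unit of work releases its permit...
first.Dispose();

// ...so the next caller gets in, regardless of how many requests came before.
RateLimitLease fourth = gate.AttemptAcquire();
bool fourthAcquired = fourth.IsAcquired;

Console.WriteLine($"third={thirdAcquired}, fourth={fourthAcquired}"); // third=False, fourth=True

second.Dispose();
third.Dispose();
fourth.Dispose();
```

In request-handling code the lease would normally live in a `using` statement around the expensive call, so the permit is returned even if the call throws.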
* * *
## Wiring it up in ASP.NET Core
Register policies in `Program.cs`, then apply them with the `[EnableRateLimiting]` attribute or inline via `RequireRateLimiting()`.
```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
        opt.AutoReplenishment = true;
    });
});

var app = builder.Build();

app.UseRateLimiter(); // must come before MapControllers
```
Apply to a minimal API endpoint or controller action:
```csharp
// Minimal API
app.MapGet("/products", GetProducts)
   .RequireRateLimiting("fixed");

// Controller
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query) { ... }
```
* * *
## Per-user and per-endpoint policies
A single global policy rarely fits real-world needs. Use `AddPolicy` with a partition key derived from the request context:
```csharp
options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
            ?? httpContext.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod = 200,
            AutoReplenishment = true
        }));
```
> **Tip:** prefer the authenticated user ID over the IP address as the partition key. NAT and proxies can put hundreds of users behind a single IP, leading to false positives at scale.
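
Partitioning can be tried outside ASP.NET Core with `PartitionedRateLimiter.Create`. The sketch below is an illustration rather than the middleware's internals: a plain `string` stands in for the user ID derived from `HttpContext`, and the limits are shrunk to 2 tokens per hour so that no tokens come back while it runs. The `Allow` helper is hypothetical.

```csharp
using System;
using System.Threading.RateLimiting;

// One token bucket per partition key, created lazily on first use.
using PartitionedRateLimiter<string> perUser =
    PartitionedRateLimiter.Create<string, string>(userId =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: userId,
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 2,
                TokensPerPeriod = 2,
                ReplenishmentPeriod = TimeSpan.FromHours(1), // long enough that nothing refills mid-demo
                QueueLimit = 0
            }));

// Hypothetical helper: true if the request for this user is admitted.
bool Allow(string userId)
{
    using RateLimitLease lease = perUser.AttemptAcquire(userId);
    return lease.IsAcquired;
}

bool alice1 = Allow("alice");
bool alice2 = Allow("alice");
bool alice3 = Allow("alice"); // alice has used up her bucket
bool bob1 = Allow("bob");     // bob's partition is untouched

Console.WriteLine($"{alice1} {alice2} {alice3} {bob1}"); // True True False True
```

If two users shared one key (for example the same NAT address), they would drain the same bucket, which is the false-positive case the tip warns about.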
* * *
## Custom rejection responses
By default, the middleware rejects with `503 Service Unavailable`. The status code defined for rate limiting (RFC 6585) is `429 Too Many Requests`, ideally with a `Retry-After` header:
```csharp
options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    if (context.Lease.TryGetMetadata(
            MetadataName.RetryAfter, out var retryAfter))
    {
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)retryAfter.TotalSeconds).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
```
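One detail worth noting in the handler above: the `(int)` cast truncates, so a wait of 400 ms would be sent as `Retry-After: 0`. A possible refinement, sketched here with a hypothetical helper name (`ToRetryAfterSeconds` is not part of any library), is to round up and never go below one second:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a limiter's RetryAfter metadata to the
// whole-second value used in the Retry-After header, rounding up so that
// clients are never told to come back early and never see "0".
string ToRetryAfterSeconds(TimeSpan retryAfter)
{
    long seconds = (long)Math.Ceiling(retryAfter.TotalSeconds);
    return Math.Max(seconds, 1).ToString(CultureInfo.InvariantCulture);
}

Console.WriteLine(ToRetryAfterSeconds(TimeSpan.FromMilliseconds(400))); // 1
Console.WriteLine(ToRetryAfterSeconds(TimeSpan.FromSeconds(2.5)));      // 3
Console.WriteLine(ToRetryAfterSeconds(TimeSpan.FromSeconds(60)));       // 60
```

Inside `OnRejected`, the result would take the place of the `((int)retryAfter.TotalSeconds).ToString(...)` expression.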
* * *
## Distributed scenarios & Redis
The built-in limiters are in-process only: each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the `RedisRateLimiting` community library, which wraps the same `RateLimiter` abstraction:
```bash
dotnet add package RedisRateLimiting
```
```csharp
// Register the StackExchange.Redis multiplexer that the policy resolves below.
// (AddStackExchangeRedisCache only registers IDistributedCache, so it would
// leave IConnectionMultiplexer unresolved.)
builder.Services.AddSingleton<IConnectionMultiplexer>(_ =>
    ConnectionMultiplexer.Connect(builder.Configuration["Redis:Connection"]!));

options.AddPolicy("distributed", httpContext =>
    RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            ConnectionMultiplexerFactory =
                httpContext.RequestServices
                    .GetRequiredService<IConnectionMultiplexer>,
            PermitLimit = 500,
            Window = TimeSpan.FromMinutes(1)
        }));
```
* * *
## Client-side resilience with Polly
If your code _consumes_ a rate-limited API, use Polly's `RateLimiter` strategy combined with `Retry` to handle 429s gracefully:
```bash
dotnet add package Microsoft.Extensions.Http.Resilience
```
```csharp
services.AddHttpClient<IProductsClient, ProductsClient>()
    .AddResilienceHandler("products-pipeline", builder =>
    {
        builder.AddRateLimiter(new SlidingWindowRateLimiter(
            new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 50,
                Window = TimeSpan.FromSeconds(10),
                SegmentsPerWindow = 5
            }));

        builder.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(2),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Result?.StatusCode ==
                    HttpStatusCode.TooManyRequests)
        });
    });
```
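Independently of Polly, it can help to see what "handling a 429 gracefully" means at the HTTP level. The sketch below uses only `System.Net.Http` types. `GetRetryDelay` is a hypothetical helper, not a library API, and the clock is passed in as a parameter purely to keep the example testable.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: how long should the caller wait before retrying?
// Returns null when the response is not a 429 or carries no Retry-After header.
TimeSpan? GetRetryDelay(HttpResponseMessage response, DateTimeOffset now)
{
    if (response.StatusCode != HttpStatusCode.TooManyRequests)
        return null;

    var retryAfter = response.Headers.RetryAfter;
    if (retryAfter is null)
        return null;

    // Retry-After is either a number of seconds...
    if (retryAfter.Delta is TimeSpan delta)
        return delta;

    // ...or an absolute HTTP date.
    if (retryAfter.Date is DateTimeOffset date)
        return date > now ? date - now : TimeSpan.Zero;

    return null;
}

using var throttled = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
throttled.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(7));

using var ok = new HttpResponseMessage(HttpStatusCode.OK);

Console.WriteLine(GetRetryDelay(throttled, DateTimeOffset.UtcNow));  // 00:00:07
Console.WriteLine(GetRetryDelay(ok, DateTimeOffset.UtcNow) is null); // True
```

A server-supplied delay like this is generally a better signal than a fixed backoff, since it reflects when the limiter expects capacity to return.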
* * *
## Choosing the right algorithm
| Algorithm | Best for | Watch out for | Memory cost |
| --- | --- | --- | --- |
| Fixed window | Simple quotas, billing tiers | Boundary burst (2× spike) | Very low |
| Sliding window | Smooth public APIs | Segment count × partitions | Low–medium |
| Token bucket | Burst-tolerant consumer APIs | Tuning burst vs average | Low |
| Concurrency | Expensive ops (ML, reports) | Doesn't bound throughput | Very low |

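
The "boundary burst" entry for fixed windows is easiest to see with numbers. The following is a hand-rolled counter with an explicit clock, written only to illustrate the effect. It is not how `FixedWindowRateLimiter` is implemented, and all names in it are made up for the example.

```csharp
using System;

// Illustrative fixed-window counter: 100 requests per 1-minute window.
const int limit = 100;
TimeSpan window = TimeSpan.FromMinutes(1);

long currentWindow = -1;
int used = 0;

bool TryAcquire(TimeSpan now)
{
    long index = now.Ticks / window.Ticks;   // which window does "now" fall into?
    if (index != currentWindow)
    {
        currentWindow = index;               // new window: the counter resets
        used = 0;
    }
    if (used >= limit) return false;
    used++;
    return true;
}

int Burst(TimeSpan at, int requests)
{
    int accepted = 0;
    for (int i = 0; i < requests; i++)
        if (TryAcquire(at)) accepted++;
    return accepted;
}

// 150 requests one second before the boundary, 150 more one second after it.
int before = Burst(TimeSpan.FromSeconds(59), 150);
int after = Burst(TimeSpan.FromSeconds(61), 150);

Console.WriteLine($"{before + after} requests accepted within 2 seconds"); // 200 requests accepted within 2 seconds
```

Each window stays within its limit of 100, yet a client straddling the boundary gets twice that in a two-second span, which is the 2× spike the table refers to.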
> **Distributed gotcha:** with in-process limiters, each pod enforces the limit independently, so a cluster of 4 replicas effectively multiplies your limit by 4. Use a Redis-backed partitioned limiter for multi-replica deployments where correctness matters.
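
A quick way to convince yourself of the multiplication is to simulate it in a single process. This sketch assumes 4 replicas with a limit of 100 each and a round-robin load balancer. The numbers and names are illustrative, and `AutoReplenishment` is disabled so that nothing refills while it runs.

```csharp
using System;
using System.Linq;
using System.Threading.RateLimiting;

const int replicas = 4;
const int perReplicaLimit = 100;

// Each "pod" owns an independent in-process limiter.
FixedWindowRateLimiter[] pods = Enumerable.Range(0, replicas)
    .Select(_ => new FixedWindowRateLimiter(
        new FixedWindowRateLimiterOptions
        {
            PermitLimit = perReplicaLimit,
            Window = TimeSpan.FromMinutes(1),
            AutoReplenishment = false, // no refill while the demo runs
            QueueLimit = 0
        }))
    .ToArray();

int acceptedTotal = 0;

// 1,000 requests spread round-robin, as a load balancer might do.
for (int i = 0; i < 1000; i++)
{
    using RateLimitLease lease = pods[i % replicas].AttemptAcquire();
    if (lease.IsAcquired) acceptedTotal++;
}

Console.WriteLine($"accepted {acceptedTotal} of 1000"); // accepted 400 of 1000

foreach (var pod in pods) pod.Dispose();
```

The intended limit was 100 per minute, but the cluster as a whole admitted 400.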
* * *
## Wrapping up
.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the full spectrum from simple quotas to burst-tolerant consumer clients. Add Redis for distributed enforcement, Polly for client-side resilience, and always return `429` with a `Retry-After` header. Your API consumers will thank you.
Questions or patterns I missed? Drop them in the comments.