Rate Limiting: How to protect your API without losing your mind
Rate limiting algorithms (Token Bucket, Sliding Window), implementations with Redis, Express and Nginx, and how to handle 429s on the frontend.
Thiago Saraiva

Introduction
Rate limiting and throttling are essential techniques for controlling request flow in modern applications. They protect your infrastructure from overload, prevent abuse, ensure fair resource distribution among users, and maintain service quality. In this comprehensive guide, we'll explore the fundamental concepts, algorithms, practical implementations across all layers of your stack, and real-world use cases to help you make informed decisions about which approach to use in each scenario.
1. Fundamentals
Key Concepts
Rate Limiting limits the number of requests a client can make within a specific time window. For example: "100 requests per minute." If the limit is exceeded, subsequent requests are rejected (usually with HTTP 429).
Throttling controls the execution rate by enforcing a minimum interval between operations. It ensures operations don't happen faster than a defined rate, queuing or delaying requests rather than rejecting them immediately.
Debounce delays execution until a period of inactivity occurs. If new calls arrive before the delay expires, the timer resets. Common in search inputs where you wait for the user to stop typing.
Why You Need Them
- Overload Protection: Prevents server collapse under excessive load
- Security: Mitigates brute-force attacks, DDoS, credential stuffing
- Fair Usage: Ensures one client doesn't monopolize resources
- Cost Control: Limits usage of paid external APIs
- Better UX: Prevents unnecessary duplicate requests
Comparison Table
| Technique | When to Use | Example | Layer |
|---|---|---|---|
| Rate Limiting | Limit total requests per period | 1000 req/hour per API key | Backend/API Gateway |
| Throttling | Control execution rate | Process max 10 webhooks/sec | Backend/Queue |
| Debounce | Wait for user to finish action | Search after 300ms without typing | Frontend |
2. Rate Limiting Algorithms
2.1 Token Bucket
A bucket holds tokens, each representing permission for one request. Tokens are added at a constant rate. Each request consumes one token. If the bucket is empty, requests are rejected.
Pros: Allows controlled bursts, simple, low memory. Cons: Doesn't guarantee uniform request distribution; a freshly filled bucket permits a large initial burst.
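A minimal in-memory sketch of the idea (illustrative, not production code — a real deployment would store the bucket state in Redis):

```javascript
// Token Bucket: tokens refill at a constant rate; each request costs one.
class TokenBucket {
  constructor(capacity, refillRatePerSec, now = Date.now()) {
    this.capacity = capacity;
    this.refillRate = refillRatePerSec;
    this.tokens = capacity; // starts full: an initial burst is possible
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, consuming one token.
  allow(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing `now` explicitly makes the bucket deterministic and easy to test; in production you would simply call `allow()`.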
2.2 Leaky Bucket
Requests enter a queue (bucket). The queue is processed at a constant rate (leak). If the queue is full, new requests are rejected.
Pros: Constant processing rate, smooth traffic, predictable. Cons: Can have high latency, uses more memory (queue), no bursts allowed.
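A minimal in-memory sketch (illustrative; `offer`/`leak` names are my own):

```javascript
// Leaky Bucket: requests join a bounded queue, drained at a constant rate.
class LeakyBucket {
  constructor(capacity, leakRatePerSec, now = 0) {
    this.capacity = capacity;       // max queued requests
    this.leakRate = leakRatePerSec; // requests processed per second
    this.queue = [];
    this.lastLeak = now;
  }

  // Try to enqueue a request; rejected when the bucket is full.
  offer(request, now) {
    this.leak(now);
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(request);
    return true;
  }

  // Drain up to leakRate * elapsedSeconds requests; returns those processed.
  leak(now) {
    const n = Math.floor(((now - this.lastLeak) / 1000) * this.leakRate);
    if (n <= 0) return [];
    this.lastLeak = now;
    return this.queue.splice(0, n);
  }
}
```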
2.3 Fixed Window Counter
Counts requests in fixed time windows. When the window ends, the counter resets.
Pros: Very simple, low memory, excellent performance. Cons: Boundary problem (2x limit at window edges), abrupt reset.
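A minimal in-memory sketch (illustrative). Note how a client can send `limit` requests at the end of one window and `limit` more at the start of the next — the boundary problem:

```javascript
// Fixed Window: one counter per key per window; counter resets each window.
function makeFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      counters.set(key, { windowStart, count: 1 }); // new window: reset
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```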
2.4 Sliding Window Log
Stores the timestamp of each request. Counts requests within the window by filtering timestamps.
Pros: Very accurate, no boundary problem, uniform distribution. Cons: High memory usage, O(n) complexity, hard to scale.
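A minimal in-memory sketch (illustrative). A Redis version of the same idea typically uses a Sorted Set keyed by timestamp (ZREMRANGEBYSCORE to prune, ZADD to record, ZCARD to count):

```javascript
// Sliding Window Log: store one timestamp per request, count those in window.
// Memory grows with traffic (O(n) per key).
function makeSlidingLogLimiter(limit, windowMs) {
  const logs = new Map(); // key -> array of request timestamps
  return function allow(key, now = Date.now()) {
    // Drop timestamps that slid out of the window, then count the rest.
    const log = (logs.get(key) || []).filter(t => t > now - windowMs);
    if (log.length >= limit) {
      logs.set(key, log);
      return false;
    }
    log.push(now);
    logs.set(key, log);
    return true;
  };
}
```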
2.5 Sliding Window Counter
Combines fixed window counters with weighted calculation. Approximates the sliding window with minimal memory.
Pros: Good accuracy, moderate memory, solves boundary problem, best cost-benefit. Cons: Approximation (not exact), slightly more complex.
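A minimal in-memory sketch of the weighted calculation (illustrative): the previous window's count is weighted by how much of it still overlaps the sliding window.

```javascript
// Sliding Window Counter: two counters per key, previous window weighted
// by its overlap with the sliding window. O(1) memory per key.
function makeSlidingCounterLimiter(limit, windowMs) {
  const state = new Map(); // key -> { windowStart, count, prevCount }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    let s = state.get(key);
    if (!s || s.windowStart !== windowStart) {
      // Carry the old count over only if it was the immediately previous window.
      const prevCount = s && s.windowStart === windowStart - windowMs ? s.count : 0;
      s = { windowStart, count: 0, prevCount };
      state.set(key, s);
    }
    // Fraction of the previous window still inside the sliding window.
    const overlap = 1 - (now - windowStart) / windowMs;
    const estimated = s.prevCount * overlap + s.count;
    if (estimated >= limit) return false;
    s.count += 1;
    return true;
  };
}
```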
Algorithm Comparison
| Algorithm | Memory | Accuracy | Complexity | Allows Burst | Distributed |
|---|---|---|---|---|---|
| Token Bucket | O(1) | Good | O(1) | Yes | Hard |
| Leaky Bucket | O(n) | Excellent | O(1) | No | Medium |
| Fixed Window | O(1) | Poor (boundary) | O(1) | Yes | Easy |
| Sliding Log | O(n) | Excellent | O(n) | No | Hard |
| Sliding Counter | O(1) | Very Good | O(1) | Moderate | Easy |
3. Database Layer
3.1 Redis: INCR + EXPIRE (Fixed Window)
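The pattern: INCR the per-key counter, and set the TTL on the first increment so the counter expires with the window. A sketch assuming any Promise-based client exposing `incr`/`expire` (e.g. ioredis):

```javascript
// Fixed-window check built on Redis INCR + EXPIRE semantics.
async function allowFixedWindow(redis, key, limit, windowSec) {
  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: start the TTL that resets the counter.
    await redis.expire(key, windowSec);
  }
  return count <= limit;
}
```

Caveat: if the process dies between INCR and EXPIRE, the key never expires and the client is locked out — one of the motivations for the atomic Lua-script approach in 3.3.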
3.2 Redis: Sorted Sets (Sliding Window)
3.3 Redis: Lua Scripts (Atomic Operations)
3.4 PostgreSQL Advisory Locks
3.5 Connection Pooling
4. Backend Layer
4.1 Express Middleware (express-rate-limit)
4.2 Custom Redis Middleware (Sliding Window)
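An Express-style middleware factory, sketched over any `allow(key) -> boolean` limiter so the storage backend is pluggable (a Redis-backed `allow` slots in the same way; the 60s `Retry-After` value is an assumption):

```javascript
// Rate-limit middleware: pass requests through while allow(key) is true,
// otherwise answer 429 with a Retry-After hint.
function rateLimitMiddleware(allow) {
  return function (req, res, next) {
    const key = req.ip; // or API key, user id, etc.
    if (allow(key)) return next();
    res.setHeader('Retry-After', '60'); // assumption: 60-second window
    res.status(429).json({ error: 'Too Many Requests' });
  };
}
```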
4.3 Python FastAPI (slowapi)
4.4 Distributed Rate Limiting
4.5 Tiered Limits (Free/Pro/Enterprise)
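Tiered limits reduce to looking up the caller's plan before applying the limiter. A sketch (tier names and numbers are illustrative):

```javascript
// Per-plan limits; unknown tiers fall back to the free plan.
const TIER_LIMITS = {
  free:       { limit: 100,    windowMs: 60_000 },
  pro:        { limit: 1_000,  windowMs: 60_000 },
  enterprise: { limit: 10_000, windowMs: 60_000 },
};

function limitsForTier(tier) {
  return TIER_LIMITS[tier] || TIER_LIMITS.free;
}
```

The limiter key should include the tenant (e.g. `${apiKey}`), while `limit` and `windowMs` come from `limitsForTier`, so upgrading a customer takes effect without redeploying.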
5. API Design
Standard Headers
Every API with rate limiting should include these headers:
- X-RateLimit-Limit: Maximum requests allowed in the window
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the window resets
- Retry-After: Seconds until the client can retry (only on 429)
Complete 429 Response
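A complete 429 response might look like this (all values illustrative):

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689600
Retry-After: 30

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Try again in 30 seconds.",
  "retryAfter": 30
}
```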
Rate Limit Info in Response Body
6. Frontend Layer
6.1 Manual Throttle and Debounce
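Minimal versions of both (illustrative sketches; production code would also handle trailing calls and cancellation):

```javascript
// throttle: run fn at most once every `ms`; extra calls in between are dropped.
function throttle(fn, ms) {
  let last = 0;
  return function (...args) {
    const now = Date.now();
    if (now - last >= ms) {
      last = now;
      fn.apply(this, args);
    }
  };
}

// debounce: run fn only after `ms` of silence; each new call resets the timer.
function debounce(fn, ms) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), ms);
  };
}
```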
6.2 React Hooks
6.3 Request Queue with Concurrency Limit
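A minimal queue that caps in-flight requests (illustrative sketch): at most `maxConcurrent` tasks run at once, the rest wait their turn.

```javascript
function makeQueue(maxConcurrent) {
  let active = 0;
  const waiting = [];

  function runNext() {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)                  // task is a () => Promise
      .then(resolve, reject)       // settle the caller's promise
      .finally(() => { active--; runNext(); });
  }

  return {
    push(task) {
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        runNext();
      });
    },
    get active() { return active; },
    get pending() { return waiting.length; },
  };
}
```

Usage: `queue.push(() => fetch(url))` — each `push` resolves with the task's result once its turn comes.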
6.4 Retry with Exponential Backoff on 429
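A sketch of the backoff math and retry loop (illustrative; function names are my own, and a real client should prefer the server's Retry-After header over the computed delay when present):

```javascript
// "Equal jitter" backoff: delay grows as baseMs * 2^attempt, capped at maxMs,
// then randomized into [exp/2, exp] so retrying clients don't stampede in sync.
function backoffDelay(attempt, baseMs = 500, maxMs = 30_000, jitter = Math.random) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return exp / 2 + jitter() * (exp / 2);
}

// Retry an async call while it returns 429 (assumes it resolves to { status }).
async function retryOn429(
  doRequest,
  maxRetries = 3,
  sleep = ms => new Promise(r => setTimeout(r, ms))
) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(backoffDelay(attempt));
  }
}
```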
7. Infrastructure
7.1 Nginx Rate Limiting
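A typical setup with the `limit_req` module (zone name, rate, and upstream are illustrative):

```nginx
# 10 MB shared zone keyed by client IP, allowing 10 requests/second.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # Queue bursts of up to 20 extra requests; nodelay serves them
        # immediately instead of spacing them out.
        limit_req zone=api burst=20 nodelay;
        limit_req_status 429;   # default is 503
        proxy_pass http://backend;
    }
}
```

Nginx's `limit_req` implements the leaky bucket algorithm; `burst` is the bucket depth.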
7.2 AWS API Gateway
7.3 Cloudflare Workers
7.4 Redis Cluster for Distributed Rate Limiting
8. Use Cases
8.1 Login Brute-Force Protection
8.2 Webhook Delivery Throttling
8.3 Real-time Typing Indicator
9. Performance Comparison
| Algorithm | Memory | Latency | Throughput | Accuracy | Distributed | Recommended For |
|---|---|---|---|---|---|---|
| Fixed Window | O(1) - 8 bytes | < 1ms | 100k+ req/s | 70% | Easy | High-traffic APIs |
| Sliding Counter | O(1) - 16 bytes | ~1ms | 50k+ req/s | 90% | Easy | General use (recommended) |
| Token Bucket | O(1) - 24 bytes | ~1ms | 50k+ req/s | 85% | Medium | APIs with bursts |
| Sliding Log | O(n) - ~100n bytes | ~5ms | 10k+ req/s | 100% | Hard | When precision is required |
| Leaky Bucket | O(n) - ~50n bytes | ~10ms | 5k+ req/s | 100% | Medium | Uniform processing |
10. Decision Framework
When to Use Each Algorithm
- Token Bucket: Traffic with expected peaks, upload/download APIs, when burst tolerance is needed.
- Leaky Bucket: Constant rate required, sequential processing, webhooks and queues.
- Fixed Window: Maximum performance needed, high traffic (100k+ req/s), simple implementation.
- Sliding Window Counter: Best cost-benefit, general use, good accuracy with low memory.
- Sliding Window Log: Perfect precision needed, compliance and SLA, detailed auditing.
Decision Flowchart
Need perfect accuracy?
YES → Sliding Window Log
NO ↓
Allows bursts?
YES → Token Bucket
NO ↓
Need constant rate?
YES → Leaky Bucket
NO ↓
Performance critical?
YES → Fixed Window
NO → Sliding Window Counter (RECOMMENDED)
Per-Layer Recommendations
| Layer | Technique | Tool | Use Case |
|---|---|---|---|
| Frontend | Debounce | Lodash, custom hook | Search input, autocomplete |
| Frontend | Throttle | Lodash, custom hook | Scroll, resize, typing indicator |
| Backend | Rate Limit | express-rate-limit, Redis | API endpoints, auth |
| Database | Connection Pool | pg.Pool, Redis pipeline | Query throttling |
| Infra | Rate Limit | Nginx, API Gateway, Cloudflare | DDoS protection, global limit |
Final Recommendations
- For public APIs: Use Sliding Window Counter with Redis
- For login protection: Use Fixed Window with aggressive limits + progressive delays
- For webhooks: Use Leaky Bucket with a queue
- For frontend search: Use Debounce at 300ms
- For scroll events: Use Throttle at 100ms
- For high scale: Use infrastructure-level limiting (Nginx, Cloudflare)
- For multi-tenant: Implement tiered limits (Free/Pro/Enterprise)
- For compliance: Use Sliding Window Log with auditing