Rate Limiting: How to protect your API without losing your mind
Rate limiting algorithms (Token Bucket, Sliding Window), implementations with Redis, Express and Nginx, and how to handle 429s on the frontend.
Thiago Saraiva

Introduction
Rate limiting and throttling are essential techniques for controlling request flow in modern applications. They protect your infrastructure from overload, prevent abuse, ensure fair resource distribution among users, and maintain service quality. In this comprehensive guide, we'll explore the fundamental concepts, algorithms, practical implementations across all layers of your stack, and real-world use cases to help you make informed decisions about which approach to use in each scenario.
1. Fundamentals
Key Concepts
Rate Limiting limits the number of requests a client can make within a specific time window. For example: "100 requests per minute." If the limit is exceeded, subsequent requests are rejected (usually with HTTP 429).
Throttling controls the execution rate by enforcing a minimum interval between operations. It ensures operations don't happen faster than a defined rate, queuing or delaying requests rather than rejecting them immediately.
Debounce delays execution until a period of inactivity occurs. If new calls arrive before the delay expires, the timer resets. Common in search inputs where you wait for the user to stop typing.
Why You Need Them
- Overload Protection: Prevents server collapse under excessive load
- Security: Mitigates brute-force attacks, DDoS, credential stuffing
- Fair Usage: Ensures one client doesn't monopolize resources
- Cost Control: Limits usage of paid external APIs
- Better UX: Prevents unnecessary duplicate requests
Comparison Table
| Technique | When to Use | Example | Layer |
|---|---|---|---|
| Rate Limiting | Limit total requests per period | 1000 req/hour per API key | Backend/API Gateway |
| Throttling | Control execution rate | Process max 10 webhooks/sec | Backend/Queue |
| Debounce | Wait for user to finish action | Search after 300ms without typing | Frontend |
2. Rate Limiting Algorithms
2.1 Token Bucket
A bucket holds tokens, each representing permission for one request. Tokens are added at a constant rate. Each request consumes one token. If the bucket is empty, requests are rejected.
Pros: Allows controlled bursts, simple, low memory. Cons: Doesn't guarantee uniform request distribution; a freshly filled bucket permits a large initial burst.
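A minimal in-memory sketch of the idea (illustrative, not production code — a real deployment would store the bucket state in Redis):

```javascript
// Token Bucket: tokens refill at a constant rate; each request costs one.
class TokenBucket {
  constructor(capacity, refillRatePerSec, now = Date.now()) {
    this.capacity = capacity;
    this.refillRate = refillRatePerSec;
    this.tokens = capacity; // starts full: an initial burst is possible
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, consuming one token.
  allow(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing `now` explicitly makes the bucket deterministic and easy to test; in production you would simply call `allow()`.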
2.2 Leaky Bucket
Requests enter a queue (bucket). The queue is processed at a constant rate (leak). If the queue is full, new requests are rejected.
Pros: Constant processing rate, smooth traffic, predictable. Cons: Can have high latency, uses more memory (queue), no bursts allowed.
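A minimal in-memory sketch (illustrative; `offer`/`leak` names are my own):

```javascript
// Leaky Bucket: requests join a bounded queue, drained at a constant rate.
class LeakyBucket {
  constructor(capacity, leakRatePerSec, now = 0) {
    this.capacity = capacity;       // max queued requests
    this.leakRate = leakRatePerSec; // requests processed per second
    this.queue = [];
    this.lastLeak = now;
  }

  // Try to enqueue a request; rejected when the bucket is full.
  offer(request, now) {
    this.leak(now);
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(request);
    return true;
  }

  // Drain up to leakRate * elapsedSeconds requests; returns those processed.
  leak(now) {
    const n = Math.floor(((now - this.lastLeak) / 1000) * this.leakRate);
    if (n <= 0) return [];
    this.lastLeak = now;
    return this.queue.splice(0, n);
  }
}
```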
2.3 Fixed Window Counter
Counts requests in fixed time windows. When the window ends, the counter resets.
Pros: Very simple, low memory, excellent performance. Cons: Boundary problem (2x limit at window edges), abrupt reset.
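A minimal in-memory sketch (illustrative). Note how a client can send `limit` requests at the end of one window and `limit` more at the start of the next — the boundary problem:

```javascript
// Fixed Window: one counter per key per window; counter resets each window.
function makeFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      counters.set(key, { windowStart, count: 1 }); // new window: reset
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```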
2.4 Sliding Window Log
Stores the timestamp of each request. Counts requests within the window by filtering timestamps.
Pros: Very accurate, no boundary problem, uniform distribution. Cons: High memory usage, O(n) complexity, hard to scale.
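A minimal in-memory sketch (illustrative). A Redis version of the same idea typically uses a Sorted Set keyed by timestamp (ZREMRANGEBYSCORE to prune, ZADD to record, ZCARD to count):

```javascript
// Sliding Window Log: store one timestamp per request, count those in window.
// Memory grows with traffic (O(n) per key).
function makeSlidingLogLimiter(limit, windowMs) {
  const logs = new Map(); // key -> array of request timestamps
  return function allow(key, now = Date.now()) {
    // Drop timestamps that slid out of the window, then count the rest.
    const log = (logs.get(key) || []).filter(t => t > now - windowMs);
    if (log.length >= limit) {
      logs.set(key, log);
      return false;
    }
    log.push(now);
    logs.set(key, log);
    return true;
  };
}
```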
2.5 Sliding Window Counter
Combines fixed window counters with weighted calculation. Approximates the sliding window with minimal memory.
Pros: Good accuracy, moderate memory, solves boundary problem, best cost-benefit. Cons: Approximation (not exact), slightly more complex.
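A minimal in-memory sketch of the weighted calculation (illustrative): the previous window's count is weighted by how much of it still overlaps the sliding window.

```javascript
// Sliding Window Counter: two counters per key, previous window weighted
// by its overlap with the sliding window. O(1) memory per key.
function makeSlidingCounterLimiter(limit, windowMs) {
  const state = new Map(); // key -> { windowStart, count, prevCount }
  return function allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    let s = state.get(key);
    if (!s || s.windowStart !== windowStart) {
      // Carry the old count over only if it was the immediately previous window.
      const prevCount = s && s.windowStart === windowStart - windowMs ? s.count : 0;
      s = { windowStart, count: 0, prevCount };
      state.set(key, s);
    }
    // Fraction of the previous window still inside the sliding window.
    const overlap = 1 - (now - windowStart) / windowMs;
    const estimated = s.prevCount * overlap + s.count;
    if (estimated >= limit) return false;
    s.count += 1;
    return true;
  };
}
```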
Algorithm Comparison
| Algorithm | Memory | Accuracy | Complexity | Allows Burst | Distributed |
|---|---|---|---|---|---|
| Token Bucket | O(1) | Good | O(1) | Yes | Hard |
| Leaky Bucket | O(n) | Excellent | O(1) | No | Medium |
| Fixed Window | O(1) | Poor (boundary) | O(1) | Yes | Easy |
| Sliding Log | O(n) | Excellent | O(n) | No | Hard |
| Sliding Counter | O(1) | Very Good | O(1) | Moderate | Easy |
3. Database Layer
3.1 Redis: INCR + EXPIRE (Fixed Window)
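The pattern: INCR the per-key counter, and set the TTL on the first increment so the counter expires with the window. A sketch assuming any Promise-based client exposing `incr`/`expire` (e.g. ioredis):

```javascript
// Fixed-window check built on Redis INCR + EXPIRE semantics.
async function allowFixedWindow(redis, key, limit, windowSec) {
  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: start the TTL that resets the counter.
    await redis.expire(key, windowSec);
  }
  return count <= limit;
}
```

Caveat: if the process dies between INCR and EXPIRE, the key never expires and the client is locked out — one of the motivations for the atomic Lua-script approach in 3.3.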
3.2 Redis: Sorted Sets (Sliding Window)
3.3 Redis: Lua Scripts (Atomic Operations)
3.4 PostgreSQL Advisory Locks
3.5 Connection Pooling
4. Backend Layer
4.1 Express Middleware (express-rate-limit)
4.2 Custom Redis Middleware (Sliding Window)
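An Express-style middleware factory, sketched over any `allow(key) -> boolean` limiter so the storage backend is pluggable (a Redis-backed `allow` slots in the same way; the 60s `Retry-After` value is an assumption):

```javascript
// Rate-limit middleware: pass requests through while allow(key) is true,
// otherwise answer 429 with a Retry-After hint.
function rateLimitMiddleware(allow) {
  return function (req, res, next) {
    const key = req.ip; // or API key, user id, etc.
    if (allow(key)) return next();
    res.setHeader('Retry-After', '60'); // assumption: 60-second window
    res.status(429).json({ error: 'Too Many Requests' });
  };
}
```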
4.3 Python FastAPI (slowapi)
4.4 Distributed Rate Limiting
4.5 Tiered Limits (Free/Pro/Enterprise)
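Tiered limits reduce to looking up the caller's plan before applying the limiter. A sketch (tier names and numbers are illustrative):

```javascript
// Per-plan limits; unknown tiers fall back to the free plan.
const TIER_LIMITS = {
  free:       { limit: 100,    windowMs: 60_000 },
  pro:        { limit: 1_000,  windowMs: 60_000 },
  enterprise: { limit: 10_000, windowMs: 60_000 },
};

function limitsForTier(tier) {
  return TIER_LIMITS[tier] || TIER_LIMITS.free;
}
```

The limiter key should include the tenant (e.g. `${apiKey}`), while `limit` and `windowMs` come from `limitsForTier`, so upgrading a customer takes effect without redeploying.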
5. API Design
Standard Headers
Every API with rate limiting should include these headers:
- X-RateLimit-Limit: Maximum requests allowed in the window
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the window resets
- Retry-After: Seconds until the client can retry (only on 429)
Complete 429 Response
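A complete 429 response might look like this (all values illustrative):

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689600
Retry-After: 30

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Try again in 30 seconds.",
  "retryAfter": 30
}
```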
Rate Limit Info in Response Body
6. Frontend Layer
6.1 Manual Throttle and Debounce
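Minimal versions of both (illustrative sketches; production code would also handle trailing calls and cancellation):

```javascript
// throttle: run fn at most once every `ms`; extra calls in between are dropped.
function throttle(fn, ms) {
  let last = 0;
  return function (...args) {
    const now = Date.now();
    if (now - last >= ms) {
      last = now;
      fn.apply(this, args);
    }
  };
}

// debounce: run fn only after `ms` of silence; each new call resets the timer.
function debounce(fn, ms) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), ms);
  };
}
```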
6.2 React Hooks
6.3 Request Queue with Concurrency Limit
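A minimal queue that caps in-flight requests (illustrative sketch): at most `maxConcurrent` tasks run at once, the rest wait their turn.

```javascript
function makeQueue(maxConcurrent) {
  let active = 0;
  const waiting = [];

  function runNext() {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)                  // task is a () => Promise
      .then(resolve, reject)       // settle the caller's promise
      .finally(() => { active--; runNext(); });
  }

  return {
    push(task) {
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        runNext();
      });
    },
    get active() { return active; },
    get pending() { return waiting.length; },
  };
}
```

Usage: `queue.push(() => fetch(url))` — each `push` resolves with the task's result once its turn comes.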
6.4 Retry with Exponential Backoff on 429
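A sketch of the backoff math and retry loop (illustrative; function names are my own, and a real client should prefer the server's Retry-After header over the computed delay when present):

```javascript
// "Equal jitter" backoff: delay grows as baseMs * 2^attempt, capped at maxMs,
// then randomized into [exp/2, exp] so retrying clients don't stampede in sync.
function backoffDelay(attempt, baseMs = 500, maxMs = 30_000, jitter = Math.random) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return exp / 2 + jitter() * (exp / 2);
}

// Retry an async call while it returns 429 (assumes it resolves to { status }).
async function retryOn429(
  doRequest,
  maxRetries = 3,
  sleep = ms => new Promise(r => setTimeout(r, ms))
) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(backoffDelay(attempt));
  }
}
```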
7. Infrastructure
7.1 Nginx Rate Limiting
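A typical setup with the `limit_req` module (zone name, rate, and upstream are illustrative):

```nginx
# 10 MB shared zone keyed by client IP, allowing 10 requests/second.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # Queue bursts of up to 20 extra requests; nodelay serves them
        # immediately instead of spacing them out.
        limit_req zone=api burst=20 nodelay;
        limit_req_status 429;   # default is 503
        proxy_pass http://backend;
    }
}
```

Nginx's `limit_req` implements the leaky bucket algorithm; `burst` is the bucket depth.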
7.2 AWS API Gateway
7.3 Cloudflare Workers
7.4 Redis Cluster for Distributed Rate Limiting
8. Use Cases
8.1 Login Brute-Force Protection
8.2 Webhook Delivery Throttling
8.3 Real-time Typing Indicator
9. Performance Comparison
| Algorithm | Memory | Latency | Throughput | Accuracy | Distributed | Recommended For |
|---|---|---|---|---|---|---|
| Fixed Window | O(1) - 8 bytes | < 1ms | 100k+ req/s | 70% | Easy | High-traffic APIs |
| Sliding Counter | O(1) - 16 bytes | ~1ms | 50k+ req/s | 90% | Easy | General use (recommended) |
| Token Bucket | O(1) - 24 bytes | ~1ms | 50k+ req/s | 85% | Medium | APIs with bursts |
| Sliding Log | O(n) - ~100n bytes | ~5ms | 10k+ req/s | 100% | Hard | When precision is required |
| Leaky Bucket | O(n) - ~50n bytes | ~10ms | 5k+ req/s | 100% | Medium | Uniform processing |
10. Decision Framework
When to Use Each Algorithm
- Token Bucket: Traffic with expected peaks, upload/download APIs, when burst tolerance is needed.
- Leaky Bucket: Constant rate required, sequential processing, webhooks and queues.
- Fixed Window: Maximum performance needed, high traffic (100k+ req/s), simple implementation.
- Sliding Window Counter: Best cost-benefit, general use, good accuracy with low memory.
- Sliding Window Log: Perfect precision needed, compliance and SLA, detailed auditing.
Decision Flowchart
Need perfect accuracy?
YES → Sliding Window Log
NO ↓
Allows bursts?
YES → Token Bucket
NO ↓
Need constant rate?
YES → Leaky Bucket
NO ↓
Performance critical?
YES → Fixed Window
NO → Sliding Window Counter (RECOMMENDED)
Per-Layer Recommendations
| Layer | Technique | Tool | Use Case |
|---|---|---|---|
| Frontend | Debounce | Lodash, custom hook | Search input, autocomplete |
| Frontend | Throttle | Lodash, custom hook | Scroll, resize, typing indicator |
| Backend | Rate Limit | express-rate-limit, Redis | API endpoints, auth |
| Database | Connection Pool | pg.Pool, Redis pipeline | Query throttling |
| Infra | Rate Limit | Nginx, API Gateway, Cloudflare | DDoS protection, global limit |
Final Recommendations
- For public APIs: Use Sliding Window Counter with Redis
- For login protection: Use Fixed Window with aggressive limits + progressive delays
- For webhooks: Use Leaky Bucket with a queue
- For frontend search: Use Debounce at 300ms
- For scroll events: Use Throttle at 100ms
- For high scale: Use infrastructure-level limiting (Nginx, Cloudflare)
- For multi-tenant: Implement tiered limits (Free/Pro/Enterprise)
- For compliance: Use Sliding Window Log with auditing