
Rate Limiting: How to protect your API without losing your mind

Rate limiting algorithms (Token Bucket, Sliding Window), implementations with Redis, Express and Nginx, and how to handle 429s on the frontend.

Thiago Saraiva


Introduction

Rate limiting and throttling are essential techniques for controlling request flow in modern applications. They protect your infrastructure from overload, prevent abuse, ensure fair resource distribution among users, and maintain service quality. In this comprehensive guide, we'll explore the fundamental concepts, algorithms, practical implementations across all layers of your stack, and real-world use cases to help you make informed decisions about which approach to use in each scenario.


1. Fundamentals

Key Concepts

Rate Limiting limits the number of requests a client can make within a specific time window. For example: "100 requests per minute." If the limit is exceeded, subsequent requests are rejected (usually with HTTP 429).

Throttling controls the execution rate by enforcing a minimum interval between operations. It ensures operations don't happen faster than a defined rate, queuing or delaying requests rather than rejecting them immediately.

Debounce delays execution until a period of inactivity occurs. If new calls arrive before the delay expires, the timer resets. Common in search inputs where you wait for the user to stop typing.

Why You Need Them

  • Overload Protection: Prevents server collapse under excessive load
  • Security: Mitigates brute-force attacks, DDoS, credential stuffing
  • Fair Usage: Ensures one client doesn't monopolize resources
  • Cost Control: Limits usage of paid external APIs
  • Better UX: Prevents unnecessary duplicate requests

Comparison Table

| Technique | When to Use | Example | Layer |
|---|---|---|---|
| Rate Limiting | Limit total requests per period | 1000 req/hour per API key | Backend/API Gateway |
| Throttling | Control execution rate | Process max 10 webhooks/sec | Backend/Queue |
| Debounce | Wait for user to finish action | Search after 300ms without typing | Frontend |

2. Rate Limiting Algorithms

2.1 Token Bucket

A bucket holds tokens, each representing permission for one request. Tokens are added at a constant rate. Each request consumes one token. If the bucket is empty, requests are rejected.

Pros: Allows controlled bursts, simple, low memory. Cons: Doesn't guarantee a uniform request distribution, and a full bucket permits a burst right at the start.
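The mechanics above fit in a few lines. Here is an illustrative in-memory sketch (the class name and API are mine, not from any library); the clock is injectable so the behavior is deterministic and easy to test.

```typescript
// Token Bucket: refills at `ratePerSec`, capped at `capacity`.
// Each request consumes one token; an empty bucket means rejection.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private ratePerSec: number,
    private now: () => number = () => Date.now() / 1000,
  ) {
    this.tokens = capacity; // starts full, which is why an initial burst is possible
    this.lastRefill = this.now();
  }

  tryConsume(): boolean {
    const t = this.now();
    // Refill proportionally to elapsed time, never above capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (t - this.lastRefill) * this.ratePerSec,
    );
    this.lastRefill = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Refilling lazily on each call (instead of with a background timer) is what keeps the memory footprint at O(1) per client.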

2.2 Leaky Bucket

Requests enter a queue (bucket). The queue is processed at a constant rate (leak). If the queue is full, new requests are rejected.

Pros: Constant processing rate, smooth traffic, predictable. Cons: Can have high latency, uses more memory (queue), no bursts allowed.
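A minimal sketch of the queue-and-leak behavior, again with an injectable clock (names are illustrative). Leaked items are drained lazily whenever the bucket is touched.

```typescript
// Leaky Bucket: requests queue up and "leak" (get processed) at a fixed rate.
class LeakyBucket<T> {
  private queue: T[] = [];
  private lastLeak: number;

  constructor(
    private capacity: number,
    private leakPerSec: number,
    private now: () => number = () => Date.now() / 1000,
  ) {
    this.lastLeak = this.now();
  }

  // Remove the items that should have leaked since the last check.
  private leak(): void {
    const t = this.now();
    const n = Math.floor((t - this.lastLeak) * this.leakPerSec);
    if (n > 0) {
      this.lastLeak = t;
      this.queue.splice(0, n);
    }
  }

  // Returns false (reject) when the queue is full.
  offer(item: T): boolean {
    this.leak();
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(item);
    return true;
  }

  get pending(): number {
    this.leak();
    return this.queue.length;
  }
}
```

In a real system the leaked items would be handed to a worker rather than discarded; the O(n) memory cost mentioned above is the queue itself.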

2.3 Fixed Window Counter

Counts requests in fixed time windows. When the window ends, the counter resets.

Pros: Very simple, low memory, excellent performance. Cons: Boundary problem (up to 2x the limit can pass around a window edge), abrupt reset.
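The whole algorithm is a counter plus a window index, as this sketch shows (illustrative names, injectable clock):

```typescript
// Fixed Window Counter: one counter per window; resets when a new window starts.
class FixedWindow {
  private window = -1;
  private count = 0;

  constructor(
    private limit: number,
    private windowSec: number,
    private now: () => number = () => Date.now() / 1000,
  ) {}

  allow(): boolean {
    const win = Math.floor(this.now() / this.windowSec);
    if (win !== this.window) {
      this.window = win; // new window: abrupt reset
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

The boundary problem follows directly from the reset: a client that exhausts the limit at the end of one window can exhaust it again immediately at the start of the next.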

2.4 Sliding Window Log

Stores the timestamp of each request. Counts requests within the window by filtering timestamps.

Pros: Very accurate, no boundary problem, uniform distribution. Cons: High memory usage, O(n) complexity, hard to scale.
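A direct in-memory sketch of the log approach (illustrative names): every accepted request's timestamp is kept, and stale entries are evicted on each call, which is where the O(n) cost comes from.

```typescript
// Sliding Window Log: keep every request timestamp, count those inside the window.
class SlidingWindowLog {
  private log: number[] = [];

  constructor(
    private limit: number,
    private windowSec: number,
    private now: () => number = () => Date.now() / 1000,
  ) {}

  allow(): boolean {
    const t = this.now();
    // Evict timestamps that fell out of the window (O(n) in the worst case).
    while (this.log.length > 0 && this.log[0] <= t - this.windowSec) {
      this.log.shift();
    }
    if (this.log.length >= this.limit) return false;
    this.log.push(t);
    return true;
  }
}
```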

2.5 Sliding Window Counter

Combines fixed window counters with weighted calculation. Approximates the sliding window with minimal memory.

Pros: Good accuracy, moderate memory, solves boundary problem, best cost-benefit. Cons: Approximation (not exact), slightly more complex.
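The weighted calculation works like this: keep counters for the current and previous fixed windows, and estimate the sliding count as `previous * (fraction of the previous window still covered) + current`. A hedged sketch (illustrative names, injectable clock):

```typescript
// Sliding Window Counter: approximate a sliding window with two fixed counters.
class SlidingWindowCounter {
  private currWindow = -1;
  private currCount = 0;
  private prevCount = 0;

  constructor(
    private limit: number,
    private windowSec: number,
    private now: () => number = () => Date.now() / 1000,
  ) {}

  allow(): boolean {
    const t = this.now();
    const win = Math.floor(t / this.windowSec);
    if (win !== this.currWindow) {
      // Shift windows; anything older than one full window counts as zero.
      this.prevCount = win === this.currWindow + 1 ? this.currCount : 0;
      this.currWindow = win;
      this.currCount = 0;
    }
    // Fraction of the previous window still inside the sliding window.
    const elapsed = (t % this.windowSec) / this.windowSec;
    const estimate = this.prevCount * (1 - elapsed) + this.currCount;
    if (estimate >= this.limit) return false;
    this.currCount++;
    return true;
  }
}
```

The estimate assumes the previous window's requests were evenly spread, which is why the result is an approximation rather than an exact count.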

Algorithm Comparison

| Algorithm | Memory | Accuracy | Complexity | Allows Burst | Distributed |
|---|---|---|---|---|---|
| Token Bucket | O(1) | Good | O(1) | Yes | Hard |
| Leaky Bucket | O(n) | Excellent | O(1) | No | Medium |
| Fixed Window | O(1) | Poor (boundary) | O(1) | Yes | Easy |
| Sliding Log | O(n) | Excellent | O(n) | No | Hard |
| Sliding Counter | O(1) | Very Good | O(1) | Moderate | Easy |

3. Database Layer

3.1 Redis: INCR + EXPIRE (Fixed Window)
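The pattern named in this heading is: INCR a per-client key scoped to the current window, and set an EXPIRE on the first increment so stale keys clean themselves up. Below is a hedged TypeScript sketch; `FakeRedis` is an in-memory stand-in I wrote for just the two commands involved, so the example runs without a server (a real client such as ioredis exposes `incr`/`expire` with the same semantics, returned as promises). Note that issuing INCR and EXPIRE as two separate commands is not atomic; that is the gap the Lua-script approach in 3.3 closes.

```typescript
// Minimal in-memory stand-in for the two Redis commands used here.
class FakeRedis {
  private store = new Map<string, { value: number; expiresAt: number }>();
  constructor(private now: () => number = () => Date.now() / 1000) {}

  incr(key: string): number {
    const e = this.store.get(key);
    if (!e || e.expiresAt <= this.now()) {
      // Missing or expired key: INCR creates it at 1 with no TTL.
      this.store.set(key, { value: 1, expiresAt: Infinity });
      return 1;
    }
    return ++e.value;
  }

  expire(key: string, seconds: number): void {
    const e = this.store.get(key);
    if (e) e.expiresAt = this.now() + seconds;
  }
}

// Fixed-window limiter: one key per client per window, e.g. "rl:u1:28333".
function allowRequest(
  redis: FakeRedis,
  clientId: string,
  limit: number,
  windowSec: number,
  nowSec: number,
): boolean {
  const key = `rl:${clientId}:${Math.floor(nowSec / windowSec)}`;
  const count = redis.incr(key);
  if (count === 1) redis.expire(key, windowSec); // first hit sets the TTL
  return count <= limit;
}
```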

3.2 Redis: Sorted Sets (Sliding Window)

3.3 Redis: Lua Scripts (Atomic Operations)

3.4 PostgreSQL Advisory Locks

3.5 Connection Pooling


4. Backend Layer

4.1 Express Middleware (express-rate-limit)
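With the real library this is a one-liner: `app.use(rateLimit({ windowMs: 60_000, max: 100 }))`. To show what such a middleware does under the hood, here is a framework-agnostic sketch; the `Req`/`Res` types are simplified stand-ins for Express's objects, and the `windowMs`/`max` option names mirror express-rate-limit's classic options.

```typescript
// Simplified stand-ins for Express's req/res/next, just enough for the sketch.
type Req = { ip: string };
type Res = { statusCode: number; headers: Record<string, string>; body?: string };
type Next = () => void;

// A minimal fixed-window middleware factory.
function rateLimitMiddleware(opts: { windowMs: number; max: number; now?: () => number }) {
  const now = opts.now ?? Date.now;
  const hits = new Map<string, { count: number; resetAt: number }>();

  return (req: Req, res: Res, next: Next): void => {
    const t = now();
    let entry = hits.get(req.ip);
    if (!entry || entry.resetAt <= t) {
      entry = { count: 0, resetAt: t + opts.windowMs };
      hits.set(req.ip, entry);
    }
    entry.count++;
    res.headers["X-RateLimit-Limit"] = String(opts.max);
    res.headers["X-RateLimit-Remaining"] = String(Math.max(0, opts.max - entry.count));
    if (entry.count > opts.max) {
      res.statusCode = 429;
      res.headers["Retry-After"] = String(Math.ceil((entry.resetAt - t) / 1000));
      res.body = "Too Many Requests";
      return; // short-circuit: the handler never runs
    }
    next();
  };
}
```

An in-process `Map` like this only works on a single server; once you run multiple instances you need the shared Redis store covered in 4.2 and 4.4.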

4.2 Custom Redis Middleware (Sliding Window)

4.3 Python FastAPI (slowapi)

4.4 Distributed Rate Limiting

4.5 Tiered Limits (Free/Pro/Enterprise)
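The core of tiered limiting is resolving a client's plan to its quota before applying any of the algorithms above. A minimal sketch; the tier names match the heading, but the specific numbers are invented for illustration.

```typescript
// Per-plan quotas; an unknown or missing plan falls back to the free tier.
const TIER_LIMITS: Record<string, { requestsPerHour: number; burst: number }> = {
  free:       { requestsPerHour: 1_000,   burst: 10 },
  pro:        { requestsPerHour: 50_000,  burst: 100 },
  enterprise: { requestsPerHour: 500_000, burst: 1_000 },
};

function limitsFor(plan: string) {
  return TIER_LIMITS[plan] ?? TIER_LIMITS.free;
}
```

The returned quota would then be fed into the limiter as its `limit` and `burst` parameters, keyed by API key rather than by IP.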


5. API Design

Standard Headers

Every API with rate limiting should include these headers:

  • X-RateLimit-Limit: Maximum requests allowed in the window
  • X-RateLimit-Remaining: Requests remaining in the current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
  • Retry-After: Seconds until the client can retry (only on 429)

Complete 429 Response
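Putting the headers and body together, a well-formed 429 might look like this (the JSON field names are illustrative, not a standard; the header values are examples):

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000000
Retry-After: 30

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the limit of 100 requests per minute.",
  "retryAfterSeconds": 30
}
```

Duplicating the retry information in the body helps clients that never inspect headers, and gives you room for extras like a documentation link or the client's current tier.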

Rate Limit Info in Response Body


6. Frontend Layer

6.1 Manual Throttle and Debounce

6.2 React Hooks

6.3 Request Queue with Concurrency Limit

6.4 Retry with Exponential Backoff on 429
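The two pieces of this pattern are a delay schedule and a retry loop. The sketch below uses the "equal jitter" variant (delay between 50% and 100% of the exponential value) and honors a server-sent Retry-After when present; `doRequest` is a stand-in for a `fetch` call, and all names are illustrative.

```typescript
// Delay before retry `attempt` (0-based): base * 2^attempt, capped, plus jitter.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  jitter: () => number = Math.random,
): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return exp / 2 + (exp / 2) * jitter(); // between 50% and 100% of exp
}

// Retry a request on 429, preferring the server's Retry-After when it sends one.
async function withRetryOn429(
  doRequest: () => Promise<{ status: number; retryAfterSec?: number }>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<{ status: number }> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    const delayMs =
      res.retryAfterSec !== undefined ? res.retryAfterSec * 1000 : backoffDelayMs(attempt);
    await sleep(delayMs);
  }
}
```

Jitter matters here: if every client retried after exactly the same delay, they would all hit the server again in lockstep (the "thundering herd" problem).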


7. Infrastructure

7.1 Nginx Rate Limiting
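Nginx's `limit_req` module implements a leaky-bucket style limiter at the edge, before requests ever reach your application. A typical configuration looks like this (the zone name, rate, and upstream are examples):

```nginx
# Shared 10 MB zone keyed by client IP, allowing a sustained 10 requests/second.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    location /api/ {
        # Queue short bursts of up to 20 requests; reject beyond that.
        limit_req zone=api_limit burst=20 nodelay;
        # Default rejection status is 503; 429 is more accurate for rate limits.
        limit_req_status 429;
        proxy_pass http://backend;
    }
}
```

With `nodelay`, burst requests are served immediately instead of being smoothed out; drop it if you want strict pacing at the configured rate.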

7.2 AWS API Gateway

7.3 Cloudflare Workers

7.4 Redis Cluster for Distributed Rate Limiting


8. Use Cases

8.1 Login Brute-Force Protection
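For login endpoints, the usual recipe is aggressive limits keyed by account plus IP, with a lockout that grows after repeated failures. A hedged in-memory sketch (class and method names are mine; production versions store this state in Redis so it survives restarts and is shared across instances):

```typescript
// Track failed logins per account+IP; lock out after `maxFails`,
// doubling the lockout with each additional failure.
class LoginGuard {
  private fails = new Map<string, { count: number; lockedUntil: number }>();

  constructor(
    private maxFails = 5,
    private baseLockSec = 60,
    private now: () => number = () => Date.now() / 1000,
  ) {}

  private key(user: string, ip: string) {
    return `${user}@${ip}`;
  }

  canAttempt(user: string, ip: string): boolean {
    const e = this.fails.get(this.key(user, ip));
    return !e || e.lockedUntil <= this.now();
  }

  recordFailure(user: string, ip: string): void {
    const k = this.key(user, ip);
    const e = this.fails.get(k) ?? { count: 0, lockedUntil: 0 };
    e.count++;
    if (e.count >= this.maxFails) {
      // Progressive delay: 60s, then 120s, 240s, ... per extra failure.
      const over = e.count - this.maxFails;
      e.lockedUntil = this.now() + this.baseLockSec * 2 ** over;
    }
    this.fails.set(k, e);
  }

  recordSuccess(user: string, ip: string): void {
    this.fails.delete(this.key(user, ip)); // a good login clears the slate
  }
}
```

Keying by account+IP (rather than IP alone) stops an attacker from locking a victim out of their own account from a different address, while still slowing credential stuffing.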

8.2 Webhook Delivery Throttling

8.3 Real-time Typing Indicator


9. Performance Comparison

| Algorithm | Memory | Latency | Throughput | Accuracy | Distributed | Recommended For |
|---|---|---|---|---|---|---|
| Fixed Window | O(1) - 8 bytes | < 1ms | 100k+ req/s | 70% | Easy | High-traffic APIs |
| Sliding Counter | O(1) - 16 bytes | ~1ms | 50k+ req/s | 90% | Easy | General use (recommended) |
| Token Bucket | O(1) - 24 bytes | ~1ms | 50k+ req/s | 85% | Medium | APIs with bursts |
| Sliding Log | O(n) - ~100n bytes | ~5ms | 10k+ req/s | 100% | Hard | When precision is required |
| Leaky Bucket | O(n) - ~50n bytes | ~10ms | 5k+ req/s | 100% | Medium | Uniform processing |

10. Decision Framework

When to Use Each Algorithm

  • Token Bucket: traffic with expected peaks, upload/download APIs, when burst tolerance is needed.
  • Leaky Bucket: constant rate required, sequential processing, webhooks and queues.
  • Fixed Window: maximum performance needed, high traffic (100k+ req/s), simple implementation.
  • Sliding Window Counter: best cost-benefit, general use, good accuracy with low memory.
  • Sliding Window Log: perfect precision needed, compliance and SLA, detailed auditing.

Decision Flowchart

Need perfect accuracy?
 YES → Sliding Window Log
 NO  ↓

Allows bursts?
 YES → Token Bucket
 NO  ↓

Need constant rate?
 YES → Leaky Bucket
 NO  ↓

Performance critical?
 YES → Fixed Window
 NO  → Sliding Window Counter (RECOMMENDED)

Per-Layer Recommendations

| Layer | Technique | Tool | Use Case |
|---|---|---|---|
| Frontend | Debounce | Lodash, custom hook | Search input, autocomplete |
| Frontend | Throttle | Lodash, custom hook | Scroll, resize, typing indicator |
| Backend | Rate Limit | express-rate-limit, Redis | API endpoints, auth |
| Database | Connection Pool | pg.Pool, Redis pipeline | Query throttling |
| Infra | Rate Limit | Nginx, API Gateway, Cloudflare | DDoS protection, global limit |

Final Recommendations

  1. For public APIs: Use Sliding Window Counter with Redis
  2. For login protection: Use Fixed Window with aggressive limits + progressive delays
  3. For webhooks: Use Leaky Bucket with a queue
  4. For frontend search: Use Debounce at 300ms
  5. For scroll events: Use Throttle at 100ms
  6. For high scale: Use infrastructure-level limiting (Nginx, Cloudflare)
  7. For multi-tenant: Implement tiered limits (Free/Pro/Enterprise)
  8. For compliance: Use Sliding Window Log with auditing