Why This Matters
Without rate limiting, a single misbehaving client can overwhelm your API with thousands of requests per second, degrading performance for everyone. A bot can scrape your entire database, a buggy script can hammer your endpoints, or a DDoS attack can take you offline. Rate limiting controls how many requests a client can make in a given time window, protecting your service from abuse and ensuring fair access for all users.
Every major API -- GitHub, Stripe, Twitter, Google -- enforces rate limits. Understanding the algorithms behind rate limiting lets you build systems that are both fair and resilient under load.
Visual Model
Token bucket rate limiting: each request consumes a token; tokens refill over time.
Code Example
// Token Bucket rate limiter
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens
    this.tokens = capacity;       // Start full
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  tryConsume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // Request allowed
    }
    return false; // Rate limited
  }
}

// Sliding Window rate limiter
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = []; // Timestamps of recent requests
  }

  tryConsume() {
    const now = Date.now();
    // Remove requests outside the window
    this.requests = this.requests.filter(
      t => now - t < this.windowMs
    );
    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}

// Demo: Token Bucket
const bucket = new TokenBucket(5, 2); // 5 max, 2 per second
for (let i = 0; i < 8; i++) {
  const allowed = bucket.tryConsume();
  console.log(`Request ${i + 1}: ${allowed ? "ALLOWED" : "REJECTED"}`);
}

// Demo: Sliding Window
const limiter = new SlidingWindowLimiter(3, 1000); // 3 per second
for (let i = 0; i < 5; i++) {
  console.log(`Sliding ${i + 1}: ${limiter.tryConsume() ? "OK" : "LIMITED"}`);
}

Interactive Experiment
Try these exercises to explore rate limiting algorithms:
- Compare the token bucket and sliding window algorithms. Send 5 requests instantly, wait 2 seconds, then send 5 more. How does each algorithm behave?
- Implement a fixed window limiter: count requests in fixed 1-second windows (e.g., 0:00-0:01, 0:01-0:02). What happens at the boundary between two windows?
- Add per-user rate limiting: each user ID gets their own bucket. How would you implement this with Redis in a distributed system?
- Add a `Retry-After` header calculation that tells the client exactly how many seconds to wait.
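As a starting point for the `Retry-After` exercise, here is one possible calculation for the token bucket: divide the token deficit by the refill rate. The `retryAfterSeconds` helper below is illustrative, not part of any standard API; it assumes the same `tokens` and `refillRate` fields as the `TokenBucket` class above.

```javascript
// Sketch: how many whole seconds until a rate-limited client can retry.
// Retry-After is an integer number of seconds, so we round up.
function retryAfterSeconds(bucket) {
  if (bucket.tokens >= 1) return 0; // A token is available right now
  // Token deficit divided by refill rate (tokens per second)
  return Math.ceil((1 - bucket.tokens) / bucket.refillRate);
}

// Example: an empty bucket refilling at 2 tokens/second
console.log(retryAfterSeconds({ tokens: 0, refillRate: 2 }));   // 1
console.log(retryAfterSeconds({ tokens: 0, refillRate: 0.1 })); // 10
```

A server would send this value in the `Retry-After` response header alongside the 429 status, so well-behaved clients can back off for exactly the right duration instead of guessing.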
Coding Challenge
Write a class called `FixedWindowLimiter` with a method `tryRequest(timestamp)` that takes a Unix timestamp in seconds. Allow up to `maxRequests` in each `windowSize`-second window. Return true if the request is allowed, false if rate limited.
Real-World Usage
Rate limiting protects every major API and service:
- GitHub API: 5,000 requests per hour for authenticated users, 60 for unauthenticated. Exceeding the limit returns 429 with rate limit headers.
- Stripe API: Uses a token bucket algorithm. Rate limits vary by endpoint and account type. Critical endpoints have lower limits.
- Redis-based rate limiting: Redis INCR with TTL is the most common distributed rate limiter. Libraries like `redis-cell` implement token bucket natively.
- Cloudflare: Provides edge-based rate limiting that blocks abusive traffic before it reaches your servers.
- API Gateway (AWS, Kong): API gateways provide configurable rate limiting per API key, IP, or route as a built-in feature.
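The Redis INCR-with-TTL pattern mentioned above can be sketched as follows. The `fakeRedis` object is a hypothetical in-memory stand-in with the same semantics as the two Redis commands used, so the logic runs without a Redis server; in production you would issue the same `INCR`/`EXPIRE` calls through a real client such as `ioredis`.

```javascript
// In-memory stand-in for the two Redis commands this pattern needs.
const fakeRedis = {
  store: new Map(), // key -> { count, expiresAt }
  incr(key, nowMs) {
    let entry = this.store.get(key);
    if (!entry || (entry.expiresAt !== null && nowMs >= entry.expiresAt)) {
      entry = { count: 0, expiresAt: null }; // Key missing or TTL elapsed
      this.store.set(key, entry);
    }
    return ++entry.count;
  },
  expire(key, seconds, nowMs) {
    const entry = this.store.get(key);
    if (entry) entry.expiresAt = nowMs + seconds * 1000;
  },
};

// Fixed-window limiter: allow `limit` requests per user per window.
function allowRequest(userId, limit, windowSeconds, nowMs = Date.now()) {
  const key = `rate:${userId}`;
  const count = fakeRedis.incr(key, nowMs);
  if (count === 1) {
    // First request in this window: start the window's TTL
    fakeRedis.expire(key, windowSeconds, nowMs);
  }
  return count <= limit;
}

// Example: 3 requests per 10 seconds for user "alice"
console.log(allowRequest("alice", 3, 10, 0));     // true
console.log(allowRequest("alice", 3, 10, 1000));  // true
console.log(allowRequest("alice", 3, 10, 2000));  // true
console.log(allowRequest("alice", 3, 10, 3000));  // false (limit hit)
console.log(allowRequest("alice", 3, 10, 11000)); // true (window expired)
```

One caveat: against a real Redis, `INCR` and `EXPIRE` as two separate calls are not atomic; a crash between them leaves a counter with no TTL. Production implementations typically wrap the pair in a Lua script or use a single atomic command to close that gap.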