Rate Limiting

Controlling request volume to protect services from overload

Why This Matters

Without rate limiting, a single misbehaving client can overwhelm your API with thousands of requests per second, degrading performance for everyone. A bot can scrape your entire database, a buggy script can hammer your endpoints, or a DDoS attack can take you offline. Rate limiting controls how many requests a client can make in a given time window, protecting your service from abuse and ensuring fair access for all users.

Most major public APIs, including those from GitHub, Stripe, Twitter, and Google, enforce rate limits. Understanding the algorithms behind rate limiting lets you build systems that are both fair and resilient under load.

Visual Model

[Diagram: Requests 1, 2, and 3 each consume a token from a token bucket that holds up to 5 tokens and refills at a steady 2 tokens per second. If the bucket has tokens, the request is allowed (200 OK); if it is empty, the request is rejected (429 Too Many Requests).]

Token bucket rate limiting: each request consumes a token; tokens refill over time.

Code Example

// Token Bucket rate limiter

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;      // Max tokens
    this.tokens = capacity;        // Start full
    this.refillRate = refillRate;   // Tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  tryConsume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // Request allowed
    }
    return false;   // Rate limited
  }
}

// Sliding Window rate limiter
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];  // Timestamps of recent requests
  }

  tryConsume() {
    const now = Date.now();
    // Remove requests outside the window
    this.requests = this.requests.filter(
      t => now - t < this.windowMs
    );

    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}

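// Demo: Token Bucket
// The loop finishes in far less than a second, so almost nothing refills:
// requests 1-5 drain the full bucket and requests 6-8 are rejected.
const bucket = new TokenBucket(5, 2); // 5 max, 2 per second
for (let i = 0; i < 8; i++) {
  const allowed = bucket.tryConsume();
  console.log(`Request ${i + 1}: ${allowed ? "ALLOWED" : "REJECTED"}`);
}

// Demo: Sliding Window
// Requests 1-3 fit inside the 1-second window; requests 4 and 5 are limited.
const limiter = new SlidingWindowLimiter(3, 1000); // 3 per second
for (let i = 0; i < 5; i++) {
  console.log(`Sliding ${i + 1}: ${limiter.tryConsume() ? "OK" : "LIMITED"}`);
}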

Interactive Experiment

Try these exercises to explore rate limiting algorithms:

  • Compare the token bucket and sliding window algorithms. Send 5 requests instantly, wait 2 seconds, then send 5 more. How does each algorithm behave?
  • Implement a fixed window limiter: count requests in fixed 1-second windows (e.g., 0:00-0:01, 0:01-0:02). What happens at the boundary between two windows?
  • Add per-user rate limiting: each user ID gets their own bucket. How would you implement this with Redis in a distributed system?
  • Add a Retry-After header calculation that tells the client exactly how many seconds to wait.
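
As a starting point for the last two exercises, here is a minimal in-memory sketch that keeps one token bucket per client and computes a Retry-After value in whole seconds. The names `PerClientLimiter`, `check`, and `retryAfterSec` are illustrative assumptions, not part of the lesson's code, and the `nowMs` parameter exists only so the behavior can be checked with fixed timestamps.

```javascript
// Per-client token buckets with a Retry-After hint (single-process sketch).
// PerClientLimiter, check, and retryAfterSec are hypothetical names.
class PerClientLimiter {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens per client
    this.refillRate = refillRate; // Tokens added per second
    this.buckets = new Map();     // clientId -> { tokens, lastRefill }
  }

  // nowMs can be injected for deterministic testing; defaults to the clock.
  check(clientId, nowMs = Date.now()) {
    let b = this.buckets.get(clientId);
    if (!b) {
      // First request from this client: start with a full bucket.
      b = { tokens: this.capacity, lastRefill: nowMs };
      this.buckets.set(clientId, b);
    }

    // Lazy refill based on the time since this client's last request.
    const elapsedSec = (nowMs - b.lastRefill) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillRate);
    b.lastRefill = nowMs;

    if (b.tokens >= 1) {
      b.tokens -= 1;
      return { allowed: true, retryAfterSec: 0 };
    }

    // Seconds until one whole token is available, rounded up because
    // the Retry-After delay is expressed in whole seconds.
    const retryAfterSec = Math.ceil((1 - b.tokens) / this.refillRate);
    return { allowed: false, retryAfterSec };
  }
}

const perClient = new PerClientLimiter(2, 0.5); // Burst of 2, one token every 2 s
console.log(perClient.check("alice", 0)); // allowed
console.log(perClient.check("alice", 0)); // allowed
console.log(perClient.check("alice", 0)); // rejected, retry after 2 seconds
console.log(perClient.check("bob", 0));   // allowed: bob has his own bucket
```

This version only works inside one process and never evicts idle clients from the `Map`. A distributed variant would need to keep the bucket state in a shared store such as Redis and update it atomically.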

Coding Challenge

Fixed Window Rate Limiter

Write a class called `FixedWindowLimiter` with a method `tryRequest(timestamp)` that takes a Unix timestamp in seconds. Allow up to `maxRequests` in each `windowSize`-second window. Return true if the request is allowed, false if rate limited.
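
One possible approach, shown as a sketch rather than a reference solution: number the windows by dividing the timestamp by the window size, and reset the counter whenever the window number changes. The constructor signature `(maxRequests, windowSize)` and the choice to align windows to multiples of `windowSize` are assumptions, since the challenge text does not specify them.

```javascript
// Fixed window limiter sketch. Assumes windows are aligned to multiples of
// windowSize (e.g. with windowSize = 10: 0-9, 10-19, ...) and that
// timestamps arrive in non-decreasing order.
class FixedWindowLimiter {
  constructor(maxRequests, windowSize) {
    this.maxRequests = maxRequests; // Allowed requests per window
    this.windowSize = windowSize;   // Window length in seconds
    this.currentWindow = null;      // Index of the window being counted
    this.count = 0;                 // Requests seen in that window
  }

  tryRequest(timestamp) {
    const windowIndex = Math.floor(timestamp / this.windowSize);
    if (windowIndex !== this.currentWindow) {
      // A new window has started: forget the old count.
      this.currentWindow = windowIndex;
      this.count = 0;
    }
    if (this.count < this.maxRequests) {
      this.count += 1;
      return true;  // Request allowed
    }
    return false;   // Rate limited
  }
}

// Boundary behavior: 3 requests at t=9 and 3 more at t=10 are all allowed,
// so 6 requests pass within about one second despite a limit of 3 per 10 s.
const fixed = new FixedWindowLimiter(3, 10);
for (const ts of [9, 9, 9, 9, 10, 10, 10, 10]) {
  console.log(`t=${ts}: ${fixed.tryRequest(ts) ? "ALLOWED" : "REJECTED"}`);
}
```

The burst at the window boundary is the main weakness of fixed windows, and it is what the sliding window and token bucket approaches are designed to smooth out.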

Real-World Usage

Rate limiting is built into many widely used APIs and infrastructure services:

  • GitHub API: 5,000 requests per hour for authenticated users, 60 per hour for unauthenticated requests. Exceeding the limit returns a 403 or 429 response, and the rate limit headers report the limit, the remaining quota, and the reset time.
  • Stripe API: Uses a token bucket algorithm. Rate limits vary by endpoint and account type. Critical endpoints have lower limits.
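  • Redis-based rate limiting: Incrementing a counter with INCR and giving the key a TTL is a common way to build a distributed fixed-window limiter. The redis-cell module goes further and exposes a single atomic command that implements GCRA, a leaky-bucket-style algorithm closely related to token bucket.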
  • Cloudflare: Provides edge-based rate limiting that blocks abusive traffic before it reaches your servers.
  • API Gateway (AWS, Kong): API gateways provide configurable rate limiting per API key, IP, or route as a built-in feature.
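
The Redis counter pattern from the list above can be sketched as follows. To keep the example runnable without a server, it uses `FakeCounterStore`, a hypothetical in-memory stand-in that models only the semantics of Redis INCR and EXPIRE; it is not a real Redis client, and the `allowRequest` helper and `ratelimit:` key prefix are likewise assumptions made for illustration.

```javascript
// In-memory stand-in that mimics Redis INCR and EXPIRE semantics.
// FakeCounterStore is hypothetical; a real deployment would call Redis.
class FakeCounterStore {
  constructor() {
    this.entries = new Map(); // key -> { value, expiresAtMs }
  }

  // Like INCR: a missing or expired key is treated as 0, then incremented.
  incr(key, nowMs) {
    const e = this.entries.get(key);
    if (!e || (e.expiresAtMs !== null && nowMs >= e.expiresAtMs)) {
      this.entries.set(key, { value: 1, expiresAtMs: null });
      return 1;
    }
    e.value += 1;
    return e.value;
  }

  // Like EXPIRE: set a time-to-live in seconds on an existing key.
  expire(key, ttlSec, nowMs) {
    const e = this.entries.get(key);
    if (e) e.expiresAtMs = nowMs + ttlSec * 1000;
  }
}

// Allow up to `limit` requests per client in each `windowSec` window.
function allowRequest(store, clientId, limit, windowSec, nowMs) {
  const key = `ratelimit:${clientId}`;
  const count = store.incr(key, nowMs);
  if (count === 1) {
    // First request in this window: start the countdown.
    store.expire(key, windowSec, nowMs);
  }
  return count <= limit;
}

const store = new FakeCounterStore();
for (const t of [0, 100, 200, 300, 60000]) {
  const ok = allowRequest(store, "client-42", 3, 60, t);
  console.log(`t=${t}ms: ${ok ? "ALLOWED" : "REJECTED"}`);
}
```

Against a real Redis server, the increment and the expiry are two separate commands. If the process fails between them, the key may never expire, so implementations usually wrap both steps in a transaction or a server-side script.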