Why This Matters
Your application works great with 100 users. Then it hits 10,000 users and slows to a crawl. Then 100,000 users arrive and the server crashes. How do you handle more load? You have two options: vertical scaling (get a bigger machine) or horizontal scaling (add more machines). This decision shapes your architecture, your costs, and your ability to grow.
Vertical scaling is straightforward: upgrade the CPU, RAM, or storage. But every machine has a ceiling, and a single machine is a single point of failure. Horizontal scaling adds more servers behind a load balancer, but now you must handle shared state, session management, and data consistency across machines. Most modern systems use a combination of both strategies. Understanding when and how to apply each approach is fundamental to building systems that grow with your users.
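The shared-state problem is easy to see in code. The minimal sketch below assumes each server keeps login sessions in an in-memory `Map`; the `AppServer` class and its `login`/`whoAmI` methods are hypothetical illustrations, not a real framework API. A plain shared `Map` stands in for an external session store such as Redis.

```javascript
// Hypothetical app server that keeps sessions either locally or in a shared store.
class AppServer {
  constructor(name, sessionStore = new Map()) {
    this.name = name;
    // With no shared store passed in, each server gets its own private Map.
    this.sessions = sessionStore;
  }
  login(sessionId, user) {
    this.sessions.set(sessionId, user);
  }
  whoAmI(sessionId) {
    return this.sessions.get(sessionId) ?? null;
  }
}

// Local sessions: the login lands on server-1, the next request lands on server-2.
const server1Local = new AppServer("server-1");
const server2Local = new AppServer("server-2");
server1Local.login("abc123", "alice");
console.log(server2Local.whoAmI("abc123")); // prints null: server-2 never saw the login

// Shared store: both servers read and write the same Map (stand-in for an external store).
const sharedStore = new Map();
const server1Shared = new AppServer("server-1", sharedStore);
const server2Shared = new AppServer("server-2", sharedStore);
server1Shared.login("abc123", "alice");
console.log(server2Shared.whoAmI("abc123")); // prints alice
```

Another common option is sticky sessions, where the load balancer pins each user to one server. That avoids the shared store, but load can become uneven and a user's session is lost if their server goes down.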
Visual Model
Two paths to handle growth: upgrade a single server (vertical) or add more servers (horizontal).
Code Example
```javascript
// Simulating a load balancer with round-robin
class LoadBalancer {
  constructor(servers) {
    this.servers = servers;
    this.current = 0; // index of the server that receives the next request
  }

  // Return the next server in rotation, wrapping back to the first after the last.
  getNextServer() {
    const server = this.servers[this.current];
    this.current = (this.current + 1) % this.servers.length;
    return server;
  }

  handleRequest(request) {
    const server = this.getNextServer();
    console.log(`Routing to ${server}: ${request}`);
    return server;
  }
}

// Horizontal scaling: 3 servers behind a load balancer
const lb = new LoadBalancer([
  "server-1:3000",
  "server-2:3000",
  "server-3:3000",
]);

// Simulate 6 requests, distributed evenly
for (let i = 1; i <= 6; i++) {
  lb.handleRequest(`Request ${i}`);
}
// Output:
// Routing to server-1:3000: Request 1
// Routing to server-2:3000: Request 2
// Routing to server-3:3000: Request 3
// Routing to server-1:3000: Request 4
// ...
```

Interactive Experiment
Try these exercises:
- Modify the load balancer to use "least connections" instead of round-robin. Track how many active requests each server has.
- Simulate a server going down: remove a server from the list. How does the load balancer adapt?
- Think about session state: if a user logs in on server-1 and the next request goes to server-2, what happens? How would you solve this?
- Calculate the cost difference: one server with 64 GB RAM vs four servers with 16 GB RAM each. Which is more cost-effective?
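For the first two exercises, here is one possible direction: a minimal least-connections sketch. The `LeastConnectionsBalancer` class and its `acquire`/`release`/`removeServer` methods are hypothetical names; the caller is assumed to call `release` when a request finishes so the active counts stay accurate.

```javascript
// Hypothetical least-connections balancer: picks the server with the fewest active requests.
class LeastConnectionsBalancer {
  constructor(servers) {
    // Active request count per server, keyed by server address (Map keeps insertion order).
    this.active = new Map(servers.map((s) => [s, 0]));
  }

  // Called when a request starts; returns the least busy server (ties go to the earliest).
  acquire() {
    if (this.active.size === 0) throw new Error("No servers available");
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Called when a request on `server` finishes.
  release(server) {
    const count = this.active.get(server);
    if (count !== undefined && count > 0) this.active.set(server, count - 1);
  }

  // Simulate a server going down: stop routing new requests to it.
  removeServer(server) {
    this.active.delete(server);
  }
}

const lcb = new LeastConnectionsBalancer(["server-1:3000", "server-2:3000", "server-3:3000"]);
const r1 = lcb.acquire(); // server-1:3000 (all tied at 0)
const r2 = lcb.acquire(); // server-2:3000
const r3 = lcb.acquire(); // server-3:3000
lcb.release(r2);          // server-2 finishes its request
const r4 = lcb.acquire(); // server-2:3000, now the least busy

lcb.removeServer("server-1:3000");
const r5 = lcb.acquire(); // server-2:3000
const r6 = lcb.acquire(); // server-3:3000 (server-1 is never chosen again)
console.log(r1, r2, r3, r4, r5, r6);
```

Because the balancer only tracks live counts, removing a server needs no index bookkeeping, unlike the round-robin version. A real load balancer would typically detect the failure through health checks and would also have to deal with requests that were in flight on the removed server.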
Coding Challenge
Write a function `autoScale` that takes an array of request counts (one per time period) and a `maxPerServer` threshold. Start with 1 server. For each time period, if requests per server exceeds `maxPerServer`, add a server. If requests per server is below `maxPerServer / 2` and there is more than 1 server, remove a server. Return an array of objects with `period`, `requests`, `servers`, and `perServer` (requests divided by servers, rounded down) for each time period.
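The challenge leaves a few details open, so the sketch below is one possible reading rather than the reference solution. It assumes: at most one server is added or removed per period, the decision uses the server count carried over from the previous period, `perServer` is computed with the adjusted count, and periods are numbered from 1.

```javascript
// One reading of the challenge: adjust by at most one server per period,
// decide using the count carried over from the previous period,
// and report perServer using the adjusted count.
function autoScale(requestCounts, maxPerServer) {
  let servers = 1;
  return requestCounts.map((requests, i) => {
    const load = requests / servers; // load under the current server count
    if (load > maxPerServer) {
      servers += 1;
    } else if (load < maxPerServer / 2 && servers > 1) {
      servers -= 1;
    }
    return {
      period: i + 1, // assumption: periods are 1-based
      requests,
      servers,
      perServer: Math.floor(requests / servers),
    };
  });
}

console.log(autoScale([50, 150, 250, 100, 20], 100));
```

Adding only one server per period means a sudden spike can keep servers overloaded for several periods; computing the target count directly from the load ratio reacts faster.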
Real-World Usage
Scaling strategies are essential in production:
- AWS Auto Scaling: Automatically adds or removes EC2 instances based on CPU utilization, network traffic, or custom metrics.
- Kubernetes Horizontal Pod Autoscaler: Scales the number of pods in a deployment based on observed CPU or memory usage.
- Database read replicas: Databases like PostgreSQL and MySQL scale reads horizontally by replicating data to read-only copies.
- Vertical scaling for databases: It is common to vertically scale database servers (more RAM, faster SSDs) because horizontal scaling of stateful systems is much harder.
- CDNs: Content delivery networks scale horizontally by adding edge servers worldwide to serve cached content closer to users.
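The Kubernetes Horizontal Pod Autoscaler computes its target from the ratio of the observed metric to the target metric: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipping changes when the ratio is within a tolerance (0.1 by default). Below is a simplified sketch of that calculation. The function name and the `minReplicas`/`maxReplicas` defaults are illustrative assumptions, and it ignores stabilization windows, pod readiness, and multiple metrics.

```javascript
// Simplified sketch of the HPA replica calculation:
//   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
function desiredReplicas(currentReplicas, currentMetric, targetMetric, {
  tolerance = 0.1, // no change when the ratio is within 10% of 1.0
  minReplicas = 1, // illustrative bounds, not Kubernetes defaults
  maxReplicas = 10,
} = {}) {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// 3 pods averaging 90% CPU against a 60% target: ceil(3 * 1.5) = 5
console.log(desiredReplicas(3, 90, 60)); // 5
// 4 pods at 62% vs a 60% target: ratio is about 1.03, inside the tolerance
console.log(desiredReplicas(4, 62, 60)); // 4
// 5 pods at 20% vs a 60% target: ceil(5 * 0.333...) = 2
console.log(desiredReplicas(5, 20, 60)); // 2
```

Unlike the one-step `autoScale` loop in the coding challenge, this formula can jump straight to the needed count in a single step.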