Engineering Fluency OS

Why This Matters

Your application works great with 100 users. Then it hits 10,000 users and slows to a crawl. Then 100,000 users arrive and the server crashes. How do you handle more load? You have two options: vertical scaling (get a bigger machine) or horizontal scaling (add more machines). This decision shapes your architecture, your costs, and your ability to grow.

Vertical scaling is straightforward: upgrade CPU, RAM, or storage. But every machine has a ceiling, and a single machine is a single point of failure. Horizontal scaling adds more servers behind a load balancer, but now you must handle shared state, session management, and data consistency across machines. Most modern systems use a combination of both strategies. Understanding when and how to apply each approach is fundamental to building systems that grow with your users.
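
The shared-state problem can be made concrete with a minimal sketch. The names here (`AppServer`, `login`, `whoAmI`, `sharedSessions`) are hypothetical, and a plain `Map` stands in for an external session store such as a cache or database. It illustrates the idea under those assumptions and is not a production design.

```javascript
// Hypothetical sketch: each AppServer instance represents one machine.
class AppServer {
  constructor(name, sessionStore) {
    this.name = name;
    // Without a shared store, sessions live only in this server's memory.
    this.sessions = sessionStore ?? new Map();
  }

  login(sessionId, user) {
    this.sessions.set(sessionId, user);
  }

  whoAmI(sessionId) {
    return this.sessions.get(sessionId) ?? null;
  }
}

// Local state: a login on server-1 is invisible to server-2.
const a1 = new AppServer("server-1");
const a2 = new AppServer("server-2");
a1.login("abc", "alice");
const localResult = a2.whoAmI("abc");
console.log(localResult); // null

// Shared state: both servers read and write the same store.
const sharedSessions = new Map(); // stand-in for an external store (assumption)
const b1 = new AppServer("server-1", sharedSessions);
const b2 = new AppServer("server-2", sharedSessions);
b1.login("abc", "alice");
const sharedResult = b2.whoAmI("abc");
console.log(sharedResult); // alice
```

Moving session data out of each server's memory is one common way to let any server handle any request. Pinning a user to one server ("sticky sessions") is another, with different trade-offs.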

Visual Model

Big Server: 32 CPU, 128 GB RAM

Load Balancer: distributes traffic

Server 1: 8 CPU, 32 GB

Server 2: 8 CPU, 32 GB

Server 3: 8 CPU, 32 GB

Two paths to handle growth: upgrade a single server (vertical) or add more servers (horizontal).
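
A quick calculation makes the trade-off in the diagram easier to see. This sketch assumes the diagram shows one server with 32 CPUs and 128 GB RAM against three servers with 8 CPUs and 32 GB RAM each. The helper `totalCapacity` is a hypothetical name introduced here for illustration.

```javascript
// Figures as read from the diagram; treat them as illustrative.
const bigServer = { cpu: 32, ramGb: 128 };
const smallServer = { cpu: 8, ramGb: 32 };

// Sum CPU and RAM across a list of servers.
function totalCapacity(servers) {
  return servers.reduce(
    (total, s) => ({ cpu: total.cpu + s.cpu, ramGb: total.ramGb + s.ramGb }),
    { cpu: 0, ramGb: 0 }
  );
}

const vertical = totalCapacity([bigServer]);
const horizontal = totalCapacity([smallServer, smallServer, smallServer]);

// Capacity left after losing one machine in each setup.
const verticalAfterFailure = totalCapacity([]);
const horizontalAfterFailure = totalCapacity([smallServer, smallServer]);

console.log(vertical, horizontal);
console.log(verticalAfterFailure, horizontalAfterFailure);
```

With these numbers the three-server fleet has less raw capacity (24 CPUs against 32), but losing one machine leaves it with 16 CPUs, while the single big server drops to zero.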

Code Example

// Simulating a load balancer with round-robin
class LoadBalancer {
  constructor(servers) {
    this.servers = servers;
    this.current = 0;
  }

  getNextServer() {
    const server = this.servers[this.current];
    this.current = (this.current + 1) % this.servers.length;
    return server;
  }

  handleRequest(request) {
    const server = this.getNextServer();
    console.log(`Routing to ${server}: ${request}`);
    return server;
  }
}

// Horizontal scaling: 3 servers behind a load balancer
const lb = new LoadBalancer([
  "server-1:3000",
  "server-2:3000",
  "server-3:3000",
]);

// Simulate 6 requests: round-robin sends 2 to each server
for (let i = 1; i <= 6; i++) {
  lb.handleRequest(`Request ${i}`);
}
// Output:
// Routing to server-1:3000: Request 1
// Routing to server-2:3000: Request 2
// Routing to server-3:3000: Request 3
// Routing to server-1:3000: Request 4
// ...
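
The round-robin example assumes every server is always available. As a rough sketch of how a balancer might skip a failed server, the variant below tracks servers that are marked as down. The class name `HealthAwareBalancer` and the methods `markDown` and `markUp` are hypothetical. How a server gets detected as unhealthy (health checks, timeouts) is left out.

```javascript
// Hypothetical variant of round-robin that skips servers marked as down.
class HealthAwareBalancer {
  constructor(servers) {
    this.servers = servers;
    this.down = new Set();
    this.current = 0;
  }

  markDown(server) {
    this.down.add(server);
  }

  markUp(server) {
    this.down.delete(server);
  }

  getNextServer() {
    // Try each server at most once per call, skipping unhealthy ones.
    for (let tries = 0; tries < this.servers.length; tries++) {
      const server = this.servers[this.current];
      this.current = (this.current + 1) % this.servers.length;
      if (!this.down.has(server)) return server;
    }
    throw new Error("no healthy servers available");
  }
}

const hab = new HealthAwareBalancer([
  "server-1:3000",
  "server-2:3000",
  "server-3:3000",
]);
hab.markDown("server-2:3000");

const picks = [hab.getNextServer(), hab.getNextServer(), hab.getNextServer()];
console.log(picks); // server-1:3000, server-3:3000, server-1:3000
```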

Interactive Experiment

Try these exercises:

Modify the load balancer to use "least connections" instead of round-robin. Track how many active requests each server has.
Simulate a server going down: remove a server from the list. How does the load balancer adapt?
Think about session state: if a user logs in on server-1 and the next request goes to server-2, what happens? How would you solve this?
Calculate the cost difference: one server with 64 GB RAM vs four servers with 16 GB RAM each. Which is more cost-effective?
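
As a starting point for the first exercise, here is one possible sketch of a least-connections balancer. The class name `LeastConnectionsBalancer` and the methods `acquire` and `release` are assumptions made for illustration, and ties are broken by list order, which is only one of several reasonable choices.

```javascript
// Hypothetical least-connections balancer: tracks in-flight requests per server.
class LeastConnectionsBalancer {
  constructor(servers) {
    if (servers.length === 0) {
      throw new Error("at least one server is required");
    }
    // Map each server name to its number of active requests.
    this.active = new Map(servers.map((s) => [s, 0]));
  }

  // Pick the server with the fewest active requests (first one wins ties).
  acquire() {
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call when a request finishes so the server's count drops again.
  release(server) {
    if (!this.active.has(server)) return;
    this.active.set(server, Math.max(0, this.active.get(server) - 1));
  }
}

const lcb = new LeastConnectionsBalancer([
  "server-1:3000",
  "server-2:3000",
  "server-3:3000",
]);

const first = lcb.acquire();  // server-1:3000 (all tied, first wins)
const second = lcb.acquire(); // server-2:3000
lcb.release(first);           // server-1 finishes its request
const third = lcb.acquire();  // server-1:3000 (tied with server-3 at 0, first wins)
console.log(first, second, third);
```

Unlike round-robin, the choice here depends on when requests finish, so a server stuck on slow requests receives fewer new ones.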

Coding Challenge

Auto-Scaling Simulator

Write a function `autoScale` that takes an array of request counts (one per time period) and a `maxPerServer` threshold. Start with 1 server. For each time period, if requests per server exceeds `maxPerServer`, add a server. If requests per server is below `maxPerServer / 2` and there is more than 1 server, remove a server. Return an array of objects with `period`, `requests`, `servers`, and `perServer` (requests divided by servers, rounded down) for each time period.
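
The specification leaves a few details open, so the sketch below is one possible reading rather than the reference solution. It assumes that periods are numbered from 1, that load is measured against the server count at the start of each period, that at most one server is added or removed per period, and that `servers` and `perServer` are reported after the adjustment.

```javascript
// One possible reading of the challenge; see the assumptions stated above.
function autoScale(requestCounts, maxPerServer) {
  let servers = 1;
  const history = [];

  requestCounts.forEach((requests, index) => {
    // Load per server before any scaling decision for this period.
    const load = requests / servers;

    if (load > maxPerServer) {
      servers += 1; // scale out by one
    } else if (load < maxPerServer / 2 && servers > 1) {
      servers -= 1; // scale in by one, never below a single server
    }

    history.push({
      period: index + 1,
      requests,
      servers,
      perServer: Math.floor(requests / servers),
    });
  });

  return history;
}

const result = autoScale([50, 150, 250, 100, 40], 100);
console.log(result.map((r) => r.servers));   // [ 1, 2, 3, 2, 1 ]
console.log(result.map((r) => r.perServer)); // [ 50, 75, 83, 50, 40 ]
```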

Real-World Usage

Scaling strategies are essential in production:

AWS Auto Scaling: Automatically adds or removes EC2 instances based on CPU utilization, network traffic, or custom metrics.
Kubernetes Horizontal Pod Autoscaler: Scales the number of pods in a deployment based on observed CPU or memory usage.
Database read replicas: Databases like PostgreSQL and MySQL scale reads horizontally by replicating data to read-only copies.
Vertical scaling for databases: It is common to vertically scale database servers (more RAM, faster SSDs) because horizontal scaling of stateful systems is much harder.
CDNs: Content delivery networks scale horizontally by adding edge servers worldwide to serve cached content closer to users.
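
To give a feel for how such autoscalers decide on a target size, the Kubernetes Horizontal Pod Autoscaler documentation describes its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that formula. The function name `desiredReplicas` and the `min`/`max` defaults are assumptions for illustration, and the real controller adds further behavior (such as a tolerance band and stabilization windows) that is not modeled here.

```javascript
// Sketch of the replica calculation; bounds and names are illustrative.
function desiredReplicas(
  currentReplicas,
  currentMetric,
  targetMetric,
  { min = 1, max = 10 } = {}
) {
  const raw = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  // Clamp to the configured minimum and maximum replica counts.
  return Math.min(max, Math.max(min, raw));
}

const scaleOut = desiredReplicas(4, 90, 60);  // 6: usage is 1.5x the target
const scaleIn = desiredReplicas(4, 30, 60);   // 2: usage is half the target
const capped = desiredReplicas(4, 300, 60);   // 10: raw value 20 hits the max
console.log(scaleOut, scaleIn, capped);
```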

Horizontal vs Vertical Scaling