Distributed systems · 25 min

Leader Election

How distributed systems choose a single coordinator node


Why This Matters

In a group of servers, someone needs to be in charge. The leader coordinates writes, assigns tasks, or manages shared resources. But what happens when the leader crashes? The remaining nodes must elect a new leader -- quickly, correctly, and without disagreement. If two nodes both think they are the leader, you get a split brain, one of the most dangerous situations in distributed systems.

Leader election is used in database replication, distributed locks, job scheduling, and consensus protocols. Understanding it is key to building systems that recover automatically from failures.

Visual Model

[Diagram: five nodes with IDs 0 to 4. Node 4 is the leader. Connections between nodes are labeled "heartbeat" and "election msg".]

The full process at a glance.

Leader election: detect failure, elect a new leader, and use fencing tokens to prevent split brain.
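To make the fencing-token step concrete, here is a minimal sketch of a shared resource that remembers the highest token it has seen and rejects writes that carry an older one. The `FencedStore` class and its `write` method are hypothetical names chosen for illustration, and the sketch assumes tokens are integers that grow with every election.

```javascript
// Hypothetical shared resource that enforces fencing tokens.
// Assumption: tokens are integers that increase with every election.
class FencedStore {
  constructor() {
    this.highestToken = 0; // highest fencing token accepted so far
    this.data = new Map();
  }

  // Accept the write only if its token is at least as new as any seen before.
  write(token, key, value) {
    if (token < this.highestToken) {
      return false; // write comes from a stale leader: reject it
    }
    this.highestToken = token;
    this.data.set(key, value);
    return true;
  }
}

const store = new FencedStore();
const oldLeaderOk = store.write(1, "job", "assigned by old leader");
const newLeaderOk = store.write(2, "job", "assigned by new leader");
const staleOk = store.write(1, "job", "late write from old leader");
console.log(oldLeaderOk, newLeaderOk, staleOk); // true true false
```

The check lives in the resource rather than in the leader because a stale leader, by definition, does not know it has been replaced.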

Code Example

// Simulating leader election with the bully algorithm

class Node {
  constructor(id) {
    this.id = id;
    this.isAlive = true;
    this.isLeader = false;
    this.leaderId = null;
  }
}

class Cluster {
  constructor(nodeCount) {
    this.nodes = [];
    this.fencingToken = 0;
    for (let i = 0; i < nodeCount; i++) {
      this.nodes.push(new Node(i));
    }
    // Highest ID starts as leader
    this.electLeader();
  }

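  electLeader() {
    // Reset all leadership first so a dead node never keeps a stale leader flag
    for (const node of this.nodes) {
      node.isLeader = false;
    }

    // Bully algorithm outcome: the highest alive ID wins. A real bully election
    // reaches this result by exchanging messages; this simulation picks directly.
    const alive = this.nodes.filter(n => n.isAlive);
    if (alive.length === 0) {
      console.log("No nodes alive!");
      return;
    }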

    const winner = alive.reduce((a, b) => a.id > b.id ? a : b);
    winner.isLeader = true;
    this.fencingToken++;

    for (const node of alive) {
      node.leaderId = winner.id;
    }

    console.log(`Node ${winner.id} elected as leader (token: ${this.fencingToken})`);
  }

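  killNode(id) {
    const node = this.nodes[id];
    // Ignore unknown IDs and nodes that have already crashed
    if (!node || !node.isAlive) {
      console.log(`Node ${id} does not exist or is already down`);
      return;
    }
    node.isAlive = false;
    console.log(`Node ${id} crashed`);
    if (node.isLeader) {
      console.log("Leader is down! Starting election...");
      this.electLeader();
    }
  }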
}

const cluster = new Cluster(5);
cluster.killNode(4);  // Kill the leader
cluster.killNode(3);  // Kill the new leader
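
The simulation above picks the winner centrally. As a rough sketch of the message flow a bully election goes through instead, the function below follows one election from the node that notices the failure. The name `bullyElection` and its parameters are assumptions made for this example, and the simulation is synchronous: a node "responds" simply by being in the `alive` set.

```javascript
// Sketch of the message flow in a bully election (synchronous simulation).
// Assumption: `alive` is a Set of node IDs that currently answer messages.
function bullyElection(starterId, allIds, alive, log = []) {
  // The starter sends an election message to every node with a higher ID.
  const higher = allIds.filter(id => id > starterId);
  const responders = higher.filter(id => alive.has(id));
  log.push(`Node ${starterId} sends election msg to [${higher.join(", ")}]`);

  if (responders.length === 0) {
    // Nobody with a higher ID answered, so the starter wins.
    log.push(`Node ${starterId} announces itself as leader`);
    return { leaderId: starterId, log };
  }

  // A higher node answered, so the starter backs off. In the full protocol
  // every responder starts its own election; following only the lowest
  // responder reaches the same winner in this simplified model.
  const next = Math.min(...responders);
  log.push(`Node ${next} answers and takes over`);
  return bullyElection(next, allIds, alive, log);
}

// Node 4 (the old leader) is down; node 1 notices and starts the election.
const result = bullyElection(1, [0, 1, 2, 3, 4], new Set([0, 1, 2, 3]));
console.log(result.leaderId); // 3
console.log(result.log.join("\n"));
```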

Interactive Experiment

Try these modifications:

  • Add a reviveNode method. What should happen when a crashed node comes back online with an old fencing token?
  • Implement a heartbeat system: the leader sends heartbeats every second, and followers start an election after 3 missed heartbeats.
  • Simulate a network partition where nodes 0-2 cannot communicate with nodes 3-4. What happens with the bully algorithm?
  • Why is it dangerous to use a very short heartbeat timeout? What about a very long one?
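
As a possible starting point for the heartbeat experiment, here is a small sketch of the follower-side check. `HeartbeatMonitor` and its methods are hypothetical names, and time is passed in explicitly (in milliseconds) so the logic can be tried without real timers. The one-second interval and the limit of 3 missed heartbeats come from the exercise text.

```javascript
// Starting-point sketch for the heartbeat experiment.
// Assumption: callers pass the current time in milliseconds, so no real
// timers are needed to exercise the logic.
const HEARTBEAT_INTERVAL_MS = 1000;
const MISSED_LIMIT = 3;

class HeartbeatMonitor {
  constructor(now) {
    this.lastHeartbeat = now;
  }

  // Called whenever a heartbeat arrives from the leader.
  recordHeartbeat(now) {
    this.lastHeartbeat = now;
  }

  // Number of whole heartbeat intervals that passed without a heartbeat.
  missedHeartbeats(now) {
    return Math.floor((now - this.lastHeartbeat) / HEARTBEAT_INTERVAL_MS);
  }

  // True once enough heartbeats were missed to suspect the leader.
  shouldStartElection(now) {
    return this.missedHeartbeats(now) >= MISSED_LIMIT;
  }
}

const monitor = new HeartbeatMonitor(0);
monitor.recordHeartbeat(1000);
console.log(monitor.shouldStartElection(3500)); // false (2 intervals missed)
console.log(monitor.shouldStartElection(4000)); // true (3 intervals missed)
```

The two constants are the knobs the last question asks about: changing them trades detection speed against the risk of suspecting a leader that is only slow.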

Coding Challenge

Fencing Token Validator

Write a function called `validateOperation` that takes the current valid fencing token and an operation object with a `token` and `action` field. Return true only if the operation's token matches the current valid token. This prevents stale leaders from executing operations.


Real-World Usage

Leader election is critical in production distributed systems:

  • ZooKeeper: Uses Zab (ZooKeeper Atomic Broadcast), which includes a leader election phase, to choose a leader that coordinates all writes across the ensemble.
  • etcd/Raft: Uses the Raft consensus algorithm, where candidates request votes and a candidate that receives votes from a majority of nodes becomes the leader for that term.
  • Kafka: Each partition has a leader broker that handles all writes and, by default, all reads for that partition. If the leader fails, an in-sync follower is promoted.
  • Redis Sentinel: Monitors Redis instances and automatically promotes a replica to master when the current master fails.
  • Kubernetes: Control plane components such as the scheduler and controller manager use lease-based leader election through the API server (whose state is stored in etcd) so that only one instance of each is active at a time.
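
The majority rule mentioned for Raft can be sketched in a few lines. This is a deliberately simplified model, not Raft itself: real Raft also compares log freshness before granting a vote and handles term changes. The `Voter` class and `campaign` function are hypothetical names for this example.

```javascript
// Simplified majority vote, loosely modeled on the vote-counting step in Raft.
// Assumption: each voter grants at most one vote per term, first come first served.
class Voter {
  constructor(id) {
    this.id = id;
    this.votedFor = new Map(); // term -> candidate ID this voter chose
  }

  // Grant the vote if this voter has not yet voted for someone else this term.
  requestVote(term, candidateId) {
    if (!this.votedFor.has(term)) {
      this.votedFor.set(term, candidateId);
    }
    return this.votedFor.get(term) === candidateId;
  }
}

// A candidate wins only with votes from more than half of the whole cluster,
// not just more than half of the nodes it can reach.
function campaign(candidateId, term, reachableVoters, clusterSize) {
  let votes = 0;
  for (const voter of reachableVoters) {
    if (voter.requestVote(term, candidateId)) votes++;
  }
  return votes > Math.floor(clusterSize / 2);
}

const voters = [0, 1, 2, 3, 4].map(id => new Voter(id));
// Partition: candidate 0 reaches nodes 0-2, candidate 3 reaches nodes 3-4.
// Each candidate is in its own reachable set, so it votes for itself.
const majoritySideWins = campaign(0, 1, voters.slice(0, 3), 5);
const minoritySideWins = campaign(3, 1, voters.slice(3), 5);
console.log(majoritySideWins, minoritySideWins); // true false
```

Because two disjoint groups cannot both contain more than half of the cluster, at most one candidate can win a given term, which is what rules out split brain under a partition.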
