Why This Matters
When one service in your system goes down, every service that depends on it starts accumulating timeouts and errors. Those services then become slow, causing their callers to time out too. This is a failure cascade -- a single failure that spreads across your entire system like falling dominoes.
A circuit breaker stops this cascade by detecting repeated failures and short-circuiting requests to the failing service. Instead of waiting 30 seconds for a timeout, the circuit breaker immediately returns an error, freeing up resources and preventing the failure from spreading. It is named after the electrical circuit breaker that prevents house fires by cutting power when it detects a dangerous current.
Visual Model
Circuit breaker states: Closed (normal), Open (failing fast), Half-Open (testing recovery).
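The three states and the moves between them can be written down as a small lookup table. This is only a sketch to make the model concrete: the event names (`thresholdReached`, `resetTimeoutElapsed`, `trialSucceeded`, `trialFailed`) and the `nextState` helper are illustrative assumptions, not part of any library.

```javascript
// Hypothetical transition table for the Closed / Open / Half-Open model.
// Event names are made up for illustration.
const TRANSITIONS = {
  CLOSED: { thresholdReached: "OPEN" },
  OPEN: { resetTimeoutElapsed: "HALF_OPEN" },
  HALF_OPEN: { trialSucceeded: "CLOSED", trialFailed: "OPEN" },
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state, event) {
  return TRANSITIONS[state]?.[event] ?? state;
}

console.log(nextState("CLOSED", "thresholdReached"));  // OPEN
console.log(nextState("OPEN", "resetTimeoutElapsed")); // HALF_OPEN
console.log(nextState("HALF_OPEN", "trialFailed"));    // OPEN
console.log(nextState("OPEN", "trialSucceeded"));      // OPEN (event ignored)
```

Note that there is no direct path from Open back to Closed: the circuit always passes through Half-Open and must see a successful trial request first.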
Code Example
// Circuit breaker implementation
class CircuitBreaker {
  constructor(options = {}) {
    // ?? keeps an explicit 0 instead of silently replacing it with the default
    this.failureThreshold = options.failureThreshold ?? 5;
    this.resetTimeout = options.resetTimeout ?? 30000; // 30 seconds
    this.state = "CLOSED";
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async call(fn) {
    if (this.state === "OPEN") {
      // Let one trial request through once the reset timeout has elapsed
      if (Date.now() - this.lastFailureTime >= this.resetTimeout) {
        this.state = "HALF_OPEN";
        console.log("Circuit HALF-OPEN: testing...");
      } else {
        // Fail fast without calling the protected service
        throw new Error("Circuit breaker is OPEN - request rejected");
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    if (this.state === "HALF_OPEN") {
      this.state = "CLOSED";
      console.log("Circuit CLOSED: service recovered");
    }
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed trial request in HALF_OPEN reopens the circuit immediately
    if (this.state === "HALF_OPEN" || this.failureCount >= this.failureThreshold) {
      this.state = "OPEN";
      console.log(`Circuit OPEN after ${this.failureCount} failures`);
    }
  }

  getState() {
    return this.state;
  }
}

// Usage
const breaker = new CircuitBreaker({ failureThreshold: 3 });
let attempts = 0;

// Simulated service that fails on its first four invocations
async function flakyService() {
  attempts++;
  if (attempts <= 4) throw new Error("Service down");
  return "Success!";
}

async function demo() {
  // Attempts 1-3 reach the service and fail, which opens the circuit.
  // Attempts 4-5 are rejected immediately because the 30-second
  // reset timeout has not elapsed, so flakyService is never called.
  for (let i = 0; i < 5; i++) {
    try {
      const result = await breaker.call(flakyService);
      console.log("Result:", result);
    } catch (e) {
      console.log(`Attempt ${i + 1}: ${e.message} [${breaker.getState()}]`);
    }
  }
}

demo();
Interactive Experiment
Try these modifications to understand circuit breaker behavior:
- Change the failure threshold to 1 and observe how quickly the circuit opens. Then change it to 10. What are the tradeoffs?
- Add a fallback function that returns cached or default data when the circuit is open.
- Implement the bulkhead pattern: limit the number of concurrent requests to a service so one slow service cannot consume all your threads.
- Add metrics tracking: count total requests, successful requests, rejected requests, and circuit state transitions.
Coding Challenge
Write a function called `shouldTripCircuit` that takes an array of recent request results (true for success, false for failure) and a failure rate threshold (a decimal like 0.5 for 50%). Return true if the failure rate exceeds the threshold.
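If you want something to compare your answer against, here is one possible solution sketch. It reads "exceeds" as strictly greater than the threshold, and it assumes that an empty array should not trip the circuit. The challenge does not specify that second case, so treat it as a design choice.

```javascript
// One possible solution; the empty-array behavior is an assumption.
function shouldTripCircuit(results, threshold) {
  if (results.length === 0) return false; // no data, so do not trip
  const failures = results.filter((ok) => !ok).length;
  return failures / results.length > threshold;
}

console.log(shouldTripCircuit([true, false, false, true], 0.5)); // false (rate is exactly 0.5)
console.log(shouldTripCircuit([false, false, true], 0.5));       // true (2 of 3 failed)
console.log(shouldTripCircuit([], 0.5));                         // false
```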
Real-World Usage
Circuit breakers are standard in production microservice architectures:
- Netflix Hystrix: The library that popularized the circuit breaker pattern for microservices. It is now in maintenance mode and its README recommends Resilience4j for new projects, but its ideas carry over into newer resilience libraries and service meshes.
- Resilience4j: A lightweight Java library for fault tolerance with circuit breakers, rate limiters, bulkheads, and retries.
- Istio/Envoy: Service mesh proxies implement circuit breakers at the infrastructure level, requiring no code changes in your services.
- AWS App Mesh: Provides circuit breaker functionality as part of the service mesh for ECS and EKS workloads.
- Polly (.NET): A .NET resilience library supporting circuit breakers, retries, and fallbacks.