Engineering Fluency OS

Why This Matters

Your application is running in production. Suddenly, users report that pages are loading slowly. Is it the database? The network? A bug in the latest deployment? Without observability, you are flying blind. You cannot fix what you cannot see.

Observability is the ability to understand what is happening inside your system by examining its outputs: logs (what happened), metrics (how much), and traces (where). These three pillars give you the tools to detect problems before users notice, diagnose the root cause when things break, and make data-driven decisions about performance and capacity. Every production system needs observability. The difference between a team that spends hours debugging and a team that resolves incidents in minutes is often the quality of their observability setup.

Visual Model

Service A: Emits signals

Service B: Emits signals

Logs: What happened

Metrics: How much

Traces: Where (path)

Dashboard: Visualize and alert

The three pillars of observability: logs tell you what happened, metrics tell you how much, traces tell you where.

Code Example

// Structured logging with severity levels
class Logger {
  log(level, message, context = {}) {
    const entry = {
      // Spread context first so its keys cannot overwrite the core fields below
      ...context,
      timestamp: new Date().toISOString(),
      level,
      message,
    };
    console.log(JSON.stringify(entry));
  }
  info(msg, ctx) { this.log("info", msg, ctx); }
  warn(msg, ctx) { this.log("warn", msg, ctx); }
  error(msg, ctx) { this.log("error", msg, ctx); }
}

// Metrics collector
class MetricsCollector {
  constructor() {
    this.counters = {};
    this.histograms = {};
  }
  increment(name) {
    this.counters[name] = (this.counters[name] || 0) + 1;
  }
  recordDuration(name, ms) {
    if (!this.histograms[name]) this.histograms[name] = [];
    this.histograms[name].push(ms);
  }
  getAverage(name) {
    const vals = this.histograms[name] || [];
    if (vals.length === 0) return 0;
    return vals.reduce((a, b) => a + b, 0) / vals.length;
  }
}

// Usage in a request handler
const logger = new Logger();
const metrics = new MetricsCollector();

function handleRequest(req) {
  const start = Date.now();
  logger.info("Request received", { path: req.path });

  metrics.increment("requests_total");
  // ... process request ...

  const duration = Date.now() - start;
  metrics.recordDuration("request_duration_ms", duration);
  logger.info("Request completed", {
    path: req.path, durationMs: duration,
  });
}
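
The example above covers logs and metrics but not the third pillar, traces. A minimal sketch of the idea follows, assuming a hypothetical in-memory `Tracer` class (not a real tracing library): every span carries the `traceId` of its request and the `spanId` of its parent, which is what lets a tool reconstruct the path a request took. The IDs here come from `Math.random` purely for illustration.

```javascript
// Illustrative in-memory tracer; names and fields are assumptions, not a real API.
const newId = () => Math.random().toString(16).slice(2);

class Tracer {
  constructor() {
    this.spans = [];
  }
  // Start a span. Passing a parent links the new span into the same trace.
  startSpan(name, parent = null) {
    const span = {
      traceId: parent ? parent.traceId : newId(),
      spanId: newId(),
      parentSpanId: parent ? parent.spanId : null,
      name,
      startMs: Date.now(),
      durationMs: null,
    };
    this.spans.push(span);
    return span;
  }
  // Close a span and record how long it took.
  endSpan(span) {
    span.durationMs = Date.now() - span.startMs;
  }
  // Collect every span that belongs to one request.
  getTrace(traceId) {
    return this.spans.filter((s) => s.traceId === traceId);
  }
}

const tracer = new Tracer();
const root = tracer.startSpan("GET /checkout");
const dbSpan = tracer.startSpan("db.query", root); // child of the request span
tracer.endSpan(dbSpan);
tracer.endSpan(root);
console.log(JSON.stringify(tracer.getTrace(root.traceId), null, 2));
```

In a distributed system the `traceId` and parent `spanId` would be passed to the next service (typically in request headers) so that spans from different services join the same trace.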

Interactive Experiment

Try these exercises:

Run the Logger and make several requests. Look at the structured JSON output. Why is JSON better than plain text for logs?
Add an error log that includes a stack trace. How does this help debugging?
Track three different metrics: total requests, error count, and average response time. Which would you alert on?
Imagine a request spans 3 services. Draw what the trace would look like with timing for each span.
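
As a starting point for the second exercise, here is a minimal sketch of logging an error together with its stack trace. The helper names `serializeError` and `logError` and the `/orders` path are hypothetical. The key detail is that an `Error` must be copied into plain fields before serializing, because `JSON.stringify(new Error("x"))` produces `{}`.

```javascript
// Copy the useful parts of an Error into a plain object. An Error's message
// and stack are non-enumerable and its name is inherited, so JSON.stringify
// would otherwise drop all three.
function serializeError(err) {
  return { name: err.name, message: err.message, stack: err.stack };
}

// Emit one structured log entry at "error" level and return it.
function logError(message, err, context = {}) {
  const entry = {
    ...context,
    timestamp: new Date().toISOString(),
    level: "error",
    message,
    error: serializeError(err),
  };
  console.log(JSON.stringify(entry));
  return entry;
}

function parseOrder(raw) {
  return JSON.parse(raw); // throws SyntaxError on malformed input
}

let lastEntry = null;
try {
  parseOrder("{not valid json");
} catch (err) {
  lastEntry = logError("Failed to parse order", err, { path: "/orders" });
}
```

The `stack` field shows which functions were on the call stack when the error was thrown, which usually points to the failing line much faster than the message alone.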

Coding Challenge

Health Check Monitor

Write a `HealthMonitor` class that tracks the health of services. It has `recordRequest(service, durationMs, success)` to record a request result, and `getHealth(service)` that returns an object with `totalRequests`, `errorRate` (errors / total, as a decimal between 0 and 1), and `avgDuration` (average duration in ms). If no requests have been recorded for a service, return `{totalRequests: 0, errorRate: 0, avgDuration: 0}`.
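
If you want something to compare your attempt against, the following is one possible solution sketch rather than the only correct one. It assumes running totals are kept per service in a `Map`; the internal field names (`total`, `errors`, `totalDuration`) are illustrative choices.

```javascript
class HealthMonitor {
  constructor() {
    // service name -> running totals for that service
    this.services = new Map();
  }

  recordRequest(service, durationMs, success) {
    const stats =
      this.services.get(service) ?? { total: 0, errors: 0, totalDuration: 0 };
    stats.total += 1;
    if (!success) stats.errors += 1;
    stats.totalDuration += durationMs;
    this.services.set(service, stats);
  }

  getHealth(service) {
    const stats = this.services.get(service);
    // Unknown service: return zeros instead of dividing by zero.
    if (!stats) return { totalRequests: 0, errorRate: 0, avgDuration: 0 };
    return {
      totalRequests: stats.total,
      errorRate: stats.errors / stats.total,
      avgDuration: stats.totalDuration / stats.total,
    };
  }
}

const monitor = new HealthMonitor();
monitor.recordRequest("api", 100, true);
monitor.recordRequest("api", 300, false);
console.log(monitor.getHealth("api"));
// { totalRequests: 2, errorRate: 0.5, avgDuration: 200 }
```

Keeping running totals instead of storing every duration keeps memory use constant per service, at the cost of not being able to compute percentiles later.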

Real-World Usage

Observability is critical infrastructure in production:

Datadog, New Relic, Grafana: Full-stack observability platforms that combine logs, metrics, and traces into unified dashboards with alerting.
ELK Stack (Elasticsearch, Logstash, Kibana): An open-source log management pipeline used to collect, store, and search logs from distributed systems.
Prometheus + Grafana: A widely used open-source stack for metrics collection and visualization, especially common in Kubernetes environments.
OpenTelemetry: A vendor-neutral standard for collecting and exporting logs, metrics, and traces. Supported by most major observability platforms.
PagerDuty / Opsgenie: Alert management platforms that notify on-call engineers when observability tools detect problems.
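
To make the metrics side of this list more concrete, the sketch below renders a set of counters in the plain-text exposition format that Prometheus scrapes: a `# TYPE` line followed by a `name value` line per metric. The function name `toPrometheusText` and the sample values are hypothetical, and a real service would normally use a maintained client library rather than formatting this by hand.

```javascript
// Render counters as Prometheus-style text: one TYPE line and one sample
// line per metric. Illustrative only; labels and HELP lines are omitted.
function toPrometheusText(counters) {
  const lines = [];
  for (const [name, value] of Object.entries(counters)) {
    lines.push(`# TYPE ${name} counter`);
    lines.push(`${name} ${value}`);
  }
  return lines.join("\n") + "\n";
}

const exposition = toPrometheusText({ requests_total: 42, errors_total: 3 });
console.log(exposition);
// # TYPE requests_total counter
// requests_total 42
// # TYPE errors_total counter
// errors_total 3
```

A service would typically serve this text from an HTTP endpoint such as `/metrics`, which the metrics server polls on a fixed interval.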
