Why This Matters
Your application is deployed. Users are hitting it. Then something goes wrong: response times spike, errors increase, a feature stops working. How do you figure out what happened? You cannot attach a debugger to production. You cannot add console.log and redeploy mid-incident. You need observability -- the ability to understand what your system is doing by examining its outputs.
Observability rests on three pillars: logs (what happened), metrics (how much), and traces (the path a request took). Together, they let you answer the question every on-call engineer dreads at 2 AM: "What is broken and why?" Good structured logging and alerting mean you find out about problems before your users do.
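To make "the path a request took" concrete, here is a minimal in-memory sketch of how a trace can be recorded: every step of one request shares a trace ID, and each step notes its parent. The names `newId`, `withSpan`, and `spans` are hypothetical helpers invented for this illustration, and the sketch only handles synchronous steps. Real tracing tools use standardized ID formats and export spans to a backend.

```javascript
// Hypothetical helper: a short random hex ID (not a standard trace ID format).
const newId = () => Math.random().toString(16).slice(2, 10).padEnd(8, "0");

// Spans are collected in memory for this sketch only.
const spans = [];

// Run `fn` as a named span belonging to `traceId`, recording its duration.
// `fn` receives this span's ID so nested steps can name it as their parent.
function withSpan(traceId, name, parentId, fn) {
  const span = { traceId, spanId: newId(), parentId, name, startMs: Date.now() };
  try {
    return fn(span.spanId);
  } finally {
    span.durationMs = Date.now() - span.startMs;
    spans.push(span);
  }
}

// One request with two nested steps, all sharing the same traceId.
const traceId = newId();
withSpan(traceId, "POST /api/pay", null, (rootId) => {
  withSpan(traceId, "charge-card", rootId, () => {});
  withSpan(traceId, "save-order", rootId, () => {});
});

console.log(JSON.stringify(spans, null, 2));
```

Because each span carries `traceId` and `parentId`, the request's path can be rebuilt afterwards by grouping on the trace ID and following the parent links.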
Visual Model
The three pillars of observability: logs, metrics, and traces flow into dashboards and alerts.
Code Example
// Structured logging (JSON format)
const log = (level, message, data = {}) => {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "payment-api",
    ...data
  };
  console.log(JSON.stringify(entry));
};

// Log levels indicate severity
log("info", "Server started", { port: 3000 });
log("info", "Payment processed", { userId: "u123", amount: 49.99 });
log("warn", "Slow database query", { queryMs: 2500, table: "orders" });
log("error", "Payment failed", { userId: "u456", error: "Card declined" });

// Metrics: track request duration
const startTime = Date.now();
// ... handle request ...
const durationMs = Date.now() - startTime;
log("info", "Request completed", {
  method: "POST",
  path: "/api/pay",
  statusCode: 200,
  durationMs
});

// Simple metrics counter
const metrics = { requests: 0, errors: 0 };
function trackRequest(success) {
  metrics.requests++;
  if (!success) metrics.errors++;
  const errorRate = (metrics.errors / metrics.requests * 100).toFixed(2);
  console.log(`Error rate: ${errorRate}%`);
}
Interactive Experiment
Try these exercises:
- Add structured JSON logging to an existing project. Include timestamp, level, message, and relevant data fields. Filter the output with `jq`.
- Build a simple request counter that tracks total requests, errors, and average response time. Print a summary every 10 requests.
- Create a log function that only outputs messages at or above a configured level (e.g., setting level to "warn" suppresses "info" and "debug").
- Time a database query or API call. Log the duration and flag anything over 1 second as a warning.
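As a possible starting point for the level-filtering and timing exercises above, here is a minimal sketch. The names `createLogger` and `timed`, the numeric ranks, and the injectable `write` parameter are all assumptions made for illustration, not part of any particular logging library.

```javascript
// Numeric ranks (illustrative values) so levels can be compared by severity.
const LEVEL_RANK = new Map([
  ["debug", 10],
  ["info", 20],
  ["warn", 30],
  ["error", 40],
]);

// Hypothetical factory: returns a log function that drops entries below minLevel.
// `write` is injectable so output can be captured instead of printed.
function createLogger(minLevel = "info", write = console.log) {
  if (!LEVEL_RANK.has(minLevel)) throw new Error(`Unknown level: ${minLevel}`);
  return (level, message, data = {}) => {
    if (!LEVEL_RANK.has(level)) throw new Error(`Unknown level: ${level}`);
    if (LEVEL_RANK.get(level) < LEVEL_RANK.get(minLevel)) return false; // suppressed
    write(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...data }));
    return true;
  };
}

// Hypothetical helper: time a synchronous call and warn when it exceeds slowMs.
function timed(log, label, fn, slowMs = 1000) {
  const start = Date.now();
  const result = fn();
  const durationMs = Date.now() - start;
  log(durationMs > slowMs ? "warn" : "info", label, { durationMs });
  return result;
}

const log = createLogger("warn");
log("info", "Server started", { port: 3000 });      // below "warn", suppressed
log("error", "Payment failed", { userId: "u456" }); // written as one JSON line
timed(log, "Load orders", () => [1, 2, 3]);         // fast call logs at "info", suppressed
```

Comparing numeric ranks keeps the filter to a single check, and passing `write` in makes the logger easy to test without capturing standard output.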
Coding Challenge
Write a function called `analyzeLogs` that takes an array of log entry objects, each with `level` ('info', 'warn', 'error') and `durationMs` (number). Return an object with: `total` (total log count), `errors` (count of error-level logs), `errorRate` (percentage of errors, rounded to 1 decimal), `avgDuration` (average durationMs, rounded to nearest integer), and `slowRequests` (count of entries with durationMs > 1000).
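One way to approach the challenge is sketched below; try it yourself first and use this only to compare. The handling of an empty array (returning zeros instead of dividing by zero) is an assumption, since the challenge does not specify it, and the sample data is made up for illustration.

```javascript
function analyzeLogs(entries) {
  const total = entries.length;
  // Assumption: an empty input yields zeros rather than NaN from 0 / 0.
  if (total === 0) {
    return { total: 0, errors: 0, errorRate: 0, avgDuration: 0, slowRequests: 0 };
  }

  let errors = 0;
  let slowRequests = 0;
  let durationSum = 0;
  for (const { level, durationMs } of entries) {
    if (level === "error") errors++;
    if (durationMs > 1000) slowRequests++;
    durationSum += durationMs;
  }

  return {
    total,
    errors,
    // Percentage rounded to 1 decimal: scale by 1000, round, then divide by 10.
    errorRate: Math.round((errors / total) * 1000) / 10,
    avgDuration: Math.round(durationSum / total),
    slowRequests,
  };
}

// Made-up sample entries.
const sample = [
  { level: "info", durationMs: 120 },
  { level: "warn", durationMs: 1500 },
  { level: "error", durationMs: 2300 },
  { level: "info", durationMs: 80 },
];

// Values: total 4, errors 1, errorRate 25, avgDuration 1000, slowRequests 2
console.log(analyzeLogs(sample));
```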
Real-World Usage
Observability is non-negotiable for production systems:
- Datadog, New Relic, and Grafana are observability platforms that ingest logs, metrics, and traces from thousands of services, providing dashboards and alerting.
- Prometheus scrapes metrics from application endpoints and stores time-series data. Grafana visualizes Prometheus data in real-time dashboards.
- PagerDuty and Opsgenie receive alerts and route them to the right on-call engineer via phone call, SMS, or Slack.
- Distributed tracing tools like Jaeger and Honeycomb follow requests across microservices, revealing, for example, that a slow checkout was caused by a slow call to the inventory service.
- Log levels in production are typically set to "info" or "warn". Debug logging is enabled temporarily when investigating specific issues.
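To illustrate what Prometheus reads when it scrapes an application endpoint, the sketch below renders two counters in the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name value`). The metric names, the `counters` object, and the `renderMetrics` function are assumptions for this example. In practice an application would more likely use a client library such as prom-client and serve this text over HTTP, conventionally at a `/metrics` path.

```javascript
// Hypothetical in-memory counters; names follow the common *_total convention.
const counters = {
  http_requests_total: { help: "Total HTTP requests handled.", value: 0 },
  http_request_errors_total: { help: "Total HTTP requests that failed.", value: 0 },
};

// Count one request, and one error if it did not succeed.
function trackRequest(success) {
  counters.http_requests_total.value++;
  if (!success) counters.http_request_errors_total.value++;
}

// Render all counters as Prometheus text exposition format.
function renderMetrics() {
  return Object.entries(counters)
    .map(([name, { help, value }]) =>
      `# HELP ${name} ${help}\n# TYPE ${name} counter\n${name} ${value}\n`)
    .join("");
}

trackRequest(true);
trackRequest(false);
console.log(renderMetrics());
```

Counters only ever increase; the error rate is then derived at query time from the two series rather than computed inside the application.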