Machine Learning · 25 min

Metrics & Evaluation

Measuring model performance with the right metrics for the right task


Why This Matters

A model that is "95% accurate" sounds great — until you learn the dataset is 95% one class. A spam filter that catches all spam but also blocks half your legitimate emails has a different kind of problem. Choosing the right evaluation metric is just as important as choosing the right model.

Accuracy is the metric everyone learns first, but it can be deeply misleading. Understanding precision, recall, and the confusion matrix gives you the vocabulary to ask: "Is this model actually solving my problem, or just looking good on paper?"

Define Terms

Visual Model

  • TP: true positive
  • FP: false positive
  • FN: false negative
  • TN: true negative
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • F1 = 2 · Precision · Recall / (Precision + Recall)

Confusion matrix quadrants (TP, FP, FN, TN) feed into precision, recall, and F1 score.

Code Example

// Computing classification metrics from scratch

function confusionMatrix(actual, predicted) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 1 && predicted[i] === 1) tp++;
    if (actual[i] === 0 && predicted[i] === 0) tn++;
    if (actual[i] === 0 && predicted[i] === 1) fp++;
    if (actual[i] === 1 && predicted[i] === 0) fn++;
  }
  return { tp, tn, fp, fn };
}

function metrics(actual, predicted) {
  const { tp, tn, fp, fn } = confusionMatrix(actual, predicted);
  const accuracy  = (tp + tn) / (tp + tn + fp + fn);
  // `|| 0` guards the zero-denominator case: 0/0 is NaN, and NaN || 0 yields 0.
  const precision = tp / (tp + fp) || 0;
  const recall    = tp / (tp + fn) || 0;
  const f1        = 2 * (precision * recall) / (precision + recall) || 0;
  return { accuracy, precision, recall, f1 };
}

// Example: disease screening
const actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0];
const predicted = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0];

const cm = confusionMatrix(actual, predicted);
console.log("Confusion Matrix:", cm);
// { tp: 3, tn: 4, fp: 1, fn: 2 }

const m = metrics(actual, predicted);
console.log("Accuracy:",  m.accuracy.toFixed(2));  // 0.70
console.log("Precision:", m.precision.toFixed(2)); // 0.75
console.log("Recall:",    m.recall.toFixed(2));    // 0.60
console.log("F1 Score:",  m.f1.toFixed(2));        // 0.67

Interactive Experiment

Try these exercises to build intuition about metrics:

  • Change the predicted array so all predictions are 0. What happens to accuracy? What happens to recall?
  • Create a highly imbalanced dataset: 95 negatives and 5 positives. A model that always predicts 0 has what accuracy?
  • Make the model achieve perfect recall (catch all positives). What happens to precision?
  • Find a set of predictions that maximizes F1 score. Is it the same set that maximizes accuracy?
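The first two exercises can be sketched directly. The helper below is a trimmed-down standalone version of the `metrics` function from the code example (accuracy and recall only), so the snippet runs on its own:

```javascript
// Trimmed-down metrics helper: accuracy and recall only.
function metrics(actual, predicted) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 1) { predicted[i] === 1 ? tp++ : fn++; }
    else                 { predicted[i] === 1 ? fp++ : tn++; }
  }
  return {
    accuracy: (tp + tn) / actual.length,
    recall:   tp / (tp + fn) || 0, // guard 0/0 -> NaN
  };
}

// Exercise 1: all-zero predictions on the original 5-positive/5-negative data.
const actual   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0];
const allZeros = new Array(actual.length).fill(0);
const m1 = metrics(actual, allZeros);
console.log(m1); // accuracy drops to 0.5, and recall collapses to 0

// Exercise 2: 95 negatives, 5 positives, and a model that always predicts 0.
const imbalanced = [...Array(95).fill(0), ...Array(5).fill(1)];
const lazyModel  = new Array(100).fill(0);
const m2 = metrics(imbalanced, lazyModel);
console.log(m2); // accuracy 0.95, yet recall is 0: it never catches a positive
```

The second result is the core lesson: a model that does nothing useful scores 95% accuracy on an imbalanced dataset, while recall exposes the failure immediately.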

Quick Quiz

Coding Challenge

Build a Metrics Dashboard

Write a function called `classificationReport` that takes arrays of actual labels and predicted labels (0s and 1s) and returns an object with: `tp`, `tn`, `fp`, `fn`, `accuracy`, `precision`, `recall`, and `f1`. All decimal values should be rounded to 2 decimal places.

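One possible solution sketch, reusing the counting logic from the example above (try the challenge yourself before reading it). Note that F1 is computed here via the equivalent identity F1 = 2·TP / (2·TP + FP + FN), which avoids compounding rounding errors from pre-rounded precision and recall:

```javascript
// Sketch of a classificationReport: counts plus metrics rounded to 2 decimals.
function classificationReport(actual, predicted) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 1) { predicted[i] === 1 ? tp++ : fn++; }
    else                 { predicted[i] === 1 ? fp++ : tn++; }
  }
  const round2    = x => Math.round(x * 100) / 100;
  const accuracy  = round2((tp + tn) / actual.length);
  const precision = round2(tp / (tp + fp) || 0);
  const recall    = round2(tp / (tp + fn) || 0);
  // F1 = 2PR/(P+R) simplifies to 2TP/(2TP+FP+FN) when computed from raw counts.
  const f1        = round2(2 * tp / (2 * tp + fp + fn) || 0);
  return { tp, tn, fp, fn, accuracy, precision, recall, f1 };
}
```

On the disease-screening arrays from the earlier example, this returns `{ tp: 3, tn: 4, fp: 1, fn: 2, accuracy: 0.7, precision: 0.75, recall: 0.6, f1: 0.67 }`.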

Real-World Usage

Choosing the right metric is a critical business and engineering decision:

  • Search engines: Precision at top-k measures how many of the top results are relevant — users only look at the first page.
  • Fraud detection: High recall is essential to catch as many fraudulent transactions as possible, even if some legitimate ones get flagged.
  • Content moderation: Platforms balance precision (avoiding censoring legitimate content) with recall (catching harmful content).
  • Autonomous vehicles: Safety-critical systems use multiple metrics and require near-perfect recall for obstacle detection.
  • A/B testing: Business metrics like click-through rate and conversion rate are often more meaningful than raw model accuracy.

Connections