Why This Matters
A model that is "95% accurate" sounds great — until you learn the dataset is 95% one class. A spam filter that catches all spam but also blocks half your legitimate emails has a different kind of problem. Choosing the right evaluation metric is just as important as choosing the right model.
Accuracy is the metric everyone learns first, but it can be deeply misleading. Understanding precision, recall, and the confusion matrix gives you the vocabulary to ask: "Is this model actually solving my problem, or just looking good on paper?"
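A quick sketch (with made-up labels) shows how class imbalance inflates accuracy:

```javascript
// Hypothetical imbalanced dataset: 95 negatives, 5 positives.
const labels = Array(95).fill(0).concat(Array(5).fill(1));
// A "model" that ignores its input and always predicts the majority class.
const alwaysNegative = labels.map(() => 0);

// Accuracy: fraction of predictions that match the label.
const accuracy =
  labels.filter((y, i) => y === alwaysNegative[i]).length / labels.length;

// Recall: fraction of actual positives the model caught — here, none.
const positives = labels.filter((y) => y === 1).length;
const caught = labels.filter((y, i) => y === 1 && alwaysNegative[i] === 1).length;
const recall = caught / positives;

console.log(accuracy); // 0.95 — looks great
console.log(recall);   // 0 — catches nothing
```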
Define Terms
- True positive (TP): a positive case the model correctly flags. True negative (TN): a negative case the model correctly passes over.
- False positive (FP): a negative case incorrectly flagged. False negative (FN): a positive case the model misses.
- Precision = TP / (TP + FP): of everything the model flagged, how much was right.
- Recall = TP / (TP + FN): of everything that should have been flagged, how much was caught.
- F1 score: the harmonic mean of precision and recall, 2 · (precision · recall) / (precision + recall).
Visual Model
The full process at a glance.
Confusion matrix quadrants (TP, FP, FN, TN) feed into precision, recall, and F1 score.
Code Example
// Computing classification metrics from scratch
function confusionMatrix(actual, predicted) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 1 && predicted[i] === 1) tp++;
    if (actual[i] === 0 && predicted[i] === 0) tn++;
    if (actual[i] === 0 && predicted[i] === 1) fp++;
    if (actual[i] === 1 && predicted[i] === 0) fn++;
  }
  return { tp, tn, fp, fn };
}

function metrics(actual, predicted) {
  const { tp, tn, fp, fn } = confusionMatrix(actual, predicted);
  const accuracy = (tp + tn) / (tp + tn + fp + fn);
  // `|| 0` turns the NaN from a zero denominator (e.g. no positive
  // predictions at all) into 0 instead of propagating NaN.
  const precision = tp / (tp + fp) || 0;
  const recall = tp / (tp + fn) || 0;
  const f1 = 2 * (precision * recall) / (precision + recall) || 0;
  return { accuracy, precision, recall, f1 };
}
// Example: disease screening
const actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0];
const predicted = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0];
const cm = confusionMatrix(actual, predicted);
console.log("Confusion Matrix:", cm);
// { tp: 3, tn: 4, fp: 1, fn: 2 }
const m = metrics(actual, predicted);
console.log("Accuracy:", m.accuracy.toFixed(2));
console.log("Precision:", m.precision.toFixed(2));
console.log("Recall:", m.recall.toFixed(2));
console.log("F1 Score:", m.f1.toFixed(2));
Interactive Experiment
Try these exercises to build intuition about metrics:
- Change the predicted array so all predictions are 0. What happens to accuracy? What happens to recall?
- Create a highly imbalanced dataset: 95 negatives and 5 positives. A model that always predicts 0 has what accuracy?
- Make the model achieve perfect recall (catch all positives). What happens to precision?
- Find a set of predictions that maximizes F1 score. Is it the same set that maximizes accuracy?
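Working through the first exercise as a sketch — reusing the same `actual` labels from the example above, with every prediction forced to 0:

```javascript
// Exercise 1: all-zero predictions on the disease-screening labels.
const actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0];
const predicted = actual.map(() => 0);

let tp = 0, tn = 0, fp = 0, fn = 0;
for (let i = 0; i < actual.length; i++) {
  if (actual[i] === 1 && predicted[i] === 1) tp++;
  if (actual[i] === 0 && predicted[i] === 0) tn++;
  if (actual[i] === 0 && predicted[i] === 1) fp++;
  if (actual[i] === 1 && predicted[i] === 0) fn++;
}

// The 5 negatives are all "correct", so accuracy is a mediocre-looking 0.5,
// but recall collapses to 0: not a single positive case was caught.
const accuracy = (tp + tn) / (tp + tn + fp + fn); // 5/10 = 0.5
const recall = tp / (tp + fn) || 0;               // 0/5 → 0
console.log({ accuracy, recall }); // { accuracy: 0.5, recall: 0 }
```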
Quick Quiz
Coding Challenge
Write a function called `classificationReport` that takes arrays of actual labels and predicted labels (0s and 1s) and returns an object with: `tp`, `tn`, `fp`, `fn`, `accuracy`, `precision`, `recall`, and `f1`. All decimal values should be rounded to 2 decimal places.
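One possible solution sketch — the rounding helper and the else-if structure here are just one reasonable way to meet the spec, not the only valid answer:

```javascript
function classificationReport(actual, predicted) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 1 && predicted[i] === 1) tp++;
    else if (actual[i] === 0 && predicted[i] === 0) tn++;
    else if (actual[i] === 0 && predicted[i] === 1) fp++;
    else fn++; // actual 1, predicted 0
  }
  // Round only the derived metrics; counts stay as integers.
  const round2 = (x) => Math.round(x * 100) / 100;
  const accuracy = (tp + tn) / (tp + tn + fp + fn);
  const precision = tp / (tp + fp) || 0;
  const recall = tp / (tp + fn) || 0;
  const f1 = 2 * (precision * recall) / (precision + recall) || 0;
  return {
    tp, tn, fp, fn,
    accuracy: round2(accuracy),
    precision: round2(precision),
    recall: round2(recall),
    f1: round2(f1),
  };
}

// Same labels as the disease-screening example:
console.log(classificationReport(
  [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
  [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
));
// { tp: 3, tn: 4, fp: 1, fn: 2,
//   accuracy: 0.7, precision: 0.75, recall: 0.6, f1: 0.67 }
```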
Real-World Usage
Choosing the right metric is a critical business and engineering decision:
- Search engines: Precision at top-k measures how many of the top results are relevant — users only look at the first page.
- Fraud detection: High recall is essential to catch as many fraudulent transactions as possible, even if some legitimate ones get flagged.
- Content moderation: Platforms balance precision (avoiding censoring legitimate content) with recall (catching harmful content).
- Autonomous vehicles: Safety-critical systems use multiple metrics and require near-perfect recall for obstacle detection.
- A/B testing: Business metrics like click-through rate and conversion rate are often more meaningful than raw model accuracy.
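As a sketch of the search-engine point above: precision at top-k scores only the results a user actually sees. The function name and relevance data here are illustrative, not from any particular system:

```javascript
// Precision@k for a ranked result list.
// relevant[i] is 1 if the i-th ranked result is actually relevant.
function precisionAtK(relevant, k) {
  const topK = relevant.slice(0, k);
  return topK.filter((r) => r === 1).length / k;
}

// 10 ranked results; suppose the user only looks at the top 5.
const relevant = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0];
console.log(precisionAtK(relevant, 5));  // 3/5 = 0.6
console.log(precisionAtK(relevant, 10)); // 4/10 = 0.4
```

Note how the same ranking scores differently depending on how far down the list you evaluate — which is why k is chosen to match real user behavior.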