Why This Matters
You built a model that achieves 99% accuracy on your training data. You deploy it. It fails miserably on real-world data. This is overfitting — the most common and most costly mistake in machine learning.
Overfitting means the model has memorized the training data instead of learning generalizable patterns. Regularization techniques are the antidote: they constrain the model so it focuses on real patterns rather than noise. Understanding this tradeoff is what separates a working ML system from an impressive-looking demo that collapses in production.
Visual Model
The full process at a glance.
Two paths: without regularization the model overfits; with regularization it generalizes.
Code Example
// Demonstrating overfitting vs regularization
// Baseline: simple linear regression on noisy data (no regularization)

// Generate noisy data from y = 2x + 1
function generateData(n) {
  const data = [];
  for (let i = 0; i < n; i++) {
    const x = i / n * 10;
    const y = 2 * x + 1 + (Math.random() - 0.5) * 4; // noise
    data.push({ x, y });
  }
  return data;
}

// Simple linear fit (appropriate complexity for this data)
function linearFit(data) {
  const n = data.length;
  const sumX = data.reduce((s, d) => s + d.x, 0);
  const sumY = data.reduce((s, d) => s + d.y, 0);
  const sumXY = data.reduce((s, d) => s + d.x * d.y, 0);
  const sumX2 = data.reduce((s, d) => s + d.x * d.x, 0);
  const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
  const intercept = (sumY - slope * sumX) / n;
  return { slope: +slope.toFixed(2), intercept: +intercept.toFixed(2) };
}

// Evaluate: mean squared error on data
function mse(data, slope, intercept) {
  return data.reduce((s, d) =>
    s + (slope * d.x + intercept - d.y) ** 2, 0) / data.length;
}

const train = generateData(20);
const test = generateData(10);
const { slope, intercept } = linearFit(train);
console.log(`Model: y = ${slope}x + ${intercept}`);
console.log(`Train MSE: ${mse(train, slope, intercept).toFixed(2)}`);
console.log(`Test MSE: ${mse(test, slope, intercept).toFixed(2)}`);
// A good fit has similar train and test MSE

Interactive Experiment
Try these exercises to see overfitting in action:
- Generate only 5 training points and fit the model. How does test MSE compare to train MSE?
- Increase training data to 1000 points. Does the gap between train and test MSE shrink?
- Increase the noise multiplier from 4 to 20. How does more noise affect the model?
- Add an L2 regularization term: penalize large weights by adding `lambda * slope^2` to the loss. Does it improve test performance?
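For the last exercise, the penalized loss could look like this. A minimal sketch: `regularizedLoss` and the `lambda` parameter are illustrative names, not part of the code example above.

```javascript
// L2-regularized loss: MSE plus a penalty on the slope's magnitude.
// `lambda` controls the penalty strength (illustrative hyperparameter).
function regularizedLoss(data, slope, intercept, lambda) {
  const mse = data.reduce((s, d) =>
    s + (slope * d.x + intercept - d.y) ** 2, 0) / data.length;
  return mse + lambda * slope ** 2; // penalty discourages large weights
}
```

With `lambda = 0` this reduces to the plain MSE, so you can sweep `lambda` upward and watch how the train/test gap changes.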
Coding Challenge
Write a function called `ridgeRegression` that performs linear regression with L2 regularization. Given training data and a regularization strength `lambda`, it should learn weight and bias by running gradient descent where the weight gradient includes a penalty term: gradient = normal_gradient + 2 * lambda * weight. Return the trained weight and bias as an object.
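One possible sketch of a solution, assuming gradient descent on MSE; the learning rate and step count below are arbitrary choices, not prescribed by the challenge:

```javascript
// Ridge regression via gradient descent. The weight gradient carries the
// extra 2 * lambda * weight penalty term; the bias is left unpenalized.
function ridgeRegression(data, lambda, lr = 0.01, steps = 5000) {
  let weight = 0, bias = 0;
  const n = data.length;
  for (let i = 0; i < steps; i++) {
    let gradW = 0, gradB = 0;
    for (const d of data) {
      const err = weight * d.x + bias - d.y;
      gradW += (2 / n) * err * d.x;
      gradB += (2 / n) * err;
    }
    gradW += 2 * lambda * weight; // L2 penalty on the weight only
    weight -= lr * gradW;
    bias -= lr * gradB;
  }
  return { weight, bias };
}
```

Leaving the bias out of the penalty is a common convention: regularization is meant to shrink the model's sensitivity to inputs, not its overall offset.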
Real-World Usage
Overfitting and regularization are central concerns in every production ML system:
- Deep learning: Dropout is used in nearly every neural network to prevent co-adaptation of neurons during training.
- Natural language processing: Weight decay (L2 regularization) is standard when fine-tuning large language models on small datasets.
- Computer vision: Data augmentation (flipping, rotating, cropping images) acts as implicit regularization by expanding the effective training set.
- Medical AI: With limited patient data, regularization is critical to prevent models from memorizing individual patients rather than learning disease patterns.
- Ensemble methods: Random forests and gradient boosting use tree depth limits and minimum sample sizes as regularization to prevent overfitting.
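To make the dropout idea above concrete, here is a hedged sketch of inverted dropout on a single activation vector; the function name and the scaling convention shown are one common formulation, not the only one:

```javascript
// Inverted dropout: randomly zero each activation during training and
// scale the survivors by 1/keepProb so the expected value is unchanged,
// meaning no extra scaling is needed at inference time.
function dropout(activations, keepProb) {
  return activations.map(a =>
    Math.random() < keepProb ? a / keepProb : 0);
}
```

Because each neuron may vanish on any given step, no neuron can rely on a specific partner being present, which is the co-adaptation that dropout breaks.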