Central Limit Theorem

Why averages are Normal — the most important theorem in statistics

Why This Matters

Roll a die once and you get any number from 1 to 6 with equal probability: a flat, uniform distribution, nothing like a bell curve. But roll 100 dice and take their average, and something remarkable happens: that average will almost always land close to 3.5. Repeat the experiment thousands of times and the histogram of those averages is very nearly a Normal bell curve centered at 3.5. This is the Central Limit Theorem (CLT), arguably the most important result in all of statistics.

The CLT says that the sampling distribution of the mean approaches a Normal distribution as the sample size grows, regardless of the shape of the original distribution, provided the observations are independent draws from a population with finite variance. The standard error (the standard deviation of this sampling distribution) shrinks as the sample size n increases, specifically as sigma / sqrt(n), where sigma is the population standard deviation. This is why larger samples give more precise estimates. The CLT is the foundation of confidence intervals, hypothesis tests, and polling accuracy.
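
As a quick check of the sigma / sqrt(n) rule, the sketch below works out the population mean and standard deviation of a fair six-sided die from first principles, then the theoretical standard error for a few sample sizes. The names `faces`, `mu`, `sigma`, and `standardError` are illustrative choices, not part of the lesson's own code.

```javascript
// Faces of a fair six-sided die, each with probability 1/6
const faces = [1, 2, 3, 4, 5, 6];

// Population mean: the average of the equally likely faces
const mu = faces.reduce((a, b) => a + b, 0) / faces.length;

// Population variance: the average squared distance from the mean
const popVariance = faces.reduce((a, x) => a + (x - mu) ** 2, 0) / faces.length;
const sigma = Math.sqrt(popVariance);

// Standard error of the mean for a sample of size n
function standardError(sigma, n) {
  return sigma / Math.sqrt(n);
}

console.log(`mu = ${mu}, sigma = ${sigma.toFixed(3)}`);
for (const n of [5, 30, 100]) {
  console.log(`n=${n}: SE = ${standardError(sigma, n).toFixed(3)}`);
}
```

For the die this gives sigma of about 1.708 (the square root of 35/12) and standard errors of roughly 0.764, 0.312, and 0.171 for n = 5, 30, and 100.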

Visual Model

Any Population: any shape distribution
Take Many Samples: each of size n
Compute Sample Means: one mean per sample
Sampling Distribution: distribution of means
Approximately Normal: bell curve shape
Mean = Population Mean: centered at mu
SE = sigma / sqrt(n): gets smaller with more data

The full process at a glance.

The CLT: no matter what shape the population has, sample means form a bell curve. Larger samples make the bell curve narrower.

Code Example

// Demonstrate the Central Limit Theorem
// Population: uniform die rolls (not Normal at all)
function rollDie() {
  return Math.floor(Math.random() * 6) + 1;
}

// Take a sample of size n and return the mean
function sampleMean(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += rollDie();
  }
  return sum / n;
}

// Generate numSamples sample means, each from a sample of size sampleSize,
// and return the mean and SD of those sample means
function cltDemo(sampleSize, numSamples) {
  const means = [];
  for (let i = 0; i < numSamples; i++) {
    means.push(sampleMean(sampleSize));
  }
  
  // Compute mean and SD of the sample means
  const avg = means.reduce((a, b) => a + b, 0) / means.length;
  const variance = means.reduce((a, b) => a + (b - avg) ** 2, 0) / means.length;
  const sd = Math.sqrt(variance);
  
  return { mean: avg, sd: sd };
}

// Population: a fair die has mean 3.5 and SD = sqrt(35/12), about 1.708
console.log("Population mean: 3.5, SD: 1.708");

// CLT with increasing sample sizes
const n5 = cltDemo(5, 10000);
console.log(`n=5:  mean=${n5.mean.toFixed(3)}, SE=${n5.sd.toFixed(3)} (theory: ${(1.708/Math.sqrt(5)).toFixed(3)})`);

const n30 = cltDemo(30, 10000);
console.log(`n=30: mean=${n30.mean.toFixed(3)}, SE=${n30.sd.toFixed(3)} (theory: ${(1.708/Math.sqrt(30)).toFixed(3)})`);

const n100 = cltDemo(100, 10000);
console.log(`n=100: mean=${n100.mean.toFixed(3)}, SE=${n100.sd.toFixed(3)} (theory: ${(1.708/Math.sqrt(100)).toFixed(3)})`);
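
The summary statistics above confirm the center and spread of the sample means but not their shape. One way to eyeball the bell curve without a plotting library is a text histogram. The following is a minimal sketch: `rollOnce`, `meanOfRolls`, `histogramOfMeans`, and `printHistogram` are hypothetical helper names, and the choice of 10 equal-width bins on [1, 6] is an arbitrary assumption.

```javascript
// Roll one fair six-sided die
function rollOnce() {
  return Math.floor(Math.random() * 6) + 1;
}

// Mean of n independent die rolls
function meanOfRolls(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += rollOnce();
  }
  return sum / n;
}

// Count how many sample means fall into each of numBins equal-width bins on [1, 6]
function histogramOfMeans(sampleSize, numSamples, numBins) {
  const counts = new Array(numBins).fill(0);
  const width = 5 / numBins;
  for (let i = 0; i < numSamples; i++) {
    const m = meanOfRolls(sampleSize);
    // Clamp so a mean of exactly 6 lands in the last bin
    const bin = Math.min(numBins - 1, Math.floor((m - 1) / width));
    counts[bin]++;
  }
  return counts;
}

// Print one row of '#' characters per bin, scaled to the tallest bin
function printHistogram(counts, maxBarWidth) {
  const peak = Math.max(...counts);
  const width = 5 / counts.length;
  counts.forEach((count, i) => {
    const lower = (1 + i * width).toFixed(2);
    const bar = "#".repeat(Math.round((count / peak) * maxBarWidth));
    console.log(`${lower} | ${bar} ${count}`);
  });
}

// n=1 is just the raw die (flat, with empty bins between faces);
// larger n should pile up around 3.5
for (const n of [1, 5, 30]) {
  console.log(`Sample size n=${n}`);
  printHistogram(histogramOfMeans(n, 10000, 10), 40);
}
```

With n=1 the rows should look roughly flat, while n=5 and n=30 should show bars concentrating ever more tightly around 3.5.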

Interactive Experiment

Try these exercises:

  • Simulate the CLT with a highly skewed distribution (e.g., exponential: -Math.log(Math.random())). Does the distribution of the sample mean still approach a Normal shape?
  • For n=5, n=30, and n=100, compute the theoretical standard error sigma/sqrt(n) and compare to your simulated SE. How close are they?
  • At what sample size does the distribution of sample means from a uniform population start looking Normal? Try n=2, n=5, n=10, n=30 and compare histograms visually.
  • Verify that the mean of the sampling distribution equals the population mean, regardless of n.
  • Quadruple the sample size from 25 to 100. By what factor does the standard error decrease? Confirm it is sqrt(4) = 2.
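
As a possible starting point for the skewed-distribution exercise, the sketch below draws sample means from an exponential population with rate 1 (mean 1, SD 1, skewness 2) and tracks how the skewness of those means shrinks as n grows. For independent draws, the skewness of the sample mean is the population skewness divided by sqrt(n), so a value near 0 indicates a more symmetric, more Normal-looking shape. The helper names `drawExponential`, `exponentialSampleMeans`, and `skewness` are hypothetical, and the sketch uses 1 - Math.random() instead of Math.random() inside the logarithm, an assumption made to avoid taking the log of 0.

```javascript
// Draw from an exponential distribution with rate 1 (mean 1, SD 1)
function drawExponential() {
  // Math.random() is in [0, 1), so 1 - Math.random() is in (0, 1] and the log is finite
  return -Math.log(1 - Math.random());
}

// Generate numSamples sample means, each from sampleSize exponential draws
function exponentialSampleMeans(sampleSize, numSamples) {
  const means = [];
  for (let i = 0; i < numSamples; i++) {
    let sum = 0;
    for (let j = 0; j < sampleSize; j++) {
      sum += drawExponential();
    }
    means.push(sum / sampleSize);
  }
  return means;
}

// Sample skewness: third central moment divided by the 1.5 power of the variance
function skewness(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const m2 = values.reduce((a, x) => a + (x - mean) ** 2, 0) / n;
  const m3 = values.reduce((a, x) => a + (x - mean) ** 3, 0) / n;
  return m3 / Math.pow(m2, 1.5);
}

for (const n of [1, 5, 30, 100]) {
  const means = exponentialSampleMeans(n, 10000);
  const theory = 2 / Math.sqrt(n);
  console.log(`n=${n}: skewness=${skewness(means).toFixed(2)} (theory: ${theory.toFixed(2)})`);
}
```

The simulated values will vary from run to run, but they should track the theoretical column and fall toward 0 as n increases.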

Coding Challenge

Simulate the Central Limit Theorem

Write a function `simulateCLT(sampleSize, numSamples)` that: (1) generates numSamples random samples, each of size sampleSize, from a uniform distribution on [1, 6] (integers, like a die), (2) computes the mean of each sample, and (3) returns the mean and standard deviation of those sample means, rounded to 2 decimal places, as a string in the format 'mean,sd'. Use Math.floor(Math.random() * 6) + 1 for die rolls.

Real-World Usage

The Central Limit Theorem is the backbone of modern statistics:

  • Polling and surveys: Political polls sample a few thousand people to estimate the opinions of millions. The CLT implies the sample proportion is approximately Normal around the true proportion, so the typical error is on the order of the standard error and a margin of error can be stated at a chosen confidence level.
  • A/B testing: When comparing two website variants, the average metrics (click rate, revenue) follow approximate Normal distributions due to the CLT, enabling z-tests and confidence intervals.
  • Quality control: Manufacturing uses control charts based on the CLT. Sample means of batch measurements should fall within known bands if the process is stable.
  • Finance: Portfolio returns are averages of many asset returns. The CLT is often used to motivate Normal-based risk models even when individual asset returns are not Normal, although the approximation weakens when returns are strongly correlated or heavy-tailed.
  • Machine learning: Stochastic gradient descent averages gradients over mini-batches. The CLT explains why larger batch sizes give more stable gradient estimates.
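
The polling case can be made concrete with a short calculation. The sketch below assumes a simple random sample and uses the usual Normal approximation: the standard error of a sample proportion is sqrt(p(1 - p) / n), and the margin of error at roughly 95% confidence is 1.96 standard errors. The function names and the observed share of 0.5 are hypothetical values chosen for illustration.

```javascript
// Standard error of a sample proportion pHat from n independent responses
function proportionStandardError(pHat, n) {
  return Math.sqrt((pHat * (1 - pHat)) / n);
}

// Margin of error using a Normal critical value z (1.96 for roughly 95% confidence)
function marginOfError(pHat, n, z = 1.96) {
  return z * proportionStandardError(pHat, n);
}

const pHat = 0.5; // hypothetical observed share; 0.5 gives the widest margin
for (const n of [100, 1000, 10000]) {
  const moe = marginOfError(pHat, n);
  console.log(`n=${n}: margin of error = ${(100 * moe).toFixed(1)} percentage points`);
}
```

A sample of 1,000 respondents gives a margin of about 3.1 percentage points, and each tenfold increase in n shrinks the margin by a factor of sqrt(10), which is why polls of a few thousand people are usually considered sufficient.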
