Probability and Stats (30 min)

Hypothesis Testing

Making decisions from data — null hypotheses, p-values, and significance

Why This Matters

Does a new drug actually work, or did patients just get lucky? Does version B of your website genuinely convert better than version A, or is the difference just random noise? Hypothesis testing gives you a rigorous framework for answering these questions. Instead of guessing, you compute the probability of seeing data at least as extreme as yours if there were no real effect. If that probability is tiny, chance alone is a poor explanation, and you treat the result as evidence of a real effect.

The null hypothesis (H0) is the default assumption, usually "there is no effect" or "there is no difference." The p-value measures how surprising your data would be under the null hypothesis: it is the probability of a result at least as extreme as the one observed, assuming H0 is true. It is not the probability that H0 itself is true. A small p-value (conventionally below 0.05) means such data would be unlikely if H0 were true, so you reject H0 in favor of the alternative. This framework underpins much of scientific research, clinical trials, quality control, and A/B testing at tech companies.

Visual Model

Research Question: Is there an effect?
Null Hypothesis H0: No effect (default)
Alternative H1: There is an effect
Collect Data: Run experiment
Test Statistic: z = (x-bar - mu0) / SE
P-value: Probability of data at least this extreme if H0 is true
Reject H0: p < alpha (significant)
Fail to Reject H0: p >= alpha (not significant)

The full process at a glance.

Hypothesis testing: state H0, collect data, compute a test statistic, find the p-value, and make a decision.
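
In symbols, the test statistic and two-tailed p-value from the diagram can be written as below. The notation is an assumption added here: s is the sample standard deviation (n - 1 denominator) and Phi is the standard Normal CDF. The numbers in the worked line are made up purely for illustration.

```latex
\[
z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]
\[
\bar{x} = 103,\ \mu_0 = 100,\ s = 6,\ n = 36
\;\Rightarrow\;
z = \frac{103 - 100}{6 / \sqrt{36}} = 3,
\qquad
p = 2\,\bigl(1 - \Phi(3)\bigr) \approx 0.0027
\]
```

With alpha = 0.05, this hypothetical result falls in the "Reject H0" branch because p < alpha.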

Code Example

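// One-sample z-test: is the sample mean significantly different from mu0?
// The sample standard deviation stands in for a known sigma, so the Normal
// approximation is most trustworthy for larger samples.
function zTest(data, mu0) {
  const n = data.length;
  if (n < 2) throw new Error("zTest needs at least two data points");
  const mean = data.reduce((a, b) => a + b, 0) / n;
  // Sample variance with the n - 1 denominator
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const se = Math.sqrt(variance / n); // standard error of the mean
  const z = (mean - mu0) / se; // not finite if every value is identical (se = 0)

  // Standard Normal CDF via a polynomial approximation
  // (the coefficients match Abramowitz & Stegun formula 26.2.17)
  function normalCDF(x) {
    const t = 1 / (1 + 0.2316419 * Math.abs(x));
    const d = 0.3989422804 * Math.exp(-x * x / 2); // standard Normal density at x
    // p approximates the upper-tail probability P(Z > |x|)
    const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
    return x > 0 ? 1 - p : p;
  }

  // Two-tailed p-value: chance of a |Z| at least as large as the observed |z|
  const pValue = 2 * (1 - normalCDF(Math.abs(z)));

  return { mean, z, pValue, se };
}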

// Example: test if average is different from 100
const data = [102, 105, 98, 110, 97, 103, 108, 101, 106, 99,
              104, 107, 95, 111, 100, 103, 109, 96, 105, 102];
const result = zTest(data, 100);
console.log(`Sample mean: ${result.mean.toFixed(2)}`);
console.log(`Z-statistic: ${result.z.toFixed(4)}`);
console.log(`P-value: ${result.pValue.toFixed(4)}`);
console.log(`Significant at 0.05? ${result.pValue < 0.05}`);

// Example: no real effect
const noEffect = [100, 101, 99, 100, 101, 99, 100, 100, 101, 99];
const result2 = zTest(noEffect, 100);
console.log(`\nNo-effect P-value: ${result2.pValue.toFixed(4)}`);
console.log(`Significant? ${result2.pValue < 0.05}`);

Interactive Experiment

Try these exercises:

  • Run the z-test with data that truly has mean 100 (generate random Normal(100, 10) samples). How often do you get p below 0.05? It should be close to 5% of the time (false positives), and slightly higher for small samples because the test plugs in the sample standard deviation.
  • Try sample sizes of 10, 50, and 200 from Normal(102, 10). For which sample size does the test detect the small effect (mean=102 vs. mu0=100)?
  • Change alpha from 0.05 to 0.01. Does the same data set still get rejected? What is the tradeoff?
  • Generate 20 independent tests where H0 is true. On average, how many show p below 0.05? This demonstrates the multiple testing problem.
  • What happens to the z-statistic if you multiply every data point by 10 and also multiply mu0 by 10? Does the p-value change?
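
One way to approach the first two exercises is a small simulation. The sketch below is only a starting point under stated assumptions: it uses a seeded generator (the widely circulated mulberry32 routine) and the Box-Muller transform to draw Normal samples, and it re-implements the p-value calculation so the block runs on its own. The helper names `mulberry32`, `randomNormal`, `twoTailedP`, and `rejectionRate` are illustrative, not part of the lesson code.

```javascript
// Small seeded PRNG so repeated runs give the same numbers (assumed helper).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Box-Muller transform: turns two uniform draws into one Normal(mean, sd) draw.
function randomNormal(rng, mean, sd) {
  const u1 = 1 - rng(); // in (0, 1], avoids Math.log(0)
  const u2 = rng();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Upper-tail probability P(Z > z) for z >= 0, same polynomial as the lesson code.
function upperTail(z) {
  const t = 1 / (1 + 0.2316419 * z);
  const d = 0.3989422804 * Math.exp(-z * z / 2);
  return d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
}

// Two-tailed z-test p-value using the sample standard deviation.
function twoTailedP(data, mu0) {
  const n = data.length;
  const mean = data.reduce((a, b) => a + b, 0) / n;
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const z = (mean - mu0) / Math.sqrt(variance / n);
  return 2 * upperTail(Math.abs(z));
}

// Fraction of simulated experiments in which H0 (mean = mu0) is rejected.
function rejectionRate(rng, trueMean, sd, n, mu0, alpha, trials) {
  let rejections = 0;
  for (let i = 0; i < trials; i++) {
    const sample = Array.from({ length: n }, () => randomNormal(rng, trueMean, sd));
    if (twoTailedP(sample, mu0) < alpha) rejections++;
  }
  return rejections / trials;
}

const rng = mulberry32(42);

// Exercise 1: H0 is true, so every rejection is a false positive.
const falsePositiveRate = rejectionRate(rng, 100, 10, 50, 100, 0.05, 4000);
console.log(`H0 true, n=50: rejected ${(falsePositiveRate * 100).toFixed(1)}% of the time`);

// Exercise 2: true mean is 102, so the rejection rate is the test's power.
const power = [10, 50, 200].map((n) => ({
  n,
  rate: rejectionRate(rng, 102, 10, n, 100, 0.05, 2000),
}));
for (const { n, rate } of power) {
  console.log(`True mean 102, n=${n}: rejected ${(rate * 100).toFixed(1)}% of the time`);
}
```

The same `rejectionRate` helper can be reused for the alpha exercise by passing 0.01 instead of 0.05 and comparing how both the false-positive rate and the power change.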

Coding Challenge

Perform a Z-Test

Write a function `performZTest(data, mu0, alpha)` that performs a two-tailed z-test. It should return 'reject' if the p-value is less than alpha, and 'fail to reject' otherwise. Compute z = (mean - mu0) / (sd / sqrt(n)) where sd uses n-1 in the denominator. For the p-value, use: pValue = 2 * (1 - normalCDF(|z|)) where normalCDF(x) = 0.5 * (1 + erf(x / sqrt(2))). For JavaScript, use this erf approximation: erf(x) = 1 - (1/(1 + 0.3275911*|x|))^2 * ... (or just use a simpler approach).
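
The erf formula in the challenge text is cut short, and the sketch below does not try to reconstruct it. Instead it shows one possible helper, assuming the Abramowitz and Stegun formula 7.1.26, a published approximation that happens to share the constant 0.3275911. Whether that is the approximation the challenge intends is an assumption. It covers only the `normalCDF` piece and leaves `performZTest` itself to you.

```javascript
// erf approximation (assumed choice: Abramowitz & Stegun 7.1.26).
function erf(x) {
  const sign = x < 0 ? -1 : 1; // erf is odd: erf(-x) = -erf(x)
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard Normal CDF expressed through erf, as in the challenge statement.
function normalCDF(x) {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}

console.log(normalCDF(0).toFixed(4));
console.log(normalCDF(1.96).toFixed(4));
console.log(normalCDF(-1.96).toFixed(4));
```

A quick sanity check is that `normalCDF(0)` should be very close to 0.5 and that `normalCDF(x) + normalCDF(-x)` should be very close to 1 for any x.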

Real-World Usage

Hypothesis testing drives decisions across many industries:

  • A/B testing at tech companies: Google, Netflix, and Amazon run thousands of A/B tests. Each test uses hypothesis testing to decide if a change improves engagement, revenue, or retention.
  • Clinical drug trials: Before a drug reaches the market, randomized controlled trials use hypothesis tests to determine if the drug is more effective than a placebo. Regulators such as the FDA generally expect statistically significant evidence of benefit before approval.
  • Manufacturing quality control: Statistical process control uses hypothesis tests to detect when a production line has drifted from its target specifications, triggering corrective action.
  • Academic research: A large share of empirical papers that claim a discovery rely on hypothesis testing. The p below 0.05 threshold (though debated) remains a common convention for publication in many fields.
  • Fraud detection: Financial systems test whether transaction patterns are significantly different from normal behavior to flag potential fraud.
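
As a concrete illustration of the A/B testing case, conversion rates are often compared with a two-proportion z-test rather than the one-sample test shown earlier. The sketch below assumes a pooled standard error and a two-tailed alternative. The function name `twoProportionZTest` and the visitor and conversion counts are hypothetical, chosen only to show the mechanics.

```javascript
// Upper-tail probability P(Z > z) for z >= 0, same polynomial as the lesson code.
function upperTail(z) {
  const t = 1 / (1 + 0.2316419 * z);
  const d = 0.3989422804 * Math.exp(-z * z / 2);
  return d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
}

// Two-proportion z-test with a pooled standard error (hypothetical helper).
// H0: versions A and B have the same underlying conversion rate.
function twoProportionZTest(conversionsA, visitorsA, conversionsB, visitorsB) {
  const rateA = conversionsA / visitorsA;
  const rateB = conversionsB / visitorsB;
  // Under H0 both groups share one rate, estimated from the combined data.
  const pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  const z = (rateB - rateA) / se;
  const pValue = 2 * upperTail(Math.abs(z));
  return { rateA, rateB, z, pValue };
}

// Made-up numbers: 200 of 5000 visitors convert on A, 250 of 5000 on B.
const ab = twoProportionZTest(200, 5000, 250, 5000);
console.log(`Rate A: ${(ab.rateA * 100).toFixed(2)}%, Rate B: ${(ab.rateB * 100).toFixed(2)}%`);
console.log(`Z-statistic: ${ab.z.toFixed(4)}`);
console.log(`P-value: ${ab.pValue.toFixed(4)}`);
console.log(`Significant at 0.05? ${ab.pValue < 0.05}`);
```

The decision rule is the same as in the rest of the lesson: compare the p-value with the alpha you chose before looking at the data.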
