Why This Matters
Does a new drug actually work, or did patients just get lucky? Does version B of your website genuinely convert better than version A, or is the difference just random noise? Hypothesis testing gives you a rigorous framework to answer these questions. Instead of guessing, you compute the probability of seeing your data if there were no real effect. If that probability is tiny, you conclude the effect is real.
The null hypothesis (H0) is the default assumption, usually "there is no effect" or "there is no difference." The p-value is the probability of observing data at least as extreme as yours if H0 were true. A small p-value (typically less than 0.05) means the data would be very unlikely under H0, so you reject H0 in favor of the alternative. This framework is the backbone of scientific research, clinical trials, quality control, and every A/B test run by every tech company.
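The idea can be made concrete by simulation. This sketch (the helper names `makeRng` and `drawNormal` and all numbers are illustrative, not from the lesson) asks: if the true mean really were 100 with standard deviation 10, how often would a sample of 10 land at least 4 units away from 100 by pure chance? That frequency is, in essence, a p-value.

```javascript
// Simulation sketch of what a p-value means (hypothetical helper names).
// Suppose a sample of 10 patients had mean 104 where "no effect" predicts 100.
function makeRng(seed) {          // small LCG so the run is reproducible
  let s = seed >>> 0;
  return () => {
    s = (1664525 * s + 1013904223) >>> 0;
    return s / 4294967296;
  };
}
const rng = makeRng(42);
function drawNormal(mu, sigma) {  // Box-Muller transform
  const u1 = Math.max(rng(), 1e-12), u2 = rng();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}
const sims = 10000;
let extreme = 0;
for (let i = 0; i < sims; i++) {
  let sum = 0;
  for (let j = 0; j < 10; j++) sum += drawNormal(100, 10);
  if (Math.abs(sum / 10 - 100) >= 4) extreme++;   // as extreme as observed?
}
// The fraction of "at least as extreme" null outcomes is a simulated p-value;
// analytically it is about 0.21 here, far too common to call significant.
console.log(`Simulated p-value: ${(extreme / sims).toFixed(3)}`);
```

A deviation that happens roughly one run in five under H0 is weak evidence of any real effect, which is exactly what a large p-value says.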
Define Terms
Visual Model
The full process at a glance. Click Start tour to walk through each step.
Hypothesis testing: state H0, collect data, compute a test statistic, find the p-value, and make a decision.
Code Example
// Z-test: is the sample mean significantly different from mu0?
function zTest(data, mu0) {
  const n = data.length;
  const mean = data.reduce((a, b) => a + b, 0) / n;
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const se = Math.sqrt(variance / n);
  const z = (mean - mu0) / se;

  // Approximate two-tailed p-value using the Normal CDF
  // (polynomial approximation from the error-function family)
  function normalCDF(x) {
    const t = 1 / (1 + 0.2316419 * Math.abs(x));
    const d = 0.3989422804 * Math.exp(-x * x / 2);
    const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
    return x > 0 ? 1 - p : p;
  }

  const pValue = 2 * (1 - normalCDF(Math.abs(z)));
  return { mean, z, pValue, se };
}

// Example: test if the average is different from 100
const data = [102, 105, 98, 110, 97, 103, 108, 101, 106, 99,
              104, 107, 95, 111, 100, 103, 109, 96, 105, 102];
const result = zTest(data, 100);
console.log(`Sample mean: ${result.mean.toFixed(2)}`);
console.log(`Z-statistic: ${result.z.toFixed(4)}`);
console.log(`P-value: ${result.pValue.toFixed(4)}`);
console.log(`Significant at 0.05? ${result.pValue < 0.05}`);

// Example: no real effect
const noEffect = [100, 101, 99, 100, 101, 99, 100, 100, 101, 99];
const result2 = zTest(noEffect, 100);
console.log(`\nNo-effect P-value: ${result2.pValue.toFixed(4)}`);
console.log(`Significant? ${result2.pValue < 0.05}`);

Interactive Experiment
Try these exercises:
- Run the z-test with data that truly has mean 100 (generate random Normal(100, 10) samples). How often do you get p below 0.05? It should be about 5% of the time (false positives).
- Try sample sizes of 10, 50, and 200 from Normal(102, 10). For which sample size does the test detect the small effect (mean=102 vs. mu0=100)?
- Change alpha from 0.05 to 0.01. Does the same data set still get rejected? What is the tradeoff?
- Generate 20 independent tests where H0 is true. On average, how many show p below 0.05? This demonstrates the multiple testing problem.
- What happens to the z-statistic if you multiply all data by 10 but also change mu0 proportionally? Does the p-value change?
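As a starting point for the first exercise, here is one possible sketch. The seeded generator (`makeRng`) and the inlined z-test helper are illustrative, not part of the lesson's code; with H0 true, the rejection rate should land near alpha = 0.05.

```javascript
// Exercise 1 sketch: estimate the false positive rate of the z-test
// when H0 is actually true (data drawn from Normal(100, 10), mu0 = 100).
function makeRng(seed) {          // small LCG for reproducible runs
  let s = seed >>> 0;
  return () => {
    s = (1664525 * s + 1013904223) >>> 0;
    return s / 4294967296;
  };
}
const rng = makeRng(7);
function drawNormal(mu, sigma) {  // Box-Muller transform
  const u1 = Math.max(rng(), 1e-12), u2 = rng();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}
function normalCDF(x) {           // same polynomial approximation as above
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const d = 0.3989422804 * Math.exp(-x * x / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return x > 0 ? 1 - p : p;
}
function zTestP(data, mu0) {      // two-tailed p-value only
  const n = data.length;
  const mean = data.reduce((a, b) => a + b, 0) / n;
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const z = (mean - mu0) / Math.sqrt(variance / n);
  return 2 * (1 - normalCDF(Math.abs(z)));
}
let rejections = 0;
const trials = 1000;
for (let i = 0; i < trials; i++) {
  const sample = Array.from({ length: 30 }, () => drawNormal(100, 10));
  if (zTestP(sample, 100) < 0.05) rejections++;   // a false positive
}
console.log(`False positive rate: ${(rejections / trials).toFixed(3)}`);
```

Because the test plugs the sample standard deviation into a normal (rather than t) reference distribution, the observed rate tends to sit slightly above 0.05 at this sample size.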
Quick Quiz
Coding Challenge
Write a function `performZTest(data, mu0, alpha)` that performs a two-tailed z-test and returns 'reject' if the p-value is less than alpha, and 'fail to reject' otherwise. Compute z = (mean - mu0) / (sd / sqrt(n)), where sd is the sample standard deviation (n - 1 in the denominator). For the p-value, use pValue = 2 * (1 - normalCDF(|z|)), where normalCDF(x) = 0.5 * (1 + erf(x / sqrt(2))). JavaScript has no built-in erf, so reuse the normalCDF polynomial approximation from the code example above, or any standard erf approximation (the classic polynomial in t = 1/(1 + 0.3275911*|x|) is a common choice).
Real-World Usage
Hypothesis testing drives decisions across every industry:
- A/B testing at tech companies: Google, Netflix, and Amazon run thousands of A/B tests. Each test uses hypothesis testing to decide if a change improves engagement, revenue, or retention.
- Clinical drug trials: Before a drug reaches the market, randomized controlled trials use hypothesis tests to determine if the drug is more effective than a placebo. The FDA requires statistical significance.
- Manufacturing quality control: Statistical process control uses hypothesis tests to detect when a production line has drifted from its target specifications, triggering corrective action.
- Academic research: Nearly every scientific paper that claims a discovery uses hypothesis testing. The p below 0.05 threshold (though debated) remains the standard for publication.
- Fraud detection: Financial systems test whether transaction patterns are significantly different from normal behavior to flag potential fraud.