Why This Matters
How much does an extra year of experience increase salary? How does temperature affect ice cream sales? Linear regression answers these questions by finding the straight line that best fits a set of data points. It is one of the most widely used statistical techniques, and understanding it is a prerequisite for much of machine learning, econometrics, and data science.
The least squares method finds the line that minimizes the sum of squared prediction errors, where each error is the vertical distance between an observed y and the line's prediction. The slope tells you how much the predicted y changes for each unit increase in x. R-squared measures how well the line fits the data: an R-squared of 0.9 means the line explains 90% of the variation in y. From predicting house prices to calibrating scientific instruments to training neural networks (whose layers can loosely be viewed as regressions stacked with nonlinear activations), this technique is everywhere.
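To make "minimizes the sum of squared prediction errors" concrete, the sketch below fits a line to a small made-up dataset and compares its squared error against two nearby lines. The helper names `sumSquaredErrors` and `fitLine` and the data are illustrative assumptions, not part of the lesson's own code.

```javascript
// Sum of squared prediction errors for the line y = slope * x + intercept.
function sumSquaredErrors(xs, ys, slope, intercept) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    const residual = ys[i] - (slope * xs[i] + intercept);
    total += residual ** 2;
  }
  return total;
}

// Closed-form least squares fit for a single predictor.
function fitLine(xs, ys) {
  const n = xs.length;
  const xMean = xs.reduce((a, b) => a + b, 0) / n;
  const yMean = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - xMean) * (ys[i] - yMean);
    sxx += (xs[i] - xMean) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: yMean - slope * xMean };
}

// Made-up data, for illustration only.
const demoX = [1, 2, 3, 4, 5];
const demoY = [2.1, 3.9, 6.2, 7.8, 10.1];

const best = fitLine(demoX, demoY);
const bestError = sumSquaredErrors(demoX, demoY, best.slope, best.intercept);

// Nudge the slope or the intercept away from the fitted values: the error goes up.
const steeperError = sumSquaredErrors(demoX, demoY, best.slope + 0.2, best.intercept);
const shiftedError = sumSquaredErrors(demoX, demoY, best.slope, best.intercept + 0.5);

console.log(bestError < steeperError); // true
console.log(bestError < shiftedError); // true
```

Any other choice of slope and intercept gives a larger total, which is what "best fit" means in the least squares sense.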
Define Terms
Visual Model
Linear regression finds slope and intercept that minimize squared errors. R-squared measures how well the line fits.
Code Example
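// Simple linear regression: y = mx + b
function linearRegression(xArr, yArr) {
  const n = xArr.length;
  if (n < 2 || yArr.length !== n) {
    throw new Error("xArr and yArr must have the same length (at least 2)");
  }
  const xMean = xArr.reduce((a, b) => a + b, 0) / n;
  const yMean = yArr.reduce((a, b) => a + b, 0) / n;

  // Slope: m = sum((xi - xMean)(yi - yMean)) / sum((xi - xMean)^2)
  let numerator = 0, denominator = 0;
  for (let i = 0; i < n; i++) {
    numerator += (xArr[i] - xMean) * (yArr[i] - yMean);
    denominator += (xArr[i] - xMean) ** 2;
  }
  if (denominator === 0) {
    throw new Error("All x values are identical; the slope is undefined");
  }
  const slope = numerator / denominator;
  // Intercept: chosen so the line passes through (xMean, yMean)
  const intercept = yMean - slope * xMean;

  // R-squared = 1 - SSE / SST (NaN when all y values are identical, since SST = 0)
  let sse = 0, sst = 0;
  for (let i = 0; i < n; i++) {
    const predicted = slope * xArr[i] + intercept;
    sse += (yArr[i] - predicted) ** 2; // unexplained variation
    sst += (yArr[i] - yMean) ** 2;     // total variation
  }
  const rSquared = 1 - sse / sst;
  return { slope, intercept, rSquared };
}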
// Example: study hours vs. exam score
const hours = [1, 2, 3, 4, 5, 6, 7, 8];
const scores = [52, 58, 65, 70, 74, 80, 85, 91];
const result = linearRegression(hours, scores);
console.log(`Slope: ${result.slope.toFixed(2)}`); // 5.44
console.log(`Intercept: ${result.intercept.toFixed(2)}`); // 47.39
console.log(`R-squared: ${result.rSquared.toFixed(4)}`); // 0.9970
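// Predict score for 10 hours of study.
// 10 hours lies outside the observed 1-8 range, so this is an extrapolation;
// the result (about 101.8) already exceeds a 100-point scale.
const predicted = result.slope * 10 + result.intercept;
console.log(`Predicted score for 10 hours: ${predicted.toFixed(1)}`);
Interactive Experiment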
Try these exercises:
- Add an outlier such as (5, 120) to the hours/scores data. How does it affect the slope and R-squared?
- Generate perfectly linear data (y = 3x + 2 for x = 1..10). What is R-squared? Now add random noise to y and watch R-squared drop.
- Try regression on data with no relationship (random x, random y). What do you expect for slope and R-squared?
- Compute the regression line for two different datasets with the same mean but different spreads. How does spread affect R-squared?
- Verify that the regression line always passes through the point (x_mean, y_mean).
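As a starting point for the first and last exercises, here is a minimal sketch that refits the hours/scores data with the point (5, 120) added and checks that the fitted line passes through the mean point. The helper `fitWithStats` is a hypothetical name introduced for this sketch; it repeats the lesson's formulas so the snippet runs on its own.

```javascript
// Least squares fit that also reports R-squared and the two means.
function fitWithStats(xs, ys) {
  const n = xs.length;
  const xMean = xs.reduce((a, b) => a + b, 0) / n;
  const yMean = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, sst = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - xMean;
    const dy = ys[i] - yMean;
    sxy += dx * dy;
    sxx += dx * dx;
    sst += dy * dy;
  }
  const slope = sxy / sxx;
  const intercept = yMean - slope * xMean;
  let sse = 0;
  for (let i = 0; i < n; i++) {
    sse += (ys[i] - (slope * xs[i] + intercept)) ** 2;
  }
  return { slope, intercept, rSquared: 1 - sse / sst, xMean, yMean };
}

const studyHours = [1, 2, 3, 4, 5, 6, 7, 8];
const examScores = [52, 58, 65, 70, 74, 80, 85, 91];

const clean = fitWithStats(studyHours, examScores);
// Same data plus the point (5, 120) from the first exercise.
const withOutlier = fitWithStats([...studyHours, 5], [...examScores, 120]);

console.log(`Slope:     ${clean.slope.toFixed(2)} -> ${withOutlier.slope.toFixed(2)}`);
console.log(`R-squared: ${clean.rSquared.toFixed(4)} -> ${withOutlier.rSquared.toFixed(4)}`);

// Last exercise: the prediction at xMean should equal yMean (up to rounding error).
const atMean = withOutlier.slope * withOutlier.xMean + withOutlier.intercept;
console.log(Math.abs(atMean - withOutlier.yMean) < 1e-9); // true
```

Because the added point sits near the middle of the x range, it shifts the slope only modestly while R-squared falls sharply.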
Quick Quiz
Coding Challenge
Write a function `simpleRegression(xArr, yArr)` that computes the slope and intercept of the least-squares regression line. Return a string in the format 'slope,intercept' with both values rounded to 2 decimal places. Use the formulas: slope = sum((xi-xMean)(yi-yMean)) / sum((xi-xMean)^2), intercept = yMean - slope * xMean.
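To check your solution without seeing one, a small harness along these lines may help. `checkSimpleRegression` is a hypothetical helper, not part of the challenge; it assumes your function returns a string such as '3.00,2.00' and compares the parsed numbers with a small tolerance, so minor formatting differences in the decimals do not matter.

```javascript
// Hypothetical checker: `candidate` is your own simpleRegression function.
function checkSimpleRegression(candidate) {
  const cases = [
    // Exactly linear data, y = 3x + 2.
    { x: [1, 2, 3, 4], y: [5, 8, 11, 14], slope: 3, intercept: 2 },
    // Study hours vs. exam scores from the code example.
    {
      x: [1, 2, 3, 4, 5, 6, 7, 8],
      y: [52, 58, 65, 70, 74, 80, 85, 91],
      slope: 5.44,
      intercept: 47.39,
    },
  ];
  return cases.every(({ x, y, slope, intercept }) => {
    const output = candidate(x, y);
    if (typeof output !== "string") return false;
    const parts = output.split(",").map(Number);
    return (
      parts.length === 2 &&
      Math.abs(parts[0] - slope) < 0.006 &&
      Math.abs(parts[1] - intercept) < 0.006
    );
  });
}

// A stub that ignores its input fails the check.
console.log(checkSimpleRegression(() => "0.00,0.00")); // false
```

Pass your own function in place of the stub; a result of `true` means both test cases matched.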
Real-World Usage
Linear regression is foundational across industries:
- Machine learning: Linear regression is the simplest supervised learning algorithm. Understanding it is essential before moving to logistic regression, neural networks, and other complex models.
- Economics: Regression models estimate the effect of education on wages, interest rates on inflation, and advertising spend on sales, using multiple regression to control for other variables.
- Engineering: Calibration curves (mapping sensor readings to actual values) are regression lines. Thermal expansion, stress-strain relationships in the elastic range, and dose-response curves are commonly modeled with regression, sometimes after transforming the data to make the relationship linear.
- Real estate: Housing price prediction models often start with linear regression on features like square footage, bedrooms, and location. Automated valuation tools such as Zillow's Zestimate build on the same idea at massive scale, typically with more complex models.
- Healthcare: Regression models predict patient outcomes based on vital signs, lab results, and demographics. They power clinical decision support systems.
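The calibration use case above can be sketched in a few lines. The sensor counts and reference temperatures below are made-up values for illustration, and `fitCalibration` and `toTemperature` are hypothetical names; a real calibration would use measured reference points and check the residuals.

```javascript
// Fit reference = slope * raw + intercept by least squares.
function fitCalibration(rawReadings, referenceValues) {
  const n = rawReadings.length;
  const rawMean = rawReadings.reduce((a, b) => a + b, 0) / n;
  const refMean = referenceValues.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (rawReadings[i] - rawMean) * (referenceValues[i] - refMean);
    sxx += (rawReadings[i] - rawMean) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: refMean - slope * rawMean };
}

// Made-up calibration points: raw sensor counts vs. reference temperature in °C.
const rawCounts = [100, 200, 300, 400];
const referenceTemps = [10.0, 20.5, 29.5, 40.0];

const calibration = fitCalibration(rawCounts, referenceTemps);

// Convert a new raw reading into an estimated temperature.
const toTemperature = (raw) => calibration.slope * raw + calibration.intercept;

console.log(toTemperature(250).toFixed(1)); // "25.0"
```

Readings far outside the calibrated range (here 100 to 400 counts) are extrapolations and deserve less trust.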