Why This Matters
Singular Value Decomposition (SVD) is arguably the most important factorization in all of linear algebra. Every matrix -- any size, any rank -- can be decomposed into the product of three special matrices: A = U * Sigma * V^T. Here U and V are orthogonal matrices (their columns are perpendicular unit vectors) and Sigma is a diagonal matrix of non-negative values called singular values. This decomposition reveals the fundamental structure of the transformation: the directions it acts on (V), how much it stretches (Sigma), and where the results point (U).
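To make the three-factor structure concrete, here is a small sketch that runs the decomposition in reverse: we hand-pick U and V as rotation matrices (rotations are always orthogonal), choose non-negative singular values for Sigma, and multiply the factors to produce A. Nothing here computes an SVD; the factors are chosen for illustration.

```javascript
// Sketch: assemble A = U * Sigma * V^T from hand-picked factors.
// U and V are rotations (hence orthogonal); Sigma is diagonal and non-negative.
function matMul(A, B) {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((s, v, k) => s + v * B[k][j], 0))
  );
}
function transpose(M) {
  return M[0].map((_, j) => M.map(row => row[j]));
}
const rot = t => [[Math.cos(t), -Math.sin(t)], [Math.sin(t), Math.cos(t)]];

const U = rot(Math.PI / 6);      // left singular vectors (where results point)
const Sigma = [[3, 0], [0, 1]];  // singular values: stretch by 3 and by 1
const V = rot(Math.PI / 4);      // right singular vectors (input directions)

const A = matMul(matMul(U, Sigma), transpose(V));
console.log("A =", A);
// Orthogonality check: U^T * U should be the identity matrix
console.log("U^T U =", matMul(transpose(U), U));
```

The last line prints the 2x2 identity up to floating-point rounding, which is exactly what "orthogonal columns" means. Note also that the trace of A^T A equals 3^2 + 1^2 = 10, the sum of the squared singular values.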
Principal Component Analysis (PCA) is the most widely used dimensionality reduction technique. Given a dataset with many features (columns), PCA finds the directions of maximum variance -- the axes along which the data spreads out the most. By keeping only the top few principal components, you can reduce a 1000-dimensional dataset to 50 dimensions while preserving most of the variance in the data. This makes PCA essential for visualization, noise reduction, and speeding up machine learning algorithms.
PCA is intimately connected to SVD and eigenvalues. The principal components are the eigenvectors of the data covariance matrix, and the amount of variance each component captures is the corresponding eigenvalue. In practice, PCA is usually computed via the SVD of the centered data matrix, which is more numerically stable than explicitly forming the covariance matrix and eigendecomposing it. When you run PCA in scikit-learn or any other mainstream library, SVD is running under the hood.
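For a 2x2 covariance matrix the eigenvector connection can be sketched directly: once an eigenvalue lambda is known, an eigenvector comes from solving (C - lambda*I)v = 0, and projecting each centered point onto that unit eigenvector gives its coordinate along PC1. The helper names below are illustrative, not from any library.

```javascript
// Sketch: unit eigenvector of a 2x2 covariance matrix C for eigenvalue lambda,
// then projection of centered 2D points onto that direction (their PC1 scores).
function eigvec2x2(C, lambda) {
  // From the first row of (C - lambda*I)v = 0:
  // (C[0][0] - lambda)*v0 + C[0][1]*v1 = 0, solved by v = [C[0][1], lambda - C[0][0]]
  const v = Math.abs(C[0][1]) > 1e-12
    ? [C[0][1], lambda - C[0][0]]
    : [1, 0]; // C already diagonal: eigenvectors are the coordinate axes
  const norm = Math.hypot(v[0], v[1]);
  return [v[0] / norm, v[1] / norm];
}
function projectOnto(points, v) {
  return points.map(p => p[0] * v[0] + p[1] * v[1]); // dot product per point
}

// Example: variance concentrated along the diagonal y = x
const C = [[2, 1.5], [1.5, 2]];  // eigenvalues 3.5 and 0.5
const pc1 = eigvec2x2(C, 3.5);   // direction of maximum variance
console.log("PC1 direction:", pc1);
console.log("PC1 scores:", projectOnto([[1, 1], [-1, -1], [1, -1]], pc1));
```

For this C, PC1 comes out as roughly [0.707, 0.707] (the diagonal), and a point like [1, -1], which lies perpendicular to that diagonal, gets a PC1 score of 0.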
Visual Model
The full process at a glance.
SVD decomposes any matrix into orthogonal directions and stretch factors. PCA uses this to find the most informative directions in data and reduce dimensionality.
Code Example
// SVD concepts and PCA from scratch in JavaScript
// Step 1: Compute the covariance matrix of centered data
function mean(arr) {
  return arr.reduce((s, v) => s + v, 0) / arr.length;
}

function centerData(X) {
  // X is an array of row-arrays (each row is a data point)
  const nCols = X[0].length;
  const means = [];
  for (let j = 0; j < nCols; j++) {
    means.push(mean(X.map(row => row[j])));
  }
  return X.map(row => row.map((val, j) => val - means[j]));
}

function covarianceMatrix(X) {
  // X should be centered
  const n = X.length;
  const p = X[0].length;
  const C = Array.from({length: p}, () => new Array(p).fill(0));
  for (let i = 0; i < p; i++) {
    for (let j = 0; j < p; j++) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        sum += X[k][i] * X[k][j];
      }
      C[i][j] = sum / (n - 1);
    }
  }
  return C;
}

// Example dataset: 5 points in 2D
const data = [[2, 4], [4, 6], [6, 8], [3, 7], [5, 5]];
const centered = centerData(data);
console.log("Centered data:", centered);

const cov = covarianceMatrix(centered);
console.log("Covariance matrix:");
cov.forEach(row => console.log(" ", row.map(v => v.toFixed(2))));

// For a 2x2 covariance matrix: eigenvalues from the quadratic formula
const trace = cov[0][0] + cov[1][1];
const det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0];
const disc = Math.sqrt(trace * trace - 4 * det);
const lambda1 = (trace + disc) / 2;
const lambda2 = (trace - disc) / 2;
console.log("Eigenvalues:", lambda1.toFixed(4), lambda2.toFixed(4));
console.log("Variance explained by PC1:",
  (lambda1 / (lambda1 + lambda2) * 100).toFixed(1) + "%");
Interactive Experiment
Try these exercises:
- Create a dataset of 2D points that clearly cluster along a diagonal line. Run the covariance matrix computation. Which eigenvalue is much larger?
- Center the data [[1, 2], [3, 4], [5, 6]]. Verify that each column now has mean zero.
- Compute the covariance matrix of [[1, 0], [0, 1], [-1, 0], [0, -1]]. Is it diagonal? What does that mean about the features?
- Take the covariance matrix and compute its eigenvalues. Verify that they sum to the total variance (the trace of the covariance matrix).
- What percentage of variance does the first principal component capture for your dataset? Try adjusting the data to make it capture more or less.
Coding Challenge
Write a function called `covMatrix(data)` that takes a 2D array of data points (each row is a sample, each column is a feature) and returns the covariance matrix. Steps: (1) compute the mean of each column, (2) center the data by subtracting column means, (3) compute C[i][j] = sum(centered[k][i] * centered[k][j]) / (n - 1) for all pairs i, j. Return the covariance matrix as a 2D array.
Real-World Usage
SVD and PCA are among the most broadly applied techniques in data science and engineering:
- Recommendation systems: Netflix and Spotify use matrix factorization (a form of SVD) to decompose the user-item interaction matrix. The latent factors capture user preferences and item characteristics.
- Image compression: An image matrix can be approximated using only the top k singular values and their corresponding vectors, dramatically reducing storage while preserving visual quality.
- Natural language processing: Latent Semantic Analysis (LSA) applies SVD to the term-document matrix to discover hidden semantic relationships between words and documents.
- Genomics: PCA on gene expression data reveals population structure and identifies the genes that vary most across samples. It is standard in genome-wide association studies.
- Finance: PCA on stock return data identifies the main factors driving market movements. The first principal component often corresponds to overall market direction.
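The rank-k idea behind image compression can be sketched in miniature with power iteration, a simple way to find the top singular direction without a full SVD routine. This is a toy illustration under simplifying assumptions (a tiny matrix, rank 1 so the approximation is exact), not a production compressor; all function names are made up for this sketch.

```javascript
// Sketch: rank-1 approximation via power iteration.
// Iterating v <- normalize(A^T * (A * v)) converges to the top right singular vector.
function matVec(M, v) {
  return M.map(row => row.reduce((s, m, j) => s + m * v[j], 0));
}
function tMatVec(M, v) { // computes M^T * v without forming M^T
  return M[0].map((_, j) => M.reduce((s, row, i) => s + row[j] * v[i], 0));
}
function normalize(v) {
  const n = Math.hypot(...v);
  return v.map(x => x / n);
}
function topSingularTriple(A, iters = 100) {
  // Start from a strictly positive vector so we are not orthogonal to the answer
  let v = normalize(A[0].map(() => Math.random() + 0.1));
  for (let i = 0; i < iters; i++) v = normalize(tMatVec(A, matVec(A, v)));
  const Av = matVec(A, v);
  const sigma = Math.hypot(...Av);   // top singular value
  const u = Av.map(x => x / sigma);  // top left singular vector
  return { u, sigma, v };
}

const A = [[4, 2], [2, 1], [0, 0]]; // rank-1 matrix: every row is a multiple of [2, 1]
const { u, sigma, v } = topSingularTriple(A);
// Rank-1 reconstruction: sigma * u * v^T (equals A here, since A has rank 1)
const A1 = u.map(ui => v.map(vj => sigma * ui * vj));
console.log("top singular value:", sigma.toFixed(4));
console.log("rank-1 approximation:", A1);
```

For a real image, you would keep the top k such triples instead of one: storing k*(rows + cols + 1) numbers instead of rows*cols is where the compression comes from.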