Why This Matters
Not all data comes with labels. When a streaming service groups users by viewing habits, or when a security system flags unusual network activity, there are no pre-existing "correct answers" to learn from. Unsupervised learning discovers structure and patterns that humans might never notice on their own.
While supervised learning is like studying with an answer key, unsupervised learning is like exploring a new city without a map. You find natural groupings, hidden dimensions, and outliers — all without anyone telling you what to look for.
Define Terms
Visual Model
The full process at a glance. Click Start tour to walk through each step.
Unsupervised learning takes unlabeled data and discovers patterns through clustering, dimensionality reduction, or anomaly detection.
Code Example
// K-Means clustering from scratch
// Groups data points into k clusters
function kMeans(points, k, maxIter = 100) {
// Initialize centroids randomly from data
let centroids = points.slice(0, k).map(p => [...p]);
for (let iter = 0; iter < maxIter; iter++) {
// Assign each point to nearest centroid
const clusters = Array.from({ length: k }, () => []);
for (const point of points) {
let minDist = Infinity;
let closest = 0;
for (let c = 0; c < k; c++) {
const dist = Math.hypot(
point[0] - centroids[c][0],
point[1] - centroids[c][1]
);
if (dist < minDist) {
minDist = dist;
closest = c;
}
}
clusters[closest].push(point);
}
// Update centroids to cluster means
for (let c = 0; c < k; c++) {
if (clusters[c].length === 0) continue;
centroids[c] = [
clusters[c].reduce((s, p) => s + p[0], 0) / clusters[c].length,
clusters[c].reduce((s, p) => s + p[1], 0) / clusters[c].length,
];
}
}
return centroids;
}
const data = [[1,1],[1.5,2],[2,1],[5,5],[6,5],[5.5,6]];
console.log(kMeans(data, 2));
// Two centroids near [1.5, 1.3] and [5.5, 5.3]Interactive Experiment
Try modifying the code above:
- Change
kfrom 2 to 3. How do the clusters change? - Add a third cluster of points around
[10, 10]and run with k=3. - What happens if you set k higher than the natural number of groups?
- Try initializing centroids differently — does it affect the final result?
Quick Quiz
Coding Challenge
Write a function called `assignClusters` that takes an array of 2D points and an array of centroid positions, and returns an array of cluster indices (0-based) indicating which centroid each point is closest to. Use Euclidean distance.
Real-World Usage
Unsupervised learning is used wherever labeled data is scarce or nonexistent:
- Customer segmentation: Grouping shoppers by purchasing behavior to tailor marketing campaigns.
- Anomaly detection: Identifying fraudulent credit card transactions that deviate from normal patterns.
- Image compression: Reducing the number of colors in an image by clustering similar pixel values.
- Topic modeling: Discovering themes in large document collections without predefined categories.
- Gene expression analysis: Grouping genes with similar activation patterns to find biological pathways.