Why This Matters
Computers work with numbers, but the real world is full of words, images, and concepts. An embedding is a way to represent any discrete thing — a word, a user, a product — as a list of numbers (a vector) where similar things are close together in vector space.
Embeddings are the bridge between human concepts and machine learning. They power search engines, recommendation systems, and every large language model. The famous "King - Man + Woman = Queen" result showed that embeddings capture meaning as math.
Define Terms
Visual Model
The full process at a glance.
Embeddings convert discrete items into dense vectors where similarity in meaning becomes proximity in space.
Code Example
// Cosine similarity between two vectors
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
// Simulated word embeddings (normally learned, here hardcoded)
const embeddings = {
  "king":  [0.9, 0.2, 0.8, 0.1],
  "queen": [0.8, 0.9, 0.8, 0.1],
  "man":   [0.9, 0.1, 0.2, 0.3],
  "woman": [0.8, 0.8, 0.2, 0.3],
  "cat":   [0.1, 0.3, 0.1, 0.9],
  "dog":   [0.2, 0.3, 0.1, 0.8],
};

// King - Man + Woman should be close to Queen
function vectorArithmetic(a, b, c) {
  return a.map((val, i) => val - b[i] + c[i]);
}

const result = vectorArithmetic(
  embeddings["king"],
  embeddings["man"],
  embeddings["woman"]
);
console.log("King - Man + Woman =", result);

// Find most similar word
for (const [word, vec] of Object.entries(embeddings)) {
  console.log(word + ":", cosineSimilarity(result, vec).toFixed(3));
}
// queen should score highest (with these toy vectors, King - Man + Woman
// is exactly queen's vector, so its cosine similarity is 1.000)
Interactive Experiment
Try these exercises:
- Compute the cosine similarity between "cat" and "dog" versus "cat" and "king". Which pair is more similar?
- Try your own vector arithmetic: what would "queen" - "woman" + "man" give you?
- Change one dimension of the "cat" embedding and recompute similarities. How sensitive are the results?
- What happens to cosine similarity when two vectors point in exactly opposite directions?
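If you want to check your answers to the first and last exercises, here is a sketch; the helper and the toy vectors are copied from the example above so the snippet runs on its own.

```javascript
// Cosine similarity, repeated from the example above
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// The subset of the toy embeddings these exercises need
const embeddings = {
  "king": [0.9, 0.2, 0.8, 0.1],
  "cat":  [0.1, 0.3, 0.1, 0.9],
  "dog":  [0.2, 0.3, 0.1, 0.8],
};

// Exercise 1: cat/dog vs cat/king
const catVsDog  = cosineSimilarity(embeddings["cat"], embeddings["dog"]);
const catVsKing = cosineSimilarity(embeddings["cat"], embeddings["king"]);
console.log("cat vs dog: ", catVsDog.toFixed(3));  // ~0.992: both pets
console.log("cat vs king:", catVsKing.toFixed(3)); // ~0.272: unrelated

// Exercise 4: exactly opposite vectors hit the minimum similarity, -1
const oppSim = cosineSimilarity([1, 0, 0, 0], [-1, 0, 0, 0]);
console.log("opposite vectors:", oppSim); // -1
```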
Quick Quiz
Coding Challenge
Write a function called `mostSimilar` that takes a target word and an object of embeddings, and returns the word (other than the target itself) that has the highest cosine similarity to the target.
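Try it yourself first. One possible solution is sketched below, with the cosine helper and toy embeddings repeated so it runs standalone:

```javascript
// Cosine similarity, repeated from the example above
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

const embeddings = {
  "king":  [0.9, 0.2, 0.8, 0.1],
  "queen": [0.8, 0.9, 0.8, 0.1],
  "man":   [0.9, 0.1, 0.2, 0.3],
  "woman": [0.8, 0.8, 0.2, 0.3],
  "cat":   [0.1, 0.3, 0.1, 0.9],
  "dog":   [0.2, 0.3, 0.1, 0.8],
};

// Scan every word except the target, tracking the best score seen
function mostSimilar(target, embeddings) {
  const targetVec = embeddings[target];
  let bestWord = null, bestScore = -Infinity;
  for (const [word, vec] of Object.entries(embeddings)) {
    if (word === target) continue;
    const score = cosineSimilarity(targetVec, vec);
    if (score > bestScore) {
      bestScore = score;
      bestWord = word;
    }
  }
  return bestWord;
}

console.log(mostSimilar("king", embeddings)); // "queen"
```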
Real-World Usage
Embeddings are one of the most impactful ideas in modern machine learning:
- Search engines: Convert queries and documents into embeddings, then find documents closest to the query vector.
- Recommendation systems: Netflix embeds users and movies into the same space — your vector is close to movies you would enjoy.
- Large language models: Every token in GPT and Claude starts as an embedding that gets refined through transformer layers.
- Image search: CLIP embeds both images and text into the same vector space, enabling text-to-image search.
- Anomaly detection: Embed transactions or events; outliers are points far from any cluster.
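As a toy version of the search-engine bullet: embed the query, then rank documents by cosine similarity to it. The document titles and vectors below are made up for illustration, and real systems replace the linear scan with an approximate nearest-neighbor index (e.g. FAISS or HNSW):

```javascript
// Cosine similarity, repeated from the example above
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Hypothetical document embeddings (a real system gets these from a model)
const docs = {
  "cat care guide": [0.1, 0.3, 0.1, 0.9],
  "royal history":  [0.85, 0.5, 0.8, 0.1],
  "dog training":   [0.2, 0.3, 0.1, 0.8],
};

// Pretend this vector came from embedding the query "kings and queens"
const queryVec = [0.9, 0.2, 0.8, 0.1];

// Rank all documents by similarity to the query, highest first
const ranked = Object.entries(docs)
  .map(([title, vec]) => [title, cosineSimilarity(queryVec, vec)])
  .sort((a, b) => b[1] - a[1]);

console.log(ranked[0][0]); // "royal history"
```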