Why This Matters
Imagine a spreadsheet where every order row also contains the customer's name, address, and phone number. When a customer moves, you have to update their address in every single order row. Miss one, and now you have conflicting addresses for the same person. This is the problem normalization solves -- it organizes data to eliminate redundancy and prevent inconsistencies.
Normalization is a formal process with specific rules called normal forms. It is how database designers decide which columns belong in which tables. Get it right, and your data stays consistent and your queries stay clean. Get it wrong, and you will fight data corruption and update anomalies for the life of your application.
Define Terms
Visual Model
The full process at a glance. Click Start tour to walk through each step.
Normalization removes redundancy step by step. Each normal form eliminates a class of data anomalies.
Code Example
-- UNNORMALIZED: customer data repeated in every order
CREATE TABLE orders_bad (
order_id INTEGER PRIMARY KEY,
customer_name TEXT,
customer_email TEXT,
customer_city TEXT,
product_name TEXT,
product_price DECIMAL(10,2),
quantity INTEGER
);
-- Problem: if Alice moves cities, update EVERY order row
-- Problem: if product price changes, old orders show wrong price
-- NORMALIZED (3NF): separate concerns into distinct tables
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
city TEXT
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price DECIMAL(10,2) NOT NULL
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
id INTEGER PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
unit_price DECIMAL(10,2) NOT NULL -- snapshot at time of order
);
-- Now: update customer city in ONE place
UPDATE customers SET city = New York WHERE id = 1;
-- Query with JOINs to reconstruct the full picture
SELECT c.name, p.name AS product, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.id = 1;Interactive Experiment
Try these exercises:
- Take a flat spreadsheet with repeated data (e.g., order rows with customer info) and normalize it into 3NF by creating separate tables.
- Identify the update anomaly: change a customer's email in the unnormalized version. How many rows must you update?
- Create the normalized version and make the same change. How many rows now?
- Think of a case where denormalization makes sense. When would you store redundant data on purpose?
Quick Quiz
Coding Challenge
Write a function called `normalize` that takes an array of flat order objects (each with customer_name, customer_email, product_name, product_price, quantity) and returns a normalized structure with separate customers, products, and orders collections. Each entity should have a unique ID, and orders should reference customers and products by ID.
Real-World Usage
Normalization decisions affect every aspect of application design:
- Schema design: Database architects normalize to 3NF by default, then selectively denormalize based on query patterns.
- ORMs: Object-relational mappers expect normalized schemas. Each model class maps to one table, and relationships map to foreign keys.
- Data warehousing: Analytics databases often denormalize into "star schemas" with a central fact table surrounded by dimension tables, optimized for aggregate queries.
- Caching layers: Denormalized cache entries (Redis, Memcached) store pre-joined data to avoid expensive real-time JOINs.
- API design: REST resources often map to normalized tables. A
/users/1/ordersendpoint performs the JOIN for you.