databases30 min

Indexes

Speeding up queries with B-tree and hash indexes

0/9Not Started

Why This Matters

Your database has a million users. You run SELECT * FROM users WHERE email = 'alice@test.com'. Without an index, the database must scan every single row -- all one million of them -- to find Alice. That is a full table scan, and it is painfully slow. With an index on the email column, the database jumps directly to Alice's row in milliseconds.

Indexes are the single most important performance tool in databases. A slow query that takes 30 seconds can drop to 5 milliseconds with the right index. But indexes are not free -- they consume disk space and slow down writes. Understanding when and how to use them is what separates a functioning app from a fast one.

Define Terms

Visual Model

Root[10, 20]
Branch[3, 7]
Branch[15, 18]
Leaf1, 2, 3
Leaf4, 5, 7
Leaf10, 12, 15
Leaf16, 18, 20
Row Dataid=12, email=alice@...
WHERE id = 12
lookup

The full process at a glance. Click Start tour to walk through each step.

Indexes turn O(n) full-table scans into O(log n) or O(1) lookups, but at the cost of write performance.

Code Example

Code
-- Without an index: full table scan
SELECT * FROM users WHERE email = alice@test.com;
-- Scans all rows, O(n)

-- Create a B-tree index on the email column
CREATE INDEX idx_users_email ON users(email);

-- Now the same query uses the index: O(log n)
SELECT * FROM users WHERE email = alice@test.com;

-- See the query plan with EXPLAIN
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = alice@test.com;
-- Output: SEARCH users USING INDEX idx_users_email (email=?)

-- Composite index: multiple columns
CREATE INDEX idx_orders_user_date
ON orders(user_id, created_at);

-- This index helps queries that filter by user_id,
-- or by user_id AND created_at together
SELECT * FROM orders
WHERE user_id = 1 AND created_at > 2024-01-01;

-- But NOT queries that only filter by created_at
-- (leftmost column must be in the WHERE clause)

-- Unique index: also enforces uniqueness
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Drop an index you no longer need
DROP INDEX idx_users_email;

Interactive Experiment

Try these exercises:

  • Create a table with 10,000 rows (you can generate them with a loop in SQL or Python). Time a query with and without an index.
  • Run EXPLAIN QUERY PLAN (SQLite) or EXPLAIN ANALYZE (PostgreSQL) on a query before and after adding an index. Compare the output.
  • Create a composite index on two columns. Test which queries use it and which do not (hint: the leftmost column matters).
  • Try adding 5 indexes to a table, then benchmark INSERT performance vs a table with no indexes.

Quick Quiz

Coding Challenge

Build a Simple In-Memory Index

Write a class called `IndexedTable` that stores rows (objects) and maintains a hash index on a specified column. It should support `insert(row)`, `findByIndex(value)` for O(1) lookup by the indexed column, and `scan(filterFn)` for O(n) full-table scan when no index helps.

Loading editor...

Real-World Usage

Indexes are critical in every production database:

  • Login systems: An index on the email column ensures instant user lookup during authentication instead of scanning millions of rows.
  • Search features: Composite indexes on (category, price) let e-commerce sites filter products efficiently.
  • Analytics dashboards: Indexes on timestamp columns make "last 7 days" queries fast even on tables with billions of rows.
  • Full-text search: Specialized indexes (GIN, GiST in PostgreSQL) support keyword search across text columns.
  • Unique constraints: Unique indexes prevent duplicate emails, usernames, or order numbers at the database level.

Connections