B+ Trees: Leaf-Linked Trees for Range Queries and Database Indexing
How B+ Trees extend B-Trees by storing all data in linked leaf nodes, enabling fast range scans that power every major database index.
Terminology
- B+ Tree: a variant of the B-Tree where all actual data records (or pointers to them) live exclusively in leaf nodes
- Internal node: a node that stores only keys and child pointers, never data; it acts purely as a routing guide
- Leaf node: a node at the bottom level that holds keys and their associated data (or record pointers)
- Leaf chain: a doubly or singly linked list connecting all leaf nodes left to right in sorted key order
- Range query: a search for all keys between a lower bound and an upper bound, e.g., "find all orders between Jan 1 and Jan 31"
- Fan-out: the number of child pointers in an internal node; higher fan-out means a shallower tree
- Clustered index: an index where the physical order of rows on disk matches the index order
- Non-clustered index: an index where leaf entries point to rows stored elsewhere on disk
- Fill factor: the percentage of a node's capacity that is filled; databases let you tune this to leave room for future inserts
- Page: a fixed-size block (typically 4-16 KB) that the database reads from disk in one I/O operation; each B+ Tree node maps to one page
What & Why
A B+ Tree is the data structure behind nearly every database index you have ever used. It is a refinement of the B-Tree with one critical difference: internal nodes store only keys for routing, and all actual data lives in the leaf nodes. The leaves are linked together in a sorted chain, which makes range queries trivially fast: find the starting leaf, then walk the chain.
Why does this matter? Databases spend most of their time answering two kinds of queries: point lookups ("find the row with id = 42") and range scans ("find all rows where date is between X and Y"). A plain B-Tree handles point lookups well, but range scans require repeated tree traversals. The B+ Tree's leaf chain turns a range scan into a sequential read, which is exactly what disks are good at.
Key differences from a B-Tree:
- Internal nodes hold only keys and child pointers, no data. This means more keys fit per internal node, increasing fan-out and reducing tree height.
- All data lives in leaf nodes. Every search, whether point or range, ends at a leaf.
- Leaf nodes are linked in sorted order, enabling sequential scans without backtracking up the tree.
- Routing keys are duplicated: a key that appears in an internal node typically also appears as an actual key in a leaf (though after deletions a routing key may no longer have a matching leaf entry).
How It Works
Structure
Internal nodes partition the key space. Each leaf holds a sorted run of keys with associated data pointers, plus a "next" pointer to the following leaf.
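This structure can be sketched in Python. The sketch below is a minimal in-memory model with illustrative field names; a real engine would pack each node into a fixed-size page:

```python
from dataclasses import dataclass, field

@dataclass
class InternalNode:
    # Routing only: keys[i] separates children[i] (keys < keys[i])
    # from children[i+1] (keys >= keys[i]); no data is stored here.
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # len(children) == len(keys) + 1

@dataclass
class LeafNode:
    # All data lives here, in sorted key order.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)
    next: "LeafNode | None" = None  # leaf chain pointer for range scans
```

The `next` pointer is what distinguishes the leaf level: following it visits every key in sorted order without touching the internal nodes again.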
Point Lookup
Start at the root. At each internal node, compare the search key against the routing keys to pick the correct child pointer. Repeat until you reach a leaf. Scan the leaf's keys for a match. The cost is one disk read per tree level.
Range Query
Perform a point lookup for the lower bound to land on the correct leaf. Then follow the leaf chain forward, reading consecutive leaves until you pass the upper bound. Because leaves are linked and sorted, this is a sequential scan, which is very fast on disk.
Insertion
Find the target leaf via a root-to-leaf traversal. Insert the key in sorted position within the leaf. If the leaf overflows (more than $m - 1$ keys), split it into two leaves, copy the first key of the new right leaf up to the parent as a routing key, and link the new leaf into the chain. Splits can cascade upward just like in a B-Tree.
Deletion
Find the target leaf and remove the key. If the leaf underflows (fewer than $\lceil m/2 \rceil - 1$ keys), borrow from a sibling or merge with a sibling. When merging, update the parent's routing keys. Note that even if a deleted key appears as a routing key in an internal node, it can remain there: routing keys only need to correctly partition the key space, not correspond to existing data.
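The last point is worth seeing concretely. Below is a minimal sketch (dict-based leaves and a one-key routing array; all names are illustrative) showing that a routing key can outlive the leaf key it was copied from:

```python
import bisect

# Two leaves under one routing key: seps = [20] sends keys >= 20 right.
left  = {"keys": [5, 10],  "vals": ["a", "b"]}
right = {"keys": [20, 30], "vals": ["c", "d"]}
seps, leaves = [20], [left, right]

def search(key):
    leaf = leaves[bisect.bisect_right(seps, key)]
    j = bisect.bisect_left(leaf["keys"], key)
    if j < len(leaf["keys"]) and leaf["keys"][j] == key:
        return leaf["vals"][j]
    return None

# Delete key 20 from its leaf, but leave the stale routing key in place.
right["keys"].pop(0)
right["vals"].pop(0)

# The routing key 20 still partitions the key space correctly:
assert search(30) == "d"    # still routed to the right leaf
assert search(20) is None   # deleted key is simply not found
assert search(10) == "b"    # left leaf unaffected
```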
Complexity Analysis
For a B+ Tree of order $m$ with $n$ keys:
| Operation | Disk I/O | Notes |
|---|---|---|
| Point lookup | $O(\log_m n)$ | One page read per level |
| Range query ($k$ results) | $O(\log_m n + k/m)$ | Traverse to start leaf, then scan $k/m$ consecutive leaves |
| Insert | $O(\log_m n)$ | Plus possible splits |
| Delete | $O(\log_m n)$ | Plus possible merges |
| Space | $O(n)$ | Slight overhead from duplicated routing keys |
The range query cost is the key advantage. In a plain B-Tree, a range scan of $k$ results could require $O(k \cdot \log_m n)$ I/O in the worst case (re-traversing from the root for each result). The B+ Tree's leaf chain reduces this to $O(\log_m n + k/m)$.
Because internal nodes store only keys and child pointers (no data pointers), more keys fit per page. If each key-plus-child-pointer entry takes $k$ bytes and each data pointer takes an additional $d$ bytes, a B+ Tree internal node of the same page size fits roughly $(k + d)/k$ times as many keys as a B-Tree node, increasing the effective fan-out.
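To make the fan-out arithmetic concrete, here is a back-of-the-envelope calculation. The 16 KB page and 8-byte key and pointer sizes are illustrative assumptions (roughly matching a BIGINT key on a default InnoDB page):

```python
import math

PAGE = 16 * 1024   # bytes per page (assumed)
KEY = 8            # bytes per key, e.g., a BIGINT (assumed)
PTR = 8            # bytes per child or data pointer (assumed)

# B+ Tree internal node: each entry is key + child pointer.
bplus_fanout = PAGE // (KEY + PTR)        # 1024 children per node

# B-Tree node: each entry also carries a data pointer.
btree_fanout = PAGE // (KEY + PTR + PTR)  # 682 children per node

# Levels needed to reach any of 100 million keys.
n = 100_000_000
bplus_height = math.ceil(math.log(n, bplus_fanout))
btree_height = math.ceil(math.log(n, btree_fanout))
print(bplus_fanout, btree_fanout, bplus_height, btree_height)
```

At these sizes both trees happen to be three levels deep, but the B+ Tree's fan-out is roughly 1.5x higher, and the gap widens as keys or data pointers grow; fewer levels means fewer page reads per lookup.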
Implementation
ALGORITHM BPlusTreeSearch(tree, key)
INPUT: tree: a B+ Tree, key: the value to find
OUTPUT: (leaf, index) or NOT_FOUND
BEGIN
node <- tree.root
WHILE NOT node.isLeaf DO
i <- 0
WHILE i < node.numKeys AND key >= node.keys[i] DO
i <- i + 1
END WHILE
node <- DiskRead(node.children[i])
END WHILE
FOR i <- 0 TO node.numKeys - 1 DO
IF node.keys[i] = key THEN
RETURN (node, i)
END IF
END FOR
RETURN NOT_FOUND
END
ALGORITHM BPlusTreeRangeScan(tree, low, high)
INPUT: tree: a B+ Tree, low: lower bound, high: upper bound
OUTPUT: list of all keys where low <= key <= high
BEGIN
result <- empty list
leaf <- FindLeaf(tree, low)
WHILE leaf IS NOT NULL DO
FOR i <- 0 TO leaf.numKeys - 1 DO
IF leaf.keys[i] > high THEN
RETURN result
END IF
IF leaf.keys[i] >= low THEN
Append(result, leaf.keys[i])
END IF
END FOR
leaf <- leaf.next
END WHILE
RETURN result
END
ALGORITHM BPlusTreeInsert(tree, key, data)
INPUT: tree, key to insert, associated data
BEGIN
leaf <- FindLeaf(tree, key)
InsertIntoLeaf(leaf, key, data)
IF leaf.numKeys > MaxLeafKeys THEN
SplitLeaf(leaf)
END IF
END
ALGORITHM SplitLeaf(leaf)
BEGIN
newLeaf <- AllocateLeafNode()
mid <- floor(leaf.numKeys / 2)
Move keys[mid .. leaf.numKeys-1] and data to newLeaf
newLeaf.next <- leaf.next
leaf.next <- newLeaf
newLeaf.numKeys <- leaf.numKeys - mid
leaf.numKeys <- mid
InsertIntoParent(leaf, newLeaf.keys[0], newLeaf)
END
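The algorithms above can be exercised with a runnable sketch. To keep it short, this version uses a single routing level (a sorted separator array standing in for the internal nodes), so it never splits internal nodes; the leaf-split, leaf-chain, and range-scan logic matches the pseudocode. The capacity of 4 and all names are illustrative:

```python
import bisect

LEAF_CAP = 4  # max keys per leaf (illustrative)

class Leaf:
    def __init__(self):
        self.keys, self.vals, self.next = [], [], None

class BPlusTree:
    """Two-level sketch: one routing array over a chain of leaves.
    A full implementation would make the routing level a tree and
    split internal nodes too; the leaf logic is the same."""
    def __init__(self):
        self.seps = []          # seps[i] = first key of leaves[i+1]
        self.leaves = [Leaf()]  # children of the single routing level

    def _find_leaf(self, key):
        # Keys >= seps[i] route to the right of separator i.
        i = bisect.bisect_right(self.seps, key)
        return i, self.leaves[i]

    def insert(self, key, val):
        i, leaf = self._find_leaf(key)
        j = bisect.bisect_left(leaf.keys, key)
        leaf.keys.insert(j, key)
        leaf.vals.insert(j, val)
        if len(leaf.keys) > LEAF_CAP:  # overflow: split the leaf
            mid = len(leaf.keys) // 2
            new = Leaf()
            new.keys, new.vals = leaf.keys[mid:], leaf.vals[mid:]
            del leaf.keys[mid:], leaf.vals[mid:]
            new.next, leaf.next = leaf.next, new  # link into the chain
            # Copy the first key of the new right leaf up as a routing key.
            self.seps.insert(i, new.keys[0])
            self.leaves.insert(i + 1, new)

    def search(self, key):
        _, leaf = self._find_leaf(key)
        j = bisect.bisect_left(leaf.keys, key)
        if j < len(leaf.keys) and leaf.keys[j] == key:
            return leaf.vals[j]
        return None

    def range_scan(self, low, high):
        _, leaf = self._find_leaf(low)  # point lookup for the lower bound
        out = []
        while leaf is not None:         # then walk the leaf chain
            for k, v in zip(leaf.keys, leaf.vals):
                if k > high:
                    return out
                if k >= low:
                    out.append((k, v))
            leaf = leaf.next
        return out
```

A quick usage example: inserting keys 1 through 10 forces several leaf splits, after which `range_scan(4, 8)` walks the chain and returns exactly the keys 4 through 8 in order.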
Real-World Applications
- MySQL/InnoDB: the default storage engine uses B+ Trees for both clustered (primary key) and secondary indexes; the clustered index stores actual row data in leaf pages
- PostgreSQL: uses B+ Tree indexes (called "btree" in PostgreSQL) as the default index type for most queries
- SQLite: stores both table data and indexes as B+ Trees within its single-file database format
- File systems: NTFS uses B+ Trees for its directory indexes; ext4's directory indexing (htree) is a hashed B-Tree variant
- MongoDB: WiredTiger storage engine uses B+ Trees for its indexes alongside LSM Trees for certain workloads
Key Takeaways
- B+ Trees separate routing (internal nodes) from data storage (leaf nodes), maximizing fan-out and minimizing tree height
- The linked leaf chain enables $O(\log_m n + k/m)$ range scans, which is why every major relational database uses B+ Trees for indexing
- Point lookups cost the same as a B-Tree: $O(\log_m n)$ disk reads
- Internal nodes hold more keys per page than B-Tree nodes (no data pointers), so the tree is shallower for the same dataset
- B+ Trees are arguably the single most important data structure in database engineering