B+ Trees: Leaf-Linked Trees for Range Queries and Database Indexing
How B+ Trees extend B-Trees by storing all data in linked leaf nodes, enabling fast range scans that power every major database index.
Terminology
- B+ Tree: a variant of the B-Tree where all actual data records (or pointers to them) live exclusively in leaf nodes
- Internal node: a node that stores only keys and child pointers, never data; it acts purely as a routing guide
- Leaf node: a node at the bottom level that holds keys and their associated data (or record pointers)
- Leaf chain: a doubly or singly linked list connecting all leaf nodes left to right in sorted key order
- Range query: a search for all keys between a lower bound and an upper bound, e.g., "find all orders between Jan 1 and Jan 31"
- Fan-out: the number of child pointers in an internal node; higher fan-out means a shallower tree
- Clustered index: an index where the physical order of rows on disk matches the index order
- Non-clustered index: an index where leaf entries point to rows stored elsewhere on disk
- Fill factor: the percentage of a node's capacity that is filled; databases let you tune this to leave room for future inserts
- Page: a fixed-size block (typically 4-16 KB) that the database reads from disk in one I/O operation; each B+ Tree node maps to one page
What & Why
A B+ Tree is the data structure behind nearly every database index you have ever used. It is a refinement of the B-Tree with one critical difference: internal nodes store only keys for routing, and all actual data lives in the leaf nodes. The leaves are linked together in a sorted chain, which makes range queries trivially fast: find the starting leaf, then walk the chain.
Why does this matter? Databases spend most of their time answering two kinds of queries: point lookups ("find the row with id = 42") and range scans ("find all rows where date is between X and Y"). A plain B-Tree handles point lookups well, but range scans require repeated tree traversals. The B+ Tree's leaf chain turns a range scan into a sequential read, which is exactly what disks are good at.
Key differences from a B-Tree:
- Internal nodes hold only keys and child pointers, no data. This means more keys fit per internal node, increasing fan-out and reducing tree height.
- All data lives in leaf nodes. Every search, whether point or range, ends at a leaf.
- Leaf nodes are linked in sorted order, enabling sequential scans without backtracking up the tree.
- Routing keys are duplicated: a key that appears in an internal node typically also appears as an actual key in a leaf (though after deletions a routing key may no longer have a matching leaf entry).
How It Works
Structure
Internal nodes partition the key space. Each leaf holds a sorted run of keys with associated data pointers, plus a "next" pointer to the following leaf.
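This structure can be sketched in Python. The sketch below is a minimal in-memory model with illustrative field names; a real engine would pack each node into a fixed-size page:

```python
from dataclasses import dataclass, field

@dataclass
class InternalNode:
    # Routing only: keys[i] separates children[i] (keys < keys[i])
    # from children[i+1] (keys >= keys[i]); no data is stored here.
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # len(children) == len(keys) + 1

@dataclass
class LeafNode:
    # All data lives here, in sorted key order.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)
    next: "LeafNode | None" = None  # leaf chain pointer for range scans
```

The `next` pointer is what distinguishes the leaf level: following it visits every key in sorted order without touching the internal nodes again.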
Point Lookup
Start at the root. At each internal node, compare the search key against the routing keys to pick the correct child pointer. Repeat until you reach a leaf. Scan the leaf's keys for a match. The cost is one disk read per tree level.
Range Query
Perform a point lookup for the lower bound to land on the correct leaf. Then follow the leaf chain forward, reading consecutive leaves until you pass the upper bound. Because leaves are linked and sorted, this is a sequential scan, which is very fast on disk.
Insertion
Find the target leaf via a root-to-leaf traversal. Insert the key in sorted position within the leaf. If the leaf overflows (more than $m - 1$ keys), split it into two leaves, copy the first key of the new right leaf up to the parent as a routing key, and link the new leaf into the chain. Splits can cascade upward just like in a B-Tree.
Deletion
Find the target leaf and remove the key. If the leaf underflows (fewer than $\lceil m/2 \rceil - 1$ keys), borrow from a sibling or merge with a sibling. When merging, update the parent's routing keys. Note that even if a deleted key appears as a routing key in an internal node, it can remain there: routing keys only need to correctly partition the key space, not correspond to existing data.
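The last point is worth seeing concretely. Below is a minimal sketch (dict-based leaves and a one-key routing array; all names are illustrative) showing that a routing key can outlive the leaf key it was copied from:

```python
import bisect

# Two leaves under one routing key: seps = [20] sends keys >= 20 right.
left  = {"keys": [5, 10],  "vals": ["a", "b"]}
right = {"keys": [20, 30], "vals": ["c", "d"]}
seps, leaves = [20], [left, right]

def search(key):
    leaf = leaves[bisect.bisect_right(seps, key)]
    j = bisect.bisect_left(leaf["keys"], key)
    if j < len(leaf["keys"]) and leaf["keys"][j] == key:
        return leaf["vals"][j]
    return None

# Delete key 20 from its leaf, but leave the stale routing key in place.
right["keys"].pop(0)
right["vals"].pop(0)

# The routing key 20 still partitions the key space correctly:
assert search(30) == "d"    # still routed to the right leaf
assert search(20) is None   # deleted key is simply not found
assert search(10) == "b"    # left leaf unaffected
```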
Complexity Analysis
For a B+ Tree of order $m$ with $n$ keys:
| Operation | Disk I/O | Notes |
|---|---|---|
| Point lookup | $O(\log_m n)$ | One page read per level |
| Range query ($k$ results) | $O(\log_m n + k/m)$ | Traverse to start leaf, then scan $k/m$ consecutive leaves |
| Insert | $O(\log_m n)$ | Plus possible splits |
| Delete | $O(\log_m n)$ | Plus possible merges |
| Space | $O(n)$ | Slight overhead from duplicated routing keys |
The range query cost is the key advantage. In a plain B-Tree, a range scan of $k$ results could require $O(k \cdot \log_m n)$ I/O in the worst case (re-traversing from the root for each result). The B+ Tree's leaf chain reduces this to $O(\log_m n + k/m)$.
Because internal nodes store only keys and child pointers (no data pointers), more keys fit per page. If each key-plus-child-pointer entry takes $k$ bytes and each data pointer takes an additional $d$ bytes, a B+ Tree internal node of the same page size fits roughly $(k + d)/k$ times as many keys as a B-Tree node, increasing the effective fan-out.
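To make the fan-out arithmetic concrete, here is a back-of-the-envelope calculation. The 16 KB page and 8-byte key and pointer sizes are illustrative assumptions (roughly matching a BIGINT key on a default InnoDB page):

```python
import math

PAGE = 16 * 1024   # bytes per page (assumed)
KEY = 8            # bytes per key, e.g., a BIGINT (assumed)
PTR = 8            # bytes per child or data pointer (assumed)

# B+ Tree internal node: each entry is key + child pointer.
bplus_fanout = PAGE // (KEY + PTR)        # 1024 children per node

# B-Tree node: each entry also carries a data pointer.
btree_fanout = PAGE // (KEY + PTR + PTR)  # 682 children per node

# Levels needed to reach any of 100 million keys.
n = 100_000_000
bplus_height = math.ceil(math.log(n, bplus_fanout))
btree_height = math.ceil(math.log(n, btree_fanout))
print(bplus_fanout, btree_fanout, bplus_height, btree_height)
```

At these sizes both trees happen to be three levels deep, but the B+ Tree's fan-out is roughly 1.5x higher, and the gap widens as keys or data pointers grow; fewer levels means fewer page reads per lookup.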
Implementation
ALGORITHM BPlusTreeSearch(tree, key)
INPUT: tree: a B+ Tree, key: the value to find
OUTPUT: (leaf, index) or NOT_FOUND
BEGIN
node <- tree.root
WHILE NOT node.isLeaf DO
i <- 0
WHILE i < node.numKeys AND key >= node.keys[i] DO
i <- i + 1
END WHILE
node <- DiskRead(node.children[i])
END WHILE
FOR i <- 0 TO node.numKeys - 1 DO
IF node.keys[i] = key THEN
RETURN (node, i)
END IF
END FOR
RETURN NOT_FOUND
END
ALGORITHM BPlusTreeRangeScan(tree, low, high)
INPUT: tree: a B+ Tree, low: lower bound, high: upper bound
OUTPUT: list of all keys where low <= key <= high
BEGIN
result <- empty list
leaf <- FindLeaf(tree, low)
WHILE leaf IS NOT NULL DO
FOR i <- 0 TO leaf.numKeys - 1 DO
IF leaf.keys[i] > high THEN
RETURN result
END IF
IF leaf.keys[i] >= low THEN
Append(result, leaf.keys[i])
END IF
END FOR
leaf <- leaf.next
END WHILE
RETURN result
END
ALGORITHM BPlusTreeInsert(tree, key, data)
INPUT: tree, key to insert, associated data
BEGIN
leaf <- FindLeaf(tree, key)
InsertIntoLeaf(leaf, key, data)
IF leaf.numKeys > MaxLeafKeys THEN
SplitLeaf(leaf)
END IF
END
ALGORITHM SplitLeaf(leaf)
BEGIN
newLeaf <- AllocateLeafNode()
mid <- floor(leaf.numKeys / 2)
Move keys[mid .. leaf.numKeys-1] and data to newLeaf
newLeaf.next <- leaf.next
leaf.next <- newLeaf
newLeaf.numKeys <- leaf.numKeys - mid
leaf.numKeys <- mid
InsertIntoParent(leaf, newLeaf.keys[0], newLeaf)
END
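The algorithms above can be exercised with a runnable sketch. To keep it short, this version uses a single routing level (a sorted separator array standing in for the internal nodes), so it never splits internal nodes; the leaf-split, leaf-chain, and range-scan logic matches the pseudocode. The capacity of 4 and all names are illustrative:

```python
import bisect

LEAF_CAP = 4  # max keys per leaf (illustrative)

class Leaf:
    def __init__(self):
        self.keys, self.vals, self.next = [], [], None

class BPlusTree:
    """Two-level sketch: one routing array over a chain of leaves.
    A full implementation would make the routing level a tree and
    split internal nodes too; the leaf logic is the same."""
    def __init__(self):
        self.seps = []          # seps[i] = first key of leaves[i+1]
        self.leaves = [Leaf()]  # children of the single routing level

    def _find_leaf(self, key):
        # Keys >= seps[i] route to the right of separator i.
        i = bisect.bisect_right(self.seps, key)
        return i, self.leaves[i]

    def insert(self, key, val):
        i, leaf = self._find_leaf(key)
        j = bisect.bisect_left(leaf.keys, key)
        leaf.keys.insert(j, key)
        leaf.vals.insert(j, val)
        if len(leaf.keys) > LEAF_CAP:  # overflow: split the leaf
            mid = len(leaf.keys) // 2
            new = Leaf()
            new.keys, new.vals = leaf.keys[mid:], leaf.vals[mid:]
            del leaf.keys[mid:], leaf.vals[mid:]
            new.next, leaf.next = leaf.next, new  # link into the chain
            # Copy the first key of the new right leaf up as a routing key.
            self.seps.insert(i, new.keys[0])
            self.leaves.insert(i + 1, new)

    def search(self, key):
        _, leaf = self._find_leaf(key)
        j = bisect.bisect_left(leaf.keys, key)
        if j < len(leaf.keys) and leaf.keys[j] == key:
            return leaf.vals[j]
        return None

    def range_scan(self, low, high):
        _, leaf = self._find_leaf(low)  # point lookup for the lower bound
        out = []
        while leaf is not None:         # then walk the leaf chain
            for k, v in zip(leaf.keys, leaf.vals):
                if k > high:
                    return out
                if k >= low:
                    out.append((k, v))
            leaf = leaf.next
        return out
```

A quick usage example: inserting keys 1 through 10 forces several leaf splits, after which `range_scan(4, 8)` walks the chain and returns exactly the keys 4 through 8 in order.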
Real-World Applications
- MySQL/InnoDB: the default storage engine uses B+ Trees for both clustered (primary key) and secondary indexes; the clustered index stores actual row data in leaf pages
- PostgreSQL: uses B+ Tree indexes (called "btree" in PostgreSQL) as the default index type for most queries
- SQLite: stores both table data and indexes as B+ Trees within its single-file database format
- File systems: NTFS uses B+ Trees for its directory indexes; ext4's directory indexing (htree) is a hashed B-Tree variant
- MongoDB: WiredTiger storage engine uses B+ Trees for its indexes alongside LSM Trees for certain workloads
Key Takeaways
- B+ Trees separate routing (internal nodes) from data storage (leaf nodes), maximizing fan-out and minimizing tree height
- The linked leaf chain enables $O(\log_m n + k/m)$ range scans, which is why every major relational database uses B+ Trees for indexing
- Point lookups cost the same as a B-Tree: $O(\log_m n)$ disk reads
- Internal nodes hold more keys per page than B-Tree nodes (no data pointers), so the tree is shallower for the same dataset
- B+ Trees are arguably the single most important data structure in database engineering