Red-Black Trees: Self-Balancing BSTs with Color Invariants
How red-black trees use node coloring and rotations to guarantee O(log n) operations, powering standard library maps and sets across languages.
Terminology
- Binary Search Tree (BST): a tree where each node's left children have smaller keys and right children have larger keys
- Self-balancing: automatically maintaining a bounded height after insertions and deletions, preventing degeneration into a linked list
- Red-black tree: a BST where each node is colored red or black, with specific invariants that guarantee the tree's height is at most $2 \log_2(n+1)$
- Color invariant: a rule about node colors that must hold at all times; violations are fixed by recoloring and rotations
- Rotation: a local restructuring operation (left or right) that changes parent-child relationships while preserving BST ordering
- Left rotation: promotes a node's right child to take its place, moving the original node down to the left
- Right rotation: promotes a node's left child to take its place, moving the original node down to the right
- Uncle node: the sibling of a node's parent; its color determines which fix-up case applies during insertion
- Black height: the number of black nodes on any path from a given node down to a null leaf (not counting the node itself)
- NIL node: a sentinel representing a null child; all NIL nodes are considered black
What & Why
A red-black tree is a self-balancing binary search tree that uses node coloring (red or black) and a small set of invariants to keep the tree approximately balanced. The result: search, insert, and delete all run in guaranteed O(\log n) worst-case time.
Why not use a plain BST? An unbalanced BST can degenerate into a linked list if keys are inserted in sorted order, giving O(n) operations. AVL trees solve this with strict balancing (height difference of at most 1 between subtrees), but they require more rotations on insertion and deletion. Red-black trees relax the balancing constraint slightly, allowing the tree to be up to twice as deep as a perfectly balanced tree, but requiring fewer rotations per operation.
The five red-black invariants:
- Every node is either red or black
- The root is black
- Every NIL (leaf sentinel) is black
- If a node is red, both its children are black (no two consecutive red nodes)
- For each node, all paths from that node to descendant NIL nodes contain the same number of black nodes
These invariants guarantee that the longest path from root to any leaf is at most twice the shortest path, bounding the height at 2 \log_2(n+1).
How It Works
Rotations
Rotations are the mechanical tool for fixing invariant violations. They restructure a small subtree (3 nodes) while preserving BST order.
Insertion
Insert the new node as in a standard BST and color it red. This may violate invariant 4 (no two consecutive reds). Fix up by examining the uncle node:
- Case 1: Uncle is red: recolor the parent and uncle to black, grandparent to red, then move the fix-up point to the grandparent
- Case 2: Uncle is black, node is an "inner" child: rotate the node up to become an "outer" child, reducing to Case 3
- Case 3: Uncle is black, node is an "outer" child: rotate the grandparent and recolor; the fix-up is complete
At most 2 rotations are needed per insertion. Recoloring may propagate up to the root, but each step is $O(1)$.
Deletion
Deletion is more complex. Remove the node as in a standard BST. If the removed node was black, the black height invariant (rule 5) is violated. Fix up by examining the sibling node and its children, applying rotations and recoloring. At most 3 rotations are needed per deletion.
Complexity Analysis
The height of a red-black tree with $n$ internal nodes satisfies:
This follows from invariants 4 and 5: the black height is at most $\log_2(n+1)$, and red nodes can at most double the path length.
| Operation | Time | Rotations | Notes |
|---|---|---|---|
| Search | $O(\log n)$ | 0 | Standard BST search |
| Insert | $O(\log n)$ | at most 2 | Walk down + fix up |
| Delete | $O(\log n)$ | at most 3 | Walk down + fix up |
| Space | $O(n)$ | - | One extra bit per node for color |
Compared to AVL trees: AVL trees have a stricter balance (height at most $1.44 \log_2 n$), making searches slightly faster. But AVL trees may need $O(\log n)$ rotations per deletion, while red-black trees need at most 3. This makes red-black trees preferable when insertions and deletions are frequent.
Implementation
STRUCTURE RBNode
key: value stored in this node
color: RED or BLACK
left: pointer to left child
right: pointer to right child
parent: pointer to parent node
END STRUCTURE
ALGORITHM LeftRotate(tree, x)
INPUT: tree: red-black tree, x: node to rotate
BEGIN
y <- x.right
x.right <- y.left
IF y.left IS NOT NIL THEN
y.left.parent <- x
END IF
y.parent <- x.parent
IF x.parent IS NIL THEN
tree.root <- y
ELSE IF x = x.parent.left THEN
x.parent.left <- y
ELSE
x.parent.right <- y
END IF
y.left <- x
x.parent <- y
END
ALGORITHM RBInsert(tree, key)
INPUT: tree: red-black tree, key: value to insert
BEGIN
newNode <- CreateNode(key, color=RED)
BSTInsert(tree, newNode)
InsertFixUp(tree, newNode)
END
ALGORITHM InsertFixUp(tree, z)
INPUT: tree: red-black tree, z: newly inserted red node
BEGIN
WHILE z.parent.color = RED DO
IF z.parent = z.parent.parent.left THEN
uncle <- z.parent.parent.right
IF uncle.color = RED THEN
// Case 1: uncle is red
z.parent.color <- BLACK
uncle.color <- BLACK
z.parent.parent.color <- RED
z <- z.parent.parent
ELSE
IF z = z.parent.right THEN
// Case 2: uncle is black, z is right child
z <- z.parent
LeftRotate(tree, z)
END IF
// Case 3: uncle is black, z is left child
z.parent.color <- BLACK
z.parent.parent.color <- RED
RightRotate(tree, z.parent.parent)
END IF
ELSE
// Mirror cases (swap left/right)
uncle <- z.parent.parent.left
IF uncle.color = RED THEN
z.parent.color <- BLACK
uncle.color <- BLACK
z.parent.parent.color <- RED
z <- z.parent.parent
ELSE
IF z = z.parent.left THEN
z <- z.parent
RightRotate(tree, z)
END IF
z.parent.color <- BLACK
z.parent.parent.color <- RED
LeftRotate(tree, z.parent.parent)
END IF
END IF
END WHILE
tree.root.color <- BLACK
END
Real-World Applications
- C++ std::map and std::set: most implementations (GCC libstdc++, LLVM libc++) use red-black trees as the underlying data structure for ordered associative containers
- Java TreeMap and TreeSet: the Java standard library implements these sorted collections with red-black trees
- Linux kernel: the Completely Fair Scheduler (CFS) uses a red-black tree to manage process scheduling, ordering tasks by virtual runtime
- Linux kernel memory management: virtual memory areas (VMAs) are organized in a red-black tree for fast lookup of address ranges
- Nginx: uses red-black trees for timer management and event scheduling in its event loop
- Epoll: the Linux epoll subsystem uses a red-black tree to manage file descriptor registrations
Key Takeaways
- Red-black trees guarantee $O(\log n)$ worst-case time for search, insert, and delete through five color invariants
- The height is bounded by $2 \log_2(n+1)$, which is slightly worse than AVL trees but requires fewer rotations per mutation
- Insertions need at most 2 rotations; deletions need at most 3. This bounded rotation count makes red-black trees efficient for write-heavy workloads
- The extra cost is one bit per node (the color) and more complex insertion/deletion logic compared to a plain BST
- Red-black trees are the default choice for ordered map/set implementations in C++, Java, and the Linux kernel