Back to Blog

Red-Black Trees: Self-Balancing BSTs with Color Invariants

How red-black trees use node coloring and rotations to guarantee O(log n) operations, powering standard library maps and sets across languages.

2021-02-07
Share
Advanced Data Structuresred-black-treesdata-structures

Terminology

  • Binary Search Tree (BST): a tree where each node's left children have smaller keys and right children have larger keys
  • Self-balancing: automatically maintaining a bounded height after insertions and deletions, preventing degeneration into a linked list
  • Red-black tree: a BST where each node is colored red or black, with specific invariants that guarantee the tree's height is at most $2 \log_2(n+1)$
  • Color invariant: a rule about node colors that must hold at all times; violations are fixed by recoloring and rotations
  • Rotation: a local restructuring operation (left or right) that changes parent-child relationships while preserving BST ordering
  • Left rotation: promotes a node's right child to take its place, moving the original node down to the left
  • Right rotation: promotes a node's left child to take its place, moving the original node down to the right
  • Uncle node: the sibling of a node's parent; its color determines which fix-up case applies during insertion
  • Black height: the number of black nodes on any path from a given node down to a null leaf (not counting the node itself)
  • NIL node: a sentinel representing a null child; all NIL nodes are considered black

What & Why

A red-black tree is a self-balancing binary search tree that uses node coloring (red or black) and a small set of invariants to keep the tree approximately balanced. The result: search, insert, and delete all run in guaranteed O(\log n) worst-case time.

Why not use a plain BST? An unbalanced BST can degenerate into a linked list if keys are inserted in sorted order, giving O(n) operations. AVL trees solve this with strict balancing (height difference of at most 1 between subtrees), but they require more rotations on insertion and deletion. Red-black trees relax the balancing constraint slightly, allowing the tree to be up to twice as deep as a perfectly balanced tree, but requiring fewer rotations per operation.

The five red-black invariants:

  1. Every node is either red or black
  2. The root is black
  3. Every NIL (leaf sentinel) is black
  4. If a node is red, both its children are black (no two consecutive red nodes)
  5. For each node, all paths from that node to descendant NIL nodes contain the same number of black nodes

These invariants guarantee that the longest path from root to any leaf is at most twice the shortest path, bounding the height at 2 \log_2(n+1).

How It Works

Rotations

Rotations are the mechanical tool for fixing invariant violations. They restructure a small subtree (3 nodes) while preserving BST order.

Before left rotation on X X a Y b c rotate left After left rotation Y X c a b

Insertion

Insert the new node as in a standard BST and color it red. This may violate invariant 4 (no two consecutive reds). Fix up by examining the uncle node:

  • Case 1: Uncle is red: recolor the parent and uncle to black, grandparent to red, then move the fix-up point to the grandparent
  • Case 2: Uncle is black, node is an "inner" child: rotate the node up to become an "outer" child, reducing to Case 3
  • Case 3: Uncle is black, node is an "outer" child: rotate the grandparent and recolor; the fix-up is complete

At most 2 rotations are needed per insertion. Recoloring may propagate up to the root, but each step is $O(1)$.

Deletion

Deletion is more complex. Remove the node as in a standard BST. If the removed node was black, the black height invariant (rule 5) is violated. Fix up by examining the sibling node and its children, applying rotations and recoloring. At most 3 rotations are needed per deletion.

Complexity Analysis

The height of a red-black tree with $n$ internal nodes satisfies:

$h \leq 2 \log_2(n + 1)$

This follows from invariants 4 and 5: the black height is at most $\log_2(n+1)$, and red nodes can at most double the path length.

Operation Time Rotations Notes
Search $O(\log n)$ 0 Standard BST search
Insert $O(\log n)$ at most 2 Walk down + fix up
Delete $O(\log n)$ at most 3 Walk down + fix up
Space $O(n)$ - One extra bit per node for color

Compared to AVL trees: AVL trees have a stricter balance (height at most $1.44 \log_2 n$), making searches slightly faster. But AVL trees may need $O(\log n)$ rotations per deletion, while red-black trees need at most 3. This makes red-black trees preferable when insertions and deletions are frequent.

Implementation

STRUCTURE RBNode
  key: value stored in this node
  color: RED or BLACK
  left: pointer to left child
  right: pointer to right child
  parent: pointer to parent node
END STRUCTURE

ALGORITHM LeftRotate(tree, x)
INPUT: tree: red-black tree, x: node to rotate
BEGIN
  y <- x.right
  x.right <- y.left
  IF y.left IS NOT NIL THEN
    y.left.parent <- x
  END IF
  y.parent <- x.parent
  IF x.parent IS NIL THEN
    tree.root <- y
  ELSE IF x = x.parent.left THEN
    x.parent.left <- y
  ELSE
    x.parent.right <- y
  END IF
  y.left <- x
  x.parent <- y
END

ALGORITHM RBInsert(tree, key)
INPUT: tree: red-black tree, key: value to insert
BEGIN
  newNode <- CreateNode(key, color=RED)
  BSTInsert(tree, newNode)
  InsertFixUp(tree, newNode)
END

ALGORITHM InsertFixUp(tree, z)
INPUT: tree: red-black tree, z: newly inserted red node
BEGIN
  WHILE z.parent.color = RED DO
    IF z.parent = z.parent.parent.left THEN
      uncle <- z.parent.parent.right
      IF uncle.color = RED THEN
        // Case 1: uncle is red
        z.parent.color <- BLACK
        uncle.color <- BLACK
        z.parent.parent.color <- RED
        z <- z.parent.parent
      ELSE
        IF z = z.parent.right THEN
          // Case 2: uncle is black, z is right child
          z <- z.parent
          LeftRotate(tree, z)
        END IF
        // Case 3: uncle is black, z is left child
        z.parent.color <- BLACK
        z.parent.parent.color <- RED
        RightRotate(tree, z.parent.parent)
      END IF
    ELSE
      // Mirror cases (swap left/right)
      uncle <- z.parent.parent.left
      IF uncle.color = RED THEN
        z.parent.color <- BLACK
        uncle.color <- BLACK
        z.parent.parent.color <- RED
        z <- z.parent.parent
      ELSE
        IF z = z.parent.left THEN
          z <- z.parent
          RightRotate(tree, z)
        END IF
        z.parent.color <- BLACK
        z.parent.parent.color <- RED
        LeftRotate(tree, z.parent.parent)
      END IF
    END IF
  END WHILE
  tree.root.color <- BLACK
END

Real-World Applications

  • C++ std::map and std::set: most implementations (GCC libstdc++, LLVM libc++) use red-black trees as the underlying data structure for ordered associative containers
  • Java TreeMap and TreeSet: the Java standard library implements these sorted collections with red-black trees
  • Linux kernel: the Completely Fair Scheduler (CFS) uses a red-black tree to manage process scheduling, ordering tasks by virtual runtime
  • Linux kernel memory management: virtual memory areas (VMAs) are organized in a red-black tree for fast lookup of address ranges
  • Nginx: uses red-black trees for timer management and event scheduling in its event loop
  • Epoll: the Linux epoll subsystem uses a red-black tree to manage file descriptor registrations

Key Takeaways

  • Red-black trees guarantee $O(\log n)$ worst-case time for search, insert, and delete through five color invariants
  • The height is bounded by $2 \log_2(n+1)$, which is slightly worse than AVL trees but requires fewer rotations per mutation
  • Insertions need at most 2 rotations; deletions need at most 3. This bounded rotation count makes red-black trees efficient for write-heavy workloads
  • The extra cost is one bit per node (the color) and more complex insertion/deletion logic compared to a plain BST
  • Red-black trees are the default choice for ordered map/set implementations in C++, Java, and the Linux kernel