Notes/Skip Lists: Probabilistic Balancing with Layered Linked Lists

Skip Lists: Probabilistic Balancing with Layered Linked Lists

How skip lists use randomized multi-level linked lists to achieve O(log n) search without the complexity of tree rotations.

2021-02-04AI-Synthesized from Personal Notes

Advanced Data StructuresSkip ListsData Structures

Terminology

Skip list: a probabilistic data structure built from multiple layers of sorted linked lists, where higher layers skip over elements to enable fast search
Level: one of the horizontal linked lists in the skip list; level 0 (the bottom) contains all elements
Tower: the vertical stack of nodes for a single element across multiple levels
Header (sentinel): a special node at the far left of every level, acting as the starting point for traversal
Promotion: the process of adding an element to a higher level; decided by a coin flip (probability $p$, typically $1/2$)
Probabilistic balancing: achieving expected $O(\log n)$ performance through randomization rather than deterministic rebalancing rules
Expected value: the average outcome over many trials; skip list operations are $O(\log n)$ in expectation
Geometric distribution: the probability distribution governing how many levels a node reaches; each level is an independent coin flip
Sorted linked list: a linked list where elements are maintained in ascending key order

What & Why

A skip list is a randomized data structure that provides the same $O(\log n)$ expected performance as a balanced BST for search, insert, and delete, but with a much simpler implementation. Instead of complex rotation rules (like red-black trees or AVL trees), a skip list uses coin flips to decide how "tall" each element's tower should be.

The core idea: start with a sorted linked list (level 0). Then build express lanes on top. Level 1 contains roughly half the elements, level 2 roughly a quarter, and so on. To search, start at the top level and move right until you would overshoot, then drop down a level and continue. This is analogous to scanning an index before reading the full list.

Why use a skip list?

Simpler to implement than balanced BSTs (no rotations, no color invariants)
Supports concurrent access more naturally than trees (lock-free variants exist)
Expected $O(\log n)$ for search, insert, and delete
Easy to implement range queries by walking level 0

The trade-off is that performance is probabilistic, not guaranteed. Worst case is $O(n)$, but this is astronomically unlikely with a good random number generator.

How It Works

Structure

A skip list has $L$ levels. Level 0 is a complete sorted linked list. Each higher level is a subset of the level below, with each element promoted independently with probability $p$ (usually $1/2$).

Search

Start at the header of the highest level. Move right along the current level as long as the next node's key is less than or equal to the target. When the next node would overshoot (or is null), drop down one level and continue. Repeat until you find the key at level 0 or determine it does not exist.

Insertion

First, search for the insertion position, recording the "update" pointers at each level (the last node visited before dropping down). Then flip coins to determine the new node's height. Create the node and splice it into each level up to its height by adjusting the forward pointers.

Deletion

Search for the node, recording update pointers at each level. Remove the node from each level by relinking the forward pointers. If the top levels become empty, reduce the skip list's maximum level.

Complexity Analysis

With promotion probability $p = 1/2$ and $n$ elements:

$E[\text{height}] = O(\log n)$

Operation	Expected Time	Worst Case	Notes
Search	$O(\log n)$	$O(n)$	Worst case is astronomically unlikely
Insert	$O(\log n)$	$O(n)$	Search + coin flips + splice
Delete	$O(\log n)$	$O(n)$	Search + unlink at each level
Space	$O(n)$ expected	$O(n \log n)$	Each element appears in $1/(1-p)$ levels on average

The expected number of levels for a single element follows a geometric distribution. With $p = 1/2$, an element appears in $1/(1 - 1/2) = 2$ levels on average, so total space is $O(2n) = O(n)$.

The expected number of comparisons during search is $O(\frac{1}{p} \log_{1/p} n)$. With $p = 1/2$, this is $O(2 \log_2 n) = O(\log n)$.

Implementation

STRUCTURE SkipListNode
  key: value stored in this node
  forward: array of pointers, one per level this node participates in
END STRUCTURE

ALGORITHM SkipListSearch(list, target)
INPUT: list: a skip list, target: key to find
OUTPUT: the node containing target, or NOT_FOUND
BEGIN
  node <- list.header
  FOR level <- list.maxLevel DOWNTO 0 DO
    WHILE node.forward[level] IS NOT NULL
          AND node.forward[level].key < target DO
      node <- node.forward[level]
    END WHILE
  END FOR

  node <- node.forward[0]
  IF node IS NOT NULL AND node.key = target THEN
    RETURN node
  ELSE
    RETURN NOT_FOUND
  END IF
END

ALGORITHM RandomLevel(maxLevel, p)
INPUT: maxLevel: cap on levels, p: promotion probability
OUTPUT: level for a new node
BEGIN
  level <- 0
  WHILE Random() < p AND level < maxLevel DO
    level <- level + 1
  END WHILE
  RETURN level
END

ALGORITHM SkipListInsert(list, key)
INPUT: list: a skip list, key: value to insert
BEGIN
  update <- array of size maxLevel + 1
  node <- list.header

  FOR level <- list.maxLevel DOWNTO 0 DO
    WHILE node.forward[level] IS NOT NULL
          AND node.forward[level].key < key DO
      node <- node.forward[level]
    END WHILE
    update[level] <- node
  END FOR

  newLevel <- RandomLevel(list.maxLevel, 0.5)

  IF newLevel > list.currentLevel THEN
    FOR level <- list.currentLevel + 1 TO newLevel DO
      update[level] <- list.header
    END FOR
    list.currentLevel <- newLevel
  END IF

  newNode <- CreateNode(key, newLevel)

  FOR level <- 0 TO newLevel DO
    newNode.forward[level] <- update[level].forward[level]
    update[level].forward[level] <- newNode
  END FOR
END

Real-World Applications

Redis sorted sets: Redis uses skip lists as the primary data structure for its sorted set type (ZSET), chosen for simplicity and good concurrent performance
LevelDB / RocksDB: the in-memory component (memtable) of these LSM-Tree-based stores uses a skip list for fast sorted inserts and lookups
Java ConcurrentSkipListMap: the Java standard library provides a lock-free concurrent sorted map built on skip lists
Lucene: Apache Lucene uses skip lists in its posting list format to enable fast document ID skipping during query evaluation
Lock-free concurrent data structures: skip lists are easier to make lock-free than balanced trees because updates are localized to pointer splicing rather than global rotations

Key Takeaways

Skip lists achieve expected $O(\log n)$ search, insert, and delete using randomization instead of deterministic balancing
The structure is layered sorted linked lists: level 0 has all elements, higher levels have progressively fewer
Implementation is significantly simpler than balanced BSTs, with no rotations or color invariants
Skip lists are naturally suited to concurrent access, which is why Redis and Java's concurrent collections use them
The trade-off is probabilistic guarantees rather than worst-case guarantees, but pathological cases are vanishingly rare in practice

2021-02-05

Bloom Filters: Probabilistic Membership Testing

How Bloom filters use multiple hash functions and a bit array to answer 'is this element in the set?' with extreme space efficiency, at the cost of occasional false positives.

2021-02-06

LSM Trees: Write-Optimized Storage with Log-Structured Merge

How LSM Trees turn random writes into sequential I/O, powering the storage engines behind Cassandra, RocksDB, and LevelDB.

2021-02-07

Red-Black Trees: Self-Balancing BSTs with Color Invariants

How red-black trees use node coloring and rotations to guarantee O(log n) operations, powering standard library maps and sets across languages.