Back to Blog

Skip Lists: Probabilistic Balancing with Layered Linked Lists

How skip lists use randomized multi-level linked lists to achieve O(log n) search without the complexity of tree rotations.

2021-02-04
Share
Advanced Data Structuresskip-listsdata-structures

Terminology

  • Skip list: a probabilistic data structure built from multiple layers of sorted linked lists, where higher layers skip over elements to enable fast search
  • Level: one of the horizontal linked lists in the skip list; level 0 (the bottom) contains all elements
  • Tower: the vertical stack of nodes for a single element across multiple levels
  • Header (sentinel): a special node at the far left of every level, acting as the starting point for traversal
  • Promotion: the process of adding an element to a higher level; decided by a coin flip (probability $p$, typically $1/2$)
  • Probabilistic balancing: achieving expected $O(\log n)$ performance through randomization rather than deterministic rebalancing rules
  • Expected value: the average outcome over many trials; skip list operations are $O(\log n)$ in expectation
  • Geometric distribution: the probability distribution governing how many levels a node reaches; each level is an independent coin flip
  • Sorted linked list: a linked list where elements are maintained in ascending key order

What & Why

A skip list is a randomized data structure that provides the same O(\log n) expected performance as a balanced BST for search, insert, and delete, but with a much simpler implementation. Instead of complex rotation rules (like red-black trees or AVL trees), a skip list uses coin flips to decide how "tall" each element's tower should be.

The core idea: start with a sorted linked list (level 0). Then build express lanes on top. Level 1 contains roughly half the elements, level 2 roughly a quarter, and so on. To search, start at the top level and move right until you would overshoot, then drop down a level and continue. This is analogous to scanning an index before reading the full list.

Why use a skip list?

  • Simpler to implement than balanced BSTs (no rotations, no color invariants)
  • Supports concurrent access more naturally than trees (lock-free variants exist)
  • Expected O(\log n) for search, insert, and delete
  • Easy to implement range queries by walking level 0

The trade-off is that performance is probabilistic, not guaranteed. Worst case is O(n), but this is astronomically unlikely with a good random number generator.

How It Works

Structure

A skip list has $L$ levels. Level 0 is a complete sorted linked list. Each higher level is a subset of the level below, with each element promoted independently with probability $p$ (usually $1/2$).

L3 L2 L1 L0 H H H H 3 3 7 7 7 7 12 20 20 20 25 25 Search for 20: H(L3) -> 7(L3) -> drop -> 7(L2) -> 20(L2) found

Search

Start at the header of the highest level. Move right along the current level as long as the next node's key is less than or equal to the target. When the next node would overshoot (or is null), drop down one level and continue. Repeat until you find the key at level 0 or determine it does not exist.

Insertion

First, search for the insertion position, recording the "update" pointers at each level (the last node visited before dropping down). Then flip coins to determine the new node's height. Create the node and splice it into each level up to its height by adjusting the forward pointers.

Deletion

Search for the node, recording update pointers at each level. Remove the node from each level by relinking the forward pointers. If the top levels become empty, reduce the skip list's maximum level.

Complexity Analysis

With promotion probability $p = 1/2$ and $n$ elements:

$E[\text{height}] = O(\log n)$
Operation Expected Time Worst Case Notes
Search $O(\log n)$ $O(n)$ Worst case is astronomically unlikely
Insert $O(\log n)$ $O(n)$ Search + coin flips + splice
Delete $O(\log n)$ $O(n)$ Search + unlink at each level
Space $O(n)$ expected $O(n \log n)$ Each element appears in $1/(1-p)$ levels on average

The expected number of levels for a single element follows a geometric distribution. With $p = 1/2$, an element appears in $1/(1 - 1/2) = 2$ levels on average, so total space is $O(2n) = O(n)$.

The expected number of comparisons during search is $O(\frac{1}{p} \log_{1/p} n)$. With $p = 1/2$, this is $O(2 \log_2 n) = O(\log n)$.

Implementation

STRUCTURE SkipListNode
  key: value stored in this node
  forward: array of pointers, one per level this node participates in
END STRUCTURE

ALGORITHM SkipListSearch(list, target)
INPUT: list: a skip list, target: key to find
OUTPUT: the node containing target, or NOT_FOUND
BEGIN
  node <- list.header
  FOR level <- list.maxLevel DOWNTO 0 DO
    WHILE node.forward[level] IS NOT NULL
          AND node.forward[level].key < target DO
      node <- node.forward[level]
    END WHILE
  END FOR

  node <- node.forward[0]
  IF node IS NOT NULL AND node.key = target THEN
    RETURN node
  ELSE
    RETURN NOT_FOUND
  END IF
END

ALGORITHM RandomLevel(maxLevel, p)
INPUT: maxLevel: cap on levels, p: promotion probability
OUTPUT: level for a new node
BEGIN
  level <- 0
  WHILE Random() < p AND level < maxLevel DO
    level <- level + 1
  END WHILE
  RETURN level
END

ALGORITHM SkipListInsert(list, key)
INPUT: list: a skip list, key: value to insert
BEGIN
  update <- array of size maxLevel + 1
  node <- list.header

  FOR level <- list.maxLevel DOWNTO 0 DO
    WHILE node.forward[level] IS NOT NULL
          AND node.forward[level].key < key DO
      node <- node.forward[level]
    END WHILE
    update[level] <- node
  END FOR

  newLevel <- RandomLevel(list.maxLevel, 0.5)

  IF newLevel > list.currentLevel THEN
    FOR level <- list.currentLevel + 1 TO newLevel DO
      update[level] <- list.header
    END FOR
    list.currentLevel <- newLevel
  END IF

  newNode <- CreateNode(key, newLevel)

  FOR level <- 0 TO newLevel DO
    newNode.forward[level] <- update[level].forward[level]
    update[level].forward[level] <- newNode
  END FOR
END

Real-World Applications

  • Redis sorted sets: Redis uses skip lists as the primary data structure for its sorted set type (ZSET), chosen for simplicity and good concurrent performance
  • LevelDB / RocksDB: the in-memory component (memtable) of these LSM-Tree-based stores uses a skip list for fast sorted inserts and lookups
  • Java ConcurrentSkipListMap: the Java standard library provides a lock-free concurrent sorted map built on skip lists
  • Lucene: Apache Lucene uses skip lists in its posting list format to enable fast document ID skipping during query evaluation
  • Lock-free concurrent data structures: skip lists are easier to make lock-free than balanced trees because updates are localized to pointer splicing rather than global rotations

Key Takeaways

  • Skip lists achieve expected $O(\log n)$ search, insert, and delete using randomization instead of deterministic balancing
  • The structure is layered sorted linked lists: level 0 has all elements, higher levels have progressively fewer
  • Implementation is significantly simpler than balanced BSTs, with no rotations or color invariants
  • Skip lists are naturally suited to concurrent access, which is why Redis and Java's concurrent collections use them
  • The trade-off is probabilistic guarantees rather than worst-case guarantees, but pathological cases are vanishingly rare in practice