Skip Lists: Probabilistic Balancing with Layered Linked Lists
How skip lists use randomized multi-level linked lists to achieve O(log n) search without the complexity of tree rotations.
Terminology
- Skip list: a probabilistic data structure built from multiple layers of sorted linked lists, where higher layers skip over elements to enable fast search
- Level: one of the horizontal linked lists in the skip list; level 0 (the bottom) contains all elements
- Tower: the vertical stack of nodes for a single element across multiple levels
- Header (sentinel): a special node at the far left of every level, acting as the starting point for traversal
- Promotion: the process of adding an element to a higher level; decided by a coin flip (probability $p$, typically $1/2$)
- Probabilistic balancing: achieving expected $O(\log n)$ performance through randomization rather than deterministic rebalancing rules
- Expected value: the average outcome over many trials; skip list operations are $O(\log n)$ in expectation
- Geometric distribution: the probability distribution governing how many levels a node reaches; each level is an independent coin flip
- Sorted linked list: a linked list where elements are maintained in ascending key order
What & Why
A skip list is a randomized data structure that provides the same O(\log n) expected performance as a balanced BST for search, insert, and delete, but with a much simpler implementation. Instead of complex rotation rules (like red-black trees or AVL trees), a skip list uses coin flips to decide how "tall" each element's tower should be.
The core idea: start with a sorted linked list (level 0). Then build express lanes on top. Level 1 contains roughly half the elements, level 2 roughly a quarter, and so on. To search, start at the top level and move right until you would overshoot, then drop down a level and continue. This is analogous to scanning an index before reading the full list.
Why use a skip list?
- Simpler to implement than balanced BSTs (no rotations, no color invariants)
- Supports concurrent access more naturally than trees (lock-free variants exist)
- Expected
O(\log n)for search, insert, and delete - Easy to implement range queries by walking level 0
The trade-off is that performance is probabilistic, not guaranteed. Worst case is O(n), but this is astronomically unlikely with a good random number generator.
How It Works
Structure
A skip list has $L$ levels. Level 0 is a complete sorted linked list. Each higher level is a subset of the level below, with each element promoted independently with probability $p$ (usually $1/2$).
Search
Start at the header of the highest level. Move right along the current level as long as the next node's key is less than or equal to the target. When the next node would overshoot (or is null), drop down one level and continue. Repeat until you find the key at level 0 or determine it does not exist.
Insertion
First, search for the insertion position, recording the "update" pointers at each level (the last node visited before dropping down). Then flip coins to determine the new node's height. Create the node and splice it into each level up to its height by adjusting the forward pointers.
Deletion
Search for the node, recording update pointers at each level. Remove the node from each level by relinking the forward pointers. If the top levels become empty, reduce the skip list's maximum level.
Complexity Analysis
With promotion probability $p = 1/2$ and $n$ elements:
| Operation | Expected Time | Worst Case | Notes |
|---|---|---|---|
| Search | $O(\log n)$ | $O(n)$ | Worst case is astronomically unlikely |
| Insert | $O(\log n)$ | $O(n)$ | Search + coin flips + splice |
| Delete | $O(\log n)$ | $O(n)$ | Search + unlink at each level |
| Space | $O(n)$ expected | $O(n \log n)$ | Each element appears in $1/(1-p)$ levels on average |
The expected number of levels for a single element follows a geometric distribution. With $p = 1/2$, an element appears in $1/(1 - 1/2) = 2$ levels on average, so total space is $O(2n) = O(n)$.
The expected number of comparisons during search is $O(\frac{1}{p} \log_{1/p} n)$. With $p = 1/2$, this is $O(2 \log_2 n) = O(\log n)$.
Implementation
STRUCTURE SkipListNode
key: value stored in this node
forward: array of pointers, one per level this node participates in
END STRUCTURE
ALGORITHM SkipListSearch(list, target)
INPUT: list: a skip list, target: key to find
OUTPUT: the node containing target, or NOT_FOUND
BEGIN
node <- list.header
FOR level <- list.maxLevel DOWNTO 0 DO
WHILE node.forward[level] IS NOT NULL
AND node.forward[level].key < target DO
node <- node.forward[level]
END WHILE
END FOR
node <- node.forward[0]
IF node IS NOT NULL AND node.key = target THEN
RETURN node
ELSE
RETURN NOT_FOUND
END IF
END
ALGORITHM RandomLevel(maxLevel, p)
INPUT: maxLevel: cap on levels, p: promotion probability
OUTPUT: level for a new node
BEGIN
level <- 0
WHILE Random() < p AND level < maxLevel DO
level <- level + 1
END WHILE
RETURN level
END
ALGORITHM SkipListInsert(list, key)
INPUT: list: a skip list, key: value to insert
BEGIN
update <- array of size maxLevel + 1
node <- list.header
FOR level <- list.maxLevel DOWNTO 0 DO
WHILE node.forward[level] IS NOT NULL
AND node.forward[level].key < key DO
node <- node.forward[level]
END WHILE
update[level] <- node
END FOR
newLevel <- RandomLevel(list.maxLevel, 0.5)
IF newLevel > list.currentLevel THEN
FOR level <- list.currentLevel + 1 TO newLevel DO
update[level] <- list.header
END FOR
list.currentLevel <- newLevel
END IF
newNode <- CreateNode(key, newLevel)
FOR level <- 0 TO newLevel DO
newNode.forward[level] <- update[level].forward[level]
update[level].forward[level] <- newNode
END FOR
END
Real-World Applications
- Redis sorted sets: Redis uses skip lists as the primary data structure for its sorted set type (ZSET), chosen for simplicity and good concurrent performance
- LevelDB / RocksDB: the in-memory component (memtable) of these LSM-Tree-based stores uses a skip list for fast sorted inserts and lookups
- Java ConcurrentSkipListMap: the Java standard library provides a lock-free concurrent sorted map built on skip lists
- Lucene: Apache Lucene uses skip lists in its posting list format to enable fast document ID skipping during query evaluation
- Lock-free concurrent data structures: skip lists are easier to make lock-free than balanced trees because updates are localized to pointer splicing rather than global rotations
Key Takeaways
- Skip lists achieve expected $O(\log n)$ search, insert, and delete using randomization instead of deterministic balancing
- The structure is layered sorted linked lists: level 0 has all elements, higher levels have progressively fewer
- Implementation is significantly simpler than balanced BSTs, with no rotations or color invariants
- Skip lists are naturally suited to concurrent access, which is why Redis and Java's concurrent collections use them
- The trade-off is probabilistic guarantees rather than worst-case guarantees, but pathological cases are vanishingly rare in practice