Top-K and Heap Patterns
Solve Top-K problems efficiently using heaps: Kth largest element, merge K sorted lists, median from data stream, and min-heap vs max-heap selection strategies.
Terminology
| Term | Definition |
|---|---|
| Min-heap | A complete binary tree where every parent is smaller than or equal to its children. The root holds the minimum element. Supports $O(\log n)$ insert and extract-min. |
| Max-heap | A complete binary tree where every parent is larger than or equal to its children. The root holds the maximum element. |
| Priority queue | An abstract data type that supports insert and extract-min (or extract-max). Typically implemented with a heap. |
| Top-K | Finding the $K$ largest (or smallest) elements from a collection. A min-heap of size $K$ efficiently tracks the top $K$ largest. |
| Kth largest element | The element that would be at position $K$ if the array were sorted in descending order. The root of a size-$K$ min-heap after processing all elements. |
| K-way merge | Merging $K$ sorted lists into one sorted list using a min-heap to always extract the smallest current element across all lists. |
| Two-heap pattern | Using a max-heap for the lower half and a min-heap for the upper half to maintain a running median in $O(\log n)$ per insertion. |
| Heapify | Building a heap from an unsorted array in $O(n)$ time by sifting down from the last non-leaf node to the root. |
What & Why
"Find the K largest elements" and "find the Kth largest element" are among the most common interview questions. The naive approach sorts the entire array in O(n \log n), but a heap-based approach does it in O(n \log K), which is much better when K \ll n.
The key insight: maintain a min-heap of size K. As you process each element, if it is larger than the heap's minimum (the root), replace the root and re-heapify. After processing all elements, the heap contains exactly the top K largest, and the root is the Kth largest.
The heap pattern extends to:
- Merge K sorted lists: use a min-heap to always pick the smallest head across all lists
- Median from data stream: use two heaps (max-heap for lower half, min-heap for upper half) to maintain the median in
O(\log n)per insertion
How It Works
Kth Largest with a Min-Heap of Size K
Process elements one by one. Keep only the K largest seen so far in a min-heap. The root is always the Kth largest.
Two-Heap Pattern: Running Median
Maintain a max-heap (lower half) and a min-heap (upper half). Balance their sizes so they differ by at most 1. The median is the root of the larger heap (or the average of both roots).
Complexity Analysis
| Problem | Sort Approach | Heap Approach | Space |
|---|---|---|---|
| Kth largest | $O(n \log n)$ | $O(n \log K)$ | $O(K)$ |
| Top-K frequent | $O(n \log n)$ | $O(n \log K)$ | $O(n + K)$ |
| Merge K sorted lists | $O(N \log N)$ | $O(N \log K)$ | $O(K)$ |
| Running median (per insert) | $O(n)$ insertion sort | $O(\log n)$ | $O(n)$ |
For Kth largest, each of the $n$ elements requires at most one heap operation of cost $O(\log K)$:
When $K$ is small relative to $n$, this is significantly faster than sorting.
Implementation
Kth Largest Element
ALGORITHM KthLargest(A, n, K)
INPUT: array A of size n, integer K
OUTPUT: the Kth largest element
heap = empty min-heap
FOR i = 0 TO n - 1 DO
IF SIZE(heap) < K THEN
heap.insert(A[i])
ELSE IF A[i] > heap.peekMin() THEN
heap.extractMin()
heap.insert(A[i])
END IF
END FOR
RETURN heap.peekMin()
END
Merge K Sorted Lists
ALGORITHM MergeKSorted(lists, K)
INPUT: K sorted lists
OUTPUT: single merged sorted list
heap = empty min-heap
result = empty list
// Initialize heap with first element of each list
FOR i = 0 TO K - 1 DO
IF lists[i] is not empty THEN
heap.insert((lists[i][0], i, 0)) // (value, listIndex, elementIndex)
END IF
END FOR
WHILE heap is not empty DO
(val, listIdx, elemIdx) = heap.extractMin()
result.append(val)
IF elemIdx + 1 < LENGTH(lists[listIdx]) THEN
nextVal = lists[listIdx][elemIdx + 1]
heap.insert((nextVal, listIdx, elemIdx + 1))
END IF
END WHILE
RETURN result
END
Running Median (Two-Heap)
ALGORITHM MedianFinder()
maxHeap = empty max-heap // lower half
minHeap = empty min-heap // upper half
ALGORITHM AddNumber(num)
maxHeap.insert(num)
minHeap.insert(maxHeap.extractMax())
// Balance: maxHeap can have at most 1 more element
IF SIZE(minHeap) > SIZE(maxHeap) THEN
maxHeap.insert(minHeap.extractMin())
END IF
END
ALGORITHM GetMedian()
IF SIZE(maxHeap) > SIZE(minHeap) THEN
RETURN maxHeap.peekMax()
ELSE
RETURN (maxHeap.peekMax() + minHeap.peekMin()) / 2
END IF
END
Real-World Applications
- Search engine ranking: finding the top-K most relevant documents from millions of candidates uses a min-heap to track the best results.
- Real-time analytics: streaming "top trending topics" or "most active users" over a time window uses heap-based top-K tracking.
- Merge sort external: merging K sorted runs from disk during external sort uses K-way merge with a min-heap.
- Median filtering in signal processing: the two-heap pattern computes running medians for noise reduction in audio and image processing.
- Task scheduling: operating system schedulers use priority queues (heaps) to select the highest-priority process to run next.
Key Takeaways
- For "Kth largest," use a min-heap of size $K$: the root is always the Kth largest element, and each insertion costs $O(\log K)$
- For "Kth smallest," use a max-heap of size $K$ with the same logic inverted
- K-way merge uses a min-heap of size $K$ to always extract the globally smallest element across all lists in $O(\log K)$
- The two-heap pattern (max-heap + min-heap) maintains a running median in $O(\log n)$ per insertion by keeping the halves balanced
- When $K \ll n$, heap-based approaches beat sorting: $O(n \log K)$ vs $O(n \log n)$