Notes/Top-K and Heap Patterns

Top-K and Heap Patterns

Solve Top-K problems efficiently using heaps: Kth largest element, merge K sorted lists, median from data stream, and min-heap vs max-heap selection strategies.

2025-09-25AI-Synthesized from Personal Notes

Coding PatternsTop KHeapPriority QueueLeetcode

Terminology

Term	Definition
Min-heap	A complete binary tree where every parent is smaller than or equal to its children. The root holds the minimum element. Supports $O(\log n)$ insert and extract-min.
Max-heap	A complete binary tree where every parent is larger than or equal to its children. The root holds the maximum element.
Priority queue	An abstract data type that supports insert and extract-min (or extract-max). Typically implemented with a heap.
Top-K	Finding the $K$ largest (or smallest) elements from a collection. A min-heap of size $K$ efficiently tracks the top $K$ largest.
Kth largest element	The element that would be at position $K$ if the array were sorted in descending order. The root of a size-$K$ min-heap after processing all elements.
K-way merge	Merging $K$ sorted lists into one sorted list using a min-heap to always extract the smallest current element across all lists.
Two-heap pattern	Using a max-heap for the lower half and a min-heap for the upper half to maintain a running median in $O(\log n)$ per insertion.
Heapify	Building a heap from an unsorted array in $O(n)$ time by sifting down from the last non-leaf node to the root.

What & Why

"Find the K largest elements" and "find the Kth largest element" are among the most common interview questions. The naive approach sorts the entire array in $O(n \log n)$, but a heap-based approach does it in $O(n \log K)$, which is much better when $K \ll n$.

The key insight: maintain a min-heap of size $K$. As you process each element, if it is larger than the heap's minimum (the root), replace the root and re-heapify. After processing all elements, the heap contains exactly the top $K$ largest, and the root is the Kth largest.

The heap pattern extends to:

Merge K sorted lists: use a min-heap to always pick the smallest head across all lists
Median from data stream: use two heaps (max-heap for lower half, min-heap for upper half) to maintain the median in $O(\log n)$ per insertion

How It Works

Kth Largest with a Min-Heap of Size K

Process elements one by one. Keep only the $K$ largest seen so far in a min-heap. The root is always the Kth largest.

Two-Heap Pattern: Running Median

Maintain a max-heap (lower half) and a min-heap (upper half). Balance their sizes so they differ by at most 1. The median is the root of the larger heap (or the average of both roots).

Complexity Analysis

Problem	Sort Approach	Heap Approach	Space
Kth largest	$O(n \log n)$	$O(n \log K)$	$O(K)$
Top-K frequent	$O(n \log n)$	$O(n \log K)$	$O(n + K)$
Merge K sorted lists	$O(N \log N)$	$O(N \log K)$	$O(K)$
Running median (per insert)	$O(n)$ insertion sort	$O(\log n)$	$O(n)$

For Kth largest, each of the $n$ elements requires at most one heap operation of cost $O(\log K)$:

$T(n) = O(n \log K)$

When $K$ is small relative to $n$, this is significantly faster than sorting.

Implementation

Kth Largest Element

ALGORITHM KthLargest(A, n, K)
  INPUT: array A of size n, integer K
  OUTPUT: the Kth largest element

  heap = empty min-heap

  FOR i = 0 TO n - 1 DO
    IF SIZE(heap) < K THEN
      heap.insert(A[i])
    ELSE IF A[i] > heap.peekMin() THEN
      heap.extractMin()
      heap.insert(A[i])
    END IF
  END FOR

  RETURN heap.peekMin()
END

Merge K Sorted Lists

ALGORITHM MergeKSorted(lists, K)
  INPUT: K sorted lists
  OUTPUT: single merged sorted list

  heap = empty min-heap
  result = empty list

  // Initialize heap with first element of each list
  FOR i = 0 TO K - 1 DO
    IF lists[i] is not empty THEN
      heap.insert((lists[i][0], i, 0))   // (value, listIndex, elementIndex)
    END IF
  END FOR

  WHILE heap is not empty DO
    (val, listIdx, elemIdx) = heap.extractMin()
    result.append(val)

    IF elemIdx + 1 < LENGTH(lists[listIdx]) THEN
      nextVal = lists[listIdx][elemIdx + 1]
      heap.insert((nextVal, listIdx, elemIdx + 1))
    END IF
  END WHILE

  RETURN result
END

Running Median (Two-Heap)

ALGORITHM MedianFinder()
  maxHeap = empty max-heap   // lower half
  minHeap = empty min-heap   // upper half

ALGORITHM AddNumber(num)
  maxHeap.insert(num)
  minHeap.insert(maxHeap.extractMax())

  // Balance: maxHeap can have at most 1 more element
  IF SIZE(minHeap) > SIZE(maxHeap) THEN
    maxHeap.insert(minHeap.extractMin())
  END IF
END

ALGORITHM GetMedian()
  IF SIZE(maxHeap) > SIZE(minHeap) THEN
    RETURN maxHeap.peekMax()
  ELSE
    RETURN (maxHeap.peekMax() + minHeap.peekMin()) / 2
  END IF
END

Real-World Applications

Search engine ranking: finding the top-K most relevant documents from millions of candidates uses a min-heap to track the best results.
Real-time analytics: streaming "top trending topics" or "most active users" over a time window uses heap-based top-K tracking.
Merge sort external: merging K sorted runs from disk during external sort uses K-way merge with a min-heap.
Median filtering in signal processing: the two-heap pattern computes running medians for noise reduction in audio and image processing.
Task scheduling: operating system schedulers use priority queues (heaps) to select the highest-priority process to run next.

Key Takeaways

For "Kth largest," use a min-heap of size $K$: the root is always the Kth largest element, and each insertion costs $O(\log K)$
For "Kth smallest," use a max-heap of size $K$ with the same logic inverted
K-way merge uses a min-heap of size $K$ to always extract the globally smallest element across all lists in $O(\log K)$
The two-heap pattern (max-heap + min-heap) maintains a running median in $O(\log n)$ per insertion by keeping the halves balanced
When $K \ll n$, heap-based approaches beat sorting: $O(n \log K)$ vs $O(n \log n)$

2025-09-26

Trie-Based String Problems

Use tries for autocomplete, word search, longest common prefix, and wildcard matching, with efficient prefix lookups and character-by-character traversal.

2025-09-27

Binary Search on the Answer

Apply binary search to optimization problems by searching over the answer space: Koko eating bananas, split array largest sum, and capacity to ship packages.

2025-09-28

Graph Coloring and Bipartite Checking

Understand graph coloring, 2-coloring via BFS/DFS for bipartite checking, possible bipartition problems, and an introduction to the chromatic number.