
Heaps and Priority Queues: Always Know the Extreme

How binary heaps maintain the min or max element at the top in O(log n) time, powering priority queues, heap sort, and scheduling systems.

2021-01-13
CS Fundamentals · heaps · priority-queues · data-structures

Terminology

Heap
A complete binary tree that satisfies the heap property: every parent node is ordered relative to its children (smaller in a min-heap, larger in a max-heap).
Min-Heap
A heap where every parent's key is less than or equal to its children's keys. The root holds the minimum element.
Max-Heap
A heap where every parent's key is greater than or equal to its children's keys. The root holds the maximum element.
Complete Binary Tree
A binary tree where every level is fully filled except possibly the last, which is filled from left to right. This property allows array-based storage with no gaps.
Priority Queue
An abstract data type that supports inserting elements with priorities and extracting the element with the highest (or lowest) priority. Binary heaps are the most common implementation.
Heapify
The process of restoring the heap property after an element is added or removed. "Sift up" fixes after insertion; "sift down" fixes after extraction.
Heap Sort
A comparison-based sorting algorithm that builds a max-heap from the input and repeatedly extracts the maximum to produce sorted output. Runs in $O(n \log n)$ time with $O(1)$ extra space.

What & Why

A heap is a tree-based data structure that keeps the smallest (or largest) element always accessible at the root. It supports insertion and extraction of the extreme element in $O(\log n)$ time, making it the go-to implementation for priority queues.

Why does this matter? Any time you need to repeatedly find the "most important" element from a changing collection, you need a priority queue. Operating system schedulers pick the highest-priority process. Dijkstra's shortest-path algorithm extracts the closest unvisited node. Event-driven simulations process the next event in time order. All of these rely on heaps.

Key properties:

  • $O(1)$ access to the min or max: the root always holds the extreme element
  • $O(\log n)$ insert and extract: maintained by sifting elements up or down the tree
  • Array-backed: the complete binary tree maps perfectly to an array with no pointer overhead
  • $O(n)$ build time: a heap can be constructed from an unsorted array in linear time using bottom-up heapify

How It Works

Array Representation

Because a heap is a complete binary tree, it maps directly to an array with no wasted space. For a node at index $i$ (0-based):

  • Left child: $2i + 1$
  • Right child: $2i + 2$
  • Parent: $\lfloor (i - 1) / 2 \rfloor$

A seven-element min-heap and its array layout:

        1
      /   \
     3     2
    / \   / \
   7   6 5   4

Array: [1, 3, 2, 7, 6, 5, 4]
Index:  0  1  2  3  4  5  6

Every parent is smaller than both children, satisfying the min-heap property. The minimum (1) is always at index 0.
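The index arithmetic can be verified directly in Python; this is a minimal sketch, and the helper names `left`, `right`, and `parent` are mine, not part of any library:

```python
# Index arithmetic for a 0-based array-backed heap (illustrative sketch;
# helper names are chosen for this example).
heap = [1, 3, 2, 7, 6, 5, 4]

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2

# Node 3 (index 1) has children 7 (index 3) and 6 (index 4).
assert heap[left(1)] == 7 and heap[right(1)] == 6
assert parent(3) == 1 and parent(4) == 1

# Verify the min-heap property for every non-root node.
assert all(heap[parent(i)] <= heap[i] for i in range(1, len(heap)))
```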

Insert (Sift Up)

  1. Append the new element at the end of the array (next available leaf position)
  2. Compare it with its parent; if it violates the heap property, swap them
  3. Repeat step 2, moving up the tree, until the property is restored or the root is reached

This takes at most $O(\log n)$ swaps, one per level of the tree.

Extract Min/Max (Sift Down)

  1. Save the root element (the min or max)
  2. Move the last element in the array to the root position
  3. Compare the new root with its children; swap with the smaller child (min-heap) or larger child (max-heap)
  4. Repeat step 3, moving down the tree, until the property is restored or a leaf is reached

This also takes at most $O(\log n)$ swaps.

Build Heap (Bottom-Up Heapify)

To convert an unsorted array into a heap, call sift-down on every non-leaf node, starting from the last non-leaf (index $\lfloor n/2 \rfloor - 1$) and working toward the root. Despite calling sift-down roughly $n/2$ times, the total work is $O(n)$ because most nodes are near the bottom and sift down only a few levels.

The formal analysis sums the work at each level:

$\sum_{k=0}^{\lfloor \log n \rfloor} \frac{n}{2^{k+1}} \cdot k = O(n)$
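The bottom-up construction can be sketched in a few lines of Python (function names are mine; this mirrors the sift-down pseudocode later in the post):

```python
def sift_down(a, i, n):
    """Move a[i] down until the min-heap property holds beneath it."""
    while True:
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and a[l] < a[smallest]:
            smallest = l
        if r < n and a[r] < a[smallest]:
            smallest = r
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_min_heap(a):
    """Bottom-up heapify: sift down every non-leaf, last to first. O(n) total."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)

a = [9, 4, 7, 1, 2, 6, 3]
build_min_heap(a)
# a[0] now holds the minimum, and every parent <= its children.
```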

Complexity Analysis

Operation Time Notes
Find min/max $O(1)$ Always at the root (index 0)
Insert $O(\log n)$ Sift up from leaf to root
Extract min/max $O(\log n)$ Sift down from root to leaf
Build heap $O(n)$ Bottom-up heapify
Heap sort $O(n \log n)$ Build heap + $n$ extractions
Space $O(n)$ Array storage, no pointers needed

Heap Sort Analysis

Heap sort works in two phases:

  1. Build a max-heap from the input array: $O(n)$
  2. Repeatedly extract the max and place it at the end of the array: $n$ extractions at $O(\log n)$ each
$T(n) = O(n) + \sum_{k=1}^{n} O(\log k) = O(n \log n)$

Heap sort is in-place ($O(1)$ extra space) and always $O(n \log n)$, but it is not stable and has poor cache locality compared to quicksort.

Implementation

Min-Heap (Pseudocode)

STRUCTURE MinHeap:
    data ← empty array

FUNCTION peek(heap):
    RETURN heap.data[0]

FUNCTION insert(heap, val):
    APPEND val TO heap.data
    siftUp(heap, length(heap.data) - 1)

FUNCTION extractMin(heap):
    IF length(heap.data) = 0 THEN RETURN UNDEFINED
    min ← heap.data[0]
    last ← REMOVE last element FROM heap.data
    IF length(heap.data) > 0 THEN
        heap.data[0] ← last
        siftDown(heap, 0)
    RETURN min

FUNCTION siftUp(heap, i):
    WHILE i > 0:
        parent ← floor((i - 1) / 2)
        IF heap.data[i] >= heap.data[parent] THEN BREAK
        SWAP heap.data[i] AND heap.data[parent]
        i ← parent

FUNCTION siftDown(heap, i):
    n ← length(heap.data)
    LOOP:
        smallest ← i
        left ← 2 * i + 1
        right ← 2 * i + 2
        IF left < n AND heap.data[left] < heap.data[smallest] THEN smallest ← left
        IF right < n AND heap.data[right] < heap.data[smallest] THEN smallest ← right
        IF smallest = i THEN BREAK
        SWAP heap.data[i] AND heap.data[smallest]
        i ← smallest
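
Python's standard-library `heapq` module provides exactly these min-heap operations over a plain list, so the pseudocode's behavior can be sanity-checked without writing the structure by hand:

```python
import heapq

h = []
for v in [5, 1, 4, 2, 3]:
    heapq.heappush(h, v)       # sift up, O(log n) per push

assert h[0] == 1               # peek: the minimum sits at index 0
assert heapq.heappop(h) == 1   # extract-min: move last to root, sift down
assert heapq.heappop(h) == 2   # next-smallest surfaces after re-heapifying
```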

Heap Sort (Pseudocode)

FUNCTION heapSort(arr):
    n ← length(arr)

    // Build max-heap (bottom-up)
    FOR i ← floor(n / 2) - 1 DOWNTO 0:
        siftDownMax(arr, i, n)

    // Extract max repeatedly
    FOR end ← n - 1 DOWNTO 1:
        SWAP arr[0] AND arr[end]
        siftDownMax(arr, 0, end)

FUNCTION siftDownMax(arr, i, size):
    LOOP:
        largest ← i
        left ← 2 * i + 1
        right ← 2 * i + 2
        IF left < size AND arr[left] > arr[largest] THEN largest ← left
        IF right < size AND arr[right] > arr[largest] THEN largest ← right
        IF largest = i THEN BREAK
        SWAP arr[i] AND arr[largest]
        i ← largest
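
A direct Python translation of the heap-sort pseudocode above (function names are mine), kept in-place just like the original:

```python
def heap_sort(arr):
    """In-place heap sort: build a max-heap, then repeatedly swap the max to the end."""
    n = len(arr)

    def sift_down_max(i, size):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and arr[l] > arr[largest]:
                largest = l
            if r < size and arr[r] > arr[largest]:
                largest = r
            if largest == i:
                return
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # build max-heap, O(n)
        sift_down_max(i, n)
    for end in range(n - 1, 0, -1):       # n - 1 extractions, O(log n) each
        arr[0], arr[end] = arr[end], arr[0]
        sift_down_max(0, end)

data = [5, 2, 9, 1, 7, 3]
heap_sort(data)
# data is now sorted in ascending order
```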

Real-World Applications

  • OS process scheduling: priority queues determine which process runs next based on priority levels
  • Dijkstra's algorithm: a min-heap extracts the nearest unvisited vertex in $O(\log n)$, enabling efficient shortest-path computation
  • Event-driven simulation: events are processed in timestamp order using a min-heap as the event queue
  • Median maintenance: two heaps (a max-heap for the lower half and a min-heap for the upper half) track the running median of a data stream
  • Merge K sorted lists: a min-heap of size $K$ efficiently merges $K$ sorted sequences in $O(N \log K)$ total time
  • Huffman coding: builds an optimal prefix code by repeatedly extracting the two lowest-frequency nodes from a min-heap
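
The median-maintenance pattern from the list above can be sketched with two `heapq` heaps; this is an illustrative sketch, and the names `lo`, `hi`, `add`, and `median` are mine:

```python
import heapq

# Running-median sketch: `lo` is a max-heap of the lower half (stored negated,
# since heapq is a min-heap), `hi` is a min-heap of the upper half.
lo, hi = [], []

def add(x):
    heapq.heappush(lo, -x)
    heapq.heappush(hi, -heapq.heappop(lo))  # move lo's largest into hi
    if len(hi) > len(lo):                   # keep lo holding the extra element
        heapq.heappush(lo, -heapq.heappop(hi))

def median():
    if len(lo) > len(hi):
        return -lo[0]                       # odd count: top of the lower half
    return (-lo[0] + hi[0]) / 2             # even count: average of the two tops

for x in [5, 2, 8]:
    add(x)
# median of [5, 2, 8] is 5
```

Each insertion costs $O(\log n)$ and each median query $O(1)$, matching the heap bounds from the complexity table.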

Key Takeaways

  • A binary heap is a complete binary tree stored in an array, with parent-child relationships computed by index arithmetic
  • Min-heaps keep the smallest element at the root; max-heaps keep the largest
  • Insert and extract operations are $O(\log n)$ via sift-up and sift-down
  • Building a heap from an unsorted array is $O(n)$ using bottom-up heapify, not $O(n \log n)$
  • Heap sort achieves $O(n \log n)$ worst-case with $O(1)$ extra space, but quicksort is usually faster in practice due to better cache behavior