Greedy Algorithms: Activity Selection, Huffman Coding, and Greedy vs DP
When making the locally optimal choice at each step leads to a globally optimal solution, and when it does not.
Terminology
- Greedy algorithm: an algorithm that builds a solution incrementally, making the locally optimal choice at each step without reconsidering previous decisions.
- Greedy choice property: a problem has this property if a globally optimal solution can always be reached by making locally optimal choices. This is what makes greedy algorithms correct.
- Optimal substructure: after making a greedy choice, the remaining problem is a smaller instance of the same problem. Shared with dynamic programming.
- Activity selection: given a set of activities with start and finish times, select the maximum number of non-overlapping activities.
- Huffman coding: a lossless data compression algorithm that assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes.
- Prefix-free code: a coding scheme where no codeword is a prefix of another, ensuring unambiguous decoding. Huffman codes are always prefix-free.
- Matroid: a mathematical structure that generalizes when greedy algorithms produce optimal solutions. If a problem's feasible sets form a matroid, the greedy algorithm is guaranteed to find an optimal solution.
- Exchange argument: a proof technique for greedy algorithms. You assume an optimal solution differs from the greedy solution, then show you can "exchange" elements to make it match the greedy solution without worsening it.
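The prefix-free property is what makes greedy decoding of a Huffman bitstream unambiguous: scanning bits left to right, the first codeword match is the only possible match. A minimal sketch, using an assumed prefix-free code (not one produced by any particular Huffman run):

```python
# Illustration of unambiguous decoding with a prefix-free code.
# The code table below is an assumed example, chosen by hand.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decode_map = {v: k for k, v in code.items()}

def decode(bits):
    out, buf = [], ""
    for bit in bits:
        buf += bit
        # Because no codeword is a prefix of another, the first time
        # buf matches a codeword it cannot extend to a different one.
        if buf in decode_map:
            out.append(decode_map[buf])
            buf = ""
    return "".join(out)

print(decode("010110111"))  # "abcd"
```

Note that a greedy left-to-right scan suffices: no backtracking is ever needed, for exactly the reason stated in the definition.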
What & Why
Greedy algorithms are the simplest optimization strategy: at each step, pick the option that looks best right now. No backtracking, no exploring alternatives, no storing subproblem results. Just commit to the locally optimal choice and move on.
This sounds too simple to work, and often it does not. The key question is: when does greedy produce a globally optimal solution?
The answer: when the problem has both the greedy choice property and optimal substructure. The greedy choice property means there exists an optimal solution that includes the greedy choice. Optimal substructure means the remaining problem after the greedy choice is a smaller instance of the same problem.
Greedy vs DP:
- Both require optimal substructure
- DP explores all subproblems and combines their solutions
- Greedy commits to one choice per step and never looks back
- Greedy is faster (often $O(n \log n)$ or $O(n)$) but only works when the greedy choice property holds
- When greedy does not work, you typically need DP
How It Works
Activity Selection
Given $n$ activities with start times $s_i$ and finish times $f_i$, select the maximum number of non-overlapping activities.
The greedy strategy: sort activities by finish time, then greedily pick the activity that finishes earliest and does not overlap with the last selected activity.
Why does this work? If you pick the activity that finishes earliest, you leave the maximum remaining time for future activities. The exchange argument formalizes this: any optimal solution that does not start with the earliest-finishing activity can be modified to do so without reducing the count.
Huffman Coding
Huffman coding builds an optimal prefix-free binary code for a set of characters with known frequencies. The algorithm uses a priority queue (min-heap):
- Create a leaf node for each character with its frequency
- Repeatedly extract the two nodes with the lowest frequency
- Create a new internal node with these two as children, frequency = sum of their frequencies
- Insert the new node back into the priority queue
- When one node remains, it is the root of the Huffman tree
Left edges represent 0, right edges represent 1. The code for each character is the path from root to its leaf.
When Greedy Fails
Consider the 0/1 Knapsack problem. A greedy approach (pick items by highest value-to-weight ratio) does not always work. Example: capacity 50 with items (weight=10, value=60), (weight=20, value=100), and (weight=30, value=120). Greedy by ratio takes the first two items for value 160, leaving 20 units of capacity unused; the optimal choice is the second and third items for value 220.
The fractional knapsack (where you can take fractions of items) does yield to greedy. The 0/1 version requires DP.
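A small sketch comparing the ratio-greedy heuristic against the standard DP on the capacity-50 instance above (function names are illustrative):

```python
def greedy_01(items, capacity):
    # Take whole items in order of value/weight ratio while they fit.
    total = 0
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= capacity:
            total += v
            capacity -= w
    return total

def dp_01(items, capacity):
    # Classic 1-D knapsack DP: best[c] = max value achievable with capacity c.
    best = [0] * (capacity + 1)
    for w, v in items:
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

items = [(10, 60), (20, 100), (30, 120)]
print(greedy_01(items, 50))  # 160 -- greedy commits and cannot recover
print(dp_01(items, 50))      # 220 -- DP considers all subproblems
```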
Complexity Analysis
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Activity Selection | $O(n \log n)$ | $O(1)$ | Dominated by sorting |
| Huffman Coding | $O(n \log n)$ | $O(n)$ | Priority queue operations |
| Fractional Knapsack | $O(n \log n)$ | $O(1)$ | Sort by value/weight ratio |
| Kruskal's MST | $O(E \log E)$ | $O(V)$ | Greedy edge selection + union-find |
Huffman Optimality
Huffman coding produces the optimal prefix-free code. The expected code length is:
$$L = \sum_i f_i \, l_i$$
where $f_i$ is the frequency (normalized so the frequencies sum to 1) and $l_i$ is the code length for character $i$. Huffman minimizes $L$ among all prefix-free codes. The result satisfies:
$$H(X) \le L < H(X) + 1$$
where $H(X) = -\sum f_i \log_2 f_i$ is the Shannon entropy, the theoretical minimum bits per symbol.
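The bound is easy to check numerically. A sketch with an assumed frequency distribution, using code lengths that one optimal Huffman tree would assign for it (merge 0.15+0.15, then 0.25+0.30, then 0.45+0.55):

```python
import math

# Assumed probabilities (sum to 1) and Huffman code lengths for them.
freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

# Expected code length L and Shannon entropy H(X).
L = sum(freqs[ch] * lengths[ch] for ch in freqs)
H = -sum(p * math.log2(p) for p in freqs.values())

assert H <= L < H + 1
print(f"L = {L:.3f}, H(X) = {H:.3f}")  # L = 1.850, H(X) = 1.840
```

Here the Huffman code spends only about 0.01 bits per symbol more than the entropy bound allows in the ideal case.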
Implementation
Activity Selection
ALGORITHM ActivitySelection(activities, n)
INPUT: array of n activities, each with start and finish time
OUTPUT: maximum set of non-overlapping activities
SORT activities by finish time ascending
selected = [activities[0]]
lastFinish = activities[0].finish
FOR i = 1 TO n - 1 DO
IF activities[i].start >= lastFinish THEN
APPEND activities[i] to selected
lastFinish = activities[i].finish
END IF
END FOR
RETURN selected
END
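A runnable Python version of the pseudocode above, representing each activity as a `(start, finish)` tuple:

```python
def select_activities(activities):
    # Sort by finish time, then keep each activity whose start is no
    # earlier than the finish of the last selected activity.
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))  # [(1, 4), (5, 7), (8, 11)]
```

Initializing `last_finish` to negative infinity also handles the empty input gracefully, which the pseudocode (which assumes at least one activity) does not.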
Huffman Coding
ALGORITHM HuffmanBuild(chars, freqs, n)
INPUT: array of n characters with their frequencies
OUTPUT: root of the Huffman tree
CREATE min-priority queue Q
FOR i = 0 TO n - 1 DO
CREATE leaf node with char = chars[i], freq = freqs[i]
INSERT node into Q
END FOR
WHILE SIZE(Q) > 1 DO
left = EXTRACT_MIN(Q)
right = EXTRACT_MIN(Q)
CREATE internal node with:
freq = left.freq + right.freq
leftChild = left
rightChild = right
INSERT node into Q
END WHILE
RETURN EXTRACT_MIN(Q) // root of Huffman tree
END
ALGORITHM HuffmanCodes(node, currentCode, codeTable)
INPUT: Huffman tree node, current binary string, output table
OUTPUT: populates codeTable with character-to-code mappings
IF node is a leaf THEN
codeTable[node.char] = currentCode
RETURN
END IF
HuffmanCodes(node.leftChild, currentCode + "0", codeTable)
HuffmanCodes(node.rightChild, currentCode + "1", codeTable)
END
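A compact Python version of the two procedures above, using `heapq` as the min-priority queue. An insertion counter breaks frequency ties so tuples never fall through to comparing tree nodes:

```python
import heapq

def huffman_codes(freqs):
    # freqs: dict mapping character -> frequency.
    # Heap entries are (freq, tiebreak, tree); a tree is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    root = heap[0][2]

    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):          # internal node
            walk(node[0], code + "0")        # left edge = 0
            walk(node[1], code + "1")        # right edge = 1
        else:                                # leaf: a character
            codes[node] = code or "0"        # single-symbol edge case
    walk(root, "")
    return codes

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
```

For these frequencies the resulting code lengths are 1 bit for `a`, 3 bits for `b`, `c`, and `d`, and 4 bits for `e` and `f`. Frequency ties can yield different but equally optimal trees, so only the code lengths, not the exact bit patterns, are canonical.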
Fractional Knapsack
ALGORITHM FractionalKnapsack(items, n, W)
INPUT: array of n items (weight, value), capacity W
OUTPUT: maximum total value
SORT items by value/weight ratio descending
totalValue = 0
remainingCapacity = W
FOR i = 0 TO n - 1 DO
IF items[i].weight <= remainingCapacity THEN
// Take the whole item
totalValue = totalValue + items[i].value
remainingCapacity = remainingCapacity - items[i].weight
ELSE
// Take a fraction
fraction = remainingCapacity / items[i].weight
totalValue = totalValue + items[i].value * fraction
BREAK
END IF
END FOR
RETURN totalValue
END
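The same procedure in runnable Python, on the capacity-50 instance used earlier to show where 0/1 greedy fails; with fractions allowed, greedy is optimal:

```python
def fractional_knapsack(items, capacity):
    # items: list of (weight, value); process by value density, highest first.
    total = 0.0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if weight <= capacity:
            total += value                        # take the whole item
            capacity -= weight
        else:
            total += value * (capacity / weight)  # take a fraction, then stop
            break
    return total

print(fractional_knapsack([(10, 60), (20, 100), (30, 120)], 50))  # 240.0
```

Taking two thirds of the last item recovers value 80 from the capacity that the 0/1 greedy had to leave unused, which is why the fractional variant yields to greedy.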
Real-World Applications
- Data compression: Huffman coding is used in JPEG, PNG (via DEFLATE), and MP3. Modern variants like arithmetic coding extend the same greedy principle.
- Task scheduling: operating systems use greedy scheduling (shortest job first, earliest deadline first) to minimize average wait time or meet deadlines.
- Network routing: Dijkstra's shortest path algorithm is greedy, always expanding the nearest unvisited node.
- Minimum spanning trees: both Kruskal's and Prim's algorithms are greedy, selecting the cheapest edge that does not create a cycle.
- Coin change (specific denominations): for standard currency denominations (1, 5, 10, 25 cents), the greedy approach of always picking the largest coin works. For arbitrary denominations, it does not, and DP is needed.
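The coin-change point is easy to demonstrate. A sketch of the largest-coin-first greedy, shown succeeding on US denominations and failing on an assumed denomination set {1, 3, 4}:

```python
def greedy_coins(denoms, amount):
    # Repeatedly take as many of the largest remaining coin as fit.
    count = 0
    for coin in sorted(denoms, reverse=True):
        count += amount // coin
        amount %= coin
    return count

print(greedy_coins([1, 5, 10, 25], 30))  # 2  (25 + 5) -- optimal
print(greedy_coins([1, 3, 4], 6))        # 3  (4 + 1 + 1) -- optimal is 2 (3 + 3)
```

Committing to the 4-cent coin leaves a remainder that only 1-cent coins can cover, while the DP solution would consider 3 + 3 as well.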
Key Takeaways
- Greedy algorithms make the locally optimal choice at each step and never reconsider, making them simple and fast
- They work when the problem has both the greedy choice property and optimal substructure
- Activity selection (sort by finish time) and Huffman coding (merge lowest frequencies) are the two classic greedy algorithms
- Greedy fails for problems like 0/1 Knapsack where local optimality does not guarantee global optimality; use DP instead
- Proving greedy correctness typically uses an exchange argument: show that any optimal solution can be transformed to match the greedy solution