Greedy Algorithms: Activity Selection, Huffman Coding, and Greedy vs DP
When making the locally optimal choice at each step leads to a globally optimal solution, and when it does not.
Terminology
- Greedy algorithm: an algorithm that builds a solution incrementally, making the locally optimal choice at each step without reconsidering previous decisions.
- Greedy choice property: a problem has this property if a globally optimal solution can always be reached by making locally optimal choices. This is what makes greedy algorithms correct.
- Optimal substructure: after making a greedy choice, the remaining problem is a smaller instance of the same problem. Shared with dynamic programming.
- Activity selection: given a set of activities with start and finish times, select the maximum number of non-overlapping activities.
- Huffman coding: a lossless data compression algorithm that assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes.
- Prefix-free code: a coding scheme where no codeword is a prefix of another, ensuring unambiguous decoding. Huffman codes are always prefix-free.
- Matroid: a mathematical structure that generalizes when greedy algorithms produce optimal solutions. If a problem's feasible sets form a matroid, the greedy algorithm is guaranteed to find an optimal solution.
- Exchange argument: a proof technique for greedy algorithms. You assume an optimal solution differs from the greedy solution, then show you can "exchange" elements to make it match the greedy solution without worsening it.
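The prefix-free property is what makes greedy decoding of a Huffman bitstream unambiguous: scanning bits left to right, the first codeword match is the only possible match. A minimal sketch, using an assumed prefix-free code (not one produced by any particular Huffman run):

```python
# Illustration of unambiguous decoding with a prefix-free code.
# The code table below is an assumed example, chosen by hand.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decode_map = {v: k for k, v in code.items()}

def decode(bits):
    out, buf = [], ""
    for bit in bits:
        buf += bit
        # Because no codeword is a prefix of another, the first time
        # buf matches a codeword it cannot extend to a different one.
        if buf in decode_map:
            out.append(decode_map[buf])
            buf = ""
    return "".join(out)

print(decode("010110111"))  # "abcd"
```

Note that a greedy left-to-right scan suffices: no backtracking is ever needed, for exactly the reason stated in the definition.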
What & Why
Greedy algorithms are the simplest optimization strategy: at each step, pick the option that looks best right now. No backtracking, no exploring alternatives, no storing subproblem results. Just commit to the locally optimal choice and move on.
This sounds too simple to work, and often it does not. The key question is: when does greedy produce a globally optimal solution?
The answer: when the problem has both the greedy choice property and optimal substructure. The greedy choice property means there exists an optimal solution that includes the greedy choice. Optimal substructure means the remaining problem after the greedy choice is a smaller instance of the same problem.
Greedy vs DP:
- Both require optimal substructure
- DP explores all subproblems and combines their solutions
- Greedy commits to one choice per step and never looks back
- Greedy is faster (often $O(n \log n)$ or $O(n)$) but only works when the greedy choice property holds
- When greedy does not work, you typically need DP
How It Works
Activity Selection
Given $n$ activities with start times $s_i$ and finish times $f_i$, select the maximum number of non-overlapping activities.
The greedy strategy: sort activities by finish time, then greedily pick the activity that finishes earliest and does not overlap with the last selected activity.
Why does this work? If you pick the activity that finishes earliest, you leave the maximum remaining time for future activities. The exchange argument formalizes this: any optimal solution that does not start with the earliest-finishing activity can be modified to do so without reducing the count.
Huffman Coding
Huffman coding builds an optimal prefix-free binary code for a set of characters with known frequencies. The algorithm uses a priority queue (min-heap):
- Create a leaf node for each character with its frequency
- Repeatedly extract the two nodes with the lowest frequency
- Create a new internal node with these two as children, frequency = sum of their frequencies
- Insert the new node back into the priority queue
- When one node remains, it is the root of the Huffman tree
Left edges represent 0, right edges represent 1. The code for each character is the path from root to its leaf.
When Greedy Fails
Consider the 0/1 Knapsack problem. A greedy approach (pick items by highest value-to-weight ratio) does not always work. Example: capacity 50 with items (weight=10, value=60), (weight=20, value=100), and (weight=30, value=120). Greedy by ratio takes the first two items for value 160, leaving 20 units of capacity unused; the optimal choice is the second and third items for value 220.
The fractional knapsack (where you can take fractions of items) does yield to greedy. The 0/1 version requires DP.
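A small sketch comparing the ratio-greedy heuristic against the standard DP on the capacity-50 instance above (function names are illustrative):

```python
def greedy_01(items, capacity):
    # Take whole items in order of value/weight ratio while they fit.
    total = 0
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= capacity:
            total += v
            capacity -= w
    return total

def dp_01(items, capacity):
    # Classic 1-D knapsack DP: best[c] = max value achievable with capacity c.
    best = [0] * (capacity + 1)
    for w, v in items:
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

items = [(10, 60), (20, 100), (30, 120)]
print(greedy_01(items, 50))  # 160 -- greedy commits and cannot recover
print(dp_01(items, 50))      # 220 -- DP considers all subproblems
```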
Complexity Analysis
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Activity Selection | $O(n \log n)$ | $O(1)$ | Dominated by sorting |
| Huffman Coding | $O(n \log n)$ | $O(n)$ | Priority queue operations |
| Fractional Knapsack | $O(n \log n)$ | $O(1)$ | Sort by value/weight ratio |
| Kruskal's MST | $O(E \log E)$ | $O(V)$ | Greedy edge selection + union-find |
Huffman Optimality
Huffman coding produces the optimal prefix-free code. The expected code length is:
$$L = \sum_i f_i \, l_i$$
where $f_i$ is the frequency (normalized so the frequencies sum to 1) and $l_i$ is the code length for character $i$. Huffman minimizes $L$ among all prefix-free codes. The result satisfies:
$$H(X) \le L < H(X) + 1$$
where $H(X) = -\sum f_i \log_2 f_i$ is the Shannon entropy, the theoretical minimum bits per symbol.
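The bound is easy to check numerically. A sketch with an assumed frequency distribution, using code lengths that one optimal Huffman tree would assign for it (merge 0.15+0.15, then 0.25+0.30, then 0.45+0.55):

```python
import math

# Assumed probabilities (sum to 1) and Huffman code lengths for them.
freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

# Expected code length L and Shannon entropy H(X).
L = sum(freqs[ch] * lengths[ch] for ch in freqs)
H = -sum(p * math.log2(p) for p in freqs.values())

assert H <= L < H + 1
print(f"L = {L:.3f}, H(X) = {H:.3f}")  # L = 1.850, H(X) = 1.840
```

Here the Huffman code spends only about 0.01 bits per symbol more than the entropy bound allows in the ideal case.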
Implementation
Activity Selection
ALGORITHM ActivitySelection(activities, n)
INPUT: array of n activities, each with start and finish time
OUTPUT: maximum set of non-overlapping activities
SORT activities by finish time ascending
selected = [activities[0]]
lastFinish = activities[0].finish
FOR i = 1 TO n - 1 DO
IF activities[i].start >= lastFinish THEN
APPEND activities[i] to selected
lastFinish = activities[i].finish
END IF
END FOR
RETURN selected
END
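A runnable Python version of the pseudocode above, representing each activity as a `(start, finish)` tuple:

```python
def select_activities(activities):
    # Sort by finish time, then keep each activity whose start is no
    # earlier than the finish of the last selected activity.
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))  # [(1, 4), (5, 7), (8, 11)]
```

Initializing `last_finish` to negative infinity also handles the empty input gracefully, which the pseudocode (which assumes at least one activity) does not.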
Huffman Coding
ALGORITHM HuffmanBuild(chars, freqs, n)
INPUT: array of n characters with their frequencies
OUTPUT: root of the Huffman tree
CREATE min-priority queue Q
FOR i = 0 TO n - 1 DO
CREATE leaf node with char = chars[i], freq = freqs[i]
INSERT node into Q
END FOR
WHILE SIZE(Q) > 1 DO
left = EXTRACT_MIN(Q)
right = EXTRACT_MIN(Q)
CREATE internal node with:
freq = left.freq + right.freq
leftChild = left
rightChild = right
INSERT node into Q
END WHILE
RETURN EXTRACT_MIN(Q) // root of Huffman tree
END
ALGORITHM HuffmanCodes(node, currentCode, codeTable)
INPUT: Huffman tree node, current binary string, output table
OUTPUT: populates codeTable with character-to-code mappings
IF node is a leaf THEN
codeTable[node.char] = currentCode
RETURN
END IF
HuffmanCodes(node.leftChild, currentCode + "0", codeTable)
HuffmanCodes(node.rightChild, currentCode + "1", codeTable)
END
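A compact Python version of the two procedures above, using `heapq` as the min-priority queue. An insertion counter breaks frequency ties so tuples never fall through to comparing tree nodes:

```python
import heapq

def huffman_codes(freqs):
    # freqs: dict mapping character -> frequency.
    # Heap entries are (freq, tiebreak, tree); a tree is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    root = heap[0][2]

    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):          # internal node
            walk(node[0], code + "0")        # left edge = 0
            walk(node[1], code + "1")        # right edge = 1
        else:                                # leaf: a character
            codes[node] = code or "0"        # single-symbol edge case
    walk(root, "")
    return codes

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
```

For these frequencies the resulting code lengths are 1 bit for `a`, 3 bits for `b`, `c`, and `d`, and 4 bits for `e` and `f`. Frequency ties can yield different but equally optimal trees, so only the code lengths, not the exact bit patterns, are canonical.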
Fractional Knapsack
ALGORITHM FractionalKnapsack(items, n, W)
INPUT: array of n items (weight, value), capacity W
OUTPUT: maximum total value
SORT items by value/weight ratio descending
totalValue = 0
remainingCapacity = W
FOR i = 0 TO n - 1 DO
IF items[i].weight <= remainingCapacity THEN
// Take the whole item
totalValue = totalValue + items[i].value
remainingCapacity = remainingCapacity - items[i].weight
ELSE
// Take a fraction
fraction = remainingCapacity / items[i].weight
totalValue = totalValue + items[i].value * fraction
BREAK
END IF
END FOR
RETURN totalValue
END
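The same procedure in runnable Python, on the capacity-50 instance used earlier to show where 0/1 greedy fails; with fractions allowed, greedy is optimal:

```python
def fractional_knapsack(items, capacity):
    # items: list of (weight, value); process by value density, highest first.
    total = 0.0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if weight <= capacity:
            total += value                        # take the whole item
            capacity -= weight
        else:
            total += value * (capacity / weight)  # take a fraction, then stop
            break
    return total

print(fractional_knapsack([(10, 60), (20, 100), (30, 120)], 50))  # 240.0
```

Taking two thirds of the last item recovers value 80 from the capacity that the 0/1 greedy had to leave unused, which is why the fractional variant yields to greedy.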
Real-World Applications
- Data compression: Huffman coding is used in JPEG, PNG (via DEFLATE), and MP3. Modern variants like arithmetic coding extend the same greedy principle.
- Task scheduling: operating systems use greedy scheduling (shortest job first, earliest deadline first) to minimize average wait time or meet deadlines.
- Network routing: Dijkstra's shortest path algorithm is greedy, always expanding the nearest unvisited node.
- Minimum spanning trees: both Kruskal's and Prim's algorithms are greedy, selecting the cheapest edge that does not create a cycle.
- Coin change (specific denominations): for standard currency denominations (1, 5, 10, 25 cents), the greedy approach of always picking the largest coin works. For arbitrary denominations, it does not, and DP is needed.
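The coin-change point is easy to demonstrate. A sketch of the largest-coin-first greedy, shown succeeding on US denominations and failing on an assumed denomination set {1, 3, 4}:

```python
def greedy_coins(denoms, amount):
    # Repeatedly take as many of the largest remaining coin as fit.
    count = 0
    for coin in sorted(denoms, reverse=True):
        count += amount // coin
        amount %= coin
    return count

print(greedy_coins([1, 5, 10, 25], 30))  # 2  (25 + 5) -- optimal
print(greedy_coins([1, 3, 4], 6))        # 3  (4 + 1 + 1) -- optimal is 2 (3 + 3)
```

Committing to the 4-cent coin leaves a remainder that only 1-cent coins can cover, while the DP solution would consider 3 + 3 as well.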
Key Takeaways
- Greedy algorithms make the locally optimal choice at each step and never reconsider, making them simple and fast
- They work when the problem has both the greedy choice property and optimal substructure
- Activity selection (sort by finish time) and Huffman coding (merge lowest frequencies) are the two classic greedy algorithms
- Greedy fails for problems like 0/1 Knapsack where local optimality does not guarantee global optimality; use DP instead
- Proving greedy correctness typically uses an exchange argument: show that any optimal solution can be transformed to match the greedy solution