NP-Hard and Approximation Algorithms
When exact solutions are intractable at scale, approximation algorithms provide provable guarantees. Covering NP-hardness, the traveling salesman problem, knapsack, and greedy/DP approximation strategies.
Terminology
| Term | Definition |
|---|---|
| NP-Hard | A problem at least as hard as every problem in NP; it need not be in NP itself (its solutions may not even be verifiable in polynomial time) |
| Approximation algorithm | A polynomial-time algorithm that finds a solution within a provable factor of the optimal solution |
| Approximation ratio ($\alpha$) | The worst-case ratio between the approximate solution and the optimal solution; $\alpha = 1$ means exact |
| PTAS | Polynomial-Time Approximation Scheme: an algorithm that achieves a $(1+\epsilon)$ ratio for any $\epsilon > 0$, with running time polynomial in $n$ (but possibly exponential in $1/\epsilon$) |
| FPTAS | Fully Polynomial-Time Approximation Scheme: a PTAS where the running time is also polynomial in $1/\epsilon$ |
| TSP (Traveling Salesman Problem) | Find the shortest tour visiting every city exactly once and returning to the start; NP-hard in general |
| Metric TSP | TSP where distances satisfy the triangle inequality; admits a 1.5-approximation (Christofides' algorithm) |
| Inapproximability | A proof that no polynomial-time algorithm can achieve better than a certain approximation ratio (unless P = NP) |
| Greedy heuristic | An algorithm that makes the locally optimal choice at each step; sometimes provides provable approximation guarantees |
What & Why
NP-hard problems are at least as hard as the hardest problems in NP. Unlike NP-complete problems, NP-hard problems do not need to be in NP at all. The Halting Problem, for example, is NP-hard yet undecidable: no algorithm solves it on all inputs, no matter how much time it is given.
In practice, many optimization problems you encounter are NP-hard: finding the shortest delivery route, packing items into bins, scheduling jobs on machines. Since no known exact algorithm runs in polynomial time (and none exists unless P = NP), you need a different strategy. Approximation algorithms provide it: they run in polynomial time and guarantee that the solution is within a known factor of optimal.
The beauty of approximation algorithms is that the guarantee is provable, not just empirical. A 2-approximation for vertex cover means the algorithm will never return a cover more than twice the size of the optimal one, no matter the input.
How It Works
NP-Hard vs NP-Complete
NP-complete problems are the intersection of NP and NP-hard. Problems that are NP-hard but not in NP are even harder: you cannot even verify a solution efficiently.
Approximation Strategies
Strategy 1: Greedy with guarantees. The greedy vertex cover algorithm picks an edge, adds both endpoints, removes all covered edges, and repeats. This gives a 2-approximation: the cover is at most twice the optimal size.
Strategy 2: LP relaxation and rounding. Relax integer constraints to continuous ones, solve the linear program in polynomial time, then round the fractional solution to integers. The rounding introduces a bounded error.
Strategy 3: Dynamic programming on restricted inputs. The 0/1 knapsack problem has a pseudo-polynomial DP solution in $O(nW)$. By scaling and rounding the item values, you get an FPTAS with a $(1+\epsilon)$ approximation ratio.
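The value-scaling idea in Strategy 3 can be sketched in Python. This is a minimal illustration, not an optimized implementation: names are mine, and it assumes positive item values and $0 < \epsilon < 1$.

```python
def knapsack_fptas(items, capacity, eps):
    """FPTAS sketch for 0/1 knapsack: scale VALUES down by K = eps*vmax/n,
    then run the exact DP over scaled values (min weight per value total).
    The returned bound is at least (1 - eps) * OPT and never exceeds OPT."""
    n = len(items)
    vmax = max(v for _, v in items)
    K = eps * vmax / n                        # scaling factor
    scaled = [(w, int(v // K)) for w, v in items]
    total = sum(v for _, v in scaled)
    INF = float("inf")
    # min_weight[val] = minimum weight achieving scaled value exactly val
    min_weight = [0.0] + [INF] * total
    for w, v in scaled:
        for val in range(total, v - 1, -1):   # iterate downward: 0/1, not unbounded
            min_weight[val] = min(min_weight[val], min_weight[val - v] + w)
    best = max(val for val in range(total + 1) if min_weight[val] <= capacity)
    return best * K                           # lower bound on the achieved true value
```

Scaling costs at most $K$ per item, so at most $nK = \epsilon \cdot v_{max} \le \epsilon \cdot \text{OPT}$ in total, which is where the $(1+\epsilon)$ guarantee comes from; shrinking $\epsilon$ grows the DP table, trading time for accuracy.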
Complexity Analysis
| Problem | Exact Time | Approx. Ratio | Approx. Time |
|---|---|---|---|
| Vertex Cover | $O(2^k \cdot n)$ | 2 | $O(V + E)$ |
| Metric TSP | $O(n^2 \cdot 2^n)$ | 1.5 (Christofides) | $O(n^3)$ |
| 0/1 Knapsack | $O(nW)$ pseudo-poly | $(1+\epsilon)$ FPTAS | $O(n^3 / \epsilon)$ |
| Set Cover | $O(2^n)$ | $O(\ln n)$ | $O(n \cdot m)$ |
| General TSP | $O(n^2 \cdot 2^n)$ | No constant ratio possible (unless P = NP) | N/A |
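The set cover row corresponds to the classic greedy algorithm: repeatedly pick the subset covering the most still-uncovered elements, which achieves a $H_n \le \ln n + 1$ ratio. A minimal Python sketch (function name mine):

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: at each step take the subset that covers the most
    uncovered elements. Returns a cover within ln(n) + 1 of optimal."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover
```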
Implementation
ALGORITHM GreedyVertexCover(graph)
INPUT: graph: undirected graph (V, E)
OUTPUT: vertex cover C (set of vertices covering all edges)
BEGIN
    C <- empty set
    E_remaining <- copy of E
    WHILE E_remaining is not empty DO
        Pick any edge (u, v) from E_remaining
        C <- C UNION {u, v}
        Remove all edges incident to u or v from E_remaining
    END WHILE
    RETURN C
END
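The vertex cover pseudocode translates almost directly to Python; a minimal sketch (function name mine) operating on an edge list:

```python
def greedy_vertex_cover(edges):
    """2-approximate vertex cover. The picked edges form a matching, and
    any cover needs at least one endpoint per matching edge, so taking
    both endpoints gives |C| <= 2 * OPT."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover
```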
ALGORITHM GreedyKnapsack(items, capacity)
INPUT: items: list of (weight, value) pairs, capacity: W
OUTPUT: approximate maximum value (at least half of optimal)
BEGIN
    SORT items by value/weight ratio in decreasing order
    greedyValue <- 0
    remainingCapacity <- capacity
    FOR EACH (w, v) IN items DO
        IF w <= remainingCapacity THEN
            greedyValue <- greedyValue + v
            remainingCapacity <- remainingCapacity - w
        END IF
    END FOR
    // The density greedy alone has no bounded ratio; also considering the
    // single most valuable item that fits yields a 1/2-approximation
    bestSingle <- MAX v over (w, v) IN items WITH w <= capacity
    RETURN MAX(greedyValue, bestSingle)
END
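In Python, the density greedy, combined with the standard fix of also considering the single most valuable item that fits (which is what makes it a 1/2-approximation), can be sketched as:

```python
def greedy_knapsack(items, capacity):
    """1/2-approximation for 0/1 knapsack: the better of the value-density
    greedy and the single most valuable item that fits on its own."""
    greedy_value = 0
    remaining = capacity
    # items are (weight, value) pairs; sort by value/weight, best first
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:
            greedy_value += v
            remaining -= w
    best_single = max((v for w, v in items if w <= capacity), default=0)
    return max(greedy_value, best_single)
```

The second term matters: on items like `[(1, 2), (10, 10)]` with capacity 10, the density greedy alone takes the light item and then cannot fit the heavy one, while the single-item fallback recovers the optimum.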
ALGORITHM NearestNeighborTSP(cities, distanceMatrix)
INPUT: cities: list of n cities, distanceMatrix: n x n distance table
OUTPUT: tour visiting all cities
BEGIN
    visited <- {cities[0]}
    tour <- [cities[0]]
    current <- cities[0]
    WHILE |visited| < n DO
        nearest <- city in (cities - visited) minimizing distanceMatrix[current][city]
        tour <- tour + [nearest]
        visited <- visited UNION {nearest}
        current <- nearest
    END WHILE
    tour <- tour + [cities[0]]
    RETURN tour
END
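A compact Python version of the nearest-neighbor heuristic (function name mine), indexing cities as 0..n-1. Note the hedge: unlike Christofides on metric instances, nearest neighbor carries no constant-factor guarantee; its ratio can grow with $n$.

```python
def nearest_neighbor_tour(dist):
    """Nearest-neighbor TSP heuristic over an n x n distance matrix.
    O(n^2) time, starts and ends at city 0. Fast but no constant-factor
    approximation guarantee."""
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[current][c])  # closest unvisited
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(0)  # close the cycle
    return tour
```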
Real-World Applications
- Delivery routing: companies like UPS and FedEx solve TSP variants daily using approximation algorithms and local search heuristics; even a 1% improvement saves millions in fuel costs
- Cloud resource allocation: bin packing (NP-hard) determines how virtual machines are placed on physical servers; first-fit decreasing gives a provable approximation
- Network design: Steiner tree problems (connecting a subset of nodes at minimum cost) are NP-hard; 2-approximation algorithms are used in VLSI routing and telecommunications
- Warehouse optimization: knapsack variants determine which products to stock given limited shelf space, solved with FPTAS-based approaches
- Genome assembly: reconstructing DNA sequences from short reads involves NP-hard shortest superstring problems, tackled with greedy approximations
Key Takeaways
- NP-hard problems are at least as hard as everything in NP, but they may not even be in NP (solutions might not be verifiable in polynomial time)
- Approximation algorithms trade exactness for speed: they run in polynomial time and guarantee a solution within a provable factor of optimal
- The approximation landscape ranges from FPTAS (arbitrarily close to optimal) to inapproximable (no constant-factor guarantee exists unless P = NP)
- Greedy algorithms often provide surprisingly good approximation ratios: 2x for vertex cover, $O(\ln n)$ for set cover
- Recognizing a problem as NP-hard is not a dead end; it is a signal to choose the right approximation strategy for your accuracy and speed requirements