NP-Hard and Approximation Algorithms
When exact solutions are intractable at scale, approximation algorithms provide provable guarantees. Covering NP-hardness, the traveling salesman problem, knapsack, and greedy/DP approximation strategies.
Terminology
| Term | Definition |
|---|---|
| NP-Hard | A problem at least as hard as every problem in NP; it need not be in NP itself (its solutions may not even be verifiable in polynomial time) |
| Approximation algorithm | A polynomial-time algorithm that finds a solution within a provable factor of the optimal solution |
| Approximation ratio ($\alpha$) | The worst-case ratio between the approximate solution and the optimal solution; $\alpha = 1$ means exact |
| PTAS | Polynomial-Time Approximation Scheme: an algorithm that achieves a $(1+\epsilon)$ ratio for any $\epsilon > 0$, with running time polynomial in $n$ (but possibly exponential in $1/\epsilon$) |
| FPTAS | Fully Polynomial-Time Approximation Scheme: a PTAS where the running time is also polynomial in $1/\epsilon$ |
| TSP (Traveling Salesman Problem) | Find the shortest tour visiting every city exactly once and returning to the start; NP-hard in general |
| Metric TSP | TSP where distances satisfy the triangle inequality; admits a 1.5-approximation (Christofides' algorithm) |
| Inapproximability | A proof that no polynomial-time algorithm can achieve better than a certain approximation ratio (unless P = NP) |
| Greedy heuristic | An algorithm that makes the locally optimal choice at each step; sometimes provides provable approximation guarantees |
What & Why
NP-hard problems are at least as hard as the hardest problems in NP. Unlike NP-complete problems, NP-hard problems do not need to be in NP at all. The Halting Problem, for example, is NP-hard yet undecidable: no algorithm solves it on all inputs, no matter how much time it is given.
In practice, many optimization problems you encounter are NP-hard: finding the shortest delivery route, packing items into bins, scheduling jobs on machines. Since no known exact algorithm runs in polynomial time (and none exists unless P = NP), you need a different strategy. Approximation algorithms provide it: they run in polynomial time and guarantee that the solution is within a known factor of optimal.
The beauty of approximation algorithms is that the guarantee is provable, not just empirical. A 2-approximation for vertex cover means the algorithm will never return a cover more than twice the size of the optimal one, no matter the input.
How It Works
NP-Hard vs NP-Complete
NP-complete problems are the intersection of NP and NP-hard. Problems that are NP-hard but not in NP are even harder: you cannot even verify a solution efficiently.
Approximation Strategies
Strategy 1: Greedy with guarantees. The greedy vertex cover algorithm picks an edge, adds both endpoints, removes all covered edges, and repeats. This gives a 2-approximation: the cover is at most twice the optimal size.
Strategy 2: LP relaxation and rounding. Relax integer constraints to continuous ones, solve the linear program in polynomial time, then round the fractional solution to integers. The rounding introduces a bounded error.
Strategy 3: Dynamic programming on restricted inputs. The 0/1 knapsack problem has a pseudo-polynomial DP solution in $O(nW)$. By scaling and rounding the item values, you get an FPTAS with a $(1+\epsilon)$ approximation ratio.
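The value-scaling idea in Strategy 3 can be sketched in Python. This is a minimal illustration, not an optimized implementation: names are mine, and it assumes positive item values and $0 < \epsilon < 1$.

```python
def knapsack_fptas(items, capacity, eps):
    """FPTAS sketch for 0/1 knapsack: scale VALUES down by K = eps*vmax/n,
    then run the exact DP over scaled values (min weight per value total).
    The returned bound is at least (1 - eps) * OPT and never exceeds OPT."""
    n = len(items)
    vmax = max(v for _, v in items)
    K = eps * vmax / n                        # scaling factor
    scaled = [(w, int(v // K)) for w, v in items]
    total = sum(v for _, v in scaled)
    INF = float("inf")
    # min_weight[val] = minimum weight achieving scaled value exactly val
    min_weight = [0.0] + [INF] * total
    for w, v in scaled:
        for val in range(total, v - 1, -1):   # iterate downward: 0/1, not unbounded
            min_weight[val] = min(min_weight[val], min_weight[val - v] + w)
    best = max(val for val in range(total + 1) if min_weight[val] <= capacity)
    return best * K                           # lower bound on the achieved true value
```

Scaling costs at most $K$ per item, so at most $nK = \epsilon \cdot v_{max} \le \epsilon \cdot \text{OPT}$ in total, which is where the $(1+\epsilon)$ guarantee comes from; shrinking $\epsilon$ grows the DP table, trading time for accuracy.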
Complexity Analysis
| Problem | Exact Time | Approx. Ratio | Approx. Time |
|---|---|---|---|
| Vertex Cover | $O(2^k \cdot n)$ | 2 | $O(V + E)$ |
| Metric TSP | $O(n^2 \cdot 2^n)$ | 1.5 (Christofides) | $O(n^3)$ |
| 0/1 Knapsack | $O(nW)$ pseudo-poly | $(1+\epsilon)$ FPTAS | $O(n^3 / \epsilon)$ |
| Set Cover | $O(2^n)$ | $O(\ln n)$ | $O(n \cdot m)$ |
| General TSP | $O(n^2 \cdot 2^n)$ | No constant ratio possible (unless P = NP) | N/A |
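The set cover row corresponds to the classic greedy algorithm: repeatedly pick the subset covering the most still-uncovered elements, which achieves a $H_n \le \ln n + 1$ ratio. A minimal Python sketch (function name mine):

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: at each step take the subset that covers the most
    uncovered elements. Returns a cover within ln(n) + 1 of optimal."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover
```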
Implementation
ALGORITHM GreedyVertexCover(graph)
INPUT: graph: undirected graph (V, E)
OUTPUT: vertex cover C (set of vertices covering all edges)
BEGIN
    C <- empty set
    E_remaining <- copy of E
    WHILE E_remaining is not empty DO
        Pick any edge (u, v) from E_remaining
        C <- C UNION {u, v}
        Remove all edges incident to u or v from E_remaining
    END WHILE
    RETURN C
END
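The vertex cover pseudocode translates almost directly to Python; a minimal sketch (function name mine) operating on an edge list:

```python
def greedy_vertex_cover(edges):
    """2-approximate vertex cover. The picked edges form a matching, and
    any cover needs at least one endpoint per matching edge, so taking
    both endpoints gives |C| <= 2 * OPT."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover
```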
ALGORITHM GreedyKnapsack(items, capacity)
INPUT: items: list of (weight, value) pairs, capacity: W
OUTPUT: approximate maximum value (at least half of optimal)
BEGIN
    SORT items by value/weight ratio in decreasing order
    greedyValue <- 0
    remainingCapacity <- capacity
    FOR EACH (w, v) IN items DO
        IF w <= remainingCapacity THEN
            greedyValue <- greedyValue + v
            remainingCapacity <- remainingCapacity - w
        END IF
    END FOR
    // The density greedy alone has no bounded ratio; also considering the
    // single most valuable item that fits yields a 1/2-approximation
    bestSingle <- MAX v over (w, v) IN items WITH w <= capacity
    RETURN MAX(greedyValue, bestSingle)
END
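In Python, the density greedy, combined with the standard fix of also considering the single most valuable item that fits (which is what makes it a 1/2-approximation), can be sketched as:

```python
def greedy_knapsack(items, capacity):
    """1/2-approximation for 0/1 knapsack: the better of the value-density
    greedy and the single most valuable item that fits on its own."""
    greedy_value = 0
    remaining = capacity
    # items are (weight, value) pairs; sort by value/weight, best first
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:
            greedy_value += v
            remaining -= w
    best_single = max((v for w, v in items if w <= capacity), default=0)
    return max(greedy_value, best_single)
```

The second term matters: on items like `[(1, 2), (10, 10)]` with capacity 10, the density greedy alone takes the light item and then cannot fit the heavy one, while the single-item fallback recovers the optimum.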
ALGORITHM NearestNeighborTSP(cities, distanceMatrix)
INPUT: cities: list of n cities, distanceMatrix: n x n distance table
OUTPUT: tour visiting all cities
BEGIN
    visited <- {cities[0]}
    tour <- [cities[0]]
    current <- cities[0]
    WHILE |visited| < n DO
        nearest <- city in (cities - visited) minimizing distanceMatrix[current][city]
        tour <- tour + [nearest]
        visited <- visited UNION {nearest}
        current <- nearest
    END WHILE
    tour <- tour + [cities[0]]
    RETURN tour
END
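A compact Python version of the nearest-neighbor heuristic (function name mine), indexing cities as 0..n-1. Note the hedge: unlike Christofides on metric instances, nearest neighbor carries no constant-factor guarantee; its ratio can grow with $n$.

```python
def nearest_neighbor_tour(dist):
    """Nearest-neighbor TSP heuristic over an n x n distance matrix.
    O(n^2) time, starts and ends at city 0. Fast but no constant-factor
    approximation guarantee."""
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[current][c])  # closest unvisited
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(0)  # close the cycle
    return tour
```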
Real-World Applications
- Delivery routing: companies like UPS and FedEx solve TSP variants daily using approximation algorithms and local search heuristics; even a 1% improvement saves millions in fuel costs
- Cloud resource allocation: bin packing (NP-hard) determines how virtual machines are placed on physical servers; first-fit decreasing gives a provable approximation
- Network design: Steiner tree problems (connecting a subset of nodes at minimum cost) are NP-hard; 2-approximation algorithms are used in VLSI routing and telecommunications
- Warehouse optimization: knapsack variants determine which products to stock given limited shelf space, solved with FPTAS-based approaches
- Genome assembly: reconstructing DNA sequences from short reads involves NP-hard shortest superstring problems, tackled with greedy approximations
Key Takeaways
- NP-hard problems are at least as hard as everything in NP, but they may not even be in NP (solutions might not be verifiable in polynomial time)
- Approximation algorithms trade exactness for speed: they run in polynomial time and guarantee a solution within a provable factor of optimal
- The approximation landscape ranges from FPTAS (arbitrarily close to optimal) to inapproximable (no constant-factor guarantee exists unless P = NP)
- Greedy algorithms often provide surprisingly good approximation ratios: 2x for vertex cover, $O(\ln n)$ for set cover
- Recognizing a problem as NP-hard is not a dead end; it is a signal to choose the right approximation strategy for your accuracy and speed requirements