NP-Complete Problems: The Hardest Problems in NP
Understanding NP-completeness, polynomial-time reductions, the Cook-Levin theorem, SAT, 3-SAT, and why proving any single NP-complete problem is in P would solve them all.
Terminology
| Term | Definition |
|---|---|
| NP-Complete | A problem that is both in NP and NP-hard: it is verifiable in polynomial time and at least as hard as every other problem in NP |
| NP-Hard | A problem at least as hard as every problem in NP; it may or may not be in NP itself |
| Polynomial-time reduction | A transformation from problem A to problem B in polynomial time such that a "yes" for A maps to a "yes" for B and vice versa, written $A \le_p B$ |
| SAT (Boolean satisfiability) | Given a Boolean formula, is there an assignment of TRUE/FALSE to variables that makes the formula evaluate to TRUE? |
| 3-SAT | SAT restricted to CNF formulas where every clause has exactly 3 literals; still NP-complete |
| CNF (Conjunctive Normal Form) | A Boolean formula expressed as an AND of clauses, where each clause is an OR of literals |
| Cook-Levin theorem | The foundational result proving that SAT is NP-complete, establishing the first NP-complete problem |
| Karp's 21 problems | Richard Karp's 1972 list of 21 problems shown to be NP-complete via reductions from SAT |
| Literal | A Boolean variable or its negation, e.g. $x_1$ or $\neg x_1$ |
What & Why
NP-complete problems are the hardest problems in NP. They sit at the top of the difficulty hierarchy: if you could solve any single NP-complete problem in polynomial time, you could solve every problem in NP in polynomial time, which would prove P = NP.
This is powerful because it means all NP-complete problems are computationally equivalent. Solving one solves them all. Conversely, if any one of them is proven to require exponential time, they all do.
For engineers, recognizing that a problem is NP-complete is a signal to stop looking for an exact polynomial-time algorithm and start looking for approximations, heuristics, or special-case restrictions. It is not a dead end; it is a redirect.
How It Works
What Makes a Problem NP-Complete?
A problem L is NP-complete if:
L \in NP(solutions can be verified in polynomial time)- Every problem in NP can be reduced to
Lin polynomial time (Lis NP-hard)
The Cook-Levin Theorem
In 1971, Stephen Cook (and independently Leonid Levin) proved that SAT is NP-complete. The proof works by showing that any computation of a nondeterministic Turing machine can be encoded as a Boolean formula. If the machine accepts, the formula is satisfiable. This was the first NP-complete problem ever identified.
The Reduction Chain
Once SAT was proven NP-complete, other problems were shown NP-complete by reducing SAT (or another known NP-complete problem) to them:
SAT and 3-SAT in Detail
SAT: Given a Boolean formula in CNF, is there a satisfying assignment?
Example: (x_1 \lor \neg x_2) \land (\neg x_1 \lor x_3) \land (x_2 \lor x_3)
Setting x_1 = T, x_2 = T, x_3 = T satisfies all three clauses.
3-SAT: Same as SAT but every clause has exactly 3 literals. Any SAT instance can be converted to 3-SAT in polynomial time (by splitting longer clauses with auxiliary variables). 3-SAT is NP-complete, but 2-SAT is in P, solvable in O(n) using implication graphs.
How Reductions Work
To prove problem B is NP-complete:
- Show B is in NP (give a polynomial-time verifier)
- Pick a known NP-complete problem A
- Show a polynomial-time transformation from any instance of A to an instance of B
- Prove: A has a "yes" answer if and only if the transformed B instance has a "yes" answer
Complexity Analysis
| NP-Complete Problem | Best Known Exact | Verification | Reduced From |
|---|---|---|---|
| SAT | $O(2^n)$ | $O(n \cdot m)$ | Cook-Levin (direct proof) |
| 3-SAT | $O(1.33^n)$ | $O(n)$ | SAT |
| Clique | $O(2^{n/3})$ | $O(k^2)$ | 3-SAT |
| Vertex Cover | $O(1.28^k \cdot n)$ | $O(V + E)$ | 3-SAT / Clique |
| Subset Sum | $O(2^{n/2})$ | $O(n)$ | 3-SAT / Vertex Cover |
The key insight: all these problems have exponential solving times but polynomial verification times. A polynomial-time algorithm for any one of them would give polynomial-time algorithms for all of them.
Implementation
ALGORITHM Reduce3SATtoClique(formula)
INPUT: formula: 3-SAT instance with m clauses over n variables
OUTPUT: graph G and integer k such that formula is satisfiable iff G has a k-clique
BEGIN
k <- m
V <- empty set
E <- empty set
FOR i <- 1 TO m DO
FOR EACH literal l IN clause_i DO
ADD node (i, l) TO V
END FOR
END FOR
FOR EACH pair of nodes (i, l1) and (j, l2) WHERE i != j DO
IF l1 != NOT(l2) THEN
ADD edge ((i, l1), (j, l2)) TO E
END IF
END FOR
RETURN (Graph(V, E), k)
END
ALGORITHM VerifyNPCompleteness(problem)
INPUT: problem: a decision problem
OUTPUT: steps to prove NP-completeness
BEGIN
STEP 1: Show problem is in NP
Define a certificate (witness) for "yes" instances
Show a verifier checks the certificate in polynomial time
STEP 2: Choose a known NP-complete problem A
Pick the problem whose structure most closely matches
STEP 3: Construct polynomial-time reduction A ->p problem
Transform any instance of A into an instance of problem
Prove: A is "yes" if and only if problem is "yes"
STEP 4: Verify reduction runs in polynomial time
The transformation itself must be O(n^k) for some constant k
END
ALGORITHM BruteForceSAT(formula, variables)
INPUT: formula: CNF with m clauses, variables: list of n Boolean variables
OUTPUT: satisfying assignment or UNSATISFIABLE
BEGIN
FOR EACH assignment IN {TRUE, FALSE}^n DO
all_satisfied <- TRUE
FOR EACH clause IN formula DO
IF NOT EvaluateClause(clause, assignment) THEN
all_satisfied <- FALSE
BREAK
END IF
END FOR
IF all_satisfied THEN
RETURN assignment
END IF
END FOR
RETURN UNSATISFIABLE
END
Real-World Applications
- SAT solvers: despite worst-case exponential time, modern SAT solvers (MiniSat, Z3) handle industrial instances with millions of variables using clever heuristics like DPLL and CDCL
- Hardware verification: chip manufacturers encode circuit correctness as SAT instances and use solvers to find bugs before fabrication
- Scheduling: job-shop scheduling, exam timetabling, and airline crew scheduling are NP-complete; airlines use constraint programming and integer linear programming relaxations
- Network design: finding minimum vertex covers for network monitoring, or maximum cliques for community detection in social networks
- Bioinformatics: sequence alignment and phylogenetic tree construction involve NP-complete subproblems, solved with dynamic programming heuristics
Key Takeaways
- NP-complete problems are the hardest problems in NP: solving any one in polynomial time solves them all
- The Cook-Levin theorem established SAT as the first NP-complete problem by encoding Turing machine computation as Boolean formulas
- Polynomial-time reductions ($A \le_p B$) are the tool for proving new problems NP-complete: transform A into B preserving yes/no answers
- 3-SAT is NP-complete but 2-SAT is in P, showing that small structural changes can dramatically shift complexity
- Recognizing NP-completeness in practice is a signal to use approximation algorithms, heuristics, or SAT/ILP solvers rather than seeking exact polynomial-time solutions