Back to Blog

P, NP, NP-Hard, and NP-Complete: The Complexity Zoo

Understanding the major complexity classes, what makes a problem polynomial or non-polynomial, the P vs NP question, NP-hardness, NP-completeness, and how they all relate.

2025-07-29
Share
Proofsproofscomplexityp-vs-npnp-complete

Terminology

Term Definition
Decision problem A problem with a yes/no answer; e.g., "does this graph have a Hamiltonian cycle?"
Polynomial time An algorithm runs in polynomial time if its worst-case runtime is bounded by a polynomial in the input size, such as O(n), O(n^2), or O(n^3)
P (Polynomial) The class of decision problems solvable by a deterministic Turing machine in polynomial time
NP (Nondeterministic Polynomial) The class of decision problems whose "yes" answers can be verified in polynomial time given a certificate (witness)
NP-hard A problem is NP-hard if every problem in NP can be reduced to it in polynomial time; NP-hard problems are at least as hard as the hardest problems in NP
NP-complete A problem that is both in NP and NP-hard; the hardest problems within NP, sitting at the intersection
Polynomial-time reduction A transformation from problem A to problem B computable in polynomial time; if B is easy, then A is easy too
Certificate (witness) A piece of evidence that lets you verify a "yes" answer quickly; e.g., a specific Hamiltonian cycle in a graph
Cook-Levin theorem The 1971 result proving that Boolean satisfiability (SAT) is NP-complete, the first problem shown to have this property

What & Why

Computer scientists classify problems by how hard they are to solve. The most important dividing line is between problems solvable in polynomial time (fast, practical) and those that apparently require exponential time (slow, impractical for large inputs). The complexity classes P, NP, NP-hard, and NP-complete formalize this distinction.

P contains the "easy" problems: sorting, shortest paths, matrix multiplication. NP contains problems where you might not know how to find a solution quickly, but you can check a proposed solution quickly. The million-dollar question (literally, it is a Millennium Prize Problem) is whether P = NP: can every problem with a quickly checkable solution also be quickly solved?

NP-complete problems are the hardest problems in NP. If you could solve any one of them in polynomial time, you could solve all of NP in polynomial time, proving P = NP. NP-hard problems are at least as hard as NP-complete, but they might not even be in NP (their solutions might not be quickly verifiable). Understanding these classes is essential for every programmer, because recognizing that a problem is NP-hard tells you to stop looking for a perfect algorithm and start looking for approximations, heuristics, or special-case solutions.

How It Works

The Classes Visualized

The relationship between P, NP, NP-hard, and NP-complete is best understood as a Venn diagram. Assuming P != NP (which most researchers believe):

Complexity Classes (assuming P != NP) NP (verifiable in poly time) NP-Hard (as hard as NP) P (solvable in poly time) Sorting, Shortest Path Matching, Primality NP-Complete SAT, 3-Coloring TSP (decision), Clique Halting Problem TSP (optimization) General Tiling NP-Hard but not in NP NP-intermediate? (Factoring, Graph Iso) NP-Complete = NP intersection NP-Hard (the hardest verifiable problems)

Common Problems by Complexity Class

Problem Class Best Known Domain
Sorting P $O(n \log n)$ General computing
Shortest path (Dijkstra) P $O((V+E)\log V)$ Navigation, networking
Maximum bipartite matching P $O(V \cdot E)$ Job assignment, dating apps
Primality testing P $O(\log^6 n)$ (AKS) Cryptography
Linear programming P Polynomial (interior point) Optimization, logistics
2-SAT P $O(V + E)$ Constraint satisfaction
Minimum spanning tree P $O(E \log V)$ Network design
Binary search P $O(\log n)$ Databases, search
Topological sort P $O(V + E)$ Build systems, task scheduling
Integer factoring NP (intermediate?) $O(e^{(\log n)^{1/3}})$ (GNFS) RSA cryptography
Graph isomorphism NP (intermediate?) Quasi-polynomial (Babai) Chemistry, pattern matching
Discrete logarithm NP (intermediate?) Sub-exponential Diffie-Hellman, ECC
Boolean satisfiability (SAT) NP-complete $O(1.307^n)$ for 3-SAT Verification, planning
Traveling salesman (decision) NP-complete $O(2^n \cdot n^2)$ Logistics, delivery routing
Graph 3-coloring NP-complete $O(1.3289^n)$ Register allocation, scheduling
Hamiltonian cycle NP-complete $O(2^n \cdot n^2)$ Circuit board testing
Vertex cover NP-complete $O(1.2^n)$ Network monitoring
Subset sum NP-complete $O(2^{n/2})$ (meet-in-middle) Budgeting, bin packing
Clique (find k-clique) NP-complete $O(1.1888^n)$ Social networks, bioinformatics
Knapsack (0/1) NP-complete $O(nW)$ pseudo-polynomial Resource allocation, cargo loading
Set cover NP-complete $O(2^n)$; greedy gives $O(\ln n)$ approx Sensor placement, testing
Job-shop scheduling NP-complete Exponential Manufacturing, OS scheduling
Bin packing NP-complete Exponential; FFD gives 11/9 approx Cloud VM placement, shipping
Steiner tree NP-complete $O(3^k \cdot n)$ (FPT in terminals) VLSI design, network wiring
Protein folding (lattice) NP-complete Exponential Bioinformatics, drug design
TSP optimization (find shortest tour) NP-hard (not in NP) $O(2^n \cdot n^2)$ Delivery routing, logistics
Halting problem NP-hard (undecidable) No algorithm exists Program analysis
Optimal code generation NP-hard Exponential Compilers
General game playing NP-hard (PSPACE/EXPTIME) Exponential+ AI, game engines
Minimum circuit size NP-hard Exponential Circuit optimization
Tiling (Wang tiles) NP-hard (undecidable) No algorithm exists Geometry, materials science

White = P, yellow = NP-intermediate candidates, light red = NP-complete, dark red = NP-hard (outside NP)

P: Problems We Can Solve Fast

A problem is in P if a deterministic algorithm solves it in time O(n^k) for some constant k. These are the "tractable" problems:

  • Sorting an array: O(n \log n)
  • Finding shortest paths (Dijkstra): O((V + E) \log V)
  • Determining if a number is prime (AKS): O(\log^6 n)
  • Maximum matching in a graph: O(V \cdot E)
  • Linear programming: polynomial via interior point methods

Every problem in P is also in NP (if you can solve it fast, you can certainly verify a solution fast).

NP: Problems We Can Verify Fast

A problem is in NP if, given a "yes" instance and a polynomial-size certificate, you can verify the answer in polynomial time. The certificate is the key: it is a proof that the answer is "yes."

Example: "Does this graph have a Hamiltonian cycle?" is in NP. If someone hands you a specific cycle (the certificate), you can check in O(n) time that it visits every vertex exactly once. But finding such a cycle from scratch might take exponential time.

NP stands for "nondeterministic polynomial," because a nondeterministic Turing machine could "guess" the certificate and verify it in polynomial time.

NP-Complete: The Hardest Problems in NP

A problem is NP-complete if:

  1. It is in NP (solutions are verifiable in polynomial time).
  2. Every problem in NP can be reduced to it in polynomial time (it is NP-hard).

NP-complete problems are the hardest problems within NP. They are all equivalent in difficulty: if you solve one in polynomial time, you solve them all (and prove P = NP).

The Cook-Levin theorem (1971) proved that SAT (Boolean satisfiability) is NP-complete. Since then, thousands of problems have been shown NP-complete by reduction from SAT or other known NP-complete problems:

  • 3-SAT (satisfiability with 3-literal clauses)
  • Graph 3-coloring
  • Hamiltonian cycle
  • Subset sum
  • Traveling salesman (decision version: "is there a tour shorter than k?")
  • Vertex cover
  • Clique

NP-Hard: At Least as Hard as NP

A problem is NP-hard if every NP problem reduces to it, but it does not have to be in NP itself. NP-hard problems that are not in NP include:

  • The halting problem (undecidable, not even in NP)
  • Traveling salesman optimization ("find the shortest tour," not a yes/no question)
  • General tiling problems
  • Optimal scheduling with arbitrary constraints

NP-complete is the intersection: NP-hard AND in NP.

NP-Intermediate: The Gray Zone

If P != NP, Ladner's theorem (1975) guarantees that NP-intermediate problems exist: problems in NP that are neither in P nor NP-complete. Candidates include:

  • Integer factoring (used in RSA, no known polynomial algorithm, but not proven NP-complete)
  • Graph isomorphism (recently shown to be in quasi-polynomial time by Babai)
  • Discrete logarithm

These problems are suspected to be harder than P but easier than NP-complete.

Complexity Analysis

Class Solvable? Verifiable? Example
P Yes, in poly time Yes, in poly time Sorting, shortest path
NP Unknown (maybe exponential) Yes, in poly time Hamiltonian cycle, SAT
NP-complete Unknown (hardest in NP) Yes, in poly time 3-SAT, vertex cover, clique
NP-hard Unknown or undecidable Not necessarily Halting problem, TSP optimization
NP-intermediate Unknown Yes, in poly time Factoring, graph isomorphism

The chain of containment (assuming P != NP):

$P \subsetneq NP \subsetneq PSPACE \subseteq EXP$

The best known algorithms for NP-complete problems:

$\text{3-SAT: } O(1.307^n) \quad \text{TSP: } O(2^n \cdot n^2) \quad \text{Clique: } O(1.1888^n)$

All exponential. No polynomial algorithm is known for any NP-complete problem.

Implementation

ALGORITHM VerifySAT(formula, assignment)
INPUT: formula: a Boolean formula in CNF (list of clauses)
       assignment: a proposed truth assignment (the certificate)
OUTPUT: true if the assignment satisfies the formula
BEGIN
  // This runs in O(n * m) where n = variables, m = clauses
  // Verification is polynomial, proving SAT is in NP

  FOR EACH clause IN formula DO
    clauseSatisfied <- false
    FOR EACH literal IN clause DO
      variable <- VARIABLE_OF(literal)
      isNegated <- IS_NEGATED(literal)
      value <- assignment[variable]
      IF isNegated THEN value <- NOT value
      IF value = true THEN
        clauseSatisfied <- true
        BREAK
      END IF
    END FOR
    IF NOT clauseSatisfied THEN
      RETURN false  // this clause is not satisfied
    END IF
  END FOR

  RETURN true  // all clauses satisfied
END

ALGORITHM BruteForce3SAT(formula, variables)
INPUT: formula: a 3-CNF Boolean formula
       variables: list of n Boolean variables
OUTPUT: a satisfying assignment or "UNSATISFIABLE"
BEGIN
  // Try all 2^n possible assignments
  // This is exponential, which is why SAT is hard

  FOR EACH assignment IN ALL_ASSIGNMENTS(variables) DO
    IF VerifySAT(formula, assignment) THEN
      RETURN assignment
    END IF
  END FOR

  RETURN "UNSATISFIABLE"
END

ALGORITHM ReduceVertexCoverToSAT(graph, k)
INPUT: graph: an undirected graph (V, E)
       k: size of desired vertex cover
OUTPUT: a SAT formula that is satisfiable iff graph has a vertex cover of size k
BEGIN
  // This polynomial-time reduction proves Vertex Cover is NP-hard
  // (since SAT is NP-complete)

  // Create variables: x_v for each vertex v (true = v is in the cover)
  variables <- {x_v : v IN V}

  clauses <- empty list

  // For each edge (u, v): at least one endpoint must be in the cover
  FOR EACH edge (u, v) IN E DO
    APPEND clause (x_u OR x_v) TO clauses
  END FOR

  // At most k vertices in the cover (encode cardinality constraint)
  // Use standard encoding: at-most-k constraint on x variables
  APPEND AtMostK_Encoding(variables, k) TO clauses

  RETURN (variables, clauses)
END

ALGORITHM ClassifyProblem(problem)
INPUT: problem: a computational problem
OUTPUT: its likely complexity class
BEGIN
  IF EXISTS polynomial-time algorithm FOR problem THEN
    RETURN "P"
  END IF

  IF solutions ARE verifiable in polynomial time THEN
    IF problem REDUCES_FROM known NP-complete problem THEN
      RETURN "NP-complete"
    ELSE
      RETURN "NP (possibly NP-intermediate)"
    END IF
  ELSE
    IF problem REDUCES_FROM known NP-complete problem THEN
      RETURN "NP-hard (not in NP)"
    ELSE
      RETURN "Unknown / needs further analysis"
    END IF
  END IF
END

Real-World Applications

  • Recognizing NP-hardness saves engineering time: when a problem is NP-hard, you stop searching for a perfect polynomial algorithm and switch to approximation algorithms, heuristics (simulated annealing, genetic algorithms), or special-case solvers
  • SAT solvers (MiniSat, Z3, CaDiCaL) are the practical workhorses for NP-complete problems: despite worst-case exponential time, modern solvers handle industrial instances with millions of variables using clever heuristics like CDCL (conflict-driven clause learning)
  • Cryptography relies on the assumed hardness of NP-intermediate problems: RSA depends on factoring being hard, Diffie-Hellman depends on discrete logarithm being hard, and lattice-based post-quantum crypto depends on shortest vector problems being hard
  • Compiler optimization problems (register allocation, instruction scheduling) are NP-complete, so compilers use heuristic graph coloring and greedy algorithms that work well in practice
  • Network routing, VLSI chip design, airline crew scheduling, and protein folding all involve NP-hard optimization problems solved via approximation algorithms with provable guarantees
  • The P vs NP question has a $1 million Millennium Prize: resolving it would either break all public-key cryptography (if P = NP) or confirm that certain problems are fundamentally intractable (if P != NP)

Key Takeaways

  • P contains problems solvable in polynomial time; NP contains problems whose solutions are verifiable in polynomial time; every problem in P is also in NP
  • NP-complete problems are the hardest in NP: they are in NP and every NP problem reduces to them; solving any one in polynomial time would prove P = NP
  • NP-hard problems are at least as hard as NP-complete but may not be in NP themselves (e.g., the halting problem is NP-hard but undecidable)
  • NP-complete = NP intersected with NP-hard, the sweet spot of problems that are both verifiable and maximally hard within NP
  • If P != NP (widely believed), NP-intermediate problems exist between P and NP-complete, with integer factoring and graph isomorphism as leading candidates
  • In practice, NP-hardness is a signal to use approximation algorithms, heuristics, or SAT/SMT solvers rather than searching for an exact polynomial-time solution