
Kolmogorov Complexity: Randomness as Incompressibility

Understanding Kolmogorov complexity, the shortest program that produces a string, Berry's paradox, randomness defined as incompressibility, and connections to Gödel and Turing.

2025-07-26
Tags: proofs, kolmogorov, incompressibility, information-theory

Terminology

Kolmogorov complexity $K(x)$: the length of the shortest program (on a fixed universal Turing machine) that outputs string $x$ and halts.

Incompressible string: a string $x$ with $K(x) \geq |x|$; no program shorter than $x$ itself can produce it.

Algorithmic randomness: a string is algorithmically random if it is incompressible: it has no shorter description than itself.

Universal Turing machine: a Turing machine that can simulate any other Turing machine; it serves as the reference computer for defining $K(x)$.

Berry's paradox: "the smallest number not definable in fewer than twenty words" is itself defined in fewer than twenty words, a self-referential contradiction.

Invariance theorem: $K(x)$ is the same (up to an additive constant) regardless of which universal Turing machine is used as the reference.

Shannon entropy: the average information content per symbol of a probabilistic source; $H = -\sum_i p_i \log_2 p_i$.

Prefix-free code: a code in which no codeword is a prefix of another; used in the prefix-free variant of Kolmogorov complexity.

What & Why

Kolmogorov complexity provides a rigorous, individual definition of randomness. Instead of asking "was this string produced by a random process?" (a question about the source), it asks "can this string be compressed?" (a question about the string itself). A string is random if and only if it is incompressible: no program shorter than the string can produce it.

Independently discovered by Ray Solomonoff (1960), Andrey Kolmogorov (1965), and Gregory Chaitin (1966), this concept unifies information theory, computability theory, and the foundations of probability. It connects to Gödel's incompleteness (you cannot prove most strings are incompressible) and to Turing's halting problem (K(x) is uncomputable).

For computer scientists, Kolmogorov complexity provides the theoretical foundation for data compression, explains why some data compresses well and other data does not, gives an alternative proof of the halting problem's undecidability, and provides a tool for proving lower bounds in computational complexity.

How It Works

Definition

Fix a universal Turing machine $U$. The Kolmogorov complexity of a string $x$ is:

$K(x) = \min\{|p| : U(p) = x\}$

where $|p|$ is the length of program $p$ in bits, and $U(p) = x$ means $U$ outputs $x$ when given $p$ as input.

Examples

The string "000...0" (a billion zeros) has low Kolmogorov complexity: a short program can produce it ("print '0' one billion times"). But a string of a billion truly random bits has high Kolmogorov complexity: no program significantly shorter than the string itself can produce it.

(Figure: a patterned string such as 000000000000000000… has low $K(x)$, produced by a short program like "print 0 x N"; a random string such as 10110100011101001… has high $K(x)$, with no program shorter than $x$; and $K(x)$ itself is uncomputable: no algorithm can compute $K(x)$ for all strings.)
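This contrast is easy to observe with an ordinary compressor. The sketch below uses Python's `zlib` as a crude stand-in for the shortest-program search; a compressed size is only an upper bound on $K(x)$ (plus the fixed cost of the decompressor), never its true value.

```python
import os
import zlib

# A patterned string: a short description ("a million zeros") exists,
# and zlib finds something close to it.
patterned = b"0" * 1_000_000

# Random bytes: with overwhelming probability no short description
# exists, and zlib cannot shrink them at all.
random_bytes = os.urandom(1_000_000)

print(len(zlib.compress(patterned, 9)))     # on the order of 1 KB
print(len(zlib.compress(random_bytes, 9)))  # at least 1,000,000 bytes
```

Note that the second result is forced by counting, not by a weakness of zlib: no lossless compressor can shrink most strings.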

Incompressibility and Counting

Most strings are incompressible. For strings of length $n$, there are $2^n$ possible strings but only $2^0 + 2^1 + \cdots + 2^{n-1} = 2^n - 1$ programs shorter than $n$ bits. So at least one string of length $n$ has $K(x) \geq n$. In fact, at least $2^n - 2^{n-c} + 1$ strings satisfy $K(x) \geq n - c$. The vast majority of strings are nearly incompressible.

Uncomputability

$K(x)$ is uncomputable. The proof uses Berry's paradox. Suppose a program $M$ computes $K(x)$. Using $M$, build the program "find the first string $x$ with $K(x) > n$ and output it." This program has length roughly $\log n + c$ bits (it only needs to store $n$ plus the fixed code of $M$), yet it outputs a string with $K(x) > n$. For large enough $n$, $\log n + c < n$, so the program is a description of $x$ shorter than $K(x)$, contradicting the definition of $K$. Therefore $M$ cannot exist.
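The paradoxical program can be written down directly. The parameter `K` below is hypothetical, standing for a function that (impossibly) computes Kolmogorov complexity; the sketch exists only to show why no such function can exist.

```python
from itertools import count, product

def first_string_above(n, K):
    # Hypothetical: K computes Kolmogorov complexity (no such K exists).
    # Search strings in length order for the first x with K(x) > n.
    # This search needs only ~log n + c bits to describe (it stores n
    # plus fixed code), yet its output satisfies K(x) > n; for large n
    # that contradicts the definition of K, so K cannot exist.
    for length in count(1):
        for bits in product("01", repeat=length):
            x = "".join(bits)
            if K(x) > n:
                return x

# With a stand-in such as K = len, the search is well-defined and
# returns the first string longer than n:
print(first_string_above(3, len))  # "0000"
```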

Connection to Gödel and Turing

Chaitin showed that for any formal system $F$ with complexity $K(F) = c$, the system cannot prove "$K(x) > c + O(1)$" for any specific string $x$. This is an information-theoretic version of Gödel's incompleteness: a formal system cannot establish facts about complexity that exceed its own complexity.

The uncomputability of $K(x)$ also implies the halting problem is undecidable: if you could decide halting, you could compute $K(x)$ exactly by enumerating programs in order of length, using the halting decider to discard non-halting programs, and running each remaining program until one outputs $x$.
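The reduction can be sketched with the impossible parts as hypothetical parameters: `halts` stands for a halting decider and `run` for an evaluator of a toy program encoding, neither of which exists for real programs in general.

```python
from itertools import count, product

def K_with_halting_oracle(x, halts, run):
    # Hypothetical: 'halts(p)' decides whether program p halts and
    # 'run(p)' returns its output when it does. Enumerate programs in
    # order of length; the first halting program that outputs x has,
    # by definition, the minimal length K(x).
    for length in count(1):
        for p in product((0, 1), repeat=length):
            if halts(p) and run(p) == x:
                return length

# Toy demonstration with a trivial "language" in which every program
# halts and outputs itself, so K(x) = |x|:
print(K_with_halting_oracle((1, 0, 1), lambda p: True, lambda p: p))  # 3
```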

Complexity Analysis

Upper bound: $K(x) \leq |x| + c$ (a program can always just contain $x$ literally).

Patterned strings: $K(0^n) \leq \log n + c$ (only the length $n$ needs to be encoded).

Random strings: $K(x) \geq |x| - c$ for most $x$ (at least half of all $n$-bit strings have $K(x) \geq n - 1$).

Subadditivity: $K(xy) \leq K(x) + K(y) + c$ (concatenation costs at most the sum plus a constant).

Invariance: $|K_U(x) - K_V(x)| \leq c_{UV}$ (different universal machines differ by at most a constant).
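The upper-bound and subadditivity properties can be checked empirically, with `zlib` standing in (an assumption of this sketch) for an ideal description method; real compressors satisfy these inequalities only up to modest overhead.

```python
import os
import zlib

x = os.urandom(5_000)   # incompressible part
y = b"A" * 5_000        # highly compressible part

cx = len(zlib.compress(x, 9))
cy = len(zlib.compress(y, 9))
cxy = len(zlib.compress(x + y, 9))

print(cx, cy, cxy)
# Upper bound: compressed output never much exceeds the literal string.
# Subadditivity: describing the concatenation costs at most roughly
# the sum of the individual descriptions, plus compressor overhead.
assert cxy <= cx + cy + 128
```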

Kolmogorov complexity and Shannon entropy are closely related: for a source producing strings of length $n$ with distribution $P$,

$\mathbb{E}[K(x)] \approx n \cdot H(P)$

where $H(P) = -\sum_i p_i \log_2 p_i$ is the Shannon entropy per symbol. Kolmogorov complexity is the individual-string analog of Shannon entropy.
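The relation can be probed empirically with a biased-coin source. `zlib` is again only a stand-in for an optimal code, so its output tracks $n \cdot H(P)$ loosely from above.

```python
import math
import random
import zlib

p, n = 0.9, 100_000
random.seed(0)
# One sample of n symbols from a Bernoulli(p) source, one byte each.
sample = bytes(int(random.random() < p) for _ in range(n))

H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.469 bits/symbol
compressed_bits = 8 * len(zlib.compress(sample, 9))

print(f"n * H(P)    = {n * H:,.0f} bits")
print(f"zlib output = {compressed_bits:,} bits")
# zlib lands well below the raw 800,000 bits but above the entropy
# bound; a compressor tuned to this source would get closer to n*H.
```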

Implementation

ALGORITHM ApproximateKolmogorov(x, maxProgramLength)
INPUT: x: target string, maxProgramLength: search budget
OUTPUT: upper bound on K(x)
BEGIN
  // We cannot compute K(x) exactly, but we can find upper bounds
  // by searching for short programs that output x

  bestLength <- LENGTH(x)  // trivial upper bound: literal encoding

  FOR programLength FROM 1 TO maxProgramLength DO
    FOR EACH program p OF LENGTH programLength DO
      output <- SIMULATE(p, TIME_LIMIT)
      IF output = x THEN
        bestLength <- MIN(bestLength, programLength)
        RETURN bestLength  // found a shorter description
      END IF
    END FOR
  END FOR

  RETURN bestLength
  // Note: this is only an upper bound; the true K(x) may be lower
  // (a longer program we did not try might output x)
  // No algorithm can certify nontrivial lower bounds on K(x) in general
END

ALGORITHM IncompressibilityArgument(n)
// Proves most strings of length n are incompressible
INPUT: n: string length
OUTPUT: lower bound on the count of incompressible strings
BEGIN
  totalStrings <- 2^n
  shortPrograms <- 0

  // Count programs shorter than n bits
  FOR length FROM 0 TO n - 1 DO
    shortPrograms <- shortPrograms + 2^length
  END FOR
  // shortPrograms = 2^n - 1

  incompressible <- totalStrings - shortPrograms
  // = 1: each program outputs at most one string, so at least one
  // string of length n has K(x) >= n

  // More precisely: strings with K(x) >= n - c
  // At least 2^n - 2^(n-c) + 1 such strings exist
  // For c = 1: at least 2^(n-1) + 1 strings have K(x) >= n - 1
  // So at least HALF of all strings are nearly incompressible

  RETURN incompressible
END

ALGORITHM NormalizedCompressionDistance(x, y, compressor)
INPUT: x, y: two strings, compressor: a practical compression algorithm
OUTPUT: NCD(x,y) approximating normalized information distance
BEGIN
  // NCD approximates the theoretical normalized information distance
  // which is based on Kolmogorov complexity

  Cx <- LENGTH(compressor(x))
  Cy <- LENGTH(compressor(y))
  Cxy <- LENGTH(compressor(CONCATENATE(x, y)))

  NCD <- (Cxy - MIN(Cx, Cy)) / MAX(Cx, Cy)
  RETURN NCD
  // NCD near 0: x and y are similar
  // NCD near 1: x and y are very different
END
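A concrete version of the pseudocode, using Python's `zlib` as the compressor; any real compressor works, though results on very short strings are dominated by header overhead.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance with zlib standing in for K."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog. " * 40
b = b"the quick brown fox jumps over the lazy cat. " * 40
c = bytes(range(256)) * 8

print(round(ncd(a, b), 3))  # small: near-duplicate texts
print(round(ncd(a, c), 3))  # large: unrelated data
# With real compressors, NCD can stray slightly outside [0, 1].
```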

Real-World Applications

  • Data compression algorithms (gzip, zstd, brotli) approximate Kolmogorov complexity: they find short descriptions of data by exploiting patterns, but can never achieve the theoretical minimum $K(x)$
  • The Normalized Compression Distance (NCD) uses practical compressors to approximate Kolmogorov-based similarity, applied to plagiarism detection, malware classification, and genomic sequence comparison
  • Kolmogorov complexity provides alternative proofs of many results in information theory, combinatorics, and computational complexity, often simpler than probabilistic arguments
  • In machine learning, minimum description length (MDL) uses Kolmogorov-inspired principles for model selection: prefer the model that gives the shortest total description of the model plus the data given the model
  • Randomness testing: while $K(x)$ is uncomputable, practical tests for randomness (NIST suite, Diehard tests) approximate the idea by checking for compressible patterns
  • Chaitin's incompleteness theorem shows that formal systems have a "complexity horizon" beyond which they cannot prove incompressibility, connecting information theory to the foundations of mathematics

Key Takeaways

  • Kolmogorov complexity $K(x)$ is the length of the shortest program that outputs $x$; it measures the intrinsic information content of an individual string
  • A string is algorithmically random if and only if it is incompressible: $K(x) \geq |x|$; the vast majority of strings are within a few bits of incompressible, hence close to random
  • $K(x)$ is uncomputable, proved via Berry's paradox: a program that computes $K(x)$ leads to a self-referential contradiction
  • The invariance theorem ensures $K(x)$ is well-defined up to a constant, regardless of which universal Turing machine is used
  • Chaitin's incompleteness theorem is an information-theoretic Gödel: no formal system can prove $K(x) > c$ for strings beyond its own complexity
  • Practical applications include data compression, similarity metrics (NCD), model selection (MDL), and randomness testing