Kolmogorov Complexity: Randomness as Incompressibility
Understanding Kolmogorov complexity, the shortest program that produces a string, Berry's paradox, randomness defined as incompressibility, and connections to Gödel and Turing.
Terminology
| Term | Definition |
|---|---|
| Kolmogorov complexity $K(x)$ | The length of the shortest program (on a fixed universal Turing machine) that outputs string $x$ and halts |
| Incompressible string | A string $x$ where $K(x) \geq |x|$; no program shorter than $x$ itself can produce it |
| Algorithmic randomness | A string is algorithmically random if it is incompressible: it has no shorter description than itself |
| Universal Turing machine | A Turing machine that can simulate any other Turing machine; serves as the reference computer for defining $K(x)$ |
| Berry's paradox | "The smallest number not definable in fewer than twenty words" is defined in fewer than twenty words, a self-referential contradiction |
| Invariance theorem | $K(x)$ is the same (up to an additive constant) regardless of which universal Turing machine is used as the reference |
| Shannon entropy | The average information content per symbol in a probabilistic source; $H = -\sum p_i \log_2 p_i$ |
| Prefix-free code | A code where no codeword is a prefix of another; used in the prefix-free variant of Kolmogorov complexity |
What & Why
Kolmogorov complexity provides a rigorous, individual definition of randomness. Instead of asking "was this string produced by a random process?" (a question about the source), it asks "can this string be compressed?" (a question about the string itself). A string is random if and only if it is incompressible: no program shorter than the string can produce it.
Independently discovered by Ray Solomonoff (1960), Andrey Kolmogorov (1965), and Gregory Chaitin (1966), this concept unifies information theory, computability theory, and the foundations of probability. It connects to Gödel's incompleteness (you cannot prove most strings are incompressible) and to Turing's halting problem (K(x) is uncomputable).
For computer scientists, Kolmogorov complexity provides the theoretical foundation for data compression, explains why some data compresses well and other data does not, gives an alternative proof of the halting problem's undecidability, and provides a tool for proving lower bounds in computational complexity.
How It Works
Definition
Fix a universal Turing machine $U$. The Kolmogorov complexity of a string $x$ is:

$$K(x) = \min \{\, |p| : U(p) = x \,\}$$

where $|p|$ is the length of program $p$ in bits, and $U(p) = x$ means $U$ outputs $x$ when given $p$ as input.
Examples
The string "000...0" (a billion zeros) has low Kolmogorov complexity: a short program can produce it ("print '0' one billion times"). But a string of a billion truly random bits has high Kolmogorov complexity: no program significantly shorter than the string itself can produce it.
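This contrast can be seen empirically with an off-the-shelf compressor, since a compressor's output length is an upper bound on $K(x)$ (plus a constant for the decompressor). The sketch below uses Python's `zlib` and a million bytes rather than a billion, purely to keep the run fast:

```python
import os
import zlib

# A compressor gives an upper bound on K(x) (plus a constant for the
# decompressor). Highly patterned data shrinks; random data does not.
n = 1_000_000  # one million bytes, standing in for the billion in the text
patterned = b"0" * n
random_data = os.urandom(n)  # effectively incompressible

c_patterned = len(zlib.compress(patterned, 9))
c_random = len(zlib.compress(random_data, 9))

print(c_patterned)  # a few kilobytes at most: the pattern compresses away
print(c_random)     # roughly n bytes: no pattern to exploit
```

The patterned string shrinks by orders of magnitude; the random one actually grows slightly, because the compressor must fall back to storing it literally plus framing overhead.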
Incompressibility and Counting
Most strings are incompressible. For strings of length $n$, there are $2^n$ possible strings but only $2^0 + 2^1 + \cdots + 2^{n-1} = 2^n - 1$ programs shorter than $n$ bits. Since each program outputs at most one string, at least one string of length $n$ has $K(x) \geq n$. More generally, at least $2^n - 2^{n-c} + 1$ strings satisfy $K(x) \geq n - c$. The vast majority of strings are nearly incompressible.
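The counting argument is pure arithmetic, so it can be checked exactly for concrete values of $n$ and $c$:

```python
# Exact counting behind the incompressibility argument for n-bit strings.
n, c = 20, 8

total = 2 ** n                        # all binary strings of length n
compressible = 2 ** (n - c) - 1       # programs with fewer than n - c bits

# Each program outputs at most one string, so at most `compressible`
# strings of length n can have K(x) < n - c.
incompressible = total - compressible  # strings with K(x) >= n - c

assert incompressible == 2 ** n - 2 ** (n - c) + 1
assert incompressible / total > 1 - 2 ** -c  # all but a 2^-c fraction
print(incompressible)  # 1044481 of the 1048576 strings
```

All but a $2^{-c}$ fraction of $n$-bit strings resist compression by even $c$ bits, for any $n$.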
Uncomputability
$K(x)$ is uncomputable, and the proof formalizes Berry's paradox. Suppose a program $M$ computes $K(x)$. Then consider the program "find the first string $x$ with $K(x) > n$ and output it." Using $M$ as a subroutine, this program can be written in roughly $\log n + c$ bits (it needs only the binary encoding of $n$ plus a constant amount of search code), yet it outputs a string with $K(x) > n$. For large enough $n$, $\log n + c < n$, so the string has a description shorter than $n < K(x)$, contradicting the definition of $K$. Therefore $M$ cannot exist.
Connection to Gödel and Turing
Chaitin showed that for any formal system $F$ with complexity $K(F) = c$, the system cannot prove "$K(x) > c + O(1)$" for any specific string $x$. This is an information-theoretic version of Gödel's incompleteness: a formal system cannot establish facts about complexity that exceed its own complexity.
The uncomputability of $K(x)$ also implies the halting problem is undecidable: with a halting oracle you could compute $K(x)$ by enumerating all programs in order of length, skipping those that never halt, running the rest, and returning the length of the first that outputs $x$.
Complexity Analysis
| Property | Bound | Notes |
|---|---|---|
| Upper bound | $K(x) \leq |x| + c$ | A program can always just contain $x$ literally |
| Patterned strings | $K(0^n) \leq \log n + c$ | Only need to encode the length $n$ |
| Random strings | $K(x) \geq |x| - c$ for most $x$ | At least half of all $n$-bit strings have $K(x) \geq n - 1$ |
| Subadditivity | $K(xy) \leq K(x) + K(y) + c$ | Concatenation cost is at most the sum plus a constant |
| Invariance | $|K_U(x) - K_V(x)| \leq c_{UV}$ | Different universal machines differ by at most a constant |
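The upper bound and subadditivity rows have practical analogs that a real compressor should satisfy. A quick sanity check with `zlib` (a sketch; compressor output lengths only upper-bound $K$, and the constant `slack` is a generous allowance chosen for illustration):

```python
import zlib

x = b"abracadabra" * 500     # 5500 bytes of highly patterned data
y = bytes(range(256)) * 20   # 5120 bytes with a 256-byte period

Cx = len(zlib.compress(x, 9))
Cy = len(zlib.compress(y, 9))
Cxy = len(zlib.compress(x + y, 9))

# Subadditivity, up to a small constant: compressing the concatenation
# never costs much more than compressing the parts separately.
slack = 64  # allowance for headers and block-boundary effects
assert Cxy <= Cx + Cy + slack

# Upper bound K(x) <= |x| + c: literal encoding is always available,
# so a sane compressor never exceeds the input by more than a constant
# factor of framing overhead.
assert Cx <= len(x) + 64
print(Cx, Cy, Cxy)
```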
The relationship between Kolmogorov complexity and Shannon entropy: for a source producing strings of length $n$ by drawing symbols i.i.d. from distribution $P$:

$$\mathbb{E}_{x \sim P^n}\big[K(x)\big] = n \cdot H(P) + O(\log n)$$

where $H(P) = -\sum_i p_i \log_2 p_i$ is the Shannon entropy per symbol. Kolmogorov complexity is the individual-string analog of Shannon entropy.
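A rough empirical check: generate a long string from a biased binary source and compare its `zlib`-compressed size to $n \cdot H(P)$. The compressor can never beat the entropy rate, and `zlib` is far from an optimal code, so expect a ratio somewhat above $H$ rather than equality (a sketch, with the seed fixed for reproducibility):

```python
import math
import random
import zlib

random.seed(0)
p = 0.05       # probability of a 1 bit
n = 200_000    # number of symbols
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # bits per symbol

# Sample from the source; pack 8 biased bits per byte so the compressor
# sees the redundancy directly rather than ASCII '0'/'1' characters.
bits = [1 if random.random() < p else 0 for _ in range(n)]
data = bytes(
    sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
    for j in range(0, n, 8)
)

compressed_bits = 8 * len(zlib.compress(data, 9))
print(H)                    # ~0.286 bits per symbol for p = 0.05
print(compressed_bits / n)  # above H, within a small factor of it
```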
Implementation
ALGORITHM ApproximateKolmogorov(x, maxProgramLength)
INPUT: x: target string, maxProgramLength: search budget
OUTPUT: upper bound on K(x)
BEGIN
    // We cannot compute K(x) exactly, but we can find upper bounds
    // by searching for short programs that output x
    bestLength <- LENGTH(x)  // trivial upper bound: literal encoding
    FOR programLength FROM 1 TO maxProgramLength DO
        FOR EACH program p OF LENGTH programLength DO
            output <- SIMULATE(p, TIME_LIMIT)
            IF output = x THEN
                bestLength <- MIN(bestLength, programLength)
                RETURN bestLength  // lengths ascend, so this is the shortest found
            END IF
        END FOR
    END FOR
    RETURN bestLength
    // Note: this is only an upper bound; the true K(x) may be lower
    // (a program exceeding TIME_LIMIT or maxProgramLength might output x)
    // No algorithm can certify a nontrivial lower bound on K(x) in general
END
ALGORITHM IncompressibilityArgument(n)
// Shows most strings of length n are incompressible
INPUT: n: string length
OUTPUT: lower bound on the count of incompressible strings
BEGIN
    totalStrings <- 2^n
    shortPrograms <- 0
    // Count programs shorter than n bits
    FOR length FROM 0 TO n - 1 DO
        shortPrograms <- shortPrograms + 2^length
    END FOR
    // shortPrograms = 2^n - 1, and each program outputs at most one string
    incompressible <- totalStrings - shortPrograms
    // incompressible >= 1
    // More precisely: strings with K(x) >= n - c
    // At least 2^n - 2^(n-c) + 1 such strings exist
    // For c = 1: at least 2^(n-1) + 1 strings have K(x) >= n - 1
    // So at least HALF of all strings are nearly incompressible
    RETURN incompressible
END
ALGORITHM NormalizedCompressionDistance(x, y, compressor)
INPUT: x, y: two strings, compressor: a practical compression algorithm
OUTPUT: NCD(x,y) approximating normalized information distance
BEGIN
    // NCD approximates the theoretical normalized information distance,
    // which is based on Kolmogorov complexity
    Cx <- LENGTH(compressor(x))
    Cy <- LENGTH(compressor(y))
    Cxy <- LENGTH(compressor(CONCATENATE(x, y)))
    NCD <- (Cxy - MIN(Cx, Cy)) / MAX(Cx, Cy)
    RETURN NCD
    // NCD near 0: x and y are similar
    // NCD near 1: x and y are very different
END
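The NCD pseudocode translates almost line for line into Python with `zlib` as the compressor. The sketch below (with a fixed seed and an invented 0.5% mutation rate, both chosen purely for illustration) compares a string against a lightly mutated copy and against unrelated random data:

```python
import random
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, using zlib as the compressor."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(1)
a = bytes(rng.randrange(256) for _ in range(10_000))
b = bytearray(a)
for i in rng.sample(range(len(b)), 50):  # mutate 0.5% of the bytes
    b[i] = rng.randrange(256)
b = bytes(b)
c = bytes(rng.randrange(256) for _ in range(10_000))

print(ncd(a, b))  # small: compressing a+b reuses a to describe b
print(ncd(a, c))  # near 1: independent random data shares nothing
```

The second copy costs almost nothing to compress once the first is in the compressor's window, which is exactly the shared-information signal NCD measures.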
Real-World Applications
- Data compression algorithms (gzip, zstd, brotli) approximate Kolmogorov complexity: they find short descriptions of data by exploiting patterns, but can never achieve the theoretical minimum $K(x)$
- The Normalized Compression Distance (NCD) uses practical compressors to approximate Kolmogorov-based similarity, applied to plagiarism detection, malware classification, and genomic sequence comparison
- Kolmogorov complexity provides alternative proofs of many results in information theory, combinatorics, and computational complexity, often simpler than probabilistic arguments
- In machine learning, minimum description length (MDL) uses Kolmogorov-inspired principles for model selection: prefer the model that gives the shortest total description of the model plus the data given the model
- Randomness testing: while $K(x)$ is uncomputable, practical tests for randomness (NIST suite, Diehard tests) approximate the idea by checking for compressible patterns
- Chaitin's incompleteness theorem shows that formal systems have a "complexity horizon" beyond which they cannot prove incompressibility, connecting information theory to the foundations of mathematics
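The MDL idea in the applications above can be made concrete with a deliberately tiny toy: choose between two hypothetical encodings of a string by total description length. Both "models" and all bit costs here are invented for illustration, not a real MDL codec:

```python
# Toy MDL model selection: pick the hypothesis (encoding scheme) that
# minimizes model-description length plus data length under that model.

def literal_cost(s: str) -> int:
    # Model: "raw string". Negligible model description; 8 bits per char.
    return 8 * len(s)

def run_length_cost(s: str) -> int:
    # Model: run-length code. 16 bits to name the model, then 8 bits per
    # run symbol plus 16 bits per run length (all costs illustrative).
    runs = 1 + sum(1 for u, v in zip(s, s[1:]) if u != v)
    return 16 + runs * (8 + 16)

def best_model(s: str) -> str:
    costs = {"literal": literal_cost(s), "run-length": run_length_cost(s)}
    return min(costs, key=costs.get)

print(best_model("a" * 1000))  # run-length: one run beats 8000 literal bits
print(best_model("abcdefgh"))  # literal: eight runs cost more than 64 bits
```

The same trade-off, with real codes and real models, is what MDL uses to penalize overfitting: a more elaborate model must pay for its own description.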
Key Takeaways
- Kolmogorov complexity $K(x)$ is the length of the shortest program that outputs $x$; it measures the intrinsic information content of an individual string
- A string is algorithmically random if and only if it is incompressible: $K(x) \geq |x|$; by the counting argument, the overwhelming majority of strings are random or within a few bits of it
- $K(x)$ is uncomputable, proved via Berry's paradox: a program that computes $K(x)$ leads to a self-referential contradiction
- The invariance theorem ensures $K(x)$ is well-defined up to a constant, regardless of which universal Turing machine is used
- Chaitin's incompleteness theorem is an information-theoretic Gödel: no formal system can prove $K(x) > c$ for strings beyond its own complexity
- Practical applications include data compression, similarity metrics (NCD), model selection (MDL), and randomness testing