Kolmogorov Complexity: Randomness as Incompressibility
Understanding Kolmogorov complexity, the shortest program that produces a string, Berry's paradox, randomness defined as incompressibility, and connections to Gödel and Turing.
Terminology
| Term | Definition |
|---|---|
| Kolmogorov complexity $K(x)$ | The length of the shortest program (on a fixed universal Turing machine) that outputs string $x$ and halts |
| Incompressible string | A string $x$ where $K(x) \geq |x|$; no program shorter than $x$ itself can produce it |
| Algorithmic randomness | A string is algorithmically random if it is incompressible: it has no shorter description than itself |
| Universal Turing machine | A Turing machine that can simulate any other Turing machine; serves as the reference computer for defining $K(x)$ |
| Berry's paradox | "The smallest number not definable in fewer than twenty words" is defined in fewer than twenty words, a self-referential contradiction |
| Invariance theorem | $K(x)$ is the same (up to an additive constant) regardless of which universal Turing machine is used as the reference |
| Shannon entropy | The average information content per symbol in a probabilistic source; $H = -\sum p_i \log_2 p_i$ |
| Prefix-free code | A code where no codeword is a prefix of another; used in the prefix-free variant of Kolmogorov complexity |
What & Why
Kolmogorov complexity provides a rigorous, individual definition of randomness. Instead of asking "was this string produced by a random process?" (a question about the source), it asks "can this string be compressed?" (a question about the string itself). A string is random if and only if it is incompressible: no program shorter than the string can produce it.
Independently discovered by Ray Solomonoff (1960), Andrey Kolmogorov (1965), and Gregory Chaitin (1966), this concept unifies information theory, computability theory, and the foundations of probability. It connects to Gödel's incompleteness (you cannot prove most strings are incompressible) and to Turing's halting problem (K(x) is uncomputable).
For computer scientists, Kolmogorov complexity provides the theoretical foundation for data compression, explains why some data compresses well and other data does not, gives an alternative proof of the halting problem's undecidability, and provides a tool for proving lower bounds in computational complexity.
How It Works
Definition
Fix a universal Turing machine $U$. The Kolmogorov complexity of a string $x$ is:

$$K(x) = \min \{\, |p| : U(p) = x \,\}$$

where $|p|$ is the length of program $p$ in bits, and $U(p) = x$ means $U$ outputs $x$ when given $p$ as input.
Examples
The string "000...0" (a billion zeros) has low Kolmogorov complexity: a short program can produce it ("print '0' one billion times"). But a string of a billion truly random bits has high Kolmogorov complexity: no program significantly shorter than the string itself can produce it.
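This contrast can be seen empirically with an off-the-shelf compressor, since a compressor's output length is an upper bound on $K(x)$ (plus a constant for the decompressor). The sketch below uses Python's `zlib` and a million bytes rather than a billion, purely to keep the run fast:

```python
import os
import zlib

# A compressor gives an upper bound on K(x) (plus a constant for the
# decompressor). Highly patterned data shrinks; random data does not.
n = 1_000_000  # one million bytes, standing in for the billion in the text
patterned = b"0" * n
random_data = os.urandom(n)  # effectively incompressible

c_patterned = len(zlib.compress(patterned, 9))
c_random = len(zlib.compress(random_data, 9))

print(c_patterned)  # a few kilobytes at most: the pattern compresses away
print(c_random)     # roughly n bytes: no pattern to exploit
```

The patterned string shrinks by orders of magnitude; the random one actually grows slightly, because the compressor must fall back to storing it literally plus framing overhead.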
Incompressibility and Counting
Most strings are incompressible. For strings of length $n$, there are $2^n$ possible strings but only $2^0 + 2^1 + \cdots + 2^{n-1} = 2^n - 1$ programs shorter than $n$ bits. Since each program outputs at most one string, at least one string of length $n$ has $K(x) \geq n$. More generally, at least $2^n - 2^{n-c} + 1$ strings satisfy $K(x) \geq n - c$. The vast majority of strings are nearly incompressible.
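The counting argument is pure arithmetic, so it can be checked exactly for concrete values of $n$ and $c$:

```python
# Exact counting behind the incompressibility argument for n-bit strings.
n, c = 20, 8

total = 2 ** n                        # all binary strings of length n
compressible = 2 ** (n - c) - 1       # programs with fewer than n - c bits

# Each program outputs at most one string, so at most `compressible`
# strings of length n can have K(x) < n - c.
incompressible = total - compressible  # strings with K(x) >= n - c

assert incompressible == 2 ** n - 2 ** (n - c) + 1
assert incompressible / total > 1 - 2 ** -c  # all but a 2^-c fraction
print(incompressible)  # 1044481 of the 1048576 strings
```

All but a $2^{-c}$ fraction of $n$-bit strings resist compression by even $c$ bits, for any $n$.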
Uncomputability
$K(x)$ is uncomputable, and the proof formalizes Berry's paradox. Suppose a program $M$ computes $K(x)$. Then consider the program "find the first string $x$ with $K(x) > n$ and output it." Using $M$ as a subroutine, this program can be written in roughly $\log n + c$ bits (it needs only the binary encoding of $n$ plus a constant amount of search code), yet it outputs a string with $K(x) > n$. For large enough $n$, $\log n + c < n$, so the string has a description shorter than $n < K(x)$, contradicting the definition of $K$. Therefore $M$ cannot exist.
Connection to Gödel and Turing
Chaitin showed that for any formal system $F$ with complexity $K(F) = c$, the system cannot prove "$K(x) > c + O(1)$" for any specific string $x$. This is an information-theoretic version of Gödel's incompleteness: a formal system cannot establish facts about complexity that exceed its own complexity.
The uncomputability of $K(x)$ also implies the halting problem is undecidable: with a halting oracle you could compute $K(x)$ by enumerating all programs in order of length, skipping those that never halt, running the rest, and returning the length of the first that outputs $x$.
Complexity Analysis
| Property | Bound | Notes |
|---|---|---|
| Upper bound | $K(x) \leq |x| + c$ | A program can always just contain $x$ literally |
| Patterned strings | $K(0^n) \leq \log n + c$ | Only need to encode the length $n$ |
| Random strings | $K(x) \geq |x| - c$ for most $x$ | At least half of all $n$-bit strings have $K(x) \geq n - 1$ |
| Subadditivity | $K(xy) \leq K(x) + K(y) + c$ | Concatenation cost is at most the sum plus a constant |
| Invariance | $|K_U(x) - K_V(x)| \leq c_{UV}$ | Different universal machines differ by at most a constant |
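The upper bound and subadditivity rows have practical analogs that a real compressor should satisfy. A quick sanity check with `zlib` (a sketch; compressor output lengths only upper-bound $K$, and the constant `slack` is a generous allowance chosen for illustration):

```python
import zlib

x = b"abracadabra" * 500     # 5500 bytes of highly patterned data
y = bytes(range(256)) * 20   # 5120 bytes with a 256-byte period

Cx = len(zlib.compress(x, 9))
Cy = len(zlib.compress(y, 9))
Cxy = len(zlib.compress(x + y, 9))

# Subadditivity, up to a small constant: compressing the concatenation
# never costs much more than compressing the parts separately.
slack = 64  # allowance for headers and block-boundary effects
assert Cxy <= Cx + Cy + slack

# Upper bound K(x) <= |x| + c: literal encoding is always available,
# so a sane compressor never exceeds the input by more than a constant
# factor of framing overhead.
assert Cx <= len(x) + 64
print(Cx, Cy, Cxy)
```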
The relationship between Kolmogorov complexity and Shannon entropy: for a source producing strings of length $n$ by drawing symbols i.i.d. from distribution $P$:

$$\mathbb{E}_{x \sim P^n}\big[K(x)\big] = n \cdot H(P) + O(\log n)$$

where $H(P) = -\sum_i p_i \log_2 p_i$ is the Shannon entropy per symbol. Kolmogorov complexity is the individual-string analog of Shannon entropy.
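A rough empirical check: generate a long string from a biased binary source and compare its `zlib`-compressed size to $n \cdot H(P)$. The compressor can never beat the entropy rate, and `zlib` is far from an optimal code, so expect a ratio somewhat above $H$ rather than equality (a sketch, with the seed fixed for reproducibility):

```python
import math
import random
import zlib

random.seed(0)
p = 0.05       # probability of a 1 bit
n = 200_000    # number of symbols
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # bits per symbol

# Sample from the source; pack 8 biased bits per byte so the compressor
# sees the redundancy directly rather than ASCII '0'/'1' characters.
bits = [1 if random.random() < p else 0 for _ in range(n)]
data = bytes(
    sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
    for j in range(0, n, 8)
)

compressed_bits = 8 * len(zlib.compress(data, 9))
print(H)                    # ~0.286 bits per symbol for p = 0.05
print(compressed_bits / n)  # above H, within a small factor of it
```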
Implementation
ALGORITHM ApproximateKolmogorov(x, maxProgramLength)
INPUT: x: target string, maxProgramLength: search budget
OUTPUT: upper bound on K(x)
BEGIN
    // We cannot compute K(x) exactly, but we can find upper bounds
    // by searching for short programs that output x
    bestLength <- LENGTH(x)  // trivial upper bound: literal encoding
    FOR programLength FROM 1 TO maxProgramLength DO
        FOR EACH program p OF LENGTH programLength DO
            output <- SIMULATE(p, TIME_LIMIT)
            IF output = x THEN
                bestLength <- MIN(bestLength, programLength)
                RETURN bestLength  // lengths ascend, so this is the shortest found
            END IF
        END FOR
    END FOR
    RETURN bestLength
    // Note: this is only an upper bound; the true K(x) may be lower
    // (a program exceeding TIME_LIMIT or maxProgramLength might output x)
    // No algorithm can certify a nontrivial lower bound on K(x) in general
END
ALGORITHM IncompressibilityArgument(n)
// Shows most strings of length n are incompressible
INPUT: n: string length
OUTPUT: lower bound on the count of incompressible strings
BEGIN
    totalStrings <- 2^n
    shortPrograms <- 0
    // Count programs shorter than n bits
    FOR length FROM 0 TO n - 1 DO
        shortPrograms <- shortPrograms + 2^length
    END FOR
    // shortPrograms = 2^n - 1, and each program outputs at most one string
    incompressible <- totalStrings - shortPrograms
    // incompressible >= 1
    // More precisely: strings with K(x) >= n - c
    // At least 2^n - 2^(n-c) + 1 such strings exist
    // For c = 1: at least 2^(n-1) + 1 strings have K(x) >= n - 1
    // So at least HALF of all strings are nearly incompressible
    RETURN incompressible
END
ALGORITHM NormalizedCompressionDistance(x, y, compressor)
INPUT: x, y: two strings, compressor: a practical compression algorithm
OUTPUT: NCD(x,y) approximating normalized information distance
BEGIN
    // NCD approximates the theoretical normalized information distance,
    // which is based on Kolmogorov complexity
    Cx <- LENGTH(compressor(x))
    Cy <- LENGTH(compressor(y))
    Cxy <- LENGTH(compressor(CONCATENATE(x, y)))
    NCD <- (Cxy - MIN(Cx, Cy)) / MAX(Cx, Cy)
    RETURN NCD
    // NCD near 0: x and y are similar
    // NCD near 1: x and y are very different
END
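The NCD pseudocode translates almost line for line into Python with `zlib` as the compressor. The sketch below (with a fixed seed and an invented 0.5% mutation rate, both chosen purely for illustration) compares a string against a lightly mutated copy and against unrelated random data:

```python
import random
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, using zlib as the compressor."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(1)
a = bytes(rng.randrange(256) for _ in range(10_000))
b = bytearray(a)
for i in rng.sample(range(len(b)), 50):  # mutate 0.5% of the bytes
    b[i] = rng.randrange(256)
b = bytes(b)
c = bytes(rng.randrange(256) for _ in range(10_000))

print(ncd(a, b))  # small: compressing a+b reuses a to describe b
print(ncd(a, c))  # near 1: independent random data shares nothing
```

The second copy costs almost nothing to compress once the first is in the compressor's window, which is exactly the shared-information signal NCD measures.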
Real-World Applications
- Data compression algorithms (gzip, zstd, brotli) approximate Kolmogorov complexity: they find short descriptions of data by exploiting patterns, but can never achieve the theoretical minimum $K(x)$
- The Normalized Compression Distance (NCD) uses practical compressors to approximate Kolmogorov-based similarity, applied to plagiarism detection, malware classification, and genomic sequence comparison
- Kolmogorov complexity provides alternative proofs of many results in information theory, combinatorics, and computational complexity, often simpler than probabilistic arguments
- In machine learning, minimum description length (MDL) uses Kolmogorov-inspired principles for model selection: prefer the model that gives the shortest total description of the model plus the data given the model
- Randomness testing: while $K(x)$ is uncomputable, practical tests for randomness (NIST suite, Diehard tests) approximate the idea by checking for compressible patterns
- Chaitin's incompleteness theorem shows that formal systems have a "complexity horizon" beyond which they cannot prove incompressibility, connecting information theory to the foundations of mathematics
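The MDL idea in the applications above can be made concrete with a deliberately tiny toy: choose between two hypothetical encodings of a string by total description length. Both "models" and all bit costs here are invented for illustration, not a real MDL codec:

```python
# Toy MDL model selection: pick the hypothesis (encoding scheme) that
# minimizes model-description length plus data length under that model.

def literal_cost(s: str) -> int:
    # Model: "raw string". Negligible model description; 8 bits per char.
    return 8 * len(s)

def run_length_cost(s: str) -> int:
    # Model: run-length code. 16 bits to name the model, then 8 bits per
    # run symbol plus 16 bits per run length (all costs illustrative).
    runs = 1 + sum(1 for u, v in zip(s, s[1:]) if u != v)
    return 16 + runs * (8 + 16)

def best_model(s: str) -> str:
    costs = {"literal": literal_cost(s), "run-length": run_length_cost(s)}
    return min(costs, key=costs.get)

print(best_model("a" * 1000))  # run-length: one run beats 8000 literal bits
print(best_model("abcdefgh"))  # literal: eight runs cost more than 64 bits
```

The same trade-off, with real codes and real models, is what MDL uses to penalize overfitting: a more elaborate model must pay for its own description.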
Key Takeaways
- Kolmogorov complexity $K(x)$ is the length of the shortest program that outputs $x$; it measures the intrinsic information content of an individual string
- A string is algorithmically random if and only if it is incompressible: $K(x) \geq |x|$; by the counting argument, the overwhelming majority of strings are random or within a few bits of it
- $K(x)$ is uncomputable, proved via Berry's paradox: a program that computes $K(x)$ leads to a self-referential contradiction
- The invariance theorem ensures $K(x)$ is well-defined up to a constant, regardless of which universal Turing machine is used
- Chaitin's incompleteness theorem is an information-theoretic Gödel: no formal system can prove $K(x) > c$ for strings beyond its own complexity
- Practical applications include data compression, similarity metrics (NCD), model selection (MDL), and randomness testing