
Artificial Chemistry and Self-Replication: From Von Neumann to the Origin of Life

How computational models of chemistry explore self-reproducing automata, autocatalytic sets, and the fundamental question of how life bootstraps itself.

2025-09-22
Artificial Life | Tags: artificial-chemistry, self-replication, von-neumann

Terminology

| Term | Definition |
|---|---|
| Artificial Chemistry (AChem) | A computational framework consisting of molecules, reaction rules, and a reactor algorithm that models chemical-like interactions in silico |
| Self-Replication | The ability of a structure to produce a copy of itself, including the instructions for copying |
| Von Neumann's Universal Constructor | A theoretical self-reproducing automaton that contains both a construction mechanism and a description (blueprint) of that mechanism, conceived before the structure of DNA was known |
| Autocatalytic Set | A set of molecules and reactions in which every reaction is catalyzed by at least one molecule in the set and every molecule can be produced from a "food set" by reactions in the set, creating a self-sustaining reaction network |
| Metabolism | A network of chemical reactions that converts raw materials (food) into energy and building blocks, maintaining the organism far from thermodynamic equilibrium |
| Quine | A program that outputs its own source code, the computational analog of self-replication |
| Reactor Algorithm | The rule governing how molecules are selected for interaction in an artificial chemistry (e.g., a well-stirred reactor picks random pairs) |
| Catalysis | A process where a molecule (catalyst) speeds up a reaction without being consumed, enabling reaction networks that would otherwise be too slow |
| Autopoiesis | The property of a system that continuously produces and maintains itself, creating a boundary between self and environment (proposed by Maturana and Varela as a definition of life) |

What & Why

How does life start from non-life? This is arguably the deepest open question in science. Artificial chemistry attacks it computationally: define a set of abstract molecules, specify how they react, and simulate the system to see if self-replicating, self-maintaining structures emerge spontaneously.

The intellectual lineage begins with John von Neumann in the 1940s. Before the structure of DNA was known, von Neumann showed how a self-reproducing automaton could be built from two components: a universal constructor (the machinery that builds things) and a description (the blueprint of what to build). The constructor reads the description and builds a new machine; the description is then copied verbatim and the copy is attached to the offspring. This closely parallels how biological cells work: ribosomes (constructor) build proteins from instructions transcribed from DNA (description), and DNA polymerase copies the DNA for the daughter cell.

Artificial chemistry matters because:

  • It provides a formal framework for studying the origin of life without needing actual wet chemistry.
  • It connects to theoretical computer science (self-replicating programs, quines, fixed-point theorems).
  • It informs synthetic biology, where researchers design artificial metabolisms and self-replicating molecular systems.
  • It raises deep questions about what "life" means: is a self-replicating computer program alive?

How It Works

The Three Components of an Artificial Chemistry

Every AChem is defined by:

  1. Molecules (S): The set of possible molecular species (strings, graphs, lambda expressions, or any formal object).
  2. Reaction rules (R): How molecules interact. Given inputs $a, b \in S$, produce outputs $c, d \in S$ (or nothing). Rules can be explicit (a lookup table) or algorithmic (e.g., string rewriting).
  3. Reactor algorithm (A): How molecules are selected for interaction. The simplest is the well-stirred reactor: pick two molecules uniformly at random, apply the reaction rules, replace them with the products.
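To make S and R concrete, here is a minimal Python sketch of a toy chemistry chosen purely for illustration (it is not taken from the text): molecules are the integers ≥ 2, and an ordered pair (a, b) reacts when a divides b, leaving a intact and replacing b with b / a. The names `is_molecule`, `TABLE`, `table_rule`, and `division_rule` are hypothetical. The lookup table is an explicit rule that covers only a few pairs; `division_rule` is an algorithmic rule that covers every pair with the same interface.

```python
from typing import Callable, Optional

Products = Optional[tuple[int, int]]  # None means "no reaction"
Rule = Callable[[int, int], Products]


def is_molecule(x: int) -> bool:
    """S (toy choice): the molecular species are the integers >= 2."""
    return x >= 2


# R, explicit form: a finite lookup table from ordered reactant pairs to products.
TABLE: dict[tuple[int, int], tuple[int, int]] = {
    (2, 4): (2, 2),
    (2, 6): (2, 3),
    (3, 9): (3, 3),
}


def table_rule(a: int, b: int) -> Products:
    return TABLE.get((a, b))


# R, algorithmic form: if a divides b, a is left intact and b becomes b // a.
def division_rule(a: int, b: int) -> Products:
    if a < b and b % a == 0:
        return (a, b // a)
    return None


print(division_rule(12, 36))  # (12, 3)
print(table_rule(12, 36))     # None: the table has no entry for this pair
# Every product of the table is still a valid molecule.
print(all(is_molecule(x) for products in TABLE.values() for x in products))  # True
```

The table is a finite slice of the algorithmic rule: on every pair it lists, both rules agree. A reactor algorithm (A) then decides which pairs get passed to the rule.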

Von Neumann's Self-Reproducing Automaton

Von Neumann's design operates on a 2D cellular automaton grid with 29 states per cell. The machine has three parts:

Constructor (C): Reads instructions and builds structures cell by cell on the grid.

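Description (D): A tape of instructions encoding the constructor and the copier. It does not encode the tape itself; the tape is copied rather than constructed, which avoids an infinite regress.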

Copier: Duplicates the description tape and attaches it to the newly constructed machine.

Figure: Von Neumann's self-reproduction: C reads D, builds C' + D'

The key insight: the description serves a dual role. It is both interpreted (read by the constructor as instructions) and copied (duplicated verbatim for the offspring). DNA works the same way: it is transcribed into mRNA, which is translated into protein (interpreted), and it is replicated during cell division (copied).
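A toy Python sketch of this dual role, under loudly stated assumptions: the "tape" is a list of hypothetical instruction symbols (`c` for constructor, `k` for copier), and "building" a part just means decoding a symbol. The names `DECODE`, `Machine`, `construct`, `copy_tape`, and `reproduce` are illustrative, and this is not a cellular-automaton model; it only shows the same description being used once as instructions and once as data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction set: each tape symbol names one part to build.
DECODE = {"c": "constructor", "k": "copier"}


@dataclass
class Machine:
    parts: list[str]                  # hardware produced by construction
    tape: Optional[list[str]] = None  # description, attached after construction


def construct(tape: list[str]) -> Machine:
    """Constructor role: interpret the tape, building one part per symbol."""
    return Machine(parts=[DECODE[symbol] for symbol in tape])


def copy_tape(tape: list[str]) -> list[str]:
    """Copier role: duplicate the tape verbatim, without interpreting it."""
    return list(tape)


def reproduce(parent: Machine) -> Machine:
    if parent.tape is None:
        raise ValueError("a machine without a tape cannot reproduce")
    child = construct(parent.tape)       # tape used as instructions
    child.tape = copy_tape(parent.tape)  # tape used as data
    return child


# The tape describes the constructor and copier, not itself: no infinite regress.
ancestor = construct(["c", "k"])
ancestor.tape = ["c", "k"]
child = reproduce(ancestor)
print(child == ancestor, child.tape is ancestor.tape)  # True False
```

Because the tape is copied rather than rebuilt from the parts, an edit to the tape (say, an extra `"k"`) would be passed on to every descendant, which is what makes variation heritable in this scheme.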

Autocatalytic Sets

Stuart Kauffman proposed that life may have begun not with a single self-replicating molecule, but with a collectively self-sustaining network. In an autocatalytic set:

  • Every reaction is catalyzed by at least one molecule in the set.
  • Every molecule is either in the food set or produced by at least one reaction in the set.
  • The set is self-sustaining given a "food set" of simple molecules from the environment.
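These conditions can be checked mechanically. In this sketch (hypothetical names `Reaction`, `is_raf`, and `rx`; species are single characters purely for brevity), the set's molecules are taken to be everything reachable from the food set via the candidate reactions, so every non-food molecule is produced by some reaction by construction. The function then tests that each reaction can actually fire from reachable inputs and has at least one reachable catalyst.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    inputs: frozenset[str]
    outputs: frozenset[str]
    catalysts: frozenset[str]


def is_raf(reactions: list[Reaction], food: set[str]) -> bool:
    """True if the candidate reactions form a self-sustaining, catalyzed set."""
    if not reactions:
        return False
    # Food-generated: grow the set of reachable species using only these reactions.
    reachable = set(food)
    grew = True
    while grew:
        grew = False
        for r in reactions:
            if r.inputs <= reachable and not r.outputs <= reachable:
                reachable |= r.outputs
                grew = True
    # Every reaction has reachable inputs and at least one reachable catalyst.
    return all(r.inputs <= reachable and bool(r.catalysts & reachable) for r in reactions)


def rx(inputs: str, outputs: str, catalysts: str) -> Reaction:
    """Compact constructor: each character is one species (a toy convention)."""
    return Reaction(frozenset(inputs), frozenset(outputs), frozenset(catalysts))


food = {"A", "B"}
r1 = rx("AB", "C", "D")  # A + B -> C, catalyzed by D
r2 = rx("CB", "D", "C")  # C + B -> D, catalyzed by C
print(is_raf([r1, r2], food), is_raf([r1], food))  # True False
```

The pair `r1`, `r2` is a minimal collective loop: neither reaction is catalyzed on its own, but each produces the other's catalyst.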

Kauffman argued that as the diversity of molecules in a primordial soup increases, the probability of an autocatalytic set forming undergoes a phase transition: below a critical diversity, no self-sustaining network exists; above it, one almost certainly does.

Tierra and Avida

Tom Ray's Tierra (1991) placed self-replicating programs in a shared memory space. Programs competed for CPU time and memory, evolved through copying errors (mutations), and produced parasites, hyperparasites, and symbiotic relationships. Avida extended this with a more controlled environment, becoming a major platform for studying digital evolution.

Complexity Analysis

| Operation | Time | Notes |
|---|---|---|
| Well-stirred reactor step | $O(C_r)$ | $C_r$ = cost of one reaction rule evaluation |
| Autocatalytic set detection (RAF) | $O(\lvert R \rvert \cdot \lvert S \rvert)$ per pruning round, at most $\lvert R \rvert$ rounds | $\lvert R \rvert$ = reactions, $\lvert S \rvert$ = species; polynomial algorithm by Hordijk & Steel |
| Kauffman phase transition | N/A | Occurs at $p \cdot M^2 \approx M$ (equivalently $p \cdot M \approx 1$), where $p$ = catalysis probability, $M$ = molecule types |
| Von Neumann constructor simulation | $O(N^2)$ per CA step | $N \times N$ grid, 29-state cellular automaton |

For a well-stirred reactor running for $T$ reaction events, total cost is $O(T \cdot C_r)$, provided each random draw takes constant time; the population size does not enter the bound. The critical question is not computational cost but whether the system produces interesting dynamics (self-replication, metabolism, evolution) within a feasible number of steps.

Kauffman's phase transition threshold for autocatalytic sets:

$p \cdot M \geq 1$

where $p$ is the probability that a random molecule catalyzes a random reaction and $M$ is the number of molecular species. Above this threshold, autocatalytic sets appear with high probability.
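One way to see why the product $p \cdot M$ is the natural control parameter, assuming (for illustration) that each of the $M$ species catalyzes a given reaction independently with probability $p$: the number of catalysts per reaction is then binomial, so

$$\mathbb{E}[\text{catalysts per reaction}] = p \cdot M, \qquad \Pr[\text{reaction has no catalyst}] = (1-p)^M \approx e^{-p \cdot M}$$

At $p \cdot M = 1$ roughly $e^{-1} \approx 37\%$ of reactions are uncatalyzed; at $p \cdot M = 3$ only about $e^{-3} \approx 5\%$ are, so nearly every reaction is available to a catalyzed network. This is an illustrative reading of the threshold, not necessarily Kauffman's own derivation, and the exponential approximation assumes $p$ is small.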

Implementation

ALGORITHM WellStirredReactor(molecules, rules, maxSteps)
INPUT: molecules: multiset of molecular species,
       rules: function (a, b) -> (c, d) or NULL,
       maxSteps: integer
OUTPUT: final multiset of molecules

BEGIN
  FOR step FROM 1 TO maxSteps DO
    IF SIZE(molecules) < 2 THEN BREAK   // too few molecules left to collide
    a <- REMOVE random molecule from molecules
    b <- REMOVE random molecule from molecules

    products <- rules(a, b)

    IF products != NULL THEN
      (c, d) <- products
      ADD c TO molecules
      ADD d TO molecules
    ELSE
      // No reaction: put them back
      ADD a TO molecules
      ADD b TO molecules
    END IF
  END FOR

  RETURN molecules
END
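A runnable Python sketch of WellStirredReactor, using a toy division rule assumed for illustration (molecules are integers ≥ 2; when one divides the other, the larger is replaced by the quotient). The names `divide` and `well_stirred_reactor` are hypothetical, and a seeded `random.Random` is passed in so runs are reproducible.

```python
import random
from typing import Callable, Optional

Rule = Callable[[int, int], Optional[tuple[int, int]]]


def divide(a: int, b: int) -> Optional[tuple[int, int]]:
    """Toy rule (an assumption): the smaller number divides the larger one."""
    if a < b and b % a == 0:
        return (a, b // a)
    if b < a and a % b == 0:
        return (a // b, b)
    return None  # no reaction


def well_stirred_reactor(molecules: list[int], rule: Rule, max_steps: int,
                         rng: random.Random) -> list[int]:
    soup = list(molecules)  # the multiset, stored as a list
    for _ in range(max_steps):
        if len(soup) < 2:
            break
        # Pick two distinct molecules uniformly at random (without replacement).
        i, j = rng.sample(range(len(soup)), 2)
        products = rule(soup[i], soup[j])
        if products is not None:
            # Overwrite the reactants with the products: the same multiset as
            # REMOVE + ADD, without O(n) deletions from the middle of a list.
            soup[i], soup[j] = products
        # Otherwise the pair does not react and both molecules stay in the soup.
    return soup


final = well_stirred_reactor([2, 3, 4, 6, 8, 9, 12, 27], divide, 1_000, random.Random(0))
print(sorted(final))
```

With only `[2, 8]` in the soup every step draws the same pair, so the run is forced: 8 becomes 4, then 2, and `[2, 2]` is inert because neither number is smaller than the other. Overwriting reactant slots in place assumes every reaction yields exactly two products, as in the pseudocode.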

ALGORITHM DetectRAF(reactions, foodSet, catalysts)
INPUT: reactions: list of {inputs, outputs},
       foodSet: set of molecule types available from the environment,
       catalysts: map of reaction -> set of catalyzing species
OUTPUT: the maximal reflexively autocatalytic, food-generated (RAF) set of reactions, or EMPTY

BEGIN
  activeReactions <- COPY of reactions

  changed <- TRUE
  WHILE changed DO
    changed <- FALSE

    // Closure: every species producible from foodSet using activeReactions
    producible <- COPY of foodSet
    REPEAT
      grew <- FALSE
      FOR EACH reaction IN activeReactions DO
        IF ALL inputs of reaction ARE IN producible
           AND NOT ALL outputs of reaction ARE IN producible THEN
          ADD ALL outputs of reaction TO producible
          grew <- TRUE
        END IF
      END FOR
    UNTIL NOT grew

    // Prune reactions with an unproducible input or no producible catalyst
    FOR EACH reaction IN COPY of activeReactions DO
      IF NOT ALL inputs of reaction ARE IN producible
         OR catalysts[reaction] INTERSECT producible = EMPTY THEN
        REMOVE reaction FROM activeReactions
        changed <- TRUE
      END IF
    END FOR
  END WHILE

  IF activeReactions IS EMPTY THEN
    RETURN EMPTY
  ELSE
    RETURN activeReactions   // the maximal RAF set
  END IF
END
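A Python sketch of the prune-until-stable idea behind RAF detection: compute the closure of the food set under the surviving reactions, drop every reaction with an unreachable input or no reachable catalyst, and repeat until nothing changes. The names `Reaction`, `closure`, `max_raf`, and `rx` are hypothetical, species are single characters only to keep the example short, and no attempt is made to match the optimized bounds of published RAF implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    catalysts: frozenset[str]


def closure(food: set[str], reactions: list[Reaction]) -> set[str]:
    """All species producible from the food set using the given reactions."""
    reachable = set(food)
    grew = True
    while grew:
        grew = False
        for r in reactions:
            if r.inputs <= reachable and not r.outputs <= reachable:
                reachable |= r.outputs
                grew = True
    return reachable


def max_raf(reactions: list[Reaction], food: set[str]) -> list[Reaction]:
    """Prune until every reaction has all inputs and some catalyst in the closure."""
    active = list(reactions)
    while active:
        reach = closure(food, active)
        kept = [r for r in active if r.inputs <= reach and r.catalysts & reach]
        if len(kept) == len(active):
            break  # stable: this is the maximal RAF
        active = kept
    return active  # an empty list means no RAF exists


def rx(name: str, inputs: str, outputs: str, catalysts: str) -> Reaction:
    """Toy convention: every species is a single character."""
    return Reaction(name, frozenset(inputs), frozenset(outputs), frozenset(catalysts))


food = {"a", "b"}
reactions = [
    rx("r1", "ab", "x", "y"),  # a + b -> x, catalyzed by y
    rx("r2", "xb", "y", "x"),  # x + b -> y, catalyzed by x
    rx("r3", "xc", "z", "a"),  # needs c, which is never produced
    rx("r4", "aa", "w", "q"),  # its only catalyst q is never produced
]
print([r.name for r in max_raf(reactions, food)])  # ['r1', 'r2']
```

Each outer round either removes at least one reaction or stops, so there are at most as many rounds as reactions, each dominated by one closure computation.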

Real-World Applications

  • Origin of life research: Artificial chemistries test hypotheses about how self-replicating systems could emerge from prebiotic chemistry, complementing wet-lab experiments
  • Synthetic biology: Designing artificial metabolisms and minimal cells draws on AChem principles to engineer self-sustaining biochemical networks
  • Drug discovery: Reaction network models help predict metabolic pathways and identify drug targets in complex biochemical systems
  • Materials science: Self-assembling molecular systems (DNA origami, supramolecular chemistry) draw on related ideas about programmable, rule-based molecular interactions
  • Digital evolution platforms: Avida and similar systems study fundamental evolutionary dynamics (mutation rates, fitness landscapes, the evolution of complexity) in controlled digital environments
  • Theoretical computer science: Self-replicating programs (quines) and von Neumann's constructor connect to fixed-point theorems, Kleene's recursion theorem, and the foundations of computability
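The quine connection can be made concrete. In this Python quine (an illustration, not from the text), the string `s` plays the part of von Neumann's description: `s % s` follows it as a template (interpreted), while `%r` inserts it verbatim (copied). The harness `run` is a hypothetical helper that executes the quine and captures its output so it can be compared with the quine's own source.

```python
import contextlib
import io

# A two-line Python quine, stored as text so it can be run and compared.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run(source: str) -> str:
    """Execute source in a fresh namespace and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


print(QUINE, end="")        # the quine's source text
print(run(QUINE) == QUINE)  # True
```

The quine itself cannot contain comments unless they are also reproduced in the output, which is why the explanation lives in the harness rather than in the two quine lines.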

Key Takeaways

  • Artificial chemistry models life-like dynamics using abstract molecules, reaction rules, and a reactor algorithm
  • Von Neumann showed that self-reproduction can be built around a dual-role description (interpreted as instructions and copied verbatim), anticipating DNA's mechanism before molecular biology confirmed it
  • Autocatalytic sets offer an alternative to "replicator-first" origin-of-life theories: collectively self-sustaining networks can emerge via a phase transition at $p \cdot M \geq 1$
  • Digital evolution platforms (Tierra, Avida) demonstrate that ongoing evolution, parasitism, and symbiosis can emerge from self-replicating programs competing for resources
  • AChem connects biology, chemistry, and computer science, providing formal tools to study the most fundamental question: how does life begin?