Artificial Chemistry and Self-Replication: From Von Neumann to the Origin of Life
How computational models of chemistry explore self-reproducing automata, autocatalytic sets, and the fundamental question of how life bootstraps itself.
Terminology
| Term | Definition |
|---|---|
| Artificial Chemistry (AChem) | A computational framework consisting of molecules, reaction rules, and a reactor algorithm that models chemical-like interactions in silico |
| Self-Replication | The ability of a structure to produce a copy of itself, including the instructions for copying |
| Von Neumann's Universal Constructor | A theoretical self-reproducing automaton that contains both a construction mechanism and a description (blueprint) of itself, predating the discovery of DNA's double-helix structure |
| Autocatalytic Set | A set of molecules where every molecule's formation is catalyzed by at least one other molecule in the set, creating a self-sustaining reaction network |
| Metabolism | A network of chemical reactions that converts raw materials (food) into energy and building blocks, maintaining the organism far from thermodynamic equilibrium |
| Quine | A program that outputs its own source code, the computational analog of self-replication |
| Reactor Algorithm | The rule governing how molecules are selected for interaction in an artificial chemistry (e.g., well-stirred reactor picks random pairs) |
| Catalysis | A process where a molecule (catalyst) speeds up a reaction without being consumed, enabling reaction networks that would otherwise be too slow |
| Autopoiesis | A system that continuously produces and maintains itself, creating a boundary between self and environment (proposed by Maturana and Varela as a definition of life) |
What & Why
How does life start from non-life? This is arguably the deepest open question in science. Artificial chemistry attacks it computationally: define a set of abstract molecules, specify how they react, and simulate the system to see if self-replicating, self-maintaining structures emerge spontaneously.
The intellectual lineage begins with John von Neumann in the 1940s. Before the double-helix structure of DNA was known, von Neumann showed that a self-reproducing automaton can be built from two components: a universal constructor (the machinery that builds things) and a description (the blueprint of what to build). The constructor reads the description, builds a copy of itself, and attaches a copy of the description to the new constructor. Biological cells follow the same logic: ribosomes (constructor) translate mRNA transcribed from DNA (description) to build proteins, and DNA polymerase copies the DNA for the daughter cell.
Artificial chemistry matters because:
- It provides a formal framework for studying the origin of life without needing actual wet chemistry.
- It connects to theoretical computer science (self-replicating programs, quines, fixed-point theorems).
- It informs synthetic biology, where researchers design artificial metabolisms and self-replicating molecular systems.
- It raises deep questions about what "life" means: is a self-replicating computer program alive?
How It Works
The Three Components of an Artificial Chemistry
Every AChem is defined by:
- Molecules ($S$): The set of possible molecular species (strings, graphs, lambda expressions, or any formal object).
- Reaction rules ($R$): How molecules interact. Given inputs $a, b \in S$, produce outputs $c, d \in S$ (or nothing). Rules can be explicit (a lookup table) or algorithmic (e.g., string rewriting).
- Reactor algorithm ($A$): How molecules are selected for interaction. The simplest is the well-stirred reactor: pick two molecules uniformly at random, apply the reaction rules, replace them with the products.
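To make the three components concrete, here is a minimal Python sketch. It assumes a toy "number-division" chemistry that this article does not otherwise describe: molecules are integers of at least 2, the rule replaces `b` with `b // a` when `a` properly divides `b`, and the reactor is well-stirred. The names `division_rule` and `react_once` are illustrative, not taken from any library.

```python
import random
from typing import List, Optional, Tuple

# S: molecules are integers >= 2 (an assumed toy chemistry, chosen for brevity).
Molecule = int


def division_rule(a: Molecule, b: Molecule) -> Optional[Tuple[Molecule, Molecule]]:
    """R: if a properly divides b, a is kept and b is replaced by b // a."""
    if a != b and b % a == 0:
        return a, b // a
    return None  # no reaction for this pair


def react_once(soup: List[Molecule], rng: random.Random) -> bool:
    """A: one well-stirred step. Picks two distinct positions uniformly at random."""
    i, j = rng.sample(range(len(soup)), 2)
    products = division_rule(soup[i], soup[j])
    if products is None:
        return False
    soup[i], soup[j] = products  # replace the reactants with the products
    return True


rng = random.Random(0)  # fixed seed so the run is repeatable
soup = [rng.randint(2, 100) for _ in range(200)]
for _ in range(20_000):
    react_once(soup, rng)
```

In this toy chemistry the population size stays constant, and composite numbers tend to be broken down toward smaller factors over time. Swapping in a different `division_rule` changes the chemistry without touching the reactor.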
Von Neumann's Self-Reproducing Automaton
Von Neumann's design operates on a 2D cellular automaton grid with 29 states per cell. The machine has three parts:
Constructor (C): Reads instructions and builds structures cell by cell on the grid.
Description (D): A tape of instructions encoding the entire machine (C + D + copier).
Copier: Duplicates the description tape and attaches it to the newly constructed machine.
The key insight: the description serves a dual role. It is both interpreted (read by the constructor as instructions) and copied (duplicated verbatim for the offspring). DNA works the same way: it is transcribed into mRNA (interpreted) and replicated during cell division (copied).
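The dual role of the description can be illustrated with a quine. The sketch below is a hedged analogy rather than a model of von Neumann's automaton: the string `template` plays the part of the description, used once as data to be copied (via `%r`) and once as instructions to be interpreted (when the resulting program is executed). The helper names `build_quine` and `run_and_capture` are hypothetical.

```python
import contextlib
import io


def build_quine() -> str:
    """Return the source text of a two-line Python quine."""
    # The template appears twice in the result: quoted (copied) and as code (interpreted).
    template = 'template = %r\nprint(template %% template)'
    return template % template


def run_and_capture(source: str) -> str:
    """Execute source in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


quine_source = build_quine()
offspring = run_and_capture(quine_source)
```

Here `offspring` is the text printed by the program, which matches `quine_source` apart from the trailing newline added by `print`. Running the offspring would produce the same text again.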
Autocatalytic Sets
Stuart Kauffman proposed that life may have begun not with a single self-replicating molecule, but with a collectively self-sustaining network. In an autocatalytic set:
- Every reaction is catalyzed by at least one molecule in the set.
- Every molecule is produced by at least one reaction in the set.
- The set is self-sustaining given a "food set" of simple molecules from the environment.
Kauffman argued, using random reaction-network models, that as the diversity of molecules in a primordial soup increases, the probability of an autocatalytic set forming undergoes a phase transition: below a critical diversity, no self-sustaining network exists; above it, one almost certainly does.
Tierra and Avida
Tom Ray's Tierra (1991) placed self-replicating programs in a shared memory space. Programs competed for CPU time and memory, evolved through copying errors (mutations), and produced parasites, hyperparasites, and symbiotic relationships. Avida extended this with a more controlled environment, becoming a major platform for studying digital evolution.
Complexity Analysis
| Operation | Time | Notes |
|---|---|---|
| Well-stirred reactor step | $O(C_r)$ | $C_r$ = cost of one reaction rule evaluation |
| Autocatalytic set detection (RAF) | $O(\lvert R \rvert \cdot \lvert S \rvert)$ | $\lvert R \rvert$ = reactions, $\lvert S \rvert$ = species; polynomial algorithm by Hordijk & Steel |
| Kauffman phase transition | N/A | Occurs near $p \cdot M \approx 1$ (equivalently $p \cdot M^2 \approx M$), where $p$ = probability that a random molecule catalyzes a random reaction, $M$ = number of molecular species |
| Von Neumann constructor simulation | $O(N^2)$ per CA step | $N \times N$ grid, 29-state cellular automaton |
For a well-stirred reactor running for $T$ reaction events, total cost is $O(T \cdot C_r)$, regardless of how many molecule instances the reactor holds (assuming random selection takes constant time). The critical question is not computational cost but whether the system produces interesting dynamics (self-replication, metabolism, evolution) within a feasible number of steps.
Kauffman's phase transition threshold for autocatalytic sets:

$$p \cdot M \geq 1$$

where $p$ is the probability that a random molecule catalyzes a random reaction and $M$ is the number of molecular species. Above this threshold, autocatalytic sets appear with high probability.
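As a small worked example of the threshold $p \cdot M \geq 1$ quoted in this article, the Python sketch below reads $p \cdot M$ as the expected number of catalysts per reaction, under the added assumption that each of the $M$ species catalyzes a given reaction independently with probability $p$. The function names are illustrative.

```python
def expected_catalysts_per_reaction(p: float, num_species: int) -> float:
    """Expected number of species that catalyze one reaction (assumes independence)."""
    return p * num_species


def critical_probability(num_species: int) -> float:
    """Smallest p for which p * M reaches 1."""
    return 1.0 / num_species


def above_threshold(p: float, num_species: int) -> bool:
    """True when the article's condition p * M >= 1 holds."""
    return expected_catalysts_per_reaction(p, num_species) >= 1.0
```

With 1000 species, `critical_probability(1000)` is 0.001: a catalysis probability of 0.002 sits above the threshold, while 0.0005 sits below it. The same threshold can be crossed by holding $p$ fixed and increasing the number of species.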
Implementation
ALGORITHM WellStirredReactor(molecules, rules, maxSteps)
INPUT:  molecules: multiset of molecular species (at least two instances),
        rules: function (a, b) -> (c, d) or NULL,
        maxSteps: integer
OUTPUT: final multiset of molecules
BEGIN
    FOR step FROM 1 TO maxSteps DO
        // Draw two molecules without replacement
        a <- REMOVE random molecule from molecules
        b <- REMOVE random molecule from molecules
        products <- rules(a, b)
        IF products != NULL THEN
            // Reaction: the products replace the reactants
            (c, d) <- products
            ADD c TO molecules
            ADD d TO molecules
        ELSE
            // No reaction: put them back
            ADD a TO molecules
            ADD b TO molecules
        END IF
    END FOR
    RETURN molecules
END
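The reactor pseudocode above can be sketched in runnable Python as follows. This is one possible rendering, not a reference implementation: it swaps products into the reactants' positions instead of removing and re-adding them, which has the same effect on the multiset. The rule `copy_rule` is a hypothetical example in which a replicator `"R"` converts a food molecule `"F"` into another `"R"`.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Optional, Tuple

Rule = Callable[[Hashable, Hashable], Optional[Tuple[Hashable, Hashable]]]


def well_stirred_reactor(molecules: List[Hashable], rule: Rule, max_steps: int,
                         rng: Optional[random.Random] = None) -> List[Hashable]:
    """Run max_steps random pairwise collisions and return the final soup."""
    if rng is None:
        rng = random.Random()
    soup = list(molecules)  # work on a copy so the caller's list is untouched
    if len(soup) < 2:
        return soup  # no pair can be drawn
    for _ in range(max_steps):
        # Two distinct positions, in random order, so (a, b) is an ordered pair.
        i, j = rng.sample(range(len(soup)), 2)
        products = rule(soup[i], soup[j])
        if products is not None:
            soup[i], soup[j] = products  # products replace the reactants
    return soup


def copy_rule(a: Hashable, b: Hashable) -> Optional[Tuple[Hashable, Hashable]]:
    """Hypothetical rule: 'R' turns a food molecule 'F' into a second 'R'."""
    if a == "R" and b == "F":
        return "R", "R"
    return None


final = well_stirred_reactor(["R"] + ["F"] * 99, copy_rule, 5_000, random.Random(1))
counts = Counter(final)
```

Starting from a single `"R"` among 99 food molecules, `counts` shows how far the replicator has spread after 5,000 collisions. The total number of molecules is conserved because every reaction maps two reactants to two products.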
ALGORITHM DetectRAF(species, reactions, foodSet, catalysts)
INPUT:  species: set of molecule types,
        reactions: list of {inputs, outputs, catalyzedBy},
        foodSet: set of molecule types available from environment,
        catalysts: map of reaction -> set of catalyzing species
OUTPUT: rafSet: the maximal reflexively autocatalytic food-generated set, or EMPTY
BEGIN
    // Start with all reactions and prune until nothing changes
    activeReactions <- COPY of reactions
    changed <- TRUE
    WHILE changed DO
        changed <- FALSE
        // Closure: species producible from foodSet using activeReactions
        producible <- COPY of foodSet
        grew <- TRUE
        WHILE grew DO
            grew <- FALSE
            FOR EACH reaction IN activeReactions DO
                IF ALL inputs of reaction ARE IN producible
                   AND SOME output of reaction IS NOT IN producible THEN
                    ADD ALL outputs of reaction TO producible
                    grew <- TRUE
                END IF
            END FOR
        END WHILE
        // Keep only reactions that are food-generated and catalyzed from within
        kept <- EMPTY list
        FOR EACH reaction IN activeReactions DO
            IF ALL inputs of reaction ARE IN producible
               AND catalysts[reaction] INTERSECT producible != EMPTY THEN
                ADD reaction TO kept
            ELSE
                changed <- TRUE
            END IF
        END FOR
        activeReactions <- kept
    END WHILE
    IF LENGTH(activeReactions) > 0 THEN
        RETURN activeReactions (the RAF set)
    ELSE
        RETURN EMPTY
    END IF
END
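A runnable Python sketch of RAF detection follows. It implements the usual pruning idea (repeatedly compute what is producible from the food set, then discard reactions that lack producible inputs or a producible catalyst) and is not claimed to match Hordijk and Steel's published algorithm line for line. The `Reaction` type, the function names, and the example network are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Reaction(NamedTuple):
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    catalysts: FrozenSet[str]


def closure(food: Set[str], reactions: List[Reaction]) -> Set[str]:
    """Species reachable from the food set, ignoring catalysis."""
    producible = set(food)
    grew = True
    while grew:
        grew = False
        for r in reactions:
            if r.inputs <= producible and not r.outputs <= producible:
                producible |= r.outputs
                grew = True
    return producible


def max_raf(reactions: List[Reaction], food: Set[str]) -> List[Reaction]:
    """Prune reactions until every survivor is food-generated and catalyzed."""
    active = list(reactions)
    while True:
        producible = closure(food, active)
        kept = [r for r in active
                if r.inputs <= producible and r.catalysts & producible]
        if len(kept) == len(active):
            return kept  # fixed point reached; empty list means no RAF
        active = kept


def rx(name: str, inputs: str, outputs: str, catalysts: str) -> Reaction:
    """Convenience constructor: space-separated species names."""
    return Reaction(name, frozenset(inputs.split()),
                    frozenset(outputs.split()), frozenset(catalysts.split()))


network = [
    rx("r1", "a b", "ab", "abb"),   # catalyzed by a product of r2
    rx("r2", "ab b", "abb", "ab"),  # catalyzed by a product of r1
    rx("r3", "a x", "ax", "ab"),    # input x is never producible
    rx("r4", "ab a", "aba", "zz"),  # catalyst zz is never producible
]
raf = max_raf(network, {"a", "b"})
raf_names = [r.name for r in raf]
```

In this example `r1` and `r2` catalyze each other and survive pruning, while `r3` and `r4` are discarded. Returning an empty list signals that the network contains no RAF set for the given food set.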
Real-World Applications
- Origin of life research: Artificial chemistries test hypotheses about how self-replicating systems could emerge from prebiotic chemistry, complementing wet-lab experiments
- Synthetic biology: Designing artificial metabolisms and minimal cells draws on AChem principles to engineer self-sustaining biochemical networks
- Drug discovery: Reaction network models help predict metabolic pathways and identify drug targets in complex biochemical systems
- Materials science: Self-assembling molecular systems (DNA origami, supramolecular chemistry) are designed using principles from artificial chemistry
- Digital evolution platforms: Avida and similar systems study fundamental evolutionary dynamics (mutation rates, fitness landscapes, the evolution of complexity) in controlled digital environments
- Theoretical computer science: Self-replicating programs (quines) and von Neumann's constructor connect to fixed-point theorems, Kleene's recursion theorem, and the foundations of computability
Key Takeaways
- Artificial chemistry models life-like dynamics using abstract molecules, reaction rules, and a reactor algorithm
- Von Neumann showed that self-reproduction can be achieved with a dual-role description (interpreted as instructions and copied verbatim), anticipating the logic of DNA several years before molecular biology confirmed it
- Autocatalytic sets offer an alternative to "replicator-first" origin-of-life theories: collectively self-sustaining networks can emerge via a phase transition at $p \cdot M \geq 1$
- Digital evolution platforms (Tierra, Avida) demonstrate that rich evolutionary dynamics, including parasitism and symbiosis, can emerge from self-replicating programs competing for resources
- AChem connects biology, chemistry, and computer science, providing formal tools to study the most fundamental question: how does life begin?