Neuroevolution: When Evolution Builds the Brain
Evolving neural network weights and topologies with NEAT, and why evolution strategies sometimes beat gradient descent.
Terminology
| Term | Definition |
|---|---|
| Neuroevolution | The use of evolutionary algorithms to optimize neural network weights, architectures, or both |
| NEAT | NeuroEvolution of Augmenting Topologies: an algorithm that evolves both the weights and the structure of neural networks, starting from minimal networks |
| Topology | The graph structure of a neural network: which nodes exist and how they are connected |
| Innovation Number | A global counter in NEAT that assigns a unique ID to each new structural mutation, enabling meaningful crossover between networks of different topologies |
| Speciation | Grouping individuals into species based on structural similarity, protecting novel topologies from being outcompeted before they can be optimized |
| Evolution Strategy (ES) | A family of evolutionary algorithms that use Gaussian perturbations of parameters and selection to optimize, typically without crossover |
| Coevolution | Simultaneous evolution of multiple populations that interact, such as predators and prey or competing game agents |
| Fitness Landscape | A mapping from genotype space to fitness values, visualized as a surface where peaks represent high-fitness solutions |
| Complexification | The NEAT principle of starting with minimal networks and gradually adding complexity through structural mutations |
What & Why
Standard neural network training uses backpropagation: compute gradients of a loss function and nudge weights downhill. This works brilliantly when you have a differentiable loss, labeled data, and a fixed architecture. But what if you do not have any of those?
Neuroevolution sidesteps all three requirements. Instead of computing gradients, it treats the neural network's parameters (and optionally its structure) as a genome and evolves them using selection and mutation. Fitness is measured by how well the network performs at a task, not by a differentiable loss.
This approach has several advantages:
- It works with non-differentiable reward signals (game scores, physical simulations, survival time).
- It can evolve the network architecture alongside the weights, discovering novel topologies.
- It is embarrassingly parallel: each individual can be evaluated independently on a separate CPU.
- It is less prone to the local optima that trap gradient descent in deceptive fitness landscapes.
NEAT (NeuroEvolution of Augmenting Topologies), introduced by Kenneth Stanley and Risto Miikkulainen in 2002, is the most influential neuroevolution algorithm. It solved three key problems: how to cross over networks with different structures, how to protect structural innovations from premature elimination, and how to minimize the search space by starting simple.
How It Works
Weight Evolution (Simple Case)
The simplest form of neuroevolution fixes the network architecture and evolves only the weight vector $\mathbf{w} \in \mathbb{R}^d$. An evolution strategy approach:
- Start with a parameter vector $\mathbf{w}$.
- Generate $P$ perturbations $\mathbf{w}_i = \mathbf{w} + \sigma \cdot \boldsymbol{\epsilon}_i$, where $\boldsymbol{\epsilon}_i \sim \mathcal{N}(0, I)$.
- Evaluate the fitness $F(\mathbf{w}_i)$ of each perturbation.
- Update: $\mathbf{w} \leftarrow \mathbf{w} + \alpha \cdot \frac{1}{P\sigma} \sum_{i=1}^{P} F(\mathbf{w}_i) \cdot \boldsymbol{\epsilon}_i$.
This is essentially a finite-difference gradient estimate, but it scales to millions of parameters when parallelized across hundreds of CPUs.
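That claim can be checked directly. The sketch below (toy quadratic fitness; all names and hyperparameters are illustrative) computes the ES update direction and compares it to the analytic gradient, which ES never sees:

```python
import numpy as np

rng = np.random.default_rng(0)
d, P, sigma = 5, 2000, 0.1
target = np.ones(d)

def fitness(w):
    return -np.sum((w - target) ** 2)  # maximized at w == target

w = np.zeros(d)
eps = rng.standard_normal((P, d))                    # epsilon_i ~ N(0, I)
F = np.array([fitness(w + sigma * e) for e in eps])  # evaluate each perturbation
F = F - F.mean()  # baseline subtraction: common variance reduction, same expectation
grad_est = (F[:, None] * eps).sum(axis=0) / (P * sigma)

true_grad = -2 * (w - target)  # analytic gradient, for comparison only
cos = grad_est @ true_grad / (np.linalg.norm(grad_est) * np.linalg.norm(true_grad))
print(cos > 0.95)  # True: the ES estimate points along the true gradient
```

The cosine similarity between the two directions is close to 1, which is exactly why ES behaves like a (noisy, smoothed) gradient method despite using only fitness values.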
NEAT: Evolving Topology
NEAT starts every network as a minimal graph (inputs directly connected to outputs, no hidden nodes) and grows complexity through two structural mutations:
Add Connection: Pick two unconnected nodes, add an edge with a random weight.
Add Node: Pick an existing connection, split it by inserting a new hidden node. The old connection is disabled, and two new connections are created (one into the new node, one out).
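The add-node mutation can be sketched under an assumed genome encoding, where each connection gene is a dict with in/out node ids, a weight, an enabled flag, and an innovation number (field names are illustrative, not from any particular NEAT library):

```python
import random

def add_node(genome, next_node_id, innovation_counter):
    """Split a random enabled connection by inserting a new hidden node."""
    conn = random.choice([g for g in genome if g["enabled"]])
    conn["enabled"] = False  # the old connection is disabled, not removed
    # Into the new node: weight 1.0, so the split initially changes behavior little.
    genome.append({"in": conn["in"], "out": next_node_id, "weight": 1.0,
                   "enabled": True, "innov": innovation_counter})
    # Out of the new node: carries the old connection's weight.
    genome.append({"in": next_node_id, "out": conn["out"], "weight": conn["weight"],
                   "enabled": True, "innov": innovation_counter + 1})
    return innovation_counter + 2

genome = [{"in": 0, "out": 2, "weight": 0.7, "enabled": True, "innov": 0}]
counter = add_node(genome, next_node_id=3, innovation_counter=1)
print(len(genome), genome[0]["enabled"], counter)  # 3 False 3
```

The 1.0 / old-weight initialization is the detail that makes complexification gentle: right after the split, the new two-hop path approximates the old direct connection.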
Innovation Numbers and Crossover
Each new structural mutation gets a globally unique innovation number. When crossing over two networks, NEAT aligns genes by innovation number (like aligning DNA sequences). Matching genes are inherited randomly from either parent. Disjoint and excess genes come from the fitter parent. This solves the "competing conventions" problem that plagued earlier topology-evolving approaches.
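Gene alignment can be sketched as follows, assuming each parent genome is a dict keyed by innovation number and that `parent1` is the fitter parent (a simplification of full NEAT crossover; the encoding is illustrative):

```python
import random

def crossover(fitter, other):
    """Align genes by innovation number; structure follows the fitter parent."""
    child = {}
    for innov, gene in fitter.items():
        if innov in other and random.random() < 0.5:
            child[innov] = other[innov]  # matching gene, taken from the other parent
        else:
            child[innov] = gene          # matching (coin flip) or disjoint/excess
    return child

p1 = {0: (0.5, True), 1: (-0.3, True), 4: (0.9, True)}  # fitter parent
p2 = {0: (0.1, True), 2: (0.8, True)}
child = crossover(p1, p2)
print(sorted(child))  # [0, 1, 4]: the child's structure follows the fitter parent
```

Only gene 0 is matching here; genes 1 and 4 are disjoint/excess and come from the fitter parent, so the two topologies recombine without producing a broken graph.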
Speciation
NEAT groups individuals into species based on a compatibility distance that measures structural and weight differences. Fitness sharing within species prevents a single dominant topology from taking over the population, giving novel structures time to optimize their weights before competing globally.
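The compatibility distance from the NEAT paper has the form $\delta = c_1 E / N + c_2 D / N + c_3 \bar{W}$, where $E$ is the number of excess genes, $D$ the number of disjoint genes, $\bar{W}$ the mean weight difference of matching genes, and $N$ the size of the larger genome. A sketch, with genomes encoded as dicts from innovation number to weight and illustrative coefficient values:

```python
def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
    """delta = c1*E/N + c2*D/N + c3*Wbar (coefficients are illustrative)."""
    innovs1, innovs2 = set(g1), set(g2)
    matching = innovs1 & innovs2
    cutoff = min(max(innovs1), max(innovs2))  # genes beyond this are "excess"
    non_matching = innovs1 ^ innovs2
    excess = sum(1 for i in non_matching if i > cutoff)
    disjoint = len(non_matching) - excess
    w_bar = sum(abs(g1[i] - g2[i]) for i in matching) / len(matching)
    n = max(len(g1), len(g2))
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar

g1 = {0: 0.5, 1: -0.3, 4: 0.9}
g2 = {0: 0.1, 2: 0.8, 3: -0.2}
print(round(compatibility(g1, g2), 3))  # 1.493 for these toy genomes
```

Individuals whose distance to a species representative is below a threshold join that species; the threshold and coefficients are tuned per problem.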
Complexity Analysis
| Operation | Time | Notes |
|---|---|---|
| ES weight update (per generation) | $O(P \cdot C_f + P \cdot d)$ | $P$ perturbations, $d$ parameters, $C_f$ = eval cost |
| NEAT fitness evaluation | $O(P \cdot C_f)$ | Parallelizable across $P$ workers |
| NEAT crossover (per pair) | $O(G_{\max})$ | $G_{\max}$ = max genome length (connections) |
| Speciation (per generation) | $O(P \cdot S)$ | $S$ = number of species, compare each individual to species representatives |
| Network forward pass | $O(E)$ | $E$ = number of enabled connections in the genome |
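The $O(E)$ forward pass deserves a note: an evolved genome has no layer structure, so nodes must be evaluated in topological order. A sketch under an assumed edge-list encoding (tuples of source id, destination id, weight; the activation choice is illustrative):

```python
from math import tanh

def forward(inputs, connections, node_order, input_ids, output_ids):
    incoming = {}
    for src, dst, w in connections:        # bucket incoming edges once: O(E)
        incoming.setdefault(dst, []).append((src, w))
    values = dict(zip(input_ids, inputs))  # input nodes take the raw inputs
    for node in node_order:                # topological order, inputs first
        if node in values:
            continue
        total = sum(values[src] * w for src, w in incoming.get(node, []))
        values[node] = tanh(total)         # assumed activation function
    return [values[o] for o in output_ids]

# Minimal network: inputs 0,1 -> hidden 3 -> output 2, plus a direct link 0 -> 2.
conns = [(0, 3, 0.5), (1, 3, -0.5), (3, 2, 1.0), (0, 2, 0.25)]
out = forward([1.0, 1.0], conns, node_order=[0, 1, 3, 2],
              input_ids=[0, 1], output_ids=[2])
print(round(out[0], 3))  # tanh(0.25) ~ 0.245: hidden path cancels, direct link remains
```

Each enabled connection is visited exactly once, matching the table's $O(E)$ bound.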
The key advantage of ES over backpropagation is parallelism. With one worker per perturbation, wall-clock time per generation drops from $O(P \cdot C_f + P \cdot d)$ to roughly $O(C_f + P \cdot d)$: the $P$ fitness evaluations run concurrently, and only the weight update remains sequential.
OpenAI showed in 2017 that ES with 1,440 CPUs could match the performance of A3C (a gradient-based RL algorithm) on Atari games, with linear speedup in the number of workers.
Implementation
ALGORITHM EvolutionStrategy(network, popSize, sigma, alpha, maxGen, evalFn)
INPUT: network with parameter vector w of dimension d,
popSize: integer, sigma: noise scale, alpha: learning rate,
maxGen: integer, evalFn: fitness function
OUTPUT: optimized parameter vector w
BEGIN
w <- InitializeRandomWeights(d)
FOR gen FROM 1 TO maxGen DO
epsilons <- ARRAY of popSize vectors, each sampled from N(0, I) of dimension d
fitnesses <- ARRAY of popSize floats
FOR i FROM 0 TO popSize - 1 DO // parallelizable
wPerturbed <- w + sigma * epsilons[i]
fitnesses[i] <- evalFn(wPerturbed)
END FOR
// Rank-normalize fitnesses for stability
ranks <- RankNormalize(fitnesses)
// Weighted update
gradient <- (0, 0, ..., 0) of dimension d
FOR i FROM 0 TO popSize - 1 DO
gradient <- gradient + ranks[i] * epsilons[i]
END FOR
gradient <- gradient / (popSize * sigma)
w <- w + alpha * gradient
END FOR
RETURN w
END
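The ES pseudocode above translates almost line for line into NumPy. The toy fitness function, hyperparameters, and rank-normalization scheme below are illustrative choices, not the only correct ones:

```python
import numpy as np

def rank_normalize(fitnesses):
    """Map raw fitnesses to centered ranks in [-0.5, 0.5] for update stability."""
    ranks = np.empty(len(fitnesses))
    ranks[np.argsort(fitnesses)] = np.arange(len(fitnesses))
    return ranks / (len(fitnesses) - 1) - 0.5

def evolution_strategy(eval_fn, d, pop_size=50, sigma=0.1, alpha=0.05,
                       max_gen=300, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)                    # InitializeRandomWeights
    for _ in range(max_gen):
        eps = rng.standard_normal((pop_size, d))  # one perturbation per individual
        fitnesses = np.array([eval_fn(w + sigma * e) for e in eps])
        ranks = rank_normalize(fitnesses)
        grad = (ranks[:, None] * eps).sum(axis=0) / (pop_size * sigma)
        w = w + alpha * grad                      # ascend the estimated gradient
    return w

# Toy task: climb a fitness peak at (3, -2) using only fitness values.
target = np.array([3.0, -2.0])
w = evolution_strategy(lambda v: -np.sum((v - target) ** 2), d=2)
print(np.round(w, 1))  # lands near [ 3. -2.]
```

Rank normalization makes the update invariant to the scale of the fitness signal, which is why it appears in the pseudocode: a single outlier fitness cannot dominate the gradient estimate.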
ALGORITHM NEAT_Evolve(popSize, maxGen, evalFn)
INPUT: popSize: integer, maxGen: integer, evalFn: fitness function,
numInputs, numOutputs: integers (network interface size)
OUTPUT: best genome found
BEGIN
innovationCounter <- 0
population <- InitMinimalGenomes(popSize, numInputs, numOutputs, innovationCounter)
species <- AssignSpecies(population)
FOR gen FROM 1 TO maxGen DO
// Evaluate
FOR EACH genome IN population DO
genome.fitness <- evalFn(BuildNetwork(genome))
END FOR
IF gen = 1 OR MAX(g.fitness for g in population) > bestGenome.fitness THEN
bestGenome <- genome with highest fitness in population
END IF
// Compute adjusted fitness per species and the population total
FOR EACH sp IN species DO
sp.adjustedFitness <- SUM(g.fitness for g in sp) / LENGTH(sp)
END FOR
totalAdjustedFitness <- SUM(sp.adjustedFitness for sp in species)
// Reproduce within species
nextPop <- EMPTY LIST
FOR EACH sp IN species DO
numOffspring <- ROUND(sp.adjustedFitness / totalAdjustedFitness * popSize)
SORT sp BY fitness DESCENDING
APPEND COPY of sp[0] TO nextPop // elitism per species
FOR j FROM 1 TO numOffspring - 1 DO
parent1 <- TournamentSelect(sp)
parent2 <- TournamentSelect(sp)
child <- CrossoverByInnovation(parent1, parent2)
child <- MutateWeights(child, weightMutRate)
child <- MaybeAddConnection(child, addConnRate, innovationCounter)
child <- MaybeAddNode(child, addNodeRate, innovationCounter)
APPEND child TO nextPop
END FOR
END FOR
population <- nextPop
species <- AssignSpecies(population)
END FOR
RETURN bestGenome
END
Real-World Applications
- Game playing agents: NEAT evolved controllers for pole balancing, car racing (TORCS), and real-time strategy games where reward signals are sparse and non-differentiable
- Robotics: Evolved neural controllers for legged locomotion, where the morphology and brain co-evolve to produce novel gaits
- Reinforcement learning at scale: OpenAI's ES approach trained Atari and MuJoCo agents competitively with gradient-based RL, using only fitness signals and massive parallelism
- Neural architecture search (NAS): Evolutionary methods discover novel network architectures (AmoebaNet) that rival hand-designed and RL-discovered architectures on ImageNet
- Content generation: Picbreeder and similar systems use interactive neuroevolution to evolve images, music, and 3D shapes guided by human aesthetic preferences
- Coevolutionary arms races: Predator-prey simulations where both populations evolve simultaneously, producing increasingly sophisticated strategies
Key Takeaways
- Neuroevolution optimizes neural networks through evolution rather than gradients, enabling optimization of non-differentiable objectives and network architectures
- NEAT solves topology evolution through innovation numbers (enabling crossover), speciation (protecting novelty), and complexification (starting minimal)
- Evolution strategies estimate gradients via finite differences: $\nabla_w F \approx \frac{1}{P\sigma} \sum F(\mathbf{w} + \sigma\boldsymbol{\epsilon}_i) \cdot \boldsymbol{\epsilon}_i$
- ES scales linearly with workers, making it competitive with gradient-based RL when enough CPUs are available
- Neuroevolution shines in sparse-reward environments, architecture search, and co-evolutionary settings where backpropagation struggles