
Neuroevolution: When Evolution Builds the Brain

Evolving neural network weights and topologies with NEAT, and why evolutionary strategies sometimes beat gradient descent.

2025-09-19
Tags: Artificial Life · neuroevolution · neat · evolutionary-strategies

Terminology

| Term | Definition |
| --- | --- |
| Neuroevolution | The use of evolutionary algorithms to optimize neural network weights, architectures, or both |
| NEAT | NeuroEvolution of Augmenting Topologies: an algorithm that evolves both the weights and the structure of neural networks, starting from minimal networks |
| Topology | The graph structure of a neural network: which nodes exist and how they are connected |
| Innovation Number | A global counter in NEAT that assigns a unique ID to each new structural mutation, enabling meaningful crossover between networks of different topologies |
| Speciation | Grouping individuals into species based on structural similarity, protecting novel topologies from being outcompeted before they can be optimized |
| Evolution Strategy (ES) | A family of evolutionary algorithms that use Gaussian perturbations of parameters and selection to optimize, without crossover |
| Coevolution | Simultaneous evolution of multiple populations that interact, such as predators and prey or competing game agents |
| Fitness Landscape | A mapping from genotype space to fitness values, visualized as a surface where peaks represent high-fitness solutions |
| Complexification | The NEAT principle of starting with minimal networks and gradually adding complexity through structural mutations |

What & Why

Standard neural network training uses backpropagation: compute gradients of a loss function and nudge weights downhill. This works brilliantly when you have a differentiable loss, labeled data, and a fixed architecture. But what if you do not have any of those?

Neuroevolution sidesteps all three requirements. Instead of computing gradients, it treats the neural network's parameters (and optionally its structure) as a genome and evolves them using selection and mutation. Fitness is measured by how well the network performs at a task, not by a differentiable loss.

This approach has several advantages:

  • It works with non-differentiable reward signals (game scores, physical simulations, survival time).
  • It can evolve the network architecture alongside the weights, discovering novel topologies.
  • It is embarrassingly parallel: each individual can be evaluated independently on a separate CPU.
  • It is less prone to the local optima that trap gradient descent in deceptive landscapes, since it maintains a population of solutions rather than a single point.

NEAT (NeuroEvolution of Augmenting Topologies), introduced by Kenneth Stanley in 2002, is the most influential neuroevolution algorithm. It solved three key problems: how to cross over networks with different structures, how to protect structural innovations from premature elimination, and how to minimize the search space by starting simple.

How It Works

Weight Evolution (Simple Case)

The simplest form of neuroevolution fixes the network architecture and evolves only the weight vector $\mathbf{w} \in \mathbb{R}^d$. An evolution strategy approach:

  1. Start with a parameter vector $\mathbf{w}$.
  2. Generate $P$ perturbations: $\mathbf{w}_i = \mathbf{w} + \sigma \cdot \boldsymbol{\epsilon}_i$ where $\boldsymbol{\epsilon}_i \sim \mathcal{N}(0, I)$.
  3. Evaluate fitness $F(\mathbf{w}_i)$ for each perturbation.
  4. Update: $\mathbf{w} \leftarrow \mathbf{w} + \alpha \cdot \frac{1}{P\sigma} \sum_{i=1}^{P} F(\mathbf{w}_i) \cdot \boldsymbol{\epsilon}_i$.

This is essentially a finite-difference gradient estimate, but it scales to millions of parameters when parallelized across hundreds of CPUs.
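The four steps above fit in a few lines of NumPy. This is a toy sketch, not a production ES implementation: the quadratic fitness function, hyperparameters, and rank-normalization scheme are all illustrative choices.

```python
import numpy as np

def evolution_strategy(eval_fn, dim, pop_size=50, sigma=0.1,
                       alpha=0.05, generations=200, seed=0):
    """Basic ES loop: perturb, evaluate, rank-weight, update."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim) * 0.1               # step 1: initial parameters
    for _ in range(generations):
        eps = rng.standard_normal((pop_size, dim))   # step 2: Gaussian perturbations
        fitness = np.array([eval_fn(w + sigma * e) for e in eps])  # step 3
        # Rank-normalize fitnesses to [-0.5, 0.5] so the update scale
        # does not depend on the raw fitness magnitude
        ranks = fitness.argsort().argsort() / (pop_size - 1) - 0.5
        w = w + alpha / (pop_size * sigma) * (ranks @ eps)         # step 4
    return w

# Toy fitness: maximize -(w - 3)^2, so the optimum is w = 3 in every dimension
w_opt = evolution_strategy(lambda w: -np.sum((w - 3.0) ** 2), dim=5)
```

Note that the fitness function is only ever called, never differentiated: swapping in a game score or a simulation rollout requires no changes to the update rule.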

NEAT: Evolving Topology

NEAT starts every network as a minimal graph (inputs directly connected to outputs, no hidden nodes) and grows complexity through two structural mutations:

Add Connection: Pick two unconnected nodes, add an edge with a random weight.

Add Node: Pick an existing connection, split it by inserting a new hidden node. The old connection is disabled, and two new connections are created (one into the new node, one out).

Figure: before the mutation, inputs In1 and In2 connect directly to Out; after an "add node" mutation, a new hidden node H1 (shown in yellow) is inserted on one of those connections.
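The "add node" mutation can be sketched on a minimal connection-gene representation. The `ConnGene` fields and helper signature here are illustrative, not NEAT's canonical data structures, but the weight convention is the one from the NEAT paper: the incoming connection gets weight 1.0 and the outgoing one inherits the old weight, so the mutation initially preserves the network's behavior.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ConnGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int

def add_node_mutation(genome, next_node_id, innovation_counter):
    """Split a random enabled connection with a new hidden node."""
    old = random.choice([g for g in genome if g.enabled])
    # Disable the old connection rather than deleting it, so it can
    # still align during crossover (and may be re-enabled later)
    genome = [replace(g, enabled=False) if g is old else g for g in genome]
    # Into the new node: weight 1.0 passes the old signal through unchanged
    genome.append(ConnGene(old.in_node, next_node_id, 1.0, True, innovation_counter))
    # Out of the new node: inherits the old connection's weight
    genome.append(ConnGene(next_node_id, old.out_node, old.weight, True,
                           innovation_counter + 1))
    return genome, next_node_id + 1, innovation_counter + 2
```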

Innovation Numbers and Crossover

Each new structural mutation gets a globally unique innovation number. When crossing over two networks, NEAT aligns genes by innovation number (like aligning DNA sequences). Matching genes are inherited randomly from either parent. Disjoint and excess genes come from the fitter parent. This solves the "competing conventions" problem that plagued earlier topology-evolving approaches.
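Gene alignment can be sketched as follows. `Gene` is a stripped-down stand-in for a full NEAT connection gene (real genes also carry an enabled flag), and for brevity disjoint and excess genes are handled identically, as the description above allows.

```python
import random
from collections import namedtuple

# Stripped-down connection gene, keyed for crossover by its innovation number
Gene = namedtuple("Gene", ["innovation", "in_node", "out_node", "weight"])

def crossover_by_innovation(fitter, weaker):
    """Produce a child genome; 'fitter' is the higher-fitness parent."""
    weaker_by_inn = {g.innovation: g for g in weaker}
    child = []
    for gene in fitter:
        match = weaker_by_inn.get(gene.innovation)
        if match is not None:
            # Matching gene: inherit randomly from either parent
            child.append(random.choice([gene, match]))
        else:
            # Disjoint or excess gene: comes from the fitter parent only
            child.append(gene)
    return child
```

Because alignment is by innovation number rather than by position, two parents that discovered the same connection through the same historical mutation exchange it cleanly even if the rest of their topologies differ.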

Speciation

NEAT groups individuals into species based on a compatibility distance that measures structural and weight differences. Fitness sharing within species prevents a single dominant topology from taking over the population, giving novel structures time to optimize their weights before competing globally.
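The compatibility distance in the NEAT paper is $\delta = c_1 E / N + c_2 D / N + c_3 \bar{W}$, where $E$ and $D$ count excess and disjoint genes, $\bar{W}$ is the mean weight difference of matching genes, and $N$ normalizes by genome size. A sketch, representing each genome as a dict from innovation number to connection weight (a simplification of the full gene list):

```python
def compatibility(genome1, genome2, c1=1.0, c2=1.0, c3=0.4):
    """NEAT-style compatibility distance between two genomes."""
    inn1, inn2 = set(genome1), set(genome2)
    matching = inn1 & inn2
    unmatched = inn1 ^ inn2
    # Excess genes lie beyond the other genome's highest innovation number;
    # the rest of the unmatched genes are disjoint
    cutoff = min(max(inn1), max(inn2))
    excess = sum(1 for i in unmatched if i > cutoff)
    disjoint = len(unmatched) - excess
    avg_weight_diff = (sum(abs(genome1[i] - genome2[i]) for i in matching)
                       / len(matching)) if matching else 0.0
    # The original paper sets n = 1 for small genomes; omitted here
    n = max(len(genome1), len(genome2))
    return c1 * excess / n + c2 * disjoint / n + c3 * avg_weight_diff

# Species assignment: an individual joins the first species whose
# representative is within a threshold delta_t, else it founds a new species
```

Individuals within the threshold share fitness, so a species' reproductive share depends on its average rather than its best member, which is what protects fledgling topologies.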

Complexity Analysis

| Operation | Time | Notes |
| --- | --- | --- |
| ES weight update (per generation) | $O(P \cdot C_f + P \cdot d)$ | $P$ perturbations, $d$ parameters, $C_f$ = eval cost |
| NEAT fitness evaluation | $O(P \cdot C_f)$ | Parallelizable across $P$ workers |
| NEAT crossover (per pair) | $O(G_{\max})$ | $G_{\max}$ = max genome length (connections) |
| Speciation (per generation) | $O(P \cdot S)$ | $S$ = number of species; compare each individual to species representatives |
| Network forward pass | $O(E)$ | $E$ = number of enabled connections in the genome |

The key advantage of ES over backpropagation is parallelism. With $P$ workers, wall-clock time per generation is:

$T_{\text{wall}} = \frac{P \cdot C_f}{P_{\text{workers}}} + O(d)$

OpenAI showed in 2017 that ES with 1,440 CPUs could match the performance of A3C (a gradient-based RL algorithm) on Atari games, with linear speedup in the number of workers.

Implementation

ALGORITHM EvolutionStrategy(network, popSize, sigma, alpha, maxGen, evalFn)
INPUT: network with parameter vector w of dimension d,
       popSize: integer, sigma: noise scale, alpha: learning rate,
       maxGen: integer, evalFn: fitness function
OUTPUT: optimized parameter vector w

BEGIN
  w <- InitializeRandomWeights(d)

  FOR gen FROM 1 TO maxGen DO
    epsilons <- ARRAY of popSize vectors, each sampled from N(0, I) of dimension d
    fitnesses <- ARRAY of popSize floats

    FOR i FROM 0 TO popSize - 1 DO  // parallelizable
      wPerturbed <- w + sigma * epsilons[i]
      fitnesses[i] <- evalFn(wPerturbed)
    END FOR

    // Rank-normalize fitnesses for stability
    ranks <- RankNormalize(fitnesses)

    // Weighted update
    gradient <- (0, 0, ..., 0) of dimension d
    FOR i FROM 0 TO popSize - 1 DO
      gradient <- gradient + ranks[i] * epsilons[i]
    END FOR
    gradient <- gradient / (popSize * sigma)

    w <- w + alpha * gradient
  END FOR

  RETURN w
END

ALGORITHM NEAT_Evolve(popSize, maxGen, evalFn)
INPUT: popSize: integer, maxGen: integer, evalFn: fitness function
OUTPUT: best genome found

BEGIN
  innovationCounter <- 0
  population <- InitMinimalGenomes(popSize, numInputs, numOutputs, innovationCounter)
  species <- AssignSpecies(population)

  FOR gen FROM 1 TO maxGen DO
    // Evaluate
    FOR EACH genome IN population DO
      genome.fitness <- evalFn(BuildNetwork(genome))
    END FOR

    IF bestGenome IS UNDEFINED OR MaxFitness(population) > bestGenome.fitness THEN
      bestGenome <- genome with highest fitness in population
    END IF

    // Reproduce within species
    nextPop <- EMPTY LIST
    totalAdjustedFitness <- 0
    FOR EACH sp IN species DO
      sp.adjustedFitness <- SUM(g.fitness for g in sp) / LENGTH(sp)
      totalAdjustedFitness <- totalAdjustedFitness + sp.adjustedFitness
    END FOR

    FOR EACH sp IN species DO
      numOffspring <- ROUND(sp.adjustedFitness / totalAdjustedFitness * popSize)

      SORT sp BY fitness DESCENDING
      APPEND COPY of sp[0] TO nextPop  // elitism per species

      FOR j FROM 1 TO numOffspring - 1 DO
        parent1 <- TournamentSelect(sp)
        parent2 <- TournamentSelect(sp)
        child <- CrossoverByInnovation(parent1, parent2)
        child <- MutateWeights(child, weightMutRate)
        child <- MaybeAddConnection(child, addConnRate, innovationCounter)
        child <- MaybeAddNode(child, addNodeRate, innovationCounter)
        APPEND child TO nextPop
      END FOR
    END FOR

    population <- nextPop
    species <- AssignSpecies(population)
  END FOR

  RETURN bestGenome
END

Real-World Applications

  • Game playing agents: NEAT evolved controllers for pole balancing, car racing (TORCS), and real-time strategy games where reward signals are sparse and non-differentiable
  • Robotics: Evolved neural controllers for legged locomotion, where the morphology and brain co-evolve to produce novel gaits
  • Reinforcement learning at scale: OpenAI's ES approach trained Atari and MuJoCo agents competitively with gradient-based RL, using only fitness signals and massive parallelism
  • Neural architecture search (NAS): Evolutionary methods discover novel network architectures (AmoebaNet) that rival hand-designed and RL-discovered architectures on ImageNet
  • Content generation: Picbreeder and similar systems use interactive neuroevolution to evolve images, music, and 3D shapes guided by human aesthetic preferences
  • Coevolutionary arms races: Predator-prey simulations where both populations evolve simultaneously, producing increasingly sophisticated strategies

Key Takeaways

  • Neuroevolution optimizes neural networks through evolution rather than gradients, enabling optimization of non-differentiable objectives and network architectures
  • NEAT solves topology evolution through innovation numbers (enabling crossover), speciation (protecting novelty), and complexification (starting minimal)
  • Evolution strategies estimate gradients via finite differences: $\nabla_w F \approx \frac{1}{P\sigma} \sum F(\mathbf{w} + \sigma\boldsymbol{\epsilon}_i) \cdot \boldsymbol{\epsilon}_i$
  • ES scales linearly with workers, making it competitive with gradient-based RL when enough CPUs are available
  • Neuroevolution shines in sparse-reward environments, architecture search, and co-evolutionary settings where backpropagation struggles