Genetic Algorithms

A Genetic Algorithm (GA) is a population-based stochastic search method that imitates natural selection: maintain a population of candidate solutions (“chromosomes”), score each by a fitness function, then evolve the next generation through three operators — selection, crossover, and mutation. Over generations the average fitness rises and (with luck) the best individual approaches the optimum.

GAs were developed in the 1970s (Holland, 1975) and dominated optimization research for decades. Today they’re niche — but they survive in problems where gradients aren’t available and where creative recombination matters (see Where GAs are still used today).

When to reach for a GA

A GA is worth considering when:

  • The search space is huge and discrete (combinatorial)
  • The objective is non-differentiable or expensive to evaluate
  • You can score a candidate but can’t compute the gradient
  • You don’t need the optimum — a good solution is enough
  • You can encode candidate solutions as fixed-length strings (chromosomes)

If you have gradients → use gradient descent. If you have a probabilistic model → Bayesian Optimization. If linear/integer → Gurobi/CPLEX. A GA is the “no math required” default for ugly discrete problems.

Worked example: the knapsack problem

You’re packing for a mountain trip. Your backpack holds 40 L. You have 8 items, each with a volume and a value (how useful it would be on the trip):

ItemVolume (L)Value
Tent1810
Sleeping bag129
Stove46
Food (3 days)88
Water filter25
Camera34
Book12
First-aid kit27

Goal: maximize total value while keeping total volume ≤ 40 L.

Encoding as a chromosome

Represent each candidate packing list as an 8-bit binary string, one bit per item:

  • 1 = take this item
  • 0 = leave it behind

[1, 1, 1, 1, 1, 0, 0, 1] → tent + sleeping bag + stove + food + water filter + first-aid kit. Volume = 18+12+4+8+2+2 = 46 L → invalid (over 40 L).

[1, 1, 0, 1, 1, 0, 0, 1] → 18+12+8+2+2 = 42 L → still over.

[1, 1, 0, 1, 0, 0, 0, 1] → 18+12+8+2 = 40 L ✓, value = 10+9+8+7 = 34.

The full search space has 2⁸ = 256 candidates — small enough to brute-force, but the GA scales to 2⁵⁰ items where brute force fails.

Encoding the fitness function

Now any standard GA loop works on it. This is what makes GAs flexible: swap the fitness function, keep the algorithm.

The algorithm in 5 steps

1. INITIALIZE — generate a random population of N chromosomes
2. EVALUATE   — compute fitness of every chromosome
3. SELECT     — pick parents proportional to fitness
4. RECOMBINE  — apply crossover + mutation to produce children
5. REPLACE    — children become the new population
   → repeat 2–5 until termination (time / generations / plateau)

That’s it. The genius is that no part requires gradients or domain knowledge — only a way to score candidates.

The three operators

Selection — who gets to reproduce

  • Fitness-proportional (roulette wheel) — probability ∝ fitness. Simple but breaks when fitness values are very close (no selection pressure) or when one “freak” dominates.
  • Tournament selection — pick k chromosomes at random, keep the best. More robust to fitness scaling. Most common in practice.
  • Rank selection — assign probabilities by rank (not raw fitness). Avoids domination by outliers.
  • Elitism — unconditionally keep the top e chromosomes. Guarantees best fitness is monotone non-decreasing.

Crossover — combining good chunks

Takes two parents → produces two children. Variants:

  • Single-point — pick a random index, swap suffixes between parents.
  • Multi-point — multiple swap points; more mixing.
  • Uniform — for each gene independently, copy from either parent (50/50). Maximum mixing.

Crossover does the heavy lifting — it recombines good chunks from different solutions into novel combinations. But it can only mix genes that already exist in the parent pool.

Mutation — introducing genuinely new genes

For bit strings: flip each bit with probability μ.

  • Too low → population gets stuck (no new genetic material)
  • Too high → algorithm degenerates into random search
  • Typical: μ ≈ 1 / chromosome_length (i.e. expect ~1 mutation per chromosome)

Mutation is the only mechanism that can introduce a gene value not currently present anywhere in the population. Without it, evolution halts at whatever gene combinations existed at initialization (see also Q108 in Questions for Methods of AI).

Demo: 6 GA variants side-by-side on the OneMax problem

Instead of one run we sweep 6 parameter configurations and plot them on a 2×3 grid + a comparison overlay. Each curve is averaged across 5 random seeds (shaded band = ±1 std) so you see the characteristic behavior of each setting, not a single anecdotal run.

Problem: OneMax with chrom_len = 20 (count the 1s in a 20-bit string; max fitness = 20). Harder than the chrom_len=8 toy — differences between configurations become visible.

What each variant teaches

#VariantCharacteristic behaviorLesson
Baseline — pop=30, elitism=2, μ=0.05Best reaches max quickly and stays; mean plateaus 1–2 below maxHealthy GA: selection lifts mean, elitism locks the best, mutation prevents stagnation. The 1–2 gap between best and mean is the mutation–selection equilibrium.
No elitismRed curve wobbles — best can drop between generationsElitism is the only thing that guarantees monotone improvement. Without it, a great chromosome can be destroyed by one bad crossover or mutation.
μ = 0.00 (no mutation)Best plateaus wherever the initial population happened to sit; never reaches maxMutation is the only operator that can introduce a gene value not present in the initial population. Without it, evolution = pure recombination of the founder gene pool.
μ = 0.30 (too high)Best occasionally reaches max but unstable; mean stuck around ~10–12 (random-bitstring expectation)Too much mutation overwrites everything selection just built. The GA degenerates into random search.
Tiny pop = 5Curves very noisy across seeds (wide std band); often fails to convergeSmall populations have high stochasticity — selection signal drowns in noise. Roulette wheel can repeatedly pick poor chromosomes by luck.
Large pop = 200Smoothest curves; mean climbs highest; converges most reliably to maxBig populations = stable statistics + more diversity = nearly perfect convergence. Cost: 200/5 = 40× more fitness evaluations per generation.

How to read the overlay (bottom panel)

  • Variants ①, ⑥ climb fastest and stay at max — properly balanced GAs do their job.
  • Variant ⑤ (tiny pop) has a wide std band → some seeds get lucky, others don’t.
  • Variant ③ (no mutation) plateaus visibly below the max, no matter how long you run it.
  • Variant ④ (μ=0.30) has the most chaotic trajectory — too much randomness is fundamentally different from too little.

The take-away in one sentence

A good GA balances three forces: selection pressure (push toward fit), mutation (inject novelty), diversity preservation (avoid collapse). Variants ②–⑤ each break one of these and show what happens.

Why a small population fails (the classic Max-result trap)

Run with pop_size=6, generations=10, elitism=0 (the parameters of an earlier toy version) and you typically see:

gen 0:  [3, 3, 4, 4, 5, 7]   mean 4.33, max 7
gen 10: [3, 5, 6, 6, 7, 7]   mean 5.67, max 7  ← max did NOT improve!

Three failure modes visible at once:

  1. No elitism → the best chromosome (fitness 7) can be destroyed by a single bad crossover. Best fitness flat-lines or even drops.
  2. Tiny population (6) → fitness-proportional selection is highly stochastic. A garbage chromosome (fitness 3) can survive 10 generations purely by luck.
  3. Too few mutation events → with pop=6, gens=10, rate=5%, you expect only ~5 mutation flips per generation across the whole population. Half are 1→0 (counterproductive). Not enough material to introduce missing 1s.

Fix recipe (in order of impact):

  • Add elitism → guarantee best fitness is monotone
  • Increase population to ~20–50 → less stochastic selection, more diversity
  • Increase generations to ~50+ → enough time to converge
  • Tune mutation rate — too low = stuck; too high = random search

Key parameters and their trade-offs

ParameterToo lowToo highTypical
Population sizeHigh stochasticity; diversity collapses fastSlow per generation; many wasted evaluations20–200
GenerationsDoesn’t convergeWasted compute after plateau50–500
Mutation ratePopulation freezes (no new alleles)Becomes random search≈ 1 / chromosome_length
Elitism countBest can be lostPremature convergence1–5 % of population
Selection pressure (tournament k)Slow convergenceDiversity collapsesk = 2–5

Where GAs are still used today

GAs from the 1970s (Holland) are alive where mutation + crossover deliver something gradient-based methods cannot.

  • NASA ST5 antenna (2006)the iconic example. Lohn et al. evolved a tiny S-band antenna for the ST5 satellite using a GA. The result was a bizarrely bent wire shape no human would design — and it outperformed conventional designs.
  • Neural Architecture Search (NAS) — Google’s AmoebaNet (Real et al., 2018) evolved CNN architectures that beat human-designed ones on ImageNet. An entire research direction in AutoML.
  • Game AI / NEATNeuroEvolution of Augmenting Topologies evolves neural networks for game agents. Famous via “MarI/O” (Super Mario), used in research and indie games.
  • Drug discovery — molecular structure generation uses GAs to evolve candidate compounds with desired pharmacological properties.
  • Industrial scheduling — job-shop scheduling in semiconductor fabs and other manufacturing lines uses GA hybrids.
  • Wind-farm layout — turbine placement to maximize energy while minimizing wake interference is a GA classic.
  • Quant trading — hedge funds use GAs to tune parameter sets across thousands of trading strategies.

Why GAs survive: when the search space is huge and discrete and no useful gradient exists — crossover provides a combination operator that can recombine partial solutions. Especially useful when “creativity” matters (antenna design, neural architectures, molecular structures).

Where GAs were replaced — and by what

DomainWas GA, now …Why
Neural network training (Edelman 1980s, NEAT until ~2010)Backpropagation + SGD / AdamOnce the function is differentiable, gradients are exponentially more informative than blind mutation. GA-as-NN-trainer is now nostalgia.
Hyperparameter tuningBayesian Optimization → Optuna / Ray TuneBayesOpt is more sample-efficient; GAs often need 100× more evaluations
Reinforcement learning policy searchPPO, SAC, A3C (Policy Gradients)Policy-gradient RL uses reward signals directly, while GAs see only the final fitness
Neural Architecture SearchDARTS (Differentiable Architecture Search, Liu 2019)DARTS makes architecture search differentiable → 1000× faster than evolutionary NAS
Drug discovery (molecular generation)Diffusion models & SMILES TransformersGenerative models learn the distribution of “good” molecules from data instead of mutating blindly
Chess / Go strategyAlphaZero (MCTS + DNN)Self-play with a neural policy function beats evolved strategies by light-years

Where GAs still stand alone: when (a) the function is not differentiable, (b) domain knowledge is hard to encode in a generative model, and (c) the problem is genuinely discrete-combinatorial (e.g. antenna geometry, layout problems). The NASA ST5 antenna is exactly such a case — no differentiable formulation possible.

See also

Sibling algorithms (also Local Search, MoAI VL02)

Course material

Related concepts

Tags: methods-of-ai science genetic-algorithms
Created: 28-11-24 12:06 · Restructured: 18-05-26