A Genetic Algorithm (GA) is a population-based stochastic search method that imitates natural selection: maintain a population of candidate solutions (“chromosomes”), score each by a fitness function, then evolve the next generation through three operators — selection, crossover, and mutation. Over generations the average fitness rises and (with luck) the best individual approaches the optimum.
GAs were developed in the 1970s (Holland, 1975) and dominated optimization research for decades. Today they’re niche — but they survive in problems where gradients aren’t available and where creative recombination matters (see Where GAs are still used today).
When to reach for a GA
A GA is worth considering when:
The search space is huge and discrete (combinatorial)
The objective is non-differentiable or expensive to evaluate
You can score a candidate but can’t compute the gradient
You don’t need the optimum — a good solution is enough
You can encode candidate solutions as fixed-length strings (chromosomes)
If you have gradients → use gradient descent. If you have a probabilistic model → Bayesian Optimization. If linear/integer → Gurobi/CPLEX. A GA is the “no math required” default for ugly discrete problems.
Worked example: the knapsack problem
You’re packing for a mountain trip. Your backpack holds 40 L. You have 8 items, each with a volume and a value (how useful it would be on the trip):
Item
Volume (L)
Value
Tent
18
10
Sleeping bag
12
9
Stove
4
6
Food (3 days)
8
8
Water filter
2
5
Camera
3
4
Book
1
2
First-aid kit
2
7
Goal: maximize total value while keeping total volume ≤ 40 L.
Encoding as a chromosome
Represent each candidate packing list as an 8-bit binary string, one bit per item:
[1, 1, 0, 1, 1, 0, 0, 1] → 18+12+8+2+2 = 42 L → still over.
[1, 1, 0, 1, 0, 0, 0, 1] → 18+12+8+2 = 40 L ✓, value = 10+9+8+7 = 34.
The full search space has 2⁸ = 256 candidates — small enough to brute-force, but the GA scales to 2⁵⁰ items where brute force fails.
Encoding the fitness function
🐍 Code anzeigen / ausblenden
volumes = [18, 12, 4, 8, 2, 3, 1, 2]values = [10, 9, 6, 8, 5, 4, 2, 7]def fitness(chromosome, capacity=40): vol = sum(g * v for g, v in zip(chromosome, volumes)) val = sum(g * v for g, v in zip(chromosome, values)) if vol > capacity: return 0 # penalize invalid: simplest valid solution return val
Now any standard GA loop works on it. This is what makes GAs flexible: swap the fitness function, keep the algorithm.
The algorithm in 5 steps
1. INITIALIZE — generate a random population of N chromosomes
2. EVALUATE — compute fitness of every chromosome
3. SELECT — pick parents proportional to fitness
4. RECOMBINE — apply crossover + mutation to produce children
5. REPLACE — children become the new population
→ repeat 2–5 until termination (time / generations / plateau)
That’s it. The genius is that no part requires gradients or domain knowledge — only a way to score candidates.
The three operators
Selection — who gets to reproduce
Fitness-proportional (roulette wheel) — probability ∝ fitness. Simple but breaks when fitness values are very close (no selection pressure) or when one “freak” dominates.
Tournament selection — pick k chromosomes at random, keep the best. More robust to fitness scaling. Most common in practice.
Rank selection — assign probabilities by rank (not raw fitness). Avoids domination by outliers.
Elitism — unconditionally keep the top e chromosomes. Guarantees best fitness is monotone non-decreasing.
Crossover — combining good chunks
Takes two parents → produces two children. Variants:
Single-point — pick a random index, swap suffixes between parents.
Multi-point — multiple swap points; more mixing.
Uniform — for each gene independently, copy from either parent (50/50). Maximum mixing.
Crossover does the heavy lifting — it recombines good chunks from different solutions into novel combinations. But it can only mix genes that already exist in the parent pool.
Mutation — introducing genuinely new genes
For bit strings: flip each bit with probability μ.
Too low → population gets stuck (no new genetic material)
Too high → algorithm degenerates into random search
Mutation is the only mechanism that can introduce a gene value not currently present anywhere in the population. Without it, evolution halts at whatever gene combinations existed at initialization (see also Q108 in Questions for Methods of AI).
Demo: 6 GA variants side-by-side on the OneMax problem
Instead of one run we sweep 6 parameter configurations and plot them on a 2×3 grid + a comparison overlay. Each curve is averaged across 5 random seeds (shaded band = ±1 std) so you see the characteristic behavior of each setting, not a single anecdotal run.
Problem: OneMax with chrom_len = 20 (count the 1s in a 20-bit string; max fitness = 20). Harder than the chrom_len=8 toy — differences between configurations become visible.
🐍 Code anzeigen / ausblenden
# Pyodide environment (Obsidian Execute Code plugin) — install matplotlib first.# If you run in normal Python, delete the next 2 lines.import micropipawait micropip.install("matplotlib")import randomimport numpy as npimport matplotlib.pyplot as pltimport matplotlib.gridspec as gridspec# ════════════════════════════════════════════════════════════════# 🎛️ Try changing these# ════════════════════════════════════════════════════════════════CHROM_LEN = 20 # harder than 8 → variant differences are visibleGENERATIONS = 80SEEDS = [0, 1, 2, 3, 4] # average over 5 runs per variant# ════════════════════════════════════════════════════════════════# --- GA operators ---def fitness(chromosome): return sum(chromosome) # OneMaxdef select(population, k=2): # +0.5 makes weights non-zero even when a chromosome has fitness 0 weights = [fitness(c) + 0.5 for c in population] return random.choices(population, weights=weights, k=k)def crossover(p1, p2): point = random.randint(1, len(p1) - 1) return p1[:point] + p2[point:], p2[:point] + p1[point:]def mutate(chromosome, rate): return [1 - g if random.random() < rate else g for g in chromosome]def run_ga(pop_size, mutation_rate, elitism, chrom_len=CHROM_LEN, generations=GENERATIONS, seed=0): random.seed(seed) pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)] history = [] for gen in range(generations + 1): fits = [fitness(c) for c in pop] history.append((gen, max(fits), sum(fits) / len(fits), min(fits))) ranked = sorted(pop, key=fitness, reverse=True) new_pop = ranked[:elitism] # elitism: keep top-N untouched while len(new_pop) < pop_size: p1, p2 = select(pop) c1, c2 = crossover(p1, p2) new_pop.extend([mutate(c1, mutation_rate), mutate(c2, mutation_rate)]) pop = new_pop[:pop_size] return history# --- 6 variants — each isolates one trade-off ---VARIANTS = [ ("① Baseline\npop=30, elitism=2, μ=0.05", dict(pop_size=30, mutation_rate=0.05, elitism=2)), ("② No elitism\npop=30, μ=0.05", dict(pop_size=30, mutation_rate=0.05, elitism=0)), ("③ μ=0.00 (no mutation)\npop=30, elitism=2", dict(pop_size=30, mutation_rate=0.00, elitism=2)), ("④ μ=0.30 (too high)\npop=30, elitism=2", dict(pop_size=30, mutation_rate=0.30, elitism=2)), ("⑤ Tiny pop=5\nelitism=1, μ=0.05", dict(pop_size=5, mutation_rate=0.05, elitism=1)), ("⑥ Large pop=200\nelitism=10, μ=0.05", dict(pop_size=200, mutation_rate=0.05, elitism=10)),]# Run all variants × all seedsall_results = {}for label, params in VARIANTS: runs = [] for seed in SEEDS: hist = run_ga(seed=seed, **params) g, b, m, w = zip(*hist) runs.append((np.array(g), np.array(b), np.array(m), np.array(w))) all_results[label] = runs# --- Plot: 2×3 grid of variants + wide comparison panel below ---fig = plt.figure(figsize=(16, 11))gs = gridspec.GridSpec(3, 3, height_ratios=[1, 1, 1.3], hspace=0.6, wspace=0.3)for idx, (label, _) in enumerate(VARIANTS): row, col = idx // 3, idx % 3 ax = fig.add_subplot(gs[row, col]) runs = all_results[label] gens = runs[0][0] bests = np.array([r[1] for r in runs]) means = np.array([r[2] for r in runs]) worsts = np.array([r[3] for r in runs]) # Shaded ±std bands ax.fill_between(gens, bests.mean(0) - bests.std(0), bests.mean(0) + bests.std(0), color='crimson', alpha=0.15) ax.fill_between(gens, means.mean(0) - means.std(0), means.mean(0) + means.std(0), color='steelblue', alpha=0.15) ax.fill_between(gens, worsts.mean(0) - worsts.std(0), worsts.mean(0) + worsts.std(0), color='gray', alpha=0.12) ax.plot(gens, bests.mean(0), color='crimson', lw=2.0, label='best') ax.plot(gens, means.mean(0), color='steelblue', lw=1.5, label='mean') ax.plot(gens, worsts.mean(0), color='gray', lw=1.0, ls='--', label='worst') ax.axhline(CHROM_LEN, color='green', ls=':', alpha=0.6) ax.set_title(label, fontsize=9.5) ax.set_xlabel('generation', fontsize=8.5) ax.set_ylabel('fitness', fontsize=8.5) ax.set_ylim(0, CHROM_LEN + 1) ax.grid(alpha=0.3) if idx == 0: ax.legend(loc='lower right', fontsize=7.5, framealpha=0.95)# Bottom: best-fitness overlay for all 6 variantsax_cmp = fig.add_subplot(gs[2, :])palette = plt.cm.tab10(np.linspace(0, 0.9, len(VARIANTS)))for (label, _), color in zip(VARIANTS, palette): runs = all_results[label] gens = runs[0][0] bests = np.array([r[1] for r in runs]) short = label.split('\n')[0] # ①, ②, … ax_cmp.plot(gens, bests.mean(0), color=color, lw=2.4, label=short) ax_cmp.fill_between(gens, bests.mean(0) - bests.std(0), bests.mean(0) + bests.std(0), color=color, alpha=0.12)ax_cmp.axhline(CHROM_LEN, color='green', ls=':', alpha=0.6, label=f'max = {CHROM_LEN}')ax_cmp.set_xlabel('generation')ax_cmp.set_ylabel('best-in-population fitness (mean ± std, 5 seeds)')ax_cmp.set_title('Comparison overlay — best-fitness trajectory across all 6 variants', fontsize=11.5)ax_cmp.set_ylim(0, CHROM_LEN + 1)ax_cmp.legend(loc='lower right', ncol=2, fontsize=9, framealpha=0.95)ax_cmp.grid(alpha=0.3)plt.suptitle(f'Genetic Algorithm — 6 parameter variants on OneMax (chrom_len={CHROM_LEN})', fontsize=13.5, y=0.995)plt.show()# --- Numeric summary ---print(f"\n📊 FINAL BEST FITNESS (averaged across {len(SEEDS)} seeds — max possible = {CHROM_LEN})\n")for label, _ in VARIANTS: runs = all_results[label] finals = np.array([r[1][-1] for r in runs]) print(f" {label.splitlines()[0]:32s}: {finals.mean():5.2f} ± {finals.std():.2f}")
What each variant teaches
#
Variant
Characteristic behavior
Lesson
①
Baseline — pop=30, elitism=2, μ=0.05
Best reaches max quickly and stays; mean plateaus 1–2 below max
Healthy GA: selection lifts mean, elitism locks the best, mutation prevents stagnation. The 1–2 gap between best and mean is the mutation–selection equilibrium.
②
No elitism
Red curve wobbles — best can drop between generations
Elitism is the only thing that guarantees monotone improvement. Without it, a great chromosome can be destroyed by one bad crossover or mutation.
③
μ = 0.00 (no mutation)
Best plateaus wherever the initial population happened to sit; never reaches max
Mutation is the only operator that can introduce a gene value not present in the initial population. Without it, evolution = pure recombination of the founder gene pool.
④
μ = 0.30 (too high)
Best occasionally reaches max but unstable; mean stuck around ~10–12 (random-bitstring expectation)
Too much mutation overwrites everything selection just built. The GA degenerates into random search.
⑤
Tiny pop = 5
Curves very noisy across seeds (wide std band); often fails to converge
Small populations have high stochasticity — selection signal drowns in noise. Roulette wheel can repeatedly pick poor chromosomes by luck.
⑥
Large pop = 200
Smoothest curves; mean climbs highest; converges most reliably to max
Big populations = stable statistics + more diversity = nearly perfect convergence. Cost: 200/5 = 40× more fitness evaluations per generation.
How to read the overlay (bottom panel)
Variants ①, ⑥ climb fastest and stay at max — properly balanced GAs do their job.
Variant ⑤ (tiny pop) has a wide std band → some seeds get lucky, others don’t.
Variant ③ (no mutation) plateaus visibly below the max, no matter how long you run it.
Variant ④ (μ=0.30) has the most chaotic trajectory — too much randomness is fundamentally different from too little.
The take-away in one sentence
A good GA balances three forces: selection pressure (push toward fit), mutation (inject novelty), diversity preservation (avoid collapse). Variants ②–⑤ each break one of these and show what happens.
Why a small population fails (the classic Max-result trap)
Run with pop_size=6, generations=10, elitism=0 (the parameters of an earlier toy version) and you typically see:
gen 0: [3, 3, 4, 4, 5, 7] mean 4.33, max 7
gen 10: [3, 5, 6, 6, 7, 7] mean 5.67, max 7 ← max did NOT improve!
Three failure modes visible at once:
No elitism → the best chromosome (fitness 7) can be destroyed by a single bad crossover. Best fitness flat-lines or even drops.
Tiny population (6) → fitness-proportional selection is highly stochastic. A garbage chromosome (fitness 3) can survive 10 generations purely by luck.
Too few mutation events → with pop=6, gens=10, rate=5%, you expect only ~5 mutation flips per generation across the whole population. Half are 1→0 (counterproductive). Not enough material to introduce missing 1s.
Fix recipe (in order of impact):
Add elitism → guarantee best fitness is monotone
Increase population to ~20–50 → less stochastic selection, more diversity
Increase generations to ~50+ → enough time to converge
Tune mutation rate — too low = stuck; too high = random search
Key parameters and their trade-offs
Parameter
Too low
Too high
Typical
Population size
High stochasticity; diversity collapses fast
Slow per generation; many wasted evaluations
20–200
Generations
Doesn’t converge
Wasted compute after plateau
50–500
Mutation rate
Population freezes (no new alleles)
Becomes random search
≈ 1 / chromosome_length
Elitism count
Best can be lost
Premature convergence
1–5 % of population
Selection pressure (tournament k)
Slow convergence
Diversity collapses
k = 2–5
Where GAs are still used today
GAs from the 1970s (Holland) are alive where mutation + crossover deliver something gradient-based methods cannot.
NASA ST5 antenna (2006) — the iconic example. Lohn et al. evolved a tiny S-band antenna for the ST5 satellite using a GA. The result was a bizarrely bent wire shape no human would design — and it outperformed conventional designs.
Neural Architecture Search (NAS) — Google’s AmoebaNet (Real et al., 2018) evolved CNN architectures that beat human-designed ones on ImageNet. An entire research direction in AutoML.
Game AI / NEAT — NeuroEvolution of Augmenting Topologies evolves neural networks for game agents. Famous via “MarI/O” (Super Mario), used in research and indie games.
Drug discovery — molecular structure generation uses GAs to evolve candidate compounds with desired pharmacological properties.
Industrial scheduling — job-shop scheduling in semiconductor fabs and other manufacturing lines uses GA hybrids.
Wind-farm layout — turbine placement to maximize energy while minimizing wake interference is a GA classic.
Quant trading — hedge funds use GAs to tune parameter sets across thousands of trading strategies.
Why GAs survive: when the search space is huge and discrete and no useful gradient exists — crossover provides a combination operator that can recombine partial solutions. Especially useful when “creativity” matters (antenna design, neural architectures, molecular structures).
Where GAs were replaced — and by what
Domain
Was GA, now …
Why
Neural network training (Edelman 1980s, NEAT until ~2010)
Backpropagation + SGD / Adam
Once the function is differentiable, gradients are exponentially more informative than blind mutation. GA-as-NN-trainer is now nostalgia.
Hyperparameter tuning
Bayesian Optimization → Optuna / Ray Tune
BayesOpt is more sample-efficient; GAs often need 100× more evaluations
Reinforcement learning policy search
PPO, SAC, A3C (Policy Gradients)
Policy-gradient RL uses reward signals directly, while GAs see only the final fitness
Neural Architecture Search
DARTS (Differentiable Architecture Search, Liu 2019)
DARTS makes architecture search differentiable → 1000× faster than evolutionary NAS
Drug discovery (molecular generation)
Diffusion models & SMILES Transformers
Generative models learn the distribution of “good” molecules from data instead of mutating blindly
Chess / Go strategy
AlphaZero (MCTS + DNN)
Self-play with a neural policy function beats evolved strategies by light-years
Where GAs still stand alone: when (a) the function is not differentiable, (b) domain knowledge is hard to encode in a generative model, and (c) the problem is genuinely discrete-combinatorial (e.g. antenna geometry, layout problems). The NASA ST5 antenna is exactly such a case — no differentiable formulation possible.