Local search keeps only the current state — no path history, no frontier. Goal: find a good state, not a route.
Works by moving to neighboring states according to an objective functionf(s) to maximize (minimize via −f(s)).
Useful when the solution is the state (e.g. 8-Queens, TSP, scheduling), not the path to it.
All variants trade exploration (escape local optima) against exploitation (climb toward improvement).
Plain Hill-Climbing only finds local optima; SA, beam search and GAs add randomness or population to escape them.
Glossary — important Local Search vocabulary ⭐
These are the words you will see in exam questions. Most are illustrated with the 8-Queens example (state = one queen per column).
The building blocks
State — one complete configuration (not a partial assignment, not a path).
8-Queens example: one full board with one queen per column, e.g. (8,3,7,4,2,5,1,6).
State space — set of all configurations the algorithm could be in. Usually exponentially large.
Neighborhoodneighbors(s) — the states reachable from s in one step. The choice of neighborhood is a design decision.
8-Queens large neighborhood: move any queen to any other field in the same column ⇒ 8 × 7 = 56 neighbours per state.
8-Queens small (toroidal) neighborhood: move any queen down by one field ⇒ 8 neighbours per state.
Objective functionf(s) (a.k.a. value, fitness) — the number we want to maximize.
8-Queens: negative number of attacking queen pairs (we maximize toward 0), or 28 − attacks for a non-negative GA fitness.
Properties of states
Local optimum — a state where no neighbour is strictly better. Plain HC stops here regardless of whether it’s a global optimum.
Global optimum — best state in the whole state space.
Plateau — a flat region: many neighbours have the same value as the current state.
Ridge — a sequence of local maxima where no single neighbour improves, but a longer detour would.
Search-strategy vocabulary
Uphill / downhill move — moving to a neighbour with higher / lower value (assuming maximization).
Exploration vs. exploitation — willingness to take downhill / random moves vs. greediness about improvement.
TemperatureT — in SA, scales the willingness to take downhill moves. High T → almost random walk; T → 0 → behaves like First-Choice HC.
Cooling scheduleschedule(t) — function that decreases T over time. Slides mention Stepwise Linear and Delayed Stepwise Linear only.
Beam — in beam search, the set of k currently active states. All k beam states share the same successor pool each round.
Genetic-algorithm vocabulary
Population — current set of k states (chromosomes).
Chromosome — a state encoded as a string of genes (e.g. 83742516 for an 8-Queens board).
Fitness — the GA name for the objective function. Selection probability scales with fitness.
Reproduction / Crossover — combine two parent chromosomes into one (or two) children.
1-point crossover: pick a split point, swap tails.
Uniform-order crossover: for permutations — fill child by binary template from parent 1, fill remaining slots in the order they appear in parent 2.
Mutation — randomly flip one gene with small probability. Maintains diversity.
Memetic algorithm — GA + a local-search step (e.g. HC) applied to each new child.
Continuous vs. Discrete state spaces
Hill-Climbing is the discrete analogue of gradient ascent. The key differences (slide 10):
Aspect
Continuous space
Discrete space
Neighborhood
ε-neighborhood (uncountably many)
Discrete, often finite
Best step
Gradient = direction of steepest ascent
Found by enumeration of neighbours
Cost per step
Compute gradient
Enumerate & compare neighbours
⚠️ In discrete spaces, the state space is typically exponentially large, but the neighborhood at any single state is small — that’s why local search works at all.
Neighborhood-size tradeoff
A design choice with direct consequences (slide 19):
Small neighborhood
Large neighborhood
✅ Faster to enumerate per step
❌ Slower to enumerate per step
❌ Greater risk of getting stuck in local optima
✅ Smaller risk of getting stuck in local optima
8-Queens illustration: the toroidal “move-one-down” neighborhood is only 8 states per step (cheap) but offers very few escape routes. The “any-queen-to-any-row” neighborhood is 56 states per step (expensive) but offers many escape routes.
Cooling schedules & Boltzmann intuition for SA
Acceptance probability for a downhill move:P = exp(Δ/T), where Δ = value(neighbor) − value(current) < 0 and T > 0.
Because Δ ≤ 0 for downhill, 0 ≤ exp(Δ/T) ≤ 1 always.
Effect of Δ (at fixed T = 1):
Δ
exp(Δ/T)
0
1.00
−1
0.37
−2
0.14
Effect of T (at fixed Δ = −2):
T
exp(Δ/T)
∞
1.00
100
0.98
10
0.82
1
0.14
0.1
2 × 10⁻⁹ ≈ 0
So at high T SA behaves like a random walk (almost everything accepted); at low T it behaves like First-Choice HC (only improvements accepted). The temperature interpolates smoothly between exploration and exploitation.
Cooling schedules in the slides (slide 27):
Stepwise Linear — start with arbitrary T, decrease by a constant c every step.
Delayed Stepwise Linear — start with arbitrary T, decrease by c every k-th step.
⚠️ Exponential cooling (T ← α·T) is a common textbook choice but is NOT in these slides. Don’t cite it in the exam.
Effect of schedule speed: A slower cooling schedule increases the probability of finding a high-quality solution — but increases runtime. The slides do NOT claim a global-optimum guarantee. (The Geman & Geman log-cooling theorem is a textbook fact from R&N, not from these slides.)
🐍 Figure — Simulated Annealing acceptance probability exp(Δ/T) across a cooling schedule
# Pyodide / Obsidian Execute Code: install matplotlib first.import micropipawait micropip.install("matplotlib")import matplotlib.pyplot as pltimport numpy as np# Cooling schedule: T decreasing from 10 to 0.1 (linear in log space for clarity)T = np.linspace(10.0, 0.1, 400)deltas = [-1, -5, -10]colors = {"-1": "#27ae60", "-5": "#f39c12", "-10": "#c0392b"}fig, ax = plt.subplots(figsize=(9, 5.5))for d in deltas: P = np.exp(d / T) ax.plot(T, P, lw=2.5, color=colors[str(d)], label=f"Δ = {d}")# Reference markers: T values from the slides' tablefor Tref, col in [(10, "#27ae60"), (1, "#f39c12"), (0.1, "#c0392b")]: ax.axvline(Tref, color=col, ls=":", lw=1, alpha=0.5)ax.set_xlabel("Temperature T (cooling →)", fontsize=11)ax.set_ylabel("Acceptance probability exp(Δ/T)", fontsize=11)ax.set_title("Simulated Annealing — acceptance probability for downhill moves\n" "High T = exploration (almost all moves accepted) · " "Low T = exploitation (only improvements)", fontsize=11)ax.set_xscale("log")ax.invert_xaxis() # T decreases over time → time flows left→rightax.set_ylim(-0.02, 1.05)ax.grid(True, alpha=0.3)ax.legend(loc="upper right", fontsize=11, framealpha=0.95, title="Energy change of move")ax.text(8, 0.05, "exploration\n(random walk)", ha="center", fontsize=9, color="#27ae60", style="italic")ax.text(0.15, 0.85, "exploitation\n(greedy HC)", ha="center", fontsize=9, color="#c0392b", style="italic")plt.tight_layout(); plt.show()
What this graph shows. Acceptance probability P = exp(Δ/T) for three downhill move sizes (Δ = −1 mild, −5 medium, −10 severe) as the temperature T cools from 10 down to 0.1. Time flows left → right (T decreasing).
Things to notice.
At high T (left), even severe Δ = −10 moves are accepted with ~37% probability — the search behaves like an almost-random walk.
At low T (right), only Δ = −1 still has a tiny chance; Δ = −10 collapses to effectively zero — the search has frozen into First-Choice HC behaviour.
The gap between the curves widens as T cools: temperature selectively filters out bigger downhill moves first. This is exactly the exploration→exploitation interpolation Boltzmann gives us for free.
Local Beam Search vs. Parallel Hill-Climbing
Single most common exam trap. Both maintain k states; they differ in what gets compared.
Parallel Hill-Climbing
Local Beam Search
Per iteration
Each of the k states independently picks its own best neighbor
Pool ALL neighbours of ALL k states, then pick the global top k
Information sharing
None — k truly independent HCs
States compete for slots → search concentrates on the most promising region
Risk
Same as HC per chain (k local optima)
Premature convergence: all k states may collapse into one region of the space
Why it matters. Local Beam Search is not “k HCs in parallel” — it is one coordinated search where good states crowd out bad ones. Stochastic Beam Search fixes the early-convergence issue by sampling the k successors with probability proportional to value.
Genetic Algorithms representation & operators
A GA needs four design choices (slide 42):
Representation — encode each state as a chromosome (string of genes).
8-Queens: string of 8 digits from {1,…,8}, i-th digit = row of i-th queen.
TSP on {0,1,2,3,4}: string of 4 digits (start fixed at 0), each digit = next node visited.
Fitness function — non-negative version of the objective.
Reproduction (crossover) rule — combine two parents.
General: 1-point crossover.
Permutations: naive crossover yields invalid solutions (duplicate / missing nodes) → use uniform-order crossover or a repair operation.
Mutation rule — random small change.
8-Queens: replace one digit with a random row.
TSP: swap two genes.
Fitness-proportionate selection. Parents are picked with probability proportional to fitness — better individuals reproduce more often, but worse ones still survive sometimes (preserving diversity).
Uniform-order crossover (for permutations):
Pick two parents.
Generate a random binary template the length of the chromosome.
Child 1: at every 1-position, copy parent 1’s gene. Fill the remaining slots with the missing values in the order they appear in parent 2.
Child 2: swap parent roles.
Formulas & Notation
Symbol
Meaning
f(s)
Objective function / value of state s
neighbors(s)
Set of states reachable in one step
Δ = f(s') − f(s)
Value change (negative = downhill)
exp(Δ/T)
SA acceptance probability for a downhill move
schedule(t)
Cooling function (Stepwise Linear / Delayed Stepwise Linear)
T
Temperature (SA)
k
Beam width (Beam Search) / population size (GA)
fitness(c)
GA fitness of chromosome c
Common Exam Traps ⚠️
Local Beam Search ≠ k parallel HCs. In beam search all k states share their successor pool and compete; in parallel HC each chain is independent.
Plain HC never accepts a downhill move. Stochastic HC (per slides) requires a strictly improving neighbour too — so it also does not escape plateaus by itself. Only SA and (indirectly) GAs accept worse states.
Random-restart HC is called “Random-restart (or parallel) Hill-Climbing” in the slides (slide 22). It guarantees finding the global optimum only as n → ∞ — not with a finite number of restarts.
Cooling schedule slow-down: the slides only claim it increases the probability of a high-quality solution — they do not state a global-optimum guarantee. The slow-cooling global-optimum theorem (Geman & Geman) is R&N, not these slides.
Cooling schedule list: slides give only Stepwise Linear and Delayed Stepwise Linear. Do not write “exponential cooling” in an exam answer about this lecture.
Acceptance probability sign: in exp(Δ/T) the Δ for downhill is negative, so the exponent is non-positive and the probability lies in (0, 1].
Neighborhood-size tradeoff: small = fast per step but more local optima; large = slower per step but fewer local optima.
Permutation crossover: naive 1-point crossover on a permutation produces invalid solutions (duplicates / missing values). Use uniform-order crossover or a repair operation.
First-Choice HC is useful when the neighborhood is too large to enumerate — it picks the first improving neighbour rather than the best.
SA at high T behaves like random walk; at low T like First-Choice HC. This interpolation is the whole point of the temperature schedule.
ALGORITHMS (full reference) ⭐
Pseudocode + intuition + complexity for every algorithm you might trace by hand in the exam. Organised by family:
Population family: #8 Genetic Algorithm · #9 Memetic / Hybrid
1. Hill-Climbing (steepest ascent)
Intuition. At each step, jump to the neighbour with the highest value. Stop when no neighbour improves.
HILL-CLIMBING(start): current ← start while not termination(): neighbor ← argmax over n ∈ neighbors(current) of value(n) if value(neighbor) ≤ value(current): return current # local optimum / plateau current ← neighbor
Complexity per step. O(|neighbors(s)|) for the argmax. Number of steps depends on the landscape.
Pros. ✅ Trivial to implement. ✅ Constant memory (just the current state). Cons. ❌ Gets stuck at any local optimum, plateau, or ridge. ❌ No way to escape.
Termination. Naturally on a local maximum, or on a bound (max steps / wall-clock time).
🐍 Figure — Hill-Climbing on a multi-modal 1D landscape (why HC gets stuck)
# Pyodide / Obsidian Execute Code: install matplotlib first.import micropipawait micropip.install("matplotlib")import matplotlib.pyplot as pltimport numpy as np# Multi-modal landscape with 3 peaks (global ≈ x=7.0)def f(x): return (np.sin(x) * 2.0 + np.sin(2.1 * x + 0.5) * 1.4 + np.exp(-((x - 7.0) ** 2) / 4.0) * 3.0 - 0.05 * (x - 5) ** 2)xs = np.linspace(0, 12, 1000)ys = f(xs)# Discrete hill-climbing: step size 0.15, compare left/right neighbourdef hill_climb(start, step=0.15, max_iters=200): x = start traj = [x] for _ in range(max_iters): left, right = x - step, x + step best = max([left, x, right], key=f) if best == x or best < 0 or best > 12: break x = best traj.append(x) return np.array(traj)starts = [0.5, 3.5, 5.8, 9.5]colors = ["#e74c3c", "#f39c12", "#27ae60", "#3498db"]fig, ax = plt.subplots(figsize=(10, 5.5))ax.plot(xs, ys, color="#2c3e50", lw=2.5, label="objective f(x)")ax.fill_between(xs, ys, ys.min() - 0.5, color="#ecf0f1", alpha=0.5)# Mark global maximumgx = xs[np.argmax(ys)]; gy = ys.max()ax.plot(gx, gy, "*", color="#f1c40f", markersize=22, markeredgecolor="black", markeredgewidth=1.2, label="global maximum", zorder=5)for s, c in zip(starts, colors): traj = hill_climb(s) ax.plot(traj, [f(x) for x in traj], "o-", color=c, lw=1.5, markersize=5, alpha=0.85, label=f"start x={s} → end x={traj[-1]:.2f}") ax.plot(traj[0], f(traj[0]), "s", color=c, markersize=10, markeredgecolor="black", zorder=4) ax.plot(traj[-1], f(traj[-1]), "X", color=c, markersize=12, markeredgecolor="black", zorder=4)ax.set_xlabel("x (state)", fontsize=11)ax.set_ylabel("f(x) (objective)", fontsize=11)ax.set_title("Hill-Climbing on a multi-modal landscape\n" "Squares = starts · ✕ = where HC got stuck · ★ = true global max", fontsize=11)ax.legend(loc="lower right", fontsize=9, framealpha=0.95)ax.grid(True, alpha=0.3)plt.tight_layout(); plt.show()
What this graph shows. Four hill-climbing runs from different starting positions on the same multi-modal 1D landscape. Each chain greedily steps to whichever neighbour (x − step, x + step) is higher and stops when neither is.
Things to notice.
The green run (start x=5.8) is the lucky one — it climbs the basin of the global peak around x ≈ 7.
The red run (start x=0.5) climbs the first local maximum it finds and stops. It has no idea a much higher peak exists three valleys to the right.
The orange and blue runs illustrate the same trap from different sides — each finds its closest local optimum and freezes.
HC’s quality depends entirely on where you start. This is exactly the motivation for Random-Restart HC (try many starts), Simulated Annealing (occasionally accept downhill), and Genetic Algorithms (population diversity).
2. Stochastic Hill-Climbing
Intuition. Instead of the best neighbour, pick a random strictly improving neighbour. Optionally weight selection probability by how much better each neighbour is.
STOCHASTIC-HC(start): current ← start while not termination(): candidates ← { n ∈ neighbors(current) : value(n) > value(current) } if candidates = ∅: return current # no strictly improving neighbour neighbor ← random.choice(candidates) # may weight by value current ← neighbor
Pros. ✅ Cheaper per step than steepest ascent (no full argmax needed). ✅ Adds some path diversity over multiple runs. Cons. ❌ Per slides, still requires strictly improving neighbours — so it does NOT escape plateaus or local optima. ❌ Compromise between random search and Hill-Climbing.
3. First-Choice Hill-Climbing
Intuition. Generate neighbours one at a time and accept the first one that improves. Useful when the neighbourhood is too large to enumerate.
FIRST-CHOICE-HC(start): current ← start while not termination(): for neighbor in random_order(neighbors(current)): if value(neighbor) > value(current): current ← neighbor break else: return current # no improving neighbour found
Pros. ✅ O(1) per step when neighbourhoods are huge. ✅ Very simple. Cons. ❌ Same getting-stuck problem as plain HC.
4. Random-Restart (or Parallel) Hill-Climbing
Intuition. Run n independent HCs from random starting states; return the best result. Each restart escapes a different local optimum.
RANDOM-RESTART-HC(n): best ← None for i in 1..n: s ← HILL-CLIMBING(random_initial_state()) if best is None or value(s) > value(best): best ← s return best
Guarantee. P(finding global optimum) → 1 as n → ∞. With finite n, no guarantee.
Pros. ✅ Embarrassingly parallel. ✅ Each restart is cheap. Cons. ❌ Independent restarts duplicate effort (no information shared). ❌ No global guarantee for finite n.
⚠️ The slides specifically call this “Random-restart (or parallel) Hill-Climbing” (slide 22).
🐍 Figure — 8-Queens: attacking-pairs count along Hill-Climbing trajectories
# Pyodide / Obsidian Execute Code: install matplotlib first.import micropipawait micropip.install("matplotlib")import matplotlib.pyplot as pltimport numpy as nprng = np.random.default_rng(7)N = 8 # 8-Queens; state = tuple of length 8, board[col] = rowdef attacking_pairs(board): n = len(board); count = 0 for i in range(n): for j in range(i + 1, n): if board[i] == board[j]: count += 1 elif abs(board[i] - board[j]) == j - i: count += 1 return countdef best_neighbour(board): # Large neighbourhood: move any queen to any other row in its column best, best_val = board, attacking_pairs(board) n = len(board) for col in range(n): for row in range(n): if row == board[col]: continue cand = list(board); cand[col] = row v = attacking_pairs(cand) if v < best_val: best, best_val = cand, v return best, best_valdef hill_climb(start, max_iter=50): board = list(start) traj = [attacking_pairs(board)] for _ in range(max_iter): nb, v = best_neighbour(board) if v >= traj[-1]: break # local optimum (no strict improvement) board = nb; traj.append(v) return board, traj# Run 5 random restartsNUM_RUNS = 5runs = []for _ in range(NUM_RUNS): start = [int(r) for r in rng.integers(0, N, size=N)] final_board, traj = hill_climb(start) runs.append((start, final_board, traj))fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5), gridspec_kw={"width_ratios": [1.4, 1]})# --- Left: trajectories ---palette = ["#27ae60", "#e74c3c", "#3498db", "#f39c12", "#9b59b6"]for i, (s, f, traj) in enumerate(runs): marker = "★" if traj[-1] == 0 else "✗" status = "solved" if traj[-1] == 0 else f"stuck @ {traj[-1]}" ax1.plot(traj, "o-", color=palette[i], lw=2, markersize=6, label=f"run {i+1}: {status} {marker}")ax1.axhline(0, color="#27ae60", ls=":", lw=1.5, alpha=0.7, label="goal: 0 attacks")ax1.set_xlabel("HC iteration", fontsize=11)ax1.set_ylabel("Attacking-queens pairs (lower = better)", fontsize=11)ax1.set_title("8-Queens — steepest-ascent HC from 5 random starts\n" "Some runs find a solution; others get stuck at a local optimum", fontsize=10.5)ax1.grid(True, alpha=0.3)ax1.legend(loc="upper right", fontsize=9, framealpha=0.95)ax1.set_ylim(bottom=-0.5)# --- Right: best final board ---best_idx = min(range(NUM_RUNS), key=lambda i: runs[i][2][-1])_, best_board, best_traj = runs[best_idx]ax2.set_xlim(0, N); ax2.set_ylim(0, N); ax2.set_aspect("equal")for r in range(N): for c in range(N): color = "#ecf0f1" if (r + c) % 2 == 0 else "#bdc3c7" ax2.add_patch(plt.Rectangle((c, r), 1, 1, facecolor=color))for c, r in enumerate(best_board): ax2.text(c + 0.5, r + 0.5, "♛", ha="center", va="center", fontsize=26, color="#2c3e50")status = "SOLVED (0 attacks)" if best_traj[-1] == 0 \ else f"local optimum ({best_traj[-1]} attacks)"ax2.set_title(f"Best of {NUM_RUNS} runs (run {best_idx+1})\n{status}", fontsize=10.5)ax2.set_xticks([]); ax2.set_yticks([])plt.tight_layout(); plt.show()
What these panels show.Left: the number of attacking-queens pairs (objective we want to drive to 0) along five independent steepest-ascent HC runs, each starting from a fresh random board. Right: the final board from the best run.
Things to notice.
Each curve monotonically decreases — steepest-ascent HC never accepts a worse neighbour.
Most curves flatline before reaching 0 — they hit a local optimum where no single-queen move reduces conflicts. Plain HC has no escape.
Random-Restart HC exploits exactly this: with enough restarts, at least one trajectory will dive all the way to 0 (the global optimum). Russell & Norvig report ~14% of random 8-Queens HCs solve the puzzle — so ~7 restarts give you a >95% success rate. This is the empirical version of “P(success) → 1 as n → ∞“.
5. Simulated Annealing (SA) ⭐
Intuition. Like HC, but accept downhill moves with probability exp(Δ/T). T decreases over time (cooling), so the search starts exploratory and becomes greedy.
SIMULATED-ANNEALING(start, schedule): current ← start for time in 1, 2, 3, ...: T ← schedule(time) if T == 0: return current neighbor ← random.choice(neighbors(current)) Δ ← value(neighbor) − value(current) if Δ > 0: current ← neighbor else if random.bernoulli(exp(Δ/T)): current ← neighbor
Key properties.
High T: nearly all moves accepted ⇒ random walk (exploration).
Low T: only improving moves accepted ⇒ behaves like First-Choice HC (exploitation).
T → 0: search freezes at whatever state it’s in.
Cooling schedules (slides only):
Stepwise Linear — T ← T − c every step.
Delayed Stepwise Linear — T ← T − c every k-th step.
Guarantees per slides. A slower cooling schedule increases the probability of finding a high-quality solution. The slides do NOT claim a global-optimum guarantee. (Textbook R&N + Geman & Geman do, but only with log-cooling and infinite time — not in this lecture.)
Pros. ✅ Can escape local optima. ✅ Anytime — return the best state seen so far. ✅ Simple. Cons. ❌ Performance depends strongly on the cooling schedule. ❌ No global guarantee in the slides’ setting. ❌ Random neighbour selection ignores neighbour quality.
6. Local Beam Search
Intuition. Maintain k states. At each step, generate all successors of all k states, then keep the best k overall.
LOCAL-BEAM-SEARCH(k): k_current ← { random_initial_state() for _ in range(k) } current_value ← max value over k_current while not termination(): all_neighbors ← union(neighbors(c) for c in k_current) k_best ← argmax_top_k over all_neighbors of value neighbor_value ← max value over k_best if neighbor_value ≤ current_value: return argmax over k_current of value k_current, current_value ← k_best, neighbor_value
Key idea. ⚠️ NOT k independent HCs! All k states share their successors and compete — it’s a single coordinated search that concentrates on promising regions.
Pros. ✅ Information sharing across the k chains. ✅ Faster convergence than k independent HCs when one region is clearly best. Cons. ❌ Premature convergence: all k beam slots can be taken by neighbours of the same region. ❌ Memory grows linearly with k.
7. Stochastic Beam Search
Intuition. Same skeleton as Local Beam Search, but pick the k next states probabilistically (probability increases with value) instead of deterministically taking the top k.
STOCHASTIC-BEAM-SEARCH(k): k_current ← { random_initial_state() for _ in range(k) } current_value ← max value over k_current while not termination(): all_neighbors ← union(neighbors(c) for c in k_current) k_best ← random.choice(all_neighbors, count=k, weight=value) # higher value ⇒ higher P neighbor_value ← max value over k_best if neighbor_value ≤ current_value: return argmax over k_current of value k_current, current_value ← k_best, neighbor_value
Pros. ✅ Avoids the early-convergence problem of deterministic beam search. ✅ Balances diversity (exploration) and intensification. Cons. ❌ Choice of weighting function matters. ❌ Slightly more bookkeeping.
8. Genetic Algorithm (simple)
Intuition. Maintain a population of k chromosomes. Each generation, repeatedly pick two parents (proportional to fitness), produce a child via crossover, mutate with small probability, then evict a low-fitness chromosome.
GENETIC-ALGORITHM(k, mutation_probability): population ← { random_chromosome() for _ in range(k) } while not termination(population): for i in 1..k: x, y ← random.choice(population, count=2, weight=fitness) child ← reproduce(x, y) # e.g. 1-point crossover if random() < mutation_probability: child ← mutate(child) population ← population ∪ { child } remove_random_chromosome(population, weight=1/fitness) return argmax over population of fitness
Pros. ✅ Population diversity → can escape many local optima. ✅ Very flexible — applies to any problem with a fitness function. ✅ Implicit parallel exploration of building blocks. Cons. ❌ Many hyperparameters (k, mutation rate, selection scheme). ❌ Slow to converge precisely. ❌ Naive crossover breaks for permutations — must use uniform-order crossover or a repair operation.
# Pyodide / Obsidian Execute Code: install matplotlib first.import micropipawait micropip.install("matplotlib")import matplotlib.pyplot as pltimport numpy as nprng = np.random.default_rng(42)# --- OneMax problem: maximize the number of 1s in a binary string ---BITS = 40 # chromosome length (max fitness = 40)POP_SIZE = 30GENERATIONS = 50MUT_RATE = 0.02def fitness(chrom): return int(chrom.sum())# Initial random population (very low fitness expected ~ BITS/2)pop = rng.integers(0, 2, size=(POP_SIZE, BITS))best_hist, mean_hist, worst_hist = [], [], []def select_parent(pop, fits): # Fitness-proportionate selection (roulette wheel) probs = fits / fits.sum() if fits.sum() > 0 else None idx = rng.choice(len(pop), p=probs) return pop[idx].copy()for gen in range(GENERATIONS): fits = np.array([fitness(c) for c in pop]) best_hist.append(fits.max()) mean_hist.append(fits.mean()) worst_hist.append(fits.min()) new_pop = [] for _ in range(POP_SIZE): p1 = select_parent(pop, fits) p2 = select_parent(pop, fits) # 1-point crossover cut = rng.integers(1, BITS) child = np.concatenate([p1[:cut], p2[cut:]]) # Mutation: flip each bit with prob MUT_RATE flips = rng.random(BITS) < MUT_RATE child = np.where(flips, 1 - child, child) new_pop.append(child) pop = np.array(new_pop)gens = np.arange(GENERATIONS)fig, ax = plt.subplots(figsize=(9, 5.5))ax.plot(gens, best_hist, color="#27ae60", lw=2.5, label="best fitness")ax.plot(gens, mean_hist, color="#3498db", lw=2.0, label="mean fitness")ax.plot(gens, worst_hist, color="#e74c3c", lw=1.5, label="worst fitness", linestyle="--")ax.fill_between(gens, worst_hist, best_hist, color="#3498db", alpha=0.1)ax.axhline(BITS, color="#f39c12", lw=1.2, ls=":", label=f"global optimum = {BITS}")ax.set_xlabel("Generation", fontsize=11)ax.set_ylabel("Fitness (number of 1s in chromosome)", fontsize=11)ax.set_title(f"Genetic Algorithm on OneMax — population={POP_SIZE}, " f"chromosome={BITS} bits, mutation={MUT_RATE}\n" "Best/mean climb steadily; worst stays low due to mutation + diversity", fontsize=10.5)ax.grid(True, alpha=0.3)ax.legend(loc="lower right", fontsize=10, framealpha=0.95)ax.set_ylim(0, BITS + 2)plt.tight_layout(); plt.show()print(f"Final best : {best_hist[-1]}/{BITS}")print(f"Final mean : {mean_hist[-1]:.1f}/{BITS}")print(f"Final worst: {worst_hist[-1]}/{BITS}")
What this graph shows. Per-generation best, mean and worst fitness of a tiny GA solving the OneMax problem (maximise the number of 1s in a 40-bit string; global optimum = 40). Population of 30 chromosomes, 1-point crossover, mutation rate 2 %.
Things to notice.
The best fitness rises sharply in the first ~15 generations, then asymptotes near the global optimum — the classic GA convergence curve.
The mean tracks the best with a small lag — fitness-proportionate selection pulls the whole population toward high-fitness individuals.
The worst stays well below the best for the entire run. This is healthy — it shows mutation and crossover keep producing low-fitness outliers, preserving the diversity that lets the GA escape local optima.
If the gap between best/worst collapsed to zero, the population would be homogenised (everyone in the same basin) — the GA’s main failure mode.
9. Memetic Hybrid GA
Intuition. A Genetic Algorithm where each newly created child is immediately refined by a local search (typically HC) until it reaches a local optimum.
MEMETIC-ALGORITHM(k, mutation_probability): population ← { random_chromosome() for _ in range(k) } while not termination(population): for i in 1..k: x, y ← random.choice(population, count=2, weight=fitness) child ← reproduce(x, y) if random() < mutation_probability: child ← mutate(child) child ← HILL-CLIMBING(child) # local refinement population ← population ∪ { child } remove_random_chromosome(population, weight=1/fitness) return argmax over population of fitness
Pros. ✅ Each individual is locally optimal → GA explores between local optima rather than the raw state space. ✅ Faster convergence in practice. Cons. ❌ Each generation is much more expensive (HC per child). ❌ Risk of homogeneous population (everyone in the same basin).
Algorithm comparison table — pros & cons
Hill-Climbing family
Algorithm
What it does
Pros
Cons
Hill-Climbing (steepest)
Always jump to best neighbour
✅ Trivial ✅ O(1) memory
❌ Stuck at any local optimum / plateau / ridge
Stochastic HC
Random strictly-improving neighbour
✅ Cheaper than full argmax ✅ Run-to-run diversity
❌ Still requires strict improvement (per slides) → does NOT escape plateaus
First-Choice HC
First improving neighbour found
✅ Great when neighbourhood is huge
❌ Same getting-stuck problem
Random-Restart HC
n independent HCs from random starts
✅ Embarrassingly parallel ✅ P(success) → 1 as n → ∞
❌ Independent restarts duplicate work ❌ No finite-n guarantee
Annealing & Beam family
Algorithm
What it does
Pros
Cons
Simulated Annealing
Accept downhill with exp(Δ/T); T cools over time
✅ Escapes local optima ✅ Anytime
❌ Schedule-sensitive ❌ No global guarantee in slides
Local Beam Search
k states, share successor pool, keep best k
✅ Information sharing across chains
❌ Premature convergence to one region
Stochastic Beam Search
k states, sample successors weighted by fitness
✅ Fixes beam search’s convergence problem
❌ Weighting design matters
Population family
Algorithm
What it does
Pros
Cons
Genetic Algorithm
Population + crossover + mutation
✅ Very flexible ✅ Implicit parallel exploration
❌ Many hyperparameters ❌ Slow to fine-tune ❌ Naive crossover fails for permutations
Memetic / Hybrid GA
GA + HC on every new child
✅ Each individual is locally optimal ✅ Faster convergence
❌ Expensive per generation ❌ Risk of population homogenization
Big-picture comparison
Algorithm
Explores via
Downhill?
Memory
Escapes local optima?
Guarantees
Hill-Climbing
Greedy best neighbour
No
Current state
No
Local optimum
Stochastic HC
Random uphill
No
Current state
No (slides require strict improvement)
Local optimum
First-Choice HC
First uphill found
No
Current state
No
Local optimum
Random-Restart HC
Multiple independent runs
No
Current state per chain
Yes (across restarts)
P(global) → 1 as n → ∞
Simulated Annealing
Random neighbour, downhill OK with exp(Δ/T)
Yes (prob.)
Current state + T
Yes
Slides: slower cooling ⇒ higher probability of high-quality solution. No global guarantee in slides.
Local Beam Search
All successors of all k states
No
k states
Limited (within beam)
Local optimum (joint)
Stochastic Beam Search
Weighted random k successors
No
k states
Better than Local Beam
Heuristic
Genetic Algorithm
Crossover + mutation on population
Yes (indirectly via mutation/crossover)
Population k
Yes
Heuristic
Memetic GA
GA + per-child HC
Yes (via GA layer)
Population k
Yes
Heuristic
Quick algorithm chooser
Situation
Recommended algorithm
Smooth landscape, want fast greedy convergence
Hill-Climbing (steepest ascent)
Neighborhood too large to enumerate
First-Choice HC
Want some path diversity across runs
Stochastic HC
Many local optima, parallel resources available
Random-Restart (Parallel) HC
Many local optima, need to escape them in a single run
Simulated Annealing
Want coordinated multi-state search, share information
Local Beam Search (or Stochastic Beam if premature convergence is a risk)
Huge problem, want population diversity, fitness is easy to evaluate
Genetic Algorithm
GA but want each individual locally optimal
Memetic / Hybrid GA
Permutation problem (TSP) with GA
GA + uniform-order crossover (not naive 1-point)
Need to prove a state space has no solution
None of these — local search is incomplete. Use systematic search.
One-sentence summary of each algorithm
Hill-Climbing — always jump to the best neighbour; stop at any local optimum.
Stochastic HC — pick a random strictly-improving neighbour; still stuck at plateaus per slides.
First-Choice HC — accept the first improving neighbour; great for huge neighbourhoods.
Random-Restart (Parallel) HC — n independent HCs from random starts; P(success) → 1 as n → ∞.
Simulated Annealing — accept downhill with exp(Δ/T); T cools from exploratory to greedy.
Local Beam Search — k states share a successor pool and the top k advance; coordinated, not independent.
Stochastic Beam Search — same but sample successors with probability weighted by value; avoids premature convergence.
Search algorithms split into two broad families. The lecture session above focuses on local search; the hub material below ties it back into the wider picture of classical (path-finding) search.
Basics of Search Algorithms
Search Problem: Defined as a 4-tuple consisting of:
Search space (state space)
Successor relation (transition relation)
Initial state
Goal predicate that ends the search
Optional: Transition costs
Solution: A sequence of states from the initial state to a goal state.
Search Algorithm: Takes a search problem as input and returns a solution or failure indication by creating a search tree over the state space graph.
State Space Representation: As a graph with nodes as states and directed edges as actions.
Classical Search
Goal: Find a sequence of actions from an initial state to a goal state.
📖 BFS / DFS / Uniform-Cost / A* are defined properly in Uninformed Search — mechanics, completeness/optimality/memory comparison, and the contrast to the local-search methods in this note.
Comparison of Classical and Local Search
Classical Search
Local Search
Goal
Find a sequence of actions leading from an initial state to a goal state
Find a “nearly” achievable/optimal state
Focus
Sequence of actions
State
Application
When the sequence of actions is important
When the sequence of actions is not important and the search space is too large for exhaustive search
Search space
Considers the entire search space
Considers surrounding states and selects the best one to proceed
Optimization
Finds an optimal solution if one exists
Often finds a suboptimal solution, but can also provide results for inconsistent problems
Complexity
Can be computationally intensive in large search spaces
Can be faster as it does not search the entire space
State Space and Search Tree
The state space can be represented as a graph where nodes represent states and directed edges represent actions.
The search tree corresponds to a tree where each node corresponds to a state in the state space, and edges represent actions. The root of the tree corresponds to the initial state.
The search tree is generated by adding the successors of a state starting from an initial state until a solution is found.
The size of the search space is generally exponential.
Summary
The choice of search algorithm heavily depends on the nature of the problem. When the path to the solution is important, classical search is preferred. If only the solution is important or the search space is too large for classical search, local search might be the better choice. The size of the search space can be very large for both approaches, making heuristics and termination conditions crucial.
Where classical search is used today
A* in video games — nearly every game with NPCs navigating a map uses A* or a variant (HPA*, Theta*). The industry-standard pathfinding algorithm.
Google Maps / GPS routing — modern routers use Contraction Hierarchies (bidirectional Dijkstra with precomputed shortcuts), but A* and ALT (A* with Landmarks) remain in the toolkit for dynamic traffic conditions.
Robotic motion planning — A* on configuration-space grids is the baseline for collision-free robot-arm and mobile-robot paths. Used in factory robots (Fanuc, ABB) and autonomous vehicles.
Package managers — npm, cargo, pip use DFS variants for dependency resolution.
Social network analysis — BFS for “degrees of separation” (Facebook’s friend recommendations historically used BFS-based scoring).
Web crawlers — Google’s crawler uses prioritized BFS/DFS to discover pages.
Chess and Go engines (classical) — Minimax with alpha-beta pruning is a classical search variant. Modern engines (Stockfish, AlphaZero) use MCTS, but alpha-beta is still inside Stockfish’s NNUE evaluation.
SAT solvers — modern SAT solvers (MiniSat, Glucose, Z3) use CDCL (Conflict-Driven Clause Learning), a variant of backtracking search with learning.
Why classical search survives: when you need the optimal path (not just any solution), and the state space is structured enough to apply admissible heuristics — A* with a good heuristic is unbeatable for shortest-path-style problems.
Where classical search was replaced — and by what
Domain
Was classical search, now …
Why
Game AI for complex games (Go, complex strategy)
MCTS + neural networks (AlphaZero, MuZero)
Alpha-beta exploded combinatorially on Go (branching factor ~250). MCTS samples promising lines; neural value/policy nets replace handcrafted heuristics.
LLMs suggest plausible next tactics, pruning the search tree massively.
Where classical search still stands alone: when (a) the state space is small enough to enumerate, (b) you need a provably optimal solution with admissibility guarantees, and (c) the heuristic is well-understood. Pathfinding on a 2D grid will never need a neural network — A* solves it optimally in microseconds.