Search Algorithms

chatbot methods-of-ai

Companion files for exam prep:

📋 lernzettel_local-search_30-04-26 — compact 1-page exam sheet

✏️ quiz_local-search_30-04-26 — 8 Qs, mixed difficulty

✏️ quiz_local-search_2026-04-30 — additional quiz

Core Ideas
Glossary
Continuous vs. Discrete state spaces
Neighborhood-size tradeoff
Cooling schedules & Boltzmann intuition for SA
Local Beam Search vs. Parallel Hill-Climbing
Genetic Algorithms: representation & operators
Formulas & Notation
Common Exam Traps
ALGORITHMS ⭐ — pseudocode for all variants
- [[#1-hill-climbing-steepest-ascent|Hill-Climbing (steepest) ⭐]] · Stochastic HC · First-Choice HC · Random-Restart / Parallel HC
- [[#5-simulated-annealing-sa-|Simulated Annealing ⭐]]
- Local Beam Search · Stochastic Beam Search
- [[#8-genetic-algorithm-simple|Genetic Algorithm]] · Memetic / Hybrid GA
Algorithm comparison tables (pros & cons)
Quick algorithm chooser · One-sentence summary of each algorithm
Related Q&A & Notes
Broader context: Classical vs. Local Search
Where classical search is used today · Where classical search was replaced — and by what
See also

Core Ideas

Local search keeps only the current state — no path history, no frontier. Goal: find a good state, not a route.
Works by moving to neighboring states according to an objective function f(s) to maximize (minimize via −f(s)).
Useful when the solution is the state (e.g. 8-Queens, TSP, scheduling), not the path to it.
All variants trade exploration (escape local optima) against exploitation (climb toward improvement).
Plain Hill-Climbing only finds local optima; SA, beam search and GAs add randomness or population to escape them.

Glossary — important Local Search vocabulary ⭐

These are the words you will see in exam questions. Most are illustrated with the 8-Queens example (state = one queen per column).

The building blocks

State — one complete configuration (not a partial assignment, not a path).

8-Queens example: one full board with one queen per column, e.g. (8,3,7,4,2,5,1,6).

State space — set of all configurations the algorithm could be in. Usually exponentially large.

Neighborhood neighbors(s) — the states reachable from s in one step. The choice of neighborhood is a design decision.

8-Queens large neighborhood: move any queen to any other field in the same column ⇒ 8 × 7 = 56 neighbours per state.
8-Queens small (toroidal) neighborhood: move any queen down by one field ⇒ 8 neighbours per state.

Objective function f(s) (a.k.a. value, fitness) — the number we want to maximize.

8-Queens: negative number of attacking queen pairs (we maximize toward 0), or 28 − attacks for a non-negative GA fitness.

Properties of states

Local optimum — a state where no neighbour is strictly better. Plain HC stops here regardless of whether it’s a global optimum.

Global optimum — best state in the whole state space.

Plateau — a flat region: many neighbours have the same value as the current state.

Ridge — a sequence of local maxima where no single neighbour improves, but a longer detour would.

Search-strategy vocabulary

Uphill / downhill move — moving to a neighbour with higher / lower value (assuming maximization).

Exploration vs. exploitation — willingness to take downhill / random moves vs. greediness about improvement.

Temperature T — in SA, scales the willingness to take downhill moves. High T → almost random walk; T → 0 → behaves like First-Choice HC.

Cooling schedule schedule(t) — function that decreases T over time. Slides mention Stepwise Linear and Delayed Stepwise Linear only.

Beam — in beam search, the set of k currently active states. All k beam states share the same successor pool each round.

Genetic-algorithm vocabulary

Population — current set of k states (chromosomes).

Chromosome — a state encoded as a string of genes (e.g. 83742516 for an 8-Queens board).

Fitness — the GA name for the objective function. Selection probability scales with fitness.

Reproduction / Crossover — combine two parent chromosomes into one (or two) children.

1-point crossover: pick a split point, swap tails.
Uniform-order crossover: for permutations — fill child by binary template from parent 1, fill remaining slots in the order they appear in parent 2.

Mutation — randomly flip one gene with small probability. Maintains diversity.

Memetic algorithm — GA + a local-search step (e.g. HC) applied to each new child.

Continuous vs. Discrete state spaces

Hill-Climbing is the discrete analogue of gradient ascent. The key differences (slide 10):

Aspect	Continuous space	Discrete space
Neighborhood	ε-neighborhood (uncountably many)	Discrete, often finite
Best step	Gradient = direction of steepest ascent	Found by enumeration of neighbours
Cost per step	Compute gradient	Enumerate & compare neighbours

⚠️ In discrete spaces, the state space is typically exponentially large, but the neighborhood at any single state is small — that’s why local search works at all.

Neighborhood-size tradeoff

A design choice with direct consequences (slide 19):

Small neighborhood	Large neighborhood
✅ Faster to enumerate per step	❌ Slower to enumerate per step
❌ Greater risk of getting stuck in local optima	✅ Smaller risk of getting stuck in local optima

8-Queens illustration: the toroidal “move-one-down” neighborhood is only 8 states per step (cheap) but offers very few escape routes. The “any-queen-to-any-row” neighborhood is 56 states per step (expensive) but offers many escape routes.

Cooling schedules & Boltzmann intuition for SA

Acceptance probability for a downhill move: P = exp(Δ/T), where Δ = value(neighbor) − value(current) < 0 and T > 0.

Because Δ ≤ 0 for downhill, 0 ≤ exp(Δ/T) ≤ 1 always.

Effect of Δ (at fixed T = 1):

Δ	exp(Δ/T)
0	1.00
−1	0.37
−2	0.14

Effect of T (at fixed Δ = −2):

T	exp(Δ/T)
∞	1.00
100	0.98
10	0.82
1	0.14
0.1	2 × 10⁻⁹ ≈ 0

So at high T SA behaves like a random walk (almost everything accepted); at low T it behaves like First-Choice HC (only improvements accepted). The temperature interpolates smoothly between exploration and exploitation.

Cooling schedules in the slides (slide 27):

Stepwise Linear — start with arbitrary T, decrease by a constant c every step.
Delayed Stepwise Linear — start with arbitrary T, decrease by c every k-th step.

⚠️ Exponential cooling (T ← α·T) is a common textbook choice but is NOT in these slides. Don’t cite it in the exam.

Effect of schedule speed: A slower cooling schedule increases the probability of finding a high-quality solution — but increases runtime. The slides do NOT claim a global-optimum guarantee. (The Geman & Geman log-cooling theorem is a textbook fact from R&N, not from these slides.)

🐍 Figure — Simulated Annealing acceptance probability exp(Δ/T) across a cooling schedule

# Pyodide / Obsidian Execute Code: install matplotlib first.
import micropip
await micropip.install("matplotlib")
 
import matplotlib.pyplot as plt
import numpy as np
 
# Cooling schedule: T decreasing from 10 to 0.1 (linear in log space for clarity)
T = np.linspace(10.0, 0.1, 400)
 
deltas = [-1, -5, -10]
colors = {"-1": "#27ae60", "-5": "#f39c12", "-10": "#c0392b"}
 
fig, ax = plt.subplots(figsize=(9, 5.5))
 
for d in deltas:
    P = np.exp(d / T)
    ax.plot(T, P, lw=2.5, color=colors[str(d)],
            label=f"Δ = {d}")
 
# Reference markers: T values from the slides' table
for Tref, col in [(10, "#27ae60"), (1, "#f39c12"), (0.1, "#c0392b")]:
    ax.axvline(Tref, color=col, ls=":", lw=1, alpha=0.5)
 
ax.set_xlabel("Temperature T  (cooling →)", fontsize=11)
ax.set_ylabel("Acceptance probability  exp(Δ/T)", fontsize=11)
ax.set_title("Simulated Annealing — acceptance probability for downhill moves\n"
             "High T = exploration (almost all moves accepted) · "
             "Low T = exploitation (only improvements)",
             fontsize=11)
ax.set_xscale("log")
ax.invert_xaxis()  # T decreases over time → time flows left→right
ax.set_ylim(-0.02, 1.05)
ax.grid(True, alpha=0.3)
ax.legend(loc="upper right", fontsize=11, framealpha=0.95,
          title="Energy change of move")
 
ax.text(8, 0.05, "exploration\n(random walk)",
        ha="center", fontsize=9, color="#27ae60", style="italic")
ax.text(0.15, 0.85, "exploitation\n(greedy HC)",
        ha="center", fontsize=9, color="#c0392b", style="italic")
 
plt.tight_layout(); plt.show()

What this graph shows. Acceptance probability P = exp(Δ/T) for three downhill move sizes (Δ = −1 mild, −5 medium, −10 severe) as the temperature T cools from 10 down to 0.1. Time flows left → right (T decreasing).

Things to notice.

At high T (left), even severe Δ = −10 moves are accepted with ~37% probability — the search behaves like an almost-random walk.
At low T (right), only Δ = −1 still has a tiny chance; Δ = −10 collapses to effectively zero — the search has frozen into First-Choice HC behaviour.
The gap between the curves widens as T cools: temperature selectively filters out bigger downhill moves first. This is exactly the exploration→exploitation interpolation Boltzmann gives us for free.

Local Beam Search vs. Parallel Hill-Climbing

Single most common exam trap. Both maintain k states; they differ in what gets compared.

	Parallel Hill-Climbing	Local Beam Search
Per iteration	Each of the k states independently picks its own best neighbor	Pool ALL neighbours of ALL k states, then pick the global top k
Information sharing	None — k truly independent HCs	States compete for slots → search concentrates on the most promising region
Risk	Same as HC per chain (k local optima)	Premature convergence: all k states may collapse into one region of the space

Why it matters. Local Beam Search is not “k HCs in parallel” — it is one coordinated search where good states crowd out bad ones. Stochastic Beam Search fixes the early-convergence issue by sampling the k successors with probability proportional to value.

Genetic Algorithms representation & operators

A GA needs four design choices (slide 42):

Representation — encode each state as a chromosome (string of genes).
- 8-Queens: string of 8 digits from {1,…,8}, i-th digit = row of i-th queen.
- TSP on {0,1,2,3,4}: string of 4 digits (start fixed at 0), each digit = next node visited.
Fitness function — non-negative version of the objective.
- 8-Queens: 28 − attacking pairs.
- TSP: upper_bound − tour_cost (e.g. 35 − cost).
Reproduction (crossover) rule — combine two parents.
- General: 1-point crossover.
- Permutations: naive crossover yields invalid solutions (duplicate / missing nodes) → use uniform-order crossover or a repair operation.
Mutation rule — random small change.
- 8-Queens: replace one digit with a random row.
- TSP: swap two genes.

Fitness-proportionate selection. Parents are picked with probability proportional to fitness — better individuals reproduce more often, but worse ones still survive sometimes (preserving diversity).

Uniform-order crossover (for permutations):

Pick two parents.
Generate a random binary template the length of the chromosome.
Child 1: at every 1-position, copy parent 1’s gene. Fill the remaining slots with the missing values in the order they appear in parent 2.
Child 2: swap parent roles.

Formulas & Notation

Symbol	Meaning
`f(s)`	Objective function / value of state s
`neighbors(s)`	Set of states reachable in one step
`Δ = f(s') − f(s)`	Value change (negative = downhill)
`exp(Δ/T)`	SA acceptance probability for a downhill move
`schedule(t)`	Cooling function (Stepwise Linear / Delayed Stepwise Linear)
`T`	Temperature (SA)
`k`	Beam width (Beam Search) / population size (GA)
`fitness(c)`	GA fitness of chromosome c

Common Exam Traps ⚠️

Local Beam Search ≠ k parallel HCs. In beam search all k states share their successor pool and compete; in parallel HC each chain is independent.
Plain HC never accepts a downhill move. Stochastic HC (per slides) requires a strictly improving neighbour too — so it also does not escape plateaus by itself. Only SA and (indirectly) GAs accept worse states.
Random-restart HC is called “Random-restart (or parallel) Hill-Climbing” in the slides (slide 22). It guarantees finding the global optimum only as n → ∞ — not with a finite number of restarts.
Cooling schedule slow-down: the slides only claim it increases the probability of a high-quality solution — they do not state a global-optimum guarantee. The slow-cooling global-optimum theorem (Geman & Geman) is R&N, not these slides.
Cooling schedule list: slides give only Stepwise Linear and Delayed Stepwise Linear. Do not write “exponential cooling” in an exam answer about this lecture.
Acceptance probability sign: in exp(Δ/T) the Δ for downhill is negative, so the exponent is non-positive and the probability lies in (0, 1].
Neighborhood-size tradeoff: small = fast per step but more local optima; large = slower per step but fewer local optima.
Permutation crossover: naive 1-point crossover on a permutation produces invalid solutions (duplicates / missing values). Use uniform-order crossover or a repair operation.
First-Choice HC is useful when the neighborhood is too large to enumerate — it picks the first improving neighbour rather than the best.
SA at high T behaves like random walk; at low T like First-Choice HC. This interpolation is the whole point of the temperature schedule.

ALGORITHMS (full reference) ⭐

Pseudocode + intuition + complexity for every algorithm you might trace by hand in the exam. Organised by family:

Hill-Climbing family: #1 Steepest · #2 Stochastic · #3 First-Choice · #4 Random-Restart / Parallel

Annealing: #5 Simulated Annealing

Beam family: #6 Local Beam · #7 Stochastic Beam

Population family: #8 Genetic Algorithm · #9 Memetic / Hybrid

1. Hill-Climbing (steepest ascent)

Intuition. At each step, jump to the neighbour with the highest value. Stop when no neighbour improves.

HILL-CLIMBING(start):
    current ← start
    while not termination():
        neighbor ← argmax over n ∈ neighbors(current) of value(n)
        if value(neighbor) ≤ value(current):
            return current        # local optimum / plateau
        current ← neighbor

Complexity per step. O(|neighbors(s)|) for the argmax. Number of steps depends on the landscape.

Pros. ✅ Trivial to implement. ✅ Constant memory (just the current state).
Cons. ❌ Gets stuck at any local optimum, plateau, or ridge. ❌ No way to escape.

Termination. Naturally on a local maximum, or on a bound (max steps / wall-clock time).

🐍 Figure — Hill-Climbing on a multi-modal 1D landscape (why HC gets stuck)

# Pyodide / Obsidian Execute Code: install matplotlib first.
import micropip
await micropip.install("matplotlib")
 
import matplotlib.pyplot as plt
import numpy as np
 
# Multi-modal landscape with 3 peaks (global ≈ x=7.0)
def f(x):
    return (np.sin(x) * 2.0
            + np.sin(2.1 * x + 0.5) * 1.4
            + np.exp(-((x - 7.0) ** 2) / 4.0) * 3.0
            - 0.05 * (x - 5) ** 2)
 
xs = np.linspace(0, 12, 1000)
ys = f(xs)
 
# Discrete hill-climbing: step size 0.15, compare left/right neighbour
def hill_climb(start, step=0.15, max_iters=200):
    x = start
    traj = [x]
    for _ in range(max_iters):
        left, right = x - step, x + step
        best = max([left, x, right], key=f)
        if best == x or best < 0 or best > 12:
            break
        x = best
        traj.append(x)
    return np.array(traj)
 
starts = [0.5, 3.5, 5.8, 9.5]
colors = ["#e74c3c", "#f39c12", "#27ae60", "#3498db"]
 
fig, ax = plt.subplots(figsize=(10, 5.5))
ax.plot(xs, ys, color="#2c3e50", lw=2.5, label="objective f(x)")
ax.fill_between(xs, ys, ys.min() - 0.5, color="#ecf0f1", alpha=0.5)
 
# Mark global maximum
gx = xs[np.argmax(ys)]; gy = ys.max()
ax.plot(gx, gy, "*", color="#f1c40f", markersize=22,
        markeredgecolor="black", markeredgewidth=1.2,
        label="global maximum", zorder=5)
 
for s, c in zip(starts, colors):
    traj = hill_climb(s)
    ax.plot(traj, [f(x) for x in traj], "o-", color=c, lw=1.5,
            markersize=5, alpha=0.85,
            label=f"start x={s} → end x={traj[-1]:.2f}")
    ax.plot(traj[0], f(traj[0]), "s", color=c, markersize=10,
            markeredgecolor="black", zorder=4)
    ax.plot(traj[-1], f(traj[-1]), "X", color=c, markersize=12,
            markeredgecolor="black", zorder=4)
 
ax.set_xlabel("x  (state)", fontsize=11)
ax.set_ylabel("f(x)  (objective)", fontsize=11)
ax.set_title("Hill-Climbing on a multi-modal landscape\n"
             "Squares = starts · ✕ = where HC got stuck · ★ = true global max",
             fontsize=11)
ax.legend(loc="lower right", fontsize=9, framealpha=0.95)
ax.grid(True, alpha=0.3)
plt.tight_layout(); plt.show()

What this graph shows. Four hill-climbing runs from different starting positions on the same multi-modal 1D landscape. Each chain greedily steps to whichever neighbour (x − step, x + step) is higher and stops when neither is.

Things to notice.

The green run (start x=5.8) is the lucky one — it climbs the basin of the global peak around x ≈ 7.
The red run (start x=0.5) climbs the first local maximum it finds and stops. It has no idea a much higher peak exists three valleys to the right.
The orange and blue runs illustrate the same trap from different sides — each finds its closest local optimum and freezes.
HC’s quality depends entirely on where you start. This is exactly the motivation for Random-Restart HC (try many starts), Simulated Annealing (occasionally accept downhill), and Genetic Algorithms (population diversity).

2. Stochastic Hill-Climbing

Intuition. Instead of the best neighbour, pick a random strictly improving neighbour. Optionally weight selection probability by how much better each neighbour is.

STOCHASTIC-HC(start):
    current ← start
    while not termination():
        candidates ← { n ∈ neighbors(current) : value(n) > value(current) }
        if candidates = ∅:
            return current        # no strictly improving neighbour
        neighbor ← random.choice(candidates)   # may weight by value
        current ← neighbor

Pros. ✅ Cheaper per step than steepest ascent (no full argmax needed). ✅ Adds some path diversity over multiple runs.
Cons. ❌ Per slides, still requires strictly improving neighbours — so it does NOT escape plateaus or local optima. ❌ Compromise between random search and Hill-Climbing.

3. First-Choice Hill-Climbing

Intuition. Generate neighbours one at a time and accept the first one that improves. Useful when the neighbourhood is too large to enumerate.

FIRST-CHOICE-HC(start):
    current ← start
    while not termination():
        for neighbor in random_order(neighbors(current)):
            if value(neighbor) > value(current):
                current ← neighbor
                break
        else:
            return current        # no improving neighbour found

Pros. ✅ O(1) per step when neighbourhoods are huge. ✅ Very simple.
Cons. ❌ Same getting-stuck problem as plain HC.

4. Random-Restart (or Parallel) Hill-Climbing

Intuition. Run n independent HCs from random starting states; return the best result. Each restart escapes a different local optimum.

RANDOM-RESTART-HC(n):
    best ← None
    for i in 1..n:
        s ← HILL-CLIMBING(random_initial_state())
        if best is None or value(s) > value(best):
            best ← s
    return best

Guarantee. P(finding global optimum) → 1 as n → ∞. With finite n, no guarantee.

Pros. ✅ Embarrassingly parallel. ✅ Each restart is cheap.
Cons. ❌ Independent restarts duplicate effort (no information shared). ❌ No global guarantee for finite n.

⚠️ The slides specifically call this “Random-restart (or parallel) Hill-Climbing” (slide 22).

🐍 Figure — 8-Queens: attacking-pairs count along Hill-Climbing trajectories

# Pyodide / Obsidian Execute Code: install matplotlib first.
import micropip
await micropip.install("matplotlib")
 
import matplotlib.pyplot as plt
import numpy as np
 
rng = np.random.default_rng(7)
 
N = 8  # 8-Queens; state = tuple of length 8, board[col] = row
 
def attacking_pairs(board):
    n = len(board); count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if board[i] == board[j]: count += 1
            elif abs(board[i] - board[j]) == j - i: count += 1
    return count
 
def best_neighbour(board):
    # Large neighbourhood: move any queen to any other row in its column
    best, best_val = board, attacking_pairs(board)
    n = len(board)
    for col in range(n):
        for row in range(n):
            if row == board[col]: continue
            cand = list(board); cand[col] = row
            v = attacking_pairs(cand)
            if v < best_val:
                best, best_val = cand, v
    return best, best_val
 
def hill_climb(start, max_iter=50):
    board = list(start)
    traj = [attacking_pairs(board)]
    for _ in range(max_iter):
        nb, v = best_neighbour(board)
        if v >= traj[-1]: break   # local optimum (no strict improvement)
        board = nb; traj.append(v)
    return board, traj
 
# Run 5 random restarts
NUM_RUNS = 5
runs = []
for _ in range(NUM_RUNS):
    start = [int(r) for r in rng.integers(0, N, size=N)]
    final_board, traj = hill_climb(start)
    runs.append((start, final_board, traj))
 
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5),
                               gridspec_kw={"width_ratios": [1.4, 1]})
 
# --- Left: trajectories ---
palette = ["#27ae60", "#e74c3c", "#3498db", "#f39c12", "#9b59b6"]
for i, (s, f, traj) in enumerate(runs):
    marker = "★" if traj[-1] == 0 else "✗"
    status = "solved" if traj[-1] == 0 else f"stuck @ {traj[-1]}"
    ax1.plot(traj, "o-", color=palette[i], lw=2, markersize=6,
             label=f"run {i+1}: {status} {marker}")
 
ax1.axhline(0, color="#27ae60", ls=":", lw=1.5, alpha=0.7,
            label="goal: 0 attacks")
ax1.set_xlabel("HC iteration", fontsize=11)
ax1.set_ylabel("Attacking-queens pairs  (lower = better)", fontsize=11)
ax1.set_title("8-Queens — steepest-ascent HC from 5 random starts\n"
              "Some runs find a solution; others get stuck at a local optimum",
              fontsize=10.5)
ax1.grid(True, alpha=0.3)
ax1.legend(loc="upper right", fontsize=9, framealpha=0.95)
ax1.set_ylim(bottom=-0.5)
 
# --- Right: best final board ---
best_idx = min(range(NUM_RUNS), key=lambda i: runs[i][2][-1])
_, best_board, best_traj = runs[best_idx]
 
ax2.set_xlim(0, N); ax2.set_ylim(0, N); ax2.set_aspect("equal")
for r in range(N):
    for c in range(N):
        color = "#ecf0f1" if (r + c) % 2 == 0 else "#bdc3c7"
        ax2.add_patch(plt.Rectangle((c, r), 1, 1, facecolor=color))
for c, r in enumerate(best_board):
    ax2.text(c + 0.5, r + 0.5, "♛", ha="center", va="center",
             fontsize=26, color="#2c3e50")
 
status = "SOLVED (0 attacks)" if best_traj[-1] == 0 \
         else f"local optimum ({best_traj[-1]} attacks)"
ax2.set_title(f"Best of {NUM_RUNS} runs (run {best_idx+1})\n{status}",
              fontsize=10.5)
ax2.set_xticks([]); ax2.set_yticks([])
 
plt.tight_layout(); plt.show()

What these panels show. Left: the number of attacking-queens pairs (objective we want to drive to 0) along five independent steepest-ascent HC runs, each starting from a fresh random board. Right: the final board from the best run.

Things to notice.

Each curve monotonically decreases — steepest-ascent HC never accepts a worse neighbour.
Most curves flatline before reaching 0 — they hit a local optimum where no single-queen move reduces conflicts. Plain HC has no escape.
Random-Restart HC exploits exactly this: with enough restarts, at least one trajectory will dive all the way to 0 (the global optimum). Russell & Norvig report ~14% of random 8-Queens HCs solve the puzzle — so ~7 restarts give you a >95% success rate. This is the empirical version of “P(success) → 1 as n → ∞“.

5. Simulated Annealing (SA) ⭐

Intuition. Like HC, but accept downhill moves with probability exp(Δ/T). T decreases over time (cooling), so the search starts exploratory and becomes greedy.

SIMULATED-ANNEALING(start, schedule):
    current ← start
    for time in 1, 2, 3, ...:
        T ← schedule(time)
        if T == 0:
            return current
        neighbor ← random.choice(neighbors(current))
        Δ ← value(neighbor) − value(current)
        if Δ > 0:
            current ← neighbor
        else if random.bernoulli(exp(Δ/T)):
            current ← neighbor

Key properties.

High T: nearly all moves accepted ⇒ random walk (exploration).
Low T: only improving moves accepted ⇒ behaves like First-Choice HC (exploitation).
T → 0: search freezes at whatever state it’s in.

Cooling schedules (slides only):

Stepwise Linear — T ← T − c every step.
Delayed Stepwise Linear — T ← T − c every k-th step.

Guarantees per slides. A slower cooling schedule increases the probability of finding a high-quality solution. The slides do NOT claim a global-optimum guarantee. (Textbook R&N + Geman & Geman do, but only with log-cooling and infinite time — not in this lecture.)

Pros. ✅ Can escape local optima. ✅ Anytime — return the best state seen so far. ✅ Simple.
Cons. ❌ Performance depends strongly on the cooling schedule. ❌ No global guarantee in the slides’ setting. ❌ Random neighbour selection ignores neighbour quality.

6. Local Beam Search

Intuition. Maintain k states. At each step, generate all successors of all k states, then keep the best k overall.

LOCAL-BEAM-SEARCH(k):
    k_current ← { random_initial_state() for _ in range(k) }
    current_value ← max value over k_current
    while not termination():
        all_neighbors ← union(neighbors(c) for c in k_current)
        k_best ← argmax_top_k over all_neighbors of value
        neighbor_value ← max value over k_best
        if neighbor_value ≤ current_value:
            return argmax over k_current of value
        k_current, current_value ← k_best, neighbor_value

Key idea. ⚠️ NOT k independent HCs! All k states share their successors and compete — it’s a single coordinated search that concentrates on promising regions.

Pros. ✅ Information sharing across the k chains. ✅ Faster convergence than k independent HCs when one region is clearly best.
Cons. ❌ Premature convergence: all k beam slots can be taken by neighbours of the same region. ❌ Memory grows linearly with k.

7. Stochastic Beam Search

Intuition. Same skeleton as Local Beam Search, but pick the k next states probabilistically (probability increases with value) instead of deterministically taking the top k.

STOCHASTIC-BEAM-SEARCH(k):
    k_current ← { random_initial_state() for _ in range(k) }
    current_value ← max value over k_current
    while not termination():
        all_neighbors ← union(neighbors(c) for c in k_current)
        k_best ← random.choice(all_neighbors, count=k,
                               weight=value)             # higher value ⇒ higher P
        neighbor_value ← max value over k_best
        if neighbor_value ≤ current_value:
            return argmax over k_current of value
        k_current, current_value ← k_best, neighbor_value

Pros. ✅ Avoids the early-convergence problem of deterministic beam search. ✅ Balances diversity (exploration) and intensification.
Cons. ❌ Choice of weighting function matters. ❌ Slightly more bookkeeping.

8. Genetic Algorithm (simple)

Intuition. Maintain a population of k chromosomes. Each generation, repeatedly pick two parents (proportional to fitness), produce a child via crossover, mutate with small probability, then evict a low-fitness chromosome.

GENETIC-ALGORITHM(k, mutation_probability):
    population ← { random_chromosome() for _ in range(k) }
    while not termination(population):
        for i in 1..k:
            x, y ← random.choice(population, count=2, weight=fitness)
            child ← reproduce(x, y)            # e.g. 1-point crossover
            if random() < mutation_probability:
                child ← mutate(child)
            population ← population ∪ { child }
            remove_random_chromosome(population, weight=1/fitness)
    return argmax over population of fitness

Design choices — see Genetic Algorithms representation & operators for representation, fitness, crossover, mutation.

Pros. ✅ Population diversity → can escape many local optima. ✅ Very flexible — applies to any problem with a fitness function. ✅ Implicit parallel exploration of building blocks.
Cons. ❌ Many hyperparameters (k, mutation rate, selection scheme). ❌ Slow to converge precisely. ❌ Naive crossover breaks for permutations — must use uniform-order crossover or a repair operation.

🐍 Figure — Genetic Algorithm fitness convergence on OneMax (50 generations)

# Pyodide / Obsidian Execute Code: install matplotlib first.
import micropip
await micropip.install("matplotlib")
 
import matplotlib.pyplot as plt
import numpy as np
 
rng = np.random.default_rng(42)
 
# --- OneMax problem: maximize the number of 1s in a binary string ---
BITS         = 40   # chromosome length (max fitness = 40)
POP_SIZE     = 30
GENERATIONS  = 50
MUT_RATE     = 0.02
 
def fitness(chrom): return int(chrom.sum())
 
# Initial random population (very low fitness expected ~ BITS/2)
pop = rng.integers(0, 2, size=(POP_SIZE, BITS))
 
best_hist, mean_hist, worst_hist = [], [], []
 
def select_parent(pop, fits):
    # Fitness-proportionate selection (roulette wheel)
    probs = fits / fits.sum() if fits.sum() > 0 else None
    idx = rng.choice(len(pop), p=probs)
    return pop[idx].copy()
 
for gen in range(GENERATIONS):
    fits = np.array([fitness(c) for c in pop])
    best_hist.append(fits.max())
    mean_hist.append(fits.mean())
    worst_hist.append(fits.min())
 
    new_pop = []
    for _ in range(POP_SIZE):
        p1 = select_parent(pop, fits)
        p2 = select_parent(pop, fits)
        # 1-point crossover
        cut = rng.integers(1, BITS)
        child = np.concatenate([p1[:cut], p2[cut:]])
        # Mutation: flip each bit with prob MUT_RATE
        flips = rng.random(BITS) < MUT_RATE
        child = np.where(flips, 1 - child, child)
        new_pop.append(child)
    pop = np.array(new_pop)
 
gens = np.arange(GENERATIONS)
fig, ax = plt.subplots(figsize=(9, 5.5))
ax.plot(gens, best_hist,  color="#27ae60", lw=2.5, label="best fitness")
ax.plot(gens, mean_hist,  color="#3498db", lw=2.0, label="mean fitness")
ax.plot(gens, worst_hist, color="#e74c3c", lw=1.5, label="worst fitness",
        linestyle="--")
ax.fill_between(gens, worst_hist, best_hist, color="#3498db", alpha=0.1)
 
ax.axhline(BITS, color="#f39c12", lw=1.2, ls=":",
           label=f"global optimum = {BITS}")
 
ax.set_xlabel("Generation", fontsize=11)
ax.set_ylabel("Fitness  (number of 1s in chromosome)", fontsize=11)
ax.set_title(f"Genetic Algorithm on OneMax — population={POP_SIZE}, "
             f"chromosome={BITS} bits, mutation={MUT_RATE}\n"
             "Best/mean climb steadily; worst stays low due to mutation + diversity",
             fontsize=10.5)
ax.grid(True, alpha=0.3)
ax.legend(loc="lower right", fontsize=10, framealpha=0.95)
ax.set_ylim(0, BITS + 2)
plt.tight_layout(); plt.show()
 
print(f"Final best : {best_hist[-1]}/{BITS}")
print(f"Final mean : {mean_hist[-1]:.1f}/{BITS}")
print(f"Final worst: {worst_hist[-1]}/{BITS}")

What this graph shows. Per-generation best, mean and worst fitness of a tiny GA solving the OneMax problem (maximise the number of 1s in a 40-bit string; global optimum = 40). Population of 30 chromosomes, 1-point crossover, mutation rate 2 %.

Things to notice.

The best fitness rises sharply in the first ~15 generations, then asymptotes near the global optimum — the classic GA convergence curve.
The mean tracks the best with a small lag — fitness-proportionate selection pulls the whole population toward high-fitness individuals.
The worst stays well below the best for the entire run. This is healthy — it shows mutation and crossover keep producing low-fitness outliers, preserving the diversity that lets the GA escape local optima.
If the gap between best/worst collapsed to zero, the population would be homogenised (everyone in the same basin) — the GA’s main failure mode.

9. Memetic Hybrid GA

Intuition. A Genetic Algorithm where each newly created child is immediately refined by a local search (typically HC) until it reaches a local optimum.

MEMETIC-ALGORITHM(k, mutation_probability):
    population ← { random_chromosome() for _ in range(k) }
    while not termination(population):
        for i in 1..k:
            x, y ← random.choice(population, count=2, weight=fitness)
            child ← reproduce(x, y)
            if random() < mutation_probability:
                child ← mutate(child)
            child ← HILL-CLIMBING(child)        # local refinement
            population ← population ∪ { child }
            remove_random_chromosome(population, weight=1/fitness)
    return argmax over population of fitness

Pros. ✅ Each individual is locally optimal → GA explores between local optima rather than the raw state space. ✅ Faster convergence in practice.
Cons. ❌ Each generation is much more expensive (HC per child). ❌ Risk of homogeneous population (everyone in the same basin).

Algorithm comparison table — pros & cons

Hill-Climbing family

Algorithm	What it does	Pros	Cons
Hill-Climbing (steepest)	Always jump to best neighbour	✅ Trivial ✅ O(1) memory	❌ Stuck at any local optimum / plateau / ridge
Stochastic HC	Random strictly-improving neighbour	✅ Cheaper than full argmax ✅ Run-to-run diversity	❌ Still requires strict improvement (per slides) → does NOT escape plateaus
First-Choice HC	First improving neighbour found	✅ Great when neighbourhood is huge	❌ Same getting-stuck problem
Random-Restart HC	n independent HCs from random starts	✅ Embarrassingly parallel ✅ P(success) → 1 as n → ∞	❌ Independent restarts duplicate work ❌ No finite-n guarantee

Annealing & Beam family

Algorithm	What it does	Pros	Cons
Simulated Annealing	Accept downhill with `exp(Δ/T)`; T cools over time	✅ Escapes local optima ✅ Anytime	❌ Schedule-sensitive ❌ No global guarantee in slides
Local Beam Search	k states, share successor pool, keep best k	✅ Information sharing across chains	❌ Premature convergence to one region
Stochastic Beam Search	k states, sample successors weighted by fitness	✅ Fixes beam search’s convergence problem	❌ Weighting design matters

Population family

Algorithm	What it does	Pros	Cons
Genetic Algorithm	Population + crossover + mutation	✅ Very flexible ✅ Implicit parallel exploration	❌ Many hyperparameters ❌ Slow to fine-tune ❌ Naive crossover fails for permutations
Memetic / Hybrid GA	GA + HC on every new child	✅ Each individual is locally optimal ✅ Faster convergence	❌ Expensive per generation ❌ Risk of population homogenization

Big-picture comparison

Algorithm	Explores via	Downhill?	Memory	Escapes local optima?	Guarantees
Hill-Climbing	Greedy best neighbour	No	Current state	No	Local optimum
Stochastic HC	Random uphill	No	Current state	No (slides require strict improvement)	Local optimum
First-Choice HC	First uphill found	No	Current state	No	Local optimum
Random-Restart HC	Multiple independent runs	No	Current state per chain	Yes (across restarts)	P(global) → 1 as n → ∞
Simulated Annealing	Random neighbour, downhill OK with `exp(Δ/T)`	Yes (prob.)	Current state + T	Yes	Slides: slower cooling ⇒ higher probability of high-quality solution. No global guarantee in slides.
Local Beam Search	All successors of all k states	No	k states	Limited (within beam)	Local optimum (joint)
Stochastic Beam Search	Weighted random k successors	No	k states	Better than Local Beam	Heuristic
Genetic Algorithm	Crossover + mutation on population	Yes (indirectly via mutation/crossover)	Population k	Yes	Heuristic
Memetic GA	GA + per-child HC	Yes (via GA layer)	Population k	Yes	Heuristic

Quick algorithm chooser

Situation	Recommended algorithm
Smooth landscape, want fast greedy convergence	Hill-Climbing (steepest ascent)
Neighborhood too large to enumerate	First-Choice HC
Want some path diversity across runs	Stochastic HC
Many local optima, parallel resources available	Random-Restart (Parallel) HC
Many local optima, need to escape them in a single run	Simulated Annealing
Want coordinated multi-state search, share information	Local Beam Search (or Stochastic Beam if premature convergence is a risk)
Huge problem, want population diversity, fitness is easy to evaluate	Genetic Algorithm
GA but want each individual locally optimal	Memetic / Hybrid GA
Permutation problem (TSP) with GA	GA + uniform-order crossover (not naive 1-point)
Need to prove a state space has no solution	None of these — local search is incomplete. Use systematic search.

One-sentence summary of each algorithm

Hill-Climbing — always jump to the best neighbour; stop at any local optimum.
Stochastic HC — pick a random strictly-improving neighbour; still stuck at plateaus per slides.
First-Choice HC — accept the first improving neighbour; great for huge neighbourhoods.
Random-Restart (Parallel) HC — n independent HCs from random starts; P(success) → 1 as n → ∞.
Simulated Annealing — accept downhill with exp(Δ/T); T cools from exploratory to greedy.
Local Beam Search — k states share a successor pool and the top k advance; coordinated, not independent.
Stochastic Beam Search — same but sample successors with probability weighted by value; avoids premature convergence.
Genetic Algorithm — population + fitness-weighted reproduction + crossover + mutation.
Memetic / Hybrid GA — GA with a local-search refinement on every new child.

Quizzes

quiz_local-search_30-04-26 — 8 Qs, mixed difficulty
quiz_local-search_2026-04-30 — additional quiz

Targeted exam questions in Questions for Methods of AI

Q1–16 (basic) · Q85 (classical vs. local search) · Q90 (Simulated Annealing) · Q105–109 (deep / exam-trap: exp(Δ/T), Beam Search edge cases, Parallel HC vs. Beam, stalled GA, convergence guarantees)

Atomic notes

Hill Climbing · Local Beam Search · Simulated Annealing · Temperature in neighbour selection · Genetic Algorithms · Paper Review Genetic Algorithms · Stochastic Diffusion Search · cul-de-sac obstacles

Broader context — Classical vs. Local Search

Search algorithms split into two broad families. The lecture session above focuses on local search; the hub material below ties it back into the wider picture of classical (path-finding) search.

Basics of Search Algorithms

Search Problem: Defined as a 4-tuple consisting of:
- Search space (state space)
- Successor relation (transition relation)
- Initial state
- Goal predicate that ends the search
- Optional: Transition costs
Solution: A sequence of states from the initial state to a goal state.
Search Algorithm: Takes a search problem as input and returns a solution or failure indication by creating a search tree over the state space graph.
State Space Representation: As a graph with nodes as states and directed edges as actions.

Classical Search

Goal: Find a sequence of actions from an initial state to a goal state.
Categories:
- Uninformed (blind) search:
  - Random search
  - Systematic search: Depth-first search (DFS), Breadth-first search (BFS); variations: Uniform-cost search, Depth-limited search.
- Informed (heuristic) search: Greedy best-first search, A*-search.

📖 BFS / DFS / Uniform-Cost / A* are defined properly in Uninformed Search — mechanics, completeness/optimality/memory comparison, and the contrast to the local-search methods in this note.

Comparison of Classical and Local Search

	Classical Search	Local Search
Goal	Find a sequence of actions leading from an initial state to a goal state	Find a “nearly” achievable/optimal state
Focus	Sequence of actions	State
Application	When the sequence of actions is important	When the sequence of actions is not important and the search space is too large for exhaustive search
Search space	Considers the entire search space	Considers surrounding states and selects the best one to proceed
Optimization	Finds an optimal solution if one exists	Often finds a suboptimal solution, but can also provide results for inconsistent problems
Complexity	Can be computationally intensive in large search spaces	Can be faster as it does not search the entire space

State Space and Search Tree

The state space can be represented as a graph where nodes represent states and directed edges represent actions.
The search tree corresponds to a tree where each node corresponds to a state in the state space, and edges represent actions. The root of the tree corresponds to the initial state.
The search tree is generated by adding the successors of a state starting from an initial state until a solution is found.
The size of the search space is generally exponential.

Summary

The choice of search algorithm heavily depends on the nature of the problem. When the path to the solution is important, classical search is preferred. If only the solution is important or the search space is too large for classical search, local search might be the better choice. The size of the search space can be very large for both approaches, making heuristics and termination conditions crucial.

Where classical search is used today

A* in video games — nearly every game with NPCs navigating a map uses A* or a variant (HPA*, Theta*). The industry-standard pathfinding algorithm.
Google Maps / GPS routing — modern routers use Contraction Hierarchies (bidirectional Dijkstra with precomputed shortcuts), but A* and ALT (A* with Landmarks) remain in the toolkit for dynamic traffic conditions.
Robotic motion planning — A* on configuration-space grids is the baseline for collision-free robot-arm and mobile-robot paths. Used in factory robots (Fanuc, ABB) and autonomous vehicles.
Package managers — npm, cargo, pip use DFS variants for dependency resolution.
Social network analysis — BFS for “degrees of separation” (Facebook’s friend recommendations historically used BFS-based scoring).
Web crawlers — Google’s crawler uses prioritized BFS/DFS to discover pages.
Chess and Go engines (classical) — Minimax with alpha-beta pruning is a classical search variant. Modern engines (Stockfish, AlphaZero) use MCTS, but alpha-beta is still inside Stockfish’s NNUE evaluation.
SAT solvers — modern SAT solvers (MiniSat, Glucose, Z3) use CDCL (Conflict-Driven Clause Learning), a variant of backtracking search with learning.

Why classical search survives: when you need the optimal path (not just any solution), and the state space is structured enough to apply admissible heuristics — A* with a good heuristic is unbeatable for shortest-path-style problems.

Where classical search was replaced — and by what

Domain	Was classical search, now …	Why
Game AI for complex games (Go, complex strategy)	MCTS + neural networks (AlphaZero, MuZero)	Alpha-beta exploded combinatorially on Go (branching factor ~250). MCTS samples promising lines; neural value/policy nets replace handcrafted heuristics.
Very large road-network routing	Contraction Hierarchies (CH), Customizable Route Planning (CRP)	CH precomputes shortcuts → orders of magnitude faster than plain A* for continent-scale graphs. Used by Google Maps, Bing Maps, OSRM.
Robot motion planning in high-DOF spaces	*Sampling-based planners: RRT, RRT, PRM**	A* on a grid fails when the configuration space is 7-dimensional (a robot arm). Sampling planners scale better with dimension.
Speech recognition decoding (formerly Viterbi/A*)	End-to-end neural decoders (CTC, RNN-T, Whisper)	Neural decoders learn alignment + language model jointly; classical decoding pipelines are mostly retired.
Information retrieval (formerly DFS/BFS through inverted indexes)	Dense retrieval with transformer embeddings (ColBERT, DPR)	Semantic similarity beats keyword traversal for most user queries.
Theorem proving (formerly tree search + heuristics)	LLM-guided proof search (AlphaProof, Lean Copilot)	LLMs suggest plausible next tactics, pruning the search tree massively.

Where classical search still stands alone: when (a) the state space is small enough to enumerate, (b) you need a provably optimal solution with admissibility guarantees, and (c) the heuristic is well-understood. Pathfinding on a 2D grid will never need a neural network — A* solves it optimally in microseconds.

Brain Online

Explorer

Search Algorithms

Search Algorithms

Table of contents

Core Ideas

Glossary — important Local Search vocabulary ⭐

The building blocks

Properties of states

Search-strategy vocabulary

Genetic-algorithm vocabulary

Continuous vs. Discrete state spaces

Neighborhood-size tradeoff

Cooling schedules & Boltzmann intuition for SA

Local Beam Search vs. Parallel Hill-Climbing

Genetic Algorithms representation & operators

Formulas & Notation

Common Exam Traps ⚠️

ALGORITHMS (full reference) ⭐

1. Hill-Climbing (steepest ascent)

2. Stochastic Hill-Climbing

3. First-Choice Hill-Climbing

4. Random-Restart (or Parallel) Hill-Climbing

5. Simulated Annealing (SA) ⭐

6. Local Beam Search

7. Stochastic Beam Search

8. Genetic Algorithm (simple)

9. Memetic Hybrid GA

Algorithm comparison table — pros & cons

Hill-Climbing family

Annealing & Beam family

Population family

Big-picture comparison

Quick algorithm chooser

One-sentence summary of each algorithm

Related Q&A & Notes

Broader context — Classical vs. Local Search

Basics of Search Algorithms

Classical Search

Comparison of Classical and Local Search

State Space and Search Tree

Summary

Where classical search is used today

Where classical search was replaced — and by what

See also

Sources:

Quellen

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis