Search Algorithms

chatbot methods-of-ai

Companion files for exam prep:

Table of contents

  1. Core Ideas
  2. Glossary
  3. Continuous vs. Discrete state spaces
  4. Neighborhood-size tradeoff
  5. Cooling schedules & Boltzmann intuition for SA
  6. Local Beam Search vs. Parallel Hill-Climbing
  7. Genetic Algorithms: representation & operators
  8. Formulas & Notation
  9. Common Exam Traps
  10. ALGORITHMS ⭐ — pseudocode for all variants
  11. Algorithm comparison tables (pros & cons)
  12. Quick algorithm chooser · One-sentence summary of each algorithm
  13. Related Q&A & Notes
  14. Broader context: Classical vs. Local Search
  15. Where classical search is used today · Where classical search was replaced — and by what
  16. See also

Core Ideas

  • Local search keeps only the current state — no path history, no frontier. Goal: find a good state, not a route.
  • Works by moving to neighboring states according to an objective function f(s) to maximize (minimize via −f(s)).
  • Useful when the solution is the state (e.g. 8-Queens, TSP, scheduling), not the path to it.
  • All variants trade exploration (escape local optima) against exploitation (climb toward improvement).
  • Plain Hill-Climbing only finds local optima; SA, beam search and GAs add randomness or population to escape them.

Glossary — important Local Search vocabulary ⭐

These are the words you will see in exam questions. Most are illustrated with the 8-Queens example (state = one queen per column).

The building blocks

State — one complete configuration (not a partial assignment, not a path).

  • 8-Queens example: one full board with one queen per column, e.g. (8,3,7,4,2,5,1,6).

State space — set of all configurations the algorithm could be in. Usually exponentially large.

Neighborhood neighbors(s) — the states reachable from s in one step. The choice of neighborhood is a design decision.

  • 8-Queens large neighborhood: move any queen to any other field in the same column ⇒ 8 × 7 = 56 neighbours per state.
  • 8-Queens small (toroidal) neighborhood: move any queen down by one field ⇒ 8 neighbours per state.

Objective function f(s) (a.k.a. value, fitness) — the number we want to maximize.

  • 8-Queens: negative number of attacking queen pairs (we maximize toward 0), or 28 − attacks for a non-negative GA fitness.

Properties of states

Local optimum — a state where no neighbour is strictly better. Plain HC stops here regardless of whether it’s a global optimum.

Global optimum — best state in the whole state space.

Plateau — a flat region: many neighbours have the same value as the current state.

Ridge — a sequence of local maxima where no single neighbour improves, but a longer detour would.

Search-strategy vocabulary

Uphill / downhill move — moving to a neighbour with higher / lower value (assuming maximization).

Exploration vs. exploitation — willingness to take downhill / random moves vs. greediness about improvement.

Temperature T — in SA, scales the willingness to take downhill moves. High T → almost random walk; T → 0 → behaves like First-Choice HC.

Cooling schedule schedule(t) — function that decreases T over time. Slides mention Stepwise Linear and Delayed Stepwise Linear only.

Beam — in beam search, the set of k currently active states. All k beam states share the same successor pool each round.

Genetic-algorithm vocabulary

Population — current set of k states (chromosomes).

Chromosome — a state encoded as a string of genes (e.g. 83742516 for an 8-Queens board).

Fitness — the GA name for the objective function. Selection probability scales with fitness.

Reproduction / Crossover — combine two parent chromosomes into one (or two) children.

  • 1-point crossover: pick a split point, swap tails.
  • Uniform-order crossover: for permutations — fill child by binary template from parent 1, fill remaining slots in the order they appear in parent 2.

Mutation — randomly flip one gene with small probability. Maintains diversity.

Memetic algorithm — GA + a local-search step (e.g. HC) applied to each new child.


Continuous vs. Discrete state spaces

Hill-Climbing is the discrete analogue of gradient ascent. The key differences (slide 10):

AspectContinuous spaceDiscrete space
Neighborhoodε-neighborhood (uncountably many)Discrete, often finite
Best stepGradient = direction of steepest ascentFound by enumeration of neighbours
Cost per stepCompute gradientEnumerate & compare neighbours

⚠️ In discrete spaces, the state space is typically exponentially large, but the neighborhood at any single state is small — that’s why local search works at all.


Neighborhood-size tradeoff

A design choice with direct consequences (slide 19):

Small neighborhoodLarge neighborhood
✅ Faster to enumerate per step❌ Slower to enumerate per step
❌ Greater risk of getting stuck in local optima✅ Smaller risk of getting stuck in local optima

8-Queens illustration: the toroidal “move-one-down” neighborhood is only 8 states per step (cheap) but offers very few escape routes. The “any-queen-to-any-row” neighborhood is 56 states per step (expensive) but offers many escape routes.


Cooling schedules & Boltzmann intuition for SA

Acceptance probability for a downhill move: P = exp(Δ/T), where Δ = value(neighbor) − value(current) < 0 and T > 0.

Because Δ ≤ 0 for downhill, 0 ≤ exp(Δ/T) ≤ 1 always.

Effect of Δ (at fixed T = 1):

Δexp(Δ/T)
01.00
−10.37
−20.14

Effect of T (at fixed Δ = −2):

Texp(Δ/T)
1.00
1000.98
100.82
10.14
0.12 × 10⁻⁹ ≈ 0

So at high T SA behaves like a random walk (almost everything accepted); at low T it behaves like First-Choice HC (only improvements accepted). The temperature interpolates smoothly between exploration and exploitation.

Cooling schedules in the slides (slide 27):

  • Stepwise Linear — start with arbitrary T, decrease by a constant c every step.
  • Delayed Stepwise Linear — start with arbitrary T, decrease by c every k-th step.

⚠️ Exponential cooling (T ← α·T) is a common textbook choice but is NOT in these slides. Don’t cite it in the exam.

Effect of schedule speed: A slower cooling schedule increases the probability of finding a high-quality solution — but increases runtime. The slides do NOT claim a global-optimum guarantee. (The Geman & Geman log-cooling theorem is a textbook fact from R&N, not from these slides.)

What this graph shows. Acceptance probability P = exp(Δ/T) for three downhill move sizes (Δ = −1 mild, −5 medium, −10 severe) as the temperature T cools from 10 down to 0.1. Time flows left → right (T decreasing).

Things to notice.

  • At high T (left), even severe Δ = −10 moves are accepted with ~37% probability — the search behaves like an almost-random walk.
  • At low T (right), only Δ = −1 still has a tiny chance; Δ = −10 collapses to effectively zero — the search has frozen into First-Choice HC behaviour.
  • The gap between the curves widens as T cools: temperature selectively filters out bigger downhill moves first. This is exactly the exploration→exploitation interpolation Boltzmann gives us for free.

Local Beam Search vs. Parallel Hill-Climbing

Single most common exam trap. Both maintain k states; they differ in what gets compared.

Parallel Hill-ClimbingLocal Beam Search
Per iterationEach of the k states independently picks its own best neighborPool ALL neighbours of ALL k states, then pick the global top k
Information sharingNone — k truly independent HCsStates compete for slots → search concentrates on the most promising region
RiskSame as HC per chain (k local optima)Premature convergence: all k states may collapse into one region of the space

Why it matters. Local Beam Search is not “k HCs in parallel” — it is one coordinated search where good states crowd out bad ones. Stochastic Beam Search fixes the early-convergence issue by sampling the k successors with probability proportional to value.


Genetic Algorithms representation & operators

A GA needs four design choices (slide 42):

  1. Representation — encode each state as a chromosome (string of genes).
    • 8-Queens: string of 8 digits from {1,…,8}, i-th digit = row of i-th queen.
    • TSP on {0,1,2,3,4}: string of 4 digits (start fixed at 0), each digit = next node visited.
  2. Fitness function — non-negative version of the objective.
    • 8-Queens: 28 − attacking pairs.
    • TSP: upper_bound − tour_cost (e.g. 35 − cost).
  3. Reproduction (crossover) rule — combine two parents.
    • General: 1-point crossover.
    • Permutations: naive crossover yields invalid solutions (duplicate / missing nodes) → use uniform-order crossover or a repair operation.
  4. Mutation rule — random small change.
    • 8-Queens: replace one digit with a random row.
    • TSP: swap two genes.

Fitness-proportionate selection. Parents are picked with probability proportional to fitness — better individuals reproduce more often, but worse ones still survive sometimes (preserving diversity).

Uniform-order crossover (for permutations):

  1. Pick two parents.
  2. Generate a random binary template the length of the chromosome.
  3. Child 1: at every 1-position, copy parent 1’s gene. Fill the remaining slots with the missing values in the order they appear in parent 2.
  4. Child 2: swap parent roles.

Formulas & Notation

SymbolMeaning
f(s)Objective function / value of state s
neighbors(s)Set of states reachable in one step
Δ = f(s') − f(s)Value change (negative = downhill)
exp(Δ/T)SA acceptance probability for a downhill move
schedule(t)Cooling function (Stepwise Linear / Delayed Stepwise Linear)
TTemperature (SA)
kBeam width (Beam Search) / population size (GA)
fitness(c)GA fitness of chromosome c

Common Exam Traps ⚠️

  • Local Beam Search ≠ k parallel HCs. In beam search all k states share their successor pool and compete; in parallel HC each chain is independent.
  • Plain HC never accepts a downhill move. Stochastic HC (per slides) requires a strictly improving neighbour too — so it also does not escape plateaus by itself. Only SA and (indirectly) GAs accept worse states.
  • Random-restart HC is called “Random-restart (or parallel) Hill-Climbing” in the slides (slide 22). It guarantees finding the global optimum only as n → ∞ — not with a finite number of restarts.
  • Cooling schedule slow-down: the slides only claim it increases the probability of a high-quality solution — they do not state a global-optimum guarantee. The slow-cooling global-optimum theorem (Geman & Geman) is R&N, not these slides.
  • Cooling schedule list: slides give only Stepwise Linear and Delayed Stepwise Linear. Do not write “exponential cooling” in an exam answer about this lecture.
  • Acceptance probability sign: in exp(Δ/T) the Δ for downhill is negative, so the exponent is non-positive and the probability lies in (0, 1].
  • Neighborhood-size tradeoff: small = fast per step but more local optima; large = slower per step but fewer local optima.
  • Permutation crossover: naive 1-point crossover on a permutation produces invalid solutions (duplicates / missing values). Use uniform-order crossover or a repair operation.
  • First-Choice HC is useful when the neighborhood is too large to enumerate — it picks the first improving neighbour rather than the best.
  • SA at high T behaves like random walk; at low T like First-Choice HC. This interpolation is the whole point of the temperature schedule.

ALGORITHMS (full reference) ⭐

Pseudocode + intuition + complexity for every algorithm you might trace by hand in the exam. Organised by family:

  • Hill-Climbing family: #1 Steepest · #2 Stochastic · #3 First-Choice · #4 Random-Restart / Parallel
  • Annealing: #5 Simulated Annealing
  • Beam family: #6 Local Beam · #7 Stochastic Beam
  • Population family: #8 Genetic Algorithm · #9 Memetic / Hybrid

1. Hill-Climbing (steepest ascent)

Intuition. At each step, jump to the neighbour with the highest value. Stop when no neighbour improves.

HILL-CLIMBING(start):
    current ← start
    while not termination():
        neighbor ← argmax over n ∈ neighbors(current) of value(n)
        if value(neighbor) ≤ value(current):
            return current        # local optimum / plateau
        current ← neighbor

Complexity per step. O(|neighbors(s)|) for the argmax. Number of steps depends on the landscape.

Pros. ✅ Trivial to implement. ✅ Constant memory (just the current state).
Cons. ❌ Gets stuck at any local optimum, plateau, or ridge. ❌ No way to escape.

Termination. Naturally on a local maximum, or on a bound (max steps / wall-clock time).

What this graph shows. Four hill-climbing runs from different starting positions on the same multi-modal 1D landscape. Each chain greedily steps to whichever neighbour (x − step, x + step) is higher and stops when neither is.

Things to notice.

  • The green run (start x=5.8) is the lucky one — it climbs the basin of the global peak around x ≈ 7.
  • The red run (start x=0.5) climbs the first local maximum it finds and stops. It has no idea a much higher peak exists three valleys to the right.
  • The orange and blue runs illustrate the same trap from different sides — each finds its closest local optimum and freezes.
  • HC’s quality depends entirely on where you start. This is exactly the motivation for Random-Restart HC (try many starts), Simulated Annealing (occasionally accept downhill), and Genetic Algorithms (population diversity).

2. Stochastic Hill-Climbing

Intuition. Instead of the best neighbour, pick a random strictly improving neighbour. Optionally weight selection probability by how much better each neighbour is.

STOCHASTIC-HC(start):
    current ← start
    while not termination():
        candidates ← { n ∈ neighbors(current) : value(n) > value(current) }
        if candidates = ∅:
            return current        # no strictly improving neighbour
        neighbor ← random.choice(candidates)   # may weight by value
        current ← neighbor

Pros. ✅ Cheaper per step than steepest ascent (no full argmax needed). ✅ Adds some path diversity over multiple runs.
Cons. ❌ Per slides, still requires strictly improving neighbours — so it does NOT escape plateaus or local optima. ❌ Compromise between random search and Hill-Climbing.


3. First-Choice Hill-Climbing

Intuition. Generate neighbours one at a time and accept the first one that improves. Useful when the neighbourhood is too large to enumerate.

FIRST-CHOICE-HC(start):
    current ← start
    while not termination():
        for neighbor in random_order(neighbors(current)):
            if value(neighbor) > value(current):
                current ← neighbor
                break
        else:
            return current        # no improving neighbour found

Pros. ✅ O(1) per step when neighbourhoods are huge. ✅ Very simple.
Cons. ❌ Same getting-stuck problem as plain HC.


4. Random-Restart (or Parallel) Hill-Climbing

Intuition. Run n independent HCs from random starting states; return the best result. Each restart escapes a different local optimum.

RANDOM-RESTART-HC(n):
    best ← None
    for i in 1..n:
        s ← HILL-CLIMBING(random_initial_state())
        if best is None or value(s) > value(best):
            best ← s
    return best

Guarantee. P(finding global optimum) → 1 as n → ∞. With finite n, no guarantee.

Pros. ✅ Embarrassingly parallel. ✅ Each restart is cheap.
Cons. ❌ Independent restarts duplicate effort (no information shared). ❌ No global guarantee for finite n.

⚠️ The slides specifically call this “Random-restart (or parallel) Hill-Climbing” (slide 22).

What these panels show. Left: the number of attacking-queens pairs (objective we want to drive to 0) along five independent steepest-ascent HC runs, each starting from a fresh random board. Right: the final board from the best run.

Things to notice.

  • Each curve monotonically decreases — steepest-ascent HC never accepts a worse neighbour.
  • Most curves flatline before reaching 0 — they hit a local optimum where no single-queen move reduces conflicts. Plain HC has no escape.
  • Random-Restart HC exploits exactly this: with enough restarts, at least one trajectory will dive all the way to 0 (the global optimum). Russell & Norvig report ~14% of random 8-Queens HCs solve the puzzle — so ~7 restarts give you a >95% success rate. This is the empirical version of “P(success) → 1 as n → ∞“.

5. Simulated Annealing (SA) ⭐

Intuition. Like HC, but accept downhill moves with probability exp(Δ/T). T decreases over time (cooling), so the search starts exploratory and becomes greedy.

SIMULATED-ANNEALING(start, schedule):
    current ← start
    for time in 1, 2, 3, ...:
        T ← schedule(time)
        if T == 0:
            return current
        neighbor ← random.choice(neighbors(current))
        Δ ← value(neighbor) − value(current)
        if Δ > 0:
            current ← neighbor
        else if random.bernoulli(exp(Δ/T)):
            current ← neighbor

Key properties.

  • High T: nearly all moves accepted ⇒ random walk (exploration).
  • Low T: only improving moves accepted ⇒ behaves like First-Choice HC (exploitation).
  • T → 0: search freezes at whatever state it’s in.

Cooling schedules (slides only):

  • Stepwise LinearT ← T − c every step.
  • Delayed Stepwise LinearT ← T − c every k-th step.

Guarantees per slides. A slower cooling schedule increases the probability of finding a high-quality solution. The slides do NOT claim a global-optimum guarantee. (Textbook R&N + Geman & Geman do, but only with log-cooling and infinite time — not in this lecture.)

Pros. ✅ Can escape local optima. ✅ Anytime — return the best state seen so far. ✅ Simple.
Cons. ❌ Performance depends strongly on the cooling schedule. ❌ No global guarantee in the slides’ setting. ❌ Random neighbour selection ignores neighbour quality.


Intuition. Maintain k states. At each step, generate all successors of all k states, then keep the best k overall.

LOCAL-BEAM-SEARCH(k):
    k_current ← { random_initial_state() for _ in range(k) }
    current_value ← max value over k_current
    while not termination():
        all_neighbors ← union(neighbors(c) for c in k_current)
        k_best ← argmax_top_k over all_neighbors of value
        neighbor_value ← max value over k_best
        if neighbor_value ≤ current_value:
            return argmax over k_current of value
        k_current, current_value ← k_best, neighbor_value

Key idea. ⚠️ NOT k independent HCs! All k states share their successors and compete — it’s a single coordinated search that concentrates on promising regions.

Pros. ✅ Information sharing across the k chains. ✅ Faster convergence than k independent HCs when one region is clearly best.
Cons. ❌ Premature convergence: all k beam slots can be taken by neighbours of the same region. ❌ Memory grows linearly with k.


Intuition. Same skeleton as Local Beam Search, but pick the k next states probabilistically (probability increases with value) instead of deterministically taking the top k.

STOCHASTIC-BEAM-SEARCH(k):
    k_current ← { random_initial_state() for _ in range(k) }
    current_value ← max value over k_current
    while not termination():
        all_neighbors ← union(neighbors(c) for c in k_current)
        k_best ← random.choice(all_neighbors, count=k,
                               weight=value)             # higher value ⇒ higher P
        neighbor_value ← max value over k_best
        if neighbor_value ≤ current_value:
            return argmax over k_current of value
        k_current, current_value ← k_best, neighbor_value

Pros. ✅ Avoids the early-convergence problem of deterministic beam search. ✅ Balances diversity (exploration) and intensification.
Cons. ❌ Choice of weighting function matters. ❌ Slightly more bookkeeping.


8. Genetic Algorithm (simple)

Intuition. Maintain a population of k chromosomes. Each generation, repeatedly pick two parents (proportional to fitness), produce a child via crossover, mutate with small probability, then evict a low-fitness chromosome.

GENETIC-ALGORITHM(k, mutation_probability):
    population ← { random_chromosome() for _ in range(k) }
    while not termination(population):
        for i in 1..k:
            x, y ← random.choice(population, count=2, weight=fitness)
            child ← reproduce(x, y)            # e.g. 1-point crossover
            if random() < mutation_probability:
                child ← mutate(child)
            population ← population ∪ { child }
            remove_random_chromosome(population, weight=1/fitness)
    return argmax over population of fitness

Design choices — see Genetic Algorithms representation & operators for representation, fitness, crossover, mutation.

Pros. ✅ Population diversity → can escape many local optima. ✅ Very flexible — applies to any problem with a fitness function. ✅ Implicit parallel exploration of building blocks.
Cons. ❌ Many hyperparameters (k, mutation rate, selection scheme). ❌ Slow to converge precisely. ❌ Naive crossover breaks for permutations — must use uniform-order crossover or a repair operation.

What this graph shows. Per-generation best, mean and worst fitness of a tiny GA solving the OneMax problem (maximise the number of 1s in a 40-bit string; global optimum = 40). Population of 30 chromosomes, 1-point crossover, mutation rate 2 %.

Things to notice.

  • The best fitness rises sharply in the first ~15 generations, then asymptotes near the global optimum — the classic GA convergence curve.
  • The mean tracks the best with a small lag — fitness-proportionate selection pulls the whole population toward high-fitness individuals.
  • The worst stays well below the best for the entire run. This is healthy — it shows mutation and crossover keep producing low-fitness outliers, preserving the diversity that lets the GA escape local optima.
  • If the gap between best/worst collapsed to zero, the population would be homogenised (everyone in the same basin) — the GA’s main failure mode.

9. Memetic Hybrid GA

Intuition. A Genetic Algorithm where each newly created child is immediately refined by a local search (typically HC) until it reaches a local optimum.

MEMETIC-ALGORITHM(k, mutation_probability):
    population ← { random_chromosome() for _ in range(k) }
    while not termination(population):
        for i in 1..k:
            x, y ← random.choice(population, count=2, weight=fitness)
            child ← reproduce(x, y)
            if random() < mutation_probability:
                child ← mutate(child)
            child ← HILL-CLIMBING(child)        # local refinement
            population ← population ∪ { child }
            remove_random_chromosome(population, weight=1/fitness)
    return argmax over population of fitness

Pros. ✅ Each individual is locally optimal → GA explores between local optima rather than the raw state space. ✅ Faster convergence in practice.
Cons. ❌ Each generation is much more expensive (HC per child). ❌ Risk of homogeneous population (everyone in the same basin).


Algorithm comparison table — pros & cons

Hill-Climbing family

AlgorithmWhat it doesProsCons
Hill-Climbing (steepest)Always jump to best neighbour✅ Trivial
✅ O(1) memory
❌ Stuck at any local optimum / plateau / ridge
Stochastic HCRandom strictly-improving neighbour✅ Cheaper than full argmax
✅ Run-to-run diversity
❌ Still requires strict improvement (per slides) → does NOT escape plateaus
First-Choice HCFirst improving neighbour found✅ Great when neighbourhood is huge❌ Same getting-stuck problem
Random-Restart HCn independent HCs from random starts✅ Embarrassingly parallel
✅ P(success) → 1 as n → ∞
❌ Independent restarts duplicate work
❌ No finite-n guarantee

Annealing & Beam family

AlgorithmWhat it doesProsCons
Simulated AnnealingAccept downhill with exp(Δ/T); T cools over time✅ Escapes local optima
✅ Anytime
❌ Schedule-sensitive
❌ No global guarantee in slides
Local Beam Searchk states, share successor pool, keep best k✅ Information sharing across chains❌ Premature convergence to one region
Stochastic Beam Searchk states, sample successors weighted by fitness✅ Fixes beam search’s convergence problem❌ Weighting design matters

Population family

AlgorithmWhat it doesProsCons
Genetic AlgorithmPopulation + crossover + mutation✅ Very flexible
✅ Implicit parallel exploration
❌ Many hyperparameters
❌ Slow to fine-tune
❌ Naive crossover fails for permutations
Memetic / Hybrid GAGA + HC on every new child✅ Each individual is locally optimal
✅ Faster convergence
❌ Expensive per generation
❌ Risk of population homogenization

Big-picture comparison

AlgorithmExplores viaDownhill?MemoryEscapes local optima?Guarantees
Hill-ClimbingGreedy best neighbourNoCurrent stateNoLocal optimum
Stochastic HCRandom uphillNoCurrent stateNo (slides require strict improvement)Local optimum
First-Choice HCFirst uphill foundNoCurrent stateNoLocal optimum
Random-Restart HCMultiple independent runsNoCurrent state per chainYes (across restarts)P(global) → 1 as n → ∞
Simulated AnnealingRandom neighbour, downhill OK with exp(Δ/T)Yes (prob.)Current state + TYesSlides: slower cooling ⇒ higher probability of high-quality solution. No global guarantee in slides.
Local Beam SearchAll successors of all k statesNok statesLimited (within beam)Local optimum (joint)
Stochastic Beam SearchWeighted random k successorsNok statesBetter than Local BeamHeuristic
Genetic AlgorithmCrossover + mutation on populationYes (indirectly via mutation/crossover)Population kYesHeuristic
Memetic GAGA + per-child HCYes (via GA layer)Population kYesHeuristic

Quick algorithm chooser

SituationRecommended algorithm
Smooth landscape, want fast greedy convergenceHill-Climbing (steepest ascent)
Neighborhood too large to enumerateFirst-Choice HC
Want some path diversity across runsStochastic HC
Many local optima, parallel resources availableRandom-Restart (Parallel) HC
Many local optima, need to escape them in a single runSimulated Annealing
Want coordinated multi-state search, share informationLocal Beam Search (or Stochastic Beam if premature convergence is a risk)
Huge problem, want population diversity, fitness is easy to evaluateGenetic Algorithm
GA but want each individual locally optimalMemetic / Hybrid GA
Permutation problem (TSP) with GAGA + uniform-order crossover (not naive 1-point)
Need to prove a state space has no solutionNone of these — local search is incomplete. Use systematic search.

One-sentence summary of each algorithm

  • Hill-Climbing — always jump to the best neighbour; stop at any local optimum.
  • Stochastic HC — pick a random strictly-improving neighbour; still stuck at plateaus per slides.
  • First-Choice HC — accept the first improving neighbour; great for huge neighbourhoods.
  • Random-Restart (Parallel) HC — n independent HCs from random starts; P(success) → 1 as n → ∞.
  • Simulated Annealing — accept downhill with exp(Δ/T); T cools from exploratory to greedy.
  • Local Beam Search — k states share a successor pool and the top k advance; coordinated, not independent.
  • Stochastic Beam Search — same but sample successors with probability weighted by value; avoids premature convergence.
  • Genetic Algorithm — population + fitness-weighted reproduction + crossover + mutation.
  • Memetic / Hybrid GA — GA with a local-search refinement on every new child.

Quizzes

Targeted exam questions in Questions for Methods of AI

  • Q1–16 (basic) · Q85 (classical vs. local search) · Q90 (Simulated Annealing) · Q105–109 (deep / exam-trap: exp(Δ/T), Beam Search edge cases, Parallel HC vs. Beam, stalled GA, convergence guarantees)

Atomic notes


Search algorithms split into two broad families. The lecture session above focuses on local search; the hub material below ties it back into the wider picture of classical (path-finding) search.

Basics of Search Algorithms

  • Search Problem: Defined as a 4-tuple consisting of:

    • Search space (state space)
    • Successor relation (transition relation)
    • Initial state
    • Goal predicate that ends the search
    • Optional: Transition costs
  • Solution: A sequence of states from the initial state to a goal state.

  • Search Algorithm: Takes a search problem as input and returns a solution or failure indication by creating a search tree over the state space graph.

  • State Space Representation: As a graph with nodes as states and directed edges as actions.

  • Goal: Find a sequence of actions from an initial state to a goal state.
  • Categories:
    • Uninformed (blind) search:
      • Random search
      • Systematic search: Depth-first search (DFS), Breadth-first search (BFS); variations: Uniform-cost search, Depth-limited search.
    • Informed (heuristic) search: Greedy best-first search, A*-search.

📖 BFS / DFS / Uniform-Cost / A* are defined properly in Uninformed Search — mechanics, completeness/optimality/memory comparison, and the contrast to the local-search methods in this note.

Classical SearchLocal Search
GoalFind a sequence of actions leading from an initial state to a goal stateFind a “nearly” achievable/optimal state
FocusSequence of actionsState
ApplicationWhen the sequence of actions is importantWhen the sequence of actions is not important and the search space is too large for exhaustive search
Search spaceConsiders the entire search spaceConsiders surrounding states and selects the best one to proceed
OptimizationFinds an optimal solution if one existsOften finds a suboptimal solution, but can also provide results for inconsistent problems
ComplexityCan be computationally intensive in large search spacesCan be faster as it does not search the entire space

State Space and Search Tree

  • The state space can be represented as a graph where nodes represent states and directed edges represent actions.
  • The search tree corresponds to a tree where each node corresponds to a state in the state space, and edges represent actions. The root of the tree corresponds to the initial state.
  • The search tree is generated by adding the successors of a state starting from an initial state until a solution is found.
  • The size of the search space is generally exponential.

Summary

The choice of search algorithm heavily depends on the nature of the problem. When the path to the solution is important, classical search is preferred. If only the solution is important or the search space is too large for classical search, local search might be the better choice. The size of the search space can be very large for both approaches, making heuristics and termination conditions crucial.


Where classical search is used today

  • A* in video games — nearly every game with NPCs navigating a map uses A* or a variant (HPA*, Theta*). The industry-standard pathfinding algorithm.
  • Google Maps / GPS routing — modern routers use Contraction Hierarchies (bidirectional Dijkstra with precomputed shortcuts), but A* and ALT (A* with Landmarks) remain in the toolkit for dynamic traffic conditions.
  • Robotic motion planning — A* on configuration-space grids is the baseline for collision-free robot-arm and mobile-robot paths. Used in factory robots (Fanuc, ABB) and autonomous vehicles.
  • Package managers — npm, cargo, pip use DFS variants for dependency resolution.
  • Social network analysis — BFS for “degrees of separation” (Facebook’s friend recommendations historically used BFS-based scoring).
  • Web crawlers — Google’s crawler uses prioritized BFS/DFS to discover pages.
  • Chess and Go engines (classical) — Minimax with alpha-beta pruning is a classical search variant. Modern engines (Stockfish, AlphaZero) use MCTS, but alpha-beta is still inside Stockfish’s NNUE evaluation.
  • SAT solvers — modern SAT solvers (MiniSat, Glucose, Z3) use CDCL (Conflict-Driven Clause Learning), a variant of backtracking search with learning.

Why classical search survives: when you need the optimal path (not just any solution), and the state space is structured enough to apply admissible heuristics — A* with a good heuristic is unbeatable for shortest-path-style problems.


Where classical search was replaced — and by what

DomainWas classical search, now …Why
Game AI for complex games (Go, complex strategy)MCTS + neural networks (AlphaZero, MuZero)Alpha-beta exploded combinatorially on Go (branching factor ~250). MCTS samples promising lines; neural value/policy nets replace handcrafted heuristics.
Very large road-network routingContraction Hierarchies (CH), Customizable Route Planning (CRP)CH precomputes shortcuts → orders of magnitude faster than plain A* for continent-scale graphs. Used by Google Maps, Bing Maps, OSRM.
Robot motion planning in high-DOF spacesSampling-based planners: RRT, RRT*, PRMA* on a grid fails when the configuration space is 7-dimensional (a robot arm). Sampling planners scale better with dimension.
Speech recognition decoding (formerly Viterbi/A*)End-to-end neural decoders (CTC, RNN-T, Whisper)Neural decoders learn alignment + language model jointly; classical decoding pipelines are mostly retired.
Information retrieval (formerly DFS/BFS through inverted indexes)Dense retrieval with transformer embeddings (ColBERT, DPR)Semantic similarity beats keyword traversal for most user queries.
Theorem proving (formerly tree search + heuristics)LLM-guided proof search (AlphaProof, Lean Copilot)LLMs suggest plausible next tactics, pruning the search tree massively.

Where classical search still stands alone: when (a) the state space is small enough to enumerate, (b) you need a provably optimal solution with admissibility guarantees, and (c) the heuristic is well-understood. Pathfinding on a 2D grid will never need a neural network — A* solves it optimally in microseconds.


See also

Lernzettel: lernzettel_local-search_30-04-26
Quiz: quiz_local-search_30-04-26 · quiz_local-search_2026-04-30
Superlink: Methods of AI Lecture · 611 📠Machine Learning
Questions hub: Questions for Methods of AI

Status:
Tags: science methods-of-ai ai-generated
610 🤖Artificial Intelligence, Künstliche Intelligenz

Local Search family: Hill Climbing · Simulated Annealing · Genetic Algorithms · Stochastic Diffusion Search · Temperature in neighbour selection · Local Beam Search

Related topic-references: Constraint Satisfaction Problems

Sources:

Quellen

Erstellt: 14-02-25 12:06