Local Beam Search

methods-of-ai

Local Beam Search maintains k states simultaneously instead of one. At each step, it generates all successors of all k current states, then keeps the best k overall. This pools information across the k beams — promising regions get more search effort, dead-end beams die out.

It looks like “k parallel Hill Climbs” but is fundamentally different: states share information, so beams concentrate on promising regions. This is also its weakness — beams can collapse onto a single region, losing diversity.


🔦 What IS a “beam” — and why do we need k of them?

A "beam" = one candidate solution being tracked.

Think of a beam as a flashlight pointed at the search space. Hill Climbing has 1 flashlight (one current state). Local Beam Search has k flashlights at once, all moving in parallel.

Why k > 1? A single flashlight (Hill Climbing) gets stuck the moment it walks into a local optimum — there’s nothing better in its immediate neighborhood, so it stops. With k flashlights spread across the landscape, the chance that at least one lands in the global optimum’s basin is dramatically higher.

The concrete object: A beam is just a value — for our example f(x) = x² + 10·sin(x), each beam is a single number x. For k=4, we track 4 such numbers. For an 8-queens problem, each beam would be a full board configuration (a state). For LLM text generation, each beam is a partial sentence.


🎯 How are the next k beams chosen? — the central mechanism

This is the part that confuses people. Here is the exact procedure for one iteration, with k=4 and b=5 (each beam generates 5 successors):

Step 1 — Generate candidates.
Each of the 4 current beams produces b=5 successors (small random perturbations).
→ Total candidate pool: 4 × 5 = 20 candidates.

Step 2 — Pool them ALL together.
Throw all 20 candidates into a single bucket. Forget which parent produced which child. They’re all equal contenders now.

Step 3 — Rank by fitness.
Sort all 20 candidates by f-value (smallest = best, since we’re minimizing).

Step 4 — Keep the top k.
Take the 4 best out of all 20. These become the new beams for the next iteration. The other 16 are discarded.

The single most important consequence

Because all 20 candidates compete in one global ranking, a single “good” parent can have all 4 of its children chosen — wiping out the other 3 lineages entirely. This is the “rich-get-richer” effect and the source of LBS’s diversity-collapse problem.

Example: If Parent A is near the global min (f ≈ −7) and Parents B, C, D are higher up (f ≈ 10, 15, 20), Parent A’s 5 children will likely all have lower f-values than any of B/C/D’s children → all 4 survivors come from Parent A → B, C, D’s lineages are extinct.

Visual cue in the step-by-step plot below:

  • Each candidate dot is colored by its parent.
  • The 4 survivors get a black ring.
  • Count the colors inside the rings — if all 4 are the same color, that parent monopolized the next generation. That’s the collapse mechanism in action.

The algorithm

1. INITIALIZE  — start with k random states
2. EXPAND      — generate ALL successors of ALL k current states
3. SELECT      — pick the best k from the combined successor pool
4. CHECK       — if any state is a goal → return it
                  if no improvement possible → terminate
5. REPEAT

Two key differences from k independent Hill Climbs:

  • Step 3 pools successors across all beams — winning states from one beam can “infect” another
  • A high-fitness region naturally attracts more beams over time → rich get richer

⚠️ Beam Search vs. Parallel Hill Climbing — the classic exam trap

This is one of the most-asked exam comparisons:

Parallel Hill ClimbingLocal Beam Search
k states
Each stepEach state picks ITS OWN best successorAll k·b successors pooled, top k kept
Information sharing between beams❌ — independent searches✓ — globally best states win
Failure modeEach beam stuck in its own local optimumAll beams collapse into one region (diversity collapse)

⚠️ Pure Local Beam Search often degenerates into “k copies of one state” because as soon as one beam finds a hot region, all k beams flock there. The fix is Stochastic Beam Search.

Instead of deterministically picking the top k successors, sample k of them with probability proportional to fitness. This keeps diversity — weaker beams sometimes survive, preserving exploration.

It’s essentially the same trick as fitness-proportional selection in Genetic Algorithms — population-based search relies on stochastic selection to avoid premature convergence.

Step-by-step visualization — the selection mechanism made visible

The single thing that confuses people about LBS is how the next k beams get chosen. Here’s a 6-panel walkthrough on f(x) = x² + 10·sin(x). Each panel shows ONE iteration:

  • Big colored circles = the k current beams
  • Small gray dots = ALL k·b successors generated this step (the candidate pool)
  • Highlighted with black ring = the top-k selected for the next iteration

Watch how the pool of small dots gets filtered down to k big circles — and how those k often all come from one or two parents (rich-get-richer effect).

What to watch for:

  • iter 0: 4 colored beams scattered randomly
  • iter 1: 20 small gray dots appear (each beam shot 5 random neighbors). The 4 with black rings are the survivors — they go on as the colored beams in iter 2.
  • iter 2-3: Notice that often ALL 4 survivors come from just 1-2 parents — the parents in the good region produced the best children, so their lineage dominates.
  • iter 4-5: All 4 beams clustered tightly in one region. Diversity has collapsed.

This is the central mechanism: at each step, all parents’ successors compete globally, not within-parent. So a beam in a bad region whose children are bad → that beam’s lineage dies out. A beam in a great region whose children are great → that beam’s lineage gets multiplied.

Direct comparison: Local Beam Search vs. Parallel Hill Climbing

Same starting points, same neighborhood function, only the selection rule differs:

The decisive contrast:

  • Parallel HC (left): 4 trajectories stay independent. Each beam descends into whichever basin it started in — 2 beams end at the global min, 2 end at the local min. No information transfer.
  • LBS (right): 4 trajectories collapse onto one region — typically the one that produced the best successors earliest. You might end up with all 4 beams in the global min (good!) — or all 4 stuck in the local min (catastrophic!).

This is exactly why Stochastic Beam Search exists — without stochasticity in selection, LBS gambles its diversity away.

See it in code (original convergence-tracking view)

What to see:

  • Left panel: deterministic beams (blue) quickly collapse onto a single region — within 10 iterations all 5 beams are within 0.1 of each other. Stochastic beams (red) maintain spread for much longer.
  • Right panel: spread of deterministic beam drops to ~0 within 5–10 iterations (diversity collapse). Stochastic beam keeps spread > 1 for much of the run.
  • Trade-off: deterministic converges faster on the best basin it finds first; stochastic explores longer but is more likely to find the global optimum.

Variants

VariantTwistWhen to use
Local Beam SearchTop k chosen deterministicallyConvex-ish landscapes; quick convergence
Stochastic Beam SearchTop k chosen probabilisticallyRugged landscapes; preserve exploration
Beam Search (classical, with paths)Same idea but keeps full paths, not just statesSequence generation (decoding)
Diverse Beam SearchPenalize beams that are too similarLLM decoding when you want variety

Where (variants of) Beam Search are used today

The “states + top-k” pattern is everywhere in modern AI:

  • LLM decoding — when GPT/Claude/Gemini generates text, they use beam search (or its successors) to find probable token sequences. Output diversity comes from temperature sampling, top-p, top-k, etc.
  • Speech recognition — Whisper and other ASR systems use beam search to decode token sequences from audio
  • Machine translation — neural translators (early Transformer NMT) use beam search to pick output sentences
  • Image captioning — beam search over caption sequences
  • Code generation (Copilot, CodeLLaMA) — beam search variants when greedy decoding is too brittle
  • Sequence labeling — chunking, POS tagging in NLP
  • Robotic motion planning — variants of beam search to maintain k candidate trajectories

Where Beam Search was challenged — and by what

DomainWas beam search, now …Why
LLM text generationNucleus sampling (top-p), temperatureBeam search produces repetitive, “safe” text; sampling is more creative
Continuous optimizationSimulated Annealing, Bayesian OptimizationWhen the landscape is smooth, gradient-based or model-based methods beat beam expansion
Combinatorial NP-hardMILP solvers, branch-and-boundExhaustive techniques with bounds beat fixed-width beams
Game tree searchMonte Carlo Tree Search (MCTS)MCTS adaptively allocates rollouts; beam search wastes budget on bad lines

Where beam search still shines: sequence decoding when you want diverse but high-probability outputs, and as a baseline for any “best-k of many candidates” problem.

See also

Tags: methods-of-ai local-search beam-search
Created: 18-05-26