Hill Climbing

Hill Climbing (HC) is the simplest local search algorithm: from the current state, move to the best-improving neighbor. Repeat until no neighbor is better. That’s it.

It’s called “hill climbing” by convention — for maximization it climbs up, for minimization it descends down the cost surface. The algorithm is greedy, memoryless, and stunningly fast — and it fails in three famous ways (local maxima, plateaus, ridges).

Hill Climbing is the direct ancestor of gradient descent: when the objective is differentiable, “best-improving neighbor” becomes “step in the gradient direction.” Every neural network ever trained does a version of hill climbing.

When to reach for Hill Climbing

  • The objective function can be evaluated cheaply at neighboring states
  • A good solution is enough — you don’t need the global optimum (or you accept random-restart as a workaround)
  • You have no useful gradient information (otherwise use gradient descent directly)
  • You want a dead-simple baseline before reaching for SA / GA / Bayesian Optimization

If the landscape is convex (one minimum, no traps) → HC finds the optimum. If it’s rugged → HC finds whatever local optimum sits nearest to your start. The starting point matters a lot.

The algorithm

current ← initial state
loop:
    neighbors ← all states reachable from current
    best ← argmax_{n ∈ neighbors} f(n)
    if f(best) ≤ f(current):   # no improvement
        return current
    current ← best

That’s the steepest-ascent variant — the textbook default. Three important alternatives:

Variants

VariantHow it picks the next stateWhen to use
Steepest Ascent HCEvaluates all neighbors, picks the bestDefault — when neighborhood is small
First-Choice HCEvaluates neighbors in random order, takes the first improving oneHuge neighborhoods (e.g. n-Queens with thousands of neighbors) — saves evaluation cost
Stochastic HCPicks a random improving neighbor (probability proportional to improvement)Sometimes escapes shallow plateaus better than steepest
Random-Restart HCRuns vanilla HC from n random starting points; keeps the best resultThe standard fix for local-optima. P(success) → 1 as n → ∞ for finite state spaces

The three famous failure modes ⚠️

Hill climbing is famous in AI courses for getting stuck in three distinct ways:

  1. Local maximum (or minimum) — a state whose neighbors are all worse, but which isn’t the global optimum. HC stops; you’re trapped in a “false summit.”
  2. Plateau — a flat region where all neighbors have the same value as the current state. HC has no gradient to follow → either stops or wanders aimlessly.
  3. Ridge — a sequence of states forming a sloped path, but the path runs diagonally to the move set. Each direct neighbor looks flat or worse, but a sequence of moves would improve. Particularly common in discrete spaces.

The fix to (1) is random restarts or SA’s downhill moves. The fix to (2) is stochastic HC or allowing sideways moves with a step limit. The fix to (3) is richer neighborhoods (more move types).

Demo: HC traps on the same function we used for Simulated Annealing

The same function f(x) = x² + 10·sin(x) with global min at x ≈ −1.31 and a local min at x ≈ 3.85. We show HC starting from different x₀ values and see which ones get trapped. Then we run random-restart HC and watch P(success) climb toward 1.

What the plots show

Panel 1 — Six HCs from six different starts:

  • Starts at x₀ = −4, −2, 0 → all slide LEFT into the global min (x ≈ −1.31). Win.
  • Starts at x₀ = 2, 3, 4 → all slide RIGHT into the local min (x ≈ 3.85). Trapped.
  • Pure HC’s fate is decided entirely by which basin of attraction your starting point sits in. There’s no escape mechanism.

Panel 2 — Random-Restart HC convergence:

  • With 1 restart (basically just plain HC): success rate ~50 % (half the starts are in the wrong basin).
  • With 5 restarts: ~95 %. The probability of all 5 starts landing in the wrong basin is roughly 0.5⁵ ≈ 3 %.
  • With 20 restarts: 100 %. Essentially guaranteed.
  • Formula: if a single random start succeeds with probability p, then n restarts succeed with probability 1 − (1 − p)ⁿ → exponential approach to 1.

This is the central insight: HC + random restart is a surprisingly powerful baseline for problems where you can afford a few cheap restarts. Random-Restart HC often matches or beats SA on simple landscapes.

Hill Climbing vs. Simulated Annealing — when which?

PropertyHill ClimbingSimulated Annealing
Accepts worse moves?NeverYes, with probability exp(−Δ/T)
Escapes local optima?No (without restarts)Yes (when T is warm)
Per-iteration costLower (no probability calc)Slightly higher
Tuning parametersStep size onlyStep size, T₀, cooling schedule
Best whenConvex / few trapsRugged landscapes
ReliabilityDepends on x₀Depends on schedule

The pragmatic recipe: start with Random-Restart HC. If it’s not good enough, upgrade to Simulated Annealing. If SA isn’t good enough, look at Genetic Algorithms or population-based methods.

See Simulated Annealing for the SA mechanism on this exact function — same problem, with the temperature-based escape mechanism added.

Where Hill Climbing is used today

The “vanilla” HC algorithm is rarely deployed standalone, but its concept powers some of the most important algorithms in modern ML.

  • Gradient descent / SGD / Adam — all neural network training is, mathematically, continuous hill climbing where the “best neighbor” is identified via the gradient. Backprop just computes that gradient efficiently. GPT-4 is literally trained by hill climbing.
  • WalkSAT and other SAT solvers — stochastic local search variants of HC are competitive baselines for SAT problems (Selman & Kautz 1993). Used in chip verification and AI planning.
  • Compiler instruction scheduling — many compiler passes use hill-climbing local search to find better code orderings.
  • Local moves in MILP solvers — Gurobi and CPLEX use hill-climbing-style “polish” steps as part of their primal heuristics.
  • Scheduling / Timetabling — local search refinements on partial schedules are standard practice.
  • Sudoku solvers (variant) — assignment-based Sudoku solvers use HC-like moves alongside backtracking.
  • Reinforcement-learning policy refinement — policy iteration’s improvement step is hill climbing on the policy space.

Where Hill Climbing was extended — and by what

HC’s failure modes are well-known, and each has a famous fix:

Limitation of vanilla HCExtensionWhen to use
Gets stuck in local optimaSimulated Annealing (Simulated Annealing)Rugged landscapes — SA’s downhill moves escape traps
Gets stuck, but has multiple starts availableRandom-Restart HCCheap evaluations, finite state space, parallelizable
Stuck on plateaus / ridgesStochastic HC (random uphill neighbor)Discrete spaces with many equal-value neighbors
Single trajectory wastes parallel computeLocal Beam SearchMany cores available; states can share information
Differentiable objectiveGradient Descent / SGD / AdamAll modern ML — gradients give “best neighbor” for free
Need crossover-style recombinationGenetic Algorithms (Genetic Algorithms)Combinatorial problems where partial solutions can be mixed
Need population diversity + explorationParticle Swarm, Differential EvolutionContinuous black-box optimization

Bottom line: vanilla Hill Climbing is the simplest member of an entire family of local search methods. Every more sophisticated method (SA, GA, gradient descent, beam search) can be understood as “HC plus one extra mechanism.” For understanding why local search works, HC is the right starting point.

See also

  • Loss Surface — the landscape HC traverses; why local optima trap HC and how high-D changes the picture

Direct siblings (Local Search family, MoAI VL02)

Modern descendants

Course material

MOCs

Tags: methods-of-ai science local-search hill-climbing
Created: 18-05-26