Temperature in neighbour selection

What is Δ (ΔE)?

Δ = f(x_new) − f(x_current) — the change in the objective function when you move from the current state to a candidate neighbour.

  • For minimization (the SA default): Δ > 0 means the new state is worse (higher cost), Δ < 0 means better.
  • For maximization: flip the sign.
  • The “E” in ΔE is a leftover from the physics analogy (E = energy of a thermodynamic system). In the algorithm it’s just the difference of objective values.

Acceptance rule then reads:

  • Δ < 0 (improvement) → always accept
  • Δ > 0 (worse) → accept with probability P = exp(−Δ / T)

In simulated annealing, the temperature ( T ) plays a crucial role in determining the probability of accepting moves that may worsen the solution, which helps the algorithm avoid getting stuck in local optima. Here’s how temperature affects the process:

  1. High Temperature (e.g., ( T = 10 )):

    • At high temperatures, the probability of accepting a worse move (where (\Delta E > 0)) is relatively high.
    • This means the algorithm is more willing to accept moves that increase the objective function (i.e., moves that might temporarily worsen the solution), allowing it to explore the search space broadly.
    • This broader exploration prevents the algorithm from getting trapped early in local minima, encouraging it to jump out and explore other areas of the search space.
  2. Low Temperature (e.g., ( T = 1 )):

    • At low temperatures, the probability of accepting worse moves decreases. The algorithm becomes more “conservative,” mostly accepting moves that improve or minimally worsen the solution.
    • As ( T ) approaches zero, the probability of accepting any move that increases the objective function (worsens the solution) approaches zero.
    • This allows the algorithm to “settle” around a minimum, refining the solution rather than jumping to more exploratory moves.

Effect of Temperature on Probability (Acceptance of Worse Moves)

The acceptance probability ( P ) of a worse move (when (\Delta E > 0)) is given by:
[P = e^{-\Delta E / T}]

  • At high ( T ): The term (\Delta E / T) is small, so ( e^{-\Delta E / T} ) is closer to 1, meaning higher acceptance probability.
  • At low ( T ): (\Delta E / T) becomes larger, making ( e^{-\Delta E / T} ) closer to 0, resulting in a lower acceptance probability.

Summary

The temperature difference controls the exploration vs. exploitation balance:

  • High temperature favors exploration, allowing the algorithm to escape local optima.
  • Low temperature favors exploitation, encouraging convergence toward a (hopefully global) minimum.

In practice, simulated annealing starts with a high temperature that gradually cools down, allowing broad exploration initially and focused optimization as it nears a solution.

See the curve concretely

Print the acceptance probability P = exp(−Δ/T) for a few typical (Δ, T) pairs — this is the entire mechanism in two lines:

Output:

    Δ | T=10.0 | T= 5.0 | T= 1.0 | T= 0.1
---------------------------------------------
  0.1 | 0.990 | 0.980 | 0.905 | 0.368
  1.0 | 0.905 | 0.819 | 0.368 | 0.000
  5.0 | 0.607 | 0.368 | 0.007 | 0.000

Read the table diagonally:

  • Top-left (high T, small Δ): almost always accept → pure exploration
  • Bottom-right (low T, big Δ): never accept → pure exploitation
  • A bad move with Δ = 5 has 60 % chance at T = 10 but is essentially impossible at T = 0.1. That is the cooling schedule at work.

see also

Type:
Tags:
Status:
Location:
Created: 05-11-24 21:50
611 📠Machine Learning

Source