Monte Carlo Tree Search

methods-of-ai ai-generated

Monte Carlo Tree Search (MCTS) is a search algorithm that builds a search tree selectively and statistically — instead of searching exhaustively, it runs many sampled trial-playouts and concentrates the search on the most promising branches. Famous from AlphaGo / AlphaZero (Go, chess) and AlphaProof / AlphaGeometry (IMO silver 2024).

Why this note exists

It’s the [[Monte Carlo Tree Search]] concept from the exam-paper talk (Selection-Inference · pruefung_paper-transformers-search_25-05-26). MCTS is the cleanest example of externalised search — the search happens in an explicit algorithm around the network, not inside a single forward pass. Exactly what Saparov et al. (2024) imply is necessary, since the bare forward pass can’t robustly search.

The core idea

You have a huge decision tree (e.g. every possible move in Go) that you can never fully search. MCTS does not search exhaustively — it grows the tree incrementally, guided by statistics from random simulations, focusing effort where it pays off.

The 4 steps (looped thousands of times)

Step	What happens
1. Selection	From the root, walk down choosing the “best” child each time (balancing known-good vs. unexplored).
2. Expansion	At the edge of the current tree, add a new node (move).
3. Simulation (rollout)	From there, play randomly to the end → a rough outcome (win/loss).
4. Backpropagation	Propagate that outcome back up the path → update each node’s stats (visit count, win count).

After many iterations, pick the move with the best statistics.

The key trick: exploration vs. exploitation

Selection uses a formula (UCB1 / UCT) balancing:

Exploitation — branches that did well so far.
Exploration — branches tried rarely (might be better).

→ It wastes no time on obviously bad branches, but never forgets to test the unexplored. (Same tension as in Simulated Annealing and local search.)

AlphaGo: MCTS for search, neural nets for evaluation

This is the part that matters for the talk — AlphaGo shows the division of labour between search and network:

        ┌─────────────── MCTS (the search) ───────────────┐
position │  Selection → Expansion → Evaluation → Backprop  │ → best move
        └──────────────────┬──────────────────┬──────────┘
                           │                  │
                    Policy net           Value net
              "which moves good?"   "how good is this position?"
                  (neural nets EVALUATE — they don't search)

System	Year	Searches with
AlphaGo	2016	MCTS + policy net + value net
AlphaGo Zero / AlphaZero	2017	MCTS + one net, no human games, no rollouts
AlphaProof / AlphaGeometry	2024	MCTS-style search over proof steps (IMO silver)

Policy net → “which moves are promising?” → guides Selection so MCTS needn’t try every move.
Value net → “how good is this position?” → replaces the random rollouts with a learned evaluation (AlphaGo’s big innovation over classical MCTS).
The net never searches — it only guides and evaluates. The search is MCTS, outside the net.

Why it matters for the exam paper

This is the strongest historical example for the “modern systems externalise search” argument:

AlphaGo’s neural net doesn’t search — it evaluates. The actual search is MCTS around it. That’s exactly Saparov’s point: a single forward pass doesn’t search robustly, so you offload search to an explicit algorithm and let the net supply the heuristic.

So AlphaGo (2016) is “externalised search” years before o1/o3 and the RL reasoners — it confirms the paper’s conclusion from the other direction.

MCTS vs. the lecture’s classical search

	MCTS	A*	BFS/DFS
Guarantee	probabilistic, anytime (better with more compute)	optimal (admissible h)	complete
How it explores	sampled rollouts + UCB	best-f(n)=g+h	systematic
Heuristic source	random sim / learned value net	hand-designed h	none

⚠️ Unlike A*, MCTS guarantees nothing about optimality — it’s anytime and statistical: more rollouts → better estimate, but no proof. The random rollouts echo stochastic local search; the UCB balance echoes exploration-exploitation.

One-liner for the oral

“MCTS builds a search tree via many sampled playouts, balancing exploration vs. exploitation (UCB). AlphaGo, AlphaZero, and AlphaProof all use it — with neural nets only to evaluate moves/positions, while MCTS does the search. It’s the textbook case of externalised search: the search lives in the algorithm around the net, not in a single forward pass — exactly what Saparov’s result implies you need.”

Brain Online

Explorer

Monte Carlo Tree Search

Monte Carlo Tree Search

The core idea

The 4 steps (looped thousands of times)

The key trick: exploration vs. exploitation

AlphaGo: MCTS for search, neural nets for evaluation

Why it matters for the exam paper

MCTS vs. the lecture’s classical search

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis