ai-generated methods-of-ai exam-prep
Companion to quiz_paper-transformers-search_28-05-26 · Q2 — Lookahead.
Two fully worked graphs for lookahead L, built to drill why the
minand themaxeach matter. Source: “Transformers Struggle to Learn to Search” (Saparov et al. 2024), §3.
The definition (don’t lose this)
With P = the path start→goal and Sᵢ = the paths that leave the start and are otherwise disjoint from P (share only the start vertex):
L = the minimum search depth before you can safely commit to the correct first move. Not “how long is the answer” — “how far must I look before I can’t go wrong.”
Two ways the ambiguity at the start resolves — take whichever is shallower (that’s the min):
- Route A — reach the goal. Walk the true branch to the goal. Cost =
|P|. - Route B — eliminate distractors. Explore the wrong branches until they all dead-end; the remaining branch must be correct. Bottleneck = the longest distractor (that’s the
max). Cost =maxᵢ |Sᵢ|.
Example 1 — the distractor wins (elimination is cheaper)
┌──────────── true path P ────────────┐
▼ ▼
┌──► A ──► B ──► C ──► D ──► G (goal)
│ ▲
S ───┤ │
(start)│ ──► T ──────┘ ← rejoins P at C (TRAP, not a distractor)
│
├──► K ──► L ← distractor, dead end (length 2)
│
└──► M ──► N ──► O ← distractor, dead end (length 3)
Edges: P: S→A→B→C→D→G (|P| = 5) · T: S→T→C · K: S→K→L · M: S→M→N→O
Distractors Sᵢ (leave start, never touch P again):
- S→K→L ✅ length 2
- S→M→N→O ✅ length 3
- S→T→C ❌ disqualified — rejoins P at C, so it’s not a wrong turn (it still reaches the goal). “Otherwise disjoint” excludes it.
Why 3: explore the distractors — K dead-ends at depth 2, M at depth 3. Once both are exhausted (bottleneck = the longest = 3), the correct branch is forced by elimination. You never walk all 5 steps of P. → Route B wins.
Example 2 — the path wins (a huge distractor gets capped)
S ──► A ──► B ──► G ← true path P (|P| = 3)
S ──► K ──► L ──┐
├──► M ──► N ──► O ──► X ← shared dead-end tail
S ──► P ────────┘
Edges: P: S→A→B→G (|P| = 3) · D1: S→K→L→M→N→O→X (dead end) · D2: S→P→M→N→O→X (dead end, merges into D1 at M)
Distractors Sᵢ:
- D1: S→K→L→M→N→O→X ✅ length 6
- D2: S→P→M→N→O→X ✅ length 5
- They share M→N→O→X with each other — fine. The rule is “disjoint from P,” not disjoint from one another.
Why 3: walking the true branch reaches G in 3 steps. The moment you see the goal you’re certain — no need to chase the 6-long tail to its end. A giant distractor cannot push difficulty above |P|; the min caps it. → Route A wins.
Example 3 — a tie (both routes cost the same)
S ──► A ──► B ──► C ──► G ← true path P (|P| = 4)
S ──► K ──► L ──► M ──► N ← distractor, dead end (length 4)
S ──► P ──► Q ← distractor, dead end (length 2)
Edges: P: S→A→B→C→G (|P| = 4) · D1: S→K→L→M→N (dead end) · D2: S→P→Q (dead end)
Children of S: A (correct — only valid first move), K (wrong), P (wrong).
Distractors Sᵢ:
- D1: S→K→L→M→N ✅ length 4
- D2: S→P→Q ✅ length 2
Why 4 (the tie): the two routes resolve at the same depth. Route A reaches G at depth 4; Route B exhausts the longest distractor (K-branch) at depth 4. Neither is cheaper, so L = 4. The short distractor (P, depth 2) is ruled out early but doesn’t help — you still must wait for the depth-4 resolution.
Where the decision lands (ties back to the elimination logic): only A is correct. At depth 4, simultaneously: the K-branch is confirmed dead and A’s branch touches G. So A is forced both by positive confirmation (Route A) and by elimination (Route B) at once — the choice for the first vertex is over-determined exactly at L = 4.
The contrast (this is what scores in the oral)
| Example 1 | Example 2 | Example 3 | |
|---|---|---|---|
|P| | 5 | 3 | 4 |
| longest distractor | 3 | 6 | 4 |
| L | 3 | 3 | 4 |
| binding route | B — exhaust distractors | A — reach goal | A & B tie |
| lesson | distractor governs when path is long | path caps an over-long distractor | routes coincide → either resolves it |
Same L, opposite reasons. The max = “elimination is bottlenecked by the worst distractor.” The min = “do whichever is easier — reach the goal, or kill all distractors.” Neither term is redundant; each is the binding constraint in a different graph shape.
⚠️ Traps baked in
- Rejoining branches aren’t distractors. A branch that touches P again (Ex.1’s T) isn’t a wrong turn — it still leads to the goal. “Otherwise disjoint” is the keyword.
- A long path is a red herring (Ex.2).
|P|only governs difficulty when it’s the shorter route. - Don’t answer “the longest thing in the graph.” The whole point of the
minis that an over-long distractor can’t raise L above|P|.
➕ Limitation worth volunteering
A wrong turn partway along P (e.g. edge B→Z→dead-end) is not counted: the path S→A→B→Z shares A,B with P, so it isn’t “otherwise disjoint.” Lookahead only measures branching that diverges right at the start — a genuine limitation of the metric.