ai-generated methods-of-ai exam-prep

The most important reference papers behind the exam paper — curated for the oral exam (Mon 1 Jun 2026).

Each entry: what it says · why it matters for your paper / your focus topics. #1–#4 are fully worked — that’s the core set you actually need for the oral. #5–#10 stay as one-line stubs (optional breadth; not expanded further).

Companion

Reading order (priority)

✅ Worked in full (the core set for the oral):

  1. 🔥 Merrill & Sabharwal (2024)Expressive Power of Transformers with CoT → learnability ≠ expressivity
  2. 🔥 Sanford, Hsu & Telgarsky (2024)Transformers, Parallel Computation, and Logarithmic Depth → why path-merging is ~log-depth & parallel
  3. 🔥 Kambhampati et al. (2024)LLM-Modulo → the neuro-symbolic context anchor
  4. Saparov & He (2023)Language Models Are Greedy Reasoners → the direct predecessor (same author)

📎 Optional breadth (stubs only — enough secondary lit for now):
5. Bachmann & Nagarajan (2024) — The pitfalls of next-token prediction
6. Jenner et al. (2024) — Evidence of learned look-ahead in a chess-playing NN
7. Creswell et al. (2023) — Selection-Inference
8. Yao et al. (2024) — Tree of Thoughts
9. Heimersheim & Nanda (2024) — How to use and interpret activation patching
10. Wei et al. (2022) Emergent abilities vs. Schaeffer et al. (2023) Emergence a mirage?


1. 🔥 Merrill & Sabharwal (2024) — The Expressive Power of Transformers with Chain of Thought

ICLR 2024 · arXiv:2310.07923 · OpenReview

What it shows (in one breath): standard transformers that answer immediately are stuck in TC⁰ and cannot do graph connectivity. But allowing chain-of-thought (intermediate tokens) expands their power with the number of steps: log steps → L, linear steps → ≈ regular…context-sensitive, polynomial steps → exactly the class P (which includes directed graph connectivity, via steps). I.e. with enough CoT steps a transformer can represent search.

Why it’s relevant to your paper

  • It’s the backbone of your #1 critique: learnability ≠ expressivity. Saparov shows transformers fail to learn search; Merrill & Sabharwal prove transformers can represent it (poly CoT ⇒ P ⊇ connectivity). So Saparov’s failure is not an architectural/representational limit — the capacity provably exists. The wall is trainability, not expressivity.
  • The two papers bracket the whole debate: Merrill & Sabharwal = what’s representable; Saparov = what’s learnable under standard training. That gap is exactly what you’re arguing.
  • Saparov even cites this paper — so naming it signals you read the references, and it pre-empts the rebuttal “but more parameters don’t help” (that’s about trainability at toy scale, not representability).
  • CoT = recurrent state (their core intuition) ↔ Saparov §6: DFS/selection-inference become learnable with constant layers because the depth moves into the trace length. And connectivity needing CoT steps echoes Saparov’s “more steps/FLOPs as the graph grows.”

One-liner if Kühnberger asks "isn't this just an architectural limit of transformers?"

“No — Merrill & Sabharwal (ICLR 2024) prove transformers with chain-of-thought are expressively powerful: polynomial intermediate steps give exactly the class P, which includes graph connectivity. So the architecture can represent search; Saparov’s result is the complementary one — that learning it under standard training fails at scale. The wall is trainability, not expressivity.”


2. 🔥 Sanford, Hsu & Telgarsky (2024) — Transformers, Parallel Computation, and Logarithmic Depth

ICML 2024 · OpenReview

What it shows (in one breath): transformers are best understood as parallel computers, not just sequence models. The authors prove a tight two-way simulation between transformers and the Massively Parallel Computation (MPC) model, and use it to show logarithmic-depth transformers can solve graph problems like connected components (bounded component diameter) with small width — and that this depth is essentially optimal (conditional on a standard MPC hardness conjecture). They also introduce the synthetic -hop induction-heads task (repeated pointer-following), solvable at depth , and find trained transformers match the theory (each extra layer ≈ doubles the they handle) with interpretable attention resembling the constructive proof.

Why it’s relevant to your paper

  • It’s the theoretical backbone of path-merging’s log-depth. Saparov finds the model searches over a number of vertices exponential in the number of layers (≈ layers for connectivity) — and cites Sanford et al. for exactly this. Sanford proves that ~log depth is both sufficient and (conditionally) necessary for connectivity. So your “reachable set doubles per layer ⇒ log-depth transitive closure” isn’t just an empirical observation — it’s the theoretically predicted depth.
  • Same core principle — parallelism. Saparov’s path-merging is parallel (all vertices build reachable sets simultaneously), not serial frontier expansion like BFS. That’s precisely Sanford’s thesis: transformers win on problems with parallel structure. Use this to sharpen your “path-merging vs. BFS/DFS” contrast.
  • Methodological echo. Sanford’s trained transformers learn the parallel pointer-chasing algorithm and the attention patterns match the construction — the same kind of result as Saparov’s mechanistic finding that the trained net implements the path-merging circuit. Two independent papers showing transformers learn the theoretically-predicted parallel algorithm, not a shortcut.
  • The learnability twist (good for discussion). Sanford shows the -hop task is learnable with depth scaling as theory predicts — a positive result. Saparov complicates it: for search at larger graph sizes, training fails. So Sanford = “the parallel algorithm is representable and learnable in clean settings”; Saparov = “but learning it robustly breaks down as the problem grows.” Cite together to show the boundary.

One-liner if asked "why only ~log layers / why is this parallel not like BFS?"

“Sanford, Hsu & Telgarsky (ICML 2024) show transformers are essentially parallel computers (tight link to the MPC model), and that logarithmic depth is sufficient and conditionally optimal for connectivity. That’s why Saparov’s path-merging needs only ~log layers and works on all vertices at once — it’s the parallel, transitive-closure way to do connectivity, not BFS’s serial frontier.”

(Companion: Sanford et al. (2024) — Understanding transformer reasoning capabilities via graph algorithms — same group, graph-task expressivity results in the same spirit.)


3. 🔥 Kambhampati et al. (2024) — Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks

ICML 2024 (Position paper) · arXiv:2402.01817

What it shows (in one breath): as stand-alone systems, autoregressive LLMs cannot robustly plan or self-verify — empirically they produce few executable plans (≈12% for GPT-4 on PlanBench), collapse further when names are obfuscated (“Mystery Blocksworld”), and can’t reliably verify their own solutions (graph coloring, plan validation), so self-critique loops don’t help. The fix is the LLM-Modulo architecture: a generate–test–critique loop where the LLM is only a candidate generator / knowledge source / format-translator, and external model-based critics/verifiers (e.g. VAL) provide the soundness guarantees. With sound critics, this gives big gains (Blocks World ~82%, TravelPlanner ~6×).

Why it’s relevant to your paper

  • It’s your neuro-symbolic context anchor (Kühnberger’s home turf). Saparov’s conclusion — classical search is provably correct and scales while learned search only approximates — argues for hybrid systems. Kambhampati is the concrete proposal for exactly that: let the symbolic verifier guarantee, let the LLM propose. This is the named source for the “LLM-modulo” term you use in Q13/OE5.
  • Same diagnosis, different domain. “LLMs can’t plan/self-verify robustly” mirrors Saparov’s “can’t learn to search robustly” — both conclude LLMs do pattern retrieval / shortcuts, not genuine model-based reasoning.
  • The obfuscation result ≈ the balanced distribution. Performance collapses when action/object names are scrambled (removing familiar-token shortcuts) — the same logic as Saparov engineering shortcuts out via the balanced distribution. Both isolate “real reasoning” by killing the surface cues. Great parallel to draw.
  • CSP bridge (your focus cluster): their verification failure is shown on graph coloring (NP-complete CSP) — connect to backtracking + arc consistency giving exact guarantees a learned model can’t.
  • Confirms the “o3/Gemini search externally” point (Q13). Their critique of Tree-of-Thoughts — “the guarantees come from an external verifier, not the LLM” — is exactly your argument that modern models search by externalizing, not via a smarter forward pass.
  • ⚠️ Honest caveat: this is a Position paper (argument + recap of their benchmarks), not a mechanistic/empirical study like Saparov — cite it for the framework and stance, not for a new controlled result.

One-liner for the neuro-symbolic context block

“Kambhampati et al. (2024) make the constructive version of Saparov’s lesson: since LLMs can’t be trusted to plan or verify, put them in an LLM-Modulo generate–test–critique loop where external sound critics (planners, validators) carry the guarantees and the LLM just proposes candidates. It’s the principled neuro-symbolic answer — the ‘neuro’ part never bears correctness.”

(Companion: Valmeekam et al. (2022) — LLMs still can’t plan (PlanBench) — the benchmark much of the planning evidence rests on.)


4. ⭐ Saparov & He (2023) — Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought

ICLR 2023 · arXiv:2210.01240 · (same first author as the exam paper — the direct predecessor)

What it shows (in one breath): they build PrOntoQA, a synthetic CoT benchmark where every reasoning step maps 1:1 to a formal proof step (modus ponens over small ontologies), so chains can be parsed back into logic and checked. Finding: large LLMs are locally competent but globally greedy — ~93% of individual steps are valid, but at a branching point (multiple valid next steps) the model picks a valid-but-goal-irrelevant (“misleading”) step and drifts away, rarely recovering. Accuracy collapses with proof depth (5-hop ≈ chance) and with unhelpful ordering; self-consistency and DFS-style prompting don’t fix it. World knowledge helps only by letting the model skip steps.

Why it’s relevant to your paper

  • It’s the direct ancestor of your exam paper (same author, Saparov). Greedy-reasoners diagnoses the behavioral failure (“LLMs can’t plan a proof / search”); the 2024 paper asks the deeper mechanistic + learnability question (“can a transformer even learn search, and what algorithm?”). Citing the lineage shows you see the arc.
  • Proof search = graph search — the exact equivalence your exam paper builds on. PrOntoQA is proof search over an ontology; the 2024 paper formalizes it as DAG connectivity. The fictional “wumpus” ontologies (to block pretraining shortcuts) are literally where the 2024 paper’s “If Alex is a wumpus…” natural-language experiment comes from.
  • “Greedy” = no lookahead. The failure at branch points is exactly a lookahead/search failure — the model can’t look ahead to see which valid step leads to the goal. That’s the same competence your exam paper measures via the lookahead parameter. The “wrong turn, can’t recover” finding is the “wrong turn” hallucination your exam paper cites in its intro.
  • Same anti-shortcut methodology. PrOntoQA adds distractor facts and fictional names to kill surface shortcuts — the same logic as the 2024 paper’s balanced distribution (engineering shortcuts out) and ID permutation. Both isolate real reasoning from pattern-matching.
  • CoT doesn’t rescue it — twice. Greedy-reasoners: self-consistency + DFS-style traces don’t help. Exam paper §6: DFS/selection-inference lower depth but don’t beat the size wall. Consistent message across both papers.
  • ⚠️ Caveat: this studies pretrained LLMs (GPT-3 family) via prompting (behavioral); the exam paper trains small transformers from scratch + mechanistic analysis. Different method, same phenomenon — say so.

One-liner linking the two Saparov papers

“Saparov & He (2023) showed LLMs are greedy reasoners — locally valid but unable to plan a proof when several valid steps branch, taking a ‘wrong turn’ they can’t recover from. The 2024 exam paper takes the same author’s question one level deeper: reduce proof search to graph connectivity, and ask not just do LLMs fail, but can a transformer even learn to search, and what algorithm does it learn when it does.”


5.–10. (prioritized stubs — expand on request)

  • Bachmann & Nagarajan (2024) — The pitfalls of next-token prediction. Underpins “shortcut learning vs. real reasoning” → your ML-cluster bridge (inductive bias / spurious correlations).
  • Jenner et al. (2024) — Evidence of learned look-ahead in a chess-playing NN. Counterpoint: look-ahead does emerge in a real net — great for showing nuance in discussion.
  • Creswell et al. (2023) — Selection-Inference. The exact CoT method Saparov tests in §6.
  • Yao et al. (2024) — Tree of Thoughts. Explicit search around the model → the “modern models search externally” argument (Q13/OE5).
  • Heimersheim & Nanda (2024) — How to use and interpret activation patching. The method primer for explaining §3 cleanly.
  • 📎 Wei et al. (2022) — Emergent abilities vs. Schaeffer et al. (2023) — Are emergent abilities a mirage?. The emergence debate Saparov is a counterexample to.

Next step

Tell me which number to expand next (Sanford is the natural #2 — it’s the theoretical backbone of path-merging’s log-depth claim).