Selection-Inference

methods-of-ai ai-generated

Selection-Inference (SI) is a chain-of-thought reasoning framework for LLMs (Creswell, Shanahan & Higgins, 2023) that makes multi-step reasoning interpretable by splitting every reasoning step into two clean, separate subtasks that alternate:

  1. Selection — out of everything currently known, point at the relevant pieces.
  2. Inference — derive one new fact using only what was selected.

Loop selection → inference → selection → inference… and you build a reasoning trace where you can read off exactly which facts were used and what was concluded at each step. The model is never asked to “reason” in one opaque jump — each atomic decision is exposed.

Why this note exists

It’s the [[Selection-Inference]] concept referenced in the exam-paper dossier. Saparov et al. (2024) use SI as one of the two chain-of-thought “prompting” setups in §6 (“Does in-context exploration help?”). See pruefung_paper-transformers-search_25-05-26 · references_paper-transformers-search_28-05-26.

The original idea (Creswell et al. 2023)

The motivation is trustworthy logical reasoning. A monolithic LLM that emits a chain of thought can hallucinate steps that don’t follow from anything. SI constrains the model:

  • Selection step: given a context of facts + a goal, select the specific premises relevant to the next inference (e.g. pick the rule and the fact it applies to).
  • Inference step: feed only those selected facts to the model and have it produce a single new fact — nothing else is in scope, so it can’t smuggle in unsupported claims.

Because each step is small and its inputs are explicit, the whole chain is auditable and far less prone to the “valid-looking but unsupported” steps that plague free-form CoT. (Compare Language Models Are Greedy Reasoners — the failure SI tries to prevent.)

The graph-search version (Saparov et al. 2024)

To test whether a transformer can learn SI, Saparov reduces it to graph connectivity (find a path from start to goal in a DAG). One search step becomes:

SubtaskIn SI termsOn the graph
Selectionpick the relevant known factselect a visited vertex that still has an unvisited child (a frontier vertex)
Inferencederive a new fact from itgiven that vertex, infer one of its unvisited children (add it to visited)

Starting from the start vertex and repeating, the visited set grows by one vertex per step until the goal appears. Mechanically this is BFS-style frontier expansion made fully explicit — every “pick a node / expand a node” decision is its own visible step.

Two variables define the difficulty (and the balanced-style training distribution):

  • Frontier size F — number of visited vertices that currently have unvisited children.
  • Branch count B — number of children of the current vertex at an inference step.

Graphs are sampled so the pair (F, B) is jointly uniform — the same anti-shortcut logic as the balanced distribution: don’t let the model win by exploiting that most graphs are trivially small. Because SI has two subtasks, inputs encode the list of visited edges (not just vertices), so the model can tell selection-context from inference-context.

Worked demo (Python)

A minimal selection-inference loop on a small DAG. It logs every SELECT → INFER decision, exactly the interpretable trace SI is designed to produce:

graph = {"s": ["a", "b"], "a": ["c"], "b": ["d", "e"],
         "c": ["g"], "d": ["f"], "e": [], "f": [], "g": []}
start, goal = "s", "g"
 
def children(v): return graph.get(v, [])
 
def select(visited):                       # subtask 1: a visited vertex with an unvisited child
    frontier = [v for v in visited if any(c not in visited for c in children(v))]
    return frontier[0] if frontier else None
 
def infer(v, visited):                     # subtask 2: one unvisited child of v
    return next((c for c in children(v) if c not in visited), None)
 
def selection_inference():
    visited = [start]
    while goal not in visited:
        sel = select(visited)              # --- SELECT ---
        if sel is None: return None        # frontier empty -> goal unreachable
        child = infer(sel, visited)        # --- INFER ---
        visited.append(child)
        print(f"SELECT {sel!r} -> INFER {child!r} | visited={visited}")
    return visited
 
selection_inference()

Output:

SELECT 's' -> INFER 'a' | visited=['s', 'a']
SELECT 's' -> INFER 'b' | visited=['s', 'a', 'b']
SELECT 'a' -> INFER 'c' | visited=['s', 'a', 'b', 'c']
SELECT 'b' -> INFER 'd' | visited=['s', 'a', 'b', 'c', 'd']
SELECT 'b' -> INFER 'e' | visited=['s', 'a', 'b', 'c', 'd', 'e']
SELECT 'c' -> INFER 'g' | visited=['s', 'a', 'b', 'c', 'd', 'e', 'g']

Each line is one full search step; the visited set is the growing frontier; the goal g is reached via s → a → c → g. A trained transformer would have to learn to produce each SELECT/INFER token, not compute them with this clean code.

Why it matters for the exam paper

  • It’s one of the two CoT setups (alongside DFS) used to ask “does thinking step-by-step rescue search?”
  • Result: with the depth unrolled across SI steps, the model learns the task with only a constant number of layers (4, vs. log-depth for single-pass exponential path-merging) — one more than DFS’s 3 because each SI step composes two subtasks instead of one.
  • But it still fails on larger graphs, and scaling doesn’t help. So SI changes the depth requirement, not the fundamental learnability-at-scale wall. ⚠️ This is the key oral-exam point: CoT helps the representation, not the learning-to-search-at-scale problem.

One-liner if asked "what is selection-inference?"

“A chain-of-thought framework (Creswell et al. 2023) that splits each reasoning step into select the relevant facts, then infer one new fact from only those — making the whole chain interpretable. Saparov tests it on graph search: select a frontier vertex, infer an unvisited child. It becomes learnable with constant depth but still breaks on big graphs.”

See also

Created: 31/05/26