Why AI Struggles with Planning

A connective note linking the exam machinery of Classical Planning + MDPs to the bigger question: why is planning a genuinely hard problem for AI, classical and modern alike? Useful both for the Methods of AI exam (the “three problems” + complexity) and for understanding why today’s LLMs are surprisingly bad at it.

What planning actually demands

Planning = find a sequence of actions that turns an initial state into a goal state. That sounds simple, but it secretly requires four hard things at once:

  1. A world model — knowing how each action changes the state.
  2. Lookahead — simulating consequences several steps ahead before acting.
  3. Search with backtracking — trying a branch, recognising a dead end, and undoing it.
  4. Knowing what stays the same — representing the (huge) part of the world an action does not touch.

Every difficulty below is one of these four breaking down.

1. The classical hardness (the Methods of AI view)

Combinatorial explosion → PSPACE-completeness

The state space is exponential in the number of fluents (n Boolean fluents give up to 2^n states), and even the shortest plans can be exponentially long. Deciding whether a plan exists is PSPACE-complete for propositional STRIPS, a class that contains NP and is believed (though not proven) to be strictly harder. This is why naïve DFS over states blows up almost immediately (see the search-tree figure in STRIPS: 3 blocks already give 13 states; add blocks and it explodes).
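To make the explosion concrete, here is a minimal sketch that enumerates blocks-world states by breadth-first search. The state encoding (a frozenset of stacks, each a bottom-to-top tuple) and the function names are assumptions chosen for illustration, not the lecture's representation.

```python
from collections import deque


def successors(state):
    """Yield every state reachable by moving one clear (top) block."""
    stacks = list(state)
    for i, src in enumerate(stacks):
        block, rest = src[-1], src[:-1]
        others = stacks[:i] + stacks[i + 1:]
        leftover = [rest] if rest else []
        # Move the block to the table (only a real move if it sat on another block).
        if rest:
            yield frozenset(others + leftover + [(block,)])
        # Move the block onto the top of any other stack.
        for j, dst in enumerate(others):
            moved = others[:j] + [dst + (block,)] + others[j + 1:]
            yield frozenset(moved + leftover)


def count_states(n_blocks):
    """Count all states reachable from 'every block on the table'."""
    start = frozenset((b,) for b in range(n_blocks))
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


for n in range(1, 6):
    print(n, "blocks:", count_states(n), "states")
```

Three blocks give the 13 states mentioned above; going from three to five blocks already multiplies the count by a factor of almost forty.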

The three representational problems ⚠️

The lecture’s core trio — each is a reason planning is hard to even represent:

  • Frame problem — most things don’t change after an action; representing that efficiently is non-trivial (naïve Situation Calculus needs ~O(actions × fluents) frame axioms). See Frame Problem.
  • Qualification problem — an action has a potentially endless list of preconditions (battery? blocked path? malfunction?).
  • Ramification problem: actions have indirect effects that follow implicitly from the direct ones and are easy to leave unmodelled (move a shelf → everything on it moves too).
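The frame-axiom count from the first bullet can be illustrated with a toy calculation. The domain below (fluent and action names) is entirely hypothetical, and the sketch assumes one "action leaves fluent unchanged" axiom per unaffected (action, fluent) pair, which is one common way to count them.

```python
# Hypothetical toy domain: the fluents each action can change.
FLUENTS = ["at_a", "at_b", "holding", "door_open", "light_on"]
EFFECTS = {
    "move_a_b": {"at_a", "at_b"},
    "pick_up": {"holding"},
    "open_door": {"door_open"},
}


def frame_axioms(fluents, effects):
    """List one frame axiom per (action, fluent) pair the action leaves untouched."""
    return [
        (action, fluent)
        for action, changed in effects.items()
        for fluent in fluents
        if fluent not in changed
    ]


axioms = frame_axioms(FLUENTS, EFFECTS)
effect_entries = sum(len(changed) for changed in EFFECTS.values())
# 3 actions x 5 fluents = 15 pairs, of which 4 are real effects.
print(len(axioms), "frame axioms vs", effect_entries, "effect entries")
```

Even in this tiny domain the "nothing happened" statements (11) outnumber the actual effects (4), and the gap widens as actions and fluents are added.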

The brittle assumptions

Classical planning assumes the world is deterministic, fully observable, static, and discrete. The real world is typically none of these. Drop determinism and you need MDPs (stochastic action outcomes); drop full observability as well and you need POMDPs. Either way you leave classical planning entirely, and both are even more expensive to solve (see Bellman Equation, value/policy iteration).
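As a rough illustration of what the MDP machinery adds, here is a minimal value-iteration sketch on a two-state MDP. The states, actions, probabilities, rewards, and discount factor are all made up for this example; the update is the standard Bellman optimality backup.

```python
# Hypothetical MDP: TRANSITIONS[state][action] = [(probability, next_state, reward), ...]
TRANSITIONS = {
    "start": {
        "risky": [(0.8, "goal", -1.0), (0.2, "start", -1.0)],  # may fail and stay put
        "safe": [(1.0, "goal", -2.0)],  # always works, but costs more
    },
    "goal": {},  # terminal state: no actions, value stays 0
}


def q_value(values, outcomes, gamma):
    """Expected return of one action: sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.95, tol=1e-9):
    """Repeat Bellman optimality backups until values change by less than tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state
            best = max(q_value(values, o, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(values, actions[a], gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
print(values, policy)
```

The result is a policy (an action for every state) rather than a single action sequence, which is exactly what a stochastic world requires: the agent must know what to do when "risky" fails and it finds itself back in "start".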

2. How classical AI fights back

The whole second half of the planning lecture is essentially coping strategies for the hardness above:

  • Search + heuristics: relaxation heuristics (Ignore-Delete, Ignore-Precondition) make an easier problem whose solution estimates the real cost, steering A* so it doesn’t explore the whole space. Heuristics exist because of the explosion.
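  • Compact representations: STRIPS PRE/ADD/DEL lists sidestep the frame problem syntactically via the STRIPS assumption (“anything not in ADD/DEL is unchanged”), while the closed-world assumption keeps state descriptions compact (any fluent not listed is false).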
  • Probabilistic planning: MDPs handle the non-determinism classical planning can’t.
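The STRIPS update rule and the Ignore-Delete idea from the bullets above can be sketched in a few lines. The two-block domain, the fluent names, and the helper `relaxed_layers` are assumptions for illustration; the layer count is a simple delete-relaxation estimate, not the exact heuristic from the lecture.

```python
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset  # "del" is a Python keyword, hence "delete"


def apply(state, action):
    """STRIPS update: anything not in DEL or ADD carries over unchanged."""
    assert action.pre <= state, f"{action.name} is not applicable"
    return (state - action.delete) | action.add


def relaxed_layers(state, goal, actions):
    """Ignore-Delete estimate: rounds of add-only action application until the goal holds."""
    reached, layers = set(state), 0
    while not goal <= reached:
        new = set(reached)
        for a in actions:
            if a.pre <= reached:
                new |= a.add  # delete lists are ignored in the relaxation
        if new == reached:
            return None  # goal unreachable even in the relaxed problem
        reached, layers = new, layers + 1
    return layers


# Hypothetical two-block domain: A sits on B, goal is A on the table with B clear.
ACTIONS = [
    Action("unstack_A_B",
           frozenset({"on_A_B", "clear_A", "handempty"}),
           frozenset({"holding_A", "clear_B"}),
           frozenset({"on_A_B", "clear_A", "handempty"})),
    Action("putdown_A",
           frozenset({"holding_A"}),
           frozenset({"ontable_A", "clear_A", "handempty"}),
           frozenset({"holding_A"})),
]
INIT = frozenset({"on_A_B", "clear_A", "handempty", "ontable_B"})
GOAL = frozenset({"ontable_A", "clear_B"})

after = apply(apply(INIT, ACTIONS[0]), ACTIONS[1])
estimate = relaxed_layers(INIT, GOAL, ACTIONS)
print(sorted(after), estimate)
```

Note that `ontable_B` survives both actions without ever being mentioned in an effect list; that silence is the syntactic answer to the frame problem. The relaxed estimate never exceeds the true number of steps, so it is safe to use for steering A*.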

3. Why modern LLMs struggle with planning (the 2026 angle)

This is the part that connects the course to current AI — and it’s a clean illustration of the four demands above.

  • Autoregressive ≠ lookahead. An LLM predicts the next token from context (see Transformers / Self-Attention). That is brilliant pattern completion (System-1-like), but it is not search: it doesn’t simulate branches, evaluate them, and backtrack. A plan emitted token-by-token can’t easily “undo” an early bad commitment.
  • No explicit, reliable world model. The model has a fuzzy statistical sense of action effects, not a crisp transition function, so it loses track of state over a long horizon (the frame problem again, now showing up as a state-tracking failure).
  • Empirical evidence. Benchmarks like PlanBench (Valmeekam et al., from Kambhampati’s group) show LLMs do far better at retrieving familiar plans than generating novel correct ones; accuracy collapses as the horizon grows or the problem drifts out of distribution. They also can’t reliably self-verify their own plans.
  • What narrows the gap (and proves the point). Reasoning models that spend test-time compute on explicit deliberation or search (long chain-of-thought, Tree-of-Thoughts, MCTS-style rollouts) plan markedly better, i.e. adding search back in is what helps. The strongest practical recipe is neuro-symbolic: let the LLM translate the problem into PDDL and hand it to a real classical planner such as Fast Downward (the LLM+P approach), or wrap the LLM in a generate-and-test loop with external verifiers (“LLM-Modulo”). The LLM does language + framing; the symbolic component does the planning or checks it.
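The external-verifier idea in the last bullet can be sketched as a generate-and-test loop. Everything here is illustrative: `make_stub_proposer` is a hypothetical stand-in for an LLM call (it just replays canned candidate plans), and the two-block domain is made up. Only the verifier does real work, by simulating the plan step by step.

```python
# Hypothetical two-block STRIPS domain (A on B; goal: A on the table, B clear).
ACTIONS = {
    "unstack_A_B": {"pre": {"on_A_B", "clear_A", "handempty"},
                    "add": {"holding_A", "clear_B"},
                    "del": {"on_A_B", "clear_A", "handempty"}},
    "putdown_A": {"pre": {"holding_A"},
                  "add": {"ontable_A", "clear_A", "handempty"},
                  "del": {"holding_A"}},
}
INIT = {"on_A_B", "clear_A", "handempty", "ontable_B"}
GOAL = {"ontable_A", "clear_B"}


def verify(plan, init, goal, actions):
    """Sound external check: simulate the plan and report the first failure."""
    state = set(init)
    for step, name in enumerate(plan):
        if name not in actions:
            return False, f"step {step}: unknown action {name}"
        action = actions[name]
        missing = action["pre"] - state
        if missing:
            return False, f"step {step}: {name} lacks {sorted(missing)}"
        state = (state - action["del"]) | action["add"]
    unmet = goal - state
    if unmet:
        return False, f"goal not reached, missing {sorted(unmet)}"
    return True, "valid"


def make_stub_proposer(candidates):
    """Stand-in for an LLM: returns canned plans; a real one would use the feedback."""
    remaining = iter(candidates)

    def propose(feedback):
        return next(remaining, None)

    return propose


def plan_with_verifier(propose, max_rounds=5):
    """Ask for candidate plans until the verifier accepts one or the budget runs out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        plan = propose(feedback)
        if plan is None:
            break
        ok, feedback = verify(plan, INIT, GOAL, ACTIONS)
        if ok:
            return plan, round_no
    return None, None


proposer = make_stub_proposer([
    ["putdown_A"],                 # precondition violated
    ["unstack_A_B"],               # executable but stops short of the goal
    ["unstack_A_B", "putdown_A"],  # valid
])
plan, rounds = plan_with_verifier(proposer)
print(plan, rounds)
```

The design point is that correctness comes from the verifier, not the proposer: a plan is only returned once the simulation confirms it, so a weak proposer costs extra rounds rather than producing wrong plans.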

The punchline: modern AI didn’t make classical planning obsolete; it made it load-bearing again. The most reliable LLM-based pipelines offload the actual planning to exactly the STRIPS/PDDL + search machinery this course teaches.

4. The cognitive-science angle

Worth noting from cognitive science: humans also struggle with multi-step planning. Tasks like Tower of Hanoi load working memory and prefrontal / executive function heavily; people plan myopically, satisfice, and fail to look far ahead. Planning is a hallmark of deliberative (“System 2”) cognition precisely because it’s effortful, which is the same reason it’s hard to bolt onto a fast pattern-matching system, biological or artificial.

Takeaways (exam bridge)

  • “Why is planning hard?” → (1) combinatorial explosion / PSPACE-completeness, (2) the three problems (frame / qualification / ramification), (3) brittle deterministic-fully-observable assumptions.
  • Classical AI copes via heuristic search + compact representations + MDPs.
  • LLMs struggle because next-token prediction is neither search nor explicit world-modelling; the most reliable fix so far is neuro-symbolic (LLM → PDDL → real planner, or LLM + external verifier).

See also

Tags: methods-of-ai planning llm ai-generated