Lernzettel: Planning II — MDPs & Probabilistic Planning

Methods of AI — SoSe 2026 · 1-page exam sheet

For more depth: Markov Decision Process (MDP) (full reference: all 6 algorithms with pseudocode, worked Grid World, deep explanations) · quiz_planning_30-04-26 (8 practice questions)

Core Ideas

Classical planning assumes deterministic actions. MDPs handle uncertainty (stochastic outcomes) AND long-term consequences.
An MDP gives a probability to each outcome of each action, and a reward for each state-action pair.
The goal is to find an optimal policy π — a mapping from states to actions that maximises expected discounted reward.
Key property: Markov = future depends only on the current state, not the history.

Mini-glossary

Symbol	Meaning
S	set of states
A	set of actions
p(s’\|s, a)	transition probability; Σ_{s’} p(s’\|s,a) = 1
r(s, a)	immediate reward for doing a in s
π : S → A	deterministic policy (defined for every state)
V^π(s)	value of s under π — expected discounted reward
*V(s)**	optimal value function
γ ∈ [0, 1]	discount factor — slides say “between 0 and 1”; γ < 1 needed for convergence in non-terminating MDPs
Markov property	P(Sₜ₊₁ \| Sₜ, Aₜ, history) = P(Sₜ₊₁ \| Sₜ, Aₜ)

⭐ Full glossary + Grid World example → Mini-glossary

Formulas

Discounted return

E_π[r₁] + γ · E_π[r₂] + γ² · E_π[r₃] + γ³ · E_π[r₄] + …

Bellman equation (Policy Evaluation, fixed π)

V^π(s) = r(s, π(s)) + γ · Σ_{s'} p(s' | s, π(s)) · V^π(s')

Bellman optimality equation (defines V*)

V*(s) = max_a [ r(s, a) + γ · Σ_{s'} p(s' | s, a) · V*(s') ]

Greedy policy extraction

π*(s) = argmax_a [ r(s, a) + γ · Σ_{s'} p(s' | s, a) · V*(s') ]

Common Exam Traps ⚠️

γ is NOT in the MDP tuple. The tuple is (S, A, p, r). γ belongs to the evaluation criterion. (Q117/Q119.)
γ ∈ [0, 1] inclusive — the slides say “between 0 and 1”. γ = 1 is allowed but values may diverge in non-terminating MDPs (slide 67, “termination problem”). γ < 1 is needed for guaranteed convergence.
Markov property = outcome depends only on current state and action, NOT history. Defining property of MDPs.
Policy ≠ plan. A policy covers ALL states; a classical plan is a sequence for one initial state.
Policy evaluation ≠ policy improvement. Evaluation = compute V^π for fixed π. Improvement = update π given V^π.
Optimal policy depends on γ. Same MDP, different γ → different optimum.
Transition function is a probability distribution: Σ_{s’} p(s’ | s, a) = 1.

Quick Comparisons

Classical Planning vs. MDP

Property	Classical	MDP
Action outcomes	Deterministic	Probabilistic
Objective	action sequence	policy π : S → A
State coverage	from initial state	every state
Long-term view	finite plan	(discounted) infinite horizon

Policy Iteration vs. Value Iteration

Aspect	Policy Iteration	Value Iteration
Per-iter work	Heavy (full eval inside)	Cheap (one Bellman backup)
# outer iters	Few	Many
Stopping	π stops changing	\|V_{k+1} − V_k\| < ε

Full pros/cons of all 6 algorithms → Algorithm comparison table — pros & cons

Quiz

quiz_planning_30-04-26 — 8 Qs (covers both Planning I + II)

Targeted exam questions in Questions for Methods of AI

Q40–46 (basic: MDP, policy, value iteration, Bellman) · Q100 (Discount factor γ) · Q117–119 (deep / exam-trap: V^π vs V*, VI vs PI trade-offs, why γ is NOT in the MDP tuple)

Atomic notes

Markov Decision Process (MDP) · Bellman Equation · Q-Function · Reinforcement Learning (RL)

Sibling Lernzettel

lernzettel_planning-i_30-04-26 — Planning I (Classical / STRIPS / Situation Calculus)

Brain Online

Explorer

lernzettel_planning-ii_30-04-26

Lernzettel: Planning II — MDPs & Probabilistic Planning

Core Ideas

Mini-glossary

Formulas

Common Exam Traps ⚠️

Quick Comparisons

Classical Planning vs. MDP

Policy Iteration vs. Value Iteration

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis

Brain Online

Explorer

lernzettel_planning-ii_30-04-26

Lernzettel: Planning II — MDPs & Probabilistic Planning

Core Ideas

Mini-glossary

Formulas

Common Exam Traps ⚠️

Quick Comparisons

Classical Planning vs. MDP

Policy Iteration vs. Value Iteration

Related Q&A & Notes

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis