Quiz: Planning I & II
Methods of AI — SoSe 2026
Q1 — Planning I
Question: Describe the three fundamental problems in planning: frame, qualification, and ramification. Give one example each.
Answer
- Frame problem: When an action occurs, how do we efficiently represent what doesn’t change? Naive approach: enumerate all unchanged facts explicitly (expensive). Example: moving block A doesn’t change blocks B, C, D — but must state this somehow.
- Qualification problem: An action may have an unlimited list of preconditions we can never fully enumerate. Example: “move robot” — fails if battery dead, motor broken, path blocked by another robot, weight too heavy…
- Ramification problem: An action may have unintended side-effects. Example: moving a storage unit (PSU) also moves all items stored inside it.
Max’s answer:
Result:
Q2 — Planning I
Question: What are the three components of a STRIPS operator? How does STRIPS “solve” the frame problem?
Answer
A STRIPS operator has:
- PRE (Preconditions): set of atoms that must be true before the operator applies
- ADD list: atoms that become true after the operator
- DEL list: atoms that become false after the operator
Frame problem in STRIPS: solved implicitly by convention — everything NOT in the ADD or DEL lists stays unchanged. No explicit frame axioms needed. This is the “closed-world + default persistence” assumption.
Note: STRIPS uses the closed-world assumption (what’s not listed = false).
Max’s answer:
Result:
Q3 — Planning I
Question: What is the difference between Situation Calculus and STRIPS in how they handle knowledge representation?
Answer
- Situation Calculus: uses first-order predicate logic. Predicates that change over time are fluents with an extra situation argument. Plans computed via theorem provers. Does NOT use closed-world assumption. Frame problem requires explicit frame axioms for every action×fact combination — expensive.
- STRIPS: state-based, uses sets of ground atoms. Actions have explicit PRE/ADD/DEL lists. Uses closed-world assumption. Frame problem “solved” by convention — cheaper but less expressive.
STRIPS is more practical; Situation Calculus more expressive and general.
Max’s answer:
Result:
Q4 — Planning II
Question: Give the formal definition of an MDP. What does each component represent?
Answer
An MDP is a 4-tuple (S, A, p, r):
- S: finite set of states
- A: finite set of actions
- p(s’|s, a): state transition probability — probability of reaching state s’ from state s after action a. Must sum to 1 over all s’ for any (s, a).
- r(s, a): reward function — immediate reward for performing action a in state s
The Markov property holds: the outcome depends only on the current state, not the history.
Max’s answer:
Result:
Q5 — Planning II
Question: What is the Bellman equation for policy evaluation? What does each term represent?
Answer
For a fixed deterministic policy π, the Bellman equation is:
Vᵖ(s) = r(s, π(s)) + γ · Σₛ' p(s'|s, π(s)) · Vᵖ(s')
Vᵖ(s): value of state s under policy π (expected discounted future reward)r(s, π(s)): immediate reward for taking action π(s) in state sγ: discount factor ∈ [0,1) — how much future rewards are worth relative to nowΣₛ' p(s'|s, π(s)) · Vᵖ(s'): expected value of the next state under the transition model
Solved iteratively by initializing all values to 0 and updating until convergence.
Max’s answer:
Result:
Q6 — Planning II
Question: Describe policy iteration. What are the two phases, and when does it terminate?
Answer
Policy iteration alternates two phases:
- Policy Evaluation: given the current policy π, compute the value function Vᵖ(s) for all states using the Bellman equation iteratively until convergence.
- Policy Improvement: for each state, update the policy to the greedy action:
π'(s) = argmax_a [r(s,a) + γ · Σₛ' p(s'|s,a) · Vᵖ(s')]
Termination: when π’ = π (no improvement), the policy is optimal (satisfies Bellman Optimality Equation).
Guaranteed to converge under mild conditions, but can cycle between equal-value actions — handled by Modified Policy Iteration.
Max’s answer:
Result:
Q7 — Planning II
Question: Why is γ = 1 (no discounting) problematic in MDPs without terminal states?
Answer
With γ = 1, the total reward is
r₁ + r₂ + r₃ + ...— an infinite sum that can diverge if the agent never reaches a terminal state. Even small constant negative rewards (like -0.04 per step) would sum to −∞.
With γ < 1, the geometric series converges:Σ γⁿ · r = r / (1-γ).
γ = 1 is only safe if the problem is guaranteed to terminate (episodic). For infinite-horizon problems, always use γ < 1.
Max’s answer:
Result:
Q8 — Planning
Question: What is PDDL, and how does it extend STRIPS?
Answer
PDDL (Planning Domain Definition Language) extends STRIPS significantly:
- STRIPS has just PRE/ADD/DEL lists with ground atoms
- PDDL adds: typing (:types), numeric fluents (e.g. Contains(P, I, 100)), derived predicates (computed automatically), conditional effects, action costs
- PDDL separates domain (action templates, types) from problem (initial state + goal)
- Used in International Planning Competition (IPC) since 1998
- Declarative — same domain + problem can be solved by any compatible planner
- Syntax is LISP-like:
(define (domain name) (:action ...) )
Max’s answer:
Result:
Score
Total: / 8