Quiz: Planning I & II

Methods of AI — SoSe 2026

Q1 — Planning I

Question: Describe the three fundamental problems in planning: frame, qualification, and ramification. Give one example each.

Answer

Frame problem: When an action occurs, how do we efficiently represent what doesn’t change? Naive approach: enumerate all unchanged facts explicitly (expensive). Example: moving block A doesn’t change blocks B, C, D — but must state this somehow.

Qualification problem: An action may have an unlimited list of preconditions we can never fully enumerate. Example: “move robot” — fails if battery dead, motor broken, path blocked by another robot, weight too heavy…

Ramification problem: An action may have unintended side-effects. Example: moving a storage unit (PSU) also moves all items stored inside it.

Max’s answer:
Result:

Q2 — Planning I

Question: What are the three components of a STRIPS operator? How does STRIPS “solve” the frame problem?

Answer

A STRIPS operator has:

PRE (Preconditions): set of atoms that must be true before the operator applies

ADD list: atoms that become true after the operator

DEL list: atoms that become false after the operator
Frame problem in STRIPS: solved implicitly by convention — everything NOT in the ADD or DEL lists stays unchanged. No explicit frame axioms needed. This is the “closed-world + default persistence” assumption.
Note: STRIPS uses the closed-world assumption (what’s not listed = false).

Max’s answer:
Result:

Q3 — Planning I

Question: What is the difference between Situation Calculus and STRIPS in how they handle knowledge representation?

Answer

Situation Calculus: uses first-order predicate logic. Predicates that change over time are fluents with an extra situation argument. Plans computed via theorem provers. Does NOT use closed-world assumption. Frame problem requires explicit frame axioms for every action×fact combination — expensive.

STRIPS: state-based, uses sets of ground atoms. Actions have explicit PRE/ADD/DEL lists. Uses closed-world assumption. Frame problem “solved” by convention — cheaper but less expressive.
STRIPS is more practical; Situation Calculus more expressive and general.

Max’s answer:
Result:

Q4 — Planning II

Question: Give the formal definition of an MDP. What does each component represent?

Answer

An MDP is a 4-tuple (S, A, p, r):

S: finite set of states

A: finite set of actions

p(s’|s, a): state transition probability — probability of reaching state s’ from state s after action a. Must sum to 1 over all s’ for any (s, a).

r(s, a): reward function — immediate reward for performing action a in state s
The Markov property holds: the outcome depends only on the current state, not the history.

Max’s answer:
Result:

Q5 — Planning II

Question: What is the Bellman equation for policy evaluation? What does each term represent?

Answer

For a fixed deterministic policy π, the Bellman equation is:
Vᵖ(s) = r(s, π(s)) + γ · Σₛ' p(s'|s, π(s)) · Vᵖ(s')

Vᵖ(s): value of state s under policy π (expected discounted future reward)

r(s, π(s)): immediate reward for taking action π(s) in state s

γ: discount factor ∈ [0,1) — how much future rewards are worth relative to now

Σₛ' p(s'|s, π(s)) · Vᵖ(s'): expected value of the next state under the transition model
Solved iteratively by initializing all values to 0 and updating until convergence.

Max’s answer:
Result:

Q6 — Planning II

Question: Describe policy iteration. What are the two phases, and when does it terminate?

Answer

Policy iteration alternates two phases:

Policy Evaluation: given the current policy π, compute the value function Vᵖ(s) for all states using the Bellman equation iteratively until convergence.

Policy Improvement: for each state, update the policy to the greedy action:
π'(s) = argmax_a [r(s,a) + γ · Σₛ' p(s'|s,a) · Vᵖ(s')]
Termination: when π’ = π (no improvement), the policy is optimal (satisfies Bellman Optimality Equation).
Guaranteed to converge under mild conditions, but can cycle between equal-value actions — handled by Modified Policy Iteration.

Max’s answer:
Result:

Q7 — Planning II

Question: Why is γ = 1 (no discounting) problematic in MDPs without terminal states?

Answer

With γ = 1, the total reward is r₁ + r₂ + r₃ + ... — an infinite sum that can diverge if the agent never reaches a terminal state. Even small constant negative rewards (like -0.04 per step) would sum to −∞.
With γ < 1, the geometric series converges: Σ γⁿ · r = r / (1-γ).
γ = 1 is only safe if the problem is guaranteed to terminate (episodic). For infinite-horizon problems, always use γ < 1.

Max’s answer:
Result:

Q8 — Planning

Question: What is PDDL, and how does it extend STRIPS?

Answer

PDDL (Planning Domain Definition Language) extends STRIPS significantly:

STRIPS has just PRE/ADD/DEL lists with ground atoms

PDDL adds: typing (:types), numeric fluents (e.g. Contains(P, I, 100)), derived predicates (computed automatically), conditional effects, action costs

PDDL separates domain (action templates, types) from problem (initial state + goal)

Used in International Planning Competition (IPC) since 1998

Declarative — same domain + problem can be solved by any compatible planner

Syntax is LISP-like: (define (domain name) (:action ...) )

Max’s answer:
Result:

Score

Total: / 8

Brain Online

Explorer

quiz_planning_30-04-26

Quiz: Planning I & II

Q1 — Planning I

Q2 — Planning I

Q3 — Planning I

Q4 — Planning II

Q5 — Planning II

Q6 — Planning II

Q7 — Planning II

Q8 — Planning

Score

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis