Quiz: Decision Trees and ID3

Methods of AI — SoSe 2026

7 questions. From definition → mechanics → exam-trap. Type your answer in the **Max's answer:** field below each question, then ping me to evaluate.

Q1 — Entropy

Question: A leaf node contains 6 training examples, all labeled Yes. What is its entropy, and what does that tell you about the node — both numerically and intuitively?

Answer

Entropy = 0 bits.
Numerically: H(S) = −Σ pᵢ · log₂(pᵢ). With p(Yes) = 1 and p(No) = 0, the formula gives −1·log₂(1) − 0·log₂(0) = 0 − 0 = 0 (using the convention 0·log₂(0) = 0).
Intuitively: zero entropy = zero impurity = the node is pure = ID3 stops splitting here and declares this leaf with label Yes. No information would be gained by further splitting.
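To check the arithmetic, here is a minimal Python sketch of the entropy formula. The function name `entropy` and the example lists are illustrative assumptions, not part of the lecture material. It uses the equivalent form Σ pᵢ · log₂(1/pᵢ) and only sums over classes that actually occur, so the 0·log₂(0) case never has to be evaluated.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    # Counter only holds classes that occur, so p = 0 terms are skipped
    # (this is the 0*log2(0) = 0 convention in practice).
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

pure_leaf = ["Yes"] * 6                      # the leaf from the question
mixed_node = ["Yes"] * 3 + ["No"] * 3        # a 50/50 node for comparison

print(entropy(pure_leaf))    # 0.0 bits: pure node
print(entropy(mixed_node))   # 1.0 bit: maximally impure for two classes
```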

Max’s answer:
Result:

Q2 — Information Gain

Question: ID3 picks the attribute with the highest Information Gain at each split. Why does it use Information Gain — and not, say, just the attribute that produces the smallest entropy in the resulting child nodes?

Answer

Information Gain measures the reduction in entropy: Gain(S, A) = H(S) − Σᵥ (|Sᵥ|/|S|) · H(Sᵥ).
The second term is a weighted average of child entropies — weighted by how many examples each child receives.
If you only looked for the lowest child entropy and ignored the weighting, a split that carves off one tiny pure child and leaves one large impure child would look attractive, even though it barely reduces overall uncertainty.
By measuring the expected entropy after the split (and comparing to the parent), Information Gain captures how much uncertainty you actually eliminate per split, not just whether some child looks pure.
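The difference shows up in a small worked example. The sketch below is illustrative only: the function names and the two candidate splits are made up for this quiz. Split A contains a perfectly pure child, so it wins if you look only at the lowest child entropy, while split B has no pure child but a much higher Information Gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v); children is a list of label lists."""
    n = len(parent)
    expected_child_entropy = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - expected_child_entropy

parent = ["Yes"] * 8 + ["No"] * 8

# Split A: one tiny pure child, one large child that is still almost 50/50.
split_a = [["Yes"], ["Yes"] * 7 + ["No"] * 8]
# Split B: two equally sized children, each mostly (not perfectly) pure.
split_b = [["Yes"] * 7 + ["No"], ["Yes"] + ["No"] * 7]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"A: min child entropy={min(map(entropy, split_a)):.2f}, gain={gain_a:.2f}")
print(f"B: min child entropy={min(map(entropy, split_b)):.2f}, gain={gain_b:.2f}")
```

Split A's gain is roughly 0.07 bits and split B's roughly 0.46 bits, so the weighted measure prefers B even though only A contains a pure child.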

Max’s answer:
Result:

Q4 — Overfitting

Question: Your decision tree has 0 % training error and 25 % test error at depth 20. What’s the technical name for this phenomenon, what bias-variance term does it correspond to, and name two distinct ways to fix it.

Answer

Phenomenon: overfitting.
Bias-variance term: high variance (low bias). The tree fits training noise; small changes in training data → very different tree.
Fixes (any two):

Pre-pruning (early stopping): cap max_depth, set min_samples_leaf, require min Information Gain to split.

Post-pruning (e.g. reduced-error pruning): build the full tree, then collapse subtrees whose removal doesn’t hurt validation accuracy.

Use Random Forest (or any bagging ensemble): variance averages out across many trees.

More training data: variance generally shrinks as the training set grows (roughly like 1/n for simple estimators), so with more examples the noise in any individual example matters less.

Restrict the feature subset at each split (RF-style): this only helps inside an ensemble, where it decorrelates the trees so that averaging removes more variance.
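The pre-pruning criteria from the list above can be sketched as a single stopping check that a tree builder would call before splitting a node. This is a hypothetical helper, not the lecture's algorithm: `should_stop`, `min_gain`, and all default values are assumptions, while `max_depth` and `min_samples_leaf` reuse the names from the list. The size check assumes binary splits.

```python
def should_stop(labels, depth, best_gain,
                max_depth=5, min_samples_leaf=2, min_gain=0.01):
    """Return True if this node should become a leaf instead of being split.

    labels    -- class labels of the training examples at this node
    depth     -- depth of this node (root = 0)
    best_gain -- Information Gain of the best candidate split at this node
    """
    if len(set(labels)) <= 1:                 # already pure (entropy 0)
        return True
    if depth >= max_depth:                    # depth cap reached
        return True
    if len(labels) < 2 * min_samples_leaf:    # too few examples for two valid children
        return True
    if best_gain < min_gain:                  # best split barely reduces entropy
        return True
    return False

# A mixed node at depth 1 with a useful split is still allowed to grow.
print(should_stop(["Yes", "No"] * 5, depth=1, best_gain=0.5))
```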

Max’s answer:
Result:

Q5 — Decision Tree vs. Random Forest

Question: Random Forest fixes a specific weakness of single Decision Trees. What weakness, and how exactly does Random Forest fix it?

Answer

Weakness fixed: high variance (a single tree is brittle — small data changes give very different trees).
How RF fixes it (two complementary tricks):

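Bagging (Bootstrap Aggregation): train each tree on a different bootstrap sample of the data. Averaging B identically distributed estimators would cut the variance to about 1/B of a single tree's if they were independent; bootstrap trees are correlated, so the actual reduction is smaller. Bias stays roughly the same.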

Feature subsampling at each split: at every node, only consider a random subset of features (typically √p for classification). This decorrelates the trees — without it, all trees would pick the same strong feature near the root and the averaging would barely help. Decorrelation amplifies the variance reduction.
Final prediction = majority vote (classification) or average (regression).
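Why the two tricks complement each other can be made precise with the standard variance formula for an average of correlated estimators. The symbols are assumptions introduced here, not lecture notation: B is the number of trees, T_b the prediction of tree b, σ² the variance of a single tree, and ρ the pairwise correlation between trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
  = \frac{1}{B^2}\left[B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding trees (bagging) shrinks only the second term; the first term ρσ² is a floor that does not depend on B. Feature subsampling lowers ρ and therefore lowers that floor.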

Max’s answer:
Result:

Q6 — Mechanism (short)

Question: In ONE sentence: why is a single Decision Tree considered a high-variance model?

Answer

Because the recursive greedy split chooses each attribute based on the specific training data it sees, so even small changes in the data can flip an early split decision, which propagates downward and produces a structurally very different tree.
(Alt: “Because the recursive top-down construction is unstable: a small data perturbation early in the tree cascades into very different subtrees, so the overall predictions fluctuate a lot across different training samples.”)

Max’s answer:
Result:

Beyond the lecture (optional)

These questions go beyond the SoSe 2026 lecture slides (textbook / external additions). Kept for depth, not exam-critical.

Q3 — ⚠️ Exam trap: high-cardinality attributes

Question: You include UserID as a feature in your training data. ID3 picks it as the root split. The tree has 100 % training accuracy but 50 % test accuracy. What happened, and what’s the standard fix?

Answer

What happened: every UserID value appears exactly once → splitting on it creates one pure leaf per example. Information Gain is maximal (entropy drops to 0 in every child). But the tree has memorized the IDs, not learned any pattern → catastrophic overfitting, useless on new data.
Standard fixes:

Gain Ratio (C4.5): normalize Gain by SplitInformation, which penalizes attributes that split into many branches.

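Gini impurity (CART): a different impurity measure. On its own it does not remove the bias toward many-valued attributes (a split into all-pure children also maximizes the Gini reduction); what helps is that CART makes binary splits, so a single split cannot give every UserID its own leaf.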

Random feature subsampling at each split (Random Forest trick): only consider a few features per split → high-cardinality attributes don’t always get chosen.

Drop the column — UserID has no predictive signal anyway.
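The Gain Ratio fix can be checked on a toy dataset. The sketch below is illustrative only: the eight rows, the `is_premium` feature, and the function names are invented for this quiz. It computes Gain Ratio as Gain divided by SplitInformation, where SplitInformation is the entropy of the attribute's own value distribution.

```python
import math
from collections import Counter, defaultdict

def entropy(items):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(items)
    return sum((c / n) * math.log2(n / c) for c in Counter(items).values())

def gain_and_ratio(values, labels):
    """Return (Information Gain, Gain Ratio) for one categorical attribute."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(values)   # grows with the number of branches
    return gain, (gain / split_info if split_info > 0 else 0.0)

labels     = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
user_id    = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"]   # unique per row
is_premium = ["y", "y", "y", "y", "n", "n", "n", "y"]           # hypothetical feature

for name, col in [("user_id", user_id), ("is_premium", is_premium)]:
    gain, ratio = gain_and_ratio(col, labels)
    print(f"{name}: gain={gain:.3f}, gain_ratio={ratio:.3f}")
```

Plain Information Gain ranks `user_id` first (gain 1.0 versus about 0.55), while Gain Ratio ranks `is_premium` first (about 0.58 versus 0.33). The penalty is a normalization, not a guarantee: a weaker alternative feature could still lose to `user_id`, which is one more reason to drop the column.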

Max’s answer:
Result:

Q7 — Applied judgement

Question: You’re designing a tabular medical decision-support system. Doctors must be able to audit every recommendation. You have ~10,000 patient records and ~30 features. Choose between (a) a single ID3 tree, (b) Random Forest, (c) XGBoost — and justify your choice. There is no single “right” answer; what matters is the trade-off you make explicit.

Answer

A strong defensible answer is (a) a single ID3/CART tree, possibly with pre-pruning + post-pruning. Reasoning:

Auditability is a hard requirement. A doctor can trace any prediction through if-then-else nodes in a single tree. A 100-tree forest is essentially a black box (you can’t explain a vote across 100 trees).

Random Forest and XGBoost usually beat a single tree on accuracy (the 2–5% figure is a rule of thumb, not a guarantee), but they lose interpretability. With only 10k samples and 30 features, the accuracy gap is often small.

Mitigations for the tree’s variance: prune aggressively (depth 4–6), tune on cross-validation, validate with clinicians on edge cases.

Alternative defensible answer: Random Forest with SHAP values for explanations — gives near-XGBoost accuracy with per-prediction explanations. Slightly less raw interpretability but still auditable.

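XGBoost is the hardest choice to defend here, even if it is the most accurate: its explanations are post-hoc (as are SHAP values for a Random Forest), and post-hoc explanations are harder to defend to regulators (cf. GDPR Article 22 on automated decision-making) than a tree a doctor can read directly.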
The exam-relevant insight: when interpretability is a constraint, raw accuracy is not the only objective.

Max’s answer:
Result:

Score

When all 7 are graded:

✓ Correct:
~ Partial:
✗ Wrong:

Topics to re-drill if any wrong:

Q1, Q6 — entropy and variance intuition
Q2 — Information Gain mechanics
Q3 — high-cardinality trap
Q4, Q5 — overfitting fixes + ensemble reasoning
Q7 — applied trade-offs
