Quiz: Machine Learning I & II
Methods of AI — SoSe 2026
Q2 — ML
Question: How does the ID3 algorithm build a decision tree? What does it optimize?
Answer
ID3 greedily selects the attribute with highest Information Gain at each node.
- Calculate entropy H(S) = −Σ pᵢ log₂(pᵢ) for the current set S.
- For each attribute A, calculate Gain(S,A) = H(S) − Σ_{v∈Values(A)} (|Sᵥ|/|S|)·H(Sᵥ)
- Choose the attribute with the highest gain and split on it, creating one branch per attribute value v.
- Recurse on each subset Sᵥ until all examples have the same class (entropy = 0) or no attributes are left (then label the leaf with the majority class).
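ID3 optimizes information gain locally at each node. This biases it toward short trees (high-gain attributes end up near the root), but because the search is greedy and never backtracks, the result is not guaranteed to be the shortest possible tree. It can overfit: deep trees memorize training noise.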
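The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the nested (attribute, branches) tree representation, and the toy "outlook"/"windy" data are assumptions made for the example, and there is no pruning or handling of unseen attribute values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions in labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(S, A) = H(S) - sum(|S_v| / |S| * H(S_v)) over the values v of A."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    if len(set(labels)) == 1:            # pure node, entropy = 0
        return labels[0]
    if not attributes:                   # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in pairs], [l for _, l in pairs], remaining)
    return (best, branches)


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

tree = id3(rows, labels, ["outlook", "windy"])
```

On this toy data the gain of "outlook" is 1 bit and the gain of "windy" is 0, so the sketch splits once on "outlook" and stops, because both subsets are already pure.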
Max’s answer:
Result:
Q3 — ML
Question: Why are Random Forests often better than a single decision tree? What is bagging?
Answer
Random Forests train many decision trees on random subsets of training data (bootstrap samples) and random subsets of features at each split. Prediction = majority vote.
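Why better: a single deep tree has low bias but high variance, and trees trained on the same data would be highly correlated, so averaging them would help little. Random subsampling of examples and features decorrelates the trees, so they make different errors. Averaging over diverse errors reduces variance (overfitting) while keeping bias low.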
Bagging: bootstrap aggregation — each tree is trained on a random sample (with replacement) of the training data. Trees train independently and vote.
Random Forest = Bagging + random feature selection at each split.
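A minimal sketch of the bagging part only, assuming a hypothetical one-dimensional threshold "stump" as the base learner instead of a full decision tree. All function names and the toy data are illustrative, and the random feature selection that turns bagging into a Random Forest is not shown, because the toy data has only one feature.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical base learner: threshold halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:            # the bootstrap sample held a single class
        label = 1 if ones else 0
        return lambda x: label
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)


def bagging_fit(data, n_models, seed=0):
    """Train each model independently on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature, class) pairs with class 0 low and class 1 high.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
models = bagging_fit(data, n_models=25)
low_prediction = bagging_predict(models, 0.5)
high_prediction = bagging_predict(models, 9.5)
```

Each stump sees a slightly different sample and so places its threshold slightly differently; the vote smooths out those differences. An odd number of models avoids ties in the binary case.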
Max’s answer:
Result:
Q4 — ML
Question: What is entropy in the context of decision trees? When is entropy maximized and minimized?
Answer
Entropy H(S) = −Σ pᵢ log₂(pᵢ) measures the impurity of a set S.
- Minimum (= 0): all examples in S have the same class label → pᵢ = 1 for one class, 0 for all others → −1·log₂(1) = 0 (with the convention 0·log₂(0) = 0 for the empty classes). Pure node.
- Maximum: classes are equally distributed → most uncertainty. For binary case: H = 1 when p = 0.5 (50/50 split). For k classes: max H = log₂(k).
Information gain = reduction in entropy after splitting on an attribute. Higher gain = better split.
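The extreme cases can be checked numerically. A small sketch, assuming the class distribution is given directly as a list of probabilities (the function name and the example distributions are illustrative):

```python
from math import log2


def entropy(probabilities):
    """Entropy in bits; terms with p = 0 are skipped (convention 0 * log2(0) = 0)."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


pure = entropy([1.0, 0.0])        # one class only: minimum
balanced = entropy([0.5, 0.5])    # binary 50/50 split: maximum for two classes
four_way = entropy([0.25] * 4)    # uniform over k = 4 classes: log2(4)
skewed = entropy([0.9, 0.1])      # somewhere in between
```

Here pure is 0, balanced is 1, and four_way is 2 = log₂(4); the skewed 90/10 distribution lies strictly between 0 and 1.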
Max’s answer:
Result:
Q5 — ML
Question: What are the three main types of learning? Give one concrete algorithm for each.
Answer
- Supervised learning: labeled training data (input → correct output). Goal: learn a function to predict output for new inputs.
Algorithm: decision tree (ID3), SVM, perceptron.
- Unsupervised learning: no labels. Goal: find hidden structure/patterns in data.
Algorithm: k-means clustering, hierarchical clustering.
- Reinforcement learning: agent interacts with environment, receives rewards/penalties. Goal: learn policy to maximize long-term reward.
Algorithm: Q-learning, policy iteration for MDPs.
(Also: semi-supervised, self-supervised — but these 3 are the classical trio.)
Max’s answer:
Result:
Q6 — ML
Question: What is the inductive bias of a learning algorithm? Why is it necessary?
Answer
The inductive bias is the set of prior assumptions a learner makes to generalize from training examples to unseen data.
Why necessary: without any assumptions, a learner could only “memorize” examples — no generalization is possible. Any generalization beyond the training set requires assuming that patterns will hold.
Formally: B is the inductive bias if: for all x∈U, B ∧ D ∧ x ⊨ L(x,D) — the bias plus training data logically entails the prediction.
Example: Occam's razor (prefer simpler models) is an inductive bias: it assumes that, among hypotheses consistent with the training data, the simpler ones generalize better.
Max’s answer:
Result:
Q7 — ML
Question: Describe k-means clustering. What does it optimize, and what is its main limitation?
Answer
K-means:
- Initialize k cluster centroids (randomly or with k-means++)
- Assign each data point to nearest centroid
- Recompute centroids as means of assigned points
- Repeat the assignment and recomputation steps until the assignments no longer change
Optimizes: minimizes the within-cluster sum of squared distances, here normalized by |D|: E = (1/|D|) Σⱼ ‖xⱼ − w_{m(xⱼ)}‖², where w_{m(xⱼ)} is the centroid that xⱼ is assigned to.
Each step guarantees E doesn’t increase. Converges to a local minimum.
Main limitation: depends heavily on initialization — different random starts → different results. Only finds local optima, not global minimum. Also assumes spherical clusters (Euclidean distance).
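The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: points are tuples of numbers, initialization picks k random data points (not k-means++), an empty cluster keeps its old centroid, and the function names and toy data are made up for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)    # random initialization from the data
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # assignments stable: converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                   # an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, assignment


def clustering_error(points, centroids, assignment):
    """E = (1/|D|) * sum of squared distances to the assigned centroid."""
    total = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(points, k=2)
error = clustering_error(points, centroids, assignment)
```

The seed parameter controls the random start. On data that is less cleanly separated than this toy set, different seeds can end in different local minima, which is why running several restarts and keeping the result with the lowest E is a common remedy.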
Max’s answer:
Result:
Beyond the lecture (optional)
These questions go beyond the SoSe 2026 lecture slides (textbook / external additions). Kept for depth, not exam-critical.
Q1 — ML
Question: What is the bias-variance tradeoff? How does model complexity affect each?
Answer
- Bias: systematic error from wrong assumptions. High-bias models are too simple (underfitting) — e.g. fitting a straight line to curved data. Error is consistent and predictable.
- Variance: sensitivity to small fluctuations in training data. High-variance models overfit — they memorize noise. Error changes a lot with different training sets.
For squared loss, expected total error = bias² + variance + irreducible noise.
- Increasing complexity → decreases bias, increases variance.
- Decreasing complexity → increases bias, decreases variance.
Goal: find the sweet spot. Cross-validation helps estimate it.
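For reference, the decomposition can be written out explicitly. The notation is an assumption added here and does not come from the quiz: data are generated as y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and the expectation runs over training sets and noise. Under squared loss the decomposition is exact:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term shrinks as the model class becomes flexible enough to represent f, the second grows as the fit becomes more sensitive to the particular training set, and the third cannot be reduced by any model.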
Max’s answer:
Result:
Score
Total: / 7