Information Gain
Information gain aims to reduce entropy — it measures how much knowing the value of an attribute A reduces our uncertainty about the class label of a set S. It is the splitting criterion the ID3 decision-tree algorithm uses: at each node, pick the attribute with the highest information gain.
Formula
- E(S) = entropy of S,
E(S) = −Σᵢ pᵢ log₂ pᵢ(in bits). ⚠️ ThisEis the lecture’s notation for entropy = ShannonH(S); it is not the expectation𝔼. See Machine Learning I & II. - Values(A) = the distinct values attribute A can take; S_v = the subset of S with
A = v;|S_v|/|S|is that subset’s share.
In words: entropy before the split, minus the weighted entropy after the split. A big gain means A splits the data into much purer subsets.
Intuition
- Pure subset (all one class) → entropy 0 → contributes nothing to the weighted sum → high gain.
- 50/50 split → entropy 1 bit → no purity gained → low gain.
- ID3 is greedy: it maximises gain at each node, never looking ahead.
⚠️ The high-cardinality bias
Pure information gain over-favours attributes with many distinct values. An attribute like patient_ID (unique per example) splits S into singleton, perfectly pure subsets → maximal gain → but zero generalisation (overfitting in the extreme).
Fix: Gain Ratio (used in C4.5) divides gain by the intrinsic information of the split, SplitInfo(A) = −Σ_v (|S_v|/|S|) log₂(|S_v|/|S|), penalising high-cardinality attributes. Gain Ratio / C4.5 are textbook content — the slides give Gini impurity as the alternative criterion.
Exam framing
- IG = entropy reduction; it is ID3’s per-node choice.
- Know the formula and the weighted child-entropy term (easy to forget the
|S_v|/|S|weights). - High-cardinality trap → Gain Ratio.
See also
- Machine Learning I & II — entropy, ID3, worked example (the E(S) vs 𝔼 disambiguation)
- Decision Trees and ID3 · Random Forest
Status:
Tags: science methods-of-ai ai-generated
Superlink: 611 📠Machine Learning
610 🤖Artificial Intelligence, Künstliche Intelligenz
Quellen
Erstellt: 15-01-25 19:53