Algorithm Decision Tree — Methods of AI
The meta-skill the exam actually tests. Not “how does X work?” but “for problem of this type, which algorithm do I reach for, and why?” This page is the answer.
Use this as a flowchart. Start at the top question; follow the answer that fits your problem; arrive at the right tool.
Step 1 — What kind of problem are you solving?
| If your problem is about… | Jump to |
|---|---|
| Finding the best parameters / state | §A Optimization |
| Assigning values to variables under constraints | §B Constraint Satisfaction |
| Choosing a sequence of actions over time | §C Planning / Decision Making |
| Representing facts and reasoning about them | §D Knowledge Representation |
| Handling not-knowing-for-sure | §E Uncertainty / Vagueness |
| Predicting a label from features | §F Classification / Regression |
| Storing & retrieving patterns | §G Memory / Associative Recall |
§A · Optimization
“I want to find the x that maximizes (or minimizes) f(x).”
Is f(x) differentiable + you can compute the gradient?
├── YES → Gradient Descent / SGD / Adam [[Gradient Descent]] · [[Gradient Backpropagation]]
│
└── NO → Is the search space continuous or discrete?
│
├── CONTINUOUS → How rugged is the landscape?
│ │
│ ├── Smooth (1 minimum) → Hill Climbing [[Hill Climbing]]
│ ├── Some local optima → Random-Restart Hill Climbing
│ └── Many local optima → Simulated Annealing [[Simulated Annealing]]
│
└── DISCRETE / COMBINATORIAL
│
├── Can solutions be encoded as fixed-length strings?
│ AND crossover would help?
│ → Genetic Algorithm [[Genetic Algorithms]]
│
├── Want parallel exploration of k regions?
│ → Local Beam Search [[Local Beam Search]]
│
├── Want communication between agents?
│ → Stochastic Diffusion Search
│
└── ILP-shaped?
→ MILP solver (Gurobi/CPLEX) [not MoAI, but worth knowing]
Decision-driver questions:
- Can I compute gradients? → if yes, don’t use SA/GA/HC, use SGD.
- How many local optima? → more = warmer SA + slower cooling.
- Is fitness expensive to evaluate? → BayesOpt > SA > GA in sample efficiency.
§B · Constraint Satisfaction
“Many variables, each with a domain, must satisfy constraints simultaneously.”
Are variables purely boolean?
├── YES → Modern SAT solver (CDCL: MiniSat, Glucose, Z3)
│
└── NO → Need optimization objective on top of constraints?
│
├── YES → MILP solver (Gurobi, CPLEX)
│
└── NO → Classical CSP solver:
1. Apply AC-3 preprocessing (Arc Consistency)
2. Backtracking with MRV + Degree heuristic for variable choice
3. LCV heuristic for value choice
[[Constraint Satisfaction Problems]]
Need temporal constraints? → Allen's Tense Logic [[Allen's Tense Logic]]
Need spatial constraints? → RCC-8 [[RCC-8]]
Decision-driver questions:
- Are constraints binary? → AC-3 is O(c·d³); use it as preprocessing.
- High-cardinality domains? → Local search on CSP often beats backtracking.
§C · Planning / Decision Making
“Find a sequence of actions that gets me from initial state to goal.”
Are action outcomes deterministic AND world fully observable?
│
├── YES (deterministic + observable) → Classical Planning
│ │
│ ├── Need full FOL expressivity? → Situation Calculus [[Situation Calculus]]
│ └── Practical use? → STRIPS / PDDL [[STRIPS]] [[PDDL and STRIPS]]
│
├── PROBABILISTIC outcomes + full observation → MDP [[Markov Decision Process (MDP)]]
│ │
│ ├── Small state space → Value Iteration (Bellman optimality)
│ └── Larger state space → Policy Iteration (often fewer iters)
│ [[Bellman Equation]]
│
├── PARTIAL observation → POMDP
│
├── Reward UNKNOWN / dynamics unknown → Reinforcement Learning
│ │
│ ├── Discrete actions → Q-Learning / DQN [[Q-Function]]
│ ├── Continuous actions → SAC, TD3, PPO
│ └── Want to learn from demonstrations → Imitation Learning / Inverse RL
│ [[Reinforcement Learning (RL)]]
⚠️ Exam-critical: STRIPS sidesteps the Frame Problem (CWA + ADD/DEL), it doesn’t solve it. See Frame Problem.
§D · Knowledge Representation
“Encode facts about a domain so a machine can reason.”
Do you need terminological reasoning (classes + relationships)?
│
├── YES → Description Logics (ALC, OWL) [[Description Logics]]
│ ├── TBox: schema-level facts (Student ⊑ Person)
│ └── ABox: instance-level facts (alice : Student)
│
├── Need temporal logic? → Allen's Tense Logic
│
├── Need spatial / qualitative regions? → RCC-8
│
├── Need full expressive logic? → First-Order Logic (undecidable in general)
│
└── Just rules + facts? → Datalog / Prolog (rare in MoAI)
⚠️ DL uses Open World Assumption — absence of evidence ≠ evidence of absence. STRIPS uses Closed World. Easy to mix up.
§E · Uncertainty / Vagueness
“I have to make decisions when I don’t know for sure.”
What kind of "not sure" is it?
│
├── The predicate has no sharp boundary (it's vague)
│ "Bob is tall" — at what height does tallness begin?
│ → Fuzzy Logic [[Fuzzy Logic]]
│ ├── AND ↔ t-norm (min, product, Łukasiewicz)
│ └── OR ↔ s-norm (max, prob-sum, Łukasiewicz)
│
├── The fact is crisp but I don't know which way it'll go
│ "It will rain tomorrow" (yes-or-no, just unknown)
│ → Probability / Bayesian methods
│
└── I have evidence but don't know how to split it across hypotheses
│ "The patient has flu or COVID, evidence is 0.5, can't tell which"
│ → Dempster-Shafer Theory [[Dempster-Shafer Theory]]
│ ├── Bel(A) = lower bound
│ └── Pl(A) = upper bound
⚠️ The most-tested distinction in this unit: Fuzzy ≠ Probability. Memorize: fuzzy = vague predicate, probability = uncertain crisp fact.
§F · Classification / Regression
“Predict a label / value from a feature vector.”
Data type?
│
├── TABULAR (rows × columns)
│ │
│ ├── Need interpretability + audit trail?
│ │ → Single Decision Tree (CART / ID3) [[Decision Trees and ID3]]
│ │
│ ├── Tabular, want strong default?
│ │ → Random Forest [[Random Forest]]
│ │
│ ├── Tabular, want best accuracy?
│ │ → XGBoost / LightGBM / CatBoost (not MoAI but worth knowing)
│ │
│ ├── Linearly separable?
│ │ → Perceptron / Logistic Regression [[Perceptron]]
│ │
│ └── Non-linear, medium data, want margin guarantees?
│ → SVM + kernel trick (RBF is default) [[Support Vector Machines]] [[Kernel Trick]]
│
├── IMAGES → CNN / Vision Transformer (ViT)
│
├── TEXT → Transformer (BERT-style for classification, GPT-style for generation)
│ [[Transformers]] [[Self-Attention]]
│
├── SEQUENCES (audio, time series) → Transformer or State Space Model (Mamba)
│
└── HIGH-DIM, SPARSE → Linear model with L1 regularization
Bias-Variance decision drivers (Bias-Variance Tradeoff):
- Train error low, test error high → overfitting → variance too high → simpler model, regularize, more data, bagging
- Train error AND test error high → underfitting → bias too high → more complex model, more features, less regularization
§G · Memory / Associative Recall
“I want to retrieve a stored pattern from a partial or noisy cue.”
Need to recall a discrete stored pattern from a cue?
│
├── Small, classical, biologically inspired → Hopfield Network [[Hopfield Networks]]
│ └── Capacity ~0.14·N for N neurons
│
└── Modern, scalable, learnable
│
└── Modern Hopfield ≡ Transformer Self-Attention [[Self-Attention]]
(Ramsauer et al. 2020 — exponential capacity, integrates with backprop)
Meta-rules across all sections
When in doubt, ask these 4 questions:
| Question | If “yes,” lean toward… |
|---|---|
| 1. Gradient available? | Gradient descent (SGD/Adam) |
| 2. Many cheap evaluations possible? | Random-Restart HC, Genetic Algorithms |
| 3. Need interpretability / audit? | Decision Trees, Description Logics, classical planners |
| 4. Need optimal solution (proof)? | A*, Branch-and-Bound, Value Iteration |
If none of these is “yes,” you’re in the Simulated Annealing / SA-flavored corner — the “no math required, no guarantees, but tends to work” zone.
Pattern: each algorithm has a successor that replaces it when more info is available
This is the deepest learning across MoAI:
| Algorithm | Replaced when this becomes available |
|---|---|
| Hill Climbing | Gradient → use SGD |
| Simulated Annealing | Gradient → SGD ; probabilistic model → Bayesian Optimization |
| Genetic Algorithms | Gradient → SGD ; differentiable architecture → DARTS |
| STRIPS | Probabilistic outcomes → MDP ; unknown dynamics → RL |
| MDP / Value Iteration | Large state space + neural value function → DQN, MuZero |
| Hopfield Networks | Continuous attention with backprop → Transformer Self-Attention |
| Perceptron | Non-linear separability needed → MLP ; raw inputs → Transformer |
| Decision Trees | Variance too high → Random Forest → Gradient Boosting |
| Kernel SVM | Image / sequence data → CNN / Transformer |
| Description Logics | Open-domain knowledge → LLM + RAG |
| CSP Backtracking | Boolean only → SAT solver ; arithmetic → SMT solver |
→ The exam pattern: when a question asks “why is X used / not used today?” the answer almost always involves “because Y information became available, making Z method viable.”
See also
- Loss Surface — the shared object all optimization algorithms in this tree operate on
- Methods of AI Lecture — full course MOC
- Questions for Methods of AI — Q&A hub
- Methods of AI — Exam Cheat Sheet — companion one-pager (formulas + traps)
Tags: methods-of-ai moc exam-prep decision-tree
Created: 18-05-26