Algorithm Decision Tree — Methods of AI

methods-of-ai exam-prep

The meta-skill the exam actually tests. Not “how does X work?” but “for problem of this type, which algorithm do I reach for, and why?” This page is the answer.

Use this as a flowchart. Start at the top question; follow the answer that fits your problem; arrive at the right tool.


Step 1 — What kind of problem are you solving?

If your problem is about…Jump to
Finding the best parameters / state§A Optimization
Assigning values to variables under constraints§B Constraint Satisfaction
Choosing a sequence of actions over time§C Planning / Decision Making
Representing facts and reasoning about them§D Knowledge Representation
Handling not-knowing-for-sure§E Uncertainty / Vagueness
Predicting a label from features§F Classification / Regression
Storing & retrieving patterns§G Memory / Associative Recall

§A · Optimization

“I want to find the x that maximizes (or minimizes) f(x).”

Is f(x) differentiable + you can compute the gradient?
├── YES  →  Gradient Descent / SGD / Adam               [[Gradient Descent]] · [[Gradient Backpropagation]]
│
└── NO   →  Is the search space continuous or discrete?
            │
            ├── CONTINUOUS → How rugged is the landscape?
            │   │
            │   ├── Smooth (1 minimum) → Hill Climbing  [[Hill Climbing]]
            │   ├── Some local optima  → Random-Restart Hill Climbing
            │   └── Many local optima  → Simulated Annealing  [[Simulated Annealing]]
            │
            └── DISCRETE / COMBINATORIAL
                │
                ├── Can solutions be encoded as fixed-length strings?
                │   AND crossover would help?
                │   → Genetic Algorithm  [[Genetic Algorithms]]
                │
                ├── Want parallel exploration of k regions?
                │   → Local Beam Search  [[Local Beam Search]]
                │
                ├── Want communication between agents?
                │   → Stochastic Diffusion Search
                │
                └── ILP-shaped?
                    → MILP solver (Gurobi/CPLEX)         [not MoAI, but worth knowing]

Decision-driver questions:

  • Can I compute gradients? → if yes, don’t use SA/GA/HC, use SGD.
  • How many local optima? → more = warmer SA + slower cooling.
  • Is fitness expensive to evaluate? → BayesOpt > SA > GA in sample efficiency.

§B · Constraint Satisfaction

“Many variables, each with a domain, must satisfy constraints simultaneously.”

Are variables purely boolean?
├── YES → Modern SAT solver (CDCL: MiniSat, Glucose, Z3)
│
└── NO  → Need optimization objective on top of constraints?
         │
         ├── YES → MILP solver (Gurobi, CPLEX)
         │
         └── NO  → Classical CSP solver:
                   1. Apply AC-3 preprocessing (Arc Consistency)
                   2. Backtracking with MRV + Degree heuristic for variable choice
                   3. LCV heuristic for value choice
                   [[Constraint Satisfaction Problems]]

Need temporal constraints?  → Allen's Tense Logic  [[Allen's Tense Logic]]
Need spatial constraints?   → RCC-8                [[RCC-8]]

Decision-driver questions:

  • Are constraints binary? → AC-3 is O(c·d³); use it as preprocessing.
  • High-cardinality domains? → Local search on CSP often beats backtracking.

§C · Planning / Decision Making

“Find a sequence of actions that gets me from initial state to goal.”

Are action outcomes deterministic AND world fully observable?
│
├── YES (deterministic + observable) → Classical Planning
│   │
│   ├── Need full FOL expressivity? → Situation Calculus  [[Situation Calculus]]
│   └── Practical use?                → STRIPS / PDDL     [[STRIPS]] [[PDDL and STRIPS]]
│
├── PROBABILISTIC outcomes + full observation → MDP  [[Markov Decision Process (MDP)]]
│   │
│   ├── Small state space → Value Iteration       (Bellman optimality)
│   └── Larger state space → Policy Iteration     (often fewer iters)
│   [[Bellman Equation]]
│
├── PARTIAL observation → POMDP
│
├── Reward UNKNOWN / dynamics unknown → Reinforcement Learning
│   │
│   ├── Discrete actions → Q-Learning / DQN          [[Q-Function]]
│   ├── Continuous actions → SAC, TD3, PPO
│   └── Want to learn from demonstrations → Imitation Learning / Inverse RL
│   [[Reinforcement Learning (RL)]]

⚠️ Exam-critical: STRIPS sidesteps the Frame Problem (CWA + ADD/DEL), it doesn’t solve it. See Frame Problem.


§D · Knowledge Representation

“Encode facts about a domain so a machine can reason.”

Do you need terminological reasoning (classes + relationships)?
│
├── YES → Description Logics (ALC, OWL)  [[Description Logics]]
│   ├── TBox: schema-level facts (Student ⊑ Person)
│   └── ABox: instance-level facts (alice : Student)
│
├── Need temporal logic?                 → Allen's Tense Logic
│
├── Need spatial / qualitative regions?  → RCC-8
│
├── Need full expressive logic?          → First-Order Logic (undecidable in general)
│
└── Just rules + facts?                  → Datalog / Prolog (rare in MoAI)

⚠️ DL uses Open World Assumption — absence of evidence ≠ evidence of absence. STRIPS uses Closed World. Easy to mix up.


§E · Uncertainty / Vagueness

“I have to make decisions when I don’t know for sure.”

What kind of "not sure" is it?
│
├── The predicate has no sharp boundary (it's vague)
│   "Bob is tall" — at what height does tallness begin?
│   → Fuzzy Logic  [[Fuzzy Logic]]
│   ├── AND ↔ t-norm (min, product, Łukasiewicz)
│   └── OR  ↔ s-norm (max, prob-sum, Łukasiewicz)
│
├── The fact is crisp but I don't know which way it'll go
│   "It will rain tomorrow" (yes-or-no, just unknown)
│   → Probability / Bayesian methods
│
└── I have evidence but don't know how to split it across hypotheses
│   "The patient has flu or COVID, evidence is 0.5, can't tell which"
│   → Dempster-Shafer Theory  [[Dempster-Shafer Theory]]
│   ├── Bel(A) = lower bound
│   └── Pl(A)  = upper bound

⚠️ The most-tested distinction in this unit: Fuzzy ≠ Probability. Memorize: fuzzy = vague predicate, probability = uncertain crisp fact.


§F · Classification / Regression

“Predict a label / value from a feature vector.”

Data type?
│
├── TABULAR (rows × columns)
│   │
│   ├── Need interpretability + audit trail?
│   │   → Single Decision Tree (CART / ID3)     [[Decision Trees and ID3]]
│   │
│   ├── Tabular, want strong default?
│   │   → Random Forest                          [[Random Forest]]
│   │
│   ├── Tabular, want best accuracy?
│   │   → XGBoost / LightGBM / CatBoost           (not MoAI but worth knowing)
│   │
│   ├── Linearly separable?
│   │   → Perceptron / Logistic Regression       [[Perceptron]]
│   │
│   └── Non-linear, medium data, want margin guarantees?
│       → SVM + kernel trick (RBF is default)    [[Support Vector Machines]] [[Kernel Trick]]
│
├── IMAGES → CNN / Vision Transformer (ViT)
│
├── TEXT → Transformer (BERT-style for classification, GPT-style for generation)
│   [[Transformers]] [[Self-Attention]]
│
├── SEQUENCES (audio, time series) → Transformer or State Space Model (Mamba)
│
└── HIGH-DIM, SPARSE → Linear model with L1 regularization

Bias-Variance decision drivers (Bias-Variance Tradeoff):

  • Train error low, test error high → overfitting → variance too high → simpler model, regularize, more data, bagging
  • Train error AND test error high → underfitting → bias too high → more complex model, more features, less regularization

§G · Memory / Associative Recall

“I want to retrieve a stored pattern from a partial or noisy cue.”

Need to recall a discrete stored pattern from a cue?
│
├── Small, classical, biologically inspired → Hopfield Network  [[Hopfield Networks]]
│   └── Capacity ~0.14·N for N neurons
│
└── Modern, scalable, learnable
    │
    └── Modern Hopfield ≡ Transformer Self-Attention  [[Self-Attention]]
        (Ramsauer et al. 2020 — exponential capacity, integrates with backprop)

Meta-rules across all sections

When in doubt, ask these 4 questions:

QuestionIf “yes,” lean toward…
1. Gradient available?Gradient descent (SGD/Adam)
2. Many cheap evaluations possible?Random-Restart HC, Genetic Algorithms
3. Need interpretability / audit?Decision Trees, Description Logics, classical planners
4. Need optimal solution (proof)?A*, Branch-and-Bound, Value Iteration

If none of these is “yes,” you’re in the Simulated Annealing / SA-flavored corner — the “no math required, no guarantees, but tends to work” zone.


Pattern: each algorithm has a successor that replaces it when more info is available

This is the deepest learning across MoAI:

AlgorithmReplaced when this becomes available
Hill ClimbingGradient → use SGD
Simulated AnnealingGradient → SGD ; probabilistic model → Bayesian Optimization
Genetic AlgorithmsGradient → SGD ; differentiable architecture → DARTS
STRIPSProbabilistic outcomes → MDP ; unknown dynamics → RL
MDP / Value IterationLarge state space + neural value function → DQN, MuZero
Hopfield NetworksContinuous attention with backprop → Transformer Self-Attention
PerceptronNon-linear separability needed → MLP ; raw inputs → Transformer
Decision TreesVariance too high → Random Forest → Gradient Boosting
Kernel SVMImage / sequence data → CNN / Transformer
Description LogicsOpen-domain knowledge → LLM + RAG
CSP BacktrackingBoolean only → SAT solver ; arithmetic → SMT solver

The exam pattern: when a question asks “why is X used / not used today?” the answer almost always involves “because Y information became available, making Z method viable.”

See also

Tags: methods-of-ai moc exam-prep decision-tree
Created: 18-05-26