Quiz: SVMs & Perceptron

Methods of AI — SoSe 2026

Q1 — SVM

Question: What are support vectors, and why are they the only points that matter for defining the hyperplane?

Answer

Support vectors are the training examples closest to the decision hyperplane — the points that lie exactly on the margin boundaries (⟨w,x⟩+b = ±1).
Why they’re the only ones that matter: from the Lagrangian formulation, the Karush-Kuhn-Tucker condition gives:
αᵢ · (yᵢ · (⟨xᵢ,w⟩+b) − 1) = 0
For non-support-vector points, the constraint is strict (>1), so αᵢ = 0. Only support vectors have αᵢ > 0.
Since w = Σ αᵢ · yᵢ · xᵢ, only support vectors contribute to w. Remove all other training points → hyperplane unchanged.

Max’s answer:
Result:

Q2 — SVM

Question: What is the kernel trick? Why is it useful?

Answer

Kernel trick: replace the dot product ⟨x, xᵢ⟩ in the SVM decision function with a kernel function K(x, xᵢ) = ⟨φ(x), φ(xᵢ)⟩.
The kernel computes the dot product in a higher-dimensional feature space without ever explicitly computing φ(x). This makes it computationally tractable even for very high-dimensional (or infinite-dimensional) feature spaces.
Why useful: allows SVM to find non-linear decision boundaries in the original space by implicitly working in a higher-dimensional space where the data becomes linearly separable.
Decision function: f(x) = sgn(Σ αᵢ · yᵢ · K(x, xᵢ) + b)

Max’s answer:
Result:

Q4 — SVM

Question: What is the margin in SVM? Write the formula and explain what it means geometrically.

Answer

The margin is the distance between the two parallel hyperplanes bounding the “no man’s land” between classes:

Positive boundary: ⟨w, x⟩ + b = +1

Negative boundary: ⟨w, x⟩ + b = −1
Distance between them: margin = 2 / ‖w‖
SVM maximizes this margin, which is equivalent to minimizing ‖w‖² (constrained optimization).
Geometrically: a wider margin = more separation between classes = better generalization to unseen data.

Max’s answer:
Result:

Q5 — SVM

Question: What is the perceptron, and why can’t it solve XOR?

Answer

A perceptron is a single linear threshold unit: output = sign(⟨w, x⟩ + b). It learns a linear decision boundary via gradient descent on MSE.
XOR is not linearly separable — there’s no straight line that separates (0,0), (1,1) from (0,1), (1,0). A single perceptron can only represent linear boundaries → XOR is impossible.
Historical significance: Minsky & Papert (1969) proved XOR was unsolvable → led to the first “AI winter” for neural networks. The solution came later with multi-layer perceptrons (backprop).

Max’s answer:
Result:

Q6 — SVM

Question: Name three common kernel functions and give their formulas.

Answer

Linear kernel: K(x, y) = ⟨x, y⟩ — same as standard dot product, no feature mapping

Polynomial kernel: K(x, y) = (⟨x, y⟩ + c)^d — maps to d-th degree polynomial feature space

RBF (Radial Basis Function / Gaussian) kernel: slides use the σ-form K(x, y) = exp(−‖x − y‖²/2σ²) (equiv. exp(−γ‖x − y‖²), γ = 1/2σ²) — maps to infinite-dimensional space; σ/γ controls how quickly similarity falls off with distance
RBF is often the default choice — very powerful for complex non-linear boundaries.

Max’s answer:
Result:

Q7 — SVM

Question: Write the SVM optimization problem (what we minimize and what the constraint is).

Answer

Minimize: τ(w) = ½ · ‖w‖²
Subject to: yᵢ · (⟨xᵢ, w⟩ + b) ≥ 1 for all training examples i
This is a constrained quadratic optimization problem. The objective minimizes ‖w‖² (maximizes margin 2/‖w‖). The constraint ensures all training examples are correctly classified and outside the margin.
Solved via Lagrange multipliers → dual formulation → only support vectors (αᵢ > 0) contribute to solution.
Soft-margin adds slack variables ξᵢ: constraint becomes yᵢ · (⟨xᵢ,w⟩+b) ≥ 1 − ξᵢ, and minimizes ½‖w‖² + C·Σξᵢ.

Max’s answer:
Result:

Beyond the lecture (optional)

These questions go beyond the SoSe 2026 lecture slides (textbook / external additions). Kept for depth, not exam-critical.

Q3 — SVM

Question: What does the C parameter in soft-margin SVM control? What happens with large vs. small C?

Answer

C controls the tradeoff between margin size and training error (number of violated constraints).

Large C: few violations tolerated → small margin, more complex boundary → risk of overfitting.

Small C: many violations allowed → large margin, simpler boundary → more regularization, may underfit.
At C = ∞: hard-margin SVM (no violations allowed — only works if data is linearly separable).
At C → 0: ignores constraints entirely (degenerate).
In practice: C is a hyperparameter tuned via cross-validation.

Max’s answer:
Result:

Session 25/05/26 — slide-level rapid recall ⚡

5 questions on the core slide-level SVM mechanics (margin, support vectors, kernel trick, decision rule, convexity). Hard difficulty.

S1 — Margin

Question: The SVM margin is 2/‖w‖. Where does the 2 come from — why not just 1/‖w‖?

Answer

The two parallel margin hyperplanes are canonicalized as ⟨w,x⟩+b = +1 and ⟨w,x⟩+b = −1. Subtracting the closest positive and negative point equations: ⟨w, x₁−x₂⟩ = 2, and dividing by ‖w‖ gives the distance between them = 2/‖w‖. The 2 is the full gap between the +1 and −1 hyperplanes. 1/‖w‖ would only be the distance from the decision boundary (=0) to one margin hyperplane — half the margin.

Max’s answer:
Result:

S2 — Support vectors

Question: You delete a training point and re-train. When does the hyperplane stay identical, and when does it move? Tie your answer to a formula.

Answer

Hyperplane is unchanged if you delete a non-support vector (αᵢ = 0, lies strictly beyond its margin). It moves / is recomputed if you delete a support vector (αᵢ > 0, sits on the margin). Reason: w = Σᵢ αᵢ yᵢ xᵢ — only points with αᵢ > 0 contribute to w, and by the KKT condition only on-margin points have αᵢ > 0.

Max’s answer:
Result:

S3 — Kernel trick

Question: State precisely what K(x, x′) computes and why we never have to build the feature map Φ.

Answer

K(x, x′) = ⟨Φ(x), Φ(x′)⟩ — it computes the inner product in feature space, but evaluated directly in the original input space (typically O(n)). We never build Φ because the SVM dual and the decision function depend on the data only through dot products ⟨xᵢ, xⱼ⟩. Replacing each one with K(xᵢ, xⱼ) classifies in the high-dim space implicitly. Essential when Φ is infinite-dimensional (RBF).

Max’s answer:
Result:

S4 — Decision rule

Question: Write the kernelized decision function for a new input x, and explain how the class is read off it.

Answer

f(x) = sgn(Σᵢ αᵢ yᵢ K(x, xᵢ) + b). The sum effectively runs over support vectors only (αᵢ = 0 elsewhere). sgn = +1 → positive class, −1 → negative class. Intuitively it’s a kernel-similarity-weighted vote of the support vectors, offset by the bias b.

Max’s answer:
Result:

S5 — Convexity

Question: Why does the SVM optimization have a single, unique solution (unlike training a perceptron or neural net)?

Answer

The primal — minimize ½‖w‖² subject to linear inequality constraints yᵢ(⟨xᵢ,w⟩+b) ≥ 1 — is a convex quadratic program: convex objective + convex (linear) constraints ⇒ a single global optimum, no local minima to get stuck in. Perceptrons/NNs optimize non-convex landscapes, so their solution depends on initialization and can be locally optimal.

Max’s answer:
Result:

Score

Total: / 7
Session 25/05/26 (slide-level): / 5

Brain Online

Explorer

quiz_svm_30-04-26

Quiz: SVMs & Perceptron

Q1 — SVM

Q2 — SVM

Q4 — SVM

Q5 — SVM

Q6 — SVM

Q7 — SVM

Beyond the lecture (optional)

Q3 — SVM

Session 25/05/26 — slide-level rapid recall ⚡

S1 — Margin

S2 — Support vectors

S3 — Kernel trick

S4 — Decision rule

S5 — Convexity

Score

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis