Lernzettel: SVMs & Perceptron

Methods of AI — SoSe 2026 · 1-page exam sheet

For more depth: Support Vector Machines (full reference with all algorithms, worked examples, deep explanations) · quiz_svm_30-04-26 (7 practice questions)

Core Ideas

SVM: hyperplane that maximizes the margin between two classes. Only support vectors (closest points, αᵢ > 0) determine the hyperplane.
Margin = 2/‖w‖. Maximize margin ⇔ minimize ½‖w‖² s.t. yᵢ(⟨xᵢ,w⟩+b) ≥ 1.
Kernel trick: replace ⟨x, x′⟩ with K(x, x′) = ⟨Φ(x), Φ(x′)⟩ to classify in higher-dim spaces without computing Φ.
Perceptron: single-layer linear classifier, output Θ(s) ∈ {0,1} (Heaviside, slide 36). Provably convergent iff linearly separable. Cannot solve XOR.

Mini-glossary

Term	Meaning
Hyperplane H	{x ∈ Rⁿ \| ⟨w,x⟩ + b = 0} — separating (n−1)-dim surface
Margin	2/‖w‖ — distance between the two parallel support hyperplanes
Support vector	training point on the margin; αᵢ > 0; alone determines w
Kernel K(x, x′)	implicit inner product ⟨Φ(x), Φ(x′)⟩ in feature space
Slack ξᵢ (suppl.)	how much constraint i may be violated (soft-margin only)
C (suppl.)	soft-margin trade-off: large C → narrow margin, small C → more regularization
αᵢ ≥ 0	Lagrange multiplier; αᵢ > 0 ⇔ point i is a support vector (KKT)
RBF	Radial Basis Function (= Gaussian kernel). “Radial” = depends only on distance ‖x−y‖; “basis function” = each SV is a bump. Maps to infinite-dim space. γ = bump width
PSD (suppl.)	Positive Semi-Definite: property of the kernel/Gram matrix K (Kᵢⱼ = K(xᵢ,xⱼ)): cᵀKc ≥ 0 ∀c, i.e. all eigenvalues ≥ 0. Mercer: K PSD ⟺ ∃ φ with K(x,y)=⟨φ(x),φ(y)⟩

⭐ Full glossary with examples → Perceptron vocabulary ⭐

Key formulas

	Formula
Hyperplane	⟨w, x⟩ + b = 0
Margin	2/‖w‖
Optimization (primal)	min ½‖w‖² s.t. yᵢ(⟨xᵢ,w⟩+b) ≥ 1
Lagrangian	L = ½‖w‖² − Σᵢ αᵢ(yᵢ(⟨xᵢ,w⟩+b) − 1)
KKT condition	αᵢ · (yᵢ(⟨xᵢ,w⟩+b) − 1) = 0
Weight recovery	w = Σᵢ αᵢ yᵢ xᵢ
Decision function	f(x) = sgn(Σᵢ αᵢ yᵢ K(x, xᵢ) + b)
Linear kernel	K(x, y) = ⟨x, y⟩
Polynomial kernel	K(x, y) = (⟨x, y⟩ + c)^d
RBF (Radial Basis Function = Gaussian) kernel	slides: K(x, y) = exp(−‖x − y‖² / 2σ²), param σ · equiv. exp(−γ‖x−y‖²) with γ = 1/(2σ²)
Perceptron output	y = Θ(w⃗ · x⃗) ∈ {0, 1}, x₀ = 1
Perceptron update	Δw⃗ = ε · (tⁿ − yⁿ) · x⃗ⁿ

Common Exam Traps ⚠️

Only support vectors matter — αᵢ = 0 points contribute nothing to w. Removing non-SVs leaves the hyperplane unchanged.
SVs aren’t “chosen” / aren’t simply “the closest points” — they emerge from the optimization as the points with active constraints (αᵢ > 0), i.e. those lying exactly on the margin (yᵢ(w·xᵢ+b)=1). It’s the QP + KKT, not a distance ranking, that picks them. (Soft-margin: 0<αᵢ<C on the margin, αᵢ=C violating it.) → How are support vectors chosen?
Lagrangian sign convention — standard form subtracts αᵢ·(constraint); flipping the sign inverts the saddle-point and is a classic error.
Margin is 2/‖w‖, not 1/‖w‖ — the 2 comes from the ±1 spacing of the two parallel margin hyperplanes.
Perceptron output is Θ(s) ∈ {0,1} — NOT sign(s). Bias is absorbed as w₀ via constant input x₀ = 1.
Perceptron “gradient” descent is technically not — Θ isn’t differentiable; slide 44 takes the gradient w.r.t. the preactivation w·x instead of the output.
Perceptron convergence is conditional — only guaranteed if (i) data linearly separable AND (ii) ε small enough.
XOR → “temporary drop in interest in ANN” (slide 47 wording). “AI winter” / Minsky & Papert is textbook embellishment.
Soft-margin SVM, C trade-off, slack variables, Mercer’s condition are NOT in slides — supplementary R&N material (large C → overfitting; small C → more regularization).
Kernel trick ≠ explicit mapping — K computes ⟨Φ(x), Φ(x′)⟩ without ever evaluating Φ; works even when Φ maps into infinite-dim spaces (RBF).
PSD-ness = why the kernel trick is legal (suppl., tied to Mercer) — K is a valid kernel iff its Gram matrix is positive semi-definite (eigenvalues ≥ 0, not strictly > 0). PSD guarantees a φ exists with K(x,y)=⟨φ(x),φ(y)⟩, and makes the dual QP convex (concave maximization → one global optimum). Non-PSD kernel → non-convex → solver can get stuck in a local optimum. It’s the Gram matrix that must be PSD, not the raw data.
RBF γ vs C — γ = kernel width: large γ → narrow bumps → overfitting; small γ → smooth, near-linear boundary. Don’t confuse with C (slack penalty); both tune over/underfitting but via different mechanisms.

Quick comparison

	Perceptron	SVM (linear)	SVM (kernel)
Boundary	Linear	Linear, max-margin	Non-linear (via Φ)
Output	Θ ∈ {0,1}	sgn ∈ {−1,+1}	sgn ∈ {−1,+1}
Training	MSE “gradient” descent (on preactivation)	Convex QP	Convex QP with K
Uses	All training points	Support vectors only	Support vectors only
Solves XOR	No	No	Yes (right K)
Convergence	Iff linearly sep. + small ε	Always (unique optimum)	Always

Full algorithm reference (perceptron training, SVM QP, kernel eval, soft-margin) → ALGORITHMS (full reference) ⭐

Practice quiz

quiz_svm_30-04-26 — 7 questions

Targeted exam questions in Questions for Methods of AI

Q74–79 (basic: SVM hyperplane, support vectors, kernel trick, perceptron) · Q84 (perceptron) · Q94 (perceptron + XOR) · Q95 (Kernel Trick) · Q130–132 (deep / exam-trap: Lagrangian sign convention, slack ξᵢ + C trade-off, Mercer’s condition)

Atomic notes

Perceptron · Support Vector Machines · Kernel Trick

Sibling Lernzettel

lernzettel_ml-i-ii_30-04-26 — ML I & II (background: bias-variance, classifiers)

Brain Online

Explorer

lernzettel_svm_30-04-26

Lernzettel: SVMs & Perceptron

Core Ideas

Mini-glossary

Key formulas

Common Exam Traps ⚠️

Quick comparison

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis

Brain Online

Explorer

lernzettel_svm_30-04-26

Lernzettel: SVMs & Perceptron

Core Ideas

Mini-glossary

Key formulas

Common Exam Traps ⚠️

Quick comparison

Related Q&A & Notes

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis