Vagueness & Uncertainty — Fuzzy + Probabilistic Logic

Table of contents

  1. Core Ideas
  2. Glossary
  3. Fuzzy Set Theory
  4. Zadeh operations
  5. t-norms and s-norms — axioms
  6. Algebraic & Quotient norms
  7. Visual — membership function shapes
  8. Fuzzy algebra in code
  9. Visualisations (Python) — 4 figures
  10. Probabilistic Logic
  11. Kolmogorov axioms & derived properties
  12. Conditional probability
  13. Probabilistic knowledge bases
  14. Formulas & Notation
  15. Common Exam Traps
  16. Worked examples
  17. Quick Comparison Table
  18. ALGORITHMS / TECHNIQUES
  19. Where Fuzzy Logic is used today
  20. Where Fuzzy Logic was replaced — and by what
  21. See also

Core Ideas

  • Classical logic: statements are either true or false — can’t capture partial truth or uncertainty.
  • Fuzzy logic (Lotfi Zadeh, 1965) addresses vagueness: partial truth (“the ball is reddish”). Truth values live in [0, 1] and represent degree of membership.
  • Probabilistic logic addresses uncertainty: a complete truth we don’t know yet (“probably rain”). Truth values live in [0, 1] and represent the probability of a crisp fact.
  • Fuzzy and probabilistic logic both use [0,1], but the semantics differ: degree of membership ≠ likelihood.
  • Probabilistic logic preserves more classical properties than fuzzy logic — tautologies always get probability 1, contradictions get 0; fuzzy logic does not guarantee this (intentional for vague predicates).

Glossary — important vocabulary ⭐

Vagueness — predicate boundary is fuzzy. “Tall”, “red”, “heap”. Heap paradox: if removing one grain keeps it a heap, then 0 grains is a heap — contradiction in classical logic.

Uncertainty — predicate boundary is sharp, but we lack information to decide. “It is raining in Berlin” is either true or false; we just don’t know.

Membership function μ_A(x) — function U → [0,1] assigning each x its degree of membership in fuzzy set A.

Fuzzy set A = (U, μ_A) — universe U plus its membership function. Classical (crisp) sets are the special case μ_A: U → {0,1}.

t-norm t(x,y) — binary operator on [0,1] generalising AND. Axioms: neutral element 1, commutative, associative, monotone increasing (if x ≤ x’ and y ≤ y’ then t(x,y) ≤ t(x’,y’)).

s-norm s(x,y) (a.k.a. t-conorm) — generalises OR. Axioms: neutral element 0, commutative, associative, monotone increasing.

Probability measure P — function on events satisfying Kolmogorov’s axioms (see below).

Conditional probability P(B|A) — probability of B given A has occurred, defined only when P(A) > 0.

Tautology T — formula true in every interpretation. In probabilistic logic always gets P(T) = 1. In fuzzy logic not guaranteed.

Contradiction C — formula false in every interpretation. Probabilistic: P(C) = 0. Fuzzy: not guaranteed.

Probabilistic knowledge base — set of probabilistic formulas F : p (P(F) = p) and/or conditional formulas (G|F)[p] (P(G|F) = p).


Fuzzy Set Theory

A fuzzy set generalises a classical (crisp) set:

  • Classical set A ⊆ U is fully described by its characteristic function χ_A : U → {0, 1}.
  • A fuzzy set A is the pair (U, μ_A) where μ_A : U → [0, 1].
  • μ_A(x) = 0.6 means “x belongs to A to degree 0.6”.

Classical: Tall(x) is true or false. Fuzzy: Tall(180 cm) might have membership degree 0.7 — partially tall.

Common membership function shapes:

  • Triangular — rises linearly to 1, falls linearly back to 0
  • Trapezoidal — rises, plateaus at 1, falls
  • Gaussian — bell curve centered on the “ideal” value
  • Sigmoidal — s-shaped, used for “saturation” concepts
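The four shapes can be written down in a few lines of plain Python. This is a minimal sketch: the function names and the parameters (a, b, c, d, centre, sigma, midpoint, slope) are illustrative assumptions, not notation from the lecture, and the "tall" breakpoints at the bottom are made-up values.

```python
import math

def triangular(x, a, b, c):
    """Rises linearly from a to the peak at b, falls linearly to c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Rises on [a, b], plateau of 1 on [b, c], falls on [c, d] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, centre, sigma):
    """Bell curve with value 1 at the centre; sigma controls the width."""
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))

def sigmoidal(x, midpoint, slope):
    """S-shaped curve: 0.5 at the midpoint, saturating towards 1 for large x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

# Hypothetical "tall" predicate with a plateau between 175 and 195 cm.
for height in (160, 170, 180, 200):
    print(height, trapezoidal(height, 165, 175, 195, 205))
```

Every function returns a value in [0, 1], so any of them can serve as μ_A for a fuzzy set over a numeric universe.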

Standard fuzzy operations (Zadeh) ⭐

Operation | Formula | Note
Intersection A ∩ B | μ_{A∩B}(x) = min(μ_A(x), μ_B(x)) | t-norm
Union A ∪ B | μ_{A∪B}(x) = max(μ_A(x), μ_B(x)) | s-norm
Complement A^C | μ_{A^C}(x) = 1 − μ_A(x) | standard negation

Non-classical behaviour:

  • A ∩ A^C ≠ ∅: with μ_A(x) = 0.6 we get min(0.6, 0.4) = 0.4 > 0.
  • A ∪ A^C ≠ U: max(0.6, 0.4) = 0.6 < 1.

This is intentional — for vague statements “red” and “not red”, “x is red or not red” need not be fully true.
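On a finite universe the Zadeh operations are elementwise min, max and 1 − μ. A small sketch, assuming fuzzy sets are stored as dicts from element to degree; the universe and the degrees are invented for illustration.

```python
# Fuzzy sets over a tiny, made-up universe: element -> membership degree.
red = {"apple": 0.6, "brick": 0.9, "sky": 0.0}
sweet = {"apple": 0.8, "brick": 0.0, "sky": 0.0}

def intersection(a, b):
    """Zadeh intersection: pointwise minimum."""
    return {x: min(a[x], b[x]) for x in a}

def union(a, b):
    """Zadeh union: pointwise maximum."""
    return {x: max(a[x], b[x]) for x in a}

def complement(a):
    """Standard negation: 1 minus the membership degree."""
    return {x: 1.0 - a[x] for x in a}

print("red AND sweet:", intersection(red, sweet))
print("red OR sweet: ", union(red, sweet))

# Non-classical behaviour for the apple (membership 0.6 in "red"):
not_red = complement(red)
print("red AND not red:", intersection(red, not_red)["apple"])  # above 0
print("red OR not red: ", union(red, not_red)["apple"])         # below 1
```

For a crisp set (degrees only 0 or 1) the same three functions reduce to ordinary intersection, union and complement.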

t-norms and s-norms (generalising AND/OR)

A t-norm t : [0,1]² → [0,1] satisfies:

  1. Neutral element 1: t(x, 1) = x
  2. Commutativity: t(x, y) = t(y, x)
  3. Associativity: t(x, t(y, z)) = t(t(x, y), z)
  4. Monotonicity: x ≤ x’ and y ≤ y’ ⇒ t(x, y) ≤ t(x’, y’)

An s-norm (t-conorm) s : [0,1]² → [0,1] satisfies the same axioms with neutral element 0 instead of 1.

Concrete t-norm / s-norm families

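Name | t-norm | s-norm
Standard (Zadeh) | min(x, y) | max(x, y)
Algebraic (product, Goguen) | x·y | x + y − x·y
Quotient (Hamacher form) | xy / (x + y − xy), with value 0 at x = y = 0 | (x + y − 2xy) / (1 − xy), with value 1 at x = y = 1
Łukasiewicz | max(0, x + y − 1) | min(1, x + y)
Drastic (smallest t-norm, largest s-norm) | min(x, y) if max(x, y) = 1, else 0 | max(x, y) if min(x, y) = 0, else 1
Largest t-norm, smallest s-norm | min(x, y) | max(x, y)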

Useful inequality. For any t-norm t and any x, y ∈ [0,1]: t(x, y) ≤ min(x, y). For any s-norm s: s(x, y) ≥ max(x, y). So Zadeh’s min is the largest t-norm and Zadeh’s max is the smallest s-norm.

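⚠️ Exam trap: fuzzy logic doesn’t obey all classical Boolean laws. With Zadeh’s min/max the law of excluded middle fails (x ⊔ ¬x = max(x, 1 − x) < 1 for 0 < x < 1), and so does non-contradiction. Distributivity, by contrast, does hold for min and max; it fails in general for other pairs such as the algebraic product and algebraic sum.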
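The four axioms and the bound t(x, y) ≤ min(x, y) can be spot-checked numerically. The sketch below tests candidate operators on a coarse grid, which can refute an axiom but is not a proof. The helper names are hypothetical, and the drastic t-norm is written in its standard form (x if y = 1, y if x = 1, otherwise 0).

```python
from itertools import product

def t_min(x, y):
    return min(x, y)

def t_product(x, y):
    return x * y

def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)

def t_drastic(x, y):
    if x == 1.0:
        return y
    if y == 1.0:
        return x
    return 0.0

GRID = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0
EPS = 1e-9                           # tolerance for floating-point rounding

def satisfies_axioms(t):
    """Check the four t-norm axioms on every grid triple."""
    for x, y, z in product(GRID, repeat=3):
        if abs(t(x, 1.0) - x) > EPS:                    # neutral element 1
            return False
        if abs(t(x, y) - t(y, x)) > EPS:                # commutativity
            return False
        if abs(t(x, t(y, z)) - t(t(x, y), z)) > EPS:    # associativity
            return False
        if x <= z and t(x, y) > t(z, y) + EPS:          # monotonicity
            return False
    return True

def bounded_by_drastic_and_min(t):
    """Check drastic(x, y) <= t(x, y) <= min(x, y) on the grid."""
    return all(
        t_drastic(x, y) - EPS <= t(x, y) <= min(x, y) + EPS
        for x, y in product(GRID, repeat=2)
    )

for name, t in [("min", t_min), ("product", t_product),
                ("Lukasiewicz", t_lukasiewicz), ("drastic", t_drastic)]:
    print(name, satisfies_axioms(t), bounded_by_drastic_and_min(t))

# The arithmetic mean is not a t-norm: it violates the neutral element axiom.
print("mean", satisfies_axioms(lambda x, y: (x + y) / 2))
```

Monotonicity is tested in the first argument only; together with commutativity that covers the second argument as well.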

Visual — the four membership function shapes

What to see:

  • Triangular — simplest, used when you only need “definitely belongs at this point, fades linearly elsewhere”
  • Trapezoidal — has a plateau of full membership; common for “tall person = 175–195 cm”
  • Gaussian — smooth, differentiable (matters for fuzzy neural networks)
  • Sigmoidal — saturation behaviour; for predicates like “old” (no upper bound)

Real fuzzy controllers usually use triangular or trapezoidal for speed; Gaussian when you need gradients (e.g. ANFIS — Adaptive Neuro-Fuzzy Inference Systems).

See the algebra concretely

Output:

Fuzzy AND (t-norm):
  min(x, y)         = 0.400     (Zadeh)
  x * y             = 0.240     (Goguen / product)
  max(0, x + y - 1) = 0.000     (Łukasiewicz)

Fuzzy OR (s-norm):
  max(x, y)         = 0.600     (Zadeh)
  x + y - x*y       = 0.760     (algebraic sum)
  min(1, x + y)     = 1.000     (Łukasiewicz)

Non-classical behavior:
  μ(A ⊓ A^c) using min  = 0.400    (classically should be 0)
  μ(A ⊔ A^c) using max  = 0.600    (classically should be 1)

→ Excluded middle (A ∨ ¬A = 1) and non-contradiction (A ∧ ¬A = 0) fail in fuzzy logic. This is by design — vague predicates allow partial overlap with their negation.
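Numbers of this kind can be reproduced with a short script. The inputs x = 0.4, y = 0.6 and μ_A = 0.6 are an assumption inferred from the values listed above; the original code block is not part of this note, so this is a reconstruction sketch, not the original.

```python
def norm_table(x, y):
    """Evaluate three common t-norms and their dual s-norms at (x, y)."""
    return {
        "and": {
            "min(x, y)": min(x, y),                     # Zadeh
            "x * y": x * y,                             # Goguen / product
            "max(0, x + y - 1)": max(0.0, x + y - 1.0), # Lukasiewicz
        },
        "or": {
            "max(x, y)": max(x, y),                     # Zadeh
            "x + y - x*y": x + y - x * y,               # algebraic sum
            "min(1, x + y)": min(1.0, x + y),           # Lukasiewicz
        },
    }

x, y = 0.4, 0.6   # assumed inputs
table = norm_table(x, y)

for title, key in (("Fuzzy AND (t-norm):", "and"), ("Fuzzy OR (s-norm):", "or")):
    print(title)
    for formula, value in table[key].items():
        print(f"  {formula:<18}= {value:.3f}")

mu_a = 0.6        # assumed membership degree of A
contradiction = min(mu_a, 1 - mu_a)     # A AND not-A under Zadeh
excluded_middle = max(mu_a, 1 - mu_a)   # A OR not-A under Zadeh
print(f"A and not A = {contradiction:.3f}, A or not A = {excluded_middle:.3f}")
```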

Visualisations (Python)

These four figures make the fuzzy logic concepts above concrete. Each toggle contains a self-contained Pyodide block (matplotlib installed on the fly) — open them in Obsidian with the Execute Code plugin to render.

What to see. Each curve is one vague concept. Around T = 14 °C the temperature is both somewhat cold (μ_cold ≈ 0.3) and somewhat warm (μ_warm ≈ 0.2). That overlap is the whole point of fuzzy logic — classical sets would have to draw a hard line at, say, 15 °C and call 14.99 °C “cold” but 15.01 °C “warm”. Notice the three shapes (trapezoidal / triangular / gaussian) all encode the same kind of vague predicate with different boundary behaviour: trapezoidal has a flat plateau of full membership, triangular peaks at one point, gaussian is smooth and differentiable.


What to see. Both green (min) and purple (product) satisfy all four t-norm axioms (neutral element 1, commutativity, associativity, monotonicity), yet they give different answers for “A AND B”. The orange band is the gap: min ≥ product for all x, y ∈ [0,1] (an instance of the general inequality t(x,y) ≤ min(x,y) for any t-norm). At the extremes (μ = 0 or μ = 1) both agree, matching classical AND. The choice of t-norm is a design decision: Zadeh’s min is the textbook default and preserves more classical structure (e.g. idempotence: min(x,x) = x), while the product is differentiable and behaves more like probabilities of independent events.


What to see. In classical logic A ∨ ¬A is the textbook tautology — always 1, no exceptions (grey line). In Zadeh fuzzy logic the same formula becomes max(μ_A, 1 − μ_A), which dips down to 0.5 exactly where μ_A = 0.5 — the most ambiguous point. The yellow region is the “vagueness gap”: every point where the fuzzy interpretation falls short of the classical tautology. This isn’t a bug — it’s the defining feature that lets fuzzy logic talk about predicates with no sharp boundary (“reddish”, “tall”, “warm”). The flip side: tautologies are no longer guaranteed 1, which is why probabilistic logic (P(T) = 1 always) is closer to classical logic than fuzzy logic is.


What to see. The blue fuzzy set is the aggregated output of two clipped fuzzy rules — a typical bimodal shape after Mamdani inference. Centroid (green) takes the centre of gravity of the entire shape, sitting somewhere between the two lobes — it “knows” both rules fired. Max-membership (red) returns the average of the x-values where μ reaches its maximum — it ignores the smaller lobe entirely and reports a point near the right peak. Both are valid defuzzification methods; they can disagree by several units on the same set. Real fuzzy controllers usually pick centroid for smoothness, max-membership for speed and intuitive “winner-take-all” semantics.


Probabilistic Logic

Propositional atoms are mapped to events in a probability space (Ω, Σ, P).

Kolmogorov’s axioms

  1. Non-negativity: P(E) ≥ 0 for every event E.
  2. Normalisation: P(Ω) = 1.
  3. Finite additivity: for disjoint events A, B: P(A ∪ B) = P(A) + P(B). (σ-additivity for countable disjoint families.)

Derived properties

  • P(∅) = 0
  • P(A^C) = 1 − P(A)
  • A ⊆ B ⇒ P(A) ≤ P(B) (monotonicity)
  • Inclusion–exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  • Tautology T: P(T) = P(E_T) = P(Ω) = 1
  • Contradiction C: P(C) = P(E_C) = P(∅) = 0
  • For fuzzy logics we do not have this property — A ∨ ¬A need not be 1 (slide 70).
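The derived properties can be verified exhaustively on a small finite probability space. The sketch below assumes a fair six-sided die with the uniform measure (an illustrative choice, not from the slides) and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))   # outcomes of one fair die (illustrative)

def P(event):
    """Uniform probability measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def all_events():
    """Every subset of OMEGA (the power set plays the role of Sigma)."""
    items = sorted(OMEGA)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_derived_properties():
    events = all_events()
    assert P(frozenset()) == 0                        # P(empty set) = 0
    assert P(OMEGA) == 1                              # normalisation
    for a in events:
        assert P(OMEGA - a) == 1 - P(a)               # complement rule
        for b in events:
            assert P(a | b) == P(a) + P(b) - P(a & b) # inclusion-exclusion
            if a <= b:
                assert P(a) <= P(b)                   # monotonicity
    return True

print(check_derived_properties())
```

A tautology corresponds to the event OMEGA and a contradiction to the empty set, which is why they receive probability 1 and 0 under every measure.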

Conditional probability

P(B|A) = P(B ∩ A) / P(A) (requires P(A) > 0)

Same definition in probabilistic logic: P(G|F) = P(G ∧ F) / P(F).

Probabilistic knowledge bases

Two formula types:

  • F : p (unconditional). P satisfies it iff P(F) = p.
  • (G | F)[p] (conditional). P satisfies it iff P(G | F) = p (requires P(F) > 0).

A KB is satisfiable if at least one probability measure satisfies every formula. Reasoning means deriving bounds like P(B) ≤ 0.5 from the KB (see worked example below).


Formulas & Notation

Symbol | Meaning
μ_A(x) | Degree of membership of x in fuzzy set A
t(x, y) | t-norm (generalises AND)
s(x, y) | s-norm / t-conorm (generalises OR)
(Ω, Σ, P) | Probability space
P(E) | Probability of event E
P(B|A) | Conditional probability of B given A
F : p | Probabilistic formula: P(F) = p
(G|F)[p] | Conditional formula: P(G|F) = p
T, C | Tautology, contradiction

Common Exam Traps ⚠️

  • Vagueness ≠ uncertainty. Vagueness = predicate is partial (degree); uncertainty = predicate is sharp, we just don’t know which side x falls on.
  • Fuzzy: A ∨ ¬A is NOT necessarily 1. With μ_A(x) = 0.6: I(red ∨ ¬red) = max(0.6, 0.4) = 0.6. Intentional for vague predicates.
  • Probabilistic: tautologies always get P = 1. Contradictions always P = 0. This is one of the main differences from fuzzy logic (Session 08 slide 70).
  • min and max are one valid Zadeh choice for t-norm / s-norm — not the only ones. Algebraic (x·y, x+y−xy) is equally valid.
  • Algebraic t-norm ≠ min. x·y ≤ min(x, y), with strict inequality whenever 0 < x < 1 and 0 < y < 1.
  • Heap paradox motivates fuzzy logic — classical logic + induction produces a contradiction.
  • Conditional KB satisfaction: P satisfies (G|F)[p] iff P(G|F) = p, which requires P(F) > 0 (otherwise undefined).
  • t-norm and s-norm have different neutral elements — 1 for t-norm, 0 for s-norm. Don’t mix up.
  • Centroid vs. max-membership defuzzification can disagree on the same aggregated fuzzy set.
  • ⚠️ Dempster-Shafer is NOT in Session 08 slides — it is sometimes asked about externally but is not taught in this lecture (see the supplementary section below).

⚠️ Fuzzy vs. Probability — the big distinction

This is the #1 exam trap on Vagueness.

Aspect | Fuzzy Logic | Probability
What it captures | Vagueness: predicate has no sharp boundary | Uncertainty: fact is crisp but we don’t know
Statement | “The ball is reddish” (μ_red(ball) = 0.7) | “The ball is red, with probability 0.7”
Operations | min / max for ∧ / ∨ | · for ∧ of independent events, + for ∨ of disjoint events
Complement | μ_{A^C}(x) = 1 − μ_A(x) | P(¬A) = 1 − P(A)
Non-contradiction | A ⊓ A^C ≠ ∅ (can have nonzero μ) | P(A ∩ ¬A) = 0 always
Excluded middle | A ⊔ A^C ≠ U (in general) | P(A ∪ ¬A) = 1 always

Concrete example: “Bob is tall.”

  • Fuzzy: Bob is 180 cm; he’s tall to degree 0.7. There is no fact of the matter about whether he’s “really” tall.
  • Probabilistic: I don’t know Bob’s height; I estimate a 70% chance he’s tall (where “tall” is a crisp threshold).

If you ever confuse these on the exam, you lose the entire question. Memorize the example.


Worked examples

Heap paradox (motivates fuzzy logic)

  • Premise 1: “A pile of 1 000 000 grains of sand is a heap.”
  • Premise 2: “If n grains form a heap, then n − 1 grains also form a heap.”
  • By induction → “0 grains form a heap.” Contradiction.

Fuzzy fix. “Is a heap” is a fuzzy predicate with μ_heap(n) decreasing smoothly from 1 (large n) to 0 (small n). The induction step no longer preserves full truth: μ_heap(n − 1) ≈ μ_heap(n), but the tiny loss compounds — and that is fine in fuzzy logic.

Conditional KB satisfaction (slide 76)

KB: Bird : 0.8, (Penguin | Bird)[0.2], (Flies | Bird)[0.7], (Bird | Penguin)[1], (Flies | Penguin)[0].

Derive P(Penguin) ≥ 0.16:

P(Penguin) = P((Bird ∨ ¬Bird) ∧ Penguin)                (Tautology)
           = P((Bird ∧ Penguin) ∨ (¬Bird ∧ Penguin))   (Distributivity)
           = P(Bird ∧ Penguin) + P(¬Bird ∧ Penguin)     (Additivity)
           = P(Penguin | Bird) · P(Bird)                (Conditional)
             + P(Penguin | ¬Bird) · P(¬Bird)
           ≥ P(Penguin | Bird) · P(Bird)                (Non-negativity)
           = 0.2 · 0.8 = 0.16

This is the canonical pattern: split with a tautology A ∨ ¬A, apply distributivity + additivity, then bound with the conditional formula.
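To see that the KB is satisfiable and that the bound holds, one can write down a concrete probability measure over possible worlds and check every formula. The distribution below is one hypothetical model chosen by hand; only the five KB formulas come from the example. In this model the bound is tight, because (Bird | Penguin)[1] forces P(¬Bird ∧ Penguin) = 0.

```python
from fractions import Fraction

# One possible world = (bird, penguin, flies). Weights are a hand-picked model.
worlds = {
    (True,  True,  False): Fraction(16, 100),  # penguins: birds that don't fly
    (True,  False, True):  Fraction(56, 100),  # flying non-penguin birds
    (True,  False, False): Fraction(8, 100),   # other flightless birds
    (False, False, True):  Fraction(5, 100),   # flying non-birds
    (False, False, False): Fraction(15, 100),  # everything else
}

def bird(b, p, f):
    return b

def penguin(b, p, f):
    return p

def flies(b, p, f):
    return f

def prob(formula):
    """P(F): total weight of the worlds in which F holds."""
    return sum(w for world, w in worlds.items() if formula(*world))

def cond(g, f):
    """P(G | F) = P(G and F) / P(F); undefined when P(F) = 0."""
    pf = prob(f)
    if pf == 0:
        raise ValueError("conditioning on an event of probability 0")
    return prob(lambda *w: g(*w) and f(*w)) / pf

kb_checks = {
    "Bird : 0.8":            prob(bird) == Fraction(8, 10),
    "(Penguin | Bird)[0.2]": cond(penguin, bird) == Fraction(2, 10),
    "(Flies | Bird)[0.7]":   cond(flies, bird) == Fraction(7, 10),
    "(Bird | Penguin)[1]":   cond(bird, penguin) == 1,
    "(Flies | Penguin)[0]":  cond(flies, penguin) == 0,
}
print(kb_checks)
print("P(Penguin) =", prob(penguin))   # must be at least 0.16
```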


Quick Comparison Table

Property | Classical Logic | Fuzzy Logic | Probabilistic Logic
Truth values | {0, 1} | [0, 1] | [0, 1]
Propositions | True or false | True to a degree | True or false
Truth values represent | True or False | Degree of membership | Probability of truth
A ∨ ¬A | Always 1 | Can be < 1 | Always 1
A ∧ ¬A | Always 0 | Can be > 0 | Always 0
Addresses | - | Vagueness | Uncertainty

(“Truth values represent” row matches Session 08 slide 75 “Summary” verbatim.)


ALGORITHMS / TECHNIQUES (full reference) ⭐

Procedural techniques. Items 1–3 follow directly from the Session 08 slides; item 4 (Dempster-Shafer) is not in Session 08 and is included here as supplementary external material only — it is not exam-relevant for the lecture content.


1. Fuzzy inference (rule evaluation)

Setting. A fuzzy rule IF X is A AND Y is B THEN Z is C plus crisp inputs x₀, y₀.

Step 1 — Fuzzification.
    α_A = μ_A(x₀)         # input matches premise A to degree α_A
    α_B = μ_B(y₀)
 
Step 2 — Aggregate premises with a t-norm.
    α = t(α_A, α_B)       # Zadeh: α = min(α_A, α_B)
                          # Algebraic: α = α_A · α_B
 
Step 3 — Apply to consequent (implication / clipping).
    μ_C'(z) = min(α, μ_C(z))      # Mamdani-style: clip C at height α
 
Step 4 — Combine multiple rules with an s-norm.
    μ_aggregate(z) = max_r μ_C_r'(z)     # union of clipped consequents
                                         # (Zadeh) or use a different s-norm
 
Step 5 — Defuzzify (see #2) to obtain a crisp output z*.

Choice of t-norm / s-norm decides the inference style: Zadeh min/max is the textbook default; algebraic product / probabilistic sum is common in control applications.
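Steps 1 to 4 can be sketched for a two-rule controller as follows. Everything concrete here is a hypothetical illustration: the linguistic terms, their triangular breakpoints, the inputs (temperature 22, humidity 55) and the output grid for a fan speed between 0 and 100.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Each rule: (mu_A on temperature, mu_B on humidity, mu_C on fan speed).
RULES = [
    # IF temp is warm AND humidity is high THEN fan is fast
    (lambda t: tri(t, 15, 25, 35), lambda h: tri(h, 40, 70, 100),
     lambda z: tri(z, 50, 75, 100)),
    # IF temp is cool AND humidity is low THEN fan is slow
    (lambda t: tri(t, 5, 15, 25), lambda h: tri(h, 0, 30, 60),
     lambda z: tri(z, 0, 25, 50)),
]

def infer(x0, y0, zs, t_norm=min):
    """Mamdani-style inference; returns the aggregated membership on the grid zs."""
    # Steps 1 and 2: fuzzify the inputs and combine the premises per rule.
    strengths = [t_norm(mu_a(x0), mu_b(y0)) for mu_a, mu_b, _ in RULES]
    aggregated = []
    for z in zs:
        # Step 3: clip each consequent at its rule's firing strength.
        clipped = [min(alpha, mu_c(z))
                   for alpha, (_, _, mu_c) in zip(strengths, RULES)]
        # Step 4: combine the rules with max (Zadeh s-norm).
        aggregated.append(max(clipped))
    return aggregated

zs = list(range(0, 101))
agg = infer(22, 55, zs)
print("peak membership:", max(agg))
print("membership at z = 25 and z = 75:", agg[25], agg[75])
```

Passing `t_norm=lambda a, b: a * b` switches step 2 to the algebraic product while leaving the clipping and aggregation unchanged.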


2. Defuzzification

Turning the aggregated output fuzzy set μ_agg : Z → [0,1] back into a single crisp value z*.

2a. Centroid (centre-of-gravity)

z* = ( ∫ z · μ_agg(z) dz ) / ( ∫ μ_agg(z) dz )

Discrete form: z* = Σ_i z_i · μ_agg(z_i) / Σ_i μ_agg(z_i).

Pros. Smooth, takes the whole shape into account.
Cons. Expensive; can yield a value with low membership if μ_agg is bimodal.

2b. Max-membership (mean of maxima)

M = { z | μ_agg(z) = max_z' μ_agg(z') }
z* = mean(M)                # or smallest / largest element

Pros. Cheap, intuitive.
Cons. Ignores the shape — different aggregate sets can give the same answer.

2c. Other defuzzification methods

Method | How it works | When it matters
Centroid (CoG) | Center of gravity of the output membership function | Default in most control systems
Mean of Maxima (MoM) | Average of the x-values where μ is maximal | Faster, less smooth
First / Last Maximum | First (or last) x where μ reaches max | Used when ties matter
Smallest of Maximum | Smallest x at the max value | Conservative controllers

⚠️ Centroid and max-membership can disagree on the same aggregated set — common exam question. Centroid is pulled by mass; MoM sees only the peak (see Q123–125 in the questions hub).
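The disagreement is easy to reproduce on a discretised bimodal set. A minimal sketch using the discrete centroid formula and mean of maxima; the sample points and membership values are invented for illustration.

```python
def centroid(zs, mus):
    """Discrete centre of gravity: sum(z * mu) / sum(mu)."""
    total = sum(mus)
    if total == 0:
        raise ValueError("centroid is undefined for an all-zero fuzzy set")
    return sum(z * m for z, m in zip(zs, mus)) / total

def mean_of_maxima(zs, mus, tol=1e-9):
    """Average of all z whose membership equals the maximum (within tol)."""
    peak = max(mus)
    maxima = [z for z, m in zip(zs, mus) if peak - m <= tol]
    return sum(maxima) / len(maxima)

# Bimodal aggregate: a low lobe around z = 2 and a higher lobe around z = 8.
zs = list(range(11))
mus = [0.0, 0.3, 0.3, 0.3, 0.0, 0.0, 0.0, 0.8, 0.8, 0.8, 0.0]

z_centroid = centroid(zs, mus)
z_mom = mean_of_maxima(zs, mus)
print(f"centroid = {z_centroid:.3f}, mean of maxima = {z_mom:.3f}")
```

The centroid lands between the lobes because the low lobe still carries mass, while mean of maxima sits in the middle of the higher lobe.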


3. Bayesian update via conditional probability

Given:  prior P(H), likelihood P(E | H), P(E | ¬H)
Goal:   posterior P(H | E) after observing E
 
P(E)     = P(E | H) · P(H) + P(E | ¬H) · P(¬H)        # total probability
P(H | E) = P(E | H) · P(H) / P(E)                     # Bayes' rule

Used in probabilistic logic to update the belief in hypothesis H given evidence E. Bayes’ rule follows directly from the conditional-probability definition P(B|A) = P(B ∩ A) / P(A), applied once to P(H | E) and once to P(E | H).
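The two formulas translate directly into code. In this sketch the function name and the numbers (a 1% prior, a 90% likelihood under H, a 5% likelihood under ¬H) are hypothetical example values.

```python
def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_e = lik_e_given_h * prior_h + lik_e_given_not_h * (1.0 - prior_h)
    if p_e == 0:
        raise ValueError("P(E) = 0: the posterior is undefined")
    # Bayes' rule
    return lik_e_given_h * prior_h / p_e

p = posterior(prior_h=0.01, lik_e_given_h=0.9, lik_e_given_not_h=0.05)
print(f"P(H | E) = {p:.4f}")
```

With a small prior the posterior stays modest even for strong evidence, because most observations of E come from the much larger ¬H population.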


4. Dempster-Shafer combination rule (supplementary) ⚠️

⚠️ Not in Session 08 slides — included only for completeness because external questions (e.g. Q56–63) reference it. Do not assume this material is exam-relevant for Methods of AI unless the lecturer explicitly added it.

Setting. Frame of discernment Θ. Two mass functions m₁, m₂ : 2^Θ → [0,1] with Σ_A m_i(A) = 1 and m_i(∅) = 0. Each represents one independent source of evidence.

Combined mass via Dempster’s rule:

K = Σ_{B ∩ C = ∅}  m₁(B) · m₂(C)              # conflict mass
 
m₁₂(A) = ( 1 / (1 − K) ) · Σ_{B ∩ C = A}  m₁(B) · m₂(C)   for A ≠ ∅
m₁₂(∅) = 0

Belief & Plausibility:

Bel(A) = Σ_{B ⊆ A} m(B)
Pl(A)  = Σ_{B ∩ A ≠ ∅} m(B)
        = 1 − Bel(¬A)

Use. Combining evidence from independent sources when you cannot or do not want to commit to a single probability distribution. Reduces to Bayesian update when all masses are on singletons.
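Dempster’s rule, Bel and Pl can be sketched with mass functions stored as dicts from frozenset to mass. The frame {a, b, c} and the two mass functions are made-up examples; like the rest of this section, this is supplementary material.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: returns (combined mass function, conflict K)."""
    conflict = 0.0
    raw = {}
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            raw[common] = raw.get(common, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c   # mass that would land on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: the sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in raw.items()}, conflict

def belief(m, a):
    """Bel(A): mass of all focal sets contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A): mass of all focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)

THETA = frozenset({"a", "b", "c"})                 # frame of discernment
m1 = {frozenset({"a"}): 0.6, THETA: 0.4}           # source 1 leans towards a
m2 = {frozenset({"b"}): 0.5, THETA: 0.5}           # source 2 leans towards b

m12, K = combine(m1, m2)
A = frozenset({"a"})
print("K =", K)
print("Bel({a}) =", belief(m12, A), " Pl({a}) =", plausibility(m12, A))
```

The gap between Bel and Pl is the mass left on non-singleton sets, which is the part of the evidence not committed to any single hypothesis.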


Where Fuzzy Logic is used today

  • Industrial control systems — Sendai subway (Japan, 1987) was the first famous deployment; smooth braking/acceleration via fuzzy controller. Still in production.
  • Consumer electronics — washing machines (load + dirt sensing), rice cookers, camera autofocus, air conditioners. Hugely popular in Japan.
  • Anti-lock braking (ABS) in cars — Bosch and others use fuzzy-logic-style controllers for traction control.
  • Camera image processing — exposure, white balance — fuzzy classifiers for scene type.
  • Medical decision support — diagnosis systems using fuzzy rules for symptoms (e.g. “high fever AND moderate pain”).
  • Stock trading systems — fuzzy rules for technical analysis indicators.

Where Fuzzy Logic was replaced — and by what

Domain | Was fuzzy, now … | Why
Pattern recognition / classification | Neural networks | NNs learn membership functions from data instead of needing them hand-designed
Control of complex nonlinear systems | Reinforcement Learning + neural controllers | Model-free RL can learn controllers without needing to specify rules
Natural language understanding | Transformer LLMs | LLMs implicitly handle vagueness without explicit fuzzy formalism
Decision support with uncertainty | Bayesian networks, probabilistic graphical models | When uncertainty (not vagueness) dominates, probability is the right tool

Where fuzzy logic still stands: when you need an interpretable controller (rules can be read by humans), when training data is scarce, and when the system has to handle genuine vagueness rather than uncertainty.

See also

Tags: methods-of-ai vagueness uncertainty fuzzy-logic probabilistic-logic t-norm s-norm ai-generated
Created: 18-05-26 · Merged: 23-05-26