Dempster-Shafer Theory

methods-of-ai

Dempster-Shafer Theory (DST) (Dempster 1967, Shafer 1976) is an alternative to classical probability for reasoning under uncertainty. It generalizes Bayesian probability by allowing belief mass to be assigned to sets of hypotheses rather than individual ones — which captures ignorance explicitly.

DST is famous in MoAI’s Vagueness & Uncertainty unit because it answers an awkward question: what if I don’t even know how to split my probability between hypotheses? Bayesian probability forces you to commit; DST lets you say “I have some belief but I don’t know how to distribute it.”

The setup

Given a frame of discernment Θ = {θ₁, θ₂, …, θₙ} — the set of mutually exclusive, exhaustive hypotheses — DST works with mass functions that map subsets of Θ to [0, 1]:

m : 2^Θ → [0, 1]
m(∅) = 0
Σ_{A ⊆ Θ} m(A) = 1

Crucially: m(A) is belief committed exactly to A — not spread across A’s subsets. If m({θ₁, θ₂}) = 0.4, it means “I’m 40 % committed to one of θ₁ or θ₂, but I don’t know which.”

This is the central difference from probability: mass can sit on sets, representing ignorance about which element is correct.

Belief and Plausibility — the two key numbers

For any hypothesis A ⊆ Θ:

	Formula	Meaning
Belief	Bel(A) = Σ_{B ⊆ A} m(B)	Sum of mass committed to A or subsets of A — lower bound on confidence
Plausibility	Pl(A) = Σ_{B ∩ A ≠ ∅} m(B)	Sum of mass on sets that could contain A — upper bound on confidence
Doubt	1 − Pl(A)	Mass that definitely excludes A

Always: 0 ≤ Bel(A) ≤ Pl(A) ≤ 1. The gap Pl(A) − Bel(A) = the amount of ignorance about A.

⚠️ Exam trap: Belief and Plausibility are bounds on probability, not a probability itself. In Bayesian theory P(A) + P(¬A) = 1. In DST: Bel(A) + Bel(¬A) ≤ 1 (slack = ignorance).

See the math concretely

Suppose Θ = {Cold, Flu, COVID} (frame of discernment). A doctor receives test evidence with mass:

m({Cold}) = 0.20
m({Flu, COVID}) = 0.50            ← "it's probably one of these two, but I can't tell which"
m({Cold, Flu, COVID}) = 0.30      ← "I have no idea"

Sum to 1 ✓. Now compute Belief and Plausibility for “patient has Flu”:

A = {Flu}
Bel({Flu}) = m({Flu})              # only mass committed exactly to {Flu}
           = 0.0                   # nothing is on {Flu} alone

Pl({Flu})  = m({Flu}) + m({Flu, COVID}) + m({Cold, Flu, COVID})
           = 0.0 + 0.5 + 0.3 = 0.8

So belief that the patient has flu = 0, but plausibility = 0.8. The 0.8 captures “the mass that might support flu, even though it’s not committed to flu alone.” That gap (0.8 − 0 = 0.8) is pure ignorance.

See it in code

🐍 Code anzeigen / ausblenden

# Pyodide / Obsidian Execute Code: install matplotlib first.
# In normal Python (terminal / Jupyter), delete the next 2 lines.
import micropip
await micropip.install("matplotlib")
 
import matplotlib.pyplot as plt
from itertools import chain, combinations
 
def power_set(iterable):
    s = list(iterable)
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]
 
def belief(A, m):
    return sum(mass for B, mass in m.items() if set(B).issubset(A))
 
def plausibility(A, m):
    return sum(mass for B, mass in m.items() if set(B) & A)
 
# Frame: {Cold, Flu, COVID}
theta = {'Cold', 'Flu', 'COVID'}
m = {
    frozenset({'Cold'}):                 0.20,
    frozenset({'Flu', 'COVID'}):         0.50,
    frozenset({'Cold', 'Flu', 'COVID'}): 0.30,
}
 
# Compute Bel and Pl for each singleton hypothesis
hypotheses = [{'Cold'}, {'Flu'}, {'COVID'}]
labels = ['Cold', 'Flu', 'COVID']
bels = [belief(h, m) for h in hypotheses]
pls  = [plausibility(h, m) for h in hypotheses]
doubts = [1 - p for p in pls]
 
print(f"{'Hyp':<8} {'Bel':>6} {'Pl':>6} {'gap':>6}")
for label, b, p in zip(labels, bels, pls):
    print(f"{label:<8} {b:>6.2f} {p:>6.2f} {p - b:>6.2f}")
 
# --- Plot: Bel / Pl / Doubt as stacked bars ---
import numpy as np
x = np.arange(len(labels))
plt.figure(figsize=(8, 4.5))
plt.bar(x, bels,                          color='steelblue', label='Belief (lower bound)')
plt.bar(x, [p - b for p, b in zip(pls, bels)], bottom=bels, color='gold',
        label='Ignorance (Pl − Bel)')
plt.bar(x, doubts, bottom=pls,            color='lightgray', label='Doubt (1 − Pl)')
plt.xticks(x, labels); plt.ylim(0, 1)
plt.ylabel('mass distribution per hypothesis')
plt.title('Dempster-Shafer: Belief / Ignorance / Doubt')
plt.legend(loc='upper right'); plt.tight_layout(); plt.show()

What to see:

For each hypothesis, the bar shows three regions: Belief (committed) at the bottom, Ignorance (uncommitted but plausible) in the middle, Doubt (excluded) at the top.
“Cold” has Bel = 0.20 (some direct commitment), big ignorance, modest doubt.
“Flu” and “COVID” have Bel = 0 but high plausibility — that gold middle band is the DST signature: belief is uncertain but the evidence doesn’t rule them out either.
A pure Bayesian framework can’t represent this — it would force you to split the 0.5 mass between Flu and COVID with some arbitrary ratio.

Dempster’s Rule of Combination

DST has a way to fuse evidence from multiple sources. Given two independent mass functions m₁ and m₂:

(m₁ ⊕ m₂)(A) = (1 / (1 - K)) · Σ_{B ∩ C = A, A ≠ ∅} m₁(B) · m₂(C)

where K = Σ_{B ∩ C = ∅} m₁(B) · m₂(C) is the conflict between the two evidence sources. Higher K = more disagreement.

⚠️ The famous Zadeh counterexample: when two highly-confident sources disagree completely, K → 1 and Dempster’s rule produces absurd results (“the rare exotic hypothesis wins”). This is why DST has variants like Yager’s rule and Dubois-Prade rule for high-conflict cases.

DST vs. Bayesian Probability vs. Fuzzy Logic

	Bayesian	Dempster-Shafer	Fuzzy
What it captures	Uncertainty (don’t know which crisp fact holds)	Uncertainty + Ignorance (don’t even know the prior)	Vagueness (predicate has no sharp boundary)
Numbers represent	Probability of crisp event	Belief mass on subsets	Degree of membership
Sum constraint	`Σ P(θᵢ) = 1` over singletons	`Σ m(A) = 1` over all subsets	none — `μ(x) ∈ [0,1]` independent per x
Excluded middle	`P(A) + P(¬A) = 1`	`Bel(A) + Bel(¬A) ≤ 1`	`μ(A) + μ(¬A) ≤ 1` (often <)
Update rule	Bayes’ rule	Dempster’s rule of combination	Fuzzy inference (Mamdani, Sugeno)

⚠️ Common exam confusion: DST is not fuzzy logic. DST handles uncertainty about which discrete hypothesis is true. Fuzzy logic handles vagueness of a predicate’s extension. They both have numbers in [0,1] but mean very different things.

Where DST is used today

Sensor fusion in robotics and autonomous vehicles — combine evidence from multiple sensors that may disagree. Used in some self-driving stacks for perception.
Medical decision support — combine evidence from multiple tests/symptoms when each test has partial reliability.
Forensic identification — DNA evidence combined with eyewitness reports under DST.
Information fusion in defence systems — combining radar/sonar/visual evidence in target classification.
Recommender systems with incomplete user data — DST handles “I don’t know” gracefully.
Reliability engineering — combining failure mode estimates from multiple experts.

Where DST is being challenged — and by what

Domain	Was DST, now …	Why
Most uncertainty reasoning	Bayesian probability + hierarchical priors	Bayesian methods are simpler, mathematically cleaner, and modern computation (MCMC, variational inference) makes them tractable
Sensor fusion in self-driving	Probabilistic graphical models + deep learning	Learned fusion (multi-modal networks) outperforms hand-coded DST in modern systems
Multi-source data integration	Bayesian networks, Markov random fields	Bayesian frameworks integrate evidence with cleaner semantics
Conflict handling	Imprecise probability (Walley 1991)	More general theory; DST is a special case

Where DST still stands: when you need to explicitly represent ignorance (not just uncertainty), and when committing to a Bayesian prior would be misleading. Common in safety-critical sensor fusion where “we don’t know” should be a first-class output.

Brain Online

Explorer

Dempster-Shafer Theory

Dempster-Shafer Theory

The setup

Belief and Plausibility — the two key numbers

See the math concretely

See it in code

Dempster’s Rule of Combination

DST vs. Bayesian Probability vs. Fuzzy Logic

Where DST is used today

Where DST is being challenged — and by what

See also

Backlinks

Mika

✨ Features

⚙️ Einstellungen

📚 Chat-Verlauf

📖 Citation Manager

✍️ Writing Assistant

Inhaltsverzeichnis