Dempster-Shafer Theory (DST) (Dempster 1967, Shafer 1976) is an alternative to classical probability for reasoning under uncertainty. It generalizes Bayesian probability by allowing belief mass to be assigned to sets of hypotheses rather than individual ones — which captures ignorance explicitly.
DST features prominently in MoAI’s Vagueness & Uncertainty unit because it answers an awkward question: what if I don’t even know how to split my probability between hypotheses? Bayesian probability forces you to commit to a split; DST lets you say “I have some belief, but I don’t know how to distribute it.”
The setup
Given a frame of discernment Θ = {θ₁, θ₂, …, θₙ} — the set of mutually exclusive, exhaustive hypotheses — DST works with mass functions that map subsets of Θ to [0, 1]:
m : 2^Θ → [0, 1]
m(∅) = 0
Σ_{A ⊆ Θ} m(A) = 1
Crucially: m(A) is belief committed exactly to A — not spread across A’s subsets. If m({θ₁, θ₂}) = 0.4, it means “I’m 40 % committed to one of θ₁ or θ₂, but I don’t know which.”
This is the central difference from probability: mass can sit on sets, representing ignorance about which element is correct.
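The three conditions on m can be checked mechanically. The following is a minimal sketch, assuming a mass function is stored as a dict from frozenset to float; the helper name is_valid_mass and the labels t1, t2, t3 are illustrative and not part of the original notes.

```python
def is_valid_mass(m, theta, tol=1e-9):
    """Check the mass-function conditions for a dict {frozenset: mass}."""
    theta = frozenset(theta)
    in_frame = all(frozenset(A) <= theta for A in m)     # every focal set is a subset of Θ
    in_range = all(0.0 <= v <= 1.0 for v in m.values())  # m(A) ∈ [0, 1]
    empty_ok = m.get(frozenset(), 0.0) == 0.0            # m(∅) = 0
    sums_to_one = abs(sum(m.values()) - 1.0) <= tol      # Σ m(A) = 1
    return in_frame and in_range and empty_ok and sums_to_one

# Hypothetical frame with three hypotheses
theta = {"t1", "t2", "t3"}

m = {
    frozenset({"t1", "t2"}): 0.4,  # 40% on "t1 or t2", not split between them
    frozenset({"t3"}): 0.1,
    frozenset(theta): 0.5,         # mass on the whole frame = total ignorance
}
bad = {frozenset({"t1"}): 0.6, frozenset(): 0.4}  # puts mass on the empty set

print(is_valid_mass(m, theta))
print(is_valid_mass(bad, theta))
```

Storing focal sets as frozenset keys keeps the "mass sits on a set" idea literal: the 0.4 belongs to the pair as a whole, and nothing in the data structure says how it divides between t1 and t2.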
Belief and Plausibility — the two key numbers
For any hypothesis A ⊆ Θ:
| | Formula | Meaning |
|---|---|---|
| Belief | Bel(A) = Σ_{B ⊆ A} m(B) | Sum of mass committed to A or to subsets of A; lower bound on confidence |
| Plausibility | Pl(A) = Σ_{B ∩ A ≠ ∅} m(B) | Sum of mass on sets that intersect A, i.e. evidence that does not rule A out; upper bound on confidence |
| Doubt | 1 − Pl(A) | Mass that definitely excludes A (equal to Bel(¬A)) |
Always: 0 ≤ Bel(A) ≤ Pl(A) ≤ 1. The gap Pl(A) − Bel(A) = the amount of ignorance about A.
⚠️ Exam trap: Belief and Plausibility are bounds on a probability, not probabilities themselves. In Bayesian theory P(A) + P(¬A) = 1. In DST, Bel(A) + Bel(¬A) ≤ 1 (the slack is ignorance) and, dually, Pl(A) + Pl(¬A) ≥ 1.
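The inequality is easy to check numerically. This is a small sketch on a made-up frame {a, b, c} with illustrative masses; the function names bel and pl are assumptions chosen for this example.

```python
def bel(A, m):
    """Belief: total mass on focal sets contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(A, m):
    """Plausibility: total mass on focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)

# Hypothetical frame and masses (chosen to be exact in binary floating point)
theta = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,
    frozenset({"b", "c"}): 0.25,
    theta: 0.25,  # uncommitted mass
}

A = frozenset({"a"})
not_A = theta - A

print("Bel(A) + Bel(not A) =", bel(A, m) + bel(not_A, m))  # at most 1
print("Pl(A)  + Pl(not A)  =", pl(A, m) + pl(not_A, m))    # at least 1
print("Pl(A) == 1 - Bel(not A):", pl(A, m) == 1 - bel(not_A, m))
```

In this example the slack in Bel(A) + Bel(¬A) is exactly the 0.25 sitting on the whole frame, which supports neither A nor its complement.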
See the math concretely
Suppose Θ = {Cold, Flu, COVID} (frame of discernment). A doctor receives test evidence with mass:
m({Cold}) = 0.20
m({Flu, COVID}) = 0.50 ← "it's probably one of these two, but I can't tell which"
m({Cold, Flu, COVID}) = 0.30 ← "I have no idea"
The masses sum to 1 ✓. Now compute Belief and Plausibility for “patient has Flu”:
A = {Flu}
Bel({Flu}) = m({Flu}) # only mass committed exactly to {Flu}
= 0.0 # nothing is on {Flu} alone
Pl({Flu}) = m({Flu}) + m({Flu, COVID}) + m({Cold, Flu, COVID})
= 0.0 + 0.5 + 0.3 = 0.8
So belief that the patient has flu = 0, but plausibility = 0.8. The 0.8 captures “the mass that might support flu, even though it’s not committed to flu alone.” That gap (0.8 − 0 = 0.8) is pure ignorance.
See it in code
# Pyodide / Obsidian Execute Code: install matplotlib first.
# In normal Python (terminal / Jupyter), delete the next 2 lines.
import micropip
await micropip.install("matplotlib")

import matplotlib.pyplot as plt
from itertools import chain, combinations

def power_set(iterable):
    s = list(iterable)
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def belief(A, m):
    return sum(mass for B, mass in m.items() if set(B).issubset(A))

def plausibility(A, m):
    return sum(mass for B, mass in m.items() if set(B) & A)

# Frame: {Cold, Flu, COVID}
theta = {'Cold', 'Flu', 'COVID'}
m = {
    frozenset({'Cold'}): 0.20,
    frozenset({'Flu', 'COVID'}): 0.50,
    frozenset({'Cold', 'Flu', 'COVID'}): 0.30,
}

# Compute Bel and Pl for each singleton hypothesis
hypotheses = [{'Cold'}, {'Flu'}, {'COVID'}]
labels = ['Cold', 'Flu', 'COVID']
bels = [belief(h, m) for h in hypotheses]
pls = [plausibility(h, m) for h in hypotheses]
doubts = [1 - p for p in pls]

print(f"{'Hyp':<8} {'Bel':>6} {'Pl':>6} {'gap':>6}")
for label, b, p in zip(labels, bels, pls):
    print(f"{label:<8} {b:>6.2f} {p:>6.2f} {p - b:>6.2f}")

# --- Plot: Bel / Pl / Doubt as stacked bars ---
import numpy as np

x = np.arange(len(labels))
plt.figure(figsize=(8, 4.5))
plt.bar(x, bels, color='steelblue', label='Belief (lower bound)')
plt.bar(x, [p - b for p, b in zip(pls, bels)], bottom=bels, color='gold', label='Ignorance (Pl − Bel)')
plt.bar(x, doubts, bottom=pls, color='lightgray', label='Doubt (1 − Pl)')
plt.xticks(x, labels)
plt.ylim(0, 1)
plt.ylabel('mass distribution per hypothesis')
plt.title('Dempster-Shafer: Belief / Ignorance / Doubt')
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()
What to see:
For each hypothesis, the bar shows three regions: Belief (committed) at the bottom, Ignorance (uncommitted but plausible) in the middle, Doubt (excluded) at the top.
“Cold” has Bel = 0.20 (some direct commitment), moderate ignorance (0.30), and the largest doubt (0.50), because the 0.50 mass on {Flu, COVID} rules it out.
“Flu” and “COVID” have Bel = 0 but high plausibility (0.80). That wide gold middle band is the DST signature: no mass is committed to either hypothesis alone, yet the evidence doesn’t rule them out either.
A pure Bayesian framework can’t represent this — it would force you to split the 0.5 mass between Flu and COVID with some arbitrary ratio.
Dempster’s Rule of Combination
DST has a way to fuse evidence from multiple sources. Given two independent mass functions m₁ and m₂:
(m₁ ⊕ m₂)(A) = (1 / (1 - K)) · Σ_{B ∩ C = A, A ≠ ∅} m₁(B) · m₂(C)
where K = Σ_{B ∩ C = ∅} m₁(B) · m₂(C) is the conflict between the two evidence sources. Higher K = more disagreement.
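The rule translates almost directly into code. Below is a minimal sketch, assuming mass functions are dicts from frozenset to float; the function name combine and the frame {x, y, z} are illustrative choices, not taken from the notes.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: return (combined mass function, conflict K)."""
    raw = {}
    conflict = 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            # Mass of the pair (B, C) flows to their intersection
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
        else:
            # Empty intersection: the two sources contradict each other
            conflict += v1 * v2
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    combined = {A: v / (1.0 - conflict) for A, v in raw.items()}
    return combined, conflict

# Hypothetical example: each source half-commits to a different hypothesis
theta = frozenset({"x", "y", "z"})
m1 = {frozenset({"x"}): 0.5, theta: 0.5}
m2 = {frozenset({"y"}): 0.5, theta: 0.5}

combined, K = combine(m1, m2)
print("K =", K)
for A, v in combined.items():
    print(sorted(A), round(v, 4))
```

Here only the pair ({x}, {y}) has an empty intersection, so K = 0.5 · 0.5 = 0.25, and the remaining three products are each rescaled by 1 / 0.75 so that the combined masses again sum to 1.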
⚠️ The famous Zadeh counterexample: when two highly confident sources disagree almost completely, K → 1 and the normalization by 1 − K can hand nearly all the mass to a hypothesis that both sources considered very unlikely (“the rare exotic hypothesis wins”). At K = 1 exactly, the rule is undefined. This is why DST has variants such as Yager’s rule and the Dubois-Prade rule for high-conflict cases.
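The effect can be reproduced with a few lines. This sketch uses the numbers commonly quoted for Zadeh's example (two doctors, 0.99 versus 0.01); the diagnosis labels and the helper combine are assumptions made for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: return (combined mass function, conflict K)."""
    raw, conflict = {}, 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in raw.items()}, conflict

# Doctor 1: almost certainly meningitis, tumor very unlikely
m1 = {frozenset({"meningitis"}): 0.99, frozenset({"tumor"}): 0.01}
# Doctor 2: almost certainly concussion, tumor very unlikely
m2 = {frozenset({"concussion"}): 0.99, frozenset({"tumor"}): 0.01}

combined, K = combine(m1, m2)
print("K =", round(K, 6))
for A, v in combined.items():
    print(sorted(A), round(v, 6))
```

The only non-conflicting pair is (tumor, tumor) with product 0.0001, so K = 0.9999 and normalization inflates that sliver to a mass of 1 on tumor, the diagnosis both doctors rated at 1%.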
DST vs. Bayesian Probability vs. Fuzzy Logic
| | Bayesian | Dempster-Shafer | Fuzzy |
|---|---|---|---|
| What it captures | Uncertainty (don’t know which crisp fact holds) | Uncertainty + Ignorance (don’t even know the prior) | Vagueness (predicate has no sharp boundary) |
| Numbers represent | Probability of crisp event | Belief mass on subsets | Degree of membership |
| Sum constraint | Σ P(θᵢ) = 1 over singletons | Σ m(A) = 1 over all subsets | none; μ(x) ∈ [0,1] independently for each x |
| Excluded middle | P(A) + P(¬A) = 1 | Bel(A) + Bel(¬A) ≤ 1 | μ_{A ∪ ¬A}(x) = max(μ_A(x), 1 − μ_A(x)) ≤ 1 (often <) |
| Update rule | Bayes’ rule | Dempster’s rule of combination | Fuzzy inference (Mamdani, Sugeno) |
⚠️ Common exam confusion: DST is not fuzzy logic. DST handles uncertainty about which discrete hypothesis is true. Fuzzy logic handles vagueness of a predicate’s extension. They both have numbers in [0,1] but mean very different things.
Where DST is used today
Sensor fusion in robotics and autonomous vehicles — combine evidence from multiple sensors that may disagree. Used in some self-driving stacks for perception.
Medical decision support — combine evidence from multiple tests/symptoms when each test has partial reliability.
Forensic identification — DNA evidence combined with eyewitness reports under DST.
Information fusion in defence systems — combining radar/sonar/visual evidence in target classification.
Recommender systems with incomplete user data — DST handles “I don’t know” gracefully.
Reliability engineering — combining failure mode estimates from multiple experts.
Where DST is being challenged — and by what
| Domain | Was DST, now … | Why |
|---|---|---|
| Most uncertainty reasoning | Bayesian probability + hierarchical priors | Bayesian methods are simpler, mathematically cleaner, and modern computation (MCMC, variational inference) makes them tractable |
| Sensor fusion in self-driving | Probabilistic graphical models + deep learning | Learned fusion (multi-modal networks) outperforms hand-coded DST in modern systems |
| Multi-source data integration | Bayesian networks, Markov random fields | Bayesian frameworks integrate evidence with cleaner semantics |
| Conflict handling | Imprecise probability (Walley 1991) | More general theory; DST is a special case |
Where DST still stands: when you need to explicitly represent ignorance (not just uncertainty), and when committing to a Bayesian prior would be misleading. Common in safety-critical sensor fusion where “we don’t know” should be a first-class output.
See also
Fuzzy Logic — the vagueness sibling (not to be confused with DST)
Description Logics — KR with Open World Assumption (also doesn’t force you to commit)