Dempster-Shafer Theory

methods-of-ai

Dempster-Shafer Theory (DST) (Dempster 1967, Shafer 1976) is an alternative to classical probability for reasoning under uncertainty. It generalizes Bayesian probability by allowing belief mass to be assigned to sets of hypotheses rather than individual ones — which captures ignorance explicitly.

DST is famous in MoAI’s Vagueness & Uncertainty unit because it answers an awkward question: what if I don’t even know how to split my probability between hypotheses? Bayesian probability forces you to commit; DST lets you say “I have some belief but I don’t know how to distribute it.”

The setup

Given a frame of discernment Θ = {θ₁, θ₂, …, θₙ} — the set of mutually exclusive, exhaustive hypotheses — DST works with mass functions that map subsets of Θ to [0, 1]:

m : 2^Θ → [0, 1]
m(∅) = 0
Σ_{A ⊆ Θ} m(A) = 1

Crucially: m(A) is belief committed exactly to Anot spread across A’s subsets. If m({θ₁, θ₂}) = 0.4, it means “I’m 40 % committed to one of θ₁ or θ₂, but I don’t know which.”

This is the central difference from probability: mass can sit on sets, representing ignorance about which element is correct.

Belief and Plausibility — the two key numbers

For any hypothesis A ⊆ Θ:

FormulaMeaning
BeliefBel(A) = Σ_{B ⊆ A} m(B)Sum of mass committed to A or subsets of A — lower bound on confidence
PlausibilityPl(A) = Σ_{B ∩ A ≠ ∅} m(B)Sum of mass on sets that could contain A — upper bound on confidence
Doubt1 − Pl(A)Mass that definitely excludes A

Always: 0 ≤ Bel(A) ≤ Pl(A) ≤ 1. The gap Pl(A) − Bel(A) = the amount of ignorance about A.

⚠️ Exam trap: Belief and Plausibility are bounds on probability, not a probability itself. In Bayesian theory P(A) + P(¬A) = 1. In DST: Bel(A) + Bel(¬A) ≤ 1 (slack = ignorance).

See the math concretely

Suppose Θ = {Cold, Flu, COVID} (frame of discernment). A doctor receives test evidence with mass:

m({Cold}) = 0.20
m({Flu, COVID}) = 0.50            ← "it's probably one of these two, but I can't tell which"
m({Cold, Flu, COVID}) = 0.30      ← "I have no idea"

Sum to 1 ✓. Now compute Belief and Plausibility for “patient has Flu”:

A = {Flu}
Bel({Flu}) = m({Flu})              # only mass committed exactly to {Flu}
           = 0.0                   # nothing is on {Flu} alone

Pl({Flu})  = m({Flu}) + m({Flu, COVID}) + m({Cold, Flu, COVID})
           = 0.0 + 0.5 + 0.3 = 0.8

So belief that the patient has flu = 0, but plausibility = 0.8. The 0.8 captures “the mass that might support flu, even though it’s not committed to flu alone.” That gap (0.8 − 0 = 0.8) is pure ignorance.

See it in code

What to see:

  • For each hypothesis, the bar shows three regions: Belief (committed) at the bottom, Ignorance (uncommitted but plausible) in the middle, Doubt (excluded) at the top.
  • “Cold” has Bel = 0.20 (some direct commitment), big ignorance, modest doubt.
  • “Flu” and “COVID” have Bel = 0 but high plausibility — that gold middle band is the DST signature: belief is uncertain but the evidence doesn’t rule them out either.
  • A pure Bayesian framework can’t represent this — it would force you to split the 0.5 mass between Flu and COVID with some arbitrary ratio.

Dempster’s Rule of Combination

DST has a way to fuse evidence from multiple sources. Given two independent mass functions m₁ and m₂:

(m₁ ⊕ m₂)(A) = (1 / (1 - K)) · Σ_{B ∩ C = A, A ≠ ∅} m₁(B) · m₂(C)

where K = Σ_{B ∩ C = ∅} m₁(B) · m₂(C) is the conflict between the two evidence sources. Higher K = more disagreement.

⚠️ The famous Zadeh counterexample: when two highly-confident sources disagree completely, K → 1 and Dempster’s rule produces absurd results (“the rare exotic hypothesis wins”). This is why DST has variants like Yager’s rule and Dubois-Prade rule for high-conflict cases.

DST vs. Bayesian Probability vs. Fuzzy Logic

BayesianDempster-ShaferFuzzy
What it capturesUncertainty (don’t know which crisp fact holds)Uncertainty + Ignorance (don’t even know the prior)Vagueness (predicate has no sharp boundary)
Numbers representProbability of crisp eventBelief mass on subsetsDegree of membership
Sum constraintΣ P(θᵢ) = 1 over singletonsΣ m(A) = 1 over all subsetsnone — μ(x) ∈ [0,1] independent per x
Excluded middleP(A) + P(¬A) = 1Bel(A) + Bel(¬A) ≤ 1μ(A) + μ(¬A) ≤ 1 (often <)
Update ruleBayes’ ruleDempster’s rule of combinationFuzzy inference (Mamdani, Sugeno)

⚠️ Common exam confusion: DST is not fuzzy logic. DST handles uncertainty about which discrete hypothesis is true. Fuzzy logic handles vagueness of a predicate’s extension. They both have numbers in [0,1] but mean very different things.

Where DST is used today

  • Sensor fusion in robotics and autonomous vehicles — combine evidence from multiple sensors that may disagree. Used in some self-driving stacks for perception.
  • Medical decision support — combine evidence from multiple tests/symptoms when each test has partial reliability.
  • Forensic identification — DNA evidence combined with eyewitness reports under DST.
  • Information fusion in defence systems — combining radar/sonar/visual evidence in target classification.
  • Recommender systems with incomplete user data — DST handles “I don’t know” gracefully.
  • Reliability engineering — combining failure mode estimates from multiple experts.

Where DST is being challenged — and by what

DomainWas DST, now …Why
Most uncertainty reasoningBayesian probability + hierarchical priorsBayesian methods are simpler, mathematically cleaner, and modern computation (MCMC, variational inference) makes them tractable
Sensor fusion in self-drivingProbabilistic graphical models + deep learningLearned fusion (multi-modal networks) outperforms hand-coded DST in modern systems
Multi-source data integrationBayesian networks, Markov random fieldsBayesian frameworks integrate evidence with cleaner semantics
Conflict handlingImprecise probability (Walley 1991)More general theory; DST is a special case

Where DST still stands: when you need to explicitly represent ignorance (not just uncertainty), and when committing to a Bayesian prior would be misleading. Common in safety-critical sensor fusion where “we don’t know” should be a first-class output.

See also

Tags: methods-of-ai uncertainty dempster-shafer
Created: 18-05-26