Lecture 7
RN 12.4, 13.1–13.2 · PM 8.2–8.3
A joint distribution over \(n\) Boolean variables needs \(2^n\) probabilities.
| Variables | Probabilities |
|---|---|
| 6 (Holmes) | 64 |
| 10 | 1,024 |
| 20 | ~ 1 million |
| 30 | ~ 1 billion |
Independence lets us factor this huge joint into tiny pieces.
\(P(X | Y) = P(X), \quad P(Y | X) = P(Y), \quad P(X \wedge Y) = P(X)\, P(Y)\).
Learning \(Y\) does not change our belief about \(X\).
Two probabilities (the priors) suffice to recover four joint probabilities.
\(P(X | Y \wedge Z) = P(X | Z), \quad P(Y | X \wedge Z) = P(Y | Z), \quad P(X \wedge Y | Z) = P(X | Z)\, P(Y | Z)\).
Once we know \(Z\), learning \(Y\) doesn't change our belief about \(X\).
Important: independence does not imply conditional independence, and vice versa. We'll see both on later slides.
Three Boolean variables \(A, B, C\). How many probabilities are needed to specify the joint?
Q1. No independence assumed.
Q2. \(A, B, C\) all mutually independent.
Q3. \(A\) and \(B\) conditionally independent given \(C\).
Inheritance of handedness
Car diagnostic network
| Representation | Probabilities needed |
|---|---|
| Full joint over 6 Boolean variables | \(2^6 = 64\) |
| Holmes Bayes net: \(1 + 1 + 2 + 4 + 2 + 2\) | 12 |
From those 12 numbers we can reconstruct every one of the 64 joint probabilities.
For larger networks, the savings are exponential.
\(P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \,|\, \mathrm{Parents}(X_i))\).
Each variable is conditionally independent of its non-descendants given its parents.
Goal: \(P(\neg B \wedge \neg E \wedge A \wedge \neg R \wedge G \wedge W)\). All neighbours call but neither a burglary nor an earthquake actually happened.
Step 1. Factor by the BN structure (using parents):
\(= P(\neg B) \cdot P(\neg E) \cdot P(A | \neg B \wedge \neg E) \cdot P(\neg R | \neg E) \cdot P(G | A) \cdot P(W | A)\)
Step 2. Plug in numbers:
\(= 0.9999 \times 0.9997 \times 0.01 \times 0.9998 \times 0.40 \times 0.80\)
\(\approx \mathbf{3.2 \times 10^{-3}}\)
Q. What is \(P(\neg B \wedge \neg E \wedge \neg A \wedge \neg R \wedge \neg G \wedge \neg W)\)?
\(A\) and \(C\) talk through \(B\).
\(B\) and \(C\) share a hidden cause \(A\).
\(A\) and \(B\) both cause \(C\) (a v-structure).
Each structure has its own independence pattern. Let's check.
Q1. Are \(B\) and \(W\) unconditionally independent?
Q2. Are \(B\) and \(W\) conditionally independent given \(A\)?
Q1. Are \(W\) and \(G\) unconditionally independent?
Q2. Are \(W\) and \(G\) conditionally independent given \(A\)?
Q1. Are \(E\) and \(B\) unconditionally independent?
Q2. Are \(E\) and \(B\) conditionally independent given \(A\)?
The v-structure is the surprise: the only structure where conditioning on the middle node opens a previously closed path.
| Structure | Are endpoints independent? | Given the middle node? |
|---|---|---|
| Chain \(A \to B \to C\) | No | Yes blocked |
| Common cause \(B \leftarrow A \to C\) | No | Yes blocked |
| Common effect \(A \to C \leftarrow B\) | Yes | No opens up |
In the chain and common-cause structures, knowing the middle node blocks information flow. In the v-structure, knowing the middle node opens it (explaining away).
The three structures cover paths of length 2. For arbitrary paths in a Bayes net, we generalize to d-separation — the rule for reading off all (conditional) independences from any DAG.