Lecture 7
RN 12.4, 13.1–13.2 · PM 8.2–8.3
Chat 3 (Probabilities) on Chrysalis.
A joint distribution over \(n\) Boolean variables needs \(2^n\) probabilities.
| Variables | Probabilities |
|---|---|
| 6 (Holmes) | 64 |
| 10 | 1,024 |
| 20 | ~ 1 million |
| 30 | ~ 1 billion |
Independence lets us factor this huge joint into tiny pieces.
\(P(X | Y) = P(X)\)
\(P(Y | X) = P(Y)\)
\(P(X \wedge Y) = P(X)\, P(Y)\)
Learning \(Y\) does not change our belief about \(X\).
Two probabilities (the priors) suffice to recover four joint probabilities.
\(P(X | Y \wedge Z) = P(X | Z)\)
\(P(Y | X \wedge Z) = P(Y | Z)\)
\(P(X \wedge Y | Z) = P(X | Z)\, P(Y | Z)\)
Once we know \(Z\), learning \(Y\) doesn't change our belief about \(X\).
Important: independence does not imply conditional independence, and vice versa. We'll see both on later slides.
Three Boolean variables \(A, B, C\). How many probabilities specify the joint?
Q1. No independence assumed.
Q2. \(A, B, C\) all mutually independent.
Q3. \(A, B\) conditionally independent given \(C\).
Inheritance of handedness
Car diagnostic network
| Representation | Probabilities needed |
|---|---|
| Full joint over 6 Boolean variables | \(2^6 = 64\) |
| Holmes Bayes net: \(1 + 1 + 2 + 4 + 2 + 2\) | 12 |
From those 12 numbers we can reconstruct every one of the 64 joint probabilities.
For larger networks, the savings are exponential.
\(P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \,|\, \mathrm{Parents}(X_i))\).
Each variable is conditionally independent of its non-descendants given its parents.
Goal: \(P(\neg B \wedge \neg E \wedge A \wedge \neg R \wedge G \wedge W)\). All neighbours call but neither a burglary nor an earthquake actually happened.
Step 1. Factor by the BN structure (using parents):
\(= P(\neg B) \cdot P(\neg E) \cdot P(A | \neg B \wedge \neg E) \cdot P(\neg R | \neg E) \cdot P(G | A) \cdot P(W | A)\)
Step 2. Plug in numbers:
\(= 0.9999 \times 0.9997 \times 0.01 \times 0.9998 \times 0.40 \times 0.80\)
\(\approx \mathbf{3.2 \times 10^{-3}}\)
Q. What is \(P(\neg B \wedge \neg E \wedge \neg A \wedge \neg R \wedge \neg G \wedge \neg W)\)?
\(A\) and \(C\) talk through \(B\).
\(B\) and \(C\) share a hidden cause \(A\).
\(A\) and \(B\) both cause \(C\) (a v-structure).
Each structure has its own independence pattern. Let's check.
Q1. Are \(B\) and \(W\) unconditionally independent?
Q2. Are \(B\) and \(W\) conditionally independent given \(A\)?
Q1. Are \(W\) and \(G\) unconditionally independent?
Q2. Are \(W\) and \(G\) conditionally independent given \(A\)?
Q1. Are \(E\) and \(B\) unconditionally independent?
Q2. Are \(E\) and \(B\) conditionally independent given \(A\)?
The v-structure is the surprise: conditioning on the middle node opens a path instead of blocking it.
| Structure | Are endpoints independent? | Given the middle node? |
|---|---|---|
| Chain \(A \to B \to C\) | No | Yes blocked |
| Common cause \(B \leftarrow A \to C\) | No | Yes blocked |
| Common effect \(A \to C \leftarrow B\) | Yes | No opens up |
In the chain and common-cause structures, knowing the middle node blocks information flow. In the v-structure, knowing the middle node opens it (explaining away).
The three structures cover paths of length 2. For arbitrary paths in a Bayes net, we generalize to d-separation — the rule for reading off all (conditional) independences from any DAG.