Apply d-separation to check (conditional) independence in any Bayes net
Given a BN and a variable ordering, construct a correct Bayes net
Distinguish correlation from causation
Recap: the three structures from L7
Chain
Common cause
V-structure
Today: generalize the 3 structures to arbitrary paths using d-separation.
D-separation: the rule
D-separation. A set of observed variables \(E\) d-separates \(X\) and \(Y\) iff \(E\) blocks every undirected path between \(X\) and \(Y\).
If \(E\) d-separates \(X\) and \(Y\), then \(X\) and \(Y\) are conditionally independent given \(E\).
A single open path is enough to break independence. We have to check every path and ask: is it blocked?
The three blocking rules
A path through middle node \(B\) is blocked iff:
Chain \(X \to B \to Y\)
Blocked when \(B \in E\) (middle is observed)
Common cause \(X \leftarrow B \to Y\)
Blocked when \(B \in E\) (middle is observed)
V-structure \(X \to B \leftarrow Y\)
Reversed: blocked when neither \(B\) nor any of its descendants are in \(E\)
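The three rules can be written as one small predicate. This is a minimal sketch, not a library API: the function name `triple_blocked` and the string labels for the structure kinds are illustrative assumptions.

```python
def triple_blocked(kind, middle, evidence, descendants=()):
    """Return True if the segment through `middle` is blocked given `evidence`.

    kind        -- "chain", "common_cause", or "v_structure" (hypothetical labels)
    descendants -- descendants of `middle`; only consulted for v-structures
    """
    evidence = set(evidence)
    if kind in ("chain", "common_cause"):
        # Blocked exactly when the middle node is observed.
        return middle in evidence
    if kind == "v_structure":
        # Reversed: blocked when neither the middle nor any descendant is observed.
        return middle not in evidence and evidence.isdisjoint(descendants)
    raise ValueError(f"unknown structure: {kind}")


print(triple_blocked("chain", "B", {"B"}))                             # True
print(triple_blocked("common_cause", "B", set()))                      # False
print(triple_blocked("v_structure", "B", set(), descendants={"D"}))    # True
print(triple_blocked("v_structure", "B", {"D"}, descendants={"D"}))    # False
```

A whole path is blocked as soon as any one of its middle nodes is blocked.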
Quiz pack 1: chain blocking
Q1. Are TravelSubway and HighTemp independent?
No. The path TravelSubway \(\to\) Flu \(\to\) Fever \(\to\) HighTemp has two chain middles (Flu, Fever); both are unobserved \(\Rightarrow\) path open.
Q2. Are TravelSubway and HighTemp independent given Flu?
Yes. Observing Flu blocks the chain at the first middle \(\Rightarrow\) path closed.
Quiz pack 2: another chain test
Q3. Are Aches and HighTemp independent?
No. Path Aches \(\leftarrow\) Flu \(\to\) Fever \(\to\) HighTemp: common cause (Flu, unobserved) and chain (Fever, unobserved). Both open \(\Rightarrow\) path open.
Q4. Are Aches and HighTemp independent given Flu?
Yes. Observing Flu (the common cause) blocks the path at the first middle.
Quiz pack 3: v-structure with a descendant
Q5. Are Flu and ExoticTrip independent?
Yes. Path Flu \(\to\) Fever \(\leftarrow\) Malaria \(\leftarrow\) ExoticTrip. Fever is a v-structure middle; neither Fever nor its descendant HighTemp is observed \(\Rightarrow\) v-structure blocks the path.
Q6. Are Flu and ExoticTrip independent given HighTemp?
No. HighTemp is a descendant of the v-structure middle Fever, so the v-structure now opens the path (explaining-away).
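The six quiz answers can be checked mechanically. The sketch below enumerates every undirected path and applies the blocking rules to each middle node. The edge list is an assumption inferred from the paths named in the answers above (the slide figure is not reproduced here), and path enumeration is only practical for small networks.

```python
# Edges inferred from the quiz answers; the real slide figure may contain more.
EDGES = [
    ("TravelSubway", "Flu"), ("Flu", "Fever"), ("Flu", "Aches"),
    ("Fever", "HighTemp"), ("Malaria", "Fever"), ("ExoticTrip", "Malaria"),
]


def descendants(node, edges):
    """All nodes reachable from `node` by following edges forward."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found


def undirected_paths(x, y, edges, path=None):
    """Yield every simple path from x to y, ignoring edge direction."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from undirected_paths(x, y, edges, path + [nxt])


def path_blocked(path, edges, evidence):
    """A path is blocked if at least one of its middle nodes is blocked."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edge_set and (nxt, mid) in edge_set:
            # V-structure: blocked unless mid or one of its descendants is observed.
            if mid not in evidence and not (descendants(mid, edges) & evidence):
                return True
        elif mid in evidence:
            # Chain or common cause: blocked when the middle is observed.
            return True
    return False


def d_separated(x, y, evidence=(), edges=EDGES):
    """True iff the evidence blocks every undirected path between x and y."""
    evidence = set(evidence)
    return all(path_blocked(p, edges, evidence)
               for p in undirected_paths(x, y, edges))


print(d_separated("TravelSubway", "HighTemp"))            # Q1: False
print(d_separated("TravelSubway", "HighTemp", {"Flu"}))   # Q2: True
print(d_separated("Aches", "HighTemp"))                   # Q3: False
print(d_separated("Aches", "HighTemp", {"Flu"}))          # Q4: True
print(d_separated("Flu", "ExoticTrip"))                   # Q5: True
print(d_separated("Flu", "ExoticTrip", {"HighTemp"}))     # Q6: False
```

`True` means "d-separated, hence independent", so the six results read No, Yes, No, Yes, Yes, No, matching the answers above.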
Many correct Bayes nets exist
For a fixed joint distribution, more than one Bayes net is "correct".
Correctness: Bayes net B is correct (w.r.t. A) iff every independence that B requires also holds in A.
Missing independence is OK — a denser network still encodes the joint.
Missing dependence is NOT OK — a too-sparse network forces a wrong independence.
We prefer the BN with the fewest probabilities — usually the one with fewest edges.
Construction algorithm
Order the variables \(X_1, \ldots, X_n\).
For each \(X_i\), choose the smallest subset of \(\{X_1, \ldots, X_{i-1}\}\) as its parents such that, given those parents, \(X_i\) is independent of the remaining earlier variables.
Add edges from each parent to \(X_i\); write down \(P(X_i \,|\, \mathrm{Parents}(X_i))\).
Ordering matters: a bad order can force every later variable to depend on many earlier ones.
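The algorithm can be sketched in a few lines if d-separation in the original network serves as the independence test. That choice is an assumption of this sketch: it treats the independences implied by the original graph as the only ones that hold. The names `construct` and `d_separated` are illustrative, and the subset search is exponential, so this only suits small nets.

```python
from itertools import combinations


def _descendants(node, edges):
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for p, c in edges:
            if p == cur and c not in found:
                found.add(c)
                stack.append(c)
    return found


def _paths(x, y, edges, path=None):
    """Yield every simple undirected path from x to y."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from _paths(x, y, edges, path + [nxt])


def d_separated(x, y, evidence, edges):
    edge_set, evidence = set(edges), set(evidence)
    for path in _paths(x, y, edges):
        blocked = False
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            if (prev, mid) in edge_set and (nxt, mid) in edge_set:  # v-structure
                blocked |= (mid not in evidence
                            and not (_descendants(mid, edges) & evidence))
            else:                                                   # chain / common cause
                blocked |= mid in evidence
        if not blocked:
            return False  # one open path is enough
    return True


def construct(order, original_edges):
    """Rebuild a Bayes net for `order`; returns the new edge list."""
    new_edges = []
    for i, x in enumerate(order):
        earlier = order[:i]
        for size in range(len(earlier) + 1):  # try the smallest parent sets first
            parents = next(
                (set(c) for c in combinations(earlier, size)
                 if all(d_separated(x, other, c, original_edges)
                        for other in earlier if other not in c)),
                None,
            )
            if parents is not None:
                new_edges += [(p, x) for p in earlier if p in parents]
                break
    return new_edges


chain = [("B", "A"), ("A", "W")]           # B -> A -> W
print(construct(["W", "A", "B"], chain))   # [('W', 'A'), ('A', 'B')]
print(construct(["A", "W", "B"], chain))   # [('A', 'W'), ('A', 'B')]
```

Both orders need only 2 edges for the chain, but the resulting shapes differ.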
Example 1: chain BN, order \(W, A, B\)
Original BN: \(B \to A \to W\)
Set { }: add \(W\) first.
Set { W }: is \(A\) dependent on \(W\)? Yes (adjacent in the original chain). Add edge \(W \to A\).
Set { W, A }: are \(B\) and \(W\) independent given \(A\)? Yes (chain blocked at \(A\)). So \(A\) is \(B\)'s only parent.
Final: \(W \to A \to B\). Same shape as the original (2 edges).
Example 1 alt: same BN, order \(A, W, B\)
Original BN: \(B \to A \to W\)
Set { }: add \(A\) first.
Set { A }: \(W\) depends on \(A\). Add \(A \to W\).
Set { A, W }: are \(B\) and \(W\) independent given \(A\)? Yes. Are \(B\) and \(A\) independent given \(W\)? No. So only \(A\) is \(B\)'s parent.
Final: \(A \to W\) and \(A \to B\). A different shape from the original, but still 2 edges, so equally compact.
Example 2: common cause, order \(W, G, A\)
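Original BN: \(A \to W\), \(A \to G\)
Set { }: add \(W\).
Set { W }: \(G\) and \(W\) are dependent (their common cause \(A\) is unobserved). Add \(W \to G\).
Set { W, G }: in the original, \(A\) is the parent of both, so it depends on each of them even given the other. Both \(W\) and \(G\) must be parents of \(A\).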
Final: 3 edges (\(W \to G, W \to A, G \to A\)) — more than the original's 2 edges. Suboptimal!
Example 3: v-structure, order \(A, B, E\)
Original BN: \(E \to A\), \(B \to A\)
Set { }: add \(A\).
Set { A }: \(B\) depends on \(A\) (direct neighbours). Add \(A \to B\).
Set { A, B }: \(E\) and \(B\) are NOT independent given \(A\) (v-structure middle observed!). \(E\) and \(A\) are always dependent. So both are \(E\)'s parents.
Final: 3 edges (\(A \to B, A \to E, B \to E\)) — more than the original's 2. Reversed v-structures are expensive.
Holmes with a bad order: \(G, W, E, B, A, R\)
Adding effects before causes forces almost every later node to depend on all earlier ones (only \(R\), added last, gets away with a single parent).
\(1 + 2 + 4 + 8 + 16 + 2 = \mathbf{33}\) probabilities, versus 12 with the causal order.
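The two totals can be reproduced by counting \(2^{|\mathrm{Parents}(X)|}\) probabilities per node. The sketch assumes every variable is binary and that the causal Holmes net is \(E \to A \leftarrow B\), \(E \to R\), \(A \to W\), \(A \to G\) (the figure is not reproduced here; this structure is chosen because it gives the stated 12). The bad-order parent sets are the ones implied by the sum above.

```python
def num_probabilities(parents):
    """Independent probabilities needed when every variable is binary."""
    return sum(2 ** len(ps) for ps in parents.values())


# Assumed causal structure (causes before effects).
causal = {"E": [], "B": [], "A": ["E", "B"], "R": ["E"], "W": ["A"], "G": ["A"]}

# Parent sets for the order G, W, E, B, A, R, matching 1 + 2 + 4 + 8 + 16 + 2.
bad = {
    "G": [],
    "W": ["G"],
    "E": ["G", "W"],
    "B": ["G", "W", "E"],
    "A": ["G", "W", "E", "B"],
    "R": ["E"],
}

print(num_probabilities(causal))            # 12
print(num_probabilities(bad))               # 33
print(sum(len(ps) for ps in bad.values()))  # 11 edges in the bad-order net
```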
Pick a causal order
Causes precede effects. Add root causes first; effects last.
| Example | Original | Order | Reconstructed | #Edges |
|---|---|---|---|---|
| Ex 1 | \(B \to A \to W\) | \(W, A, B\) | \(W \to A \to B\) | 2 (same) |
| Ex 1 alt | \(B \to A \to W\) | \(A, W, B\) | \(A \to W, A \to B\) | 2 (different) |
| Ex 2 | \(A \to W, A \to G\) | \(W, G, A\) | \(W \to G, W \to A, G \to A\) | 3 |
| Ex 3 | \(E \to A, B \to A\) | \(A, B, E\) | \(A \to B, A \to E, B \to E\) | 3 |
| Holmes | 12 probs | bad: \(G, W, E, B, A, R\) | 11-edge mess | 33 probs |
Finding the most compact BN is NP-hard in general — but a causal ordering is a good heuristic.
Correlation \(\ne\) causation
Two variables can be highly correlated without one causing the other.
Ice cream sales correlate with shark attacks. Does ice cream attract sharks?
No — both rise in summer. Temperature is a hidden common cause.
An edge in a Bayes net is associational, not necessarily causal.
Confounding variables
Bigger shoes correlate with better reading scores.
Hidden cause: Age. Older children have bigger feet and read better.
The Shoe \(\to\) Reading "effect" is spurious; controlling for Age would erase it.
Causal intervention: the \(\mathrm{do}\) operator
Observing vs. forcing:
\(P(R | \text{Shoe} = 1)\) — what we see when we condition (large feet correlate with reading).
\(P(R | \mathrm{do}(\text{Shoe} = 1))\) — what would happen if we intervene and force big shoes onto a random child.
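A small exact calculation makes the difference concrete. All numbers below are made up for illustration, and the model assumes Age is the only cause of both shoe size and reading (Age \(\to\) Shoe, Age \(\to\) R, no Shoe \(\to\) R edge).

```python
# Hypothetical probabilities, chosen only to illustrate the effect.
P_AGE = {"old": 0.5, "young": 0.5}
P_BIG_SHOE = {"old": 0.9, "young": 0.2}      # P(Shoe=1 | Age)
P_READS_WELL = {"old": 0.8, "young": 0.3}    # P(R=1 | Age); Shoe plays no role


def p_read_given_shoe(shoe):
    """P(R=1 | Shoe=shoe): observing the shoe shifts the age mix (Bayes' rule)."""
    weight = {a: P_AGE[a] * (P_BIG_SHOE[a] if shoe == 1 else 1 - P_BIG_SHOE[a])
              for a in P_AGE}
    total = sum(weight.values())
    return sum(P_READS_WELL[a] * weight[a] / total for a in P_AGE)


def p_read_do_shoe(shoe):
    """P(R=1 | do(Shoe=shoe)): forcing the shoe cuts Age -> Shoe.

    The age mix stays P(Age), and since R has no Shoe parent in this model
    the argument `shoe` does not change the result.
    """
    return sum(P_READS_WELL[a] * P_AGE[a] for a in P_AGE)


print(f"{p_read_given_shoe(1):.3f}")   # 0.709
print(f"{p_read_given_shoe(0):.3f}")   # 0.356
print(f"{p_read_do_shoe(1):.3f}")      # 0.550
print(f"{p_read_do_shoe(0):.3f}")      # 0.550
```

Conditioning shows a large gap between big-shoe and small-shoe children, while intervening shows none: under these assumptions the whole association comes from Age.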
Average Treatment Effect (adjusting for confounders):
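One standard way to write this is the backdoor adjustment, sketched here under the assumption that Age is the only confounder of Shoe and \(R\):

\[
\mathrm{ATE}
= P(R \,|\, \mathrm{do}(\text{Shoe} = 1)) - P(R \,|\, \mathrm{do}(\text{Shoe} = 0))
= \sum_{a} \bigl[ P(R \,|\, \text{Shoe} = 1, \text{Age} = a) - P(R \,|\, \text{Shoe} = 0, \text{Age} = a) \bigr] \, P(\text{Age} = a)
\]

If reading depends only on Age, each bracketed difference is zero, so the ATE is zero even though \(P(R \,|\, \text{Shoe} = 1) \ne P(R \,|\, \text{Shoe} = 0)\).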