Lecture 10
RN 14.1 & 14.2.1 · PM 8.5.1–8.5.3
So far, we reasoned about a static world. But the world evolves over time — we need to reason about sequences of events.
Today's running example: an umbrella story — the simplest possible HMM.
You are a security guard stationed at a secret underground facility.
You want to know whether it is raining outside today.
Your only signal: each morning, the director walks in with or without an umbrella.
Q. What are the random variables in the umbrella world?
"The future is independent of the past given the present."
First-order Markov (the default):
\(P(S_t | S_{t-1}, \ldots, S_0) = P(S_t | S_{t-1})\)
Second-order: also depends on \(S_{t-2}\) (blue skip-edges).
\(P(S_t | S_{t-1}, S_{t-2})\)
Stationary process: the same CPT is used at every time step.
Warm-up: what is \(P(S_K)\) for large \(K\)?
\(P(S_1 = T) = 0.5 \cdot 0.7 + 0.5 \cdot 0.3 = 0.5\)
\(P(S_K = T) = 0.5\) for all \(K\) — the chain reaches its steady state immediately.
Convention: in a CPT a lowercase letter means that variable is true (\(\neg\) = false) — e.g. \(P(o_t \mid s_t)\) is P(umbrella | rain).
\(S_t = T\) iff it rains on day \(t\). \(O_t = T\) iff the director carries an umbrella on day \(t\).
By the same reasoning as before, \(P(O_K = T) = 0.5 \cdot 0.9 + 0.5 \cdot 0.2 = 0.55\) for all \(K\) (steady state, no evidence).
An HMM is a Bayes net — so the variable elimination algorithm already works! But two specialized algorithms exploit the chain structure:
What state am I in right now, given all evidence to date?
What state will I be in in the future?
What state was I in at some past time?
Which sequence of states is most likely?
Today: filtering.
Given observations from time \(0\) to time \(k\), what is my distribution over the state now?
\(P(S_k | o_{0:k})\)
Yellow = query, gray = observed. \(o_{0:k}\) denotes the observed values we condition on (the evidence).
By the definition of conditional probability:
\(P(S_k \mid o_{0:k}) = \dfrac{P(S_k,\, o_{0:k})}{P(o_{0:k})}\)
The denominator \(P(o_{0:k})\) is the same for every value of \(S_k\) — a normalizing constant \(\alpha\):
\(P(S_k \mid o_{0:k}) \;\propto\; P(S_k,\, o_{0:k})\)
Recover that joint by summing out the hidden states \(s_0, \ldots, s_{k-1}\):
\(P(S_k,\, o_{0:k}) = \sum_{s_0} \cdots \sum_{s_{k-1}} P(s_0, \ldots, s_{k-1},\, S_k,\, o_{0:k})\)
Factor the joint with the HMM structure (\(s_k = S_k\)):
\(P(S_k | o_{0:k}) \propto \sum_{s_0} \cdots \sum_{s_{k-1}} P(s_0)\, P(o_0 | s_0) \prod_{i=1}^{k} P(s_i | s_{i-1})\, P(o_i | s_i)\)
Tiny case (\(k = 1\)):
\(P(S_1 | o_0, o_1) \propto \sum_{s_0} P(s_0)\, P(o_0 | s_0)\, P(S_1 | s_0)\, P(o_1 | S_1)\)
3 mul × 2 + 1 add, all times 2 \(\Rightarrow\) 14 ops — manageable.
For general \(k\), the sum ranges over all \(2^k\) hidden-state assignments.
Sum over all \(2^k\) hidden state assignments.
\(O(k \cdot 2^k)\) ops
Reuse \(f_{0:(k-1)}\) when computing \(f_{0:k}\).
\(O(k)\) ops
Let \(f_{0:k} \;\overset{\Delta}{=}\; P(S_k | o_{0:k})\).
Total cost: \(O(k)\) ops — exponentially faster than naive enumeration.
\(\alpha\) is a normalization constant (chosen so the two entries of \(f_{0:k}\) sum to 1).
\(f_{0:k} = \underbrace{\alpha\, P(o_k | S_k)}_{\textbf{update}} \cdot \underbrace{\sum_{s_{k-1}} P(S_k | s_{k-1})\, f_{0:(k-1)}}_{\textbf{predict}}\)
Recursive Bayesian update — one step at a time.
The umbrella story. Given \(O_0 = T,\; O_1 = T\) (umbrella seen on both days), compute \(f_{0:0}\) and \(f_{0:1}\) using forward recursion.
Use \(f_{0:0} = \alpha\, P(o_0 | S_0)\, P(S_0)\), treating each as a 2-vector \(\langle T, F\rangle\).
\(f_{0:0} = \alpha\, \langle P(o_0 | s_0),\, P(o_0 | \neg s_0)\rangle \cdot \langle P(s_0),\, P(\neg s_0)\rangle\)
\(= \alpha\, \langle 0.9,\, 0.2\rangle \cdot \langle 0.5,\, 0.5\rangle\)
\(= \alpha\, \langle 0.45,\, 0.1\rangle\)
\(= \langle 0.818,\, 0.182\rangle\)
Normalize: \(0.45 / (0.45 + 0.1) \approx 0.818\); \(0.1 / 0.55 \approx 0.182\).
\(f_{0:1} = \alpha\, P(o_1 | S_1) \sum_{s_0} P(S_1 | s_0)\, f_{0:0}\) with \(f_{0:0} = \langle 0.818, 0.182\rangle\).
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \bigl(\langle 0.7, 0.3\rangle \cdot 0.818 + \langle 0.3, 0.7\rangle \cdot 0.182\bigr)\)
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \bigl(\langle 0.5726, 0.2454\rangle + \langle 0.0546, 0.1274\rangle\bigr)\)
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \langle 0.6272, 0.3728\rangle\)
\(= \alpha\, \langle 0.56448, 0.07456\rangle\)
\(= \langle 0.883, 0.117\rangle\)
After two days of umbrellas, our belief that it rained today went from 0.5 \(\to\) 0.818 \(\to\) 0.883.
\(f_{0:0} = \langle 0.75, 0.25 \rangle\) → \(f_{0:1} = \langle 0.859, 0.141 \rangle\) → \(\boxed{f_{0:2} \approx \langle 0.884, 0.116 \rangle}\)
Six small steps from \(P(S_k | o_{0:k})\) to the forward-recursion formula:
Pick one of: (A) rewrite (B) Bayes' rule (C) chain rule (D) Markov (E) sum rule
Pick one of: (A) rewrite (B) Bayes' rule (C) chain rule (D) Markov (E) sum rule
\(\boxed{\;f_{0:k} = \underbrace{\alpha\, P(o_k | S_k)}_{\textbf{update}} \cdot \underbrace{\sum_{s_{k-1}} P(S_k | s_{k-1})\, f_{0:(k-1)}}_{\textbf{predict}}\;}\)
All three are variations on the same predict-update recursion.