Lecture 10
RN 14.1 & 14.2.1 · PM 8.5.1–8.5.3
So far, we reasoned about a static world. But the world evolves over time — we need to reason about sequences of events.
Today's running example: an umbrella story — the simplest possible HMM.
You are a security guard stationed at a secret underground facility.
You want to know whether it is raining outside today.
Your only signal: each morning, the director walks in with or without an umbrella.
Q. What are the random variables in the umbrella world?
"The future is independent of the past given the present."
First-order Markov (the default):
\(P(S_t | S_{t-1}, \ldots, S_0) = P(S_t | S_{t-1})\)
Second-order: also depends on \(S_{t-2}\) (blue skip-edges).
\(P(S_t | S_{t-1}, S_{t-2})\)
Stationary process: the same CPT is used at every time step.
Warm-up: what is \(P(S_K)\) for large \(K\)?
\(P(S_1 = T) = 0.5 \cdot 0.7 + 0.5 \cdot 0.3 = 0.5\)
\(P(S_K = T) = 0.5\) for all \(K\) — the chain has reached its steady state immediately.
\(S_t = T\) iff it rains on day \(t\). \(O_t = T\) iff the director carries an umbrella on day \(t\).
By the same reasoning as before, \(P(O_K = T) = 0.5 \cdot 0.9 + 0.5 \cdot 0.2 = 0.55\) for all \(K\) (steady state, no evidence).
An HMM is a Bayes net — so the variable elimination algorithm already works! But two specialized algorithms exploit the chain structure:
What state am I in right now, given all evidence to date?
What state will I be in in the future?
What state was I in at some past time?
Which sequence of states is most likely?
Today: filtering.
Given observations from time \(0\) to time \(k\), what is my distribution over the state now?
\(P(S_k | o_{0:k})\)
Yellow = query. Gray = observed.
Tiny case (\(k = 1\)): \(P(S_1 | o_0, \neg o_1) \propto \sum_{s_0} P(s_0)\, P(o_0 | s_0)\, P(S_1 | s_0)\, P(\neg o_1 | S_1)\)
3 mul × 2 + 1 add, all times 2 \(\Rightarrow\) 14 ops — manageable.
General case: marginalize all hidden states
\(P(S_k | o_{0:k}) \propto \sum_{s_0} \cdots \sum_{s_{k-1}} P(s_0)\, P(o_0 | s_0) \cdots P(o_k | S_k)\)
Sum over all \(2^k\) hidden state assignments.
\(O(k \cdot 2^k)\) ops
Reuse \(f_{0:(k-1)}\) when computing \(f_{0:k}\).
\(O(k)\) ops
Let \(f_{0:k} \;\overset{\Delta}{=}\; P(S_k | o_{0:k})\).
Total cost: \(O(k)\) ops — exponentially faster than naive enumeration.
\(\alpha\) is a normalization constant (chosen so the two entries of \(f_{0:k}\) sum to 1).
\(f_{0:k} = \underbrace{\alpha\, P(o_k | S_k)}_{\textbf{update}} \cdot \underbrace{\sum_{s_{k-1}} P(S_k | s_{k-1})\, f_{0:(k-1)}}_{\textbf{predict}}\)
Recursive Bayesian update — one step at a time.
The umbrella story. Given \(O_0 = t,\; O_1 = t\), compute \(f_{0:0}\) and \(f_{0:1}\) using forward recursion.
Use \(f_{0:0} = \alpha\, P(o_0 | S_0)\, P(S_0)\), treating each as a 2-vector \(\langle T, F\rangle\).
\(f_{0:0} = \alpha\, \langle P(o_0 | s_0),\, P(o_0 | \neg s_0)\rangle \cdot \langle P(s_0),\, P(\neg s_0)\rangle\)
\(= \alpha\, \langle 0.9,\, 0.2\rangle \cdot \langle 0.5,\, 0.5\rangle\)
\(= \alpha\, \langle 0.45,\, 0.1\rangle\)
\(= \langle 0.818,\, 0.182\rangle\)
Normalize: \(0.45 / (0.45 + 0.1) \approx 0.818\); \(0.1 / 0.55 \approx 0.182\).
\(f_{0:1} = \alpha\, P(o_1 | S_1) \sum_{s_0} P(S_1 | s_0)\, f_{0:0}\) with \(f_{0:0} = \langle 0.818, 0.182\rangle\).
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \bigl(\langle 0.7, 0.3\rangle \cdot 0.818 + \langle 0.3, 0.7\rangle \cdot 0.182\bigr)\)
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \bigl(\langle 0.5726, 0.2454\rangle + \langle 0.0546, 0.1274\rangle\bigr)\)
\(= \alpha\, \langle 0.9, 0.2\rangle \cdot \langle 0.6272, 0.3728\rangle\)
\(= \alpha\, \langle 0.56448, 0.07456\rangle\)
\(= \langle 0.883, 0.117\rangle\)
After two days of umbrellas, our belief that it rained today went from 0.5 \(\to\) 0.818 \(\to\) 0.883.
\(f_{0:0} = \langle 0.75, 0.25 \rangle\) → \(f_{0:1} = \langle 0.859, 0.141 \rangle\) → \(\boxed{f_{0:2} \approx \langle 0.884, 0.116 \rangle}\)
Six small steps from \(P(S_k | o_{0:k})\) to the forward-recursion formula:
Pick one of: (A) rewrite (B) Bayes' rule (C) chain rule (D) Markov (E) sum rule
Pick one of: (A) rewrite (B) Bayes' rule (C) chain rule (D) Markov (E) sum rule
\(\boxed{\;f_{0:k} = \underbrace{\alpha\, P(o_k | S_k)}_{\textbf{update}} \cdot \underbrace{\sum_{s_{k-1}} P(S_k | s_{k-1})\, f_{0:(k-1)}}_{\textbf{predict}}\;}\)
All three are variations on the same predict-update recursion.