Lecture 11
RN 14.2.2
From past evidence \(o_{0:k}\), filtering gives the posterior over the current state:
\(f_{0:k} = P(S_k | o_{0:k}) = \alpha\, P(o_k | S_k) \sum_{s_{k-1}} P(S_k | s_{k-1})\, f_{0:(k-1)}\)
Today: a query about a past state, given all the evidence we have.
\(P(S_k | o_{0:(t-1)}), \quad 0 \leq k \leq t-1\)
Split the evidence around \(S_k\) into past and future:
\(P(S_k | o_{0:(t-1)}) \;=\; \alpha \, \underbrace{P(S_k | o_{0:k})}_{\textbf{forward } f_{0:k}} \, \underbrace{P(o_{(k+1):(t-1)} | S_k)}_{\textbf{backward } b_{(k+1):(t-1)}}\)
Two sweeps meet at \(S_k\); multiply pointwise and normalize.
Compute \(b_{(k+1):(t-1)} \;\overset{\Delta}{=}\; P(o_{(k+1):(t-1)} | S_k)\) right-to-left.
Same factor sizes as forward recursion — total cost \(O(t)\) for the whole backward pass.
No normalization here: \(b\) is a likelihood, not a probability. Normalize after multiplying \(f \times b\).
Umbrella story, given \(O_0 = O_1 = O_2 = t\). Compute the smoothed beliefs:
From L10 we already have \(f_{0:0} = \langle 0.818, 0.182\rangle\) and \(f_{0:1} = \langle 0.883, 0.117\rangle\).
\(b_{2:2} = P(o_2 | S_1) = \sum_{s_2} P(o_2 | s_2)\, b_{3:2}\, P(s_2 | S_1)\) with \(b_{3:2} = \langle 1, 1\rangle\):
\(= 0.9 \cdot 1 \cdot \langle 0.7, 0.3\rangle + 0.2 \cdot 1 \cdot \langle 0.3, 0.7\rangle\)
\(= \langle 0.63, 0.27\rangle + \langle 0.06, 0.14\rangle = \langle \mathbf{0.69}, \mathbf{0.41}\rangle\)
\(b_{1:2} = P(o_{1:2} | S_0) = \sum_{s_1} P(o_1 | s_1)\, b_{2:2}\, P(s_1 | S_0)\):
\(= 0.9 \cdot 0.69 \cdot \langle 0.7, 0.3\rangle + 0.2 \cdot 0.41 \cdot \langle 0.3, 0.7\rangle\)
\(= \langle 0.4347, 0.1863\rangle + \langle 0.0246, 0.0574\rangle = \langle \mathbf{0.4593}, \mathbf{0.2437}\rangle\)
\(P(S_1 | o_{0:2}) = \alpha\, f_{0:1} \cdot b_{2:2}\):
\(= \alpha\, \langle 0.883, 0.117\rangle \cdot \langle 0.69, 0.41\rangle = \alpha\, \langle 0.6093, 0.0480\rangle = \langle \mathbf{0.927}, \mathbf{0.073}\rangle\)
\(P(S_0 | o_{0:2}) = \alpha\, f_{0:0} \cdot b_{1:2}\):
\(= \alpha\, \langle 0.818, 0.182\rangle \cdot \langle 0.4593, 0.2437\rangle = \alpha\, \langle 0.3757, 0.0444\rangle = \langle \mathbf{0.894}, \mathbf{0.106}\rangle\)
| \(P(S_1 = T \mid \ldots)\) | |
|---|---|
| Filtering with \(o_{0:1}\) only | 0.883 |
| Smoothing with \(o_{0:2}\) | 0.927 |
The extra "umbrella on day 2" makes "rain on day 1" more likely — future evidence helps.
Four small steps from \(P(S_k | o_{0:(t-1)})\) to the forward-backward product:
Try to identify the rule before the label appears.
Five steps from \(P(o_{(k+1):(t-1)} | S_k)\) to the recursive form:
The final factor is exactly \(P(o_{k+1} | s_{k+1})\, b_{(k+2):(t-1)}\, P(s_{k+1} | S_k)\) — the backward recursion.
Compute the smoothed posterior at every time step \(k\) with one forward sweep + one backward sweep.
Total cost: \(O(t)\) — one full sweep in each direction.
Compute \(P(S_2 \mid o_0 \wedge o_1 \wedge o_2 \wedge \neg o_3)\) — here \(k = 2,\; t = 4\).
Forward (from L10): \(f_{0:2} \approx \langle 0.884, 0.116 \rangle\)
Backward: \(b_{3:3} = 0.1\langle 0.7, 0.2\rangle + 0.8\langle 0.3, 0.8\rangle = \langle 0.31, 0.66 \rangle\)
Smoothed: \(\alpha \langle 0.884, 0.116\rangle \cdot \langle 0.31, 0.66\rangle = \alpha \langle 0.274, 0.0766\rangle = \boxed{\langle 0.782, 0.218\rangle}\)
Find the single best sequence of hidden states that explains the observations:
\(\hat{s}_{0:t-1} = \arg\max_{s_{0:t-1}} P(s_{0:t-1} | o_{0:t-1})\)
Brute force: enumerate every state sequence.
\(n^t\) sequences — intractable.
Dynamic programming to the rescue: the Viterbi algorithm.
The highest probability of any state sequence \(s_{0:k}\) ending with \(S_k = j\), given observations \(o_{0:k}\).
Trellis: one column per time step, one node per state value. Red = best path entering each node.
\(\pi_k(j)\) = best probability of any path ending at state \(j\) at time \(k\). \(\phi_k(j)\) remembers which predecessor produced it — we'll use it to trace the best sequence back.
One sweep fills the trellis; one walk back along the backpointers extracts the MAP sequence.
Best path: \(\hat{s} = (1, 0, 1, 2, 1)\).
Complexity: \(O(t \cdot n^2)\) — one trellis edge per cell per step. Linear in time, quadratic in #states.
So far we have beliefs. Next we add preferences — how should an agent act when outcomes are uncertain?