⏰ Chat 5 is due TODAY (Thu Jun 18), 11:59 PM on Chrysalis.
📅 Assignment 1 is now due Thu Jun 25 — it was originally due today (Jun 18), but the deadline was extended by a week. The PDF, starter code, and model file were updated on LEARN; please re-download.
📝 Assignment 1 questions → Dake Zhang on Piazza.
Recap: the Viterbi recursion
Last lecture — the most-likely explanation: the single best hidden-state sequence \(\hat s_{0:t-1} = \arg\max_{s_{0:t-1}} P(s_{0:t-1} \mid o_{0:t-1})\). Let \(\pi_k(j)\) = highest probability of any path ending in state \(j\) at time \(k\).
The inner max has no \(j\) in it — so one best path per node serves every successor (\(\phi_k(j)\) records the winning \(z\)). Constants don't change the argmax, so we work with the joint.
\(\hat s_{k-1} = \phi_k(\hat s_k),\quad k = t{-}1,\dots,1\)
repeat down the chain \(\ \blacksquare\)
Splitting the argmax one state at a time turns the global search into \(t\) tiny argmaxes — and each one is exactly a backpointer \(\phi_k\).
Search›Uncertainty›Decisions›Learning
Learning goals
Model a one-off decision problem as a decision network (chance / decision / utility nodes).
Find the optimal policy by computing the expected utility of every policy.
Find the optimal policy via the variable elimination algorithm.
Quantify the value of information: what's a piece of evidence worth?
Apply VEA to a multi-decision network with an information ordering.
Decision theory: how should an agent act?
Probability theory: what should the agent believe from the evidence?
Utility theory: what does the agent want? (Assign a real number to each world.)
Maximum expected utility (MEU)
A rational agent picks the action that maximizes its expected utility.
Decision network = Bayes net + actions + utilities.
Running example: a mail-pickup robot
A robot must fetch the mail. There's a short, slippery route and a long, safe route. It can wear pads (slow it down, but reduce damage if it slips).
What should the robot do?
Three kinds of nodes in a decision network
Chance node
random variable (BN)
Decision node
action (we choose)
Utility node
agent's happiness
For the robot: \(A\) (chance) = whether an accident occurs; \(P\) (decision) = wear pads; \(S\) (decision) = take short route; \(U\) (utility) = the robot's happiness.
The robot's decision network
\(P(A|\neg S) = 0,\; P(A|S) = q\) · arcs to Utility revealed on click.
Q1. Which variables directly influence the robot's happiness?
\(P\) only
\(S\) only
\(A\) only
Two of (A), (B), (C)
All of (A), (B), (C)
E. Pads (weight), Short (speed), Accident (damage) all matter — all three are parents of Utility.
The robot's utility table
\(P\)
\(S\)
\(A\)
Description
\(U\)
\(\neg P\)
\(\neg S\)
\(\neg A\)
slow, no weight
6
\(\neg P\)
\(\neg S\)
\(A\)
impossible (long route + accident)
—
\(\neg P\)
\(S\)
\(\neg A\)
quick, no weight
10
\(\neg P\)
\(S\)
\(A\)
severe damage
0
\(P\)
\(\neg S\)
\(\neg A\)
slow, extra weight
4
\(P\)
\(\neg S\)
\(A\)
impossible
—
\(P\)
\(S\)
\(\neg A\)
quick, extra weight
8
\(P\)
\(S\)
\(A\)
moderate damage
2
Pads slow the robot down (lower utility), but cushion impact in an accident.
Reading the utility function
Q2. When an accident does not happen, which is true?
Robot prefers no pads over pads.
Robot prefers long route over short.
Both (A) and (B).
Neither (A) nor (B).
A. No pads is faster (\(6 > 4,\; 10 > 8\)). But the robot does prefer the short route (\(10 > 6\)).
When an accident does happen?
The robot must have taken the short route (long route never crashes).
Now the robot prefers wearing pads: \(U(P, S, A) = 2 > 0 = U(\neg P, S, A)\). Pads cushion the fall.
Evaluating a decision network
How do we choose the optimal action?
evaluate-decision-network(DN, evidence)
Fix every evidence variable to its observed value.
For each assignment to the decision variables: compute the posterior of the utility node's parents, then compute the expected utility.
Return the assignment with the maximum expected utility.
For the robot: enumerate the 4 combinations of \((P, S)\).