Model a one-off decision problem as a decision network (chance / decision / utility nodes).
Find the optimal policy by computing the expected utility of every policy.
Find the optimal policy via the variable elimination algorithm.
Quantify the value of information: what's a piece of evidence worth?
Apply VEA to a multi-decision network with an information ordering.
Decision theory: how should an agent act?
Probability theory: what should the agent believe from the evidence?
Utility theory: what does the agent want? (Assign a real number to each world.)
Maximum expected utility (MEU)
A rational agent picks the action that maximizes its expected utility.
Decision network = Bayes net + actions + utilities.
Running example: a mail-pickup robot
A robot must fetch the mail. There's a short, slippery route and a long, safe route. It can wear pads (slow it down, but reduce damage if it slips).
What should the robot do?
Three kinds of nodes in a decision network
Chance node
random variable (BN)
Decision node
action (we choose)
Utility node
agent's happiness
For the robot: \(A\) (chance) = whether an accident occurs; \(P\) (decision) = wear pads; \(S\) (decision) = take short route; \(U\) (utility) = the robot's happiness.
The robot's decision network
\(P(A|\neg S) = 0,\; P(A|S) = q\) · arcs to Utility revealed on click.
Q1. Which variables directly influence the robot's happiness?
\(P\) only
\(S\) only
\(A\) only
Two of (A), (B), (C)
All of (A), (B), (C)
E. Pads (weight), Short (speed), Accident (damage) all matter — all three are parents of Utility.
The robot's utility table
\(P\)
\(S\)
\(A\)
Description
\(U\)
\(\neg P\)
\(\neg S\)
\(\neg A\)
slow, no weight
6
\(\neg P\)
\(\neg S\)
\(A\)
impossible (long route + accident)
—
\(\neg P\)
\(S\)
\(\neg A\)
quick, no weight
10
\(\neg P\)
\(S\)
\(A\)
severe damage
0
\(P\)
\(\neg S\)
\(\neg A\)
slow, extra weight
4
\(P\)
\(\neg S\)
\(A\)
impossible
—
\(P\)
\(S\)
\(\neg A\)
quick, extra weight
8
\(P\)
\(S\)
\(A\)
moderate damage
2
Pads slow the robot down (lower utility), but cushion impact in an accident.
Reading the utility function
Q2. When an accident does not happen, which is true?
Robot prefers no pads over pads.
Robot prefers long route over short.
Both (A) and (B).
Neither (A) nor (B).
A. No pads is faster (\(6 > 4,\; 10 > 8\)). But the robot does prefer the short route (\(10 > 6\)).
When an accident does happen?
The robot must have taken the short route (long route never crashes).
Now the robot prefers wearing pads: \(U(P, S, A) = 2 > 0 = U(\neg P, S, A)\). Pads cushion the fall.
Evaluating a decision network
How do we choose the optimal action?
evaluate-decision-network(DN, evidence)
Fix every evidence variable to its observed value.
For each assignment to the decision variables: compute the posterior of the utility node's parents, then compute the expected utility.
Return the assignment with the maximum expected utility.
For the robot: enumerate the 4 combinations of \((P, S)\).