CS 486/686 Lecture 22 — Course Recap

CS 486/686
Course Recap

Yuntian Deng

Lecture 22

Putting the term together: search, uncertainty, decisions, learning

The course in one picture

Search I — uninformed

Problem formulation

States, actions, transition model, initial & goal states, path cost.
Search tree explores states by expanding the frontier.

Pick the data structure for the frontier → pick the algorithm.

DFS vs BFS vs IDS

DFS: stack → \(O(bm)\) space, not complete.
BFS: queue → complete, \(O(b^d)\) time and space.
IDS: DFS in increasing depth → BFS-completeness, DFS-space.

IDS combines the best of both.

Search II — heuristic + A*

The three frontier rules

LCFS: min \(\text{cost}(n)\).
GBFS: min \(h(n)\).
A*: min \(f(n) = \text{cost}(n) + h(n)\).

Heuristic properties

Admissible (\(h \le h^*\)) ⇒ A* is optimal with multi-path pruning off.
Consistent (\(h(n) \le c(n,n') + h(n')\)) ⇒ safe to use multi-path pruning.
Consistent ⇒ admissible (not the other way around).

Search strategy summary

Strategy	Frontier choice	Complete?	Space	Time
Depth-first	Last added	No	Linear	Exp
Breadth-first	First added	Yes	Exp	Exp
Lowest-cost first	min \(\text{cost}(n)\)	Yes	Exp	Exp
Greedy best-first	min \(h(n)\)	No	Exp	Exp
A*	min \(\text{cost}(n) + h(n)\)	Yes	Exp	Exp

A* dominates when the heuristic is informative enough to prune most of the tree.

CSPs + local search

Constraint satisfaction

Generate-and-test is exponential — exploit problem structure.
Backtracking: incremental, prune as soon as a constraint fails.
Arc consistency / AC-3: shrink domains before/during search; complexity \(O(cd^3)\).

Local search

Trade completeness for very large state spaces.
Greedy descent / hill-climbing: always move to best neighbour.
Beats local minima with random restarts + simulated annealing (accept worse moves with prob \(e^{-\Delta/T}\)).

Probability + independence

The four rules

Product: \(P(A,B) = P(A\mid B)\,P(B)\)
Sum: \(P(A) = \sum_b P(A, B{=}b)\)
Chain: \(P(X_1,\ldots,X_n) = \prod_i P(X_i \mid X_{<i})\)
Bayes: \(P(H\mid e) \propto P(e\mid H)\,P(H)\)

Independence

Unconditional: \(P(A,B) = P(A)\,P(B)\).
Conditional: \(P(A,B\mid C) = P(A\mid C)\,P(B\mid C)\).
Tells us which terms drop out of the chain rule — the key to scaling up inference.

Bayesian networks + d-separation

Construction recipe

Pick a variable order (causal often works best).
Add each node; choose the smallest set of parents that makes it conditionally independent of the rest.
Joint factorizes as \(P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathrm{pa}(X_i))\).

3 d-separation patterns

Chain \(A \to B \to C\): blocked by observing \(B\).
Common cause \(A \leftarrow B \to C\): blocked by observing \(B\).
Common effect \(A \to B \leftarrow C\): blocked by not observing \(B\) or descendants.

A path is blocked ⇒ the endpoints are conditionally independent.

Inference — VE & HMM filtering

Variable elimination

Define a factor for each CPT.
Restrict factors to evidence.
Multiply factors sharing a variable; sum out the hidden one.
Normalize to get a posterior.

HMM filtering

Recursion: \(P(S_t \mid o_{0:t}) \propto P(o_t \mid S_t)\, \sum_{s_{t-1}} P(S_t \mid s_{t-1})\, P(s_{t-1} \mid o_{0:t-1})\)
One forward pass: \(O(t \cdot |S|^2)\) time.

HMM decoding — smoothing & Viterbi

Forward×backward smoothing

Forward: \(\alpha_k = P(S_k \mid o_{0:k})\).
Backward: \(\beta_k = P(o_{k+1:t-1} \mid S_k)\).
Posterior: \(P(S_k \mid o_{0:t-1}) \propto \alpha_k\, \beta_k\).

Viterbi

DP over the trellis, max instead of sum.
Backtrack the argmax pointers to recover the MAP sequence.

Decision networks

Three node types

Chance nodes — like a Bayes net (CPTs).
Decision nodes — rectangles, no CPT, chosen by us.
Utility node — diamond, deterministic function of its parents.

Maximum expected utility

Policy \(\pi(d \mid \text{parents})\); MEU picks \(\pi^*\) maximizing \(\mathbb{E}[U]\).
Evaluate by enumeration or by running VE on the DN.
Value of information: EU gain from observing a variable before deciding.

Markov decision processes

Setup

States, actions, transition \(P(s' \mid s, a)\), reward \(R(s)\), discount \(\gamma\).
Policy \(\pi : S \to A\); value \(V^\pi(s) = \mathbb{E}[\sum_t \gamma^t R(s_t) \mid \pi]\).

Bellman + the two algorithms

Bellman optimality: \(V^*(s) = R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a)\, V^*(s')\).
Value iteration: apply the Bellman update until convergence.
Policy iteration: evaluate (linear system) ↔ improve (greedy) — converges fast.

Reinforcement learning

Two learning regimes

Passive ADP: follow a fixed policy, estimate \(\hat P, \hat R\), solve via PE.
Active ADP: also pick actions — balance exploration vs exploitation (\(\varepsilon\)-greedy, softmax, optimistic init).

Model-free updates

Q-learning (off-policy): \(Q(s,a) \leftarrow Q(s,a) + \alpha\big[R + \gamma \max_{a'} Q(s', a') - Q(s,a)\big]\).
SARSA (on-policy): use the action actually taken instead of \(\max_{a'}\).
ADP converges fast but needs a model; Q-learning is slower but model-free.

Supervised + unsupervised learning

Supervised

Classification (discrete \(y\)) vs regression (continuous \(y\)).
Bias-variance: underfit (high bias) ↔ overfit (high variance).
Cross-validation: \(K\)-fold to pick the model complexity that minimizes held-out error.
ERM: minimize empirical loss \(\frac{1}{N}\sum_i \ell(\hat y^{(i)}, y^{(i)})\).

Unsupervised

k-means: alternate assign ↔ recompute centroids; pick \(k\) via elbow / silhouette.
PCA: project onto top eigenvectors of the covariance matrix; maximize variance.
Autoencoders & GANs: learn representations / generators with neural nets.

Decision trees + neural networks

Decision trees

Greedy top-down split with ID3.
Choose split that maximizes information gain \(I = H(Y) - H(Y\mid X)\).
Stop when pure / no features / no examples.
Interpretable, great for small tabular data.

Neural networks

Weighted sums + non-linear activations (sigmoid, ReLU).
Forward pass computes loss; backprop computes gradients via chain rule (\(\delta\) recursion).
Train with SGD + momentum / Adam.
Black box but unmatched for images, audio, text.

Notation cheat sheet

Symbols that recur across the four modules. Watch for the two collisions on the right.

Symbol	Meaning	First seen
\(\gamma\)	Discount factor for future rewards	L13 (MDPs)
\(\pi\), \(\pi^\star\)	Policy / optimal policy	L12–L15
\(V(s),\ Q(s,a)\)	State / state-action value functions	L13–L15
\(H(p),\ IG\)	Entropy / information gain (\(IG = H_{\text{before}} - H_{\text{after}}\))	L6 (entropy) · L18 (IG)
\(\mathcal{H}\)	Hypothesis class (ERM)	L16
\(\Sigma\)	Covariance matrix (PCA)	L17
\(\alpha\)	RL TD step size (L15); momentum coefficient (L21) — different meanings	L15 / L21
\(\varepsilon\)	Exploration probability in \(\varepsilon\)-greedy (L15); learning rate in SGD (L21) — different meanings	L15 / L21

Worth a row on your two-sided study sheet.

Final exam — logistics

What to bring

Non-programmable calculator.
One two-sided A4 study sheet (handwritten or typed).

Format (tentative)

Around 8 problems spanning all four modules.
~84 marks total; 76+ marks count as 100%.
Pass threshold: 42 marks.

Exact details will be confirmed on the course page before exam day.

Thanks for a great term — good luck on the final!