Course page
CS 486/686
Course Recap

Lecture 22

Putting the term together: search, uncertainty, decisions, learning

The course in one picture

1. Search Problem formulation DFS / BFS / IDS LCFS, GBFS, A* Admissibility, consistency CSPs Backtracking, AC-3 Local search Hill-climbing, SA 2. Uncertainty Probability rules Independence Bayesian networks D-separation Variable Elimination HMMs Filtering / smoothing Viterbi 3. Decisions Utility, MEU Decision networks VOI MDPs Bellman equation Value / Policy iteration Reinforcement learning Q-learning, SARSA 4. Learning Supervised learning Bias / variance Cross-validation Decision trees, ID3 Unsupervised: PCA, k-means Neural networks Backprop SGD, momentum, Adam deterministic world → uncertainty → decisions under uncertainty → learning from data

Search I — uninformed

Search II — heuristic + A*

Search strategy summary

Strategy Frontier choice Complete? Space Time
Depth-firstLast addedNoLinearExp
Breadth-firstFirst addedYesExpExp
Lowest-cost firstmin \(\text{cost}(n)\)YesExpExp
Greedy best-firstmin \(h(n)\)NoExpExp
A*min \(\text{cost}(n) + h(n)\)YesExpExp

A* dominates when the heuristic is informative enough to prune most of the tree.

CSPs + local search

Probability + independence

The four rules

  • Product: \(P(A,B) = P(A\mid B)\,P(B)\)
  • Sum: \(P(A) = \sum_b P(A, B{=}b)\)
  • Chain: \(P(X_1,\ldots,X_n) = \prod_i P(X_i \mid X_{<i})\)
  • Bayes: \(P(H\mid e) \propto P(e\mid H)\,P(H)\)

Independence

  • Unconditional: \(P(A,B) = P(A)\,P(B)\).
  • Conditional: \(P(A,B\mid C) = P(A\mid C)\,P(B\mid C)\).
  • Tells us which terms drop out of the chain rule — the key to scaling up inference.

Bayesian networks + d-separation

Construction recipe

  • Pick a variable order (causal often works best).
  • Add each node; choose the smallest set of parents that makes it conditionally independent of the rest.
  • Joint factorizes as \(P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathrm{pa}(X_i))\).

3 d-separation patterns

  • Chain \(A \to B \to C\): blocked by observing \(B\).
  • Common cause \(A \leftarrow B \to C\): blocked by observing \(B\).
  • Common effect \(A \to B \leftarrow C\): blocked by not observing \(B\) or descendants.
A path is blocked ⇒ the endpoints are conditionally independent.

Inference — VE & HMM filtering

Variable elimination

  • Define a factor for each CPT.
  • Restrict factors to evidence.
  • Multiply factors sharing a variable; sum out the hidden one.
  • Normalize to get a posterior.

HMM filtering

  • Recursion: \(P(S_t \mid o_{0:t}) \propto P(o_t \mid S_t)\, \sum_{s_{t-1}} P(S_t \mid s_{t-1})\, P(s_{t-1} \mid o_{0:t-1})\)
  • One forward pass: \(O(t \cdot |S|^2)\) time.

HMM decoding — smoothing & Viterbi

Forward×backward smoothing

  • Forward: \(\alpha_k = P(S_k \mid o_{0:k})\).
  • Backward: \(\beta_k = P(o_{k+1:t-1} \mid S_k)\).
  • Posterior: \(P(S_k \mid o_{0:t-1}) \propto \alpha_k\, \beta_k\).

Viterbi

  • DP over the trellis, max instead of sum.
  • Backtrack the argmax pointers to recover the MAP sequence.
MAP path through the trellis s\(_1\) s\(_2\) s\(_1\) s\(_2\) s\(_1\) s\(_2\) s\(_1\) s\(_2\) t=1 t=2 t=3 t=4

Decision networks

Three node types

  • Chance nodes — like a Bayes net (CPTs).
  • Decision nodes — rectangles, no CPT, chosen by us.
  • Utility node — diamond, deterministic function of its parents.

Maximum expected utility

  • Policy \(\pi(d \mid \text{parents})\); MEU picks \(\pi^*\) maximizing \(\mathbb{E}[U]\).
  • Evaluate by enumeration or by running VE on the DN.
  • Value of information: EU gain from observing a variable before deciding.

Markov decision processes

Setup

  • States, actions, transition \(P(s' \mid s, a)\), reward \(R(s)\), discount \(\gamma\).
  • Policy \(\pi : S \to A\); value \(V^\pi(s) = \mathbb{E}[\sum_t \gamma^t R(s_t) \mid \pi]\).

Bellman + the two algorithms

  • Bellman optimality: \(V^*(s) = R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a)\, V^*(s')\).
  • Value iteration: apply the Bellman update until convergence.
  • Policy iteration: evaluate (linear system) ↔ improve (greedy) — converges fast.

Reinforcement learning

Two learning regimes

  • Passive ADP: follow a fixed policy, estimate \(\hat P, \hat R\), solve via PE.
  • Active ADP: also pick actions — balance exploration vs exploitation (\(\varepsilon\)-greedy, softmax, optimistic init).

Model-free updates

  • Q-learning (off-policy): \(Q(s,a) \leftarrow Q(s,a) + \alpha\big[R + \gamma \max_{a'} Q(s', a') - Q(s,a)\big]\).
  • SARSA (on-policy): use the action actually taken instead of \(\max_{a'}\).
  • ADP converges fast but needs a model; Q-learning is slower but model-free.

Supervised + unsupervised learning

Supervised

  • Classification (discrete \(y\)) vs regression (continuous \(y\)).
  • Bias-variance: underfit (high bias) ↔ overfit (high variance).
  • Cross-validation: \(K\)-fold to pick the model complexity that minimizes held-out error.
  • ERM: minimize empirical loss \(\frac{1}{N}\sum_i \ell(\hat y^{(i)}, y^{(i)})\).

Unsupervised

  • k-means: alternate assign ↔ recompute centroids; pick \(k\) via elbow / silhouette.
  • PCA: project onto top eigenvectors of the covariance matrix; maximize variance.
  • Autoencoders & GANs: learn representations / generators with neural nets.

Decision trees + neural networks

Decision trees

  • Greedy top-down split with ID3.
  • Choose split that maximizes information gain \(I = H(Y) - H(Y\mid X)\).
  • Stop when pure / no features / no examples.
  • Interpretable, great for small tabular data.

Neural networks

  • Weighted sums + non-linear activations (sigmoid, ReLU).
  • Forward pass computes loss; backprop computes gradients via chain rule (\(\delta\) recursion).
  • Train with SGD + momentum / Adam.
  • Black box but unmatched for images, audio, text.

Notation cheat sheet

Symbols that recur across the four modules. Watch for the two collisions on the right.

SymbolMeaningFirst seen
\(\gamma\)Discount factor for future rewardsL13 (MDPs)
\(\pi\), \(\pi^\star\)Policy / optimal policyL12–L15
\(V(s),\ Q(s,a)\)State / state-action value functionsL13–L15
\(H(p),\ IG\)Entropy / information gain (\(IG = H_{\text{before}} - H_{\text{after}}\))L6 (entropy) · L18 (IG)
\(\mathcal{H}\)Hypothesis class (ERM)L16
\(\Sigma\)Covariance matrix (PCA)L17
\(\alpha\)RL TD step size (L15); momentum coefficient (L21) — different meaningsL15 / L21
\(\varepsilon\)Exploration probability in \(\varepsilon\)-greedy (L15); learning rate in SGD (L21) — different meaningsL15 / L21

Worth a row on your two-sided study sheet.

Final exam — logistics

What to bring

  • Non-programmable calculator.
  • One two-sided A4 study sheet (handwritten or typed).

Format (tentative)

  • Around 8 problems spanning all four modules.
  • ~84 marks total; 76+ marks count as 100%.
  • Pass threshold: 42 marks.
Exact details will be confirmed on the course page before exam day.

Thanks for a great term — good luck on the final!