Day 34 - Return to Standford AI/ML Class | The Learning AI and ML Journey for a Developer

(Stanford)

Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)
Game Playing 1 - Minimax, Alpha-beta Pruning
- Talking about Two-player, Zero-Sum games (Turn-taking games - Chess, Checkers, different players in control, one at a time)
  - Players = {agent, opp} (chess example: white, black)
  - S_start (chess example: board setup)
  - Actions(s) : possible actions from state s (chess example: <haven’t discussed yet>)
  - Suc(s,a): resulting state if choose action a in state s
  - IsEnd(s): whether is an end state (game over)
  - Utility(s): agents’ utility for end state S_end (utility needs to add up to zero)
    - You only get the utility at the end of the game
- Policies
- Deterministic policies or Stochastic Policies (random)
- Expectiminimax recurrence (we add a player => Players = {agent, opp, coin}
- Primitives: max notes, chance nodes, min nodes
- Composition: alternate notes according to model of game
- Value function V…(s): recurrence for expected utility
- Scenarios to think about
  - What if you are playing against multiple opponents?
  - What if you and your partner have to take turns (table tennis)?
  - Some actions allow you to take an extra turn?
- Computation complexity
  - Approach: Tree Search
  - Complexity: branching factor b, dept d (2d – agent plays, then opponent plays)
    - O(d) space (memory), and O(b^2d)
    - For Chess, b ~= 35, d~=50 which is close to the number of atoms in the universe
    - VERY BAD, SO….HOW DO WE SPEED IT UP?? Minimax
- Speeding up Minimax
  - Evaluation Functions - use domain-specific knowledge, compute approximate answer
    - What if we limit the depth? And lean on domain specific knowledge
    - Chess ex: material worth, mobility, king-safety, center-control
  - Alpha-beta pruning - general-purpose, compute exact answer
  - Drop sub-tree nodes when you can see that the nodes at the current level don’t overlap
- Alpha-beta Pruning