(Stanford)

  • Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)
  • Game Playing 1 - Minimax, Alpha-beta Pruning
    • Talking about Two-player, Zero-Sum games (Turn-taking games - Chess, Checkers, different players in control, one at a time)
      • Players = {agent, opp} (chess example: white, black)
      • Sstart (chess example: board setup)
      • Actions(s) : possible actions from state s (chess example: <haven’t discussed yet>)
      • Suc(s,a): resulting state if choose action a in state s
      • IsEnd(s): whether is an end state (game over)
      • Utility(s): agents’ utility for end state Send (utility needs to add up to zero)
        • You only get the utility at the end of the game
    • Policies
    • Deterministic policies or Stochastic Policies (random)
    • Expectiminimax recurrence (we add a player => Players = {agent, opp, coin}
    • Primitives: max notes, chance nodes, min nodes
    • Composition: alternate notes according to model of game
    • Value function V…(s): recurrence for expected utility
    • Scenarios to think about
      • What if you are playing against multiple opponents?
      • What if you and your partner have to take turns (table tennis)?
      • Some actions allow you to take an extra turn?
    • Computation complexity
      • Approach: Tree Search
      • Complexity: branching factor b, dept d (2d – agent plays, then opponent plays)
        • O(d) space (memory), and O(b2d)
        • For Chess, b ~= 35, d~=50 which is close to the number of atoms in the universe
        • VERY BAD, SO….HOW DO WE SPEED IT UP?? Minimax
    • Speeding up Minimax
      • Evaluation Functions - use domain-specific knowledge, compute approximate answer
        • What if we limit the depth? And lean on domain specific knowledge
        • Chess ex: material worth, mobility, king-safety, center-control
      • Alpha-beta pruning - general-purpose, compute exact answer
      • Drop sub-tree nodes when you can see that the nodes at the current level don’t overlap
    • Alpha-beta Pruning