Loading ...
Sorry, an error occurred while loading the content.

alpha-beta code in chapter 5, p132

Expand Messages
  • Wheeler Ruml
    (First off, I d like to say that I love this textbook and use it all the time, for teaching and to get up to research on topics outside my specialty. Hurrah
    Message 1 of 2 , Oct 4, 2001
    View Source
    • 0 Attachment
      (First off, I'd like to say that I love this textbook and use it all
      the time, for teaching and to get up to research on topics outside my
      specialty. Hurrah for Stuart and Peter!)

      I'm not sure that this counts as a bug, but I think that it's
      confusing that the code in figure 5.8 doesn't actually pass up minimax
      values. In some cases, alpha and beta values are passed back up the
      tree, creating some confusion. Take the following example:

      n1
      / \
      / \
      n2 n3
      | / \
      0 / \
      n4 ...
      /
      /
      n5
      |
      -3

      Node n1 is a max node. We init alpha to -inf and beta to inf. At
      node n2, this game reaches a terminal state with evaluation = 0. This
      value is passed up to n1, where it is used to update alpha (n1 being a
      max node) to 0. A call on n3 results in a call on n4 results in a
      call on n5. This is a terminal state, so its value of -3 is passed
      back to n4. n4 is a max node which had inherited an alpha of 0 and a
      beta of inf. According to the code in the book, the alpha value at n4
      is updated to the max of the current value 0 and the child value -3.
      Alpha remains 0. Furthermore, alpha is passed back on exit from n4.
      At n3 (which had been called with alpha = 0 and beta = inf), we are
      updating beta to the min of its current value inf and its child's
      value, which is 0. So beta becomes 0, which is = to alpha, so we
      prune away any other children on n3 and return 0 to the call from n1.
      Note that both n2 and n3 now have value 0 from the perspective of n1!
      I consider this a bug, since n3 should have a minimax value of -3 or
      less. We pruned correctly, but the wrong value was passed back. The
      strategy taken by the code in the book only works if we break ties at
      the top level by preferring the leftmost action involved in the tie.
      The solution is to be careful to always pass back the minimax value,
      and not alpha or beta. See the revised pseudo-code below:


      Max-value (state, alpha, beta):
      when depth-cutoff (state), return SEF(state)
      value <-- -inf
      for each child of state
      value <-- max(value, Min-value (child, alpha, beta)
      when value > or = beta, return value
      return value


      Min-value (state, alpha, beta):
      when depth-cutoff (state), return SEF(state)
      value <-- inf
      for each child of state
      value <-- min(value, Max-value (child, alpha, beta)
      when value < or = alpha, return value
      return value


      The code in the book seems to be sticking close to the Knuth and Moore
      1975 AIJ paper, computing the tree in K&M's fig 3. I haven't used a
      fine-toothed comb, but I don't see K&M mentioning the tie issue
      either. It only comes up when one programs an actual player, but I
      would like to suggest that the 2nd edition make this a little more
      explicit, if not adopt the revised pseudo-code above.

      (Credit goes to Courtney O'Keefe and Jeff Shneidman who brought this
      case to my attention.)

      Thanks again for a wonderful book!


      Wheeler Ruml
      Instructor for CS 182, Harvard University
      --
      Wheeler Ruml, http://www.eecs.harvard.edu/~ruml/, 617-495-2081 voice
      ruml@..., Maxwell Dworkin Lab 219 617-496-1066 fax
    • Wheeler Ruml
      Whoops - the pseudo-code I posted didn t update alpha or beta at all! Sorry about that. Revised: Max-value (state, alpha, beta): when depth-cutoff (state),
      Message 2 of 2 , Oct 5, 2001
      View Source
      • 0 Attachment
        Whoops - the pseudo-code I posted didn't update alpha or beta at
        all! Sorry about that. Revised:


        Max-value (state, alpha, beta):
        when depth-cutoff (state), return SEF(state)
        value <-- -inf
        for each child of state
        value <-- max(value, Min-value (child, alpha, beta)
        alpha <-- max(value,alpha) ;; this is the new line!
        when value > or = beta, return value
        return value


        Min-value (state, alpha, beta):
        when depth-cutoff (state), return SEF(state)
        value <-- inf
        for each child of state
        value <-- min(value, Max-value (child, alpha, beta)
        beta <-- min(value, beta) ;; this is the new line!
        when value < or = alpha, return value
        return value


        Even though this approach prunes a little less than the full
        alpha-beta code given in the book, it is much easier to explain
        (which is why it seems to be the algorithm the book is currently
        trying to explain).


        Wheeler
      Your message has been successfully submitted and would be delivered to recipients shortly.