Q-learning update equation

  • Dave Musicant
    Jul 31, 2014
      Hello folks -

      In the AIMA chapter on reinforcement learning in the 3rd edition, the
      update equation for Q-learning (equation 21.8) is given as:

      Q(s,a) <- Q(s,a) + alpha * (R(s) + [more terms irrelevant to my
      question] ...)

      My question is regarding the use of R(s) in the above update, as opposed
      to R(s'), where s' is the state that the agent ends up in after taking
      action a. Should it not be the case that the reward used to update the
      Q-value is the reward associated with the action that was taken? Is this
      a typographical error in the equation, where R(s) should really read R(s')?
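      To make the two readings concrete, here is a minimal tabular sketch. It
      assumes the usual form of the elided terms (a discount gamma times the
      max over next actions of Q(s',a'), minus Q(s,a)). The function q_update,
      the two-state example, and the values of alpha and gamma are all
      hypothetical and for illustration only. The two calls differ only in
      which state's reward enters the target.

```python
from collections import defaultdict


def q_update(Q, s, a, s_next, actions, reward, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (sketch; the gamma/max terms are assumed):
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Best estimated value from the next state; 0.0 if it has no actions.
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-state example: rewards are attached to states.
R = {"A": 0.0, "B": 1.0}

# Reading 1 (as printed): reward of the state being left, R(s).
Q_as_written = defaultdict(float)
q_update(Q_as_written, "A", "go", "B", ["go"], reward=R["A"])

# Reading 2 (proposed): reward of the state arrived in, R(s').
Q_proposed = defaultdict(float)
q_update(Q_proposed, "A", "go", "B", ["go"], reward=R["B"])

print(Q_as_written[("A", "go")], Q_proposed[("A", "go")])  # 0.0 0.1
```

      Starting from an all-zero table, the first update moves Q(A, go) only
      under the R(s') reading, which is why the choice between the two matters.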

      Thanks in advance; if it is correct as written, I'd love some help
      understanding why that's the case.

      Dave Musicant
      Professor of Computer Science
      Carleton College