Q-learning update equation
- Jul 31, 2014
Hello folks -
In the AIMA chapter on reinforcement learning in the 3rd edition, the
update equation for Q-learning (equation 21.8) is given as:
Q(s,a) <- Q(s,a) + alpha * (R(s) + [more terms irrelevant to my question])
My question is regarding the use of R(s) in the above update, as opposed
to R(s'), where s' is the state that the agent ends up in after taking
action a. Should it not be the case that the reward used to update the
Q-value is the reward associated with the action that was taken? Is this
a typographical error in the equation, where R(s) should really read R(s')?
Thanks for the info; if it is correct as written, I'd love some help in
understanding why that's the case.
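
To make the two readings concrete, here is a minimal Python sketch of a single tabular update done both ways. It is only an illustration, not the book's code: the two-state world, the reward table `R`, the `q_update` helper, and the `alpha`/`gamma` values are all made up for this example, and the discounted max over next actions is my assumption about the terms elided above, taken from the standard form of Q-learning.

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (standard form, assumed here):
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical per-state rewards for a two-state world.
R = {"A": -1.0, "B": 10.0}
actions = ["go"]

# The agent is in state A, takes "go", and ends up in state B.
s, a, s_next = "A", "go", "B"

Q_leaving = defaultdict(float)   # reading 1: use R(s), the state being left
Q_entering = defaultdict(float)  # reading 2: use R(s'), the state being entered

q_update(Q_leaving, s, a, R[s], s_next, actions)
q_update(Q_entering, s, a, R[s_next], s_next, actions)

print(Q_leaving[(s, a)], Q_entering[(s, a)])
```

The only difference between the two calls is which reward is passed in, so the sketch isolates exactly the R(s) versus R(s') choice I am asking about; it does not by itself say which one the book intends.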
Professor of Computer Science