TD-Update Reward calculation
- In AIMA (1995 edition)
Figure 20.6 makes use of REWARD; however, I don't see how this table
is calculated or updated when Figure 20.6 is used within Figure 20.2.
Additionally, in Figure 20.6, is the U (utility) of every terminal
state simply its reward? The RUNNING-AVERAGE calculation does not
appear to change the estimate at all if the reward stays the same, and
I don't see how the reward could ever change.
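To make the concern concrete, here is a minimal Python sketch. It assumes that RUNNING-AVERAGE(U, r, n) returns the mean of the n rewards observed so far, updated incrementally. The function name `running_average` and the reward values are hypothetical illustrations, not taken from the book.

```python
def running_average(current: float, new_value: float, n: int) -> float:
    """Incremental mean (assumed semantics of RUNNING-AVERAGE):
    given the mean of the first n-1 observations, return the mean of n."""
    return current + (new_value - current) / n

# Hypothetical terminal state that always yields the same reward (+1).
u = 0.0  # initial utility estimate (assumed)
constant_history = []
for n in range(1, 6):
    u = running_average(u, 1.0, n)
    constant_history.append(u)
print(constant_history)  # the estimate equals the reward after the first visit

# If the reward observed at that state did vary, the estimate would move.
u = 0.0
for n, r in enumerate([1.0, -1.0], start=1):
    u = running_average(u, r, n)
mixed = u
print(mixed)
```

Under this assumption, a terminal state with a fixed reward ends up with U equal to that reward after its first visit, and later updates leave it unchanged.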
Any clarifications would be appreciated.