TD update: propagating rewards
- In fig 20.6 I don't see how, in a case like the game of tic-tac-toe
where rewards are only ever gained at the terminal state, the
rewards are ever propagated back from the terminal nodes.
In the TD-Update, when you hit a terminal state all you do is update the
running average. I don't ever see a case where the terminal state and
its successor are used together to modify the successor's utility, so I
don't see how any of the utilities actually get modified.
Any clarification would be appreciated.
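
To make the question concrete, here is a minimal sketch of the usual TD(0) rule on a three-state chain where only the terminal state gives a reward. It is not a transcription of fig 20.6: the function name td_episode, the states "a", "b", "c", the fixed learning rate alpha = 0.5 and the absence of discounting are all assumptions chosen for illustration. It shows the step the question is about, namely the (predecessor, terminal) pair being used to update the predecessor's utility.

```python
from collections import defaultdict


def td_episode(U, N, path, terminal_reward, alpha=0.5):
    """Walk one episode along `path` (hypothetical helper).

    Assumption: every non-terminal state has reward 0, only the last
    state in `path` yields `terminal_reward`, and there is no discount.
    """
    last = len(path) - 1
    for i, state in enumerate(path):
        if i == last:
            # Terminal state: running average of the rewards seen there.
            N[state] += 1
            U[state] += (terminal_reward - U[state]) / N[state]
        if i > 0:
            # Pull the previous state's utility toward the current one.
            # When `state` is terminal, this is where the reward starts
            # moving backwards.
            prev = path[i - 1]
            U[prev] += alpha * (0.0 + U[state] - U[prev])
    return U


U = defaultdict(float)  # utility estimates, all start at 0
N = defaultdict(int)    # visit counts for terminal states

td_episode(U, N, ["a", "b", "c"], terminal_reward=1.0)
print(dict(U))  # after one episode only "b" and "c" have moved

td_episode(U, N, ["a", "b", "c"], terminal_reward=1.0)
print(dict(U))  # after a second episode "a" has moved too
```

Under these assumptions the reward moves back one state per episode: the terminal state's utility is set first, its predecessor is updated toward it in the same episode, and earlier states pick up the value on later episodes.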