TD Update propagating rewards

  • flamesplash <flamesplash@yahoo.com>
    Message 1 of 1 , Jan 10, 2003
      In fig 20.6, I don't see how, in a case like the game of tic-tac-toe
      where rewards are only ever gained at the terminal state, the
      rewards are ever propagated back from the terminal nodes.

      In the TD-Update, when you hit a terminal state all you do is update the
      running average. I don't ever see a case where the terminal state and
      its successor are used together to modify the successor's utility, so I
      don't see how any of the utilities actually get modified.

      Any clarification would be appreciated.
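
To make the question concrete, here is a minimal Python sketch of a generic TD(0)-style utility update on a three-state trial. It is not the code from fig 20.6. The state names, the `run_trial` helper, the learning rate `alpha = 0.5`, and the absence of discounting are all assumptions made for illustration. In this variant the terminal state's utility is a running average of observed rewards, and each state's utility is also nudged toward the utility of the state observed right after it, which is one way a terminal reward could reach earlier states.

```python
from collections import defaultdict


def run_trial(U, N, trial, alpha=0.5):
    """Apply a TD(0)-style update along one trial.

    `trial` is a list of (state, reward) pairs; the last pair is terminal.
    Hypothetical helper for illustration, not the book's TD-UPDATE.
    """
    prev, prev_reward = None, 0.0
    for i, (state, reward) in enumerate(trial):
        N[state] += 1
        if i == len(trial) - 1:
            # Terminal state: keep a running average of the rewards seen here.
            U[state] += (reward - U[state]) / N[state]
        if prev is not None:
            # Move the previous state's utility toward its own reward plus
            # the utility of the state that followed it (no discounting).
            U[prev] += alpha * (prev_reward + U[state] - U[prev])
        prev, prev_reward = state, reward


U = defaultdict(float)  # utility estimates, all start at 0
N = defaultdict(int)    # visit counts

# Reward is 0 everywhere except the terminal "win" state.
trial = [("a", 0.0), ("b", 0.0), ("win", 1.0)]
for _ in range(2):
    run_trial(U, N, trial)

print(dict(U))
```

Under these assumptions, the first trial only changes the utilities of `win` and `b`; the second trial is the first time `a` moves, because `b` now carries a nonzero estimate. Whether fig 20.6 performs the second update on reaching a terminal state is exactly what the question asks.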