I'm trying to implement the LRTA* (Learning Real-Time A*) agent on
page 128 of the international edition.
Following the iteration in Figure 4.22, I think there is a mistake:
in line (b), when the agent has moved from the first state to the
second, the updated value of the first state should be 2, not 3. The
reason is that from that state there are two possible moves, left and
right, and the agent knows the outcome of the move to the right but
not of the move to the left; therefore, according to the LRTA*-Cost
function, its updated value should be just its heuristic, not its
heuristic + 1 as printed in the book.
Furthermore, if I'm right, the algorithm won't work, because it could
loop indefinitely between the two states with heuristic value 2.
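To make my reading concrete, here is a small Python sketch (all names
and the unit step costs are my own; the two heuristic values of 2 are
from the figure as I read it) of how I understand LRTA*-Cost and the
update of the first state's value:

```python
def lrta_cost(s, a, s2, H, h, c):
    """LRTA*-Cost as I read it: if the outcome of action a in s is
    still unknown, return the plain heuristic h(s); otherwise return
    the step cost plus the learned estimate of the successor."""
    if s2 is None:                  # action a has never been tried from s
        return h(s)
    return c(s, a, s2) + H[s2]

# Tiny two-state example: state 0 has actions Left (outcome unknown
# so far) and Right (known to lead to state 1).
h = {0: 2, 1: 2}.get                # heuristic values of 2, as in my reading
H = {0: 2, 1: 2}                    # current learned estimates
result = {(0, 'Right'): 1, (0, 'Left'): None}
c = lambda s, a, s2: 1              # unit step costs (my assumption)

# Update of H[0] after the agent moves from state 0 to state 1:
H[0] = min(lrta_cost(0, a, result[(0, a)], H, h, c)
           for a in ('Left', 'Right'))
print(H[0])   # I get 2 (from the untried Left move); the book prints 3
```

Under this reading the minimum is taken over Left (cost h(0) = 2) and
Right (cost 1 + H[1] = 3), which is why I arrive at 2 rather than 3.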
I think one possible solution is to change the line:

a <-- an action b in ACTIONS(ss) ...

to:

a <-- a not yet tried action, or, if all possible actions have been
tried before, an action b in ACTIONS(ss) ...
Am I right?
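For clarity, the action-selection change I'm proposing would look
something like this in Python (a sketch under my own naming; the
`actions`, `result`, `H`, and `c` structures are assumptions, not the
book's code):

```python
def choose_action(s2, actions, result, H, c):
    """Proposed selection rule: prefer a not-yet-tried action; only
    when every action from s2 has been tried, fall back to the
    cost-minimizing action as in the original pseudocode."""
    untried = [b for b in actions(s2) if result.get((s2, b)) is None]
    if untried:
        return untried[0]
    # All outcomes known: pick the action minimizing c(s2,b,s') + H[s'].
    return min(actions(s2),
               key=lambda b: c(s2, b, result[(s2, b)]) + H[result[(s2, b)]])

# Small demo: Left has never been tried from state 0, so it is chosen.
actions = lambda s: ['Left', 'Right']
H = {1: 2, 2: 3}
result = {(0, 'Right'): 1}          # (0, 'Left') missing = untried
c = lambda s, a, s2: 1
print(choose_action(0, actions, result, H, c))   # 'Left'
```

The intent is that forcing exploration of untried actions first would
rule out the two-state loop I described above.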