Sep 9, 2006
Dear soul-mates,

I would like to comment on some issues with the Q-learning algorithm in the book.

(1)
First of all, I think there is a mistake in the pseudocode: for the update of
Q[a,s], r' should be used instead of r. Otherwise the values of the final states
will never be taken into account.
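
As a minimal sketch of the update discussed in point (1), written in Python rather than the book's pseudocode: the names q_update, GAMMA, and the state and action labels are hypothetical, the discount factor of 1.0 is an assumption, and treating a terminal successor as having zero future value is a simplification of mine, not something the book states.

```python
from collections import defaultdict

GAMMA = 1.0  # assumed discount factor; the post does not state one


def q_update(Q, s, a, reward, s_next, actions, alpha, terminal):
    """Apply one Q-learning update to the pair (a, s) and return the new value.

    `reward` is whichever reward signal is plugged in. Point (1) argues it
    should be r', the reward perceived on arriving in s_next.
    """
    # Assumption: a terminal successor contributes no future value.
    best_next = 0.0 if terminal else max(Q[(a2, s_next)] for a2 in actions)
    Q[(a, s)] += alpha * (reward + GAMMA * best_next - Q[(a, s)])
    return Q[(a, s)]


Q = defaultdict(float)
# Stepping from a hypothetical state "s" into a +1 terminal state "goal":
# passing r' = +1 lets the terminal reward reach Q[a, s].
q_update(Q, "s", "right", reward=1.0, s_next="goal",
         actions=["right"], alpha=0.5, terminal=True)
```

Passing the old r (for example -0.02) in the same call would leave Q[a,s] with no trace of the terminal reward, which is the behaviour point (1) objects to.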

(2)
If my point (1) is correct, then the static variable r is not needed in the
code.

(3)
Nsa should be initialized to 0 for all state-action pairs.

(4)
If more than one action a' attains the maximum, max(a') should choose randomly
among them.
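
The tie-breaking in point (4) could look roughly like this in Python. The helper name random_argmax and the example action values are hypothetical, and uniform sampling among the tied actions is my assumption about what "randomly" means here.

```python
import random


def random_argmax(actions, value, rng=random):
    """Return an action maximizing value(action); ties are broken uniformly."""
    actions = list(actions)
    best = max(value(a) for a in actions)
    # Collect every action that reaches the maximum, then pick one at random.
    candidates = [a for a in actions if value(a) == best]
    return rng.choice(candidates)


# Hypothetical action values with a tie between "up" and "down".
q_values = {"up": 1.0, "down": 1.0, "left": 0.0}
chosen = random_argmax(q_values, q_values.get)
```

A plain max over the actions would always return the first maximizing action, which biases the agent towards whatever order the actions happen to be listed in.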

(5)
f(q,n) must return a value even when q is null, i.e. when the agent has no
estimate yet of the value of Q(a', s').
In the book it works because f(q,n) returns 2 for the first 5 iterations,
regardless of the value of Q(a', s') (explained on page 774 of the
International Edition).
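
A small Python sketch of an exploration function that tolerates a missing q, as point (5) asks. The constants mirror the values quoted in this post (2 and 5); the names R_PLUS and N_E and the fallback of 0.0 for an unknown q after the exploration phase are assumptions of mine, not taken from the book.

```python
R_PLUS = 2  # optimistic value returned while an action is under-explored
N_E = 5     # number of tries below which the optimistic value is used


def f(q, n):
    """Exploration function: be optimistic until the action was tried N_E times."""
    if n < N_E:
        # q is ignored here, so a null q is harmless in this branch.
        return R_PLUS
    # Assumed fallback: treat a still-unknown q as 0.0 instead of failing.
    return q if q is not None else 0.0
```

With these definitions f(None, 0) gives 2 and f(0.7, 5) gives 0.7, so the agent never has to compare a null against a number.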

(6)
I have been playing with the Q-learning algorithm (after the modification
described in (1)) and the simple MDP world example. I have tested it with two
different parameter sets. The first is the one described in the book:
- reward of non-terminal states: -0.02 [r]
- the value 2 [rp] is assigned to actions tried fewer than 5 [en] times
- the learning rate [rl] is
60 / ((60 - 1) + iteration)
i.e. the parameter rl is 60 here
- the number of trials is 2000 [tn]
The second set uses the values:
lr = 5     // learning rate
en = 100   // exploration number
rp = 2     // value of unknown states
tn = 300   // number of trials
r = -0.05  // reward of non-terminal states
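
For reference, the learning-rate schedule from the first parameter set can be sketched in Python as below. I read the expression as rl / ((rl - 1) + iteration), which is an assumption about the intended grouping, and I also assume the second set plugs its value 5 into the same formula. The function name learning_rate is hypothetical.

```python
def learning_rate(n, rl=60):
    """Decaying learning rate rl / ((rl - 1) + n), for n = 1, 2, 3, ..."""
    return rl / ((rl - 1) + n)


# First set (rl = 60) decays slowly; the assumed second set (rl = 5) decays fast.
book_rates = [learning_rate(n) for n in (1, 61)]
fast_rates = [learning_rate(n, rl=5) for n in (1, 6)]
```

Under this reading both schedules start at 1.0 for n = 1, and a smaller parameter makes the rate fall off much earlier.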

I have computed the Q-values of the four actions in state (3,3) at each
iteration. With the first set, the values are wrong at the end in most of the
experiments (see attached graph1.png for the values of a representative
experiment).
With the second set, the values are correct in all the experiments I have
performed so far (see attached graph2.png for an example).

Regards,
--
Ivan F. Villanueva B.
A.I. library: http://www.artificialidea.com
<<< The European Patent Litigation Agreement (EPLA) >>>
<<< will bring Software patents by the backdoor >>>
<<< http://www.no-lobbyists-as-such.com/florian-mueller-blog/epla/ >>>
<<< http://wiki.ffii.de/EplaEn >>>