Loading ...
Sorry, an error occurred while loading the content.

POLICY-ITERATION (chapter 17)

Expand Messages
  • Ivan F. Villanueva B.
    Hello mates, I ve been testing my java code for the algorithm POLICY-ITERATION (chapter 17). It solves the Simplified Bellman Equations (SBE) for the MDP
    Message 1 of 1 , Aug 13 7:09 AM
    • 0 Attachment
      Hello mates,
      I've been testing my java code for the algorithm POLICY-ITERATION (chapter 17).
      It solves the Simplified Bellman Equations (SBE) for the MDP example of the
      book. However, because the first policy is a random one, there are situations
      where the SBE are not solvable. I wonder if it is a known issue and if there is
      an easy workaround for it. An example follows.

      The discount is 1. The random first policy is:

      {41=NORTH, 32=NORTH, 23=WEST, 21=NORTH, 12=NORTH, 31=SOUTH, 13=WEST, 33=EAST, 11=SOUTH}

      Then the linear equations (Bellman Equations) in Matrix form are:

      4,1 3,2 2,3 2,1 1,2 3,1 1,3 3,3 1,1
      + -------------------------------------------------------------------------------
      |
      4,1 | -9/10 0 0 0 0 1/10 0 0 0 21/25
      2,3 | 0 0 -4/5 0 0 0 4/5 0 0 1/25
      3,2 | 0 -9/10 0 0 0 0 0 4/5 0 7/50
      1,2 | 0 0 0 0 -4/5 0 4/5 0 0 1/25
      2,1 | 0 0 0 -1/5 0 1/10 0 0 1/10 1/25
      1,3 | 0 0 0 0 1/10 0 -1/10 0 0 1/25
      3,1 | 1/10 0 0 1/10 0 -1/5 0 0 0 1/25
      3,3 | 0 1/10 0 0 0 0 0 -9/10 0 -19/25
      1,1 | 0 0 0 1/10 0 0 0 0 -1/10 1/25


      The next code in Maxima (a free mathematical software) shows that the linear
      system is not solvable (because the ranks are different):

      ----------------- Maxima --------------------------

      a:matrix([-9/10,0,0,0,0,1/10,0,0,0], [0,0,-4/5,0,0,0,4/5,0,0], [0,-9/10,0,0,0,0,0,4/5,0], [0,0,0,0,-4/5,0,4/5,0,0], [0,0,0,-1/5,0,1/10,0,0,1/10], [0,0,0,0,1/10,0,-1/10,0,0], [1/10,0,0,1/10,0,-1/5,0,0,0], [0,1/10,0,0,0,0,0,-9/10,0], [0,0,0,1/10,0,0,0,0,-1/10]);

      /* returns 8: */
      rank(a);

      b:[21/25,1/25,7/50,1/25,1/25,1/25,1/25,-19/25,1/25];

      ae:addcol(a,b)$

      /* returns 9: */
      rank(ae);

      -----------------------------------------------------

      --
      Ivan F. Villanueva B.
      A.I. library: http://www.artificialidea.com
      <<< The European Patent Litigation Agreement (EPLA) >>>
      <<< will bring Software patents by the backdoor >>>
      <<< http://www.no-lobbyists-as-such.com/florian-mueller-blog/epla/ >>>
      <<< http://wiki.ffii.de/EplaEn >>>
    Your message has been successfully submitted and would be delivered to recipients shortly.