Passive ADP Agent
I have two questions about the Passive ADP Agent (Section 21.2.2 in the 3rd edition). I would be grateful if somebody could answer them:
1. Why does the agent re-evaluate the utilities of its policy after every action (that is, after every observed transition from s via a to s')? This seems unnecessary.
2. Why does this algorithm have "dynamic programming" in its name? It simply counts transition statistics and then performs policy evaluation, either by solving a system of linear equations or by iteration. So where is the dynamic programming here? Am I missing something?
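
To make both questions concrete, here is a minimal sketch of how I understand the agent. It is not the book's pseudocode: the class name, the method names, the discount `gamma`, and the number of sweeps are my own assumptions, and I use the iterative variant of policy evaluation instead of a linear solve. States that were never left are treated as terminal.

```python
from collections import defaultdict


class PassiveADPAgent:
    """Hypothetical sketch: count transitions under a fixed policy, then evaluate it."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma  # assumed discount factor
        # counts[s][s2] = number of times s -> s2 was observed under the policy
        self.counts = defaultdict(lambda: defaultdict(int))
        self.rewards = {}  # reward observed in each visited state
        self.U = {}        # current utility estimates

    def observe(self, s, r, s2, r2):
        """Record one transition s -> s2 and the rewards seen in both states."""
        self.rewards[s] = r
        self.rewards[s2] = r2
        self.counts[s][s2] += 1

    def transition_model(self, s):
        """Maximum-likelihood estimate of P(s2 | s, pi(s)) from the counts."""
        total = sum(self.counts[s].values())
        return {s2: n / total for s2, n in self.counts[s].items()}

    def evaluate_policy(self, sweeps=200):
        """Iterative policy evaluation on the learned model (in-place sweeps)."""
        U = {s: 0.0 for s in self.rewards}
        for _ in range(sweeps):
            for s in U:
                if s in self.counts:
                    model = self.transition_model(s)
                    expected = sum(p * U[s2] for s2, p in model.items())
                else:
                    expected = 0.0  # never left this state: treated as terminal
                U[s] = self.rewards[s] + self.gamma * expected
        self.U = U
        return U


# Tiny example: A goes to B three times and to T once; B goes to T once.
agent = PassiveADPAgent(gamma=0.5)
for _ in range(3):
    agent.observe("A", 0.0, "B", 0.0)
agent.observe("A", 0.0, "T", 1.0)
agent.observe("B", 0.0, "T", 1.0)
utilities = agent.evaluate_policy()
```

In this sketch, question 1 is about calling `evaluate_policy` after every single `observe`, and question 2 is about whether the sweep inside `evaluate_policy` is the part that counts as dynamic programming.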