I've read through the German edition of AIMA, so please excuse me if I don't always use correct English.
Chapter 17 describes how to calculate utilities for a process with non-deterministic, but defined states.
I'm currently working on an agent which can be described as follows:
- The agent works in a fully observable environment
- The actions have a probabilistic outcome
The agent has to decide at regular intervals (e.g. every minute) whether or not to buy a certain good. The good
the agent has to buy is consumed by a process. The amount consumed is given by a trend function overlaid with
normally distributed noise (this means that the mean of the usage is always known, but the actual usage is
uncertain; the uncertainty is given by the variance of the overlaid normal distribution).
The price of the good also follows a trend function overlaid with normally distributed noise. If the agent
buys goods which are not used just in time, there are costs for the warehouse.
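To make the setup concrete, here is a minimal sketch of one time step of this model. Everything numeric in it is an assumption for illustration only: the linear `trend`, the noise levels, `holding_cost` and `penalty` are made-up names and values, since the real trend functions and costs are not given here.

```python
import random

def trend(t, base, slope):
    # Hypothetical linear trend; the real trend function is not specified.
    return base + slope * t

def sample_usage(t, rng, sigma=2.0):
    # Usage = trend + normally distributed noise, clipped at zero.
    return max(0.0, rng.gauss(trend(t, 10.0, 0.05), sigma))

def sample_price(t, rng, sigma=0.5):
    # Price = trend + normally distributed noise, clipped at zero.
    return max(0.0, rng.gauss(trend(t, 5.0, 0.01), sigma))

def step(stock, buy_amount, t, rng, holding_cost=0.1, penalty=100.0):
    """Simulate one time step and return (new_stock, cost_of_this_step)."""
    price = sample_price(t, rng)
    usage = sample_usage(t, rng)
    stock += buy_amount
    cost = price * buy_amount
    shortfall = max(0.0, usage - stock)   # demand that could not be served
    stock = max(0.0, stock - usage)
    cost += holding_cost * stock          # warehouse cost for leftover goods
    if shortfall > 0:
        cost += penalty                   # high penalty for an unmet need
    return stock, cost
```

Calling `step(stock, buy_amount, t, random.Random(0))` repeatedly with different `buy_amount` values gives sampled costs, and the negative cost can serve as the reward of a state.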
The agent's objective is to ensure that the consuming process always has enough goods and to buy the goods
as cheaply as possible. If the agent fails to satisfy the process's need, there is a high penalty. The agent can
only buy a limited amount of the good at a time (the limit is given by the current state of the warehouse; its
current value is known to the agent, but its future values are uncertain; there is a known maximum of goods the agent can buy at a
My approach is to build an infinite binary decision tree (actions: buy or not buy). The states are uncertain
and described by the probabilistic functions stated above.
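A depth-limited version of such a tree could look like the sketch below, where every action node is followed by a chance node that averages over the possible usage outcomes. This is only an illustration under strong assumptions: the normal distribution is replaced by three made-up discrete outcomes in `USAGE_OUTCOMES`, the price is fixed, and `LOT`, `HOLDING` and `PENALTY` are hypothetical values.

```python
from functools import lru_cache

# All values below are illustrative assumptions, not part of the real problem.
USAGE_OUTCOMES = ((1, 0.25), (2, 0.5), (3, 0.25))  # (units used, probability)
PRICE = 5.0      # fixed price per unit
LOT = 2          # units bought by one "buy" action
HOLDING = 0.1    # warehouse cost per unit left over after a step
PENALTY = 100.0  # cost when the process cannot be served

@lru_cache(maxsize=None)
def expected_cost(stock, depth):
    """Return (minimal expected cost, best first action) for `depth` steps."""
    if depth == 0:
        return 0.0, None
    best = None
    for buy in (0, 1):                       # decision node: wait or buy
        have = stock + buy * LOT
        total = buy * PRICE * LOT
        for used, p in USAGE_OUTCOMES:       # chance node: weight by probability
            left = max(have - used, 0)
            step_cost = HOLDING * left + (PENALTY if used > have else 0.0)
            future, _ = expected_cost(left, depth - 1)
            total += p * (step_cost + future)
        if best is None or total < best[0]:
            best = (total, "buy" if buy else "wait")
    return best
```

For example, `expected_cost(0, 3)` returns the expected cost and the first action for a three-step horizon starting with an empty warehouse. Minimizing expected cost here is the same as maximizing expected utility with utility defined as negative cost.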
I think the utilities for every sequence of actions can be calculated using the Bellman equations. However, I'm unsure how to use these equations with probabilistic outcomes.
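For reference, the Bellman equation for a discrete state set, in the usual notation of utility U, reward R, discount factor γ and transition model P, averages over the successor states. With continuous outcomes such as the normally distributed usage and price, the sum would presumably become an integral over a density p; the second line is that assumed generalization, not something taken from the problem description.

```latex
U(s) = R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, U(s')

U(s) = R(s) + \gamma \max_{a} \int p(s' \mid s, a)\, U(s')\, \mathrm{d}s'
```

Here a ranges over the two actions (buy, not buy) and s would contain at least the current warehouse stock and the time step.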
Any hints or ideas?