## optimal predictions

Feb 28, 2002
There is an optimal way of predicting the future, given past
observations.

Normally we do not know the true conditional probability distribution
p(next event | past). But assume we do know that p is in some set P of
distributions. Choose a fixed weight w_q for each q in P such that the
w_q add up to 1 (for simplicity, let P be countable). Then construct
the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.
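To make the construction concrete, here is a minimal Python sketch (the Bernoulli hypothesis class, the weights, and the function name are illustrative assumptions, not from the post). Predicting with M amounts to a posterior-weighted average of each q's conditional prediction, since M(next | past) = Sum_q w_q q(past) q(next | past) / Sum_q w_q q(past):

```python
# Minimal sketch of Bayesmix prediction over a toy class P of Bernoulli coins.
# The true p is one of the coins; we only know the class and the prior weights.

def bayesmix_predict(past, models, weights):
    """Return M's probability that the next bit is 1, given the past.

    models:  Bernoulli parameters theta (each q's probability of emitting 1).
    weights: prior weights w_q, summing to 1.
    """
    # Likelihood q(past) of the observed past under each model q.
    likelihoods = []
    for theta in models:
        l = 1.0
        for bit in past:
            l *= theta if bit == 1 else (1.0 - theta)
        likelihoods.append(l)
    # Posterior weights: w_q q(past) / Sum_r w_r r(past).
    z = sum(w * l for w, l in zip(weights, likelihoods))
    posterior = [w * l / z for w, l in zip(weights, likelihoods)]
    # Mixture prediction for the next bit.
    return sum(post * theta for post, theta in zip(posterior, models))

models = [0.1, 0.5, 0.9]          # the class P
weights = [1/3, 1/3, 1/3]         # prior weights w_q
past = [1, 1, 1, 0, 1, 1, 1, 1]   # observations drawn from the theta=0.9 coin
print(bayesmix_predict(past, models, weights))   # roughly 0.87
```

After only eight observations, nearly all posterior weight sits on the theta=0.9 coin, so M's prediction is already close to the optimal p's.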

How wrong is it to do that? The recent exciting work of Marcus Hutter
(IDSIA) provides general and sharp (!) loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor and
the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n)
is at most of the order of sqrt[Lp(n)]. That is, M is not much worse
than p. And in general, no other predictor can do better than that!

In particular, if p is deterministic, then the M-predictor soon won't
make any errors any more.
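The deterministic case can be simulated directly. In the sketch below (the toy hypothesis class is an illustrative assumption, not from the post), each deterministic q that contradicts an observation has q(past) = 0, so its posterior weight drops to zero forever; predicting with the weighted majority of the surviving models then errs at most log2(1/w_p) times, where w_p is the prior weight of the true generator:

```python
# Sketch: if the true p is deterministic and in P, the M-predictor makes
# only finitely many errors, because each error removes at least half of
# the surviving posterior weight.

def m_predictor_errors(models, weights, true_seq, n):
    """Count prediction errors of the M-predictor over n steps.

    models:   deterministic hypotheses, each a function t -> bit.
    weights:  prior weights w_q; the true generator is among the models.
    true_seq: the deterministic source, a function t -> bit.
    """
    alive = list(weights)
    errors = 0
    for t in range(n):
        # Weight on "next bit is 1" among still-consistent models.
        w1 = sum(w for m, w in zip(models, alive) if w > 0 and m(t) == 1)
        wtot = sum(alive)
        guess = 1 if w1 >= wtot / 2 else 0
        truth = true_seq(t)
        if guess != truth:
            errors += 1
        # Condition on the observation: contradicted hypotheses get weight 0.
        for i, m in enumerate(models):
            if alive[i] > 0 and m(t) != truth:
                alive[i] = 0.0
    return errors

# Toy class P: constant 0, constant 1, alternating, period-3 pattern.
models = [lambda t: 0, lambda t: 1, lambda t: t % 2,
          lambda t: 1 if t % 3 == 0 else 0]
weights = [0.25, 0.25, 0.25, 0.25]
true_seq = lambda t: t % 2          # the alternating generator is in P
# At most log2(1/0.25) = 2 errors, no matter how large n gets.
print(m_predictor_errors(models, weights, true_seq, 100))
```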

If P contains ALL computable distributions, then M becomes the celebrated
enumerable universal prior. That is, after decades of somewhat stagnating
research we now have sharp loss bounds for Solomonoff's universal (but
incomputable) induction scheme.

Similarly, if we replace M by the Speed Prior S - where S(x) is small if
x is hard to compute by any method - we obtain appropriate loss bounds
for computable S-based induction.

Alternatively, reduce M to what you get if you just add up weighted
estimated future finance data probabilities generated by 1000 commercial
stock-market prediction software packages. If only one of them happens
to work well (but you do not know which), you should still get rich.
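A toy simulation of that scenario (the market model, the 70% up-probability, and the behavior of the bad packages are all hypothetical assumptions for illustration): multiplying each package's weight by the probability it assigned to the observed outcome, and renormalizing, concentrates nearly all weight on the one working package within a few months of daily data.

```python
# Sketch: 1000 "packages" each emit a probability that tomorrow's price
# goes up. One of them models the market correctly; the rest emit noise.
# Bayes-updating the mixture weights singles out the good one.
import random

random.seed(0)
N = 1000
good = 42                          # index of the one working package (hypothetical)
weights = [1.0 / N] * N            # uniform prior w_q

for day in range(200):
    up = random.random() < 0.7     # toy market: price rises 70% of the time
    # The good package reports the true probability; the rest guess at random.
    probs = [0.7 if q == good else random.random() for q in range(N)]
    # Multiply each weight by the probability assigned to the observed outcome.
    for q in range(N):
        weights[q] *= probs[q] if up else (1.0 - probs[q])
    z = sum(weights)
    weights = [w / z for w in weights]

print(round(weights[good], 3))     # close to 1: the mixture found the winner
```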

Note that the approach is much more general than what is normally done in
traditional statistical learning theory, where the often quite unrealistic
assumption is that the observations are statistically independent.