
optimal predictions

Juergen Schmidhuber, Feb 28, 2002

There is an optimal way of predicting the future, given past
observations.

Normally we do not know the true conditional probability distribution
p(next event | past). But assume we do know that p is in some set P of
distributions. Choose a fixed weight w_q for each q in P such that the
w_q add up to 1 (for simplicity, let P be countable). Then construct
the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of
the optimal but unknown p.
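
To make this concrete, here is a minimal Python sketch, assuming a toy
countable class P of Bernoulli sources; the names (thetas, predict,
update) are illustrative choices, not from the paper. Predicting with M
just means averaging the candidates' predictions under their posterior
weights:

```python
import numpy as np

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])      # candidate sources q in P
prior = np.full(len(thetas), 1.0 / len(thetas))   # weights w_q, summing to 1

def predict(w):
    """M's predictive probability that the next bit is 1:
    Sum_q w_q(past) q(1), with w_q(past) the posterior weights."""
    return float(np.dot(w, thetas))

def update(w, bit):
    """Bayes rule: reweight each q by its likelihood of the observed bit."""
    lik = thetas if bit else 1.0 - thetas
    w = w * lik
    return w / w.sum()

rng = np.random.default_rng(0)
true_theta = 0.7                       # the unknown p, some member of P
w = prior.copy()
print("M's first prediction of a 1:", predict(w))   # 0.5 under the uniform prior
for _ in range(200):
    bit = int(rng.random() < true_theta)
    w = update(w, bit)
print("after 200 bits:", round(predict(w), 3))      # close to the true 0.7
print("posterior weights:", np.round(w, 3))         # mass concentrates on 0.7
```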

How wrong is it to do that? The recent exciting work of Marcus Hutter
(IDSIA) provides general and sharp (!) loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor and
the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n)
is at most of the order of sqrt[Lp(n)]. That is, M is not much worse
than p. And in general, no other predictor can do better than that!
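
As a rough numerical sanity check (no substitute for the proof), one can
run the toy Bernoulli mixture above under 0-1 loss and compare the
excess loss LM(n)-Lp(n) against sqrt[Lp(n)]; all constants below are
illustrative assumptions:

```python
import numpy as np

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])    # the same toy class P
true_theta, n, runs = 0.7, 2000, 200
rng = np.random.default_rng(1)

LM = np.zeros(n)   # cumulative 0-1 loss of the M-predictor, summed over runs
Lp = np.zeros(n)   # same for the informed p-predictor
for _ in range(runs):
    w = np.full(len(thetas), 1.0 / len(thetas))
    lm = lp = 0
    for t in range(n):
        m1 = float(np.dot(w, thetas))           # M(next = 1 | past)
        bit = int(rng.random() < true_theta)
        lm += int(m1 >= 0.5) != bit             # M-predictor's 0-1 loss
        lp += int(true_theta >= 0.5) != bit     # p-predictor's 0-1 loss
        lik = thetas if bit else 1.0 - thetas
        w = w * lik
        w /= w.sum()
        LM[t] += lm
        Lp[t] += lp
LM /= runs                                      # average -> expected losses
Lp /= runs
for t in (100, 500, 2000):
    print(f"n={t}: LM-Lp = {LM[t-1] - Lp[t-1]:6.2f}   "
          f"sqrt(Lp) = {np.sqrt(Lp[t-1]):6.2f}")
```

In this easy case the excess loss stays far below sqrt[Lp(n)], as it
should: the sqrt bound is a worst-case guarantee.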

In particular, if p is deterministic, then the M-predictor makes only
finitely many errors; soon it stops erring altogether.
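
A small illustration of the deterministic case, assuming P is a handful
of periodic bit generators (the patterns below are made up): hypotheses
that mispredict get likelihood zero and are eliminated for good, so only
finitely many errors can ever occur:

```python
# P: deterministic periodic bit generators, one of which is the true source.
patterns = ["0", "01", "001", "0111", "10"]
def q_bit(pat, t):
    return int(pat[t % len(pat)])

true_pat = "001"                       # the unknown deterministic p, in P
w = [1.0 / len(patterns)] * len(patterns)
errors = 0
for t in range(30):
    # M's probability of a 1 is the total weight of hypotheses saying 1
    m1 = sum(wi for wi, pat in zip(w, patterns) if q_bit(pat, t) == 1)
    guess = int(m1 >= 0.5)             # predict the more probable bit under M
    bit = q_bit(true_pat, t)
    errors += guess != bit
    # 0/1 likelihoods: any hypothesis contradicted by the data is eliminated
    w = [wi if q_bit(pat, t) == bit else 0.0 for wi, pat in zip(w, patterns)]
    s = sum(w)
    w = [wi / s for wi in w]
print("total errors over 30 steps:", errors)   # finite; here just 1
```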

If P contains ALL computable distributions, then M becomes the
celebrated enumerable universal prior. That is, after decades of
somewhat stagnating research, we now have sharp loss bounds for
Solomonoff's universal (but incomputable) induction scheme.

Similarly, if we replace M by the Speed Prior S - where S(x) is small
if x is hard to compute by any method - we obtain appropriate loss
bounds for computable S-based induction.

Alternatively, reduce M to the weighted sum of the probability
estimates for future financial data produced by 1000 commercial
stock-market prediction software packages. If only one of them happens
to work well (but you do not know which), you should still get rich.
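
The same machinery works when the candidates are black boxes that
merely emit probabilities. A sketch with entirely made-up numbers - 999
noise forecasters and one that happens to be calibrated to the market's
true bias - shows the good package's weight taking over:

```python
import numpy as np

rng = np.random.default_rng(2)
n_experts, n_days = 1000, 300
w = np.full(n_experts, 1.0 / n_experts)   # uniform prior over the packages
for day in range(n_days):
    up = rng.random() < 0.6               # market's true (unknown) up-bias
    preds = rng.random(n_experts)         # 999 packages emit noise...
    preds[0] = 0.6                        # ...one happens to be calibrated
    lik = preds if up else 1.0 - preds    # likelihood each gave the outcome
    w *= lik
    w /= w.sum()
print(f"weight on the one good package after {n_days} days: {w[0]:.3f}")
# typically very close to 1: the mixture has found the package that works
```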

Note that the approach is much more general than what is normally done
in traditional statistical learning theory, where the often quite
unrealistic assumption is that the observations are statistically
independent.

To learn more, please read

Optimality of Universal Bayesian Sequence Prediction for General Loss
and Alphabet: ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz

and also check out Hutter's other recent papers at ICML, ECML, NIPS,
Int. J. of Foundations of CS: www.idsia.ch/~marcus


-------------------------------------------------
Juergen Schmidhuber, director
IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
juergen@... http://www.idsia.ch/~juergen