## Re: [GP] Benchmark problems for development

Message 1 of 54, Jan 30, 2002
Let me point out how simple parity really is.

All you need is a search space of programs that process inputs
sequentially and allow for loops or jumps.

A very simple finite state automaton with just one bit of internal state
can correctly classify arbitrary bitstrings by sequentially processing
them one bit at a time, and switching the internal state bit on or off
depending on whether the current input is 1 or 0.
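The one-bit automaton described above can be sketched in a few lines (the function name is mine, purely for illustration):

```python
def parity(bits):
    """One-bit finite state automaton for parity.

    The single bit of internal state records the parity of the
    inputs seen so far; each 1 in the input flips it.
    """
    state = 0
    for b in bits:
        if b == 1:
            state = 1 - state  # flip the internal bit on every 1
    return state

assert parity([1, 0, 1, 1]) == 1  # three 1s: odd parity
```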

How to learn such behavior from training examples? The problem is so
simple, you can solve it by random guessing. For instance, you can
use a recurrent net whose program is its weight matrix. The weights are
randomly initialized and then tested until the training set error is
zero. After a few thousand trials, the net generalizes perfectly to
n-bit parity for ARBITRARY n (not just for n<30, where traditional
no-loop GP has been demonstrated). See:

S. Hochreiter and J. Schmidhuber. LSTM can solve hard long time lag
problems. In M. C. Mozer, M. I. Jordan, T. Petsche, eds.,
NIPS 9, pages 473-479, MIT Press, Cambridge MA, 1997.
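The guessing procedure can be sketched as follows. This is a minimal illustration, not the exact architecture of the cited paper: I assume a single sigma-pi (multiplicative) recurrent unit, since a plain linear-threshold unit cannot represent the XOR state update; all names are mine.

```python
import itertools
import random

def step(z):
    return 1 if z > 0 else 0

def run_net(w, bits):
    # One sigma-pi recurrent unit (an assumed architecture):
    # new_state = step(w1*s + w2*x + w3*s*x + b)
    w1, w2, w3, b = w
    s = 0
    for x in bits:
        s = step(w1 * s + w2 * x + w3 * s * x + b)
    return s

# training set: all bitstrings of length 1..3, labelled with their parity
train = [(bits, sum(bits) % 2)
         for n in (1, 2, 3)
         for bits in itertools.product((0, 1), repeat=n)]

def train_error(w):
    return sum(run_net(w, bits) != label for bits, label in train)

# guess random weight matrices until the training set error is zero
random.seed(0)
for trial in range(1, 100_001):
    w = [random.uniform(-2, 2) for _ in range(4)]
    if train_error(w) == 0:
        break

# zero error on all strings of length <= 3 forces the state update to be
# exactly XOR, so the net generalizes to n-bit parity for arbitrary n
long_bits = [random.randint(0, 1) for _ in range(100)]
assert run_net(w, long_bits) == sum(long_bits) % 2
```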

On the other hand, almost any GP variant that allows for loops will
solve n-bit parity very quickly, too. To my knowledge, the first GP
implementation with loops was:

D. Dickmanns, J. Schmidhuber, and A. Winklhofer. Der genetische
Algorithmus: Eine Implementierung in Prolog.
Fortgeschrittenenpraktikum,
Institut fuer Informatik, Lehrstuhl Prof. Radig, Technische Universitaet
Muenchen, 1987.

Much smarter than guessing is Levin's universal search (LS), of course.
Unlike GP, LS is an asymptotically optimal program search method for
non-incremental situations: it allocates time to program candidates in
an optimal fashion. Appropriate LS variants will solve n-bit parity in
almost no time:

L. A. Levin (1973). Universal sequential search problems. Problems of
Information Transmission, 9(3):265-266.
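The time allocation idea can be sketched as follows. This is a simplified illustration over a finite candidate set, not Levin's full enumeration over all programs; the function names and the toy success test are mine.

```python
def levin_search(programs, run, max_phase=20):
    """Levin-style time allocation (simplified sketch).

    In phase k, each candidate program p is run for 2**(k - len(p))
    steps, so shorter programs get exponentially more time and the
    total time spent in phase k is bounded by 2**k times the sum of
    2**(-len(p)) over the candidates.
    """
    for k in range(1, max_phase + 1):
        for p in programs:
            budget = 2 ** k >> len(p)  # = 2**(k - len(p)), 0 if len(p) > k
            if budget and run(p, budget):
                return p, k
    return None, max_phase

# toy example: "101" is the only solution, and it needs 4 steps to verify
found, phase = levin_search(
    ["0", "10", "101", "1100"],
    lambda p, budget: p == "101" and budget >= 4,
)
assert (found, phase) == ("101", 5)  # first phase with 2**(5-3) >= 4 steps
```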

First applications of LS and incremental LS extensions to the problem of
discovering programs with loops are:

J. Schmidhuber. Discovering neural nets with low Kolmogorov complexity
and high generalization capability. Neural Networks, 10(5):857-873,
1997 (also ICML 1995).

J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias
with success-story algorithm, adaptive Levin search, and incremental
self-improvement. Machine Learning 28:105-130, 1997 (also ICML 1996).

-------------------------------------------------
Juergen Schmidhuber director
IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
juergen@... http://www.idsia.ch/~juergen

Roland Olsson wrote:
>
> When developing novel GP methods, it is useful to compare new techniques
> with old ones on a suite of benchmark problems that are easily scalable in
> difficulty. The even-parity problem seems to be the most used such
> benchmark, but results obtained only on that problem are not necessarily
> generally valid.
>
> Even parity seems popular because:
>
> 1. It is quite easy to compute fitness.
>
> 2. It is hard for GP without ADFs.
>
> 3. It is easy to scale by just increasing the number of variables.
>
Message 54 of 54, Feb 25, 2002
Nic McPhee wrote:
>
> At 4:24 PM -0800 2/7/02, Terence Soule wrote:
> >I agree with Ronald that difficulty of changing nodes near the root is
> >increased by the standard GP operators. Changing a node near the root
> >with either subtree crossover or mutation requires changing a very large
> >percentage of the code and makes it much more likely that the result will
> >be very detrimental.
> >
> >However, even ignoring the operators used, the nodes near the root
> >certainly seem to have a more significant effect on fitness. For example,
> >replacing a + with a * at the root node will almost certainly have a huge
> >effect on the tree's output and hence its fitness. Whereas changing a +
> >to a * near the leaf nodes should have a much less dramatic effect.
>
> I think this is an important point and is somewhat independent of the
> tree representation. We had a paper at CEC '99 where we proposed
> what we called the AppNode representation,

I've often thought about a mutation (or crossover) operator that "adds a
term." That is, you choose a random subtree, thus conceptually breaking
the individual into f(g()), where g() is the subtree and f() is
everything else. The child would be f(g() + h()) where h() is either a
random subtree or a subtree from another parent. Of course, you could
use any operator in place of +. If one could somehow encourage h() to
be small (for +) or close to 1 (for *), the effect wouldn't be too
large.
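The operator described above can be sketched on tuple-encoded expression trees. The representation and helper names are mine, purely for illustration:

```python
import random

# expression trees as nested tuples: ('+', left, right), ('*', l, r);
# leaves are plain strings such as 'x'
def subtrees(t, path=()):
    """Yield (path, subtree) pairs for every node in the tree."""
    yield path, t
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(t, path, new):
    """Return a copy of t with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], new),) + t[i + 1:]

def add_term(tree, make_subtree, op='+'):
    """'Add a term': pick a random subtree g, conceptually splitting the
    individual into f(g()), and return f(op(g(), h())) where h() is a
    fresh subtree from make_subtree (or, in a crossover variant, a
    subtree taken from another parent)."""
    path, g = random.choice(list(subtrees(tree)))
    h = make_subtree()
    return replace_at(tree, path, (op, g, h))

parent = ('*', ('+', 'x', 'y'), 'z')
child = add_term(parent, lambda: 'c')  # e.g. ('*', ('+', 'x', 'y'), ('+', 'z', 'c'))
```

Biasing `make_subtree` toward small constants (for `+`) or values near 1 (for `*`) would keep the fitness disruption small, as suggested above.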

Another way to avoid the "change near the root is drastic" problem is to
do some learning or parameter optimization. We're currently exploring ways to