
Re: [GP] Benchmark problems for development

  • Julio R. Banga
    Message 1 of 54, Feb 1, 2002
      I fully agree that 'toy' problems are useful, and they are
      routinely used in most domains, including my own (process
      engineering). But an undesirable side-effect of using 'toy'
      problems to illustrate the performance of a given algorithm
      is that, most often, the scalability of the method is not discussed
      or explored. So there are many methods out there which are
      really neat at solving toy problems but which cannot solve
      larger-scale (more realistic) problems. For example, the
      computational effort of many EC methods grows exponentially
      with problem size, so if you want to apply them to, say, a
      realistic control design problem, you would need huge
      computational resources.

      I wonder how GP scales with problem size... I am not very
      familiar with the state-of-the-art in GP, so I would appreciate
      pointers to recent papers discussing its scalability.

      Julio.

      -------------------------------------------------------------
      Dr. Julio R. Banga E-mail: julio@...
      Process Eng. Group http://www.iim.csic.es/~julio/
      Inst. Invest. Marinas http://www.iim.csic.es/~gingproc/
      (C.S.I.C.)
      C/Eduardo Cabello 6 ph: +34 986 214 473 (direct)
      36208 Vigo (SPAIN) fax: +34 986 292 762
      ---------------------------------------------------------------




      At 11:16 PM 1/31/02 -0000, Douglas Kell wrote:
      >Apropos 'toy' problems, in Biology we often - indeed mainly - do
      >experiments using 'model' organisms like yeast, a bacterium called
      >E. coli, a plant called Arabidopsis, the fruit fly Drosophila and so
      >on. Why?
      >
      >Because they are easy to manipulate and understand. If you can't
      >get it to work in E. coli you won't understand it, and without the
      >understanding you have no chance of getting it to work in rabbits (or
      >whatever), everyone can reproduce the data (quick and easy to do,
      >no need for expensive animal houses that few have - or want -
      >access to, can't do expts in people, etc etc). By analogy, toy
      >problems have (or should have) the same role in the computational
      >domain, but are rightly seen as pointers to useful strategies and
      >ideas, not synonyms of reality outside their domain of discovery.
      >
      >Apropos 'better' methods, I am not sure of the status of No Free
      >Lunch in GP, but we can assume it is not dissimilar to that in EC
      >generally.
      >
      >Apropos methods that work, I replied to a post on the SVM
      >newsgroup that had asked 'why is SVM so efficient?' by
      >asking whether anyone had actually COMPARED it with anything.
      >There were no replies at all....
      >
      >Here the standard or concept of 'human-competitive' is well
      >explained in John Koza's monographs, mentioned earlier, where
      >what is meant by 'compared to' is underpinned by a position
      >statement that makes sense.
      >
      >That said, benchmark problems are of enormous value at all levels
      >and we should have them. If a whizzy new method or algorithm
      >X can't beat the best known method on any of them, it is not likely
      >to be much good; if it does really well, and gives taut,
      >comprehensible and robust solutions/explanations on lots of them
      >then...well done.
      >
      >Douglas.
      >
      >********************************************************************
      >*(Prof.) Douglas B. Kell Phone: +44 1970 622334 *
      >*Director of Research *
      >*Institute of Biological Sciences Fax: +44 1970 622354 *
      >*Cledwyn Building *
      >*University of Wales, Email: dbk@... *
      >*Aberystwyth SY23 3DD, U.K. http://qbab.aber.ac.uk/ *
      >* http://www.abergc.com*
      >* Have a look at the 2002 meeting of the Society for Biomolecular *
      >* Screening http://www.sbsonline.org *
      >********************************************************************
    • Martin C. Martin
      Message 54 of 54, Feb 25, 2002
        Nic McPhee wrote:
        >
        > At 4:24 PM -0800 2/7/02, Terence Soule wrote:
        > >I agree with Ronald that the difficulty of changing nodes near the root is
        > >increased by the standard GP operators. Changing a node near the root
        > >with either subtree crossover or mutation requires changing a very large
        > >percentage of the code and makes it much more likely that the result will
        > >be very detrimental.
        > >
        > >However, even ignoring the operators used, the nodes near the root
        > >certainly seem to have a more significant effect on fitness. For example,
        > >replacing a + with a * at the root node will almost certainly have a huge
        > >effect on the tree's output and hence its fitness, whereas changing a +
        > >to a * near the leaf nodes should have a much less dramatic effect.
        >
        > I think this is an important point and is somewhat independent of the
        > tree representation. We had a paper at CEC '99 where we proposed
        > what we called the AppNode representation,
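
The root-versus-leaf asymmetry quoted above can be checked numerically. The sketch below is an illustration of that point on a made-up example tree, not code from the thread: it compares the average output change from flipping + to * at the root of `(x + 1) + (x + 2)` versus at a single leaf.

```python
# Illustration of the root-vs-leaf asymmetry discussed above.
# Hypothetical example tree: (x + 1) + (x + 2).

xs = [1.0, 2.0, 3.0, 4.0]

def original(x):
    return (x + 1) + (x + 2)

def root_flip(x):            # + -> * at the root
    return (x + 1) * (x + 2)

def leaf_flip(x):            # + -> * at one leaf
    return (x * 1) + (x + 2)

def mean_abs_change(f):
    """Average |f(x) - original(x)| over the sample points."""
    return sum(abs(f(x) - original(x)) for x in xs) / len(xs)

# Over these points the root flip perturbs the output by 9.0 on
# average, the leaf flip by only 1.0.
```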

        I've often thought about a mutation (or crossover) operator that "adds a
        term." That is, you choose a random subtree, thus conceptually breaking
        the individual into f(g()), where g() is the subtree and f() is
        everything else. The child would be f(g() + h()) where h() is either a
        random subtree or a subtree from another parent. Of course, you could
        use any operator in place of +. If one could somehow encourage h() to
        be small (for +) or close to 1 (for *), the effect wouldn't be too
        large.
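
A minimal sketch of the "add a term" operator described above, assuming trees represented as nested `(op, children)` tuples with numeric leaves. The names `add_term` and `random_subtree` are my own, and this version does not yet bias `h()` toward being small: it simply wraps a randomly chosen subtree `g()` as `g() + h()`.

```python
import random

def random_subtree(depth=2):
    """Generate a small random arithmetic subtree (hypothetical helper)."""
    if depth == 0 or random.random() < 0.3:
        return round(random.uniform(-1, 1), 2)   # small constant leaf
    return ('+', [random_subtree(depth - 1), random_subtree(depth - 1)])

def add_term(tree, make_term=random_subtree, op='+'):
    """Pick a random subtree g() and replace it with (g() op h()),
    where h() comes from make_term (a fresh random subtree here,
    but it could be a subtree donated by another parent)."""
    # Collect the path to every node so one can be picked uniformly.
    paths = []
    def walk(node, path):
        paths.append(path)
        if isinstance(node, tuple):
            for i, child in enumerate(node[1]):
                walk(child, path + [i])
    walk(tree, [])
    target = random.choice(paths)
    h = make_term()
    # Rebuild the tree, wrapping the chosen subtree as (op, [g, h]).
    def rebuild(node, path):
        if not path:
            return (op, [node, h])
        i = path[0]
        children = list(node[1])
        children[i] = rebuild(children[i], path[1:])
        return (node[0], children)
    return rebuild(tree, target)
```

Since the rest of the tree is untouched, the child differs from the parent only by the added term, which is what keeps the fitness change modest when `h()` is small.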

        Another way to avoid the "change near the root is drastic" problem
        is to do some learning or parameter optimization. We're currently
        exploring ways to use gradient information to adjust the ephemeral
        random constants. The
        hope is that the optimization can "balance" things, e.g. adjusting the
        magnitudes of the two trees so they total to about the same.
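
This is not Martin's actual code, but the idea can be sketched with a toy finite-difference gradient descent that tunes the constants of a fixed expression `c0*x + c1` against sample data; the function names and learning-rate settings are assumptions for illustration.

```python
# Toy sketch: tune ephemeral random constants by gradient descent,
# estimating gradients with forward finite differences.

def model(consts, x):
    c0, c1 = consts
    return c0 * x + c1

def mse(consts, data):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((model(consts, x) - y) ** 2 for x, y in data) / len(data)

def tune_constants(consts, data, lr=0.01, steps=500, eps=1e-6):
    """Adjust each constant down its estimated error gradient."""
    consts = list(consts)
    for _ in range(steps):
        for i in range(len(consts)):
            bumped = list(consts)
            bumped[i] += eps
            grad = (mse(bumped, data) - mse(consts, data)) / eps
            consts[i] -= lr * grad
    return consts
```

In a real GP system the tree shape would come from evolution and only the constants would be optimized this way, which is what lets the search "balance" subtrees without structural change.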

        - Martin