
Beowulf Clusters

  • Mattias Fagerlund
    Message 1 of 19, Oct 1, 2001
      Hi!

      I've been having a bit of a headache with my beowulf cluster code - we
      currently have several machines of different speeds, so while one machine
      might perform 5 generations, another might only perform 3 generations.

      Would you generally let the fast machine and the slower machine exchange
      genetic material? And if one machine restarted its evolution from zero but
      still allowed others to supply genetic material, within only a handful of
      generations it would be as though it had never restarted.

      I'm thinking that if they're within 10 generations of each other, I'll allow
      the exchange, but that's not such a neat solution. Another option might be
      to make the population on the faster machine larger and the one on the slower
      machine smaller, so that they keep roughly the same pace...

      Any ideas?

      thanks,
      mattias fagerlund
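
      A minimal sketch of the "within 10 generations" gate described above, in C++;
      the function and parameter names are illustrative, not from any particular GP
      system:

        #include <cstdlib>

        // Accept immigrants only if the sending deme's generation counter is
        // within a fixed window of our own, so a freshly restarted (or much
        // slower) deme isn't flooded with far more evolved material.
        bool acceptMigrants(int myGeneration, int senderGeneration, int window = 10)
        {
            return std::abs(myGeneration - senderGeneration) <= window;
        }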
    • Martin C. Martin
      Message 2 of 19, Oct 1, 2001
        Our plan is to have one computer hold the population, and farm out
        evaluations to all computers. So, the faster computers will simply get
        a higher percentage of the population to evaluate. This has a lot of
        nice properties, including that we can repeat runs by simply using the
        same seed, assuming evaluation is deterministic.

        I think the "only exchange info if within 10 generations" thing is
        particularly dangerous; in one run, populations may be completely
        separate from almost the start, whereas in another, they'll be sharing
        info most of the way through. That's a very strange dynamic.

        - Martin

        Mattias Fagerlund wrote:
        >
        > Hi!
        >
        > I've been having a bit of a headache with my beowulf cluster code - we
        > currently have several machines of different speeds, so while one machine
        > might perform 5 generations, another might only perform 3 generations.
        >
        > Would you generally let the fast machine and the slower machine exchange
        > genetic material? And if one machine restarted its evolution from zero but
        > still allowed others to supply genetic material, within only a handful of
        > generations it would be as though it had never restarted.
        >
        > I'm thinking that if they're within 10 generations of each other, I'll allow
        > the exchange, but that's not such a neat solution. Another option might be
        > to make the population on the faster machine larger and the one on the slower
        > machine smaller, so that they keep roughly the same pace...
        >
        > Any ideas?
        >
        > thanks,
        > mattias fagerlund
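
      A minimal sketch of the partitioning step in the master/worker scheme
      described above: the machine holding the population hands each worker a
      contiguous slice of individuals to evaluate, sized by that worker's measured
      relative speed. The transport (sockets, MPI, ...) is omitted and all names
      are illustrative:

        #include <cstddef>
        #include <numeric>
        #include <utility>
        #include <vector>

        // Split population indices [0, populationSize) into contiguous slices,
        // one per worker, proportional to each worker's measured relative speed.
        // Because the master owns the population and the random seed, a run is
        // repeatable with the same seed as long as evaluation is deterministic.
        std::vector<std::pair<std::size_t, std::size_t>>   // one [begin, end) per worker
        partitionBySpeed(std::size_t populationSize, const std::vector<double>& speeds)
        {
            double total = std::accumulate(speeds.begin(), speeds.end(), 0.0);
            std::vector<std::pair<std::size_t, std::size_t>> slices;
            std::size_t begin = 0;
            for (std::size_t w = 0; w < speeds.size(); ++w) {
                std::size_t count = (w + 1 == speeds.size())
                    ? populationSize - begin            // last worker takes the remainder
                    : static_cast<std::size_t>(populationSize * speeds[w] / total);
                slices.emplace_back(begin, begin + count);
                begin += count;
            }
            return slices;
        }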
      • Jules Gilbert
        Message 3 of 19, Oct 1, 2001
          Don't forget: you are allowing your happenstance machine architecture to dictate the direction of your research, albeit mechanical research.

          My first point is simple: are you sure that you want to do that? (More strongly: this is not a good idea!)

          Second, when one CPU finishes before another, use the spare CPU to accomplish another job -- perhaps processing another instance of genetic change.

          Hope this helps.

          --jg
           
           
           

          Mattias Fagerlund wrote:

          Hi!

          I've been having a bit of a headache with my beowulf cluster code - we
          currently have several machines of different speeds, so while one machine
          might perform 5 generations, another might only perform 3 generations.

          Would you generally let the fast machine and the slower machine exchange
          genetic material? And if one machine restarted its evolution from zero but
          still allowed others to supply genetic material, within only a handful of
          generations it would be as though it had never restarted.

          I'm thinking that if they're within 10 generations of each other, I'll allow
          the exchange, but that's not such a neat solution. Another option might be
          to make the population on the faster machine larger and the one on the slower
          machine smaller, so that they keep roughly the same pace...

          Any ideas?

          thanks,
          mattias fagerlund
           



        • Martin Pelikan
          Message 4 of 19, Oct 1, 2001
            Lee Spector wrote:
            >
            > Mattias,
            >
            > All of the CPUs in our cluster are the same speed so I haven't had to deal
            > with this, but your idea of scaling population sizes with clock speed seems
            > like a good approach to me. Restarting a node and then importing
            > individuals from late in other runs seems like a bad idea (as you seem to
            > think as well).
            >
            > -Lee
            >

            We did some work on parallel machines, and the truth is that the different
            populations can drift apart by 10 generations or more even if the processors
            are the same speed. Things like caching, multitasking, etc. can affect the
            speed significantly. I do not see this as a big problem, though.

            But let's say we want to synchronize the processing of the sub-populations so
            that they proceed at the same pace, and we have very different processors. One
            thing to consider then might be to use one of the programming models that
            allow for *load balancing*. One of these is Charm++, which lets you create
            more sub-populations than you have processors (say 4 per processor), monitors
            the load, idle time, and other performance measures on each processor, and
            reschedules the sub-population processes as required. The fastest processor
            could therefore end up processing 8 sub-populations while the slowest handles
            only 1. Communication between the processes is done by sending messages, and
            the system handles the rest. More on Charm++ can be found at
            http://charm.cs.uiuc.edu

            I know there are MPI-based systems that also allow load balancing, but I
            don't recall any specific one at the moment.

            Anyway, I think the best thing to do would be to use one of these frameworks
            and take advantage of their ability to solve this problem without additional
            heuristics.

            all the best,

            Martin
            --
            ----------------------------------------------
            Martin Pelikan
            Illinois Genetic Algorithms Laboratory
            University of Illinois at Urbana Champaign
            117 Transportation Building
            104 S. Mathews Avenue, Urbana, IL 61801
            tel: (217) 333-2346, fax: (217) 244-5705
            ----------------------------------------------
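
      This is not Charm++ code, just a minimal sketch of the over-decomposition idea
      described above: keep more sub-populations (demes) than processors and
      occasionally move one from the most loaded processor to the least loaded. The
      names and the tolerance value are illustrative:

        #include <algorithm>
        #include <vector>

        struct Deme { int id; double costPerGeneration; };   // measured work for one deme

        // One rebalancing step: if the busiest processor carries noticeably more
        // work than the idlest one, migrate a single deme from the former to the latter.
        void rebalance(std::vector<std::vector<Deme>>& demesPerProc, double tolerance = 1.25)
        {
            auto load = [](const std::vector<Deme>& ds) {
                double sum = 0.0;
                for (const Deme& d : ds) sum += d.costPerGeneration;
                return sum;
            };
            auto busiest = std::max_element(demesPerProc.begin(), demesPerProc.end(),
                [&](const auto& a, const auto& b) { return load(a) < load(b); });
            auto idlest = std::min_element(demesPerProc.begin(), demesPerProc.end(),
                [&](const auto& a, const auto& b) { return load(a) < load(b); });
            if (busiest->size() > 1 && load(*busiest) > tolerance * load(*idlest)) {
                idlest->push_back(busiest->back());          // move one deme across
                busiest->pop_back();
            }
        }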
          • Terrance Soule
            Message 5 of 19, Oct 1, 2001
              Hi,

              So far I've taken a very simple, albeit not very interesting, approach
              here at the U of I. For any experiment I run I want at least 20 (usually
              more like 50 or 100) replications to make sure the results are
              statistically meaningful. I have the master node farm one replication to
              each available node. As a node finishes a replication it passes the
              results back to the master node and begins a new trial.

              There are some drawbacks to this approach. If the number of replications
              is fewer than the number of nodes, some nodes are idle. However, if the
              number of replications is greater than the number of nodes, nodes which
              finish first are given another replication to run. So load balancing is
              reasonable for problems with many replications and few nodes.

              Of course, this approach doesn't support island models in which some
              genetic material is supposed to be exchanged between populations.

              However, the big advantage of this approach is that it's very simple to
              program and there is very little communication overhead. (The only
              communication is at the end of a trial, when data from a replication is
              collected by the master node.) Interprocess communication can be a serious
              problem if significant portions of a population are being exchanged every
              generation.

              Hope this helps,
              Terry Soule
              University of Idaho
              64 nodes and growing

              On Mon, 1 Oct 2001, Mattias Fagerlund wrote:

              > Hi!
              >
              > I've been having a bit of a headache with my beowulf cluster code - we
              > currently have several machines of different speeds, so while one machine
              > might perform 5 generations, another might only perform 3 generations.
              >
              > Would you generally let the fast machine and the slower machine exchange
              > genetic material? And if one machine restarted its evolution from zero but
              > still allowed others to supply genetic material, within only a handful of
              > generations it would be as though it had never restarted.
              >
              > I'm thinking that if they're within 10 generations of each other, I'll allow
              > the exchange, but that's not such a neat solution. Another option might be
              > to make the population on the faster machine larger and the one on the slower
              > machine smaller, so that they keep roughly the same pace...
              >
              > Any ideas?
              >
              > thanks,
              > mattias fagerlund
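
      A minimal sketch of the replication-farming approach described above, with
      threads standing in for cluster nodes and a stub in place of a real GP run;
      each worker pulls the next replication's seed as soon as it finishes one, and
      the only communication is the final result:

        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        struct RunResult { unsigned seed; double bestFitness; };

        // Stand-in for a complete GP run; a real one would evolve a population
        // from this seed and return its best fitness.
        double runOneReplication(unsigned seed) { return static_cast<double>(seed % 7); }

        int main()
        {
            std::queue<unsigned> seeds;
            for (unsigned s = 1; s <= 50; ++s) seeds.push(s);   // 50 replications to farm out

            std::mutex m;
            std::vector<RunResult> results;

            auto worker = [&]() {
                for (;;) {
                    unsigned seed;
                    {
                        std::lock_guard<std::mutex> lock(m);    // grab the next replication, if any
                        if (seeds.empty()) return;
                        seed = seeds.front();
                        seeds.pop();
                    }
                    double best = runOneReplication(seed);      // the long part, no communication
                    std::lock_guard<std::mutex> lock(m);
                    results.push_back({seed, best});            // report back to the "master"
                }
            };

            std::vector<std::thread> nodes;
            for (int i = 0; i < 8; ++i) nodes.emplace_back(worker);   // 8 "nodes"
            for (auto& t : nodes) t.join();
        }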
            • Lee Spector
              Message 6 of 19, Oct 1, 2001
                Mattias,

                All of the CPUs in our cluster are the same speed so I haven't had to deal
                with this, but your idea of scaling population sizes with clock speed seems
                like a good approach to me. Restarting a node and then importing
                individuals from late in other runs seems like a bad idea (as you seem to
                think as well).

                -Lee


                >Hi!
                >
                >I've been having a bit of a headache with my beowulf cluster code - we
                >currently have several machines of different speeds, so while one machine
                >might perform 5 generations, another might only perform 3 generations.
                >
                >Would you generally let the fast machine and the slower machine exchange
                >genetic material? And if one machine restarted its evolution from zero but
                >still allowed others to supply genetic material, within only a handful of
                >generations it would be as though it had never restarted.
                >
                >I'm thinking that if they're within 10 generations of each other, I'll allow
                >the exchange, but that's not such a neat solution. Another option might be
                >to make the population on the faster machine larger and the one on the slower
                >machine smaller, so that they keep roughly the same pace...
                >
                >Any ideas?
                >
                >thanks,
                >mattias fagerlund

                --
                Lee Spector
                Associate Professor of Computer Science lspector@...
                Cognitive Science, Hampshire College http://hampshire.edu/lspector/
                Amherst, MA 01002 413-559-5352, Fax: 413-559-5438
              • Douglas Kell
                Message 7 of 19, Oct 1, 2001
                  Yes, the most straightforward solution, which seems to me rather
                  convenient, is simply to scale the population size handled by a
                  given machine (whether explicitly in demes or effectively as a
                  single metapopulation) according to the machine's effective speed.
                  That way each 'generation' takes the same time on machines of very
                  different spec. It might have some interesting properties too.
                  There is also a relatively uncharted idea (so far as I know, at
                  least for parallel machines): make the mutation/recombination rate
                  on each machine a function of the size of its population, as a way
                  of tipping the exploration/exploitation ratio in your favour.

                  Interesting thread.
                  Douglas.

                  On 1 Oct 2001, at 14:56, Lee Spector wrote:

                  >
                  > Mattias,
                  >
                  > All of the CPUs in our cluster are the same speed so I haven't had to deal
                  > with this, but your idea of scaling population sizes with clock speed seems
                  > like a good approach to me. Restarting a node and then importing
                  > individuals from late in other runs seems like a bad idea (as you seem to
                  > think as well).
                  >
                  > -Lee
                  >
                  >
                  > >Hi!
                  > >
                  > >I've been having a bit of a headache with my beowulf cluster code - we
                  > >currently have several machines of different speeds, so while one machine
                  > >might perform 5 generations, another might only perform 3 generations.
                  > >
                  > >Would you generally let the fast machine and the slower machine exchange
                  > >genetic material? And if one machine restarted its evolution from zero but
                  > >still allowed others to supply genetic material, within only a handful of
                  > >generations it would be as though it had never restarted.
                  > >
                  > >I'm thinking that if they're within 10 generations of each other, I'll allow
                  > >the exchange, but that's not such a neat solution. Another option might be
                  > >to make the population on the faster machine larger and the one on the slower
                  > >machine smaller, so that they keep roughly the same pace...
                  > >
                  > >Any ideas?
                  > >
                  > >thanks,
                  > >mattias fagerlund
                  >
                  > --
                  > Lee Spector
                  > Associate Professor of Computer Science lspector@...
                  > Cognitive Science, Hampshire College http://hampshire.edu/lspector/
                  > Amherst, MA 01002 413-559-5352, Fax: 413-559-5438


                  ********************************************************************
                  * (Prof.) Douglas B. Kell            Phone: +44 1970 622334        *
                  * Director of Research                                             *
                  * Institute of Biological Sciences   Fax:   +44 1970 622354        *
                  * Cledwyn Building                                                 *
                  * University of Wales,               Email: dbk@...                *
                  * Aberystwyth SY23 3DD, U.K.         http://qbab.aber.ac.uk/       *
                  *                                    http://www.abergc.com         *
                  ********************************************************************
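
      A minimal sketch of the suggestion above: deme size follows the machine's
      relative speed, and the per-machine mutation rate is then made a function of
      that deme size. The particular 10/size form is only one plausible choice, not
      something prescribed in the thread, and the names are illustrative:

        #include <algorithm>
        #include <cstddef>

        struct DemeConfig { std::size_t size; double mutationRate; };

        // baseSize: the deme size a machine of average speed would get.
        // relativeSpeed: this machine's speed divided by the cluster average.
        DemeConfig configureDeme(std::size_t baseSize, double relativeSpeed)
        {
            DemeConfig cfg;
            cfg.size = std::max<std::size_t>(2, static_cast<std::size_t>(baseSize * relativeSpeed));
            // Smaller demes get a higher mutation rate, tipping them toward exploration.
            cfg.mutationRate = std::min(1.0, 10.0 / static_cast<double>(cfg.size));
            return cfg;
        }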
                • Mattias Fagerlund
                  Message 8 of 19, Oct 1, 2001
                    > Our plan is to have one computer hold the population, and farm out
                    > evaluations to all computers. So, the faster computers will simply get
                    > a higher percentage of the population to evaluate. This has a lot of
                    > nice properties, including that we can repeat runs by simply using the
                    > same seed, assuming evaluation is deterministic.

                    Well, this has, as I understand it, traditionally been avoided because

                    1. it scales worse than multiple separate demes; exchanging only about 5%
                    of the population (I guess that's 5% each way, so 10% in total) cuts down
                    on communication, and

                    2. separate demes have been suggested to have beneficial properties over
                    one huge population.

                    I'm not really in any position to comment on either claim, but it's
                    something to think about.

                    > I think the "only exchange info if within 10 generations" thing is
                    > particularly dangerous; in one run, populations may be completely
                    > separate from almost the start, whereas in another, they'll be sharing
                    > info most of the way through. That's a very strange dynamic.

                    Yes, this is true, especially since one run might pass the boundary over
                    into the next run and never meet the rest of the demes again (they'd
                    always be on earlier runs). Restarting all runs at the same time would
                    take a lot of weird synchronization.

                    m
                  • Mattias Fagerlund
                    Message 9 of 19, Oct 1, 2001
                      > Don't forget: you are allowing your happen-stance machine architecture
                      > to dictate the direction of your research, albeit machanical research.

                      Ah, this would only be true if I was actually involved in research, but
                      I'm more of a layman. I'm developing a GP system (my fifth to date, I
                      think). This time I'm aiming at something I can actually release.

                      However, I'll try to make my system so versatile that researchers could
                      use it (it's probably more useful for students than researchers, though),
                      and as such, I will take your words into consideration.

                      > My first point is simple: Are you sure that you want to do that? (
                      > More strongly, this is not a good idea!)
                      >
                      > Second, when one CPU finishes before another, use the spare CPU to
                      > accomplish another job -- perhaps processing another instance of
                      > genetic change.
                      >
                      > Hope this helps.

                      thanks!

                      m
                    • Mattias Fagerlund
                      Message 10 of 19, Oct 1, 2001
                        First off, I'd like to thank all for their insightful answers! I'm very
                        grateful!

                        To reward you, I shall yet again tax your patience with another question:
                        the traditional GP Beowulf cluster seems to define a toroidal world, where
                        each deme is thought to inhabit a separate island that can only communicate
                        with 4 other demes (the island model). This makes for a nice separation of
                        demes, which is thought (?) to have advantageous properties. Though I did
                        implement this for my previous Beowulf server (I'm like a government
                        agency, I make everything in triplicate ;), I have been wondering whether
                        there actually are benefits to it. Would I stand to lose significantly by
                        simply caching the last X% of the submitted individuals and picking
                        randomly among those when emigrants are chosen? I would lose the deme
                        separation. Any research (preferably online) that I should read?

                        (Now I'm guessing that you are thinking "why doesn't he implement both,
                        try it out, and write a paper?" ;)

                        regards,
                        m
                      • Martin C. Martin
                        Message 11 of 19, Oct 2, 2001
                          Mattias Fagerlund wrote:
                          >
                          > > Our plan is to have one computer hold the population, and farm out
                          > > evaluations to all computers. So, the faster computers will simply get
                          > > a higher percentage of the population to evaluate. This has a lot of
                          > > nice properties, including that we can repeat runs by simply using the
                          > > same seed, assuming evaluation is deterministic.
                          >
                          > Well, this has, as I understand it, traditionally been avoided because
                          >
                          > 1. it scales worse than multiple separate demes, only exchanging about 5%
                          > of the population (I guess that's 5% each way=10%) cuts down on
                          > communication

                          As someone else pointed out, you need to do multiple runs of a genetic
                          algorithm to get statistical significance. So, my plan for the 512
                          processor Cray T3E was to use only 16 or 32 nodes for a single run, and
                          perform 32 or 16 runs in parallel. At those sizes, the communication
                          overhead isn't a big deal. In fact, if you want to get fancy, you can
                          send each node its next individual before it finishes its current one.

                          If you use 1 byte per node, then your individuals are only a few k each
                          at most. Depending on how long an eval takes, I think communication
                          won't be the bottleneck.

                          > 2. separate demes have been suggested to have beneficial properties over
                          > one huge population.

                          You can certainly implement demes in this framework. You can implement
                          demes on a single processor. And if you do it this way, the size and
                          topology of your demes can be chosen for their beneficial properties,
                          not because of the peculiar properties of your hardware.

                          Best,
                          Martin
                        • Sean Luke
                          Message 12 of 19, Oct 2, 2001
                            On Tuesday, October 2, 2001, at 12:53 PM, Martin C. Martin wrote:

                            > If you use 1 byte per node, then your individuals are only a few k each
                            > at most. Depending on how long an eval takes, I think communication
                            > won't be the bottleneck.

                            I think the trend in EC is towards rather long evaluation times. If
                            evaluation time is the dominant factor, then heck, even on systems which
                            are piggish in transfer size, communication lag *still* won't be the
                            bottleneck. If memory isn't a problem, I would optimize for an internal
                            representation which evaluates rapidly and can be easily modified rather
                            than one which can be stored and transferred efficiently.

                            Sean
                          • Martin C. Martin
                            Message 13 of 19, Oct 2, 2001
                              Sean Luke wrote:
                              >
                              > On Tuesday, October 2, 2001, at 12:53 PM, Martin C. Martin wrote:
                              >
                              > > If you use 1 byte per node, then your individuals are only a few k each
                              > > at most. Depending on how long an eval takes, I think communication
                              > > won't be the bottleneck.
                              >
                              > I think the trend in EC is towards rather long evaluation times. If
                              > evaluation time is the dominant factor, then heck, even on systems which
                              > are piggish in transfer size, communication lag *still* won't be the
                              > bottleneck. If memory isn't a problem, I would optimize for an internal
                              > representation which evaluates rapidly and can be easily modified rather
                              > than one which can be stored and transferred efficiently.

                              Agreed. It turns out that, for my Ph.D. thesis, I had individuals with
                              ~1000 nodes stored as a linear array in prefix order. To speed
                              evaluation, I kept a function pointer in each node. Since you want data
                              to be word aligned, and since I needed to keep more than just the
                              function pointer in the genome, that meant I needed two words per
                              node. On a 32 bit machine, that's 8 bytes per node. For a
                              population of 10,000, that's 8*1000*10,000 = 80 Meg. On a 64 bit
                              machine, that's 160 Meg. Even with a population of 4,000 that's 32/64
                              Meg.

                              The solution was to keep the population in a compact representation (1
                              byte per node), then compute the bigger representation just before
                              evaluation. Computing the bigger representation is trivial; the single
                              byte is an index into a table, and all you do is copy certain fields
                              into the bigger array. So, I could have my cake and eat it too. :)

                              Cheers,
                              Martin
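
      A minimal sketch of the compact-versus-expanded arrangement described above:
      the stored genome is one byte per node (an index into a primitive table), and
      just before evaluation it is inflated into a wider array carrying the function
      pointer and other per-node fields. All names are illustrative:

        #include <cstdint>
        #include <vector>

        struct Primitive {                 // one row of the primitive table
            double (*eval)(const double* args);
            int arity;
        };

        struct WideNode {                  // what the evaluator actually walks
            double (*eval)(const double* args);
            int arity;
        };

        // The population is stored compactly: each node is a 1-byte index into 'table'.
        using CompactGenome = std::vector<std::uint8_t>;

        // Inflate a compact genome into the wide, evaluation-friendly form by
        // copying fields out of the primitive table.
        std::vector<WideNode> inflate(const CompactGenome& genome,
                                      const std::vector<Primitive>& table)
        {
            std::vector<WideNode> wide;
            wide.reserve(genome.size());
            for (std::uint8_t opcode : genome)
                wide.push_back({table[opcode].eval, table[opcode].arity});
            return wide;
        }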
                            • Bill LANGDON
                              Message 14 of 19, Oct 2, 2001
                                >
                                >
                                > Sean Luke wrote:
                                > >
                                > > On Tuesday, October 2, 2001, at 12:53 PM, Martin C. Martin wrote:
                                > >
                                > > > If you use 1 byte per node, then your individuals are only a few k each
                                > > > at most. Depending on how long an eval takes, I think communication
                                > > > won't be the bottleneck.
                                > >
                                > > I think the trend in EC is towards rather long evaluation times. If
                                > > evaluation time is the dominant factor, then heck, even on systems which
                                > > are piggish in transfer size, communication lag *still* won't be the
                                > > bottleneck. If memory isn't a problem, I would optimize for an internal
                                > > representation which evaluates rapidly and can be easily modified rather
                                > > than one which can be stored and transferred efficiently.
                                >
                                > Agreed. It turns out that, for my Ph.D. thesis, I had individuals with
                                > ~1000 nodes stored as a linear array in prefix order. To speed
                                > evaluation, I kept a function pointer in each node. Since you want data
                                > to be word aligned, and since I needed to keep more than just the
                                > function pointer in the genome, that meant I needed two words per
                                > node. On a 32 bit machine, that's 8 bytes per node. For a
                                > population of 10,000, that's 8*1000*10,000 = 80 Meg. On a 64 bit
                                > machine, that's 160 Meg. Even with a population of 4,000 that's 32/64
                                > Meg.
                                >
                                > The solution was to keep the population in a compact representation (1
                                > byte per node), then compute the bigger representation just before
                                > evaluation. Computing the bigger representation is trivial; the single
                                > byte is an index into a table, and all you do is copy certain fields
                                > into the bigger array. So, I could have my cake and eat it too. :)

                                I think DGP (Dave Andre) and GPQuick both do this.

                                > Cheers,
                                > Martin
                              • Sean Luke
                                Message 15 of 19, Oct 2, 2001
                                  On Tuesday, October 2, 2001, at 01:36 PM, Martin C. Martin wrote:

                                  > Agreed. It turns out that, for my Ph.D. thesis, I had individuals with
                                  > ~1000 nodes stored as a linear array in prefix order. To speed
                                  > evaluation, I kept a function pointer in each node. Since you want data
                                  > to be word aligned, and since I needed to keep more than just the
                                  > function pointer in the genome, that meant I needed two words per
                                  > node. On a 32 bit machine, that's 8 bytes per node. For a
                                  > population of 10,000, that's 8*1000*10,000 = 80 Meg. On a 64 bit
                                  > machine, that's 160 Meg. Even with a population of 4,000 that's 32/64
                                  > Meg.
                                  >
                                  > The solution was to keep the population in a compact representation (1
                                  > byte per node), then compute the bigger representation just before
                                  > evaluation. Computing the bigger representation is trivial; the single
                                  > byte is an index into a table, and all you do is copy certain fields
                                  > into the bigger array. So, I could have my cake and eat it too. :)

                                  Well, yes and no. When I wrote the code for the RoboCup project, I used
                                  lil-gp and hacked the living daylights out of it. A few of those hacks
                                  showed up in my STGP lil-gp kernel on the web. lil-gp is a really
                                  well-designed system. But one lesson I took away from that project was
                                  that a packed array representation (lil-gp uses 4 bytes per node) is not
                                  as easy to hack as I would have liked. Writing custom breeding
                                  operators for it is a nontrivial endeavor, especially when you add in
                                  special constraints like typing, and special-purpose nodes with unusual
                                  needs, like storing arbitrary-length statistics on a per-node basis.
                                  1-byte-per-node arrangements like DGP are IMHO even harder to hack weird
                                  stuff into. And as a researcher, that's basically what I am: a
                                  weird-stuff-hacker.

                                  So when I sat down to design a big, full-featured EC system for my
                                  thesis work, I decided on an ordering of needs for my system. First, I
                                  needed portability. Second, I needed hackability and maintainability.
                                  Third, I'd like reasonable evaluation speed. Things that were far down
                                  the list: breeding speed (dominated by evaluation times for me, it's
                                  almost inconsequential) and memory consumption (RAM is cheap).

                                  There are traditional computer science tradeoffs between (A) memory
                                  consumption, (B) speed, and (C)
                                  portability/modifiability/maintainability of code. I do think that
                                  these tradeoffs apply well to GP coding projects: you just can't have
                                  your cake and eat it too in all situations. For me, I decided that B
                                  and C were the critical items, with C being most important, and A was
                                  not important. So I designed a system that emphasized C, tried to do B,
                                  and to heck with A. So I abandoned a packed-array representation and
                                  went with a direct tree representation with lots of bells and whistles
                                  (like back-pointers). I also went with Java as my language (I almost
                                  did it in hand-optimized Common Lisp, but it wouldn't run on an
                                  important target platform of the time, an Alpha farm I had available).
                                  The result is, I'm proud to admit, the public-domain system with the
                                  largest per-node memory usage, at about 30 bytes per node. :-) But
                                  it's highly portable, IMHO very hackable, and acceptably fast. And when
                                  I did my extreme-bloating experiments, with many individuals >10,000
                                  nodes in size (!!) plus lots of weird internal statistics embedded in
                                  each node, and a population size of 1000, I never got much higher than
                                  512 megs -- that is, I was usually within the RAM constraints of my
                                  machines. So it was a good tradeoff I think.

                                  Adil Qureshi made similar design decisions when he put together GPsys,
                                  which uses about 24 bytes per node. But most other systems have gone
                                  the memory-is-important route, resulting in lil-gp (4 bytes per node, 17
                                  bytes for ERCs), DGP (1 byte per node), and GPquick (1 byte per node I
                                  think). Dunno about GPC++. Nic McPhee's Java-based Sutherland system
                                  takes a different approach. It's tree-based, but shares subtrees among
                                  individuals. Nic reports that this results in huge memory savings, and
                                  I must admit it's a really cool notion, though I'm worried about how
                                  tough it would be to add odd stuff into it. Maybe not a big deal.

                                  But I do think that optimizing for memory should be done only if you
                                  absolutely must. RAM is cheap, evaluation time is dominant, and large
                                  individuals result in exponentially larger search spaces (and thus
                                  "searching for larger individuals" is a hard justification to buy).
                                  Even so there are things that can be done independent of representation:
                                  John Koza proposed an interesting method for pre-determining breeding,
                                  which allows you to cut memory requirements in half doing generational
                                  evolution as well, though I think it's difficult to implement if you've
                                  got multiple threading. Adil's got Koza's mechanism in GPsys; I decided
                                  to go for steady-state, among others, in ECJ.

                                  Of course your requirements could be for *massive* hackability, even at
                                  the expense of speed and maintainability. That's what Lee Spector's
                                  doing with his GP stuff, all coded in Common Lisp, using Funky Lisp
                                  Goodness that other languages wish they had.

                                  Sean
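
      ECJ itself is written in Java; below is only a C++ sketch of the direct tree
      representation with parent back-pointers and per-node extras described above,
      trading bytes per node for easy hacking. The names are illustrative:

        #include <memory>
        #include <vector>

        // A direct tree representation: each node owns its children and keeps a
        // back-pointer to its parent, which makes custom breeding operators and
        // per-node bookkeeping easy at the cost of extra bytes per node.
        struct GPNode {
            int primitive = 0;                                  // index into a primitive set
            GPNode* parent = nullptr;                           // back-pointer (non-owning)
            std::vector<std::unique_ptr<GPNode>> children;      // owned subtrees
            std::vector<double> statistics;                     // arbitrary per-node extras

            GPNode* addChild(int childPrimitive) {
                auto child = std::make_unique<GPNode>();
                child->primitive = childPrimitive;
                child->parent = this;
                children.push_back(std::move(child));
                return children.back().get();
            }
        };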
                                • Martin C. Martin
                                  Message 16 of 19, Oct 2, 2001
                                    Sean Luke wrote:
                                    >
                                    > On Tuesday, October 2, 2001, at 01:36 PM, Martin C. Martin wrote:
                                    >
                                    > But one lesson I took away from that project was
                                    > that a packed array representation (lil-gp uses 4 bytes per node) is not
                                    > as easy to hack as I would have liked. Writing custom breeding
                                    > operators for it is a nontrivial endeavor, especially when you add in
                                    > special constraints like typing, and special-purpose nodes with unusual
                                    > needs, like storing arbitrary-length statistics on a per-node basis.
                                    > 1-byte-per-node arrangements like DGP are IMHO even harder to hack weird
                                    > stuff into. And as a researcher, that's basically what I am: a
                                    > weird-stuff-hacker.

                                    At the risk of straying off topic, I don't see why it should be any
                                    harder to hack weird stuff. No matter what your representation, you
                                    need a list somewhere of all possible nodes, their arity, the function
                                    which implements them, etc. So in both cases you have a table. In the
                                    "compact" representation, each node in the individual is just an index
                                    into that array. Depending on how you do the typing, you can put the
                                    type information in the array. That worked fine for my thesis.

                                    If you want extra info per instance of node, rather than per type of
                                    node, you may need to put it in the individual as well, but that
                                    shouldn't be a big deal if you're using object oriented standards; just
                                    change the data type you use for Node from "byte" to "struct."

                                    This seems straightforward to me. What am I missing?

                                    - Martin
                                  • Sean Luke
                                    Message 17 of 19, Oct 2, 2001
                                      On Tuesday, October 2, 2001, at 03:58 PM, Martin C. Martin wrote:

                                      > At the risk of straying off topic, I don't see why it should be any
                                      > harder to hack weird stuff. No matter what your representation, you
                                      > need a list somewhere of all possible nodes, their arity, the function
                                      > which implements them, etc. So in both cases you have a table. In the
                                      > "compact" representation, each node in the individual is just an index
                                      > into that array. Depending on how you do the typing, you can put the
                                      > type information in the array. That worked fine for my thesis.
                                      >
                                      > If you want extra info per instance of node, rather than per type of
                                      > node, you may need to put it in the individual as well, but that
                                      > shouldn't be a big deal if you're using object oriented standards; just
                                      > change the data type you use for Node from "byte" to "struct."

                                      Hey, Martin! Well, structs are fixed in size. So as a quick example of
                                      an obvious hack that I needed almost right off the bat: what do you do if
                                      the amount and kind of data varies from node to node? I think you have
                                      four options:

                                      - Each item in the array is a big huge union
                                      - Items are indexes into a big table of data values
                                      - Pointers to "node objects" must be allocated per slot in the array
                                      - Custom-compact it into a byte array with special hard-coded "length"
                                      info embedded in the array

                                      That's all I could think of on short notice. So anyway, the big huge union
                                      fails if the data can be arbitrarily large; at any rate, it's a big
                                      loss even if just a couple of nodes, occasionally, might need to have
                                      a lot of items stored in them. The table mechanism (which is what
                                      lil-gp does for ERCs) is realistic only if you have a small number of
                                      permutations in your population, else it effectively reduces to the
                                      "pointer" mechanism. The custom-compact mechanism significantly
                                      complicates crossover etc. (besides, it's custom, and we're looking for
                                      something general-purpose). That leaves pointers to node objects: if your
                                      pointers are to void* data, you've got at *least* 4 bytes minimum, plus
                                      data storage. If you're doing it Right and using pointers to class
                                      instances, this typically bumps you to 8-12 bytes minimum, plus data
                                      storage, depending on the language.

                                      So let's say you bite the bullet and add pointers -- now rather than
                                      simple C structs with simple copying, you need to add allocation and
                                      deallocation on a per-node basis, an awful lot of code reworking just
                                      for that little extra feature, and you probably increased your memory
                                      consumption up to at least 8 bytes per node, plus incurred heap
                                      fragmentation overhead. Or at least it was for lil-gp anyway.

                                      Now: what if, say, you want nodes to have arbitrary numbers of child
                                      nodes? Hacking this into lil-gp and fixing the breeding operators was
                                      not an enjoyable experience. It's been a while, I don't recall the
                                      exact other times I had to rewrite a bunch of stuff just to add Feature
                                      X to a compact array representation, but it was enough times, and got
                                      kludgey enough that I figured, bag it, I can think of better things to
                                      waste neurons on. So I went with a tree representation and a 6-8 fold
                                      increase over lil-gp's 4-byte arrangement. It sure made it easier to
                                      tack in lots and lots of goodies. Memory's been running faster than
                                      Moore's law lately anyway.

                                      Knuth's Law holds for GP I think. When you optimize, you have to be
                                      very careful it doesn't result in you having to go back in again and
                                      yank out a lot of stuff just to add Feature X. In my experience this is
                                      a serious weak point of compact array representations, at least for the
                                      oddities I was hoping to add in. Plus, going with trees sure made
                                      things easier for me to grok four months later going back through my
                                      code. Not to say I'm somehow "against" compact arrays -- if the memory
                                      requirements demand it, I'll use one in a heartbeat. I've just never
                                      needed to.

                                      And truth be told, I'm being a hypocrite here: I've recently been
                                      puzzling about how (or whether) to add Koza's architecture-altering
                                      operations to ECJ in a clean and extensible way, and it would have been
                                      *trivial* if I hadn't painted myself into a corner by settling on a
                                      specific ADF optimization earlier on, in the name of (you guessed it)
                                      memory consumption. Dang!

                                      Sean
                                    • Bill LANGDON
                                      Message 18 of 19, Oct 3, 2001
                                        For speed, the interpreter in GPquick converts each GP node to a C++
                                        function pointer (ie 4-byte pointer to compiled code). Executing the
                                        tree becomes simply running through this array of function
                                        calls. (Most machines can do this rapidly with indirect and
                                        auto-increment instructions). Being compact, there is a reasonable
                                        chance it all fits into the CPU's cache.

                                        Where a GP individual is going to be run more than once, it might pay
                                        to use this form of interpreter even if storing the tree in another
                                        form. Ie pass once through the whole tree creating the array of
                                        function pointers as you go.

                                        As Simon Handley and others have pointed out (e.g. AiGP2 chapter 13),
                                        where your GP nodes do _not_ have side-effects, considerable savings in
                                        memory and execution time (via caches) can be made by storing trees as
                                        DAGs.

                                        Thank you

                                        Bill

                                        W. B. Langdon, Phone +44 20 7679 4436
                                        Computer Science, Fax +44 20 7387 1397
                                        University College, London,
                                        Gower Street,
                                        London, WC1E 6BT, UK
                                        http://www.cs.ucl.ac.uk/staff/W.Langdon

                                        EuroGP Ireland 3-5 April 2002 http://evonet.dcs.napier.ac.uk/eurogp2002
                                        GECCO New York 9-13 July 2002 http://www.isgec.org/GECCO-2002
                                        GP+EM Journal http://www.wkap.nl/journalhome.htm/1389-2576
                                        GP Bibliography http://www.cs.bham.ac.uk/~wbl/biblio/
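
      A minimal sketch of this style of interpreter (GPquick-like in spirit, not
      GPquick's actual code): the tree is flattened in prefix order into an array of
      function pointers, and evaluation is one pass that advances an index through
      that array. All names are illustrative:

        #include <cstddef>
        #include <vector>

        struct Interp;                                    // forward declaration
        using NodeFn = double (*)(Interp&);               // one compiled-code pointer per node

        struct Interp {
            std::vector<NodeFn> program;                  // tree flattened in prefix order
            std::size_t pc = 0;                           // next node to execute
            double x = 0.0;                               // a terminal's input value

            double run() { return program[pc++](*this); } // indirect call + auto-increment
        };

        // Example primitives: each one pulls its arguments by recursively running
        // the interpreter, which walks forward through the prefix-order array.
        double addFn(Interp& it) { double a = it.run(); double b = it.run(); return a + b; }
        double mulFn(Interp& it) { double a = it.run(); double b = it.run(); return a * b; }
        double xFn(Interp& it)   { return it.x; }

        // Usage: the tree (+ x (* x x)) becomes {addFn, xFn, mulFn, xFn, xFn}.
        double evalAt(double x)
        {
            Interp it;
            it.program = {addFn, xFn, mulFn, xFn, xFn};
            it.x = x;
            it.pc = 0;
            return it.run();
        }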