
Re: Bloat defeated! (Maybe...)

  • Colin Green
    Message 1 of 24 , Mar 2, 2006
      --- In neat@yahoogroups.com, Ian Badcoe <ian_badcoe@...> wrote:

      > OK, so this is to do with the rate equations of
      > dynamic equilibria...
      >
      > Something which anybody with a background in chemistry
      > will be familiar with...

      Yup, I remember this from A level chemistry lessons.



      > So you can see the general situation, the strength of
      > the reverse process gets stronger and stronger as you
      > move to larger and larger genomes, so that the
      > equilibrium position between each g(n) and g(n + 1)
      > gets further towards the smaller end.

      Yes, I like it. We've discussed the two different approaches on the
      group before (mutation per genome vs per gene) and I argued for 'per
      genome' on the basis that 'per gene' is too strong a force towards
      increasing complexification. I guess this is how I ended up with the
      phased searching where complexity increases dramatically and then is
      stripped away equally dramatically. A system with a natural equilibrium
      seems a far more elegant solution, and potentially a more efficient
      one - phased searching addresses the symptoms of a problem, whereas
      a system that finds an equilibrium doesn't exhibit the problem in the
      first place.
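
      To make those kinetics concrete, here's a toy sketch (illustrative
      only - assuming addition is rolled once per genome at rate m_a while
      deletion is rolled once per gene at rate m_d, so the drift should
      balance where m_a = n * m_d, i.e. at n* = m_a / m_d):

      import random

      def mutate_size(n, m_a=0.05, m_d=0.025):
          """One generation of size mutation: addition is per genome,
          deletion is per gene, so deletion pressure grows with size."""
          if random.random() < m_a:                          # per-genome add
              n += 1
          n -= sum(random.random() < m_d for _ in range(n))  # per-gene delete
          return max(n, 1)

      # With no selection, size drifts to where the two rates balance:
      # n* = m_a / m_d = 2 for these settings.
      n = 50
      for _ in range(5000):
          n = mutate_size(n)
      print(n)  # hovers near 2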

      I wonder though if there is an argument for periodically resetting the
      equilibrium point to give the search freer rein. A search will
      happily complexify and search in unfit spaces early on, but as
      selection pressure pushes us forward the propensity to search is
      reduced. It's the old problem of how many accumulated mutations are
      required to find a beneficial modification.

      So perhaps phased searching could still be employed, but instead of
      dramatic complexifying vs pruning phases, we have softer phases where
      we simply move the equilibrium point up and down.



      > Obviously all that is in the absence of selection.
      > Selection pushes those equilibria around with great
      > ease (which is why my current run, with an expected
      > size of two, is currently hovering around an average
      > size of 100...

      Nice result. So all 100 genes must be contributing to fitness in some
      way *and* are likely not to be redundant, e.g. unnecessary chains where
      a single connection would do. BTW do you do semi-clever deletions
      whereby a chain can be collapsed to a single connection? I do this and
      I think it makes sense to do some simple semi-intelligent mutations
      like this - so long as you don't start doing really complex
      mutations. A sort of keep it simple, but not *that* simple, approach.
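
      For illustration, chain-collapsing can be as simple as spotting
      pass-through nodes. A hypothetical sketch (the genome structure here
      is invented for the example, it's not SharpNEAT's actual API):

      from collections import namedtuple

      Connection = namedtuple("Connection", "src dst weight")

      def collapse_chain(genome, node_id):
          """If a hidden node has exactly one inbound and one outbound
          connection it is a pass-through chain, so remove the node and
          splice in a direct src->dst connection instead."""
          inbound = [c for c in genome.connections if c.dst == node_id]
          outbound = [c for c in genome.connections if c.src == node_id]
          if len(inbound) == 1 and len(outbound) == 1:
              src, dst = inbound[0].src, outbound[0].dst
              genome.connections.remove(inbound[0])
              genome.connections.remove(outbound[0])
              genome.hidden_nodes.remove(node_id)
              # splice in a direct link unless one already exists
              if not any(c.src == src and c.dst == dst for c in genome.connections):
                  genome.connections.append(Connection(src, dst, inbound[0].weight))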


      >
      > Personally, I'm beginning to think that the
      > relationship between how you run delete probability
      > (e.g. per gene or per genome) and how you handle
      > unmatched genes during cross over are pretty crucial
      > for the control of genome size. Unless you do both
      > right, the ratio of m_a and m_d give far less control
      > over the rate of bloat than you would ever expect...

      I broadly agree - although other implementation details/parameters
      could affect the ability to set up a system that finds an equilibrium.


      > This is all assuming you have deletion as a mutation,
      > if you do it another way then obviously there is no
      > basis for comparison.

      It still might be possible to set up an equilibrium because of
      speciation (in traditional NEAT) and culling of old (usually large)
      species - but it's harder to understand.

      Thanks for taking the time to post.

      Colin
    • Colin Green
      Message 2 of 24 , Mar 2, 2006
        Colin Green wrote:

        >
        >I wonder though if there is an argument for periodically resetting the
        >equilibrium point
        >
        I just wanted to add that this is effectively what my phased search is
        doing - switching between an equilibrium point of 0 (pruning) and
        infinity (complexifying), as you said at the end of your email.

        Colin.
      • Ian Badcoe
        Message 3 of 24 , Mar 2, 2006
          At 19:47 02/03/2006, you wrote:
          >Colin Green wrote:
          >
          > >
          > >I wonder though if there is an argument for periodically resetting the
          > >equilibrium point
          > >

          I too had this thought, although in a different form. I was thinking
          that we could make some assumption such as:

          "when the average genome size is N, then suppose that the
          minimum useful genome size to have around must be at least f(N) and
          recalculate the addition/deletion ratio to target f(N)"

          - where f(N) < N

          Examples might be:

          f(N) = kN; // k ~ 0.5 - 0.75?
          f(N) = sqrt(N);
          f(N) = max(N - 10, 10);
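
          Assuming the add/delete kinetics from earlier in the thread (an
          equilibrium at n* = m_a / m_d), the recalculation step is just
          solving for the addition rate - a sketch, using the sqrt example
          above:

          import math

          def retarget_rates(mean_size, m_d=0.01):
              """Recalculate the addition rate so the add/delete
              equilibrium n* = m_a / m_d sits at f(N) rather than N."""
              f_n = max(math.sqrt(mean_size), 1.0)  # f(N) = sqrt(N)
              m_a = f_n * m_d
              return m_a, m_d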

          >I just wanted to add that this is effectively what my phased search is
          >doing - switching between an equilibrium point of 0 (pruning) and
          >infinity (complexifying), as you said at the end of your email.

          Note that "equilibrium at infinity" isn't any sort of
          equilibrium (except in a few of the wilder religions) :)

          Ian

          In 15 minutes everybody will be in the future.
        • Colin Green
          Message 4 of 24 , Mar 3, 2006
            Ian Badcoe wrote:

            >
            >I too had this thought, although in a different form. I was thinking
            >that we could make some assumption such as:
            >
            > "when the average genome size is N, then suppose that the
            >minimum useful genome size to have around must be at least f(N) and
            >recalculate the addition/deletion ratio to target f(N)"
            >
            > - where f(N) < N
            >
            > Examples might be:
            >
            > f(N) = kN; // k ~ 0.5 - 0.75?
            > f(N) = sqrt(N);
            > f(N) = max(N - 10, 10);
            >
            >
            >

            Or start low and let the equilibrium point gradually approach N
            asymptotically at times of fitness stagnation.

            >
            > Note that "equilibrium at infinity" isn't any sort of
            >equilibrium (except in a few of the wilder religions) :)
            >
            >

            Oh yeh :)

            Colin.
          • Colin Green
            Message 5 of 24 , Mar 4, 2006
              Some idle ramblings... :)

              I see there is a short thread on the GP list discussing bloat. There
              they are discussing modifying the fitness function to punish bloat -
              'parsimony pressure' as someone termed it. E.g. if using tournament
              selection then if two genomes have the same fitness on the task at hand
              then the shortest genome wins, a sort of secondary objective if you
              like. I think there is a risk with that approach of restricting a search
              from exploring areas of the search space with additional complexity with
              no immediate gain. We want /some/ exploration of complexity, just not
              too much. The use of a secondary objective like this makes for a very
              definite limit or barrier to the complexity of genomes being searched in
              times of fitness stagnation - only selection based on the primary
              objective can choose more complex genomes.

              The techniques discussed on here of using 'softer' controls for bloat
              make more sense to me, e.g. Ian's equilibrium idea or my technique of
              allowing complexity to rise to a point N points above the previous
              low [following a pruning phase]. These techniques don't define a hard
              'complexity barrier' and therefore give a possible path out of fitness
              stagnation where there may not be a path that consists of a single step
              (mutation/crossover).

              Actually my technique does sort of define a hard barrier, but it is
              above the average complexity of the population following a pruning, hence
              it does give some leeway for exploration where a secondary fitness
              objective wouldn't.

              Both techniques (of soft control) control population complexity by
              adjusting the balance of additive and destructive mutations. In my case
              I use extremes of no additive mutations + a high deletion rate (and no
              crossover BTW) versus normal operation where the equilibrium point is at
              infinity (actually Ian that does /sort of/ make sense). So the actual
              complexity of genomes will (should!) be distributed around the
              equilibrium point (normal distribution?), with the bulk of genomes near
              the eq point and the odd few much further out - which seems ideal, right?

              Ian, as you said the way we handle recombination - e.g. disjoint and
              excess genes - is important because this is another means by which
              complexity can be affected. One solution is to ensure that all of
              the offspring from a recombination round have the same total complexity
              as all of the parents, a sort of zero-sum game scenario. That way only
              the mutation operators have an effect on complexity and we are happy
              that they are balanced. Alternatively we allow non-zero-sum
              recombination, but set up a balance similar to the one used with
              additive/deletive mutations, so that once again the overall system will
              find an equilibrium point.
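
              A sketch of the zero-sum variant (assuming, purely for
              illustration, genomes held as dicts keyed by innovation number):
              matched genes go to both children, and each disjoint/excess gene
              goes to exactly one child, so the offspring pair carries exactly
              the parents' total gene count.

              import random

              def zero_sum_crossover(parent_a, parent_b):
                  """Two children together hold 2*|matched| + |unmatched|
                  genes - the same total as the two parents."""
                  matched = parent_a.keys() & parent_b.keys()
                  child_a = {k: random.choice((parent_a[k], parent_b[k])) for k in matched}
                  child_b = {k: random.choice((parent_a[k], parent_b[k])) for k in matched}
                  for k in parent_a.keys() ^ parent_b.keys():
                      gene = parent_a[k] if k in parent_a else parent_b[k]
                      random.choice((child_a, child_b))[k] = gene
                  return child_a, child_b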

              Having said all that, it just occurred to me that a secondary 'parsimony
              pressure' could be applied in a similarly 'soft' manner, e.g. don't
              apply the secondary function until complexity has risen beyond some
              level and then only gradually increase its weighting. With this approach
              though we assume that the underlying pressure is upwards and that we are
              defining a soft upper limit to control it, therefore once again
              [possibly] setting up an equilibrium point where the genomes are
              distributed around that point, although possibly in an unevenly shaped
              distribution because the upward and downward pressures are applied by
              different processes.
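
              Something like this, say (all constants invented for
              illustration):

              def soft_parsimony(raw_fitness, size, threshold=100, ramp=200, k=0.01):
                  """No penalty below the complexity threshold; above it the
                  weighting ramps up gradually rather than acting as a hard
                  complexity barrier."""
                  if size <= threshold:
                      return raw_fitness
                  weight = min((size - threshold) / ramp, 1.0)
                  return raw_fitness - weight * k * (size - threshold)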

              In summation then I think both approaches equate (or could be made to
              equate) to the same thing, and it's just a matter of which we think
              is simplest and easiest to implement.

              Colin.
            • Kenneth Stanley
              Message 6 of 24 , Mar 7, 2006
                This is an interesting discussion and I wanted to think about it for
                a minute at a more general level.

                Basically I'm wondering, how would you guys articulate the ideal
                situation with respect to the distribution and direction of change
                in complexities in a population when it is far from a solution?
                This is harder to pin down than it initially seems.

                I think we'd all agree at first that ideally, you want to find the
                minimum necessary complexity for a task. But the truth is that the
                general principle of finding the "smallest possible" doesn't
                necessarily inform practice to a great extent since we don't know
                what the smallest possible is anyway. It's rather just a
                theoretical goal.

                A more practical perspective is to ask what is the overall ideal
                complexity distribution in a situation where you don't have a
                solution and don't know the complexity of a solution? And it seems
                to me the goal should be to increase the *range* of complexity in
                the population. That is, I would not articulate the objective in
                terms of average complexity, or max complexity, but in terms of
                range and distribution (hopefully nicely distributed across that
                range).

                The reason is that barring any notion of the "right size," the right
                approach is to not put all your eggs in any one basket. You have to
                assume that the longer you search without success, the less you
                really know about where you should be. You want to probe the waters
                of deeper complexity, but you don't want to give up on your simple
                spaces yet either, since any time they could yield gold, especially
                if they show improvement (although not a solution).

                Or would you disagree with this perspective? Would you perhaps say
                that the whole population should move together in a certain
                direction? It becomes an interesting question with NEAT since NEAT
                can support a robust range of complexities, which is different from
                GP where the population tends to bump itself up together because of
                the free-for-all with no speciation. And if the complexity range
                is really a key factor, then methods that attempt to limit bloat may
                be taking an overly simplified view because they look at the goal
                with respect to a single value (i.e. max or average complexity) as
                opposed to a vector of values (the whole set of complexities).
                Therefore they may be ignoring the more significant methodological
                question of how to keep a healthy range, or how rapidly to increase
                that range in the face of stagnation.

                So what is the right idea? If you aren't close to a solution, where
                do you want complexities to go?

                ken

                --- In neat@yahoogroups.com, Colin Green <cgreen@...> wrote:
                >
                > Some idle ramblings... :)
                >
                > [snip - Colin's full message is quoted above as Message 5]
              • Colin Green
                Message 7 of 24 , Mar 7, 2006
                  --- In neat@yahoogroups.com, "Kenneth Stanley" <kstanley@...> wrote:

                  > A more practical perspective is to ask what is the overall ideal
                  > complexity distribution in a situation where you don't have a
                  > solution and don't know the complexity of a solution?

                  I suspect that part of the answer here lies with tabu search. I don't
                  think that there is one general purpose optimal distribution as such.
                  I think it makes sense to search around the current champs for a while
                  (with a normal distribution say) and then to cordon off regions of the
                  search space that have been fairly well searched. This would have the
                  side effect of forcing the search into more (and less) complex regions
                  and would also result in some fairly odd/complex looking distribution
                  'curves' I would imagine - each of which would be optimal for that
                  point in time given the path of the search through the search space so
                  far.

                  Anyway this is just a quick thought for you to mull over - I'll try
                  and fill out this idea when I have more time.

                  Colin.
                • John Arrowwood
                  Message 8 of 24 , Mar 7, 2006
                    On 3/7/06, Colin Green <cgreen@...> wrote:
                    --- In neat@yahoogroups.com, "Kenneth Stanley" <kstanley@...> wrote:

                    > A more practical perspective is to ask what is the overall ideal
                    > complexity distribution in a situation where you don't have a
                    > solution and don't know the complexity of a solution?

                    I suspect that part of the answer here lies with tabu search. I don't
                    think that there is one general purpose optimal distribution as such.
                    I think it makes sense to search around the current champs for a while
                    (with a normal distribution say) and then to cordon off regions of the
                    search space that have been fairly well searched.

                    The issue there is, what constitutes 'searched'?  Is it a particular network structure, or is it the combination of network structure and connection weights? 

                    Do we make an effort to prevent redundant searching by keeping track of everybody in the population that we have ever tested, and prevent new offspring from being identical to an old one? 

                    Do we go even further and keep track of a 'distance' metric between every individual and every individual that has gone before, so as to 'spread out' the search?  That is, the lower the fitness of an individual, the farther away the closest 'similar' individual must be?  The point being that the more the fitness drops, the farther out we search.  The more it rises, the less we move away, in the hopes of narrowing in on the peak fitness area.

                    Of course, that last idea is not very practical.  But there may be merit to the concept.  Maybe have the amount of mutation applied be proportional to the fitness?  The less fit it is, the more we tweak it.  The only problem there is, what if you hit on the right structure but with the wrong weights?  It will be low fitness, so you change it even more, or worse, throw it away. 

                    Or maybe this concept applies only to weight mutation?  Suppose you have 100 individuals in your population who all have the same network structure.  Create a formula that says that the worst individual in that population gets assigned all new weights at random from the full available weight range (+/- 1 usually, but not always).  The closer their fitness to the champion, the smaller the percentage of their connections that gets tweaked, and the smaller the tweak.  Once you get to the champion, it is tweaking 0% of the weights, 0% of the available range.  Elitism, in other words.  If I'm not mistaken, that would cause the search to very rapidly scour the landscape for a particular topology.  Thoughts?
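
                    Roughly this, as a sketch (names and scaling invented):

                    import random

                    def tweak_weights(weights, fitness, champ_fitness, max_range=1.0):
                        """The further below the topology's champion, the larger the
                        fraction of weights perturbed and the larger the perturbation;
                        the champion itself gets 0% / 0% (elitism), while the worst
                        individual approaches a full random re-roll."""
                        gap = max(champ_fitness - fitness, 0.0) / max(champ_fitness, 1e-9)
                        gap = min(gap, 1.0)
                        return [w + random.uniform(-max_range, max_range) * gap
                                if random.random() < gap else w
                                for w in weights]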

                    -- John
                  • Ian Badcoe
                    Message 9 of 24 , Mar 8, 2006
                      Hi Ken,

                      I'm not sure I can reply to all the detailed points in this right
                      now, because I'm not sure my ideas are so developed (call my
                      current status "informed phenomenology"). But generally:

                      1) "Bloat" and "increasing complexity" are not the same thing. In
                      our (my) terminology "bloat" means out-of-control growth of genome
                      size, i.e. growth which happens whether or not there is a need for
                      it in the search. I feel that my result when I investigated
                      keeping all unmatched genes vs. a 50% chance of keeping an
                      unmatched gene very clearly illustrated that the former was
                      causing an unconditional drive to bigger genomes. A drive which
                      people have had to address in various ways to keep their systems
                      reasonable. My "trimming" and Colin's "phased searching" both
                      addressed this directly, and I believe your original speciation
                      helps keep this under control as a side effect. Also many people
                      may have generally set their add mutations lower to try and
                      counter this, when a higher rate might have given a better
                      search...


                      2) The dynamic equilibrium approach to add/del mutations does give
                      a range of sizes in a population; obviously that range will only
                      be moderate unless the population is huge. This approach also
                      gives a random drift in genome size versus time (see those graphs
                      I posted into the group's file area); this constitutes an on-going
                      search over genome size.

                      3) Everything I discussed in this thread was pretty much without
                      regard to speciation. So any mechanism for keeping (or creating) a
                      range of sizes and other diversity via speciation would be
                      something layered on top of what I did here. I would like to
                      regard the mechanism I described as something that can apply to
                      each species independently, irrespective of what speciation
                      mechanism is in place. Thus each species would maintain a
                      different dynamic equilibrium, and focus on a different part of
                      the search space.

                      I'm not sure how this would apply with the traditional speciation,
                      e.g. with individuals being assigned to species rather than the
                      species defined by the individuals in it, but it might work out...

                      > This is an interesting discussion and I wanted to think about it
                      > for a minute at a more general level.
                      >
                      > Basically I'm wondering, how would you guys articulate the ideal
                      > situation with respect to the distribution and direction of
                      > change in complexities in a population when it is far from a
                      > solution? This seems to be harder to pin down than it initially
                      > seems.

                      There are two cases. If you are far from the solution but have a
                      clear direction of progress (and for our purposes "clear progress"
                      might mean "happening more often than once in N generations") then
                      I wouldn't say we particularly needed a wide range of
                      complexities.

                      ((or to put it another way: since it is making progress, the
                      current range of complexities must be OK))

                      In the other case, if you are not making progress, then there
                      could be a case for fostering a wide range of complexities _BUT_:

                      i) just because you haven't made progress with a certain size
                      doesn't mean that you cannot make progress with it. So would you
                      choose to go immediately for some bigger (expensive) genomes? I
                      would think that if there is any progress to be made with a
                      simpler genome, then you would want to do that first.

                      [[there's a whole load of reasoning here about the shape of the
                      fitness landscape and the minimum size of a genome required to
                      make _any_ inroad on the problem]]

                      ii) diversity in complexity != diversity in strategy (they
                      intersect, but neither is a subset of the other)

                      iii) how much are we really worried about junk genes? There are
                      two reasons for avoiding them: (a) if they reduce fitness, or (b)
                      if they slow the search. Now these are different cases. For case
                      (a) I think we need not worry, because things which reduce fitness
                      are good (compared, say, to neutral things which tell us nothing
                      about the solution space) and will be eliminated by simple
                      selection.

                      For case (b), how much should some junk genes slow the search?
                      There are two sides to this. One is the slowing of the fitness
                      function; obviously this is a real problem in extreme cases, but
                      not for any of the junk genes that can be detected as NOPs and
                      eliminated from the phenotype.

                      The other side is that of slowing the actual search, and whilst
                      this is a core consideration of the design of NEAT, and one that I
                      am sure still applies, it need not be as severe as it sometimes
                      is. Junk genes can slow the search by moving us into a
                      more-complex area of the fitness landscape - this is unavoidable
                      but should be self-limiting as long as some less-complex
                      individuals remain around (either they can progress, in which case
                      they will do it faster than the complex ones, or else we _need_ to
                      be complex and pay this cost). The other way that junk genes can
                      slow the search is if they soak up mutations (by definition mostly
                      neutral) which could be usefully expended on functional genes.

                      This last point brings us back to the same issue that came up in
                      my original explanation of the kinetics of adding and removing
                      genes: as long as mutation is a "per gene" activity and not a "per
                      genome" activity, junk genes will not dilute the mutation of any
                      functional genes, because every gene will have its own mutation
                      allocation (it's a mutation allocation situation; in fact, this is
                      the mutation allocation situation explanation).
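
                      A toy way to see the dilution argument (numbers invented for
                      illustration):

                      def chance_functional_gene_mutates(n_functional, n_junk, rate=0.01):
                          """Per-genome mutation spreads one fixed budget over all
                          genes, so junk dilutes what reaches the functional ones;
                          per-gene mutation gives every gene its own allocation, so
                          junk dilutes nothing."""
                          per_genome = rate / (n_functional + n_junk)  # diluted by junk
                          per_gene = rate                              # independent of junk
                          return per_genome, per_gene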

                      > I think we'd all agree at first that ideally, you want to find
                      > the minimum necessary complexity for a task. But the truth is
                      > that the general principle of finding the "smallest possible"
                      > doesn't necessarily inform practice to a great extent since we
                      > don't know what the smallest possible is anyway. It's rather
                      > just a theoretical goal.

                      I'd not even say that _minimum_ complexity was the goal; I'd say
                      the real goal is "sufficient simplicity". E.g. as long as the
                      system stays simple enough that it can complete the search, I
                      would not mind if it had some % junk DNA in it when the search
                      ended. It is usually very easy to remove junk genes from a genome
                      (that's what my "trimming" approach does).

                      > A more practical perspective is to ask what is the overall ideal
                      > complexity distribution in a situation where you don't have a
                      > solution and don't know the complexity of a solution? And it
                      > seems to me the goal should be to increase the *range* of
                      > complexity in the population. That is, I would not articulate
                      > the objective in terms of average complexity, or max complexity,
                      > but in terms of range and distribution (hopefully nicely
                      > distributed across that range).
                      >
                      > The reason is that barring any notion of the "right size," the
                      > right approach is to not put all your eggs in any one basket.
                      > You have to assume that the longer you search without success,
                      > the less you really know about where you should be. You want to
                      > probe the waters of deeper complexity, but you don't want to
                      > give up on your simple spaces yet either, since any time they
                      > could yield gold, especially if they show improvement (although
                      > not a solution).

                      Sure, but variation over time must also be crucial. These fitness
                      landscapes are (by definition, or you wouldn't use a genetic
                      technique) too complex to sample with a single population at one
                      point in time. Given that, one must then take into account the
                      reverse consideration: given that you are localised to a part (or
                      several parts, with speciation) of the search space, you must
                      sample densely enough within that part to really reveal its
                      potential.

                      e.g. spreading your population too thinly is also bad.

                      > Or would you disagree with this perspective? Would you perhaps
                      > say that the whole population should move together in a certain
                      > direction?

                      A whole _species_ should move together on a random walk. That is
                      what I dislike about trad speciation: the search for the right
                      species for each new individual binds the species (somewhat)
                      together, so it is the whole population that searches, not each
                      individual species (obviously that's a relative thing, it's not
                      100% one way or the other).

                      > It becomes an interesting question with NEAT since NEAT can
                      > support a robust range of complexities, which is different from
                      > GP where the population tends to bump itself up together because
                      > of the free-for-all with no speciation. And if the complexity
                      > range is really a key factor, then methods that attempt to limit
                      > bloat may be taking an overly simplified view because they look
                      > at the goal with respect to a single value (i.e. max or average
                      > complexity) as opposed to a vector of values (the whole set of
                      > complexities). Therefore they may be ignoring the more
                      > significant methodological question of how to keep a healthy
                      > range, or how rapidly to increase that range in the face of
                      > stagnation.

                      Again, I think you are talking about a "reasonable complexity
                      range", which we need; I completely agree with you on that.

                      "Bloat" is different: bloat is the "equilibrium at infinity" which
                      drives to larger and larger genomes.

                      You can maintain a reasonable complexity range even in the
                      presence of bloat, but it is harder work and you have less control
                      over the fine-tuning.

                      > So what is the right idea? If you aren't close to a solution,
                      > where do you want complexities to go?

                      I think that in that situation you don't, and cannot, know, so you
                      have to try a range of things - but you cannot try them all at
                      once.

                      > ken

                      Ian

                      p.s. you can tell it's a good discussion when it takes
                      two days to type the reply.




                    • Colin Green
                      Hi All, I wanted to just pick up this thread again because I think we were broaching on some fairly fundamental stuff, specifically Kens query about what the
                      Message 10 of 24 , Mar 21, 2006
                        Hi All,

                        I wanted to just pick up this thread again because I think we were
                        broaching some fairly fundamental stuff, specifically Ken's query
                        about what the optimum distribution of genomes might be in a search.
                        Also I'm aware that some emails were left hanging with no response, so
                        hopefully this will address some questions (though not necessarily
                        answer them!), albeit perhaps not in the way you might have been
                        expecting. Let me try and explain my train of thinking by going back to
                        basics.

                        The traditional evolutionary search uses a population of genomes, each
                        of which represents a point in the search space. Thus the search 'front'
                        contains multiple points at any point in time (generation), but how did
                        we arrive at this type of scheme and why is it so popular? Why not just
                        have one single search point and walk it around the search space - after
                        all most of us haven't progressed to distributed/parallel multi-CPU
                        projects yet so why not just search one point at a time?

                        A few immediate/obvious answers come to mind:
                        1) We would get stuck in a local maximum.
                        2) Recombination wouldn't be possible. Recombination basically being a
                        method of generating new promising search points that are some way
                        distant from the current search front - more promising than any
                        similarly distant /random/ point we could come up with (probably).
                        3) A population allows us to sample several points at a time around a
                        promising area.

                        Of course, without something like speciation the population will tend to
                        be dominated by offspring from a champ and the search front will
                        probably get stuck in a local maximum. This is where speciation comes in,
                        ensuring that the population doesn't cluster around one point. But I
                        wonder now if we haven't just replaced one problem with another similar
                        problem, which is that our search front will now be a bunch of clusters
                        around maxima that never move on, instead of just one cluster.

                        Species culling was meant to tackle this, but I question the
                        effectiveness of this approach, e.g. it's very likely that species
                        themselves may cluster around a broad enough maximum, and so culling
                        a species will just lead to existing species redistributing
                        themselves over the newly vacated search area and a new species
                        popping up around the same cluster of species (since it spawned off
                        one of them). There may be cases where species are able to break
                        away from clusters and walk the search space effectively, but I
                        suspect that this clustering phenomenon remains a problem in the
                        general case.

                        Enter stage right the idea of the tabu search. [I'm not sure if this is
                        strictly the correct term, but I'll use it for now for want of a better
                        term]. The tabu search introduces the idea of a memory of where in the
                        search space we have already visited, so that we may avoid these areas
                        and progress the search forward. This means that the natural clustering
                        of the search front will still occur, but will be moved on as the tabu
                        mechanism is applied. Previously I suggested ring fencing areas that had
                        previously been searched, although perhaps a better idea is to use a
                        sort of density map of areas that have been searched. The areas of high
                        density have a weighting assigned to them to limit further searching in
                        those areas, so if surrounding areas begin to match that weighting then
                        the old searched areas may actually be revisited - because their relative
                        weight/density is now lower than the surrounding areas.

                        There is the problem of how you build and maintain such a density map,
                        but I choose to put that aside for the moment and concentrate on the
                        concept rather than get bogged down in implementation. Let's assume we
                        could maintain a search density map without too much CPU and memory
                        overhead, well then couldn't we just discard the population based models
                        and speciation and revert to a simple single point search front? The
                        point would naturally move away from well searched areas as it navigates
                        away from high-density areas on the map. In fact, are population-based
                        searches and techniques such as speciation just ways of approximating
                        just such a search without the overhead of maintaining a map? - which I
                        admit may be impractical to implement.
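
                        To sketch the concept, hand-waving what a 'region' actually is
                        (here just coarse bins over some genome descriptor - everything
                        below is invented for illustration):

                        from collections import defaultdict

                        class DensityMap:
                            """Search-density memory: visited regions gain weight, and
                            weights decay each generation so old areas can eventually be
                            revisited and the map stays bounded in size."""
                            def __init__(self, bin_size=1.0, decay=0.99):
                                self.bin_size = bin_size
                                self.decay = decay
                                self.weights = defaultdict(float)

                            def _bin(self, point):
                                return tuple(int(x / self.bin_size) for x in point)

                            def visit(self, point):
                                self.weights[self._bin(point)] += 1.0

                            def step(self):
                                # forget old areas gradually, keeping the map bounded
                                for k in list(self.weights):
                                    self.weights[k] *= self.decay
                                    if self.weights[k] < 0.01:
                                        del self.weights[k]

                            def penalty(self, point):
                                return self.weights.get(self._bin(point), 0.0)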

                        This does leave the loose end of recombination - how would we jump to
                        new promising search points if we just had one point to start from? Our
                        single point search wouldn't work very well if it only ever moved to
                        adjacent areas in the search space, it would take an age to move to
                        somewhere in a completely different part of the space. Whereas a
                        population based search (potentially) samples multiple points
                        distributed throughout the search space (when it isn't clustering
                        anyhow). Well we could define some mechanism for 'jumping' the single
                        point around the search space, well away from previously searched areas.
                        Ian is doing something like this I think - e.g. by creating fairly
                        extreme mutants - although I could be remembering wrong there. OR we
                        could recombine previous disparate points in the historical search path,
                        OR perhaps there is a case for a hybrid approach of a population based
                        search combined with a density map - thus maintaining something
                        analogous to a gene pool, with its associated diversity of solutions.

                        The practicalities of a density map are potentially a problem because
                        you have such a large area to map, but we could use a sort of limited
                        density map where the old areas in the map get removed to keep the map
                        from growing out of control - which is sort of how traditional tabu
                        search works I think. That is, if you avoid all of the recently
                        searched areas then you can forget about avoiding the old search areas,
                        because you are now so far away you are unlikely to find your way back.

                        Colin
                      • Ian Badcoe
                        Hi Colin, I m listening, even if nobody else is :-p I hear what you are saying, but I am not so sure about your conclusions. Personally, I think that with the
                        Message 11 of 24 , Mar 27, 2006
                          Hi Colin,

                          I'm listening, even if nobody else is :-p

                          I hear what you are saying, but I am not so sure about your conclusions.

                          Personally, I think that with the sort of "difficult" problems we
                          have advanced to these days, the crossover from the population(s) is
                          far more significant than any ability of a population to avoid local
                          minima, because a local minimum can easily now be large enough to
                          swallow a whole population.

                          Crossover gives you parallel exploration of loosely linked parts of
                          the problem. e.g. if one is trying to evolve a "splat nerdler" then
                          crossover provides a way to combine "splat-47", "splat-48" and
                          "splat-26385" with "nerdler-A", "nerdler-B" and
                          "nerdler-XP-professional" without having to evolve all 9 combinations
                          individually.

                          OTOH we may need fully functional modularity before we can see the
                          real power of that...

                          Local minima may be less of a problem than we think,
                          however. Intuitively, people think of the fitness landscape as just
                          that, a landscape. With all of 2.5 dimensions. Typical fitness
                          spaces are 10, 100, or even 10,000 dimensional. And it seems to me
                          that one feature of these highly dimensional spaces is that there is
                          much less chance of there actually being a minimum than in our
                          dimensionally-challenged world.

                          Consider: to be a minimum, a point must be up-hill in _every_
                          direction you can step away from it. Let's say that in 2D there
                          are 8 directions you can step away from a point; if each direction
                          is independently up-hill with a chance of 1/2, then with simple
                          stats we could say that the chance of any point on the surface
                          being a minimum is 1 in 2^8 = 1 in 256...

                          In 3D there are 26 ways to step away from a point, so the chance
                          of a minimum is 1 in 2^26 = 1 in 67,000,000-ish.

                          In 10D there are 3^10 - 1 = 59,048 ways to step away from a point,
                          so the chance of a minimum is 1 in 2^59048. And my Windows
                          calculator confidently asserts that that leaves the chance at
                          1 in 1.6564714947473066827641241827646e+17775 (dunno whether to
                          believe it, but the take-home message is "1 in big").
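
                          (The arithmetic can be checked quickly - assuming 3^d - 1 grid
                          neighbours per point, each independently up-hill with chance 1/2:)

                          from math import log10

                          # chance a grid point is a local minimum if every one of its
                          # 3^d - 1 neighbours is independently up-hill with chance 1/2
                          for d in (2, 3, 10):
                              neighbours = 3 ** d - 1
                              print(f"{d}D: {neighbours} directions, 1 in 10^{neighbours * log10(2):.0f}")
                          # 2D:      8 directions, 1 in 10^2     (= 1 in 256)
                          # 3D:     26 directions, 1 in 10^8     (~ 1 in 67,000,000)
                          # 10D: 59048 directions, 1 in 10^17775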

                          And that's before even considering crossover, whose whole effect is
                          to increase the number of ways of leaving a point, so...

                          What we begin to see here is that the numbers involved in these
                          searches are so vast, that it is quite possible to imagine that a
                          population could pause for ages in an apparent local minimum, when
                          there actually is a path of improvements away from its current
                          location, it just takes a very long time before any mutations hit
                          that path, because the path is something silly like 1 / 2^1000 of the
                          total possible local points...

                          Human intuition fails here, because we cannot imagine standing in a
                          landscape where there is so much sheer "angle" around us that we
                          cannot merely cast an eye over it and immediately see which way is
                          down (or rather up - up is better).

                          But in higher dimensions that is exactly the sort of thing that can
                          happen. There can be as much space _adjoining_ one point as we would
                          usually think of as finding on a whole landscape...

                          Food for thought.

                          Ian

                          In 15 minutes everybody will be in the future.
                        • Ian Badcoe
                          Message 12 of 24 , Apr 3, 2006
                            Hi,
                            This seems to have been running quite happily
                            for over a week now.

                            Colin - I'm just putting the finishing touches to
                            the system tray version. RL got in the way a bit, and
                            then I decided I had to refactor the job code into a
                            library so that the two separate exes could share
                            it...

                            But it is nearly done now.

                            Ian



                          • Colin Green
                            Message 13 of 24 , Apr 3, 2006
                              Righto,

                              Could you not refactor RL into a separate library as well and archive
                              it off somewhere? ;)

                              Have been running the World Community Grid agent to discover AIDS
                              therapies in the meantime. Fascinating stuff, this distributed
                              computing lark. Perhaps one day we could submit a proposal for a
                              NEAT-based search.

                              Colin.

                              P.S. New SharpNEAT release now available with fancy new progress graphs.
                            • Ian Badcoe
                              Message 14 of 24 , Apr 3, 2006
                                At 19:56 03/04/2006, you wrote:
                                >Righto,
                                >
                                >Could you not refactor RL into a seperate library as well and archive it
                                >off somewhere? ;)

                                As a friend of mine had stuck to the side of his monitor:

                                I haven't lost my mind, it's backed up on tape somewhere...

                                >Have been running the World Community Grid agent to discover AIDS
                                >therapies in the meantime. Fascinating stuff, this distributed computing
                                >lark. Perhaps one day we could submit a proposal for a NEAT-based search.

                                Who to? Is there some sort of grid-computing authority nowadays?

                                OK, I'm putting the screen-saver and stand-alone system-tray program
                                in a zip file in the group's files section, in the SLEAT folder.

                                Just unzip it somewhere handy, read the readme and then mail me with
                                plaintive "what do you mean" messages...

                                Ian

                                In 15 minutes everybody will be in the future.
                              • Kenneth Stanley
                                Ian, I'm not sure if I'm doing something wrong but the screensaver for me is just a blank screen? Is there anything you know that might be causing this? ken
                                Message 15 of 24 , Apr 8, 2006
                                  Ian, I'm not sure if I'm doing something wrong, but the screensaver
                                  for me is just a blank screen. Is there anything you know of that
                                  might be causing this?

                                  ken

                                • Ian Badcoe
                                  Hi Ken, I didn't realise that anybody except Colin was going to try it! ... Erm, no, I can't think what might do that. It's only the simplest thing using
                                  Message 16 of 24 , Apr 9, 2006
                                    Hi Ken, I didn't realise that anybody except Colin was going to try it!

                                    >Ian, I'm not sure if I'm doing something wrong, but the screensaver
                                    >for me is just a blank screen. Is there anything you know of that
                                    >might be causing this?

                                    Erm, no, I can't think what might do that. It's only the
                                    simplest thing, using some crude GDI graphics to display the number
                                    of jobs it has and a simple scrolling graph of best-fitness,
                                    average-fitness and average-size.

                                    Are you sure you set up the working directory correctly? I think
                                    there would be no graphics if there were no jobs at all... You can
                                    easily tell because it will have created the "todo", "current_SS" and
                                    "done" directories in it.

                                    There is a log file (master.log or master_SS.log, depending on
                                    whether you took it before or after I updated it) which may reveal
                                    something...

                                    Was it using CPU? You can tell by having the task-manager
                                    "performance" tab open when the screensaver starts and looking at
                                    the history graph once the screensaver closes...

                                    Otherwise, what OS are you on?

                                    --

                                    Does this mean you are considering giving it some serious CPU? Or
                                    were you just looking? If you are going to give it some CPU then
                                    I'll send you some job files ;-)

                                    I was considering better graphics, e.g. a continuously shown TTT
                                    game between the current champion and some sort of
                                    semi-random/semi-perfect player, but since I only got the one
                                    volunteer and he isn't using the graphical version...

                                    Ian


                                    In 15 minutes everybody will be in the future.
                                  • Kenneth Stanley
                                    Hi Ian, I was just checking it out to see what it does. It did create the files. Unfortunately I don't have time now to really check into what's going on,
                                    Message 17 of 24 , Apr 9, 2006
                                      Hi Ian, I was just checking it out to see what it does. It did
                                      create the files. Unfortunately I don't have time now to really
                                      check into what's going on, but here's the master log:

                                      Sat Apr 08 17:21:31 2006 : Initialised
                                      Sat Apr 08 17:21:31 2006 : no more work to do
                                      Sat Apr 08 17:21:33 2006 : stopping
                                      Sat Apr 08 17:21:33 2006 : Initialised
                                      Sat Apr 08 17:21:33 2006 : Initialised
                                      Sat Apr 08 17:21:33 2006 : configuring
                                      Sat Apr 08 17:21:36 2006 : example parameters dumped to: C:\NEAT versions\Ttt_3\default.prm
                                      Sat Apr 08 17:21:39 2006 : path set to: C:\NEAT versions\Ttt_3
                                      Sat Apr 08 17:21:39 2006 : ending configuration
                                      Sat Apr 08 17:21:39 2006 : Initialised
                                      Sat Apr 08 17:21:39 2006 : no more work to do
                                      Sat Apr 08 17:21:49 2006 : stopping
                                      Sat Apr 08 17:21:49 2006 : Initialised
                                      Sat Apr 08 17:21:49 2006 : Initialised
                                      Sat Apr 08 17:21:49 2006 : no more work to do
                                      Sat Apr 08 17:21:57 2006 : stopping
                                      Sat Apr 08 17:21:58 2006 : Initialised
                                      Sat Apr 08 17:21:58 2006 : Initialised
                                      Sat Apr 08 17:21:58 2006 : no more work to do
                                      Sat Apr 08 17:22:09 2006 : stopping
                                      Sat Apr 08 17:22:09 2006 : Initialised

                                      ken

                                    • Ian Badcoe
                                      It looks like you didn't copy the job into the todo folder, so the program has nothing to do... Ian ...
                                      Message 18 of 24 , Apr 10, 2006
                                        It looks like you didn't copy the job into the "todo"
                                        folder, so the program has nothing to do...

                                        Ian
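
                                        (A hypothetical reconstruction, not Ian's actual code: the
                                        behaviour in Ken's log above is consistent with a master loop
                                        that, on each wake-up, looks for job files in "todo" and,
                                        finding none, logs "no more work to do" and stops. Everything
                                        below is assumed for illustration.)

                                        import time
                                        from pathlib import Path

                                        def log(msg):
                                            # Matches the timestamp style seen in Ken's master log.
                                            print(time.strftime("%a %b %d %H:%M:%S %Y"), ":", msg)

                                        def master_cycle(work_dir):
                                            log("Initialised")
                                            todo = Path(work_dir) / "todo"
                                            jobs = sorted(todo.glob("*")) if todo.is_dir() else []
                                            if not jobs:
                                                log("no more work to do")
                                                log("stopping")
                                                return None
                                            # A real run would move the job to "current_SS",
                                            # evaluate it, and drop the results into "done".
                                            return jobs[0]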
