
Re: Combining evolution with evolution (request for references)

  • Kenneth Stanley
    Message 1 of 14, Apr 1, 2008
      Julian, thanks for sharing your recent work. I hadn't been aware of
      that paper and it is indeed an interesting study.

      I'm guessing that it probably has not been done before exactly
      because it is so simple. In particular, running evolution with a
      population of one is not very popular these days, and so people
      probably don't even bother considering it.

      Of course, as your paper discusses, it then leads to the question of
      how you might expand it into a population-based approach, and a lot
      of the existing methods might embody ways that could be done to
      various degrees. NEAT seems to be a possible such expansion in the
      sense that it has multiple species of topologies and roughly evolves
      weights and topologies on two different time scales with its usual
      parameter settings.

      In any case, your paper is an elegant demonstration of how much you
      can learn from a simple concept.

      One question I have is what is meant when you mention things
      like "all connections on?" It sounds like the topology is chosen
      from some finite constrained set of topologies. Is that true? Or
      is any potential topology possible? It makes me wonder whether the
      connection set from which you are choosing somehow biases the result
      to radically favor the different time scales, though I'm not sure
      how.

      In general, my intuitive explanation of the phenomenon is that if
      you stick with only a single topology it is likely to be the wrong
      one, so you need to look at several to see if they have potential.
      Then you ensure steady progress.

      ken


      --- In neat@yahoogroups.com, "Julian Togelius" <julian@...> wrote:
      >
      > Hi all,
      >
      > I'd like to connect to the recent discussion in this group on
      > combining neuroevolution with backpropagation, and use it as a
      > convenient excuse to expose you to a recent paper of mine. In
      > particular, I'd like to ask you if you know who did this before we
      > (me, Faustino Gomez, Juergen Schmidhuber) did. Because the algorithm
      > we are presenting in this paper is so simple, someone must have come
      > up with it before; it's just that our literature search has been
      > fruitless.
      >
      > The basic idea is to combine hillclimbing ((1+1) evolution strategy)
      > of network topologies with hillclimbing of network weights, but at a
      > faster timescale. So after each ("global") topology mutation, we
      > search ("local") weight space for a number of steps (e.g. 50), and
      > only accept the global mutation if the fitness after local search is
      > higher than before the global mutation. The reason we do this is that
      > global mutations are so often disruptive. Yes, this is the same idea
      > as "innovation protection" in NEAT, but here we're boiling it down to
      > a bare minimum.
      >
      > We tried two versions of a moderately difficult control task: one
      > with a set of 7 carefully selected inputs to the neural network, and
      > one with 15 not-so-carefully-selected inputs to the network. We found
      > that, compared to simple hillclimbing in weight space, or to climbing
      > weights and topologies at the same time, the new algorithm performed
      > slightly better on the small-input version of the problem. On the
      > big-input version, the difference was dramatic, in that using
      > different timescales was really the only way to solve the problem!
      > (Incidentally, learning topology at the "local" level and weights at
      > the "global" level worked just as well.)
      >
      > The paper is here:
      > http://julian.togelius.com/Togelius2008Learning.pdf
      >
      > Again, this is so obvious that someone must have done this before.
      > The literature search only pointed us to attempts to combine topology
      > evolution with backpropagation, which is related but not the same
      > thing. We submitted the paper to WCCI (and had it accepted) in the
      > hope that the reviewers would know, but they didn't. So now I wonder
      > if any of you might know. It would be good to know about this before
      > we go on with further experiments...
      >
      > Grateful for any suggestions,
      > Julian
      > --
      > Julian Togelius
      > IDSIA
      > Galleria 2
      > 6928 Manno-Lugano
      > Switzerland
      > julian@...
      > http://julian.togelius.com
      > http://www.idsia.ch/~togelius
      > +41-764-110679
      > +46-705-192088
      >
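[Editor's note: the two-timescale scheme described in the quoted message can be sketched in a few lines of Python. The fitness function, mutation operators, and parameter values below are illustrative assumptions, not the paper's actual setup.]

```python
import random

def memetic_climber(fitness, n_weights, local_steps=50, generations=100):
    """Sketch of the two-timescale (1+1) memetic climber: a 'global'
    hillclimb over topology (a bit mask of connections) with a 'local'
    hillclimb over weights after each topology mutation. The global
    mutation is accepted only if fitness after local search improves."""
    topology = [True] * n_weights  # start with all connections on
    weights = [random.gauss(0, 1) for _ in range(n_weights)]
    best = fitness(topology, weights)

    def local_search(topo, w, f):
        # (1+1) hillclimbing in weight space for a fixed number of steps
        for _ in range(local_steps):
            cand = [wi + random.gauss(0, 0.1) for wi in w]
            fc = fitness(topo, cand)
            if fc >= f:
                w, f = cand, fc
        return w, f

    for _ in range(generations):
        # 'global' mutation: flip one connection on/off
        cand_topo = list(topology)
        i = random.randrange(n_weights)
        cand_topo[i] = not cand_topo[i]
        # give the mutated topology a chance via local weight search,
        # and keep it only if the post-search fitness is no worse
        cand_w, cand_f = local_search(cand_topo, list(weights),
                                      fitness(cand_topo, weights))
        if cand_f >= best:
            topology, weights, best = cand_topo, cand_w, cand_f
    return topology, weights, best
```

The inner loop discards its weight changes whenever the global mutation is rejected, which is one plausible reading of the scheme; the paper may handle rejection differently.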
    • petar_chervenski
      Message 2 of 14, Apr 2, 2008
        Well, actually, the NEAT parameters may be variable. For example,
        stagnating species can increase their weight mutation rate (as a last
        chance to discover a good point in the search space, given the
        topology). Or maybe when a species is close to a solution, its weight
        mutation power can be lowered so that it searches the local search
        space more thoroughly.
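[Editor's note: a hypothetical sketch of such an adaptive rule; all thresholds and factors below are invented for illustration.]

```python
def adapt_mutation_power(sigma, stagnant_generations,
                         stagnation_threshold=15,
                         boost=2.0, shrink=0.5,
                         sigma_min=0.01, sigma_max=5.0):
    """Hypothetical rule in the spirit of the suggestion above:
    boost mutation power for a stagnating species (a last chance to
    escape), shrink it when progress has just resumed (finer local
    search), and otherwise leave it unchanged."""
    if stagnant_generations >= stagnation_threshold:
        sigma *= boost  # stagnating: widen the search
    elif stagnant_generations == 0:
        sigma *= shrink  # just improved: refine locally
    # clamp to a sane range
    return min(max(sigma, sigma_min), sigma_max)
```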

      • Julian Togelius
        Message 3 of 14, Apr 2, 2008
          Thanks for your comments, JT and Ken!

           JT: Yes, NEAT contains a similar idea, as part of a rather complex algorithm. We're trying to boil it down to the bare bones, though, and analyze what effect using different time scales for weights and topologies ("memetic learning") might have in itself. Some big differences from NEAT are that the memetic climber has a "population" of one individual and can be implemented in a few lines of code. NB that I am sure NEAT is the better overall neuroevolution algorithm; right now we are not trying to compete in that respect...

           Ken: You're right that people don't even bother testing hill-climbers, which I think is a shame. They do get stuck quite a lot, but for some problems (e.g. evolving DFAs) a random-restart hill-climber reaches the optimum faster than most population-based algorithms. (Of course, for problems with many local optima, population-based EAs are typically preferable.)

          The current topology representation is simply an array of bits that decide which of the connections in a standard MLP are to be used, and which are to be ignored, when propagating activations. I'd certainly like to try the algorithm with a more open-ended topology representation at some point, but for now we're focused on keeping the algorithm as simple as possible, so as to best explore the basic idea of interleaving global and local search to alleviate the destructiveness of topology mutations.
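[Editor's note: a minimal sketch of the bit-mask representation described above, assuming a single hidden layer and tanh activations; layer sizes and activation function are illustrative and may differ from the paper's network.]

```python
import math

def masked_mlp_forward(inputs, weights, mask, n_hidden):
    """Forward pass through a single-hidden-layer MLP in which a bit
    array ('mask') decides which connections are used and which are
    ignored when propagating activations. 'weights' and 'mask' are
    flat lists covering input->hidden connections followed by
    hidden->output connections, for a single output unit."""
    n_in = len(inputs)
    assert len(weights) == len(mask) == n_in * n_hidden + n_hidden
    hidden = []
    for h in range(n_hidden):
        # sum only the connections whose mask bit is on
        s = sum(inputs[i] * weights[h * n_in + i]
                for i in range(n_in) if mask[h * n_in + i])
        hidden.append(math.tanh(s))
    base = n_in * n_hidden
    out = sum(hidden[h] * weights[base + h]
              for h in range(n_hidden) if mask[base + h])
    return math.tanh(out)
```

With this encoding, a topology mutation is just a bit flip in `mask`, which keeps the global mutation operator as simple as the message describes.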

           There are some very simple extensions of the basic principle to population-based search we are currently trying out; they involve just replacing the hill-climbing in weight space with a population-based algorithm, such as an evolution strategy. Actually, the local search algorithm could be almost any global continuous optimization algorithm - not necessarily an evolutionary one. There seem to be quite a few people who have tried combining topology search with backpropagation, but oddly enough nobody seems to have tried any other optimization algorithm, not even the humble hill-climber...
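[Editor's note: one way such a population-based inner loop might look, with the weight hillclimb swapped for a simple (1+lambda) evolution strategy. The lambda and sigma values are illustrative; the authors' actual extension may differ.]

```python
import random

def es_local_search(fitness, weights, f, steps=50, lam=8, sigma=0.1):
    """Drop-in replacement for the hillclimbing inner loop: a simple
    (1+lambda) evolution strategy over the weights, as one sketch of
    making the 'local' phase population-based."""
    for _ in range(steps):
        # generate lambda mutated offspring of the current weights
        offspring = [[wi + random.gauss(0, sigma) for wi in weights]
                     for _ in range(lam)]
        scored = [(fitness(o), o) for o in offspring]
        best_f, best_o = max(scored, key=lambda p: p[0])
        # (1+lambda) selection: keep the parent unless an offspring ties or beats it
        if best_f >= f:
            weights, f = best_o, best_f
    return weights, f
```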

          Julian

        • Kenneth Stanley
          Message 4 of 14, Apr 2, 2008
            Julian, here are a couple of other questions/comments:

            I was curious why you chose to call it "memetic." Usually I think
            of memetic as meaning taking the result of non-evolutionary
            learning and incorporating it back into the genome for further
            evolution. I can see how you can conceptually think of the two
            time scales as falling into such a framework in an abstract
            sense, but do you worry that the term might be confusing, since
            both timescales are in fact evolution? Or perhaps one could say
            that neither is evolution, since neither really uses a
            population.

            About the topology search: I'd be interested in whether the kinds
            of topologies that are possible affect the impact of evolving on
            different timescales. For example, your current version chooses
            connections inside a standard MLP, and there may be something about
            the MLP architecture that makes it necessary or perhaps better to
            vary timescales (i.e. the MLP is a big summation of all its hidden
            node outputs). Imagine for example a cascade architecture where one
            thing depends entirely on another (as in cascade correlation). I
            wonder if in that case the results would come out different? Note
            that I really have no idea whether they would or not, but it is just
            something I am curious about.

            It's a thought-provoking work overall.

            ken

          • Derek James
            Message 5 of 14, Apr 2, 2008
              Isn't this an awful lot like the cascade-correlation algorithm,
              in which hidden nodes are added incrementally and the network is
              retrained after the addition of each new hidden node with
              backprop?


              --- In neat@yahoogroups.com, "Julian Togelius" <julian@...> wrote:
              >
              > Hi all,
              >
              > I'd like to connect to the recent discussion in this group on
              > combining neuroevolution with backpropagation, and use it as a
              > convenient excuse to expose you to a recent paper of mine. In
              > particular, I'd like to ask you if you know who did this before we
              > (me, Faustino Gomez, Juergen Schmidhuber) did. Because the algorithm
              > we are presenting in this paper is so simple someone must have come up
              > with it before, it's just that our literature search has been
              > fruitless.
              >
              > The basic idea is to combine hillclimbing (1+1 evolution strategy) of
              > network topologies, with hillclimbing of network weights, but at a
              > faster timescale. So after each ("global") topology mutation, we
              > search ("local") weight space for a number of steps (e.g. 50), and
              > only accepts the global mutation if the fitness after local search is
              > higher than before the global mutation. The reason we do this is that
              > global mutations are so often disruptive. Yes, this is the same idea
              > as "innovation protection" in NEAT, but here we're boiling it down to
              > a bare minimum.
              >
              > We tried two versions of a moderately difficult control task: one with
              > a set of 7 carefully selected inputs to the neural network, and one
              > with 15 not-so-carefully-selected inputs to the network. We found
              > that, compared to simple hillclimbing in weight space, or to
              > climbing weights and topologies at the same time, it performed slightly
              > better on the small-input version of the problem. On the
              > big-input version, the difference was dramatic, in that using
              > different timescales was really the only way to solve the problem!
              > (Incidentally, learning topology at the "local" level and weights at
              > the "global" level worked just as well.)
              >
              > The paper is here:
              > http://julian.togelius.com/Togelius2008Learning.pdf
              >
              > Again, this is so obvious that someone must have done this before. The
              > literature search only pointed us to attempts to combine topology
              > evolution with backpropagation, which is related but not the same
              > thing. We submitted the paper to WCCI (and had it accepted) in the
              > hope that the reviewers would know, but that they didn't. So now I
              > wonder if any of you might know. It would be good to know about
              > this before we go on with further experiments...
              >
              > Grateful for any suggestions
              > Julian
              > --
              > Julian Togelius
              > IDSIA
              > Galleria 2
              > 6928 Manno-Lugano
              > Switzerland
              > julian@...
              > http://julian.togelius.com
              > http://www.idsia.ch/~togelius
              > +41-764-110679
              > +46-705-192088
              >
            • Julian Togelius
              Ken, It might be that word memetic is indeed misleading - we just didn t come up with a better name for it! Multiple timescale evolution doesn t really sound
              Message 6 of 14 , Apr 3, 2008
                Ken,

                It might be that the word "memetic" is indeed misleading - we just didn't come up with a better name for it! "Multiple timescale evolution" doesn't really sound very catchy...

                The rather crude type of topology mutation we are using probably favors using different time scales, especially in comparison to something like your connection-splitting mutation, which is designed to be as non-destructive as possible. On the other hand, connection removal mutations will always be destructive in the short term, no matter how you design them, and are probably always necessary in order to handle problems with "deceptive inputs", i.e. inputs which cause search to get stuck in a local optimum. (I describe a few such cases in the beginning of the paper, but note also Igel's 2003 paper on CMA-ES, where he finds that adding a bias term considerably decreases the efficacy of the algorithm. So even such an advanced neuroevolution algorithm suffers from the problem of not being able to ignore inputs.)

                As for the MLP architecture, I don't think it is a deciding factor. A great many architectures can be represented as subsets of an MLP. (Though not the networks that come out of cascade correlation, which are very nonstandard in that they are as deep as they are broad.) Of course, it would be interesting to redo the experiments with something like a fully recurrent MLP as the base architecture. I could be wrong - in fact, I'm pretty good at being wrong...
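                Concretely, here is a minimal Python sketch of the "subsets of an MLP" idea: a bit array that masks the connections of a fixed single-hidden-layer MLP during forward propagation. (The function and variable names, and the toy numbers, are mine; the paper's exact architecture may differ.)

```python
import math

def masked_mlp_forward(x, w_in, w_out, mask_in, mask_out):
    """Forward pass of a single-hidden-layer MLP in which boolean masks
    decide which connections are used and which are ignored.
    Setting every mask bit to 1 gives the full MLP ("all connections on");
    other bit patterns give sub-topologies of it."""
    # hidden[j] = tanh of the sum over ENABLED input->hidden connections
    hidden = [
        math.tanh(sum(x[i] * w_in[i][j] for i in range(len(x)) if mask_in[i][j]))
        for j in range(len(w_in[0]))
    ]
    # out[k] = tanh of the sum over ENABLED hidden->output connections
    return [
        math.tanh(sum(hidden[j] * w_out[j][k] for j in range(len(hidden)) if mask_out[j][k]))
        for k in range(len(w_out[0]))
    ]

# Two inputs, two hidden units, one output (toy numbers).
x = [1.0, -0.5]
w_in = [[0.5, -1.0], [1.0, 0.25]]
w_out = [[1.0], [-0.5]]
y_full = masked_mlp_forward(x, w_in, w_out, [[1, 1], [1, 1]], [[1], [1]])
# Mask out both connections leaving input 0, i.e. "ignore input 0":
y_masked = masked_mlp_forward(x, w_in, w_out, [[0, 0], [1, 1]], [[1], [1]])
```

                A topology mutation is then just flipping one bit in a mask, which is what keeps the representation so compact.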

                Btw, I'm glad you find the paper interesting.

                Julian


                On 03/04/2008, Kenneth Stanley <kstanley@...> wrote:

                Julian, here are a couple other questions/comments:

                I was curious why you chose to call it "memetic?" Usually I think
                of memetic as meaning taking the result of non-evolutionary learning
                and incorporating it back into the genome for further evolution. I
                can see how you can conceptually think of the two time scales as
                falling into such a framework in an abstract sense, but do you
                worry that the term might be confusing since both timescales are in
                fact evolution? Or perhaps one could say that neither is evolution
                since neither really uses a population.

                About the topology search: I'd be interested in whether the kinds
                of topologies that are possible affect the impact of evolving on
                different timescales. For example, your current version chooses
                connections inside a standard MLP, and there may be something about
                the MLP architecture that makes it necessary or perhaps better to
                vary timescales (i.e. the MLP is a big summation of all its hidden
                node outputs). Imagine for example a cascade architecture where one
                thing depends entirely on another (as in cascade correlation). I
                wonder if in that case the results would come out different? Note
                that I really have no idea whether they would or not, but it is just
                something I am curious about.

                It's a thought-provoking work overall.

                ken

                --- In neat@yahoogroups.com, "Julian Togelius" <julian@...> wrote:
                >

                > Thanks for your comments, JT and Ken!
                >
                > JT: Yes, NEAT contains a similar idea, as part of a rather complex
                > algorithm. We're trying to boil it down to the bare bones though,
                > and analyze what effect using different time scales for weights and
                > topologies ("memetic learning") might have in itself. Some big
                > differences to NEAT are that the memetic climber has a "population"
                > of one individual, and can be implemented in a few lines of code.
                > NB that I am sure NEAT is the better overall neuroevolution
                > algorithm; right now we are not trying to compete in that respect...
                >
                > Ken: You're right that people don't even bother testing
                > hill-climbers, which I think is a shame. They do get stuck quite a
                > lot, but for some problems (e.g. evolving DFAs) a random-restart
                > hill-climber reaches the optimum faster than most population-based
                > algorithms. (Of course, for problems with many local optima,
                > population-based EAs are typically preferable.)
                >
                > The current topology representation is simply an array of bits that
                > decide which of the connections in a standard MLP are to be used,
                > and which are to be ignored, when propagating activations. I'd
                > certainly like to try the algorithm with a more open-ended topology
                > representation at some point, but for now we're focused on keeping
                > the algorithm as simple as possible, so as to best explore the basic
                > idea of interleaving global and local search to alleviate the
                > destructiveness of topology mutations.
                >
                > There are some very simple extensions of the basic principle to
                > population-based search we are currently trying out; they involve
                > just replacing the hill-climbing in weight space with a
                > population-based algorithm, such as an evolution strategy. Actually,
                > the local search algorithm could be almost any global continuous
                > optimization algorithm - not necessarily an evolutionary one. There
                > seem to be quite a few people who have tried combining topology
                > search with backpropagation, but oddly enough nobody seems to have
                > tried any other optimization algorithm, not even the humble
                > hill-climber...
                >
                > Julian
                >
                > On 02/04/2008, Kenneth Stanley <kstanley@...> wrote:
                >
                > > Julian, thanks for sharing your recent work. I hadn't been aware
                > > of that paper and it is indeed an interesting study.
                > >
                > > I'm guessing that it probably has not been done before exactly
                > > because it is so simple. In particular, running evolution with a
                > > population of one is not very popular these days, and so people
                > > probably don't even bother considering it.
                > >
                > > Of course, as your paper discusses, it then leads to the question
                > > of how you might expand it into a population-based approach, and a
                > > lot of the existing methods might embody ways that could be done to
                > > various degrees. NEAT seems to be a possible such expansion in the
                > > sense that it has multiple species of topologies and roughly
                > > evolves weights and topologies on two different time scales with
                > > its usual parameter settings.
                > >
                > > In any case, your paper is an elegant demonstration of how much you
                > > can learn from a simple concept.
                > >
                > > One question I have is what is meant when you mention things
                > > like "all connections on?" It sounds like the topology is chosen
                > > from some finite constrained set of topologies. Is that true? Or
                > > is any potential topology possible? It makes me wonder whether the
                > > connection set from which you are choosing somehow biases the
                > > result to radically favor the different time scales, though I'm not
                > > sure how.
                > >
                > > In general, my intuitive explanation of the phenomenon is that if
                > > you stick with only a single topology it is likely to be the wrong
                > > one, so you need to look at several to see if they have potential.
                > > Then you ensure steady progress.
                > >
                > > ken
                > > --- In neat@yahoogroups.com, "Julian Togelius" <julian@> wrote:
                > > > [quoted message trimmed]
                >
                >
                >




              • Julian Togelius
                Message 7 of 14 , Apr 3, 2008
                  Derek,

                  Yes and no. Probably more no than yes, when I think about it. Cascade correlation only grows nets, whereas the memetic climber cuts connections as well, and the networks that come out of cascade correlation have a very particular and unusual topology. Most importantly, though, cascade correlation is a gradient-based supervised learning algorithm, whereas the memetic climber is a stochastic search algorithm.

                  But yes, it's a combination of topology modification with another form of search/learning, much like many of those experiments combining topology evolution with gradient-based learning, such as backprop, for the weights. What puzzles me is why people seem not to have taken the obvious next step and replaced backprop with evolution for the weights as well.

                  Julian

                  On 03/04/2008, Derek James <djames@...> wrote:

                  Isn't this an awful lot like the cascade-correlation algorithm, in
                  which hidden nodes are added incrementally and the network is
                  retrained after the addition of each new hidden node with backprop?

                  --- In neat@yahoogroups.com, "Julian Togelius" <julian@...> wrote:
                  > [quoted message trimmed]




                • Kenneth Stanley
                  Message 8 of 14 , Apr 4, 2008
                    >
                    > But yes, it's a combination of topology modification with another
                    > form of search/learning. Much like many of those experiments with
                    > combining topology evolution with gradient-based learning, such as
                    > backprop, for the weights. What puzzles me is why people seem not
                    > to have taken the obvious next step, and replaced backprop with
                    > evolution for the weights as well.
                    >

                    I think that some people did see themselves as taking the next step
                    when neuroevolution algorithms started proliferating in the 90s.
                    I think a lot of those early algorithms were motivated by people who
                    saw themselves as cutting backprop completely out of the loop. You
                    know, if you look back at some of the papers on neuroevolution at
                    that time, some would evolve topology and combine it with backprop,
                    and some would simply evolve everything (weights and topology, which
                    are sometimes called TWEANNs for topology and weight evolving
                    artificial neural networks). Probably some of those people evolving
                    TWEANNs saw themselves as taking the next step you describe. (Xin
                    Yao's 1999 review covers a lot of those methods from that time)

                    The difference is that no one did it with a population of one, but
                    that probably just did not occur to them because as soon as you go
                    to a non-gradient type of problem (like evolving topologies)
                    populations seem natural at first thought. Of course what you've
                    shown is that you can make it work at least in some cases without a
                    population.

                    ken
                  • Julian Togelius
                    Message 9 of 14 , Apr 4, 2008
                      > I think that some people did see themselves as taking the next step
                      > when neuroevolution algorithms started proliferating in the 90s.
                      > I think a lot of those early algorithms were motivated by people who
                      > saw themselves as cutting backprop completely out of the loop. You
                      > know, if you look back at some of the papers on neuroevolution at
                      > that time, some would evolve topology and combine it with backprop,
                      > and some would simply evolve everything (weights and topology, which
                      > are sometimes called TWEANNs for topology and weight evolving
                      > artificial neural networks). Probably some of those people evolving
                      > TWEANNs saw themselves as taking the next step you describe. (Xin
                      > Yao's 1999 review covers a lot of those methods from that time)
                      >
                      > The difference is that no one did it with a population of one, but
                      > that probably just did not occur to them because as soon as you go
                      > to a non-gradient type of problem (like evolving topologies)
                      > populations seem natural at first thought. Of course what you've
                      > shown is that you can make it work at least in some cases without a
                      > population.

                      Actually, the big difference is that we're doing it at different
                      timescales. This is really the whole reason it works.

                      The simplest extension is to mutate both topology and weights at the
                      same timescale, either in the very same mutation or in different
                      mutation types (say, half of the time it mutates topology and half of
                      the time it mutates weights). The algorithm we call "simultaneous climber"
                      in the paper does just that: it applies both topology and weight
                      mutation at the same time. It doesn't work at all - in fact it's much
                      worse than a simple hillclimber that only searches weight space. The
                      reason is that topology mutations are destructive, and need to be
                      offset with local weight search to explore the potential of the
                      topology, otherwise no topology mutations would survive.
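                      The difference is easy to see in code. Below is a minimal Python sketch of the two climbers on a toy fitness function (the toy task and all names are my own illustration, not the paper's control task):

```python
import random

N = 20  # number of connections in the bit-array topology

def fitness(topo, weights):
    # Toy stand-in for a real task: the first 10 connections should be
    # enabled with weight 1.0, the rest disabled.
    err = 0.0
    for i in range(N):
        target = 1.0 if i < 10 else 0.0
        actual = weights[i] if topo[i] else 0.0
        err += (actual - target) ** 2
    return -err

def weight_search(topo, weights, steps=50):
    """'Local' (1+1) hillclimbing in weight space, topology held fixed."""
    fit = fitness(topo, weights)
    for _ in range(steps):
        cand = [w + random.gauss(0, 0.1) for w in weights]
        cand_fit = fitness(topo, cand)
        if cand_fit >= fit:
            weights, fit = cand, cand_fit
    return weights, fit

def memetic_climber(global_steps=100):
    """Topology mutations are judged only AFTER a run of weight search."""
    topo = [True] * N
    weights = [random.gauss(0, 1) for _ in range(N)]
    weights, fit = weight_search(topo, weights)
    for _ in range(global_steps):
        cand_topo = list(topo)
        cand_topo[random.randrange(N)] ^= True     # "global" topology mutation
        cand_w, cand_fit = weight_search(cand_topo, list(weights))
        if cand_fit >= fit:                        # accept only if local search pays off
            topo, weights, fit = cand_topo, cand_w, cand_fit
    return fit

def simultaneous_climber(steps=5000):
    """Topology and weights are mutated together and judged immediately."""
    topo = [True] * N
    weights = [random.gauss(0, 1) for _ in range(N)]
    fit = fitness(topo, weights)
    for _ in range(steps):
        cand_topo = list(topo)
        cand_topo[random.randrange(N)] ^= True
        cand_w = [w + random.gauss(0, 0.1) for w in weights]
        cand_fit = fitness(cand_topo, cand_w)
        if cand_fit >= fit:                        # topology change must pay off at once
            topo, weights, fit = cand_topo, cand_w, cand_fit
    return fit
```

                      Both climbers spend the same number of fitness evaluations; the only difference is when the accept/reject decision is made. On a smooth toy task like this one, both make progress; the finding reported in the paper is that on the harder, big-input task only the two-timescale version solved the problem.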

                      Tino and I have been trying to work our way through the references in
                      Yao's paper, but many of the early TWEANN papers are very hard to get
                      hold of (Journal of Parallel Computing from 1990 etc.). What we've
                      found are examples of either evolving topologies and using backprop to
                      set the weights, or evolving topologies and weights at the same time
                      (i.e. topology and weights are treated the same way, and topology
                      changes are not followed by any kind of weight search before the new
                      topology is evaluated). So what puzzles me is why people have not
                      simply tried replacing backprop with evolution, while keeping the idea
                      that topology changes are evaluated only after a period of weight
                      search.

                      Either this has been done in one of these papers we haven't found or
                      gotten hold of yet, or the idea wasn't so obvious after all...

                      Julian

                      > ken
                      >
                      >



                    • Kenneth Stanley
                      Message 10 of 14 , Apr 5, 2008
                        I see what you're saying- no one thought to really explore the
                        interaction of the two different time scales.

                        But don't most of the older TWEANN neuroevolution algorithms have a
                        different topological mutation rate from the weight mutation rate? In
                        other words, while they do not explicitly enact two different time
                        scales (as you thought to do), is not the effect the same implicitly?

                        I'm not certain about this issue because I don't remember precisely
                        how people set their relative mutation rates for different types of
                        changes in various systems, but I would guess that most have weights
                        changing more often than topologies. At least that is how NEAT was
                        originally set up, and it seemed the straightforward thing to do.

                        Or do you feel that perhaps having different mutation rates is not
                        really the same as having explicitly differing time scales that
                        separate periods of weight and topology search?
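                        To make that distinction concrete, here is a hypothetical Python sketch (my illustration, not from any of the systems discussed) of a single-timescale climber with a low topology mutation rate. Note that every candidate, whether its mutation touched topology or weights, is still accepted or rejected immediately, whereas with explicit timescales a topology change is judged only after a whole inner run of weight search:

```python
import random

def rate_based_step(topo, weights, fitness, fit, p_topo=0.05):
    """One step of a single-timescale (1+1) climber with a LOW topology
    mutation rate: topology changes are rare, but each candidate is still
    evaluated and accepted/rejected immediately - there is no inner
    weight-search loop shielding the new topology."""
    cand_topo, cand_w = list(topo), list(weights)
    if random.random() < p_topo:
        cand_topo[random.randrange(len(topo))] ^= True   # rare topology flip
    else:
        i = random.randrange(len(weights))
        cand_w[i] += random.gauss(0, 0.1)                # common weight tweak
    cand_fit = fitness(cand_topo, cand_w)
    if cand_fit >= fit:
        return cand_topo, cand_w, cand_fit
    return topo, weights, fit

# Toy fitness (a stand-in): enabled weights should approach 1.0.
def toy(topo, w):
    return -sum((w[i] - 1.0) ** 2 for i in range(len(w)) if topo[i])

random.seed(0)
topo, weights = [True] * 10, [0.0] * 10
fit = first = toy(topo, weights)
for _ in range(200):
    topo, weights, fit = rate_based_step(topo, weights, toy, fit)
```

                        So different mutation rates change how often topology is touched, but a briefly destructive topology flip still has to pay off within a single evaluation, which is arguably not quite the same as two explicit timescales.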

                        ken

                        --- In neat@yahoogroups.com, "Julian Togelius" <julian@...> wrote:
                        >
                        > > I think that some people did see themselves as taking the next step
                        > > when neuroevoltuion algorithms started proliferating in the 90s.
                        > > I think a lot of those early algorithms were motivated by people who
                        > > saw themselves as cutting backprop completely out of the loop. You
                        > > know, if you look back at some of the papers on neuroevolution at
                        > > that time, some would evolve topology and combine it with backprop,
                        > > and some would simply evolve everything (weights and topology, which
                        > > are sometimes called TWEANNs for topology and weight evolving
                        > > artificial neural networks). Probably some of those people evolving
                        > > TWEANNs saw themselves as taking the next step you describe. (Xin
                        > > Yao's 1999 review covers a lot of those methods from that time)
                        > >
                        > > The difference is that no one did it with a population of one, but
                        > > that probably just did not occur to them because as soon as you go
                        > > to a non-gradient type of problem (like evolving topologies)
                        > > populations seem natural at first thought. Of course what you've
                        > > shown is that you can make it work at least in some cases without a
                        > > population.
                        >
                        > Actually, the big difference is that we're doing it at different
                        > timescales. This is really the whole reason it works.
                        >
                        > The simplest extension is to mutate both topology and weights at the
                        > same timescale, either in the very same mutation or in different
                        > mutation types (say, half of the time it mutates topology and half of
                        > the time it mutates weights). The algorithm we call "simultaneous climber"
                        > in the paper does just that: it applies both topology and weight
                        > mutation at the same time. It doesn't work at all - in fact it's much
                        > worse than a simple hillclimber that only searches weight space. The
                        > reason is that topology mutations are destructive and need to be
                        > offset with local weight search to explore the potential of the
                        > topology, otherwise no topology mutations would survive.
                        >
                        > Tino and I have been trying to work our way through the references in
                        > Yao's paper, but many of the early TWEANN papers are very hard to get
                        > hold of (Journal of Parallel Computing from 1990 etc.). What we've
                        > found are examples of either evolving topologies and using backprop to
                        > set the weights, or evolving topologies and weights at the same time
                        > (i.e. topology and weights are treated the same way, and topology
                        > changes are not followed by any kind of weight search before the new
                        > topology is evaluated). So what puzzles me is why people have not
                        > simply tried replacing backprop with evolution, while keeping the idea
                        > that topology changes are evaluated only after a period of weight
                        > search.
                        >
                        > Either this has been done in one of these papers we haven't found or
                        > gotten hold of yet, or the idea wasn't so obvious after all...
                        >
                        > Julian
                        >
                        > > ken
                        > >
                        > >
                        >
                        >
                        >
                        > --
                        > Julian Togelius
                        > IDSIA
                        > Galleria 2
                        > 6928 Manno-Lugano
                        > Switzerland
                        > julian@...
                        > http://julian.togelius.com
                        > http://www.idsia.ch/~togelius
                        > +41-764-110679
                        > +46-705-192088
                        >
                      • Julian Togelius
                        Message 11 of 14 , Apr 6, 2008
                          > I see what you're saying- no one thought to really explore the
                          > interaction of the two different time scales.

                          Apparently... I'd really like to know if the stuff is out there
                          somewhere. It would be embarrassing to go and publish more papers on
                          algorithms based on this idea if it turns out to already have been
                          done.

                          > But don't most of the older TWEANN neuroevolution algorithms have a
                          > different topological mutation rate from the weight mutation rate? In
                          > other words, while they do not explicitly enact two different time
                          > scales (as you thought to do), is not the effect the same implicitly?
                          >
                          > I'm not certain about this issue because I don't remember precisely
                          > how people set their relative mutation rates for different types of
                          > changes in various systems, but I would guess that most have weights
                          > changing more often than topologies. At least that is how NEAT was
                          > originally set up, and it seemed the straightforward thing to do.
                          >
                          > Or do you feel that perhaps having different mutation rates is not
                          > really the same as having explicitly differing time scales that
                          > separate periods of weight and topology search?

                          Yes. It's not the same thing at all. In the algorithms you mention,
                          even if the topology mutation rate is set very low, the problem is
                          that each new topology is evaluated immediately, before a good set of
                          weights has been found. As topology mutations are typically very
                          disruptive, this makes it hard for a new topology to survive. The
                          reason these algorithms work at all is probably that they are
                          population-based and have a rather low selection pressure. If you just
                          perform topology and weight mutations in a hill climber, and evaluate
                          the fitness of the network after each topology mutation, the hill
                          climber will not work at all.

                          The memetic climber evaluates the fitness of a new topology only after
                          some local search to find better weights for the new topology. This
                          way, far more new topologies can be accepted. So the real difference
                          from the algorithms you mention is not the proportion of weight or
                          topology mutations, but that those algorithms treat both types of
                          mutation the same way instead of interleaving local with global
                          search.
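                          [Editor's note: the contrast between the two climbers can be sketched on a toy problem. The bitmask "topology", the fitness function, and all parameter values below are illustrative assumptions, not the setup from Togelius's paper.]

```python
import random

random.seed(1)

# Toy stand-in for a network: a bitmask ("topology": which of 8 inputs
# are connected) plus one weight per input. Fitness is how well the
# masked weights match a fixed target vector; 0.0 is optimal.
TARGET = [3.0, -1.0, 0.0, 2.0, 0.0, -2.0, 1.0, 0.0]
N = len(TARGET)

def fitness(mask, w):
    return -sum((m * x - t) ** 2 for m, x, t in zip(mask, w, TARGET))

def mutate_weights(w):
    w = list(w)
    w[random.randrange(N)] += random.gauss(0.0, 0.5)
    return w

def mutate_topology(mask):
    mask = list(mask)
    mask[random.randrange(N)] ^= 1   # flip one connection (disruptive)
    return mask

def simultaneous_climber(steps=6000):
    """Topology and weight mutation applied in the same step; the new
    topology is evaluated immediately, so a disruptive flip must pay
    off at once in order to survive."""
    mask, w = [1] * N, [0.0] * N
    best = fitness(mask, w)
    for _ in range(steps):
        m2, w2 = mutate_topology(mask), mutate_weights(w)
        f2 = fitness(m2, w2)
        if f2 >= best:
            mask, w, best = m2, w2, f2
    return best

def memetic_climber(outer=200, inner=30):
    """Each topology mutation is followed by a period of weight-only
    local search before the candidate competes with the incumbent --
    the 'different timescales' idea."""
    mask, w = [1] * N, [0.0] * N
    best = fitness(mask, w)
    for _ in range(outer):
        m2 = mutate_topology(mask)
        w2, f2 = list(w), fitness(m2, w)
        for _ in range(inner):        # local search on the new topology
            w3 = mutate_weights(w2)
            f3 = fitness(m2, w3)
            if f3 >= f2:
                w2, f2 = w3, f3
        if f2 >= best:                # only now is the topology judged
            mask, w, best = m2, w2, f2
    return best
```

                          The point of the sketch is the loop structure: the simultaneous climber judges a topology flip immediately, while the memetic climber gives each flip `inner` weight-only steps to show its potential before it has to compete.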

                          As I understand it, you solve the problem with topology mutations in
                          two ways in NEAT (if I remember the details right; it has been a
                          while since I last read the NEAT paper). The most important method is
                          innovation protection, where you allow a new topology to live on for a
                          while even if it is initially substandard. Innovation protection is a
                          very similar idea to the memetic climber, and is the most important
                          prior work I've been able to find (and is of course cited as such in
                          the paper). The other method is that you design your topology
                          mutations to be as non-disruptive as possible, as is the case with
                          link splitting. Obviously, this works really well; what I'm trying to
                          do is isolate a core mechanism and understand it properly.
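                          [Editor's note: the link-splitting mutation mentioned here can be sketched as follows. The dict-based genome encoding is an assumption made for illustration; real NEAT genomes also carry innovation numbers and node genes.]

```python
def split_link(connections, link_id, new_node_id):
    """Split connection `link_id` by routing it through a new hidden
    node (a NEAT-style add-node mutation, sketched on an assumed
    genome encoding {id: [src, dst, weight, enabled]}). The incoming
    weight is set to 1.0 and the outgoing link inherits the old
    weight, so the topology change perturbs the network's behaviour
    as little as possible."""
    src, dst, weight, _ = connections[link_id]
    connections[link_id] = [src, dst, weight, False]   # disable old link
    nid = max(connections) + 1
    connections[nid] = [src, new_node_id, 1.0, True]         # into new node
    connections[nid + 1] = [new_node_id, dst, weight, True]  # out, old weight
    return connections

# Example: split the single link 0 -> 1 (weight 0.7) through node 2.
genome = split_link({0: [0, 1, 0.7, True]}, 0, 2)
```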

                          The more I try to explain this, the clearer it is that I lack the
                          proper terminology...

                          Julian




                          --
                          Julian Togelius
                          IDSIA
                          Galleria 2
                          6928 Manno-Lugano
                          Switzerland
                          julian@...
                          http://julian.togelius.com
                          http://www.idsia.ch/~togelius
                          +41-764-110679
                          +46-705-192088
                        • Kenneth Stanley
                          Message 12 of 14 , Apr 8, 2008
                            Thanks Julian, that explanation made sense. I see the distinction
                            now.

                            ken
