
Starting fully connected

  • Oliver Coleman, Message 1 of 2, Sep 2, 2013
      Hi all,

      Apologies if some or all of the below has been discussed previously, I searched but found nothing about it.

      I'm in the process of verifying my implementation of NEAT by comparing its performance against published results (e.g. double pole balancing in http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf). I had a lot of trouble reproducing the published performance until I specified that the starting population's networks should be fully connected (every input connected to every output). I only thought to do this after reading the passage below from the NEAT FAQ and realising that the published results would have used the original code, which cannot start without fully connected networks. With that change I was able to match the performance described in the linked article. The NEAT FAQ on the users page says:

      "Can I start NEAT with some inputs disconnected? (IMPORTANT) 
      The reason this question is important is that if we can start NEAT with some inputs disconnected, then we can allow NEAT to decide which inputs are important. This process has two good effects: (1) You can start minimally even in problems with many inputs and (2) you don't need to know a priori what the important features of the domain are. Recent results (http://nn.cs.utexas.edu/keyword?whiteson:gecco05) reported in this paper suggest that this approach can be quite effective."

      What has been others' experience with initial population connectivity? I noticed that in SharpNEAT the initial connectivity can be specified as a percentage. Have people found this sort of parameter useful? At the moment I'm experimenting with initial populations generated by applying the mutation operators with an increased mutation rate, adding between 0 and the maximum number of connections (i.e. fully connected), with these results (averaged over 50 runs, with a maximum of 500 generations):

      Fully connected with randomly set initial weight values:
         All runs solved in average of 26 generations.
      No initial connections:
         Only 4 runs solved.
      Each member in population has between 0 and the maximum number of connections:
         All runs solved in average of 93 generations.
      No initial connections (add connection mutation rate = 0.5):
         All runs solved in average of 40 generations.

      When starting with no initial connections, evolution really struggles to find a solution and gets stuck in local optima. After 100 or 200 generations there will be around 15 species, but none of them manages to improve (I tried enabling and disabling the killing off of species after 15 generations, but this made no difference). Increasing the number of generations to 1000 didn't seem to help (I did about 10 runs of 1000 generations; none of them found a solution). It seems that in this case, with these parameter settings, NEAT is unable to add the necessary structure despite the protection of innovation via speciation.

      The runs starting with a population whose members have between 0 and the maximum number of connections took about 3.5 times as long as runs where all members start fully connected, further indicating that a minimal starting topology doesn't help for this task. Perhaps the minimally connected networks take up a large part of the population and slow down the evolution of the larger networks.

      Finally, to test the hypothesis that NEAT is unable to add the necessary structure I increased the add connection mutation rate from 0.05 to 0.5. NEAT was able to find a solution for all runs in an average of 40 generations. I think Ken Stanley mentioned this as an important factor recently, perhaps in a similar context, but I don't remember the details, sorry Ken!

      So this opens the question of whether it's generally better to start minimally and use a higher structural mutation rate, or to start with a significant number of initial connections and use a lower add-connection mutation rate. I have not implemented the feature selection mutation operator that connects an input to all outputs (http://nn.cs.utexas.edu/keyword?whiteson:gecco05). Perhaps that is the best solution for some kinds of tasks, but it may be fairly task dependent (e.g. I'm not sure how useful it would be for HyperNEAT, where the only inputs are substrate neuron coordinates and in nearly all cases they will all be useful).
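      As I understand the operator, a sketch might look like the following. This is a hypothetical rendering of the idea (the real operator is described in the linked Whiteson et al. paper), with made-up names:

      ```python
      import random

      def feature_select_mutation(connections, num_inputs, num_outputs, seed=0):
          """Feature-selection-style mutation (sketch): pick an input that is
          not yet connected and wire it to every output."""
          rng = random.Random(seed)
          used = {src for (src, _dst) in connections}
          unused = [i for i in range(num_inputs) if i not in used]
          if not unused:
              return list(connections)  # every input already participates
          chosen = rng.choice(unused)
          return list(connections) + [(chosen, out) for out in range(num_outputs)]

      # A genome that only uses input 0 gains one more input, connected
      # to both outputs, in a single mutation.
      print(feature_select_mutation([(0, 0), (0, 1)], num_inputs=3, num_outputs=2))
      ```

      The appeal is that inputs enter the network wholesale, so evolution can test a feature's usefulness immediately rather than waiting for individual add-connection mutations to accumulate.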

      Cheers,
      Oliver
    • kenstanley01, Message 2 of 2, Sep 5, 2013
        Hi Oliver, these are interesting results and questions.   I think there are a number of subtle considerations interacting behind the phenomena you're observing. 

        First, of course we need to keep in mind that pole balancing is unique in a lot of ways. It seems to have very few local optima (because it is solved faster by small populations) and can also be solved with a very small network. Even fully connected starting networks are very small, which means that omitting even a single connection is like removing a significant percentage of the initial search space. So it's possible we'd see different results in other domains, e.g. ones with many inputs and/or outputs.

        Second, the reason it is not being solved in some of your cases is the usual reason for evolution getting stuck in any objective-driven problem: deception. It just so happens (apparently) that taking away initial connections makes the problem significantly more deceptive. That may be because (as NEAT has shown) the most efficient solution is a tiny fully connected network with a single recurrent loop at the top and no hidden nodes. But if you do not include all the connections at first, then you may increase the chance of producing higher-fitness networks with additional hidden nodes, which are unnecessary. If those initially appear more fit (as they may), then you've essentially sent evolution on a wild goose chase for a more-complex-than-necessary solution, which will take longer to optimize. So it is a kind of deception specific to this domain, in part because its optimal solution is so unusually simple.

        Third, the results might be different under different reward schemes, so that's another caution against too quickly drawing general conclusions.  For example, given the hypothesis that deception is behind the worse-performing setups, novelty search might lead to a different ranking of the different setups.  It might even do better starting with some connections not present.  I'm not sure if that's the case, but it might be, and if it is, then it is even more unclear what to conclude in general about how best to start.

        Overall, I think part of what you've shown is just how complicated all these interacting issues are.  The degree to which the phenomenon you observed extends to other domains is also not yet clear, so I think these are questions that we can't yet fully answer (at least based on my own knowledge).  On a practical level, I think withholding some connections might work better for networks with more initial inputs and outputs, but again it's probably domain dependent. 

        I'd also be careful to check that the initial numbering of the historical markings in these partially-connected networks is 100% consistent. In other words, there is a chance of a bug wherein the same connection gets different numbers depending on which connections are included. (That would be another confound for the results, of course.)
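        One way to guard against that kind of inconsistency is to key innovation numbers on the connection's endpoints rather than on the order in which connections happen to be created. A small Python sketch of the idea (hypothetical API, not taken from any particular NEAT codebase):

        ```python
        class InnovationTracker:
            """Assign historical markings by (source, target) pair so the same
            connection always receives the same innovation number, regardless
            of which connections a partially connected starting genome
            happened to include."""

            def __init__(self):
                self._numbers = {}

            def number(self, source, target):
                key = (source, target)
                if key not in self._numbers:
                    self._numbers[key] = len(self._numbers)
                return self._numbers[key]

        tracker = InnovationTracker()
        a = tracker.number(0, 4)  # new connection: fresh marking
        b = tracker.number(1, 4)  # different connection: different marking
        c = tracker.number(0, 4)  # same connection seen again: same marking
        print(a == c, a != b)  # True True
        ```

        With a single shared tracker, two genomes that independently add the connection 0 -> 4 get the same marking, so crossover aligns them correctly even when their starting connectivity differed.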

        Best,

        ken  

