
Re: GECCO Paper on HyperNEAT

  • Ken
    Message 1 of 14, May 6, 2013
      Hi All,

      Thomas and Shimon were gracious enough to share with me this paper early on, so I have had time to consider its implications. Both Thomas and Shimon know that I take serious issue with one of its claims (regarding "fracture"), so it will be no surprise to them that I am sharing with you my concerns. For brevity I will refer to Thomas and Shimon as V&W here (the first initials of their last names). I have made short and long versions of this response for people with different amounts of time to read it.

      _________________________
      Short version:

      V&W conclude that HyperNEAT (and hence CPPNs) has a problem with an important property called "fracture." I disagree with this conclusion and think the evidence offered for it is misleading. Therefore, I made a website showing many examples of fracture in HyperNEAT as counter-evidence. You may find this site interesting even if you do not know about fracture because it shows internal representations inside CPPNs (lots of pictures) that few have seen before:

      http://eplex.cs.ucf.edu/hyperNEATpage/content/cppn_fracture.html

      ______________________
      Longer version:


      The paper is divided into two parts: In the first part, V&W attempt three simple benchmarks with HyperNEAT and find that it fails on slightly more complex variants, suggesting that it may struggle at a lower level of problem complexity than one would expect. I have less issue with this part, as it is indeed worth reporting where a method has trouble, and doing so could lead us to insights about why. (My bigger concerns are with the second half of the paper, on fracture.)

      The only potential problem with the first half of the paper is that two of the three experiments are run in a version of HyperNEAT the authors created themselves, called peasNEAT (https://github.com/noio/peas). We are not yet sure whether peasNEAT has any fundamental implementation errors, but my colleagues and I have been examining it to make sure we can learn from its results. At this preliminary stage we have found that peasNEAT contains a number of significant deviations (and possibly bugs) relative to standard implementations of HyperNEAT (mostly in the CPPN implementation) that may account for its performance difficulties. However, we do not yet know whether fixing these issues would change the results, so we do not want to imply anything one way or the other; it is just something to keep in mind when weighing these negative results. Of course we will be transparent about our findings once we finish analyzing the system.

      However, like I said, my big concern is much more with the latter half. To make the paper more fundamental than just a few negative results, V&W needed a hypothesis to explain the alleged bad performance of HyperNEAT on these problems, and my feeling (as they know) is that their hypothesis is dangerously wrong and, if accepted at face value, would lead the field significantly astray. In short, they suggest that HyperNEAT (and hence CPPNs) has big problems representing something called "fracture." Fracture is defined as "discontinuous variation of patterns." They conclude that this type of pattern is "highly problematic for HyperNEAT."
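
      (For readers who have not seen the term before, here is a minimal sketch in Python of what such discontinuous variation looks like. This is my own illustration for this discussion, not code from the paper, and every name and constant in it is hypothetical: a tiny hand-built CPPN-style function whose output varies smoothly within each of two regions but jumps discontinuously at the boundary between them.)

        import math

        def gaussian(z):
            return math.exp(-z * z)

        def cppn(x, y):
            # A step-like node partitions the space into two regions (the "fracture"):
            region = 1.0 if math.sin(3.0 * x) + y > 0 else -1.0
            # Each region carries its own smooth pattern:
            return region * gaussian(x * y)

        # Sampling over a grid shows smooth variation within each region
        # and a discontinuous jump across the boundary between them.
        pattern = [[cppn(x / 5.0, y / 5.0) for x in range(-5, 6)] for y in range(-5, 6)]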

      V&W needed some kind of unifying insight to explain the negative results from their first half, but this insight is the wrong one, which is unfortunate because it is supported by misleading quantitative results (once again we may be seeing the danger of our field's heavy trust in quantification). But these quantitative results are exactly of the kind we have been discussing (on this group) as one of the Achilles' heels of this type of quantitative evolutionary experiment: to show that HyperNEAT is bad at fracture, they run a set of target-based experiments (again with peasNEAT) in which HyperNEAT's objective is to match an arbitrary fractured pattern of weights. Based on my work with Woolley, HyperNEAT would be expected to perform poorly at such a task. At the same time, a more direct style of encoding based on wavelets (which they use for comparison) does better. It is no surprise that a more direct encoding would be better at copying an arbitrary target pattern, but such an encoding would have few or none of the good properties of indirect encoding (e.g. compositionality, elaboration of regularity, etc.).
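
      (To make that setup concrete, here is a miniature sketch of what such a target-based fitness function looks like; this is my own illustration under my own assumptions, not V&W's actual code:)

        def target_based_fitness(evolved_weights, target_weights):
            # Hypothetical target-matching fitness: the negated mean squared error
            # between the weight pattern a genome produces and a fixed, arbitrary
            # fractured target pattern. Only exact reproduction scores perfectly.
            n = len(target_weights)
            return -sum((e - t) ** 2 for e, t in zip(evolved_weights, target_weights)) / n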

      Most concerning for me is that the hypothesis about fracture rests solely upon the most misleading kind of target-based experiment. I am still trying to understand how V&W could cite my paper with Brian Woolley ("On the Deleterious Effects of A Priori Objectives on Evolution and Representation") in their background section without any hint of disagreement, yet later in the paper so explicitly violate every ounce of caution about target-based experiments that it suggests. In any case, I have been expecting that sooner or later someone would suggest a misleading principle based on such a faulty target-based premise. Now that it seems to have happened, it is important to show that the conclusions are wrong.

      To be clear: I have no problem with suggestions that HyperNEAT may not perform well in some domains. These are entirely fair game. My problem is with the hypothesis given about *why* some domains fare poorly with HyperNEAT.

      Because I believe it is important to set the record straight that HyperNEAT and CPPNs are both good experimental platforms for observing fracture, I have made a page full of pictures and examples of explicit fracture in CPPNs. (Like I said, I've had some time to prepare this page since I've known about the paper for a few weeks.) Even for those not particularly interested in the argument about fracture, you might still find this web page interesting because it shows some fascinating (at least to me) observations that few people have seen before about the internal elegance of CPPN representations:

      http://eplex.cs.ucf.edu/hyperNEATpage/content/cppn_fracture.html

      One of the ironies of this paper for me is that the very thing it alleges is "highly problematic" for CPPNs is one of the main things that make CPPNs so interesting. There is literally a treasure trove of data on representation in CPPNs sitting on Picbreeder: thousands of patterns whose CPPNs are all directly visible to the public through the Picbreeder DNA tool. The internals of these CPPNs and what they tell us about representation hold a wealth of undiscovered lessons, many of them (ironically) on how fracture is represented in sophisticated encodings. To date, observations taken with the DNA tool have not been studied formally, but as you can see at the site above, every image on the Picbreeder site (and probably Endless Forms as well) can teach us surprising lessons about representation.

      In that sense, while the second half of V&W's paper is counterproductive and misleading in my opinion, perhaps some good still comes from the opportunity it creates to share with you the remarkable kinds of fracture that have evolved in CPPNs, and to begin to open a discussion on what that teaches us about indirect encoding and representation.

      With regard to the challenging problems for HyperNEAT alleged by Thomas and Shimon's paper, my group, my colleagues, and I still need to complete an analysis before deciding whether there are any useful lessons to draw (though an inherent problem with fracture in CPPNs will likely not be the lesson). If there is a problem, it is most likely a classic stepping-stone paradox: the solution does not look like the steps that lead to it. In such cases, problems that look not so hard can turn out to be deceptive in unexpected ways. If that is the case in some or all three domains, then that is certainly interesting and could perhaps be remedied with something like novelty search or novelty-based multiobjectivization. But right now I am holding off on drawing any conclusions because we are not yet certain whether the peasNEAT implementation of HyperNEAT is bug-free (a bug would be a less interesting explanation).

      In any case, I hope you will take a look at the fracture website and enjoy some of the images there.

      I should finally note that Shimon, Thomas, and I remain on good terms personally and that this is an entirely professional disagreement (with regard to the fracture hypothesis). I wish both Shimon and Thomas the best and appreciate their desire to discover more about HyperNEAT.

      Best,

      ken


      --- In neat@yahoogroups.com, Thomas van den Berg <paranoid.ndroid@...> wrote:
      >
      > Dear Colleagues,
      >
      > The paper "Critical Factors in the Performance of HyperNEAT", written by
      > Shimon Whiteson and me, was accepted for GECCO2013. It is based on the work
      > that I did (and am doing) for the thesis for my MSc. A.I., on the topic of
      > HyperNEAT.
      >
      > This paper presents a series of experiments designed to examine the
      > critical factors for its success. First, we determine the fewest hidden
      > nodes a genotypic network needs to solve several of these tasks. Our
      > results show that all of these tasks are easy: they can be solved with at
      > most one hidden node and require generating only trivial regular patterns.
      > Then, we examine how HyperNEAT performs when the tasks are made harder. Our
      > results show that HyperNEAT's performance decays quickly: it fails to solve
      > all variants of these tasks that require more complex solutions. Next, we
      > examine the hypothesis that fracture in the problem space, known to be
      > challenging for regular NEAT, is even more so for HyperNEAT. Our results
      > suggest that quite complex networks are needed to cope with even modest
      > fracture and HyperNEAT has difficulty discovering them. Finally, we connect
      > these results to previous experiments showing that HyperNEAT's performance
      > decreases on irregular tasks. Our results suggest irregularity is an
      > extreme form of fracture and that HyperNEAT's limitations are thus more
      > severe than those experiments suggested.
      >
      > Find it here:
      > http://staff.science.uva.nl/~whiteson/pubs/b2hd-vandenberggecco13.html
      >
      > Regards,
      >
      > Thomas van den Berg
      >
    • shimonw
      Message 2 of 14, May 27, 2013

        Dear all,


        I'd like to take the opportunity to respond to some of the criticisms that Ken made about our GECCO-13 paper on HyperNEAT.  I've organized the response according to what I consider the three main issues of contention.  Following Ken's example, I've included a short summary at the beginning for those who lack the time or interest to read the whole response. (FYI, this response was prepared with the help of Thomas van den Berg, the first author of the paper in question.)


        Short summary

        1. There's no evidence of any problem with our code or experiments.

        2. While it is an important caveat of one section of our paper that it focuses on target-based scenarios, there are good reasons to do the experiment this way; the results still provide substantial evidence that fracture can be difficult for HyperNEAT; this is supported by additional evidence not from target-based scenarios; and analogous caveats apply to the counterexamples Ken provides.

        3. Ken's examples of Picbreeder evolving fracture do not contradict our claim about the topology needed to represent fracture but are in fact consistent with it.  If our claim is wrong, anyone can easily disprove it by showing a simpler topology than in our Fig. 13, but to date no one has done so.


        And here's the full version:


        Whether our implementation of HyperNEAT is valid:

        The PEAS implementation of NEAT and HyperNEAT is an attempt to create an accessible python environment for experimenting with (generative) evolutionary methods. The code was made public as soon as possible to allow feedback, so we're happy that Ken and others are looking at it and working with it now.


        That said, we have several good reasons to believe that all of our results are based on correctly functioning code:


        1. The code was built by closely following the specifications given in published papers, and existing implementations were used as references whenever the details were unclear. 


        2. It was validated on multiple benchmarks from existing research, including those we mention in our paper. We have not found—and no one has shown us—any evidence that suggests our implementation is invalid or that its results are inconsistent with those of existing implementations.


        3. Not all of the experiments in our paper use PEAS. The fact that we get qualitatively similar results on the Walking Gait task using Clune et al.'s implementation is evidence both for our hypothesis and for the correctness of our code.


        Whether our experiments are misleading because they use a target-based scenario:

        It is indeed an important caveat of the experiments we present in Section 6 of our paper that they use a target-based scenario (evolving towards a fixed configuration of weights) with a "tight" fitness function, as we explicitly acknowledge in the paper. However:


        1. There are good reasons to do the experiment this way: because the definition of fracture is inherently generative, only a target-based scenario makes it possible to control for the amount of fracture needed to solve the task.  In addition, our experiments in the target-based scenario allow us to follow up on the experiments of Clune et al., shedding light on the reasons for HyperNEAT's failure, which were not completely apparent from the original experiments.


        2. Even given the caveat that it is a target-based scenario, it is surprising and disappointing how quickly HyperNEAT fails to find the fracture necessary to solve very small, easy target-based tasks, much easier than those considered by Woolley & Stanley.  Therefore, we think these results constitute valid evidence that fracture can be a challenge for HyperNEAT.


        3. In addition, most of the experiments in our paper were not on such target-based tasks, and the 'difficult' Walking Gait experiment provides additional evidence that fracture can be challenging for HyperNEAT, but now in a "regular" fitness-based scenario.


        4. An analogous caveat applies to the Picbreeder counterexamples that Ken mentions: they were generated using the "loose" fitness functions of interactive evolution, where many good solutions are possible and humans naturally provide intermediate gradients.  These results are beautiful, fascinating, and important, but they are no more representative of many "regular" fitness functions, which are encountered in a wide range of challenging optimization settings and for which good solutions are rare but not unique, than target-based fitness functions are.


        Obviously, more research needs to be done to explore different points on the spectrum between tight and loose fitness functions, and it's worthwhile to debate what parts of the spectrum we should care about.  But the fact that one set of our experiments looks at only one point on that spectrum does not make them misleading any more than the examples Ken provided are misleading.  It just means that, like any finite set of experiments, it cannot tell the whole story.


        5. Regarding the non-Picbreeder counterexamples of HyperNEAT evolving fracture: while these examples show that HyperNEAT can evolve fracture given "regular" fitness functions, they do not establish that such fracture is necessary.  A key distinction between our experiments and those that have previously been used to evaluate HyperNEAT is that we use baselines to establish the level of complexity that is actually needed to solve the task.  The results show that HyperNEAT sometimes evolves complex structures that look impressive but are largely superfluous.  Thus, an important caveat of these counterexamples is that we don't know how much of this fracture, if any, is needed to solve the task.


        Whether fracture can be represented by simpler topologies than we claim:

        Ken provides some examples from Picbreeder to show how fracture can be represented in a very simple way.  While these examples are very cool, they actually do not contradict our assertions about what topologies are needed to represent fracture.  Both Ken's examples and our Figure 13 highlight the fact that representing fracture requires what we call an "indicator node" for outputting a component region in each of the phenotypes.  The only reason that Figure 13 is a bit more complicated than Ken's examples is that it is solving a harder problem: in addition to partitioning the space into regions, it fills each region with a particular repeated pattern.  This requires extra nodes to prepare the inputs to the indicator node.
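
        (For concreteness, here is a minimal Python sketch of the indicator-node idea; the names and constants are illustrative assumptions of this sketch, not our actual implementation:)

          import math

          def steep_sigmoid(z, gain=20.0):
              # A high gain makes the output nearly binary: ~0 in one region, ~1 in the other.
              return 1.0 / (1.0 + math.exp(-gain * z))

          def fractured_pattern(x, y):
              # Upstream nodes prepare the indicator's input signal...
              signal = math.sin(4.0 * x) - y
              # ...and the indicator node selects which region's pattern is expressed.
              indicator = steep_sigmoid(signal)
              pattern_a = math.sin(10.0 * x)  # repeated pattern filling region A
              pattern_b = math.cos(10.0 * y)  # different repeated pattern filling region B
              return indicator * pattern_a + (1.0 - indicator) * pattern_b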


        Note that Fig 13 is by no means a proof that a topology of this complexity is required to represent fracture: it's an upper bound, not a lower bound.  We know it can be done in this way and we are unable to think of a simpler way.  That doesn't prove that a simpler way doesn't exist, but, to date, no one has been able to show us one.  If our claim is not correct, anyone is free to prove us wrong simply by providing such an example.


        In conclusion

        We wrote this paper because we believe strongly in the promise of GDS, even when used with "regular" fitness functions. It seems clear that HyperNEAT paired with interactive evolution is a powerful tool.  But our experiments suggest that there are some potentially serious difficulties when it comes to the "regular" fitness functions that arise in a large range of important applications.  We don't think those limitations were sufficiently apparent from previous experimental analyses of HyperNEAT and we therefore hope that, in this respect, our paper will be a useful contribution.


        -------------------------------------------------------------

        Shimon Whiteson | Assistant Professor

        Intelligent Autonomous Systems Group

        Informatics Institute | University of Amsterdam

        -------------------------------------------------------------

        Science Park 904 | 1098 XH Amsterdam

        +31 (0)20.525.8701 | +31 (0)6.3851.0110

        http://staff.science.uva.nl/~whiteson

        -------------------------------------------------------------

      • Ken
        Message 3 of 14, May 28, 2013
          Hi Shimon,

          Let me respond to your three points as best I can at this point. I appreciate the effort you've put into articulating your position and hope this discussion remains positive and illuminating. Please note that we will release a more formal response with supporting data in the near future, but for the moment I will try to respond informally to the best of my ability with the understanding that we have collected data to support my argument.

          [For people who want a quick summary:

          1) In recent reimplementations by our group with only slightly different parameters and setups, it appears regular HyperNEAT actually *can* solve the problems in the paper, suggesting that the critical factors in the performance of HyperNEAT are not actually clarified by the paper.
          2) Shimon and I disagree on what "target-based" means. In my view the benchmark problems in the paper are just as "target-based" as the pattern-matching task, and I give details on why in this response, and on why that distinction is important for the future of indirect encoding.
          3) I argue that it doesn't really matter if fracture takes six or seven nodes to represent because that number is trivial in the grand scheme of things, and even Picbreeder genomes have many times more nodes than that.
          ]


          The detailed version:

          1) Regarding the code itself, we have at this point run enough experiments in reimplementations of both the triangles and the hard line-following domain, and within PEAS-NEAT (sometimes with tweaks to parameters), to say with some confidence that HyperNEAT actually *does* solve these problems. Of course, because these problems are the basis for the main concern raised in the paper, the fact that we can get good results with HyperNEAT on them is a significant problem for the main hypothesis.

          I do not want to imply the PEAS code is somehow fundamentally flawed or wrong, or that you and Thomas did not do your best to make it accurate. The main issue is just that HyperNEAT apparently can solve these tasks without much trouble.

          While it looks like getting HyperNEAT to solve these tasks may require a slightly different parameterization or setup than the one in the paper, the paper claims to identify "Critical Factors in the Performance of HyperNEAT." If the main factors in the performance of these problems turn out to be minor variations in their parameters, then that would suggest that the "factors" are misidentified in this paper, and that HyperNEAT is in fact not ill-equipped to solve these problems.

          However, I should also disclose that it appears to us at this time that these two domains that were intended to be harder can *also* yield good scores without hidden nodes, which actually means the domains are not that good for exploring the hypothesis advanced in the paper that hidden nodes pose some kind of problem for HyperNEAT. At the same time, it also appears from our preliminary results in a different version of HyperNEAT that allowing hidden nodes may yield slightly better performance, at least in the triangles domain, but we are still determining whether the difference is significant. At the least, it appears that a *perfect* solution (which it turns out HyperNEAT found at least once) does require hidden nodes. Therefore, to be fair, based on these domains and our preliminary analysis, the main hypothesis may or may not still be true, but these domains do not seem the best to provide evidence for it one way or the other.

          As you noted, the third domain, the quadruped, is in another implementation, but given that we have been having good results with HyperNEAT in quadruped domains similar to the "hard" one in the paper (and in fact even harder)

          (for example http://eplex.cs.ucf.edu/papers/morse_gecco13.pdf and
          http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf)

          it is reasonable to speculate that a similar story could hold for the hard quadruped as well: it can probably be made to work with a little parameter tweaking, and may not require hidden nodes either.

          So I think this hidden node issue remains an interesting open question, but the domains in this paper do not really seem to address it. I would not be surprised if there are indeed some domains that HyperNEAT has trouble solving as a direct objective if it requires multiple hidden nodes, but that is why ultimately a more important issue is probably the target-based one, which is the one you addressed next:

          2) We seem to disagree on what is "target-based." You say most experiments were "not on such target-based tasks," but I consider all the experiments in your paper target-based, including the ones you call "regular." Indeed, that's been a major problem in our field that I've been trying to highlight: the "regular" way of doing business isn't going to scale because it is target-based.

          Running a controller-evolution problem with a strictly objective-driven fitness function is analogous to evolving the output of a CPPN to produce a particular pattern or image. That's the deeper point of my paper with Woolley, and an important lesson from novelty search in general: the field has been running target-based experiments for decades, but evolution in nature does not work that way. "Walking" was never an explicit objective (i.e. target) for nature. It is the oblique achievement of such feats (i.e. despite never being formalized as an objective) that makes nature's accomplishments so interesting. Picbreeder's accomplishments are similarly oblique. And as we have observed with novelty search, when you take away that target (such as in the maze, or the biped), sometimes that's when you get even better results.

          So in my view all your experiments in this paper are target-based. Of course, like I said above, they seem nevertheless possible to solve by HyperNEAT (even though they are target-based), but more importantly even if they could not be solved as targets, in principle we would have to ask whether they are more appropriately solved with a stronger behavioral diversity mechanism. That is a reasonable hypothesis given results in recent years from our group and others.

          That observation is relevant because it would suggest qualifying your hypothesis. Instead of saying fracture is "highly problematic for HyperNEAT," the hypothesis would instead say, "fracture is sometimes problematic for HyperNEAT when fitness is expressed exclusively in a target-based fashion." Of course, my guess right now is that it is not particularly more problematic for HyperNEAT than for any other genuine indirect encoding, and to the extent it's true, this statement would likely apply to other genuine indirect encodings as well.

          I say "genuine indirect encoding" because one of the big dangers in this kind of analysis is that we inadvertently glorify direct encodings. Often direct encodings will actually be better at hitting arbitrary targets. That's an unfortunate fact that is going to unravel a lot of our progress in indirect encoding if we continue returning to it. A direct encoding is like a Xerox machine, reproducing every pixel in an image independently. So of course it will copy arbitrary patterns better than an encoding that has to build up complexity compositionally, i.e. by establishing regularities first and then elaborating on them. The wavelet encoding in your paper is similar to this kind of direct encoding, placing independent blobs at arbitrary locations.

          My belief is that in this field we're facing a tradeoff that hasn't been discussed very much, but that is very fundamental and important:
          The more direct the mapping between genotype and final solution, the better suited the encoding becomes to relatively simple target-based problems (e.g. under 1,000 dimensions), but we pay for that "quality" by committing to an encoding that can never generate genuinely high-complexity structures.

          So as long as we stick to the benchmark playground that we're used to (where most solutions require under 1,000 parts), we are going to keep fooling ourselves into revisiting direct-like encodings, which have no chance to ever produce anything like the complexity seen in nature.
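
          (A toy contrast in Python, of my own making rather than an implementation from either side, to show what I mean:)

            import math
            import random

            # Direct ("Xerox") encoding: one gene per weight, each mutated independently.
            direct_genome = [random.uniform(-1, 1) for _ in range(1000)]
            direct_weights = list(direct_genome)  # the genotype maps 1:1 onto the phenotype

            # Indirect (CPPN-like) encoding: a handful of genes parameterize a composed
            # function, and all 1,000 weights are generated from it, sharing regularities.
            indirect_genome = {"freq": 3.0, "phase": 0.5, "scale": 0.8}
            indirect_weights = [indirect_genome["scale"] *
                                math.sin(indirect_genome["freq"] * i / 1000.0 +
                                         indirect_genome["phase"])
                                for i in range(1000)]

            # Matching an arbitrary target favors the direct encoding (any pattern is
            # reachable pixel by pixel); elaborating regularities favors the indirect one.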

          Thus in my view the final target-based validation in the paper is potentially misleading in this respect: it appears rational as a choice for validating the fracture hypothesis, but actually it is a *good* result for the indirect encoding. We would not want an indirect encoding to succeed in this type of scenario; if it did, it should raise the suspicion that it is a direct encoding in disguise, just a Xerox machine posing as a watchmaker (or watch designer, to be more accurate).

          3) [It's worth noting that Picbreeder also contains many fractures of the type where repeating patterns appear within fractured regions, but I'll leave that to a different thread so as not to distract from the main discussion here.]

          I think you might have misunderstood my point about figure 13 (or maybe I didn't express it clearly enough). I don't think it's really important whether the exact number of nodes shown in figure 13 is necessary for fracture. The only point I meant to make is that you don't need a special "mask node" with an inversion operator like the one in that figure (since Picbreeder doesn't have such nodes). But sure, I agree that the exact number of hidden nodes needed for a particular fracture is probably similar to the number (i.e. six or seven) in your figure.

          But that isn't the important point from my perspective. The important point is that six or seven isn't very many hidden nodes in the grand scheme of things. The Picbreeder Apple has 83 nodes. The Car has 50, and the Skull 23. Six is just a small fraction, and it clearly poses no problem in hundreds or even thousands of examples of such structures evolved routinely on Picbreeder. Human DNA has about 30,000 genes; a gene is roughly analogous to a node in a CPPN (genes connect to each other through a GRN, i.e. gene regulatory network). Six out of 30,000 is a tiny drop in the bucket. We should be aspiring to evolve genotypic structures with at least 50 nodes, and Picbreeder shows that for CPPNs such a feat is routine and unremarkable.

          The main point is, there's nothing particularly prohibitive about a substructure requiring a few nodes (and it happens routinely), so the motivation for the hypothesis, i.e. that requiring a few nodes to represent a particular motif is a "problem," is worth questioning.

          Now granted, the question of whether HyperNEAT in a target-based scenario can evolve levels of complexity like those in Picbreeder or Endless Forms is open, but like I said, I see that issue as a red herring because we need to be moving away from target-based formulations if we are to see the complexity that is achievable through indirect encoding. Target-based formulations will ultimately only drag us back to more direct encodings that do well in neat little benchmarks. That doesn't mean we can't use indirect encodings to solve problems; it just means that we need to learn to formalize our problems in a way that respects the need for phenotypic diversity and open-endedness.

          Ultimately we may even find that evolution at its most grandiose, i.e. when it finds things like humans, simply must be unchained completely from a specific target behavior. I don't know yet if that's true, but if it is, I feel no inhibition about pushing it down that road anyway. That is, even if it isn't really a "tool," it is still the greatest creative force ever to exist, and unchaining that force is no small feat even if it is not the solution to a benchmark problem. At the same time, I hold out hope that specific problems can still be solved by indirect encodings as well, as long as we respect their special need for open-endedness and diversity.

          "An unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire."

          Best Regards,

          ken


          --- In neat@yahoogroups.com, "shimonw" <shimon@...> wrote:
          > [...]
        • shimonw
          Message 4 of 14, Jun 7, 2013
            Hi Ken,

            Thanks for your interesting and thought-provoking comments. I'll give some brief reactions here, and I hope we can have a productive discussion about it at GECCO.

            1) I won't comment for now on the new experiments you've been running because the details are not available yet. Instead, I will just comment on the various walking gait domains, because the details of the additional results you mentioned are in that case already available, in newly published GECCO papers.

            First of all, these are very cool results, and nice success stories for HyperNEAT. It's great to see that the GDS community is continuing to push the envelope in terms of robot locomotion, which seems to be emerging as a nice application for indirect encodings.

            However, I would simply caution against placing all these variations of the walking gait task on a one-dimensional spectrum of difficulty. In our paper, we took the original walking gait task and made only one change (increasing the mass of the torso), which clearly makes the task harder. In the papers Ken mentioned, there are other differences, e.g., the use of an SUPG encoding, the use of CTRNNs, the use of a different substrate topology, and varying the length of the legs.

            It's completely legitimate to make these changes and they don't detract from the results at all. But they are confounding factors when it comes to comparing HyperNEAT's performance on walking gait tasks of different difficulties. Some of these changes may make the task harder in some ways, but other changes may make it easier in other ways, and so far we don't have experimental results that control for these factors.

            2) I hope to avoid a semantic debate about what is a "target-based" FF (fitness function). If you want to include what I was calling a "regular" FF in the scope of target-based FFs, I won't object. But my point is that, regardless of what you call them, there is an important difference between FFs that require a single specific phenotype and FFs that require only that some goal be achieved. For the latter, there may be many ways to skin the cat, though in any interesting problem, good solutions are still quite rare. The distinction is important because, while the former category is arguably only of theoretical interest, the second category corresponds to what I consider the most interesting and important class of FFs, because these are the FFs that naturally arise in a very large number of real-world optimization problems.

            So, using your terminology, I agree it is reasonable to say HyperNEAT's difficulties with fracture are probably limited to "target-based" FFs. But for me, this is a potentially serious restriction, since target-based FFs include not just pathological FFs but a large class of real-world FFs.
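
            (A toy sketch of the distinction, in Python and of my own construction rather than from the paper: the first FF requires one specific phenotype, while the second admits any phenotype that achieves the goal.)

              def phenotype_matching_ff(weights, target_weights):
                  # Narrowly target-based: exactly one weight pattern is optimal.
                  return -sum((w - t) ** 2 for w, t in zip(weights, target_weights))

              def goal_based_ff(distance_walked):
                  # "Regular": any controller that walks far scores well, however it
                  # does so; many phenotypes can solve it, though good ones may be rare.
                  return distance_walked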

            Regarding the wavelet method, I would still definitely call this an indirect encoding, but it is in some sense "less indirect" because it lacks the compositional functions of HyperNEAT. It's a perfectly reasonable hypothesis that such compositionality is advantageous in some ways. However, if that's true, then we should be able to demonstrate that by finding tasks where HyperNEAT outperforms the wavelet method (FYI, in the extended analysis Thomas is conducting for his master's thesis, we have found one variation of the line-following task where this is the case). So in that sense, the wavelet method is a useful baseline, which is the whole reason we introduced it.

            3) If it's true that it takes ~6 nodes to represent fracture, then I think this is a potentially important point. On regular FFs, it could contribute to the difficulty of discovering fracture, since representations that have only some of the necessary components may not be rewarded (I think you would call this "deception"). In Picbreeder-like tasks, we already see that fracture can evolve anyway, but this doesn't mean it wouldn't evolve even more readily if fracture required fewer nodes to represent. If we think that the solutions to many interesting problems require fracture, then I think it's at least possible that one could devise a better representation for such problems than the CPPNs used in HyperNEAT.

            As I understand it, your approach to dealing with "regular" FFs is to reformulate them "in a way that respects the need for phenotypic diversity and open-endedness." That's a very interesting approach and I think it has a lot of merit, but I don't think it will solve all our problems because it presupposes that such a reformulation is possible (or that the prior knowledge it requires is available). In some cases, I think we are just stuck with hard FFs and we need to develop better methods to deal with them.

            Regarding the idea that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire": this is a very intriguing hypothesis, but to me it is only a hypothesis. I think it's true of existing indirect encodings, but that for me does not in any way rule out the possibility of developing a different indirect encoding for which that is not true, and I think that's one thing we should be working on. One of the reasons that I'm interested in indirect encodings is that I strongly believe this can be done.

            Cheers,
            Shimon
          • Ken
            Message 5 of 14, Jun 7, 2013
              Hi Shimon,

              These are some really interesting and deep issues we're getting into.  I am sure they will be the foundation of great discussions at GECCO.  I'll try to keep my response more brief than before.

              1) Indeed, recent quadruped results are not in the same setup as in your work; that is a legitimate qualification, and they do not constitute proof that in your particular setup HyperNEAT would succeed.  However, they are arguably at least comparably complex.  For example, the CTRNN in
              http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf  is a complicated structure and the CPPN that encodes it is required not only to generate a controller for a single quadruped morphology, but to generate multiple controllers for multiple morphologies, all from the same CPPN.  A video of three controllers all from the same CPPN is here: http://youtu.be/oLSSt5GyHNk 

              2) I think the question of what counts as "target-based" can actually be interesting, but I completely agree that the semantic side of the argument isn't really important.  What I do find interesting is the idea that these two scenarios differ substantively in your mind (as you put it):

              "there is an important difference between FFs that require a single specific phenotype and FFs that require only that some goal be achieved"

              I think by "single specific phenotype" you mean pattern-matching (i.e. image-matching).  But I think this distinction doesn't really exist:  There are an unlimited number of networks (i.e. CPPNs) that can encode a particular two-dimensional image, just as there are an unlimited number of CPPNs that can encode a particular behavior or goal (such as following a line). 

              What makes both scenarios target-based and fundamentally the same is not the number of ways they can be solved, but rather the way the fitness function interacts with the search process.  The problem is that target-based fitness encourages piecewise incremental progress, which tends to be highly deceptive.  For example, with an image, matching a single pixel can raise fitness even if the function that causes that pixel to be matched has no relationship whatsoever to the overall regularities in the target pattern.  The same is true in control tasks: a mutation that causes a quadruped to lunge forward clumsily is rewarded for the slight improvement in fitness, whereas an important underlying regularity (e.g. oscillation) necessary to get really good at the task is never established.  Non-target-based fitness (or what I call "non-objective") avoids these problems because it *does* reward establishing regularities: it appreciates new regularities even if they don't immediately raise fitness.
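
              (A toy numeric illustration of that deception in Python, constructed for this discussion rather than drawn from any experiment: under pixel-wise target fitness, a candidate that happens to match a few pixels exactly can outscore one that has captured the target's underlying regularity but is out of phase with it.)

                import math

                target = [math.sin(x / 2.0) for x in range(20)]  # a regular (oscillating) target

                # Candidate A has discovered the crucial regularity but is phase-shifted.
                cand_a = [math.sin(x / 2.0 + 1.5) for x in range(20)]
                # Candidate B is flat except at a few points where it happens to match exactly.
                cand_b = [t if i % 5 == 0 else 0.0 for i, t in enumerate(target)]

                def fitness(candidate):
                    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

                # fitness(cand_b) > fitness(cand_a) here: piecewise matches are rewarded
                # while the crucial underlying regularity goes unrewarded.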

              But what's most interesting to me is your intuition that there is some way to sidestep the tradeoff in combining indirect encodings with target-based fitness functions.  That is, you suspect there may exist an indirect encoding that can both elaborate regularities compositionally over very many generations and also almost deterministically satisfy an arbitrary target-based objective.  I am convinced that is not the case (it would be too good to be true), but you rightly point out that my position is a hypothesis.  However, just as a thought experiment, can you really imagine any neural network encoding (including DNA) that would give you Einstein-level intelligence if the target was simply to score maximally on an IQ test?  It is plain to me that the stepping stones to human level cannot be crossed in the presence of a target-based objective (or any conceivable one).  However, interestingly, that does not mean that such a neural network does not exist in the search space.  It just means a target won't get you there.

              3) We apparently disagree on whether 6 is a big number (for nodes needed to represent fracture).  If you try to devise an encoding that represents fracture with fewer than that, my intuition is that you would be sacrificing an enormous breadth of flexibility in the kinds of fracture (and patterns in general) that you can represent.  Think about the number of "degrees of freedom" in fracture: it's not really well-defined, but a single line of fracture has a lot of dimensions:

              -position
              -orientation
              -curvature (which can be arbitrarily complex)
              -fracture gradient (that is, one region can smoothly transition into another in a kind of interpolated transitional zone)
              -embedding (fracture can potentially be hierarchical or regular, i.e. dependent on higher-level containing regions)

              The idea that all of that can be encoded in anything less than about 6 simple functions seems optimistic.  Of course, if you are willing to sacrifice some of those degrees of freedom, then you can get by with fewer nodes, but the exquisite minutiae of such patterns would be lost (and you'd move closer to a direct encoding, as the wavelets do).
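
              (To put rough structure on those degrees of freedom, here is an illustrative Python sketch of my own, with all names and constants hypothetical: even a single fractured boundary with position, orientation, curvature, and a transition gradient consumes several composed functions, before any hierarchical embedding is considered.)

                import math

                def fracture_indicator(x, y, angle, offset, curve_amp, curve_freq, softness):
                    # Orientation: rotate the coordinate frame.
                    u = x * math.cos(angle) + y * math.sin(angle)
                    v = -x * math.sin(angle) + y * math.cos(angle)
                    # Position and curvature: a curved boundary rather than a straight line.
                    boundary = offset + curve_amp * math.sin(curve_freq * u)
                    # Fracture gradient: 'softness' widens the transition zone between regions.
                    return 1.0 / (1.0 + math.exp(-(v - boundary) / softness))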

              By the way, here are some cool examples of complicated kinds of fracture from Picbreeder (such as fractured periodic zones, and some even more complex than that):

              Nontrivial color patterns in each zone:
              http://picbreeder.org/search/showgenome.php?sid=11328


              Very complex brush-stroke pattern inside the apple vs. outside (the DNA for this one is fascinating):
              http://picbreeder.org/search/showgenome.php?sid=7109



              Fractured regions covering a distorted surface, giving a remarkable underlying curvature:
              http://picbreeder.org/search/showgenome.php?sid=2684


              A color variant: http://picbreeder.org/search/showgenome.php?sid=6652


              When I see so many remarkable examples like these on Picbreeder, the message to me is that we have only scratched the surface of what CPPN encoding can teach us about representation.  That's why I think it's premature (in the face of such a treasure trove of evidence) to be worrying about whether we can represent some subcomponent of such patterns with fewer than 6 nodes.  At present I'm amazed they can be represented at all, let alone with under 6 nodes.

              Thanks very much for raising these issues in any case - it's a great debate to have about representation and search and goes to the heart of the field of GDS/indirect encoding.

              Best,

              ken

              --- In neat@yahoogroups.com, "shimonw" wrote:
              >
              > Hi Ken,
              >
              > Thanks for your interesting and thought-provoking comments. I'll give some brief reactions here, and I hope we can have a productive discussion about it at GECCO.
              >
              > 1) I won't comment for now on the new experiments you've been running because the details are not available yet. Instead, I will just comment on the various walking gait domains, because the details of the additional results you mentioned are in that case already available, in newly published GECCO papers.
              >
              > FIrst of all, these are very cool results, and nice success stories for HyperNEAT. It's great to see that the GDS community is continuing the push the envelope in terms of robot locomotion, which seems to be emerging as a nice application for indirect encodings.
              >
              > However, I would simply caution against placing all these variations on the walking gait task on a one-dimensional spectrum of difficulty. In our paper, we took the orignal walking gait task and made only one change (increasing the mass of the torso) which clearly makes the task harder. In the papers Ken mentioned, there are other differences, e.g., the use of an SUPG encoding, the use of CTRNNs, the use of a different substrate topology, and varying the length of the legs.
              >
              > It's completely legitimate to make these changes and they don't detract from the results at all. But they are confounding factors when it comes to comparing HyperNEAT's performance on walking gait tasks of different difficulties. Some of these changes may make the task harder in some ways but other changes may make it easier in others ways, and so far we don't have experimental results that control for these factors.
              >
              > 2) I hope to avoid a semantic debate about what is a "target-based" FF. If you want to include what I was calling a "regular" FF in the scope of target-based FFs, I won't object. But my point is that, regardless of what you call them, there is an important difference between FFs that require a single specific phenotype and FFs that require only that some goal be achieved. For the latter, there may be many ways to skin the cat, though in any interesting problem, good solutions are still quite rare. The distinction is important because, while the former category is arguably only of theoretical interest, the second category corresponds to what I consider the most interesting and important class of FFs, because these are the FFs that naturally arise in a very large number of real-world optimization problems.
              >
              > So, using your terminology, I agree it is reasonable to say HyperNEAT's difficulties with fracture are probably limited to "target-based" FFs. But for me, this is a potentially serious restriction, since target-based FFs include not just pathological FFs but a large class of real-world FFs.
              >
              > Regarding the wavelet method, I would still definitely call this an indirect encoding, but it is in some sense "less indirect" because it lacks the compositional functions of HyperNEAT. It's a perfectly reasonable hypothesis that such compositionality is advantageous in some ways. However, if that's true, then we should be able to demonstrate that by finding tasks where HyperNEAT outperforms the wavelet method (FYI, in the extended analysis Thomas is conducting for his master's thesis, we have found one variation of the line-following task where this is the case). So in that sense, the wavelet method is a useful baseline, which is the whole reason we introduced it.
              >
              > 3) If it's true that it takes ~6 nodes to represent fracture, then I think this is a potentially important point. On regular FFs, it could contribute to the difficulty of discovering fracture, since representations that have only some of the necessary components may not be rewarded (I think you would call this "deception"). In Picbreeder-like tasks, we already see that fracture can evolve anyway, but this doesn't mean it wouldn't evolve even more readily if fracture required fewer nodes to represent. If we think that the solutions to many interesting problems require fracture, then I think it's at least possible that one could devise a better representation for such problems than the CPPNs used in HyperNEAT.
              >
              > As I understand it, your approach to dealing with "regular" FFs is to reformulate them "in a way that respects the need for phenotypic diversity and open-endedness." That's a very interesting approach and I think it has a lot of merit, but I don't think it will solve all our problems because it presupposes that such a reformulation is possible (or that the prior knowledge it requires is available). In some cases, I think we are just stuck with hard FFs and we need to develop better methods to deal with them.
              >
              > Regarding the idea that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire": this is a very intriguing hypothesis, but to me it is only a hypothesis. I think it's true of existing indirect encodings, but that for me does not in any way rule out the possibility of developing a different indirect encoding for which that is not true, and I think that's one thing we should be working on. One of the reasons that I'm interested in indirect encodings is that I strongly believe this can be done.
              >
              > Cheers,
              > Shimon
              >
            • shimonw
              Message 6 of 14 , Jun 17, 2013
                Hi Ken,

                If I may, I'd like to offer a couple reactions to your latest message. This is indeed an interesting discussion that touches on many potentially important issues. I hope I can offer a useful perspective on these topics.

                You are right that in a target-based scenario there can be many CPPNs that generate the same phenotype. But the CPPN is a genotype, not a phenotype. In what I'm calling "regular" FFs, there are potentially many *phenotypes* that solve the problem, and for each of them potentially many genotypes. So in my mind, there is still a useful distinction between target-based FFs and regular FFs, though they are perhaps best seen as different points on a spectrum.

                If I understand correctly, you are suggesting that the important distinction is how the FF interacts with the search process. But to me, this is a difficult criterion to use because the search process could be anything. The space of possible black-box optimization methods is huge, and what encourages piecewise incremental progress for one method may not for another.

                I don't pretend to know what kind of algorithm would generate Einstein-level intelligence (I don't even know if this is a useful goal). But as a researcher I highly respect once told me, "your failure to imagine it does not constitute a proof that it cannot be done."

                That's why I think your hypothesis is quite a bold one, because it looks at the limitations of a specific class of black-box optimization methods and generalizes them to all possible black-box optimization methods.

                To give just one example of the possibilities here: algorithms for continuous-armed bandits, such as X-armed bandits and Bayesian optimization methods such as GP-UCB, can be applied to black-box optimization. These methods avoid the problem of local optima in a principled way, by maintaining a posterior distribution over the global FF, and using optimism in the face of uncertainty to ensure sufficient exploration of the solution space. As a result, these methods have very nice theoretical properties, like guaranteed convergence to the global optimum in the limit and logarithmic regret bounds along the way.
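
                To make that concrete, here is a minimal GP-UCB-style loop for a fixed-dimensional black-box FF. To be clear, this is only a sketch: it uses scikit-learn's GaussianProcessRegressor, black_box_ff and the random candidate pool are placeholders, and a serious implementation would optimize the acquisition function far more carefully.

                import numpy as np
                from sklearn.gaussian_process import GaussianProcessRegressor

                def gp_ucb(black_box_ff, bounds, n_iters=50, n_candidates=1000, beta=2.0, seed=0):
                    # bounds: list of (low, high) pairs, one per search dimension
                    rng = np.random.default_rng(seed)
                    lo, hi = np.array(bounds).T
                    dim = len(bounds)
                    X = [lo + rng.random(dim) * (hi - lo)]   # one random initial sample
                    y = [black_box_ff(X[0])]
                    gp = GaussianProcessRegressor()
                    for _ in range(n_iters):
                        gp.fit(np.array(X), np.array(y))     # posterior over the global FF
                        cand = lo + rng.random((n_candidates, dim)) * (hi - lo)
                        mu, sigma = gp.predict(cand, return_std=True)
                        ucb = mu + beta * sigma              # optimism in the face of uncertainty
                        x_next = cand[int(np.argmax(ucb))]
                        X.append(x_next)
                        y.append(black_box_ff(x_next))
                    best = int(np.argmax(y))
                    return X[best], y[best]

                The exploration bonus (beta * sigma) is what keeps such a method from fixating on a local optimum, and it is the ingredient behind the convergence and regret guarantees I mentioned.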

                As far as I know, no one has tried to develop versions of these methods that can discover NN topologies, and which would thus be suitable for optimizing CPPNs or other indirect encodings. But there's no a priori reason to think it can't be done. I'm not saying this would necessarily work or even that it's the most promising approach to explore. I'm just saying it's one example of a principled approach that could avoid all the difficulties you mention and which we currently have no strong reason to eliminate from contention.

                So I think it's quite likely that these hard FFs are not unsolvable, but just that current methods cannot solve them. Rather than giving up on them, in my opinion we should be focusing on developing better methods for them.

                Regarding whether 6 nodes is a lot to represent fracture, I think the same point applies: just because we can't think of a better way to do it, doesn't mean it doesn't exist. Our paper establishes 6 as an upper bound, which can be easily done by example. If I understand correctly, you are suggesting that 6 may be a lower bound, which is much more difficult to demonstrate.

                I could definitely imagine that a different type of network, or one with a different set of activation functions, might be able to make better use of 6 nodes. Even if it's true that there's a trade-off, and using fewer nodes means less flexibility, there may be many cases where that's a favorable trade-off. We should at least be open to the possibility that in some cases the sweet spot is not a representation that requires 6 nodes for fracture.

                I'm looking forward to chatting more about this at GECCO.

                Cheers,
                Shimon

                --- In neat@yahoogroups.com, "Ken" <kstanley@...> wrote:
                > [...]
              • Ken Lloyd
                Message 7 of 14 , Jun 17, 2013

                  Shimon,

                  Neural networks are obviously represented by (possibly recurrent) graph networks.  The connective dynamics of graph networks, and their distribution, may be modeled by graph thermodynamics and simulated annealing (see Fruchterman–Reingold or Kamada–Kawai).  This helps avoid the local-minima problem.  Whether the “schedule” of the thermodynamics is fixed, dependent on the context, or “other” is an open question.
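
                  To illustrate the “schedule” question, here is a generic simulated-annealing skeleton. This is a sketch only: energy and propose are placeholders for whatever genotype-space objective and mutation operator one has in mind, and the geometric cooling at the end of the loop is exactly the part that could instead be made context-dependent.

                  import math, random

                  def anneal(energy, propose, state, t0=1.0, alpha=0.95, steps=10000):
                      t = t0
                      e = energy(state)
                      best, best_e = state, e
                      for _ in range(steps):
                          cand = propose(state)
                          ce = energy(cand)
                          # Always accept downhill moves; accept uphill moves with
                          # Boltzmann probability exp(-dE/t), which is what allows
                          # the search to escape local minima.
                          if ce <= e or random.random() < math.exp((e - ce) / t):
                              state, e = cand, ce
                              if e < best_e:
                                  best, best_e = state, e
                          t *= alpha  # fixed geometric schedule; could be adaptive
                      return best, best_e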

                   

                  This behavior can generally be related to clustering and “modularity,” where the weight functions and distances are meaningful in both the genotype and the phenotype.

                  Ken Lloyd

                  From: neat@yahoogroups.com [mailto:neat@yahoogroups.com] On Behalf Of shimonw
                  Sent: Monday, June 17, 2013 5:55 AM
                  To: neat@yahoogroups.com
                  Subject: [neat] Re: GECCO Paper on HyperNEAT

                   

                   

                  [...]


                • Vassilis Vassiliades
                  Message 8 of 14 , Jun 17, 2013
                    Hi Shimon,

                    I have some questions regarding what you said about using bandit algorithms to design NN/CPPN topologies.

                    Do you believe that in order to tackle this problem, one would need contextual bandit algorithms, where the state would be the current NN topology?

                    If there are actions that correspond to the addition and deletion of nodes and connections, the dimensionality of the context/policy would change during learning. So, I am wondering whether you know of any algorithms or techniques that could deal with such problems, where the dimensionality of the state is not fixed.

                    Best,
                    Vassilis



                    On Mon, Jun 17, 2013 at 2:54 PM, shimonw <shimon@...> wrote:
                     

                    [...]

                  • Ken
                    Message 9 of 14 , Jun 17, 2013
                      Hi Shimon,

                      The nice thing about this type of conversation is that it pushes us towards realizing our fundamental assumptions.  In that spirit, a few other thoughts in response:

                      Many top biologists and EC researchers would call the behavior of the organism in the world (i.e. its interactions with the world) the "phenotype."  David Fogel once sent me a long and forceful defense of such a position, quoting a number of famous biologists.  In that context, whether you map CPPN->pattern or CPPN->behavior, a target is still a target and the neural network is not the phenotype.

                      In any case, I guess we both hold bold hypotheses in some sense.  Your hypothesis that there exist methods that can solve hard FFs almost deterministically as targets (up to and even beyond delivering us an Einstein-level brain) seems boldly optimistic to me.  But the big difference between our hypotheses is that you have no proof of concept for your approach while I do.  In your approach, you will set a target of Einstein and almost magically the algorithm will then deliver us Einstein, which no method has yet shown even the faintest sign of achieving.  In my approach, Einstein is not the target of the search process but instead appears as a byproduct of a search without any explicit target, which is (amazingly) actually what happened in the real world!  It may be tempting to dismiss that distinction, but think about it: the fact that any process actually *did* produce Einstein is beyond incredible.  We should learn from that as much as we can before insisting "there's got to be a better way."

                      Of course, it's great that you're searching for a better way (it sure would be nice if we could just set Einstein's IQ as a fitness target), but the thing is, there will always be a ton of people trying to follow that path because it's such a clean and intuitively appealing notion.  So it hardly needs a vigorous argument to convince someone that it's important to try.  The danger is instead that in such a rush of enthusiasm for intuitively appealing approaches, we sweep aside the much more delicate yet equally critical attempt to harness evolution on its own terms, which leads to things like novelty search, indirect encoding, and behavioral diversity.  The power of this process may not be as clean and simple as the optimization crowd is hoping; my argument here guards against that danger.

                      I think something similar is going on in our fracture argument: In Picbreeder (and to some extent Endlessforms too, which also uses CPPNs) we have perhaps the only massive proof-of-concept that exists of fracture evolving through an artificial encoding.  Picbreeder is a treasure trove of almost 10,000 examples, each one with its own lesson to teach us on fracture.  And my argument is, since we are fortunate enough to have this example right in front of us (like the example of natural evolution evolving Einstein), let's learn everything we can from it before looking away and saying hastily, "there must be something better."  Picbreeder is a fortuitous goldmine of data that is very hard to reproduce; it took several years and the contributions of hundreds of people to accumulate its results.  Let's scrub every last bit of wisdom from that windfall that we can.  It's not a question of better for me, it's a question of learning from the only evidence we have.  How do you expect to really develop a better encoding of fracture if you ignore the thousands of examples of fracture that already exist?  You will just end up reinventing the wheel.  

                      The danger of that is evident in some of the assumptions you hold already about how CPPNs encode fracture.  For example, you are worried that it may not do so efficiently, but you have only constructed a single thought experiment (your figure 13) instead of taking a measured look at the thousands of examples in front of us.  In fact, while indeed I don't think 6 nodes is particularly many, it's easy enough to find fracture in Picbreeder represented in under 6 nodes.  For example, the pattern below has at least 3 distinct fractured regions (which I numbered for clarity): a pinkish region, an oval inner region, and a conical lower region.  However, the CPPN that encodes it (shown to its left) only uses 4 nodes to encode these 3 fractured regions (notice that two of the displayed hidden nodes have outgoing weights of zero, so they are not contributing to the output pattern, leaving only the 4 nodes).  That's a respectable 1.3 nodes per fractured region:

                      (this image is also available at http://eplex.cs.ucf.edu/hyperNEATpage/content/four-node-fracture.jpg )




                      We could argue about the definition of fracture, and perhaps you'd want to refine the definition to argue that the above example doesn't really have three fractured regions.  Or you could argue that the "d" input should be counted as a 5th node (to encode the 3 regions, which is still a respectable 1.7 nodes per region).  But even those would be healthy discussions that we could not begin to have without first referring to the examples that we already have.  Of course (intuitively) more complex fracture will require more nodes (maybe at some point 6!), as it should.  All I'm saying is I think this kind of speculation about whether it's efficient or not is jumping the gun:  Let's understand it first before we worry about fixing it.
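
                      To make the flavor of such a small genome concrete, here is a toy, hand-built CPPN in the same spirit. To be clear, this is not the actual Picbreeder genome above; the weights and functions are invented purely for illustration. Four function nodes composed over the x, y, and d inputs already carve the plane into several fractured regions:

                      import math

                      def gaussian(z):
                          return math.exp(-z * z)

                      def sigmoid(z):
                          return 1.0 / (1.0 + math.exp(-z))

                      # A hand-built 4-node "CPPN": two Gaussian nodes define soft region
                      # boundaries, a sine node adds per-region texture, and a sigmoid
                      # output node composes them into distinct fractured zones.
                      def cppn(x, y):
                          d = math.sqrt(x * x + y * y)          # radial input, as in Picbreeder
                          h1 = gaussian(2.0 * x + y)            # soft boundary along one line
                          h2 = gaussian(3.0 * (d - 0.5))        # ring-shaped boundary
                          h3 = math.sin(4.0 * y + 2.0 * h1)     # texture that varies by region
                          return sigmoid(3.0 * (h1 - h2) + h3)

                      # Render the pattern as coarse ASCII art to see the regions.
                      for row in range(-10, 11):
                          print("".join(".:+#"[int(cppn(col / 10.0, row / 10.0) * 3.999)]
                                        for col in range(-10, 11)))

                      The point is just that composing a handful of simple functions is already enough to produce discontinuous-looking variation, which is why the nodes-per-fractured-region arithmetic above comes out so low.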

                      So I think there's a difference of philosophies ultimately at work here.  Because the road ahead to AI is so vast, I'm more motivated by maximizing our information than doing "better."  For example, that's the motivation behind Picbreeder - obviously it is not quantifiably better than anything in particular, but it increased our understanding of a number of important phenomena, including fracture, representation, and non-objective search, by providing useful information.  So I say, let's see where this takes us - there's still a ton left to learn.  But you are already itching to start anew and fix things that are not clearly broken just when we have hit a goldmine of untapped information. 

                      Without question, neither of our perspectives is definitively superior; we are both merely speculating about where time is best invested, and obviously neither of us can predict the future.  This argument is only my own case for my perspective.

                      Best,

                      ken


                      --- In neat@yahoogroups.com, "shimonw" wrote:
                      >
                      > Hi Ken,
                      >
                      > If I may, I'd like to offer a couple reactions to your latest message. This is indeed an interesting discussion that touches on many potentially important issues. I hope I can offer a useful perspective on these topics.
                      >
                      > You are right that in a target-based scenario there can be many CPPNs that generate the same phenotype. But the CPPN is a genotype, not a phenotype. In what I'm calling "regular" FFs, there are potentially many *phenotypes* that solve the problem, and for each of them potentially many genotypes. So in my mind, there is still a useful distinction between target-based FFs and regular FFs, though they are perhaps best seen as different points on a spectrum.
                      >
                      > If I understand correctly, you are suggesting that the important distinction is how the FF interacts with the search process. But to me, this is a difficult criterion to use because the search process could be anything. The space of possible black-box optimization methods is huge, and what encourages piecewise incremental progress for one method may not for another.
                      >
                      > I don't pretend to know what kind of algorithm would generate Einstein-level intelligence (I don't even know if this is a useful goal). But as a researcher I highly respect once told me, "your failure to imagine it does not constitute a proof that it cannot be done."
                      >
                      > That's why I think your hypothesis is quite a bold one, because it looks at the limitations of a specific class of black-box optimization methods and generalizes them to all possible black-box optimization methods.
                      >
                      > To give just one example of the possibilities here: algorithms for continuous-armed bandits, such as X-armed bandits and Bayesian optimization methods such as GP-UCB, can be applied to black-box optimization. These methods avoid the problem of local optima in a principled way, by maintaining a posterior distribution over the global FF, and using optimism in the face of uncertainty to ensure sufficient exploration of the solution space. As a result, these methods have very nice theoretical properties, like guaranteed convergence to the global optimum in the limit and logarithmic regret bounds along the way.
                      >
                      > As far as I know, no one has tried to develop versions of these methods that can discover NN topologies, and which would thus be suitable for optimizing CPPNs or other indirect encodings. But there's no a priori reason to think it can't be done. I'm not saying this would necessarily work or even that it's the most promising approach to explore. I'm just saying it's one example of a principled approach that could avoid all the difficulties you mention and which we currently have no strong reason to eliminate from contention.
                      >
                      > So I think it's quite likely that these hard FFs are not unsolvable, but just that current methods cannot solve them. Rather than giving up on them, in my opinion we should be focusing on developing better methods for them.
                      >
                      > Regarding whether 6 nodes is a lot to represent fracture, I think the same point applies: just because we can't think of a better way to do it, doesn't mean it doesn't exist. Our paper establishes 6 as an upper bound, which can be easily done by example. If I understand correctly, you are suggesting that 6 may be a lower bound, which is much more difficult to demonstrate.
                      >
                      > I could definitely imagine that a different type of network, or one with a different set of activation functions, might be able to make better use of 6 nodes. Even if it's true that there's a trade-off, and using fewer nodes means less flexibility, there may be many cases where that's a favorable trade-off. We should at least be open to the possibility that in some cases the sweet spot is not a representation that requires 6 nodes for fracture.
                      >
                      > I'm looking forward to chatting more about this at GECCO.
                      >
                      > Cheers,
                      > Shimon
                      >
                      > --- In neat@yahoogroups.com, "Ken" kstanley@ wrote:
                      > >
                      > > Hi Shimon,
                      > >
                      > > These are some really interesting and deep issues we're getting into. I
                      > > am sure they will be the foundation of great discussions at GECCO. I'll
                      > > try to keep my response more brief than before.
                      > >
                      > > 1) Indeed recent quadruped results are not in the same setup as in your
                      > > work; that is a legitimate qualification and they do not constitute a
                      > > proof that in your particular setup HyperNEAT would succeed. However,
                      > > they are arguably at least comparably complex. For example, the CTRNN
                      > > in
                      > > http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf
                      > > is a complicated
                      > > structure and the CPPN that encodes it is required not only to generate
                      > > a controller for a single quadruped morphology, but to generate multiple
                      > > controllers for multiple morphologies, all from the same CPPN. A video
                      > > of three controllers all from the same CPPN is here:
                      > > http://youtu.be/oLSSt5GyHNk
                      > >
                      > > 2) I think the question of what is "target-based" can actually be
                      > > interesting, but I completely agree that the semantic side of the
                      > > argument isn't really important. But what I do find interesting is the
                      > > idea that these two scenarios differ substantively in your mind (as you
                      > > put it):
                      > >
                      > > "there is an important difference between FFs that require a single
                      > > specific phenotype and FFs that require only that some goal be achieved"
                      > >
                      > > I think by "single specific phenotype" you mean pattern-matching (i.e.
                      > > image-matching). But I think this distinction doesn't really exist:
                      > > There are an unlimited number of networks (i.e. CPPNs) that can encode a
                      > > particular two-dimensional image, just as there are an unlimited number
                      > > of CPPNs that can encode a particular behavior or goal (such as
                      > > following a line).
                      > >
                      > > What makes both target-based and fundamentally the same is not the
                      > > number of ways they can be solved, but rather the way the fitness
                      > > function interacts with the search process. The problem is that
                      > > target-based fitness encourages piece-wise incremental progress, which
                      > > tends to be highly deceptive. For example, with an image, matching a
                      > > single pixel can raise fitness even if the function that causes that
                      > > pixel to be matched has no relationship whatsoever to the overall
                      > > regularities in the target pattern. The same is true in control tasks:
                      > > A mutation that causes a quadruped to lunge forward clumsily is rewarded
                      > > for the slight improvement in fitness, whereas an important underlying
                      > > regularity (e.g. oscillation) necessary to get really good at the task
                      > > is never established. Non-target-based fitness (or what I call
                      > > "non-objective") avoids these problems because it *does* reward
                      > > establishing regularities: it appreciates new regularities even if they
                      > > don't immediately raise fitness.
                      > >
                      > > But what's most interesting to me is your intuition that there is some
                      > > way to sidestep the tradeoff in combining indirect encodings with
                      > > target-based fitness functions. That is, you suspect there may exist an
                      > > indirect encoding that can both elaborate regularities compositionally
                      > > over very many generations and also almost deterministically satisfy an
                      > > arbitrary target-based objective. I am convinced that is not the case
                      > > (it would be too good to be true), but you rightly point out that my
                      > > position is a hypothesis. However, just as a thought experiment, can
                      > > you really imagine any neural network encoding (including DNA) that
                      > > would give you Einstein-level intelligence if the target was simply to
                      > > score maximally on an IQ test? It is plain to me that the stepping
                      > > stones to human level cannot be crossed in the presence of a
                      > > target-based objective (or any one conceivable). However,
                      > > interestingly, that does not mean that such a neural network does not
                      > > exist in the search space. It just means a target won't get you there.
                      > >
                      > > 3) We apparently disagree on whether 6 is a big number (for nodes needed
                      > > to represent fracture). If you try to devise an encoding that
                      > > represents fracture with fewer than that, my intuition is that you would
                      > > be sacrificing an enormous breadth of flexibility in the kinds of
                      > > fracture (and patterns in general) that you can represent. Think about
                      > > the number of "degrees of freedom" in fracture: it's not really
                      > > well-defined, but a single line of fracture has a lot of dimensions:
                      > >
                      > > -position
                      > > -orientation
                      > > -curvature (which can be arbitrarily complex)
                      > > -fracture gradient (that is, one region can smoothly transition into
                      > > another in a kind of interpolated transitional zone)
                      > > -embedding (fracture can potentially be hierarchical or regular, i.e.
                      > > dependent on higher-level containing regions)
                      > >
                      > > The idea that all of that can be encoded in anything less than about 6
                      > > simple functions seems optimistic. Of course, if you are willing to
                      > > sacrifice some of those degrees of freedom, then you can get fewer
                      > > nodes, but the exquisite minutiae of such patterns would be lost (and
                      > > you'd move more towards direct encoding, like the wavelets do).
                      > >
                      > > By the way, here are some cool examples of complicated kinds of fracture
                      > > from Picbreeder (such as fractured periodic zones, but even more complex
                      > > than that):
                      > >
                      > > Nontrivial color patterns in each zone:
                      > > http://picbreeder.org/search/showgenome.php?sid=11328
                      > >
                      > >
                      > > Very complex brush-stroke pattern inside the apple vs. outside (the DNA
                      > > for this one is fascinating):
                      > > http://picbreeder.org/search/showgenome.php?sid=7109
                      > >
                      > >
                      > >
                      > > Fractured regions covering a distorted surface, giving a remarkable
                      > > underlying curvature:
                      > > http://picbreeder.org/search/showgenome.php?sid=2684
                      > >
                      > >
                      > > A color variant: http://picbreeder.org/search/showgenome.php?sid=6652
                      > >
                      > >
                      > > When I see so many remarkable examples like these on Picbreeder, the
                      > > message to me is that we have only scratched the surface of what CPPN
                      > > encoding can teach us about representation. That's why I think it's
                      > > premature (in the face of such a treasure trove of evidence) to be
                      > > worrying about whether we can represent some subcomponent of such
                      > > patterns with less than 6 nodes. At present I'm amazed they can be
                      > > represented at all, let alone with under 6 nodes.
                      > >
                      > > Thanks very much for raising these issues in any case - it's a great
                      > > debate to have about representation and search and goes to the heart of
                      > > the field of GDS/indirect encoding.
                      > >
                      > > Best,
                      > >
                      > > ken
                      > >
                      > > --- In neat@yahoogroups.com, "shimonw" wrote:
                      > > >
                      > > > Hi Ken,
                      > > >
                      > > > Thanks for your interesting and thought-provoking comments. I'll give
                      > > some brief reactions here, and I hope we can have a productive
                      > > discussion about it at GECCO.
                      > > >
                      > > > 1) I won't comment for now on the new experiments you've been running
                      > > because the details are not available yet. Instead, I will just comment
                      > > on the various walking gait domains, because the details of the
                      > > additional results you mentioned are in that case already available, in
                      > > newly published GECCO papers.
                      > > >
                      > > > First of all, these are very cool results, and nice success stories
                      > > for HyperNEAT. It's great to see that the GDS community is continuing
                      > > to push the envelope in terms of robot locomotion, which seems to be
                      > > emerging as a nice application for indirect encodings.
                      > > >
                      > > > However, I would simply caution against placing all these variations
                      > > on the walking gait task on a one-dimensional spectrum of difficulty.
                      > > In our paper, we took the original walking gait task and made only one
                      > > change (increasing the mass of the torso) which clearly makes the task
                      > > harder. In the papers Ken mentioned, there are other differences, e.g.,
                      > > the use of an SUPG encoding, the use of CTRNNs, the use of a different
                      > > substrate topology, and varying the length of the legs.
                      > > >
                      > > > It's completely legitimate to make these changes and they don't
                      > > detract from the results at all. But they are confounding factors when
                      > > it comes to comparing HyperNEAT's performance on walking gait tasks of
                      > > different difficulties. Some of these changes may make the task harder
                      > > in some ways but other changes may make it easier in other ways, and so
                      > > far we don't have experimental results that control for these factors.
                      > > >
                      > > > 2) I hope to avoid a semantic debate about what is a "target-based"
                      > > FF. If you want to include what I was calling a "regular" FF in the
                      > > scope of target-based FFs, I won't object. But my point is that,
                      > > regardless of what you call them, there is an important difference
                      > > between FFs that require a single specific phenotype and FFs that
                      > > require only that some goal be achieved. For the latter, there may be
                      > > many ways to skin the cat, though in any interesting problem, good
                      > > solutions are still quite rare. The distinction is important because,
                      > > while the former category is arguably only of theoretical interest, the
                      > > second category corresponds to what I consider the most interesting and
                      > > important class of FFs, because these are the FFs that naturally arise
                      > > in a very large number of real-world optimization problems.
                      > > >
                      > > > So, using your terminology, I agree it is reasonable to say
                      > > HyperNEAT's difficulties with fracture are probably limited to
                      > > "target-based" FFs. But for me, this is a potentially serious
                      > > restriction, since target-based FFs include not just pathological FFs
                      > > but a large class of real-world FFs.
                      > > >
                      > > > Regarding the wavelet method, I would still definitely call this an
                      > > indirect encoding, but it is in some sense "less indirect" because it
                      > > lacks the compositional functions of HyperNEAT. It's a perfectly
                      > > reasonable hypothesis that such compositionality is advantageous in some
                      > > ways. However, if that's true, then we should be able to demonstrate
                      > > that by finding tasks where HyperNEAT outperforms the wavelet method
                      > > (FYI, in the extended analysis Thomas is conducting for his master's
                      > > thesis, we have found one variation of the line-following task where
                      > > this is the case). So in that sense, the wavelet method is a useful
                      > > baseline, which is the whole reason we introduced it.
                      > > >
                      > > > 3) If it's true that it takes ~6 nodes to represent fracture, then I
                      > > think this is a potentially important point. On regular FFs, it could
                      > > contribute to the difficulty of discovering fracture, since
                      > > representations that have only some of the necessary components may not
                      > > be rewarded (I think you would call this "deception"). In
                      > > Picbreeder-like tasks, we already see that fracture can evolve anyway,
                      > > but this doesn't mean it wouldn't do so even better if fracture required
                      > > fewer nodes to represent. If we think that the solutions to many
                      > > interesting problems require fracture, then I think it's at least
                      > > possible that one could devise a better representation for such problems
                      > > than the CPPNs used in HyperNEAT.
                      > > >
                      > > > As I understand it, your approach to dealing with "regular" FFs is to
                      > > reformulate them "in a way that respects the need for phenotypic
                      > > diversity and open-endedness." That's a very interesting approach and I
                      > > think it has a lot of merit, but I don't think it will solve all our
                      > > problems because it presupposes that such a reformulation is possible
                      > > (or that the prior knowledge it requires is available). In some cases,
                      > > I think we are just stuck with hard FFs and we need to develop better
                      > > methods to deal with them.
                      > > >
                      > > > Regarding the idea that "an unhappy and unhealthy indirect encoding is
                      > > one with only a strict target to which to aspire": this is a very
                      > > intriguing hypothesis, but to me it is only a hypothesis. I think it's
                      > > true of existing indirect encodings, but that for me does not in any way
                      > > rule out the possibility of developing a different indirect encoding for
                      > > which that is not true, and I think that's one thing we should be
                      > > working on. One of the reasons that I'm interested in indirect
                      > > encodings is that I strongly believe this can be done.
                      > > >
                      > > > Cheers,
                      > > > Shimon
                      > > >
                      > >
                      >
                    • shimonw
                      Message 10 of 14 , Jun 24, 2013
                        Hi Vassilis,

                        Thanks for your comments. No, I don't think we can model it as a contextual bandit problem because in that setting the agent doesn't get to choose the state, whereas here it does get to choose the NN topology.
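
                        Just to illustrate the non-contextual framing, here is a toy sketch (purely illustrative, under simplified assumptions: a small, fixed, hypothetical pool of candidate topologies, each treated as a bandit arm with stochastic fitness):

                        import math, random

                        arms = ["2-node", "3-node", "3-node-recurrent"]  # hypothetical topologies
                        counts = [0] * len(arms)
                        means = [0.0] * len(arms)                        # running mean fitness

                        def noisy_fitness(arm):
                            # stand-in for an expensive, stochastic fitness evaluation
                            base = {"2-node": 0.4, "3-node": 0.6, "3-node-recurrent": 0.55}[arm]
                            return base + random.gauss(0.0, 0.1)

                        for t in range(1, 201):
                            if 0 in counts:                              # evaluate each arm once first
                                i = counts.index(0)
                            else:                                        # optimism in the face of uncertainty
                                i = max(range(len(arms)),
                                        key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
                            r = noisy_fitness(arms[i])
                            counts[i] += 1
                            means[i] += (r - means[i]) / counts[i]       # incremental mean update

                        print(arms[counts.index(max(counts))])           # most-played arm

                        The hard part, of course, is that in GDS the pool of candidate topologies is not fixed in advance.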

                        I'm not aware of bandit methods that deal with flexible-length representations, and I suspect this would be the main research challenge in trying to develop bandit methods for GDS. In the Bayesian optimization approaches, a prior is placed on the space of fitness functions, so in this case that would require priors over functions whose domains are vectors of different lengths. I'm not enough of an expert in Bayesian methods to say whether existing methods can deal with that, but I see no fundamental reason why it couldn't be done.
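
                        For the fixed-length case, the idea looks something like the following minimal sketch (assuming a one-dimensional real-valued genome and a hand-rolled RBF-kernel GP; the variable-length issue is precisely what it does not address):

                        import numpy as np

                        def fitness(x):                      # the unknown black-box FF (demo only)
                            return np.sin(3 * x) + 0.5 * np.cos(7 * x)

                        def rbf(a, b, ls=0.2):               # squared-exponential kernel
                            return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

                        grid = np.linspace(0.0, 2.0, 400)    # candidate genomes
                        X = list(np.random.uniform(0.0, 2.0, 3))  # a few initial evaluations
                        y = [fitness(x) for x in X]

                        for _ in range(20):
                            Xa = np.array(X)
                            K = rbf(Xa, Xa) + 1e-6 * np.eye(len(X))   # kernel matrix with jitter
                            Ks = rbf(grid, Xa)
                            mu = Ks @ np.linalg.solve(K, np.array(y))              # posterior mean
                            var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
                            ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))         # optimism
                            x_next = grid[int(np.argmax(ucb))]                     # next genome to try
                            X.append(float(x_next))
                            y.append(fitness(x_next))

                        print(max(zip(y, X)))                # best fitness found and its genome

                        Extending something like this to genomes whose length can grow, as NEAT-style topologies do, would mean defining a kernel (and hence a prior) over variable-length structures, which is exactly the open question.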

                        -Shimon

                        --- In neat@yahoogroups.com, Vassilis Vassiliades <vassilisvas@...> wrote:
                        >
                        > Hi Shimon,
                        >
                        > I have some questions regarding what you said about using bandit algorithms
                        > to design NN/CPPN topologies.
                        >
                        > Do you believe that in order to tackle this problem, one would need
                        > contextual bandit algorithms, where the state would be the current NN
                        > topology?
                        >
                        > If there are actions that correspond to the addition and deletion of nodes
                        > and connections, the dimensionality of the context/policy would change
                        > during learning. So, I am wondering whether you know of any algorithms or
                        > techniques that could deal with such problems, where the dimensionality of
                        > the state is not fixed.
                        >
                        > Best,
                        > Vassilis
                        >
                        >
                        >
                        > On Mon, Jun 17, 2013 at 2:54 PM, shimonw <shimon@...> wrote:
                      • shimonw
                        Message 11 of 14 , Jun 24, 2013
                          Hi Ken,

                          Yes, I agree that this conversation is exposing some of our fundamental assumptions, and it is always good to examine and question such assumptions.

                          However, if you think that our hypotheses are equally bold, then I fear my main point has been misunderstood. I am not actually hypothesizing that an algorithm with any particular qualities exists. I am only saying that I am not aware of any result which gives good reason to abandon hope of developing methods that work with indirect encodings and excel on "regular" FFs. This is actually not a hypothesis at all; it is an incontrovertible fact. Of course, it doesn't prove that such a result doesn't exist, and I certainly don't claim a comprehensive mastery of the literature. But if such a result does exist (though it's hard to imagine it would), anyone is free to simply point it out to me.

                          By contrast, your hypothesis was that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire." I continue to maintain that this is quite a bold claim. You suggested that the "success" of natural evolution provides a proof of concept to support your hypothesis, but I find this highly problematic. Firstly, it's unclear that evolution was "successful" in any meaningful way because there's an anthropic principle at work here: we are human and so we find humans impressive. The success of evolution is thus a self-fulfilling prophecy. Secondly, and more importantly, even if we accept that evolution is successful, this actually does not validate your hypothesis at all. Showing that there's a nice algorithm that works without a target is not the same as showing that there are *no* algorithms that work with a target, which is what your hypothesis claims.

                          That's why I think it's not correct to describe our "hypotheses" as equally bold. In my case, there is nothing to prove: there's just a simple fact. In your case, validating the hypothesis requires proving a negative (and quite a sweeping one at that), which is notoriously difficult to do.

                          Finally, I sense from your defense of Picbreeder that I may have been misunderstood in another way. I'm in no way trying to criticize Picbreeder or suggest that research into it is misguided. I think it's a very cool and intriguing result, possibly a highly significant one. You and I may disagree about *why* it's significant, but we definitely agree that it's useful to study further and has the potential to yield important insights.

                          I'm just saying that, in the complementary search for optimization methods that use indirect encodings and can excel on hard FFs, there's really no reason for despair.

                          Cheers,
                          Shimon

                          --- In neat@yahoogroups.com, "Ken" <kstanley@...> wrote:
                          >
                          > Hi Shimon,
                          >
                          > The nice thing about this type of conversation is that it pushes us
                          > towards realizing our fundamental assumptions. In that spirit, a few
                          > other thoughts in response:
                          >
                          > Many top biologists and EC researchers would call the behavior of the
                          > organism in the world (i.e. its interactions with the world) the
                          > "phenotype." David Fogel once sent me a long and forceful defense of
                          > such a position, quoting a number of famous biologists. In that
                          > context, whether you map CPPN->pattern or CPPN->behavior, a target is
                          > still a target and the neural network is not the phenotype.
                          >
                          > In any case, I guess we both hold bold hypotheses in some sense. Your
                          > hypothesis that there exist methods that can solve hard FFs almost
                          > deterministically as targets (up to and even beyond delivering us an
                          > Einstein-level brain) seems boldly optimistic to me. But the big
                          > difference between our hypotheses is that you have no proof of concept
                          > for your approach while I do. In your approach, you will set a target
                          > of Einstein and almost magically the algorithm will then deliver us
                          > Einstein, which no method has yet shown even the faintest sign of
                          > achieving. In my approach, Einstein is not the target of the search
                          > process but instead appears as a byproduct of a search without any
                          > explicit target, which is (amazingly) actually what happened in the real
                          > world! It may be tempting to dismiss that distinction, but think about
                          > it, the fact that any process actually *did* produce Einstein is beyond
                          > incredible. We should learn from that as much as we can before
                          > insisting "there's got to be a better way."
                          >
                          > Of course, it's great that you're searching for a better way (it sure
                          > would be nice if we could just set Einstein's IQ as a fitness target),
                          > but the thing is, there will always be a ton of people trying to follow
                          > that path because it's such a clean and intuitively appealing notion.
                          > So it hardly needs a vigorous argument to convince someone that it's
                          > important to try. The danger is instead that in such a rush of
                          > enthusiasm for intuitively appealing approaches, we sweep aside the much
                          > more delicate yet equally critical attempt to harness evolution on its
                          > own terms, which leads to things like novelty search, indirect encoding,
                          > and behavioral diversity. The power of this process may not be as clean
                          > and simple as the optimization crowd is hoping; my argument here guards
                          > against that danger.
                          >
                          > Best,
                          >
                          > ken
                          >
                          >
                          > --- In neat@yahoogroups.com, "shimonw" wrote:
                          > >
                          > > Hi Ken,
                          > >
                          > > If I may, I'd like to offer a couple reactions to your latest message.
                          > This is indeed an interesting discussion that touches on many
                          > potentially important issues. I hope I can offer a useful perspective
                          > on these topics.
                          > >
                          > > You are right that in a target-based scenario there can be many CPPNs
                          > that generate the same phenotype. But the CPPN is a genotype, not a
                          > phenotype. In what I'm calling "regular" FFs, there are potentially many
                          > *phenotypes* that solve the problem, and for each of them potentially
                          > many genotypes. So in my mind, there is still a useful distinction
                          > between target based FFs and regular FFs, though they are perhaps best
                          > seen as different points on a spectrum.
                          > >
                          > > If I understand correctly, you are suggesting that the important
                          > distinction is how the FF interacts with the search process. But to me,
                          > this is a difficult criterion to use because the search process could be
                          > anything. The space of possible black-box optimization methods is huge,
                          > and what encourages piecewise incremental progress for one method may
                          > not for another.
                          > >
                          > > I don't pretend to know what kind of algorithm would generate
                          > Einstein-level intelligence (I don't even know if this is a useful
                          > goal). But as a researcher I highly respect once told me, "your failure
                          > to imagine it does not constitute a proof that it cannot be done."
                          > >
                          > > That's why I think your hypothesis is quite a bold one, because it
                          > looks at the limitations of a specific class of black-box optimization
                          > methods and generalizes them to all possible black-box optimization
                          > methods.
                          > >
                          > > To give just one example of the possibilities here: algorithms for
                          > continuous-armed bandits, such as X-armed bandits and Bayesian
                          > optimization methods such as GP-UCB, can be applied to black-box
                          > optimization. These methods avoid the problem of local optima in a
                          > principled way, by maintaining a posterior distribution over the global
                          > FF, and using optimism in the face of uncertainty to ensure sufficient
                          > exploration of the solution space. As a result, these methods have very
                          > nice theoretical properties, like guaranteed convergence to the global
                          > optimum in the limit and logarithmic regret bounds along the way.
                          > >
                          > > As far as I know, no one has tried to develop versions of these
                          > methods that can discover NN topologies, and which would thus be
                          > suitable for optimizing CPPNs or other indirect encodings. But there's
                          > no a priori reason to think it can't be done. I'm not saying this would
                          > necessarily work or even that it's the most promising approach to
                          > explore. I'm just saying it's one example of a principled approach that
                          > could avoid all the difficulties you mention and which we currently have
                          > no strong reason to eliminate from contention.
                          > >
                          > > So I think it's quite likely that these hard FFs are not unsolvable,
                          > but just that current methods cannot solve them. Rather than giving up
                          > on them, in my opinion we should be focusing on developing better
                          > methods for them.
                          > >
                          > > Regarding whether 6 nodes is a lot to represent fracture, I think the
                          > same point applies: just because we can't think of a better way to do
                          > it, doesn't mean it doesn't exist. Our paper establishes 6 as an upper
                          > bound, which can be easily done by example. If I understand correctly,
                          > you are suggesting that 6 may be a lower bound, which is much more
                          > difficult to demonstrate.
                          > >
                          > > I could definitely imagine that a different type of network, or one
                          > with a different set of activation functions, might be able to make
                          > better use of 6 nodes. Even if it's true that there's a trade-off, and
                          > using fewer nodes means less flexibility, there may be many cases where
                          > that's a favorable trade-off. We should at least be open to the
                          > possibility that it some cases the sweet spot is not a representation
                          > that requires 6 nodes for fracture.
                          > >
                          > > I'm looking forward to chatting more about this at GECCO.
                          > >
                          > > Cheers,
                          > > Shimon
                          > >
                          > > --- In neat@yahoogroups.com, "Ken" kstanley@ wrote:
                          > > >
                          > > > Hi Shimon,
                          > > >
                          > > > These are some really interesting and deep issues we're getting
                          > into. I
                          > > > am sure they will be the foundation of great discussions at GECCO.
                          > I'll
                          > > > try to keep my response more brief than before.
                          > > >
                          > > > 1) Indeed recent quadruped results are not in the same setup as in
                          > your
                          > > > work; that is a legitimate qualification and they do not constitute
                          > a
                          > > > proof that in your particular setup HyperNEAT would succeed.
                          > However,
                          > > > they are arguably at least comparably complex. For example, the
                          > CTRNN
                          > > > in
                          > > > http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf
                          > > > is a complicated
                          > > > structure and the CPPN that encodes it is required not only to
                          > generate
                          > > > a controller for a single quadruped morphology, but to generate
                          > multiple
                          > > > controllers for multiple morphologies, all from the same CPPN. A
                          > video
                          > > > of three controllers all from the same CPPN is here:
                          > > > http://youtu.be/oLSSt5GyHNk
                          > > >
                          > > > 2) I think the question of what is "target-based" can actually be
                          > > > interesting, but I completely agree that the semantic side of the
                          > > > argument isn't really important. But what I do find interesting is
                          > the
                          > > > idea that these two scenarios differ substantively in your mind (as
                          > you
                          > > > put it):
                          > > >
                          > > > "there is an important difference between FFs that require a single
                          > > > specific phenotype and FFs that require only that some goal be
                          > achieved"
                          > > >
                          > > > I think by "single specific phenotype" you mean pattern-matching
                          > (i.e.
                          > > > image-matching). But I think this distinction doesn't really exist:
                          > > > There are an unlimited number of networks (i.e. CPPNs) that can
                          > encode a
                          > > > particular two-dimensional image, just as there are an unlimited
                          > number
                          > > > of CPPNs that can encode a particular behavior or goal (such as
                          > > > following a line).
                          > > >
                          > > > What makes both target-based and fundamentally the same is not the
                          > > > number of ways they can be solved, but rather the way the fitness
                          > > > function interacts with the search process. The problem is that
                          > > > target-based fitness encourages piece-wise incremental progress,
                          > which
                          > > > tends to be highly deceptive. For example, with an image, matching
                          > a
                          > > > single pixel can raise fitness even if the function that causes that
                          > > > pixel to be matched has no relationship whatsoever to the overall
                          > > > regularities in the target pattern. The same is true in control
                          > tasks:
                          > > > A mutation that causes a quadruped to lunge forward clumsily is
                          > rewarded
                          > > > for the slight improvement in fitness, whereas an important
                          > underlying
                          > > > regularity (e.g. oscillation) necessary to get really good at the task is never established. Non-target-based fitness (or what I call "non-objective") avoids these problems because it *does* reward establishing regularities: it appreciates new regularities even if they don't immediately raise fitness.
                          > > >
                          > > > But what's most interesting to me is your intuition that there is some way to sidestep the tradeoff in combining indirect encodings with target-based fitness functions. That is, you suspect there may exist an indirect encoding that can both elaborate regularities compositionally over very many generations and also almost deterministically satisfy an arbitrary target-based objective. I am convinced that is not the case (it would be too good to be true), but you rightly point out that my position is a hypothesis. However, just as a thought experiment, can you really imagine any neural network encoding (including DNA) that would give you Einstein-level intelligence if the target were simply to score maximally on an IQ test? It is plain to me that the stepping stones to human level cannot be crossed in the presence of a target-based objective (or any conceivable one). However, interestingly, that does not mean that such a neural network does not exist in the search space. It just means a target won't get you there.
                          > > >
                          > > > 3) We apparently disagree on whether 6 is a big number (for nodes needed to represent fracture). If you try to devise an encoding that represents fracture with fewer than that, my intuition is that you would be sacrificing an enormous breadth of flexibility in the kinds of fracture (and patterns in general) that you can represent. Think about the number of "degrees of freedom" in fracture: it's not really well-defined, but a single line of fracture has a lot of dimensions:
                          > > >
                          > > > -position
                          > > > -orientation
                          > > > -curvature (which can be arbitrarily complex)
                          > > > -fracture gradient (that is, one region can smoothly transition into another in a kind of interpolated transitional zone)
                          > > > -embedding (fracture can potentially be hierarchical or regular, i.e. dependent on higher-level containing regions)
                          > > >
                          > > > The idea that all of that can be encoded in anything less than about 6 simple functions seems optimistic. Of course, if you are willing to sacrifice some of those degrees of freedom, then you can get fewer nodes, but the exquisite minutiae of such patterns would be lost (and you'd move more towards direct encoding, like the wavelets do).
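                          As a rough illustration of how quickly those degrees of freedom consume primitives, here is a toy Python sketch of a single soft fracture line covering just the first four dimensions listed above (the parametrization and names are invented for illustration; a CPPN would realize similar primitives as network nodes):

                              import numpy as np

                              def fracture_mask(x, y, pos=0.0, angle=0.3, curve=2.0, softness=0.1):
                                  # Soft membership in one of two regions split by a curved line.
                                  xr = np.cos(angle) * x + np.sin(angle) * y    # orientation
                                  yr = -np.sin(angle) * x + np.cos(angle) * y
                                  signed = yr - pos - 0.2 * np.sin(curve * xr)  # position + curvature
                                  return 1.0 / (1.0 + np.exp(-signed / softness))  # fracture gradient

                          Even this crude version spends several function applications per degree of freedom, which is consistent with the view that a budget of roughly 6 composed functions is not extravagant.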
                          > > >
                          > > > By the way, here are some cool examples of complicated kinds of fracture from Picbreeder (such as fractured periodic zones, but even more complex than that):
                          > > >
                          > > > Nontrivial color patterns in each zone:
                          > > > http://picbreeder.org/search/showgenome.php?sid=11328
                          > > >
                          > > > Very complex brush-stroke pattern inside the apple vs. outside (the DNA for this one is fascinating):
                          > > > http://picbreeder.org/search/showgenome.php?sid=7109
                          > > >
                          > > > Fractured regions covering a distorted surface, giving a remarkable underlying curvature:
                          > > > http://picbreeder.org/search/showgenome.php?sid=2684
                          > > >
                          > > > A color variant:
                          > > > http://picbreeder.org/search/showgenome.php?sid=6652
                          > > >
                          > > > When I see so many remarkable examples like these on Picbreeder, the message to me is that we have only scratched the surface of what CPPN encoding can teach us about representation. That's why I think it's premature (in the face of such a treasure trove of evidence) to be worrying about whether we can represent some subcomponent of such patterns with fewer than 6 nodes. At present I'm amazed they can be represented at all, let alone with under 6 nodes.
                          > > >
                          > > > Thanks very much for raising these issues in any case - it's a great debate to have about representation and search, and it goes to the heart of the field of GDS/indirect encoding.
                          > > >
                          > > > Best,
                          > > >
                          > > > ken
                          > > >
                          > > > --- In neat@yahoogroups.com, "shimonw" wrote:
                          > > > >
                          > > > > Hi Ken,
                          > > > >
                          > > > > Thanks for your interesting and thought-provoking comments. I'll give some brief reactions here, and I hope we can have a productive discussion about it at GECCO.
                          > > > >
                          > > > > 1) I won't comment for now on the new experiments you've been running because the details are not available yet. Instead, I will just comment on the various walking gait domains, because the details of the additional results you mentioned are in that case already available, in newly published GECCO papers.
                          > > > >
                          > > > > First of all, these are very cool results, and nice success stories for HyperNEAT. It's great to see that the GDS community is continuing to push the envelope in terms of robot locomotion, which seems to be emerging as a nice application for indirect encodings.
                          > > > >
                          > > > > However, I would simply caution against placing all these variations on the walking gait task on a one-dimensional spectrum of difficulty. In our paper, we took the original walking gait task and made only one change (increasing the mass of the torso), which clearly makes the task harder. In the papers Ken mentioned, there are other differences, e.g., the use of an SUPG encoding, the use of CTRNNs, the use of a different substrate topology, and varying the length of the legs.
                          > > > >
                          > > > > It's completely legitimate to make these changes and they don't detract from the results at all. But they are confounding factors when it comes to comparing HyperNEAT's performance on walking gait tasks of different difficulties. Some of these changes may make the task harder in some ways but other changes may make it easier in other ways, and so far we don't have experimental results that control for these factors.
                          > > > >
                          > > > > 2) I hope to avoid a semantic debate about what is a "target-based" FF. If you want to include what I was calling a "regular" FF in the scope of target-based FFs, I won't object. But my point is that, regardless of what you call them, there is an important difference between FFs that require a single specific phenotype and FFs that require only that some goal be achieved. For the latter, there may be many ways to skin the cat, though in any interesting problem, good solutions are still quite rare. The distinction is important because, while the former category is arguably only of theoretical interest, the second category corresponds to what I consider the most interesting and important class of FFs, because these are the FFs that naturally arise in a very large number of real-world optimization problems.
                          > > > >
                          > > > > So, using your terminology, I agree it is reasonable to say HyperNEAT's difficulties with fracture are probably limited to "target-based" FFs. But for me, this is a potentially serious restriction, since target-based FFs include not just pathological FFs but a large class of real-world FFs.
                          > > > >
                          > > > > Regarding the wavelet method, I would still definitely call this an indirect encoding, but it is in some sense "less indirect" because it lacks the compositional functions of HyperNEAT. It's a perfectly reasonable hypothesis that such compositionality is advantageous in some ways. However, if that's true, then we should be able to demonstrate that by finding tasks where HyperNEAT outperforms the wavelet method (FYI, in the extended analysis Thomas is conducting for his master's thesis, we have found one variation of the line-following task where this is the case). So in that sense, the wavelet method is a useful baseline, which is the whole reason we introduced it.
                          > > > >
                          > > > > 3) If it's true that it takes ~6 nodes to represent fracture, then I think this is a potentially important point. On regular FFs, it could contribute to the difficulty of discovering fracture, since representations that have only some of the necessary components may not be rewarded (I think you would call this "deception"). In Picbreeder-like tasks, we already see that fracture can evolve anyway, but this doesn't mean it wouldn't evolve even better if fracture required fewer nodes to represent. If we think that the solutions to many interesting problems require fracture, then I think it's at least possible that one could devise a better representation for such problems than the CPPNs used in HyperNEAT.
                          > > > >
                          > > > > As I understand it, your approach to dealing with "regular" FFs is to reformulate them "in a way that respects the need for phenotypic diversity and open-endedness." That's a very interesting approach and I think it has a lot of merit, but I don't think it will solve all our problems because it presupposes that such a reformulation is possible (or that the prior knowledge it requires is available). In some cases, I think we are just stuck with hard FFs and we need to develop better methods to deal with them.
                          > > > >
                          > > > > Regarding the idea that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire": this is a very intriguing hypothesis, but to me it is only a hypothesis. I think it's true of existing indirect encodings, but that for me does not in any way rule out the possibility of developing a different indirect encoding for which that is not true, and I think that's one thing we should be working on. One of the reasons that I'm interested in indirect encodings is that I strongly believe this can be done.
                          > > > >
                          > > > > Cheers,
                          > > > > Shimon
                          > > > >
                          > > >
                          > >
                          >
                        • Ken
                          Message 12 of 14 , Jun 26, 2013
                            Hi Shimon,

                            I sense from your saying, "I am only saying that I am not aware of any result which gives good reason to abandon hope in developing methods..." that you are being very careful to remain as safely agnostic as possible. But I would guess that you (or at least most people) wouldn't be working on finding "...methods that work with indirect encodings and excel on 'regular' FFs" unless you actually believed they exist (which would indeed be a bold hypothesis). Or maybe you really do choose research directions with no confidence in them beyond total neutrality, but sometimes there's no harm in throwing your hat in the ring and seeing where your hopes (i.e. hypotheses) lead you. In any case, since you say you aren't aware of any, I'd like to offer a number of published results that do support (not prove, but certainly support) doubting the hypothesis that such methods exist.

                            First, the most obvious is my paper with Brian Woolley (http://eplex.cs.ucf.edu/papers/woolley_gecco11.pdf) on re-evolving Picbreeder images as targets. That of course goes straight to the heart of the issue we are discussing, which is whether it will be possible to evolve to particular targets with regular FFs. In that paper, the answer is a resounding "no," even though Picbreeder users routinely evolve such images (meaning images as complex and interesting as the Skull or Butterfly) - just never as targets. Of course that does not prove that some other indirect encoding does not exist that can evolve to such images as targets, but what should be worrisome is the extreme disparity in that work: Picbreeder images that are evolved in only a few dozen generations (with a population of 15) cannot be re-evolved effectively in 30,000 generations! I mean, you could logically say "there is no proof that a method better than NEAT+CPPNs at evolving to targets does not exist," but surely NEAT+CPPNs aren't *that* terrible of a combination - they'd have to be abjectly terrible for there to be hope that something can handily substitute in the face of such abysmal failure. So the seed of doubt is reasonably planted. At the same time, we see in the same Picbreeder system (with NEAT+CPPNs) routine discovery of such images as non-objectives in only a few generations. That's interesting.

                            Second, the novelty search and behavioral diversity results (from our group and others; see http://eplex.cs.ucf.edu/noveltysearch/userspage/) deserve to be counted as evidence (not proof) supporting my hypothesis. We see there many examples of a non-objective search easily solving problems that are difficult or impossible for purely target-based searches with various algorithms. Certainly after some time all that accumulated evidence is enough to raise doubt as to whether there exists some kind of magical target-driven algorithm that always gives you what you want. But it's easy to misinterpret the lesson in novelty search's results. It's not its successes that are most important to the current discussion; rather it's the *failures* of the target-driven variants that are most suggestive: We are talking about a situation in which entirely reasonable target-based setups that follow well accepted practices from decades of research are being squashed by a ridiculous setup that doesn't even know what it's trying to do! Rather than celebrating the "success" of novelty search, we should be sweating with anxiety over this disconcerting failure of the very paradigm over which you are saying "there is no reason for despair." Perhaps not yet despair, but doubt - certainly.
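                            For readers unfamiliar with the mechanics, it is worth seeing how little machinery the non-objective side of these comparisons actually uses. Here is a minimal Python sketch of the novelty computation at the core of novelty search (the names, k, and the archive threshold are illustrative choices; the behavior characterization itself is domain-specific):

                                import numpy as np

                                def novelty_score(behavior, archive, population, k=15):
                                    # Novelty = mean distance to the k nearest behaviors seen so far.
                                    # Note that no goal or target appears anywhere in this score.
                                    # Assumes `behavior` is itself included in `population`, so the
                                    # zero self-distance at index 0 is skipped.
                                    pool = np.array(archive + population)
                                    dists = np.sort(np.linalg.norm(pool - np.array(behavior), axis=1))
                                    return dists[1:k + 1].mean()

                                def maybe_archive(behavior, archive, population, threshold=1.0):
                                    # Sufficiently novel behaviors are remembered so that novelty
                                    # keeps pushing the search toward genuinely new territory.
                                    if novelty_score(behavior, archive, population) > threshold:
                                        archive.append(behavior)

                            Selection then simply uses this score where fitness would normally go, which is what makes the failures of the target-based variants in those comparisons so striking.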

                            Third, sometimes just thinking a problem through is worth a few cents. Take the example of evolving the pattern of a face. If an indirect encoding is to evolve a face properly, then logic suggests that it simply must discover symmetry early on. It is true that it could theoretically discover a face by discovering both sides separately, but that is not what I am calling "healthy" because any mutation of such a perverse representation will destroy the face-ness of the face (which means it cannot lead to new faces). It also would be inefficient because it would require finding both sides independently when they are not independent. In fact, the more complex each side of the face is, the worse this misadventure becomes. At an extreme, if each side is as complex as a human brain, forget it, the need to make an astronomically unlikely sequence of discoveries twice is an absolute deal-breaker.

                            Now the question is, how can there possibly be any magical target-based algorithm or encoding that "realizes" that symmetry needs to be rewarded early on when symmetry on its own does not even remotely resemble a face? Of course if there is special knowledge, like if we know that faces require symmetry, we could try to incorporate that knowledge into the FF, but that's not what you're suggesting - you're suggesting there may exist some method that doesn't need to know the stepping stones but magically rewards them anyway simply by virtue of rewarding only the target. Once again, the possibility remains, but the reason for doubt here is significant.
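                            To make the "healthy representation" point concrete, here is a toy sketch of the CPPN-style trick in miniature (my own illustration; a real CPPN typically achieves the same effect by passing x through a symmetric activation function such as a Gaussian rather than a literal absolute value):

                                import numpy as np

                                def render(f, n=64):
                                    # Because f only ever sees |x|, every downstream mutation of f
                                    # still yields a bilaterally symmetric pattern: symmetry is a
                                    # structural invariant, not a property fitness must protect.
                                    xs = np.linspace(-1.0, 1.0, n)
                                    ys = np.linspace(-1.0, 1.0, n)
                                    return np.array([[f(abs(x), y) for x in xs] for y in ys])

                            Once such a link exists, face-ness can survive variation instead of having to be discovered twice - but nothing about matching a target rewards creating that link before the face itself appears.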

                            Finally, the idea that the products of nature are indeed impressive is not mere conjecture. For example, see our publication on the topic of impressiveness (which also establishes a connection between novelty and impressiveness):
                            http://eplex.cs.ucf.edu/papers/lehman_alife12b.pdf

                            Now I'm not taking an extreme position here. I'm not saying a target can't be anywhere in the mix whatsoever (so I leave plenty of room e.g. for the behavioral diversity combination that Stef and JBM have developed over the years). I'm just saying there is good reason to doubt the clarion call for pure target-based FFs leading us to the promised land. And as an extra bonus in the name of hope for the non-objective alternative, we have the fact that Einstein did evolve. That is, I didn't mean in my previous post to say that Einstein is a proof that non-objective evolution is the only way; rather my point is that it is a proof-of-concept that it *can* work. Your alternative on the other hand boasts no such proof of concept. We don't have any evidence it can or did work at finding human-level intelligence, ever, anthropic concern or not.

                            So we've got publications, we've got results, we've got reasoning, we've got natural precedent, all consistent with my hypothesis, and it continues to accumulate. And for what I believe is your hypothesis (i.e. that powerful target-based methods exist) we have little support beyond the fact that there is no proof that it isn't true.

                            Anyway, by all means, don't despair. But do embrace your position and consider what it implies. For example, try to imagine a way any conceivable indirect encoding with a target can commit to an ambiguous yet essential symmetry before it discovers its target. It's a really interesting question. Of course, like you say, just because you can't think of it doesn't mean it doesn't exist, but even so it sure would be reassuring if you could think of it.

                            Best,

                            ken


                            --- In neat@yahoogroups.com, "shimonw" <shimon@...> wrote:
                            >
                            > Hi Ken,
                            >
                            > Yes, I agree that this conversation is exposing some of our fundamental assumptions, and it is always good to examine and question such assumptions.
                            >
                            > However, if you think that our hypotheses are equally bold, then I fear my main point has been misunderstood. I am not actually hypothesizing that an algorithm with any particular qualities exists. I am only saying that I am not aware of any result which gives good reason to abandon hope in developing methods that work with indirect encodings and excel on "regular" FFs. This is actually not a hypothesis at all; it is an incontrovertible fact. Of course, it doesn't prove that such a result doesn't exist and I certainly don't claim a comprehensive mastery of the literature. But if such a result does exist (though it's hard to imagine it would), anyone is free to simply point it out to me.
                            >
                            > By contrast, your hypothesis was that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire." I continue to maintain that this is quite a bold claim. You suggested that the "success" of natural evolution provides a proof of concept to support your hypothesis, but I find this highly problematic. Firstly, it's unclear that evolution was "successful" in any meaningful way because there's an anthropic principle at work here: we are human and so we find humans impressive. The success of evolution is thus a self-fulfilling prophecy. Secondly, and more importantly, even if we accept that evolution is successful, this actually does not validate your hypothesis at all. Showing that there's a nice algorithm that works without a target is not the same as showing that there are *no* algorithms that work with a target, which is what your hypothesis claims.
                            >
                            > That's why I think it's not correct to describe our "hypotheses" as equally bold. In my case, there is nothing to prove: there's just a simple fact. In your case, validating the hypothesis requires proving a negative (and quite a sweeping one at that), which is notoriously difficult to do.
                            >
                            > Finally, I sense from your defense of PicBreeder that I may have been misunderstood in another way. I'm in no way trying to criticize PicBreeder or suggest that research into it is misguided. I think it's a very cool and intriguing result, possibly a highly significant one. You and I may disagree about *why* it's significant, but we definitely agree that it's useful to study further and has the potential to yield important insights.
                            >
                            > I'm just saying that, in the complementary search for optimization methods that use indirect encodings and can excel on hard FFs, there's really no reason for despair.
                            >
                            > Cheers,
                            > Shimon
                            >
                            > --- In neat@yahoogroups.com, "Ken" <kstanley@> wrote:
                            > >
                            > > Hi Shimon,
                            > >
                            > > The nice thing about this type of conversation is that it pushes us
                            > > towards realizing our fundamental assumptions. In that spirit, a few
                            > > other thoughts in response:
                            > >
                            > > Many top biologists and EC researchers would call the behavior of the
                            > > organism in the world (i.e. its interactions with the world) the
                            > > "phenotype." David Fogel once sent me a long and forceful defense of
                            > > such a position, quoting a number of famous biologists. In that
                            > > context, whether you map CPPN->pattern or CPPN->behavior, a target is
                            > > still a target and the neural network is not the phenotype.
                            > >
                            > > In any case, I guess we both hold bold hypotheses in some sense. Your
                            > > hypothesis that there exist methods that can solve hard FFs almost
                            > > deterministically as targets (up to and even beyond delivering us an
                            > > Einstein-level brain) seems boldly optimistic to me. But the big
                            > > difference between our hypotheses is that you have no proof of concept
                            > > for your approach while I do. In your approach, you will set a target
                            > > of Einstein and almost magically the algorithm will then deliver us
                            > > Einstein, which no method has yet shown even the faintest sign of
                            > > achieving. In my approach, Einstein is not the target of the search
                            > > process but instead appears as a byproduct of a search without any
                            > > explicit target, which is (amazingly) actually what happened in the real
                            > > world! It may be tempting to dismiss that distinction, but think about
                            > > it, the fact that any process actually *did* produce Einstein is beyond
                            > > incredible. We should learn from that as much as we can before
                            > > insisting "there's got to be a better way."
                            > >
                            > > Of course, it's great that you're searching for a better way (it sure
                            > > would be nice if we could just set Einstein's IQ as a fitness target),
                            > > but the thing is, there will always be a ton of people trying to follow
                            > > that path because it's such a clean and intuitively appealing notion.
                            > > So it hardly needs a vigorous argument to convince someone that it's
                            > > important to try. The danger is instead that in such a rush of
                            > > enthusiasm for intuitively appealing approaches, we sweep aside the much
                            > > more delicate yet equally critical attempt to harness evolution on its
                            > > own terms, which leads to things like novelty search, indirect encoding,
                            > > and behavioral diversity. The power of this process may not be as clean
                            > > and simple as the optimization crowd is hoping; my argument here guards
                            > > against that danger.
                            > >
                            > > I think something similar is going on in our fracture argument: In
                            > > Picbreeder (and to some extent Endlessforms too, which also uses CPPNs)
                            > > we have perhaps the only massive proof-of-concept that exists of
                            > > fracture evolving through an artificial encoding. Picbreeder is a
                            > > treasure trove of almost 10,000 examples, each one with its own lesson
                            > > to teach us on fracture. And my argument is, since we are fortunate
                            > > enough to have this example right in front of us (like the example of
                            > > natural evolution evolving Einstein), let's learn everything we can from
                            > > it before looking away and saying hastily, "there must be something
                            > > better." Picbreeder is a fortuitous goldmine of data that is very hard
                            > > to reproduce; it took several years and the contributions of hundreds
                            > > of people to accumulate its results. Let's scrub every last bit of
                            > > wisdom from that windfall that we can. It's not a question of better
                            > > for me, it's a question of learning from the only evidence we have. How
                            > > do you expect to really develop a better encoding of fracture if you
                            > > ignore the thousands of examples of fracture that already exist? You
                            > > will just end up reinventing the wheel.
                            > >
                            > > The danger of that is evident in some of the assumptions you hold
                            > > already about how CPPNs encode fracture. For example, you are worried
                            > > that it may not do so efficiently, but you have only constructed a
                            > > single thought experiment (your figure 13) instead of taking a measured
                            > > look at the thousands of examples in front of us. In fact, while indeed
                            > > I don't think 6 nodes is particularly much, it's easy enough to find
                            > > fracture in Picbreeder represented in under 6 nodes. For example, the
                            > > pattern below has at least 3 distinct fractured regions (which I
                            > > numbered for clarity): a pinkish region, an oval inner region, and a
                            > > conic lower region. However, the CPPN that encodes it (shown to its
                            > > left) only uses 4 nodes to encode these 3 fractured regions (notice that
                            > > two of the displayed hidden nodes have outgoing weights of zero, so they
                            > > are not contributing to the output pattern, leaving only the 4 nodes).
                            > > That's a respectable 1.3 nodes per fractured region:
                            > >
                            > > (this image is also available at http://eplex.cs.ucf.edu/hyperNEATpage/content/four-node-fracture.jpg )
                            > >
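                            The mechanism behind such frugality is plain function composition: each added nonlinearity can fold the coordinate space again, so distinct regions can multiply faster than nodes. A toy four-function composition in that spirit (the activation choices below are invented and are not the actual genome from the image above):

                                import numpy as np

                                def toy_cppn(x, y, d):
                                    # Four composed "nodes" over inputs x, y, and distance-from-center d.
                                    g = np.exp(-8.0 * d * d)          # Gaussian bump: an inner region
                                    t = np.tanh(3.0 * (y - 0.4 * x))  # oriented fold: a lower region
                                    s = np.sin(3.0 * (g + t))         # periodic mix re-folds the space
                                    return 1.0 / (1.0 + np.exp(-4.0 * s))  # sharp sigmoid at the output

                            Thresholding the output of a composition like this already yields several plateau-like regions separated by sharp transitions, which is how the node-per-region count can fall below two.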
                            > > We could argue about the definition of fracture, and perhaps you'd want
                            > > to refine the definition to argue that the above example doesn't really
                            > > have three fractured regions. Or you could argue that the "d" input
                            > > should be counted as a 5th node (to encode the 3 regions, which is still
                            > > a respectable 1.7 nodes per region). But even those would be healthy
                            > > discussions that we could not begin to have without first referring to
                            > > the examples that we already have. Of course (intuitively) more complex
                            > > fracture will require more nodes (maybe at some point 6!), as it should.
                            > > All I'm saying is I think this kind of speculation about whether it's
                            > > efficient or not is jumping the gun: Let's understand it first before
                            > > we worry about fixing it.
                            > >
                            > > So I think there's a difference of philosophies ultimately at work here.
                            > > Because the road ahead to AI is so vast, I'm more motivated by
                            > > maximizing our information than doing "better." For example, that's the
                            > > motivation behind Picbreeder - obviously it is not quantifiably better
                            > > than anything in particular, but it increased our understanding of a
                            > > number of important phenomena, including fracture, representation, and
                            > > non-objective search, by providing useful information. So I say, let's
                            > > see where this takes us - there's still a ton left to learn. But you
                            > > are already itching to start anew and fix things that are not clearly
                            > > broken just when we have hit a goldmine of untapped information.
                            > >
                            > > Without question, neither of our perspectives is definitively superior;
                            > > we are both merely speculating about where time is best invested, and
                            > > obviously neither of us can predict the future. This argument is only
                            > > my own case for my perspective.
                            > >
                            > > Best,
                            > >
                            > > ken
                            > >
                            > >
                            > > --- In neat@yahoogroups.com, "shimonw" wrote:
                            > > >
                            > > > Hi Ken,
                            > > >
                            > > > If I may, I'd like to offer a couple reactions to your latest message. This is indeed an interesting discussion that touches on many potentially important issues. I hope I can offer a useful perspective on these topics.
                            > > >
                            > > > You are right that in a target-based scenario there can be many CPPNs that generate the same phenotype. But the CPPN is a genotype, not a phenotype. In what I'm calling "regular" FFs, there are potentially many *phenotypes* that solve the problem, and for each of them potentially many genotypes. So in my mind, there is still a useful distinction between target-based FFs and regular FFs, though they are perhaps best seen as different points on a spectrum.
                            > > >
                            > > > If I understand correctly, you are suggesting that the important distinction is how the FF interacts with the search process. But to me, this is a difficult criterion to use because the search process could be anything. The space of possible black-box optimization methods is huge, and what encourages piecewise incremental progress for one method may not for another.
                            > > >
                            > > > I don't pretend to know what kind of algorithm would generate Einstein-level intelligence (I don't even know if this is a useful goal). But as a researcher I highly respect once told me, "your failure to imagine it does not constitute a proof that it cannot be done."
                            > > >
                            > > > That's why I think your hypothesis is quite a bold one, because it looks at the limitations of a specific class of black-box optimization methods and generalizes them to all possible black-box optimization methods.
                            > > >
                            > > > To give just one example of the possibilities here: algorithms for continuous-armed bandits, such as X-armed bandits and Bayesian optimization methods such as GP-UCB, can be applied to black-box optimization. These methods avoid the problem of local optima in a principled way, by maintaining a posterior distribution over the global FF, and using optimism in the face of uncertainty to ensure sufficient exploration of the solution space. As a result, these methods have very nice theoretical properties, like guaranteed convergence to the global optimum in the limit and logarithmic regret bounds along the way.
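                            Concretely, the acquisition step being described reduces to something like the following one-dimensional toy sketch of GP-UCB (the RBF kernel, its length-scale, and beta are arbitrary choices here, not prescribed values):

                                import numpy as np

                                def rbf(a, b, ls=0.3):
                                    # Squared-exponential kernel between two 1-D arrays of points.
                                    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

                                def gp_ucb_pick(X, y, candidates, beta=4.0, noise=1e-3):
                                    # GP posterior from observations (X, y), then pick the candidate
                                    # maximizing mean + sqrt(beta) * std: "optimism in the face of
                                    # uncertainty," which is what drives principled exploration.
                                    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
                                    Ks = rbf(candidates, X)
                                    mu = Ks @ K_inv @ y
                                    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
                                    return candidates[np.argmax(mu + np.sqrt(beta * np.clip(var, 0.0, None)))]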
                            > > >
                            > > > As far as I know, no one has tried to develop versions of these methods that can discover NN topologies, and which would thus be suitable for optimizing CPPNs or other indirect encodings. But there's no a priori reason to think it can't be done. I'm not saying this would necessarily work or even that it's the most promising approach to explore. I'm just saying it's one example of a principled approach that could avoid all the difficulties you mention and which we currently have no strong reason to eliminate from contention.
                            > > >
                            > > > So I think it's quite likely that these hard FFs are not unsolvable, but just that current methods cannot solve them. Rather than giving up on them, in my opinion we should be focusing on developing better methods for them.
                            > > >
                            > > > Regarding whether 6 nodes is a lot to represent fracture, I think the same point applies: just because we can't think of a better way to do it, doesn't mean it doesn't exist. Our paper establishes 6 as an upper bound, which can be easily done by example. If I understand correctly, you are suggesting that 6 may be a lower bound, which is much more difficult to demonstrate.
                            > > >
                            > > > I could definitely imagine that a different type of network, or one with a different set of activation functions, might be able to make better use of 6 nodes. Even if it's true that there's a trade-off, and using fewer nodes means less flexibility, there may be many cases where that's a favorable trade-off. We should at least be open to the possibility that in some cases the sweet spot is not a representation that requires 6 nodes for fracture.
                            > > >
                            > > > I'm looking forward to chatting more about this at GECCO.
                            > > >
                            > > > Cheers,
                            > > > Shimon
                            > > >
                            > > > --- In neat@yahoogroups.com, "Ken" kstanley@ wrote:
                            > > > >
                            > > > > Hi Shimon,
                            > > > >
                            > > > > These are some really interesting and deep issues we're getting into. I am sure they will be the foundation of great discussions at GECCO. I'll try to keep my response more brief than before.
                            > > > >
                            > > > > 1) Indeed recent quadruped results are not in the same setup as in your work; that is a legitimate qualification and they do not constitute a proof that in your particular setup HyperNEAT would succeed. However, they are arguably at least comparably complex. For example, the CTRNN in http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf is a complicated structure and the CPPN that encodes it is required not only to generate a controller for a single quadruped morphology, but to generate multiple controllers for multiple morphologies, all from the same CPPN. A video of three controllers all from the same CPPN is here: http://youtu.be/oLSSt5GyHNk
                            > > > >
                            > > > > 2) I think the question of what is "target-based" can actually be interesting, but I completely agree that the semantic side of the argument isn't really important. But what I do find interesting is the idea that these two scenarios differ substantively in your mind (as you put it):
                            > > > >
                            > > > > "there is an important difference between FFs that require a single specific phenotype and FFs that require only that some goal be achieved"
                            > > > >
                            > > > > I think by "single specific phenotype" you mean pattern-matching (i.e. image-matching). But I think this distinction doesn't really exist: There are an unlimited number of networks (i.e. CPPNs) that can encode a particular two-dimensional image, just as there are an unlimited number of CPPNs that can encode a particular behavior or goal (such as following a line).
                            > > > >
                            > > > > What makes both of them target-based and fundamentally the same is not the number of ways they can be solved, but rather the way the fitness function interacts with the search process. The problem is that target-based fitness encourages piece-wise incremental progress, which tends to be highly deceptive. For example, with an image, matching a single pixel can raise fitness even if the function that causes that pixel to be matched has no relationship whatsoever to the overall regularities in the target pattern. The same is true in control tasks: A mutation that causes a quadruped to lunge forward clumsily is rewarded for the slight improvement in fitness, whereas an important underlying regularity (e.g. oscillation) necessary to get really good at the task is never established. Non-target-based fitness (or what I call "non-objective") avoids these problems because it *does* reward establishing regularities: it appreciates new regularities even if they don't immediately raise fitness.
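                            The target-based FF being criticized in that paragraph is, schematically, nothing more than this (an illustrative sketch in which candidate and target are images as numpy arrays):

                                import numpy as np

                                def target_fitness(candidate, target):
                                    # Reward per-pixel agreement with a fixed target image. Any lucky
                                    # pixel raises fitness whether or not it reflects a regularity of
                                    # the target (e.g. symmetry) - the source of the deception at issue.
                                    return -np.abs(candidate - target).mean()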
                            > > > >
                            > > > > But what's most interesting to me is your intuition that there is some way to sidestep the tradeoff in combining indirect encodings with target-based fitness functions. That is, you suspect there may exist an indirect encoding that can both elaborate regularities compositionally over very many generations and also almost deterministically satisfy an arbitrary target-based objective. I am convinced that is not the case (it would be too good to be true), but you rightly point out that my position is a hypothesis. However, just as a thought experiment, can you really imagine any neural network encoding (including DNA) that would give you Einstein-level intelligence if the target were simply to score maximally on an IQ test? It is plain to me that the stepping stones to human level cannot be crossed in the presence of a target-based objective (or any conceivable one). However, interestingly, that does not mean that such a neural network does not exist in the search space. It just means a target won't get you there.
                            > > > >
                            > > > > 3) We apparently disagree on whether 6 is a big number (for nodes needed to represent fracture). If you try to devise an encoding that represents fracture with fewer than that, my intuition is that you would be sacrificing an enormous breadth of flexibility in the kinds of fracture (and patterns in general) that you can represent. Think about the number of "degrees of freedom" in fracture: it's not really well-defined, but a single line of fracture has a lot of dimensions:
                            > > > >
                            > > > > -position
                            > > > > -orientation
                            > > > > -curvature (which can be arbitrarily complex)
                            > > > > -fracture gradient (that is, one region can smoothly transition into another in a kind of interpolated transitional zone)
                            > > > > -embedding (fracture can potentially be hierarchical or regular, i.e. dependent on higher-level containing regions)
                            > > > >
                            > > > > The idea that all of that can be encoded in anything less than about 6 simple functions seems optimistic. Of course, if you are willing to sacrifice some of those degrees of freedom, then you can get fewer nodes, but the exquisite minutiae of such patterns would be lost (and you'd move more towards direct encoding, like the wavelets do).
                            > > > >
                            > > > > By the way, here are some cool examples of complicated kinds of fracture from Picbreeder (such as fractured periodic zones, but even more complex than that):
                            > > > >
                            > > > > Nontrivial color patterns in each zone:
                            > > > > http://picbreeder.org/search/showgenome.php?sid=11328
                            > > > >
                            > > > > Very complex brush-stroke pattern inside the apple vs. outside (the DNA for this one is fascinating):
                            > > > > http://picbreeder.org/search/showgenome.php?sid=7109
                            > > > >
                            > > > > Fractured regions covering a distorted surface, giving a remarkable underlying curvature:
                            > > > > http://picbreeder.org/search/showgenome.php?sid=2684
                            > > > >
                            > > > > A color variant:
                            > > > > http://picbreeder.org/search/showgenome.php?sid=6652
                            > > > >
                            > > > > When I see so many remarkable examples like these on Picbreeder, the message to me is that we have only scratched the surface of what CPPN encoding can teach us about representation. That's why I think it's premature (in the face of such a treasure trove of evidence) to be worrying about whether we can represent some subcomponent of such patterns with fewer than 6 nodes. At present I'm amazed they can be represented at all, let alone with under 6 nodes.
                            > > > >
                            > > > > Thanks very much for raising these issues in any case - it's a great debate to have about representation and search, and it goes to the heart of the field of GDS/indirect encoding.
                            > > > >
                            > > > > Best,
                            > > > >
                            > > > > ken
                            > > > > --- In neat@yahoogroups.com, "shimonw" wrote:
                            > > > > >
                            > > > > > Hi Ken,
                            > > > > >
                            > > > > > Thanks for your interesting and thought-provoking comments. I'll
                            > > give
                            > > > > some brief reactions here, and I hope we can have a productive
                            > > > > discussion about it at GECCO.
                            > > > > >
                            > > > > > 1) I won't comment for now on the new experiments you've been
                            > > running
                            > > > > because the details are not available yet. Instead, I will just
                            > > comment
                            > > > > on the various walking gait domains, because the details of the
                            > > > > additional results you mentioned are in that case already available,
                            > > in
                            > > > > newly published GECCO papers.
                            > > > > >
                            > > > > > FIrst of all, these are very cool results, and nice success
                            > > stories
                            > > > > for HyperNEAT. It's great to see that the GDS community is
                            > > continuing
                            > > > > the push the envelope in terms of robot locomotion, which seems to
                            > > be
                            > > > > emerging as a nice application for indirect encodings.
                            > > > > >
                            > > > > > However, I would simply caution against placing all these
                            > > variations
                            > > > > on the walking gait task on a one-dimensional spectrum of
                            > > difficulty.
                            > > > > In our paper, we took the orignal walking gait task and made only
                            > > one
                            > > > > change (increasing the mass of the torso) which clearly makes the
                            > > task
                            > > > > harder. In the papers Ken mentioned, there are other differences,
                            > > e.g.,
                            > > > > the use of an SUPG encoding, the use of CTRNNs, the use of a
                            > > different
                            > > > > substrate topology, and varying the length of the legs.
                            > > > > >
                            > > > > > It's completely legitimate to make these changes and they don't
                            > > > > detract from the results at all. But they are confounding factors
                            > > when
                            > > > > it comes to comparing HyperNEAT's performance on walking gait tasks
                            > > of
                            > > > > different difficulties. Some of these changes may make the task
                            > > harder
                            > > > > in some ways but other changes may make it easier in others ways,
                            > > and so
                            > > > > far we don't have experimental results that control for these
                            > > factors.
                            > > > > >
                            > > > > > 2) I hope to avoid a semantic debate about what is a
                            > > "target-based"
                            > > > > FF. If you want to include what I was calling a "regular" FF in the
                            > > > > scope of target-based FFs, I won't object. But my point is that,
                            > > > > regardless of what you call them, there is an important difference
                            > > > > between FFs that require a single specific phenotype and FFs that
                            > > > > require only that some goal be achieved. For the latter, there may
                            > > be
                            > > > > many ways to skin the cat, though in any interesting problem, good
                            > > > > solutions are still quite rare. The distinction is important
                            > > because,
                            > > > > while the former category is arguably only of theoretical interest,
                            > > the
                            > > > > second category corresponds to what I consider the most interesting
                            > > and
                            > > > > important class of FFs, because these are the FFs that naturally
                            > > arise
                            > > > > in a very large number of real-world optimization problems.
                            > > > > >
                            > > > > > So, using your terminology, I agree it is reasonable to say
                            > > > > HyperNEAT's difficulties with fracture are probably limited to
                            > > > > "target-based" FFs. But for me, this is a potentially serious
                            > > > > restriction, since target-based FFs include not just pathological
                            > > FFs
                            > > > > but a large class of real-world FFs.
                            > > > > >
                            > > > > > Regarding the wavelet method, I would still definitely call this
                            > > an
                            > > > > indirect encoding, but it is in some sense "less indirect" because
                            > > it
                            > > > > lacks the compositional functions of HyperNEAT. It's a perfectly
                            > > > > reasonable hypothesis that such compositionality is advantageous in
                            > > some
                            > > > > ways. However, if that's true, then we should be able to
                            > > demonstrate
                            > > > > that by finding tasks where HyperNEAT outperforms the wavelet method
                            > > > > (FYI, in the extended analysis Thomas is conducting for his master's
                            > > > > thesis, we have found one variation of the line-following task where
                            > > > > this is the case). So in that sense, the wavelet method is a useful
                            > > > > baseline, which is the whole reason we introduced it.
                            > > > > >
                            > > > > > 3) If it's true that it takes ~6 nodes to represent fracture, then
                            > > I
                            > > > > think this is a potentially important point. On regular FFs, it could
                            > > > > contribute to the difficulty of discovering fracture, since
                            > > > > representations that have only some of the necessary components may
                            > > not
                            > > > > be rewarded (I think you would call this "deception"). In
                            > > > > Picbreeder-like tasks, we already see that fracture can evolve
                            > > anyway,
                            > > > > but this doesn't mean it wouldn't do it even better if fracture
                            > > required
                            > > > > fewer nodes to represent. If we think that the solutions to many
                            > > > > interesting problems require fracture, then I think it's at least
                            > > > > possible that one could devise a better representation for such
                            > > problems
                            > > > > than the CPPNs used in HyperNEAT.
                            > > > > >
                            > > > > > As I understand it, your approach to dealing with "regular" FFs is
                            > > to
                            > > > > reformulate them "in a way that respects the need for phenotypic
                            > > > > diversity and open-endedness." That's a very interesting approach
                            > > and I
                            > > > > think it has a lot of merit, but I don't think it will solve all our
                            > > > > problems because it presupposes that such a reformulation is
                            > > possible
                            > > > > (or that the prior knowledge it requires is available). In some
                            > > cases,
                            > > > > I think we are just stuck with hard FFs and we need to develop
                            > > better
                            > > > > methods to deal with them.
                            > > > > >
                            > > > > > Regarding the idea that "an unhappy and unhealthy indirect
                            > > encoding is
                            > > > > one with only a strict target to which to aspire": this is a very
                            > > > > intriguing hypothesis, but to me it is only a hypothesis. I think
                            > > it's
                            > > > > true of existing indirect encodings, but that for me does not in any
                            > > way
                            > > > > rule out the possibility of developing a different indirect encoding
                            > > for
                            > > > > which that is not true, and I think that's one thing we should be
                            > > > > working on. One of the reasons that I'm interested in indirect
                            > > > > encodings is that I strongly believe this can be done.
                            > > > > >
                            > > > > > Cheers,
                            > > > > > Shimon
                            > > > > >
                            > > > >
                            > > >
                            > >
                            >
                          • Ken
                            Message 13 of 14 , Jul 1, 2013
                              Thanks to Jeff and JBM for alerting me to even more recent evidence that target-based fitness isn't healthy for indirect encodings. This article is a really nice find - it's in Nature and it's from people apparently not part of our community, yet comes to a very similar conclusion on the problem with target-based fitness for indirect encoding. The title is "Adaptive dynamics under development-based genotype–phenotype maps":

                              http://www.nature.com/nature/journal/v497/n7449/full/nature12142.html

                              Here is a quote from the abstract:
                              "We show that the complexity of the genotype–phenotype map prevents substantial adaptation in some of the phenotype–fitness maps: sustained adaptation is only possible using `roughness' or `few-traits' phenotype–fitness maps."

                              In other words, a direct target-based fitness prevents adaptation. This paper is a really nice companion to my paper with Brian Woolley.


                              Best,

                              ken

                              --- In neat@yahoogroups.com, "Ken" <kstanley@...> wrote:
                              >
                              >
                              >
                              > Hi Shimon,
                              >
                              > I sense from your saying, "I am only saying that I am not aware of any result which gives good reason to abandon hope in developing methods..." that you are being very careful to remain as safely agnostic as possible. But I would guess that you (or at least most people) wouldn't be working on finding "...methods that work with indirect encodings and excel on 'regular' FFs" unless you actually believed they exist (which would indeed be a bold hypothesis). Or maybe you really do choose research directions with no confidence in them beyond total neutrality, but sometimes there's no harm in throwing your hat in the ring and seeing where your hopes (i.e. hypotheses) lead you. In any case, since you say you aren't aware of any, I'd like to offer a number of published results that do support (not prove, but certainly support) doubting the hypothesis that such methods exist.
                              >
                              > First, the most obvious is my paper with Brian Woolley (http://eplex.cs.ucf.edu/papers/woolley_gecco11.pdf) on re-evolving Picbreeder images as targets. That of course goes straight to the heart of the issue we are discussing, which is whether it will be possible to evolve to particular targets with regular FFs. In that paper, the answer is a resounding "no," although Picbreeder users routinely evolve such images (meaning images as complex and interesting as the Skull or Butterfly), yet not as targets. Of course that does not prove that some other indirect encoding does not exist that can evolve to such images as targets, but what should be worrisome is the extreme disparity in that work: Picbreeder images that are evolved in only a few dozen generations (with a population of 15) cannot be re-evolved effectively in 30,000 generations! I mean, you could logically say "there is no proof that a method better than NEAT+CPPNs at evolving to targets does not exist," but surely NEAT+CPPNs aren't *that* terrible of a combination - they'd have to be abjectly terrible for there to be hope that something can handily substitute in the face of such abysmal failure. So the seed of doubt is reasonably planted. At the same time, we see in the same Picbreeder system (with NEAT+CPPNs) routine discovery of such images as non-objectives in only a few generations. That's interesting.
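                              >
                              > To be concrete, here is a minimal sketch (in Python) of the kind of target-based fitness function at issue; how the candidate image gets rendered from a CPPN is beside the point here:
                              >
                              > import numpy as np
                              >
                              > def target_based_fitness(candidate_image, target_image):
                              >     # Reward pixel-by-pixel similarity to one fixed target image; both
                              >     # arguments are 2D numpy arrays of grayscale values in [0, 1].
                              >     mse = np.mean((candidate_image - target_image) ** 2)
                              >     return 1.0 - mse  # 1.0 means a pixel-perfect match
                              >
                              > Every increment of fitness under such a function comes only from closing the pixel-wise gap to the target, which is exactly the pressure the re-evolution experiment applies.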
                              >
                              > Second, the novelty search and behavioral diversity results (from our group and others; see http://eplex.cs.ucf.edu/noveltysearch/userspage/) deserve to be counted as evidence (not proof) supporting my hypothesis. We see there many examples of a non-objective search easily solving problems that are difficult or impossible for purely target-based searches with various algorithms. Certainly after some time all that accumulated evidence is enough to raise doubt as to whether there exists some kind of magical target-driven algorithm that always gives you what you want. But it's easy to misinterpret the lesson in novelty search's results. It's not its successes that are most important to the current discussion; rather it's the *failures* of the target-driven variants that are most suggestive: We are talking about a situation in which entirely reasonable target-based setups that follow well accepted practices from decades of research are being squashed by a ridiculous setup that doesn't even know what it's trying to do! Rather than celebrating the "success" of novelty search, we should be sweating with anxiety over this disconcerting failure of the very paradigm over which you are saying "there is no reason for despair." Perhaps not yet despair, but doubt - certainly.
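                              >
                              > For anyone who hasn't seen it, the novelty metric itself is trivial to sketch - note that no objective appears anywhere in it. A minimal version (the behavior vectors are domain-specific, and k=15 is just a typical setting):
                              >
                              > import numpy as np
                              >
                              > def novelty(behavior, others, k=15):
                              >     # "others" holds the behavior vectors of the current population plus
                              >     # an archive of past novel individuals; novelty is the mean distance
                              >     # to the k nearest of them in behavior space.
                              >     dists = np.linalg.norm(np.asarray(others) - np.asarray(behavior), axis=1)
                              >     return float(np.mean(np.sort(dists)[:k]))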
                              >
                              > Third, sometimes just thinking a problem through is worth a few cents. Take the example of evolving the pattern of a face. If an indirect encoding is to evolve a face properly, then logic suggests that it simply must discover symmetry early on. It is true that it could theoretically discover a face by discovering both sides separately, but that is not what I am calling "healthy" because any mutation of such a perverse representation will destroy the face-ness of the face (which means it cannot lead to new faces). It also would be inefficient because it would require finding both sides independently when they are not independent. In fact, the more complex each side of the face is, the worse this misadventure becomes. At an extreme, if each side is as complex as a human brain, forget it, the need to make an astronomically unlikely sequence of discoveries twice is an absolute deal-breaker.
                              >
                              > Now the question is, how can there possibly be any magical target-based algorithm or encoding that "realizes" that symmetry needs to be rewarded early on when symmetry on its own does not even remotely resemble a face? Of course if there is special knowledge, like if we know that faces require symmetry, we could try to incorporate that knowledge into the FF, but that's not what you're suggesting - you're suggesting there may exist some method that doesn't need to know the stepping stones but magically rewards them anyway simply by virtue of rewarding only the target. Once again, the possibility remains, but the reason for doubt here is significant.
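                              >
                              > Just to make the symmetry point concrete, consider a toy CPPN-style composition (the functions and weights are invented for illustration): once the x input is routed through an even function, every downstream elaboration preserves left-right symmetry for free, which is exactly the kind of early commitment a target-based FF has no reason to reward:
                              >
                              > import math
                              >
                              > def toy_symmetric_cppn(x, y, w1=2.0, w2=3.0):
                              >     # A Gaussian of x is even in x, so everything composed downstream
                              >     # of it is automatically mirror-symmetric about x = 0.
                              >     h = math.exp(-(w1 * x) ** 2)
                              >     return math.sin(w2 * h + y)  # later detail inherits the symmetry
                              >
                              > assert toy_symmetric_cppn(0.3, 0.5) == toy_symmetric_cppn(-0.3, 0.5)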
                              >
                              > Finally, the idea that the products of nature are indeed impressive is not mere conjecture. For example, see our publication on the topic of impressiveness (which also establishes a connection between novelty and impressiveness):
                              > http://eplex.cs.ucf.edu/papers/lehman_alife12b.pdf
                              >
                              > Now I'm not taking an extreme position here. I'm not saying a target can't be anywhere in the mix whatsoever (so I leave plenty of room e.g. for the behavioral diversity combination that Stef and JBM have developed over the years). I'm just saying there is good reason to doubt the clarion call for pure target-based FFs leading us to the promised land. And as an extra bonus in the name of hope for the non-objective alternative, we have the fact that Einstein did evolve. That is, I didn't mean in my previous post to say that Einstein is a proof that non-objective evolution is the only way; rather my point is that it is a proof-of-concept that it *can* work. Your alternative on the other hand boasts no such proof of concept. We don't have any evidence it can or did work at finding human-level intelligence, ever, anthropic concern or not.
                              >
                              > So we've got publications, we've got results, we've got reasoning, we've got natural precedent, all consistent with my hypothesis, and it continues to accumulate. And for what I believe is your hypothesis (i.e. that powerful target-based methods exist) we have little support beyond the fact that there is no proof that it isn't true.
                              >
                              > Anyway, by all means, don't despair. But do embrace your position and consider what it implies. For example, try to imagine a way any conceivable indirect encoding with a target can commit to an ambiguous yet essential symmetry before it discovers its target. It's a really interesting question. Of course, like you say, just because you can't think of it doesn't mean it doesn't exist, but even so it sure would be reassuring if you could think of it.
                              >
                              > Best,
                              >
                              > ken
                              >
                              >
                              > --- In neat@yahoogroups.com, "shimonw" <shimon@> wrote:
                              > >
                              > > Hi Ken,
                              > >
                              > > Yes, I agree that this conversation is exposing some of our fundamental assumptions, and it is always good to examine and question such assumptions.
                              > >
                              > > However, if you think that our hypotheses are equally bold, then I fear my main point has been misunderstood. I am not actually hypothesizing that an algorithm with any particular qualities exists. I am only saying that I am not aware of any result which gives good reason to abandon hope in developing methods that work with indirect encodings and excel on "regular" FFs. This is actually not a hypothesis at all; it is an incontrovertible fact. Of course, it doesn't prove that such a result doesn't exist and I certainly don't claim a comprehensive mastery of the literature. But if such a result does exist (though it's hard to imagine it would), anyone is free to simply point it out to me.
                              > >
                              > > By contrast, your hypothesis was that "an unhappy and unhealthy indirect encoding is one with only a strict target to which to aspire." I continue to maintain that this is quite a bold claim. You suggested that the "success" of natural evolution provides a proof of concept to support your hypothesis, but I find this highly problematic. Firstly, it's unclear that evolution was "successful" in any meaningful way because there's an anthropic principle at work here: we are human and so we find humans impressive. The success of evolution is thus a self-fulfilling prophecy. Secondly, and more importantly, even if we accept that evolution is successful, this actually does not validate your hypothesis at all. Showing that there's a nice algorithm that works without a target is not the same as showing that there are *no* algorithms that work with a target, which is what your hypothesis claims.
                              > >
                              > > That's why I think it's not correct to describe our "hypotheses" as equally bold. In my case, there is nothing to prove: there's just a simple fact. In your case, validating the hypothesis requires proving a negative (and quite a sweeping one at that), which is notoriously difficult to do.
                              > >
                              > > Finally, I sense from your defense of PicBreeder that I may have been misunderstood in another way. I'm in no way trying to criticize PicBreeder or suggest that research into it is misguided. I think it's a very cool and intriguing result, possibly a highly significant one. You and I may disagree about *why* it's significant, but we definitely agree that it's useful to study further and has the potential to yield important insights.
                              > >
                              > > I'm just saying that, in the complementary search for optimization methods that use indirect encodings and can excel on hard FFs, there's really no reason for despair.
                              > >
                              > > Cheers,
                              > > Shimon
                              > >
                              > > --- In neat@yahoogroups.com, "Ken" <kstanley@> wrote:
                              > > >
                              > > > Hi Shimon,
                              > > >
                              > > > The nice thing about this type of conversation is that it pushes us
                              > > > towards realizing our fundamental assumptions. In that spirit, a few
                              > > > other thoughts in response:
                              > > >
                              > > > Many top biologists and EC researchers would call the behavior of the
                              > > > organism in the world (i.e. its interactions with the world) the
                              > > > "phenotype." David Fogel once sent me a long and forceful defense of
                              > > > such a position, quoting a number of famous biologists. In that
                              > > > context, whether you map CPPN->pattern or CPPN->behavior, a target is
                              > > > still a target and the neural network is not the phenotype.
                              > > >
                              > > > In any case, I guess we both hold bold hypotheses in some sense. Your
                              > > > hypothesis that there exist methods that can solve hard FFs almost
                              > > > deterministically as targets (up to and even beyond delivering us an
                              > > > Einstein-level brain) seems boldly optimistic to me. But the big
                              > > > difference between our hypotheses is that you have no proof of concept
                              > > > for your approach while I do. In your approach, you will set a target
                              > > > of Einstein and almost magically the algorithm will then deliver us
                              > > > Einstein, which no method has yet shown even the faintest sign of
                              > > > achieving. In my approach, Einstein is not the target of the search
                              > > > process but instead appears as a byproduct of a search without any
                              > > > explicit target, which is (amazingly) actually what happened in the real
                              > > > world! It may be tempting to dismiss that distinction, but think about
                              > > > it, the fact that any process actually *did* produce Einstein is beyond
                              > > > incredible. We should learn from that as much as we can before
                              > > > insisting "there's got to be a better way."
                              > > >
                              > > > Of course, it's great that you're searching for a better way (it sure
                              > > > would be nice if we could just set Einstein's IQ as a fitness target),
                              > > > but the thing is, there will always be a ton of people trying to follow
                              > > > that path because it's such a clean and intuitively appealing notion.
                              > > > So it hardly needs a vigorous argument to convince someone that it's
                              > > > important to try. The danger is instead that in such a rush of
                              > > > enthusiasm for intuitively appealing approaches, we sweep aside the much
                              > > > more delicate yet equally critical attempt to harness evolution on its
                              > > > own terms, which leads to things like novelty search, indirect encoding,
                              > > > and behavioral diversity. The power of this process may not be as clean
                              > > > and simple as the optimization crowd is hoping; my argument here guards
                              > > > against that danger.
                              > > >
                              > > > I think something similar is going on in our fracture argument: In
                              > > > Picbreeder (and to some extent Endlessforms too, which also uses CPPNs)
                              > > > we have perhaps the only massive proof-of-concept that exists of
                              > > > fracture evolving through an artificial encoding. Picbreeder is a
                              > > > treasure trove of almost 10,000 examples, each one with its own lesson
                              > > > to teach us on fracture. And my argument is, since we are fortunate
                              > > > enough to have this example right in front of us (like the example of
                              > > > natural evolution evolving Einstein), let's learn everything we can from
                              > > > it before looking away and saying hastily, "there must be something
                              > > > better." Picbreeder is a fortuitous goldmine of data that is very hard
                              > > > to reproduce; it took several years and the contributions of hundreds
                              > > > of people to accumulate its results. Let's scrub every last bit of
                              > > > wisdom from that windfall that we can. It's not a question of better
                              > > > for me, it's a question of learning from the only evidence we have. How
                              > > > do you expect to really develop a better encoding of fracture if you
                              > > > ignore the thousands of examples of fracture that already exist? You
                              > > > will just end up reinventing the wheel.
                              > > >
                              > > > The danger of that is evident in some of the assumptions you hold
                              > > > already about how CPPNs encode fracture. For example, you are worried
                              > > > that it may not do so efficiently, but you have only constructed a
                              > > > single thought experiment (your figure 13) instead of taking a measured
                              > > > look at the thousands of examples in front of us. In fact, while indeed
                              > > > I don't think 6 nodes is particularly much, it's easy enough to find
                              > > > fracture in Picbreeder represented in under 6 nodes. For example, the
                              > > > pattern below has at least 3 distinct fractured regions (which I
                              > > > numbered for clarity): a pinkish region, an ovular inner region, and a
                              > > > conic lower region. However, the CPPN that encodes it (shown to its
                              > > > left) only uses 4 nodes to encode these 3 fractured regions (notice that
                              > > > two of the displayed hidden nodes have outgoing weights of zero, so they
                              > > > are not contributing to the output pattern, leaving only the 4 nodes).
                              > > > That's a respectable 1.3 nodes per fractured region:
                              > > >
                              > > > (this image is also available at
                              > > > http://eplex.cs.ucf.edu/hyperNEATpage/content/four-node-fracture.jpg
                              > > > )
                              > > >
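                              > > > For readers who can't load the image, here is a hypothetical composition in the same spirit - emphatically not the actual Picbreeder genome, just a sketch of how a handful of simple function nodes can already carve out several fractured regions:
                              > > >
                              > > > import math
                              > > >
                              > > > def toy_four_node_cppn(x, y):
                              > > >     d = math.sqrt(x * x + y * y)                    # distance input
                              > > >     n1 = math.exp(-(2.0 * y + 1.0) ** 2)            # Gaussian: lower band
                              > > >     n2 = 1.0 / (1.0 + math.exp(-8.0 * (0.5 - d)))   # sigmoid: inner oval
                              > > >     n3 = math.sin(3.0 * x + 2.0 * n1)               # periodic backdrop
                              > > >     return math.tanh(n2 + 2.0 * n1 - n3)            # output blends regions
                              > > >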
                              > > > We could argue about the definition of fracture, and perhaps you'd want
                              > > > to refine the definition to argue that the above example doesn't really
                              > > > have three fractured regions. Or you could argue that the "d" input
                              > > > should be counted as a 5th node (to encode the 3 regions, which is still
                              > > > a respectable 1.7 nodes per region). But even those would be healthy
                              > > > discussions that we could not begin to have without first referring to
                              > > > the examples that we already have. Of course (intuitively) more complex
                              > > > fracture will require more nodes (maybe at some point 6!), as it should.
                              > > > All I'm saying is I think this kind of speculation about whether it's
                              > > > efficient or not is jumping the gun: Let's understand it first before
                              > > > we worry about fixing it.
                              > > >
                              > > > So I think there's a difference of philosophies ultimately at work here.
                              > > > Because the road ahead to AI is so vast, I'm more motivated by
                              > > > maximizing our information than doing "better." For example, that's the
                              > > > motivation behind Picbreeder - obviously it is not quantifiably better
                              > > > than anything in particular, but it increased our understanding of a
                              > > > number of important phenomena, including fracture, representation, and
                              > > > non-objective search, by providing useful information. So I say, let's
                              > > > see where this takes us - there's still a ton left to learn. But you
                              > > > are already itching to start anew and fix things that are not clearly
                              > > > broken just when we have hit a goldmine of untapped information.
                              > > >
                              > > > Without question, neither of our perspectives is definitively superior;
                              > > > we are both merely speculating about where time is best invested, and
                              > > > obviously neither of us can predict the future. This argument is only
                              > > > my own case for my perspective.
                              > > >
                              > > > Best,
                              > > >
                              > > > ken
                              > > >
                              > > >
                              > > > --- In neat@yahoogroups.com, "shimonw" wrote:
                              > > > >
                              > > > > Hi Ken,
                              > > > >
                              > > > > If I may, I'd like to offer a couple reactions to your latest message.
                              > > > This is indeed an interesting discussion that touches on many
                              > > > potentially important issues. I hope I can offer a useful perspective
                              > > > on these topics.
                              > > > >
                              > > > > You are right that in a target-based scenario there can be many CPPNs
                              > > > that generate the same phenotype. But the CPPN is a genotype, not a
                              > > > phenotype. In what I'm calling "regular" FFs, there are potentially many
                              > > > *phenotypes* that solve the problem, and for each of them potentially
                              > > > many genotypes. So in my mind, there is still a useful distinction
                              > > > between target based FFs and regular FFs, though they are perhaps best
                              > > > seen as different points on a spectrum.
                              > > > >
                              > > > > If I understand correctly, you are suggesting that the important
                              > > > distinction is how the FF interacts with the search process. But to me,
                              > > > this is a difficult criterion to use because the search process could be
                              > > > anything. The space of possible black-box optimization methods is huge,
                              > > > and what encourages piecewise incremental progress for one method may
                              > > > not for another.
                              > > > >
                              > > > > I don't pretend to know what kind of algorithm would generate
                              > > > Einstein-level intelligence (I don't even know if this is a useful
                              > > > goal). But as a researcher I highly respect once told me, "your failure
                              > > > to imagine it does not constitute a proof that it cannot be done."
                              > > > >
                              > > > > That's why I think your hypothesis is quite a bold one, because it
                              > > > looks at the limitations of a specific class of black-box optimization
                              > > > methods and generalizes them to all possible black-box optimization
                              > > > methods.
                              > > > >
                              > > > > To give just one example of the possibilities here: algorithms for
                              > > > continuous-armed bandits, such as X-armed bandits and Bayesian
                              > > > optimization methods such as GP-UCB, can be applied to black-box
                              > > > optimization. These methods avoid the problem of local optima in a
                              > > > principled way, by maintaining a posterior distribution over the global
                              > > > FF, and using optimism in the face of uncertainty to ensure sufficient
                              > > > exploration of the solution space. As a result, these methods have very
                              > > > nice theoretical properties, like guaranteed convergence to the global
                              > > > optimum in the limit and logarithmic regret bounds along the way.
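                              > > > >
                              > > > > To give a flavor of what I mean, here is a bare-bones GP-UCB step on a 1-D problem, using only numpy and an RBF kernel (all the constants are illustrative, and a real method for NN topologies would of course need far more than this):
                              > > > >
                              > > > > import numpy as np
                              > > > >
                              > > > > def rbf(a, b, ls=0.2):
                              > > > >     return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)
                              > > > >
                              > > > > def gp_ucb_step(X, y, grid, beta=2.0, noise=1e-6):
                              > > > >     # X, y: 1-D numpy arrays of points evaluated so far and their FF
                              > > > >     # values; grid: 1-D numpy array of candidate points. Compute the
                              > > > >     # GP posterior, then pick the highest upper confidence bound.
                              > > > >     K = rbf(X, X) + noise * np.eye(len(X))
                              > > > >     Ks = rbf(grid, X)
                              > > > >     mu = Ks @ np.linalg.solve(K, y)
                              > > > >     var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
                              > > > >     return grid[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]
                              > > > >
                              > > > > Each round you evaluate the FF at the returned point, append it to (X, y), and repeat; the beta * sqrt(var) bonus is what drives principled exploration away from local optima.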
                              > > > >
                              > > > > As far as I know, no one has tried to develop versions of these
                              > > > methods that can discover NN topologies, and which would thus be
                              > > > suitable for optimizing CPPNs or other indirect encodings. But there's
                              > > > no a priori reason to think it can't be done. I'm not saying this would
                              > > > necessarily work or even that it's the most promising approach to
                              > > > explore. I'm just saying it's one example of a principled approach that
                              > > > could avoid all the difficulties you mention and which we currently have
                              > > > no strong reason to eliminate from contention.
                              > > > >
                              > > > > So I think it's quite likely that these hard FFs are not unsolvable,
                              > > > but just that current methods cannot solve them. Rather than giving up
                              > > > on them, in my opinion we should be focusing on developing better
                              > > > methods for them.
                              > > > >
                              > > > > Regarding whether 6 nodes is a lot to represent fracture, I think the
                              > > > same point applies: just because we can't think of a better way to do
                              > > > it, doesn't mean it doesn't exist. Our paper establishes 6 as an upper
                              > > > bound, which can be easily done by example. If I understand correctly,
                              > > > you are suggesting that 6 may be a lower bound, which is much more
                              > > > difficult to demonstrate.
                              > > > >
                              > > > > I could definitely imagine that a different type of network, or one
                              > > > with a different set of activation functions, might be able to make
                              > > > better use of 6 nodes. Even if it's true that there's a trade-off, and
                              > > > using fewer nodes means less flexibility, there may be many cases where
                              > > > that's a favorable trade-off. We should at least be open to the
                              > > > possibility that in some cases the sweet spot is not a representation
                              > > > that requires 6 nodes for fracture.
                              > > > >
                              > > > > I'm looking forward to chatting more about this at GECCO.
                              > > > >
                              > > > > Cheers,
                              > > > > Shimon
                              > > > >
                              > > > > --- In neat@yahoogroups.com, "Ken" kstanley@ wrote:
                              > > > > >
                              > > > > > Hi Shimon,
                              > > > > >
                              > > > > > These are some really interesting and deep issues we're getting
                              > > > into. I
                              > > > > > am sure they will be the foundation of great discussions at GECCO.
                              > > > I'll
                              > > > > > try to keep my response more brief than before.
                              > > > > >
                              > > > > > 1) Indeed recent quadruped results are not in the same setup as in
                              > > > your
                              > > > > > work; that is a legitimate qualification and they do not constitute
                              > > > a
                              > > > > > proof that in your particular setup HyperNEAT would succeed.
                              > > > However,
                              > > > > > they are arguably at least comparably complex. For example, the
                              > > > CTRNN
                              > > > > > in
                              > > > > > http://eplex.cs.ucf.edu/papers/risi_gecco13b.pdf
                              > > > > > is a complicated
                              > > > > > structure and the CPPN that encodes it is required not only to
                              > > > generate
                              > > > > > a controller for a single quadruped morphology, but to generate
                              > > > multiple
                              > > > > > controllers for multiple morphologies, all from the same CPPN. A
                              > > > video
                              > > > > > of three controllers all from the same CPPN is here:
                              > > > > > http://youtu.be/oLSSt5GyHNk
                              > > > > >
                              > > > > > 2) I think the question of what is "target-based" can actually be
                              > > > > > interesting, but I completely agree that the semantic side of the
                              > > > > > argument isn't really important. But what I do find interesting is
                              > > > the
                              > > > > > idea that these two scenarios differ substantively in your mind (as
                              > > > you
                              > > > > > put it):
                              > > > > >
                              > > > > > "there is an important difference between FFs that require a single
                              > > > > > specific phenotype and FFs that require only that some goal be
                              > > > achieved"
                              > > > > >
                              > > > > > I think by "single specific phenotype" you mean pattern-matching
                              > > > (i.e.
                              > > > > > image-matching). But I think this distinction doesn't really exist:
                              > > > > > There are an unlimited number of networks (i.e. CPPNs) that can
                              > > > encode a
                              > > > > > particular two-dimensional image, just as there are an unlimited
                              > > > number
                              > > > > > of CPPNs that can encode a particular behavior or goal (such as
                              > > > > > following a line).
                              > > > > >
                              > > > > > What makes both target-based, and fundamentally the same, is not the
                              > > > > > number of ways they can be solved, but rather the way the fitness
                              > > > > > function interacts with the search process. The problem is that
                              > > > > > target-based fitness encourages piece-wise incremental progress,
                              > > > which
                              > > > > > tends to be highly deceptive. For example, with an image, matching
                              > > > a
                              > > > > > single pixel can raise fitness even if the function that causes that
                              > > > > > pixel to be matched has no relationship whatsoever to the overall
                              > > > > > regularities in the target pattern. The same is true in control
                              > > > tasks:
                              > > > > > A mutation that causes a quadruped to lunge forward clumsily is
                              > > > rewarded
                              > > > > > for the slight improvement in fitness, whereas an important
                              > > > underlying
                              > > > > > regularity (e.g. oscillation) necessary to get really good at the
                              > > > task
                              > > > > > is never established. Non-target-based fitness (or what I call
                              > > > > > "non-objective") avoids these problems because it *does* reward
                              > > > > > establishing regularities: it appreciates new regularities even if
                              > > > they
                              > > > > > don't immediately raise fitness.
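                              > > > > >
                              > > > > > A tiny numerical illustration of that deception (the "images" and numbers are invented purely to show the effect):
                              > > > > >
                              > > > > > import numpy as np
                              > > > > >
                              > > > > > target = np.tile([0.0, 1.0], 8)        # periodic 16-pixel target
                              > > > > > fitness = lambda img: 1.0 - np.mean((img - target) ** 2)
                              > > > > >
                              > > > > > flat = np.full(16, 0.5)                # featureless parent
                              > > > > > pixel_hack = flat.copy()
                              > > > > > pixel_hack[1] = 1.0                    # match one pixel, no regularity
                              > > > > > regularity = 0.5 + 0.01 * np.tile([-1.0, 1.0], 8)  # faint periodicity
                              > > > > >
                              > > > > > print(fitness(flat), fitness(regularity), fitness(pixel_hack))
                              > > > > > # -> 0.75, ~0.760, ~0.766: the one-pixel hack outscores the offspring
                              > > > > > # that discovered the target's underlying periodicity, so selection
                              > > > > > # follows the hack and the regularity is never established.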
                              > > > > >
                              > > > > > But what's most interesting to me is your intuition that there is
                              > > > some
                              > > > > > way to sidestep the tradeoff in combining indirect encodings with
                              > > > > > target-based fitness functions. That is, you suspect there may
                              > > > exist an
                              > > > > > indirect encoding that can both elaborate regularities
                              > > > compositionally
                              > > > > > over very many generations and also almost deterministically satisfy
                              > > > an
                              > > > > > arbitrary target-based objective. I am convinced that is not the
                              > > > case
                              > > > > > (it would be too good to be true), but you rightly point out that my
                              > > > > > position is a hypothesis. However, just as a thought experiment,
                              > > > can
                              > > > > > you really imagine any neural network encoding (including DNA) that
                              > > > > > would give you Einstein-level intelligence if the target was simply
                              > > > to
                              > > > > > score maximally on an IQ test? It is plain to me that the stepping
                              > > > > > stones to human level cannot be crossed in the presence of a
                              > > > > > target-based objective (or any one conceivable). However,
                              > > > > > interestingly, that does not mean that such a neural network does
                              > > > not
                              > > > > > exist in the search space. It just means a target won't get you
                              > > > there.
                              > > > > >
                              > > > > > 3) We apparently disagree on whether 6 is a big number (for nodes
                              > > > needed
                              > > > > > to represent fracture). If you try to devise an encoding that
                              > > > > > represents fracture with fewer than that, my intuition is that you
                              > > > would
                              > > > > > be sacrificing an enormous breadth of flexibility in the kinds of
                              > > > > > fracture (and patterns in general) that you can represent. Think
                              > > > about
                              > > > > > the number of "degrees of freedom" in fracture: it's not really
                              > > > > > well-defined, but a single line of fracture has a lot of dimensions:
                              > > > > >
                              > > > > > -position
                              > > > > > -orientation
                              > > > > > -curvature (which can be arbitrarily complex)
                              > > > > > -fracture gradient (that is, one region can smoothly transition into
                              > > > > > another in a kind of interpolated transitional zone)
                              > > > > > -embedding (fracture can potentially be hierarchical or regular,
                              > > > i.e.
                              > > > > > dependent on higher-level containing regions)
                              > > > > >
                              > > > > > The idea that all of that can be encoded in anything less than about
                              > > > 6
                              > > > > > simple functions seems optimistic. Of course, if you are willing to
                              > > > > > sacrifice some of those degrees of freedom, then you can get fewer
                              > > > > > nodes, but the exquisite minutia of such patterns would be lost (and
                              > > > > > you'd move more towards direct encoding, like the wavelets do).
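                              > > > > >
                              > > > > > To see where those degrees of freedom live, here is a single fracture boundary written as a soft threshold (the functional form is just an illustration, not an actual CPPN genome); each parameter maps onto one of the dimensions above, and composing several such boundaries would give you the embedding dimension:
                              > > > > >
                              > > > > > import math
                              > > > > >
                              > > > > > def fracture_boundary(x, y, angle=0.6, offset=0.1,
                              > > > > >                       curve=1.5, sharpness=8.0):
                              > > > > >     # "offset" sets position and "angle" sets orientation.
                              > > > > >     u = math.cos(angle) * x + math.sin(angle) * y - offset
                              > > > > >     v = -math.sin(angle) * x + math.cos(angle) * y
                              > > > > >     b = u + curve * v * v    # "curve" bends the line of fracture
                              > > > > >     # "sharpness" is the fracture gradient: large values give a hard
                              > > > > >     # edge between regions, small values a smooth transitional zone.
                              > > > > >     return 1.0 / (1.0 + math.exp(-sharpness * b))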
                              > > > > >
                              > > > > > By the way, here are some cool examples of complicated kinds of
                              > > > fracture
                              > > > > > from Picbreeder (such as fractured periodic zones, but even more
                              > > > complex
                              > > > > > than even that):
                              > > > > >
                              > > > > > Nontrivial color patterns in each zone:
                              > > > > > http://picbreeder.org/search/showgenome.php?sid=11328
                              > > > > >
                              > > > > >
                              > > > > > Very complex brush-stroke pattern inside the apple vs. outside (the
                              > > > DNA
                              > > > > > for this one is fascinating):
                              > > > > > http://picbreeder.org/search/showgenome.php?sid=7109
                              > > > > >
                              > > > > >
                              > > > > >
                              > > > > > Fractured regions covering a distorted surface, giving a remarkable
                              > > > > > underlying curvature:
                              > > > > > http://picbreeder.org/search/showgenome.php?sid=2684
                              > > > > >
                              > > > > >
                              > > > > > A color variant:
                              > > > http://picbreeder.org/search/showgenome.php?sid=6652
                              > > > > >
                              > > > > >
                              > > > > > When I see so many remarkable examples like these on Picbreeder, the
                              > > > > > message to me is that we have only scratched the surface of what
                              > > > CPPN
                              > > > > > encoding can teach us about representation. That's why I think it's
                              > > > > > premature (in the face of such a treasure trove of evidence) to be
                              > > > > > worrying about whether we can represent some subcomponent of such
                              > > > > > patterns with less than 6 nodes. At present I'm amazed they can be
                              > > > > > represented at all, let alone with under 6 nodes.
                              > > > > >
                              > > > > > Thanks very much for raising these issues in any case - it's a great
                              > > > > > debate to have about representation and search and goes to the heart
                              > > > of
                              > > > > > the field of GDS/indirect encoding.
                              > > > > >
                              > > > > > Best,
                              > > > > >
                              > > > > > ken
                              > > > > >
                              > > > > > --- In neat@yahoogroups.com, "shimonw" wrote:
                              > > > > > >
                              > > > > > > Hi Ken,
                              > > > > > >
                              > > > > > > Thanks for your interesting and thought-provoking comments. I'll
                              > > > give
                              > > > > > some brief reactions here, and I hope we can have a productive
                              > > > > > discussion about it at GECCO.
                              > > > > > >
                              > > > > > > 1) I won't comment for now on the new experiments you've been
                              > > > running
                              > > > > > because the details are not available yet. Instead, I will just
                              > > > comment
                              > > > > > on the various walking gait domains, because the details of the
                              > > > > > additional results you mentioned are in that case already available,
                              > > > in
                              > > > > > newly published GECCO papers.
                              > > > > > >
                              > > > > > > First of all, these are very cool results, and nice success
                              > > > stories
                              > > > > > for HyperNEAT. It's great to see that the GDS community is
                              > > > continuing
                              > > > > > to push the envelope in terms of robot locomotion, which seems to
                              > > > be
                              > > > > > emerging as a nice application for indirect encodings.
                              > > > > > >
                              > > > > > > However, I would simply caution against placing all these
                              > > > variations
                              > > > > > on the walking gait task on a one-dimensional spectrum of
                              > > > difficulty.
                              > > > > > In our paper, we took the original walking gait task and made only
                              > > > one
                              > > > > > change (increasing the mass of the torso) which clearly makes the
                              > > > task
                              > > > > > harder. In the papers Ken mentioned, there are other differences,
                              > > > e.g.,
                              > > > > > the use of an SUPG encoding, the use of CTRNNs, the use of a
                              > > > different
                              > > > > > substrate topology, and varying the length of the legs.
                              > > > > > >
                              > > > > > > It's completely legitimate to make these changes and they don't
                              > > > > > detract from the results at all. But they are confounding factors
                              > > > when
                              > > > > > it comes to comparing HyperNEAT's performance on walking gait tasks
                              > > > of
                              > > > > > different difficulties. Some of these changes may make the task
                              > > > harder
                              > > > > > in some ways but other changes may make it easier in other ways,
                              > > > and so
                              > > > > > far we don't have experimental results that control for these
                              > > > factors.
                              > > > > > >
                              > > > > > > 2) I hope to avoid a semantic debate about what is a
                              > > > "target-based"
                              > > > > > FF. If you want to include what I was calling a "regular" FF in the
                              > > > > > scope of target-based FFs, I won't object. But my point is that,
                              > > > > > regardless of what you call them, there is an important difference
                              > > > > > between FFs that require a single specific phenotype and FFs that
                              > > > > > require only that some goal be achieved. For the latter, there may
                              > > > be
                              > > > > > many ways to skin the cat, though in any interesting problem, good
                              > > > > > solutions are still quite rare. The distinction is important
                              > > > because,
                              > > > > > while the former category is arguably only of theoretical interest,
                              > > > the
                              > > > > > second category corresponds to what I consider the most interesting
                              > > > and
                              > > > > > important class of FFs, because these are the FFs that naturally
                              > > > arise
                              > > > > > in a very large number of real-world optimization problems.
                              > > > > > >
                              > > > > > > So, using your terminology, I agree it is reasonable to say
                              > > > > > HyperNEAT's difficulties with fracture are probably limited to
                              > > > > > "target-based" FFs. But for me, this is a potentially serious
                              > > > > > restriction, since target-based FFs include not just pathological
                              > > > FFs
                              > > > > > but a large class of real-world FFs.
                              > > > > > >
                              > > > > > > Regarding the wavelet method, I would still definitely call this
                              > > > an
                              > > > > > indirect encoding, but it is in some sense "less indirect" because
                              > > > it
                              > > > > > lacks the compositional functions of HyperNEAT. It's a perfectly
                              > > > > > reasonable hypothesis that such compositionality is advantageous in
                              > > > some
                              > > > > > ways. However, if that's true, then we should be able to
                              > > > demonstrate
                              > > > > > that by finding tasks where HyperNEAT outperforms the wavelet method
                              > > > > > (FYI, in the extended analysis Thomas is conducting for his master's
                              > > > > > thesis, we have found one variation of the line-following task where
                              > > > > > this is the case). So in that sense, the wavelet method is a useful
                              > > > > > baseline, which is the whole reason we introduced it.
                              > > > > > >
                              > > > > > > 3) If it's true that it takes ~6 nodes to represent fracture, then
                              > > > I
                              > > > > > think this is a potentially important point. On regular FFs, it could
                              > > > > > contribute to the difficulty of discovering fracture, since
                              > > > > > representations that have only some of the necessary components may
                              > > > not
                              > > > > > be rewarded (I think you would call this "deception"). In
                              > > > > > Picbreeder-like tasks, we already see that fracture can evolve
                              > > > anyway,
                              > > > > > but this doesn't mean it wouldn't do it even better if fracture
                              > > > required
                              > > > > > fewer nodes to represent. If we think that the solutions to many
                              > > > > > interesting problems require fracture, then I think it's at least
                              > > > > > possible that one could devise a better representation for such
                              > > > problems
                              > > > > > than the CPPNs used in HyperNEAT.
                              > > > > > >
                              > > > > > > As I understand it, your approach to dealing with "regular" FFs is
                              > > > to
                              > > > > > reformulate them "in a way that respects the need for phenotypic
                              > > > > > diversity and open-endedness." That's a very interesting approach
                              > > > and I
                              > > > > > think it has a lot of merit, but I don't think it will solve all our
                              > > > > > problems because it presupposes that such a reformulation is
                              > > > possible
                              > > > > > (or that the prior knowledge it requires is available). In some
                              > > > cases,
                              > > > > > I think we are just stuck with hard FFs and we need to develop
                              > > > better
                              > > > > > methods to deal with them.
                              > > > > > >
                              > > > > > > Regarding the idea that "an unhappy and unhealthy indirect
                              > > > encoding is
                              > > > > > one with only a strict target to which to aspire": this is a very
                              > > > > > intriguing hypothesis, but to me it is only a hypothesis. I think
                              > > > it's
                              > > > > > true of existing indirect encodings, but that for me does not in any
                              > > > way
                              > > > > > rule out the possibility of developing a different indirect encoding
                              > > > for
                              > > > > > which that is not true, and I think that's one thing we should be
                              > > > > > working on. One of the reasons that I'm interested in indirect
                              > > > > > encodings is that I strongly believe this can be done.
                              > > > > > >
                              > > > > > > Cheers,
                              > > > > > > Shimon
                              > > > > > >
                              > > > > >
                              > > > >
                              > > >
                              > >
                              >