
Re: [neat] New E-print: Neuroevolution Offers New Approach to Deep Learning

  • kenstanley01
    Jul 18, 2014
      Hi Josh, thanks for the link to your paper and for highlighting that it also aims to evolve features.  It does appear that the DDFA idea could be complementary to what you're doing (as you note) in the sense that you reward "utility," which could lead to convergence (i.e. a lack of diversity).  Given that feature utility is in effect an objective-based target, it makes sense that you would likely end up seeing (as you say you do) a lack of diversity.  After all, just because a feature has higher utility than other features at a given moment in the search does not mean that the lower-utility features are not stepping stones to other features that might have high utility in the future.  It's just another instance of the usual objective paradox.  So it does seem like DDFA could fit in here.

      I had not been aware of Jan Koutnik's papers until a couple days ago when I spoke to him at GECCO.  You're right that there is a common insight between his work and ours, which is that it makes sense to seek a set of diverse features.  However, the difference is in how we go about doing that.  In his approach, he charges evolution with finding all the features at once while maximizing their diversity through the fitness function f=min(D) + mean(D).  In contrast, DDFA evolves each feature as a unique member of the population and uses novelty (instead of attempting to maximize a fitness function) to seek new features.  The intended effect of both approaches is similar, so the question of which makes the most sense will depend on their effectiveness.
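      To make the contrast concrete, here is a minimal sketch of the two scoring schemes in Python (the function names and distance choices are my own illustrative assumptions, not code from either paper): Jan's single-genome diversity fitness f=min(D) + mean(D), versus a DDFA-style novelty score computed against an archive of already-accumulated features.

```python
import math

def pairwise_distances(features):
    """All pairwise Euclidean distances among one genome's feature vectors."""
    return [math.dist(a, b)
            for i, a in enumerate(features)
            for b in features[i + 1:]]

def diversity_fitness(features):
    # Jan's scheme as described above: f = min(D) + mean(D), where D is
    # the set of pairwise distances within a single genome's features.
    d = pairwise_distances(features)
    return min(d) + sum(d) / len(d)

def novelty_score(candidate, archive, k=3):
    # DDFA-style scoring: a candidate feature is rewarded for being far
    # (mean distance to its k nearest neighbors) from the archive.
    dists = sorted(math.dist(candidate, f) for f in archive)
    return sum(dists[:k]) / len(dists[:k])
```

      Note that the first function scores a whole genome of features at once, while the second scores a single feature relative to everything accumulated so far, which is exactly the structural difference described above.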

      One concrete difference is that DDFA does not have to specify a priori what the number of features is, so it acts as a genuine accumulator that can keep adding more divergent features as long as you want.  However, perhaps more interesting is the deeper issue of the effect of trying to maximize f=min(D) + mean(D).  Notice that by trying to maximize this expression of diversity, Jan's approach is in effect also an objective-driven search (where the objective is maximum diversity within a single individual).  Therefore, it should also be subject to deception in the same way as other objective-based searches.  That is, a vector of weights that scores higher on feature diversity than others in the current population may not actually be a stepping stone to a much higher feature diversity, so you can get stuck.  The higher dimensionality of evolving all features simultaneously is also a possible impediment: the larger the genome gets, the more likely a mutation that increases the diversity of one or more features is to be accompanied by mutations that lower the diversity of others in the same genome.

      Yet you also point out that Jan's approach can act as a compressor whereas DDFA is not really about compressing the feature space.  That is an interesting distinction.  However, I think DDFA could be used to compress the feature space if desired simply by selecting from its archive a maximally-spaced sample of whatever density you desire.  But this issue of compression deserves continued discussion - there are likely cases where it is interesting to evolve all the features at once as well.  So we should not simply dismiss the idea.
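      As a sketch of that compression idea (my own assumption about how the selection could work, not something from either paper), a greedy farthest-point pass over the archive yields an approximately maximally-spaced subset of whatever size you want:

```python
import math

def maximally_spaced_sample(archive, m):
    """Greedy farthest-point sampling: repeatedly add the archive member
    farthest (by Euclidean distance) from everything chosen so far,
    giving an approximately maximally-spaced subset of size m."""
    chosen = [0]  # start from an arbitrary archive member
    while len(chosen) < m:
        # pick the member whose nearest chosen neighbor is farthest away
        best = max(
            (i for i in range(len(archive)) if i not in chosen),
            key=lambda i: min(math.dist(archive[i], archive[c])
                              for c in chosen),
        )
        chosen.append(best)
    return [archive[i] for i in chosen]
```

      The point is that the accumulator and the compressor can be decoupled: the archive keeps everything, and you choose the sample density afterward.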

      In any case, I'd speculate that overall, as a practical matter, despite similar underlying motivational insights, the DDFA approach at the moment is potentially more flexible about not getting stuck in the search for diverse features.  The whole idea of searching for diverse features appears to be just another problem that you can attack either by searching for novelty or by searching objectively, and perhaps novelty is more effective here.  But there is one very important and unique distinction in this case: here, our goal is explicitly to accumulate diversity, so unlike e.g. in robot maze navigation (where a solution through novelty comes as a side effect), novelty search is perfectly and explicitly aligned with what we are trying to do.  I think that is an important signal of things to come - when you need to accumulate a diverse repertoire of something (especially something high-dimensional like feature-response vectors), a pressure towards novelty of some sort makes practical sense.

      You also note that in Jan's case he is using the evolved features as input into a recurrent network whereas in our example we used them in a classification problem, but I don't think that distinction is really important here.  He could just as easily input his features into a classification problem and we could input ours into a controller.   Perhaps having only a few features (i.e. compressed) makes more sense in one domain or another, but it's not really clear.  It may be a moot point though if you can select the density of DDFA features you want anyway (since in that case either one could get you a desired level of compression).  So I think the main issue here is what is the best way to accumulate diverse features; once you know how to do that, you can feed them into virtually anything that benefits from learning off such features.

