6382 Re: [neat] New E-print: Neuroevolution Offers New Approach to Deep Learning
- Jul 26, 2014
Hi Ken,
Sorry for the delayed responses, but I have been busy traveling / attending SAB this week.
Thank you for highlighting the difference between your work and Jan's. He was in attendance at SAB and I had some good conversations with him about this work and other topics.
Your approach with DDFA is actually more similar to what I am doing with OEELMs in that the units of evolution are the individual features, and not the entire network itself. I do really like this general idea of maximizing what amounts to the behavioral diversity of your features, and see a lot of promise here. I also see a lot of open questions in determining what is the best way to do this, especially for online learning tasks where you do not have a large, pre-collected training set to compute your diversity against (which also seems to be a very slow process). I have some ideas here, but do not feel comfortable sharing them publicly yet.
As for the question of dimensionality expansion vs. compression, do you see any reason why your suggested approach of first evolving a large set of diverse features and then pruning this down to a given dimension would not work? E.g. might there be advantages to knowing your desired level of compression a priori?
Finally, I am wondering if you have any insights into how DDFA might be applied to something like the convolutional neural nets that Jan is using? In that case you have massive weight sharing in your network, so the features are not really independent entities as they are in a standard MLP.
Will you be in New York next week for ALife? If so, I am looking forward to discussing this more with you offline.
On 07/19/2014 01:33 AM, kstanley@... [neat] wrote:
Hi Josh, thanks for the link to your paper and for highlighting that it also aims to evolve features. It does appear that the DDFA idea could be complementary to what you're doing (as you note) in the sense that you reward "utility," which could lead to convergence (i.e. a lack of diversity). Given that feature utility is in effect an objective-based target, it makes sense that you would likely end up seeing (as you say you do) a lack of diversity. After all, just because a feature has higher utility than other features at a given moment in the search does not mean that those lower-utility features are not stepping stones to other features in the future that might have high utility. It's just another instance of the usual objective paradox. So it does seem like DDFA could fit in here.
I had not been aware of Jan Koutnik's papers until a couple of days ago when I spoke to him at GECCO. You're right that there is a common insight between his work and ours, which is that it makes sense to seek a set of diverse features. However, the difference is in how we go about doing that. In his approach, he charges evolution with finding all the features at once while maximizing their diversity through the fitness function f=min(D) + mean(D). In contrast, DDFA evolves each feature as a unique member of the population and uses novelty (instead of attempting to maximize a fitness function) to seek new features. The intended effect of both approaches is similar, so the question of which makes the most sense will depend on their effectiveness.
One concrete difference is that DDFA does not have to specify a priori what the number of features is, so it acts as a genuine accumulator that can keep adding more divergent features as long as you want. However, perhaps more interesting is the deeper issue of the effect of trying to maximize f=min(D) + mean(D).
Notice that by trying to maximize this expression of diversity, Jan's approach is in effect also an objective-driven search (where the objective is maximum diversity within a single individual). Therefore, it should also be subject to deception in the same way as other objective-based searches. That is, a vector of weights that scores higher on feature diversity than others in the current population may not actually be a stepping stone to a much higher feature diversity, so you can get stuck. The higher dimensionality of evolving all features simultaneously is also a possible impediment: the larger the genome gets, the more likely it is that mutations that increase diversity in one or more features are accompanied by mutations that lower the diversity of others in the same genome.
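To make the expression f=min(D) + mean(D) concrete, here is a minimal sketch. It assumes that D is the set of pairwise Euclidean distances between the response vectors of the features encoded in one genome; the thread does not define D, so that reading, the distance metric, and the names `euclidean` and `diversity_fitness` are illustrative assumptions rather than Jan's actual implementation.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity_fitness(features):
    """Return min(D) + mean(D), with D assumed to be all pairwise distances.

    `features` is a list of feature response vectors belonging to ONE
    individual, since in this scheme a single genome encodes every feature.
    """
    distances = [euclidean(a, b) for a, b in combinations(features, 2)]
    if not distances:
        raise ValueError("need at least two features to measure diversity")
    return min(distances) + sum(distances) / len(distances)


# Hypothetical response vectors for a three-feature individual.
individual = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
score = diversity_fitness(individual)
```

Because the whole feature set is scored as one number, a mutation that spreads two features apart while pulling two others together can leave the score flat or lower, which is the dimensionality concern raised above.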
Yet you also point out that Jan's approach can act as a compressor whereas DDFA is not really about compressing the feature space. That is an interesting distinction. However, I think DDFA could be used to compress the feature space if desired simply by selecting from its archive a maximally-spaced sample of whatever density you desire. But this issue of compression deserves continued discussion - there are likely cases where it is interesting to evolve all the features at once as well. So we should not simply dismiss the idea.
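One way to read "selecting from its archive a maximally-spaced sample" is greedy farthest-point sampling. The sketch below is only that one interpretation: the function name `maximally_spaced_sample`, the Euclidean metric, and the arbitrary choice of the first archive entry as the seed are all assumptions, not something specified in this thread.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximally_spaced_sample(archive, k):
    """Greedily pick up to k archive entries that are far from each other.

    Starts from the first entry (an arbitrary seed) and repeatedly adds the
    entry whose distance to its nearest already-chosen entry is largest.
    """
    if k <= 0 or not archive:
        return []
    chosen = [0]
    # Distance from every archive entry to its nearest chosen entry so far.
    nearest = [euclidean(v, archive[0]) for v in archive]
    while len(chosen) < min(k, len(archive)):
        candidates = [i for i in range(len(archive)) if i not in chosen]
        idx = max(candidates, key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(d, euclidean(v, archive[idx]))
                   for d, v in zip(nearest, archive)]
    return [archive[i] for i in chosen]


# Hypothetical one-dimensional "response vectors" in an archive.
archive = [(0.0,), (1.0,), (2.0,), (10.0,)]
compressed = maximally_spaced_sample(archive, 2)
```

Choosing k here plays the role of the desired compression level, decided after the archive has been accumulated rather than a priori.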
In any case, I'd speculate that overall, as a practical matter, despite similar underlying motivational insights, the DDFA approach at the moment is potentially more flexible about not getting stuck in the search for diverse features. The whole idea of searching for diverse features appears to be just another problem where you can go about it by searching for novelty or searching objectively and perhaps novelty is more effective here, but there is one very important and unique distinction in this case: Here, our goal is explicitly to accumulate diversity, so unlike e.g. in robot maze navigation (where a solution through novelty comes as a side effect), novelty search is perfectly and explicitly aligned with what we are trying to do. I think that is an important signal of things to come - when you need to accumulate a diverse repertoire of something (especially something high-dimensional like feature-response vectors), a pressure towards novelty of some sort makes practical sense.
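The accumulation idea can be sketched with a generic novelty-search style admission rule: a candidate feature joins the archive only if its response vector is far enough from what is already there. This is a sketch under stated assumptions, not the DDFA code; the k-nearest-neighbor novelty score, the fixed `threshold`, and the names `novelty` and `accumulate` are hypothetical choices, and the evolutionary step that would generate the candidates is left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty(candidate, archive, k=3):
    """Mean distance from candidate to its k nearest archive entries."""
    if not archive:
        return float("inf")  # anything is novel relative to an empty archive
    nearest = sorted(euclidean(candidate, v) for v in archive)[:k]
    return sum(nearest) / len(nearest)


def accumulate(candidates, threshold, k=3):
    """Add each sufficiently novel candidate to a growing archive."""
    archive = []
    for c in candidates:
        if novelty(c, archive, k) > threshold:
            archive.append(c)
    return archive


# Hypothetical feature response vectors arriving one at a time.
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (10.0, 0.0)]
archive = accumulate(stream, threshold=1.0, k=1)
```

Note that nothing here fixes the archive size in advance: it keeps growing for as long as sufficiently different features keep arriving.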
You also note that in Jan's case he is using the evolved features as input into a recurrent network whereas in our example we used them in a classification problem, but I don't think that distinction is really important here. He could just as easily input his features into a classification problem and we could input ours into a controller. Perhaps having only a few features (i.e. compressed) makes more sense in one domain or another, but it's not really clear. It may be a moot point though if you can select the density of DDFA features you want anyway (since in that case either one could get you a desired level of compression). So I think the main issue here is what is the best way to accumulate diverse features; once you know how to do that, you can feed them into virtually anything that benefits from learning off such features.
Best,
ken