6378 Re: [neat] New E-print: Neuroevolution Offers New Approach to Deep Learning
- Jul 17, 2014
Hi Ken,
Thank you for sharing. This is a very interesting idea, and I look forward to speaking with you about it more at ALife.
I have a paper at ALife that is also about evolving features, but my approach is geared toward an online learning framework where selection is informed by the current features' "utility" at solving the given learning problem. Since we start with an extreme learning machine and then evolve the features online, we call this approach an Online Extreme Evolutionary Learning Machine (OEELM):
Online Extreme Evolutionary Learning Machines. J. Auerbach, C. Fernando and D. Floreano.
Artificial Life 14: International Conference on the Synthesis and Simulation of Living Systems, New York, NY, USA, July 30-August 2, 2014.
I am quite interested in how our approach might be augmented by additionally searching for divergent features as you do here, especially because I see a major limitation of OEELMs, as they currently stand, to be the lack of feature diversity in more challenging problems.
On another note, I just (this week) came across some recent papers from Koutnik, Schmidhuber and Gomez that seem to be doing something very similar to your DDFA (evolving diverse features):
- Jan Koutnik, Juergen Schmidhuber and Faustino Gomez, Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning, Proceedings of the Simulation of Adaptive Behavior Conference (SAB, Castellon, ES), 2014
- Jan Koutnik, Juergen Schmidhuber and Faustino Gomez, Evolving Deep Unsupervised Convolutional Networks for Vision-Based Reinforcement Learning, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO, Vancouver, CA), 2014
Are you familiar with this work? Can you comment on how it compares to your approach? One difference is that they intentionally use this feature space as a compressed representation, whereas you use it to accumulate a growing set of features. Forming a compressed representation seems more useful to me if you want to use it as the input to a controller (policy) network as they do, but perhaps for other uses (such as classification) having novel dimensionality expansions is useful.
I am very interested to see what you have to say about any of this!
On 06/11/2014 04:56 AM, kstanley@... [neat] wrote:
My coauthors Paul Szerlip, Greg Morse, Justin Pugh, and I are excited to announce our new arXiv e-print, "Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation":
arXiv link: http://arxiv.org/abs/1406.1833
While this paper is not yet published in a journal or conference, because we think its implications are broad for the neuroevolution community and beyond, we decided it's important to share it now. "Unsupervised feature learning" has become a very popular research area in recent years with the rise of "deep learning." In fact, the field of deep learning has a whole subfield dedicated to this topic, usually centered on the idea of pre-training layers of a future classifier network, often through an autoencoder or related technique. The autoencoder is often viewed as the key piece of unsupervised apparatus for learning features without the need for labelled data. Often it is argued that pretraining a network in this way sets it up for increased success later on when training more conventionally (e.g. with backprop) on a classification problem.
We realized recently that there is an appealing alternative to autoencoders that derives much of its power from recent progress in the field of neuroevolution. This alternative, called "divergent discriminative feature accumulation" (DDFA), basically uses novelty search to accumulate a continual stream of novel discriminative features. In other words, novelty search is the feature learning algorithm (and in the paper, HyperNEAT represents the features). This setup provides an entirely new perspective on feature learning that is quite different from autoencoders. For example, it can run indefinitely and keep accumulating new features, which means you don't need to know how many features are needed when you start the search. It also does not converge because novelty search is divergent, so it just keeps on going. It further benefits from being non-objective, so the representations of features you get out of it are likely more evolvable (i.e. better representations). On top of all that, it benefits from the geometric capabilities of HyperNEAT. I think it also offers an interesting new way to think about learning creatively through a divergent process. After all, divergent thought is often attributed to the most creative people. This algorithm literally accumulates new perspectives on the world divergently.
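As a rough illustration of the accumulation loop described above, here is a minimal Python sketch of novelty search collecting features into an archive. It is not the DDFA implementation from the paper: each feature is a single tanh unit with directly mutated weights rather than a HyperNEAT-encoded network, novelty is measured against the archive only, and every name and parameter (`accumulate_features`, `threshold`, `k`, `sigma`, the population settings) is a hypothetical choice made for the example.

```python
import math
import random

def feature_output(weights, x):
    """One candidate feature: a single tanh unit over the raw inputs."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)))

def behavior(weights, samples):
    """Behavior vector: the feature's response to every sample."""
    return [feature_output(weights, x) for x in samples]

def novelty(b, archive_behaviors, k=3):
    """Mean Euclidean distance to the k nearest behaviors in the archive."""
    if not archive_behaviors:
        return float("inf")  # anything is novel relative to an empty archive
    dists = sorted(math.dist(b, other) for other in archive_behaviors)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def mutate(weights, rng, sigma=0.5):
    """Gaussian perturbation of every weight (assumed mutation operator)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def accumulate_features(samples, n_inputs, generations=50, pop_size=20,
                        threshold=1.0, seed=0):
    """Run novelty search and return the archive of (weights, behavior) pairs."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
                  for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        scored = []
        for w in population:
            b = behavior(w, samples)
            score = novelty(b, [ab for _, ab in archive])
            if score > threshold:
                archive.append((w, b))  # sufficiently novel: keep this feature
            scored.append((score, w))
        # No objective is used: the most novel half become the parents.
        scored.sort(key=lambda s: s[0], reverse=True)
        parents = [w for _, w in scored[: pop_size // 2]]
        population = [mutate(rng.choice(parents), rng) for _ in range(pop_size)]
    return archive
```

Selection here rewards only behavioral distance from what has already been collected, so the archive can keep growing for as long as the loop runs, and the number of features does not have to be fixed in advance.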
We tried running DDFA on MNIST by generating a bunch of features (3,000 in the larger case) and then training a classifier on top of them with simple backprop (similarly to procedures with autoencoders in deep learning). With only a simple one-hidden-layer network and none of the usual tricks used in deep learning (i.e. no preprocessing, regularization, special activation functions, dropout, etc.), DDFA was able to achieve 1.25% error on MNIST. To give perspective on that, Hinton's original 4-layer deep network, which is much deeper, achieved 1.2%. One big conclusion for this group is that neuroevolution can contribute meaningfully to deep learning and has a lot to offer, and that we can indeed achieve seriously competitive results. More broadly, the results raise interesting questions for the broader field of machine learning, like whether optimization (i.e. minimizing error) is really always the best way to think about learning, and whether sometimes divergence is more powerful than convergence.
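To make the "classifier on top of the features" step concrete, here is a minimal Python sketch under simplifying assumptions. It freezes a set of tanh features and trains only a logistic output unit by gradient descent on a toy binary problem. The experiment described above used a one-hidden-layer network on MNIST, and whether the features themselves were also adjusted during backprop is not stated here, so this should be read as an illustration of the general idea only. All names (`train_readout`, `predict`, `feature_weights`) and hyperparameters are hypothetical.

```python
import math

def tanh_features(x, feature_weights):
    """Hidden layer: one tanh unit per previously collected feature."""
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x)))
            for ws in feature_weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_readout(inputs, labels, feature_weights, epochs=200, lr=0.5):
    """Train a logistic output unit on frozen features with plain SGD."""
    hidden = [tanh_features(x, feature_weights) for x in inputs]
    out_w = [0.0] * len(feature_weights)
    out_b = 0.0
    for _ in range(epochs):
        for h, y in zip(hidden, labels):
            p = sigmoid(out_b + sum(w * hi for w, hi in zip(out_w, h)))
            err = p - y  # cross-entropy gradient w.r.t. the pre-activation
            out_w = [w - lr * err * hi for w, hi in zip(out_w, h)]
            out_b -= lr * err
    return out_w, out_b

def predict(x, feature_weights, out_w, out_b):
    """Return the predicted class (0 or 1) for a single input."""
    h = tanh_features(x, feature_weights)
    z = out_b + sum(w * hi for w, hi in zip(out_w, h))
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy usage: two hand-picked features, label is 1 when the first input is positive.
feature_weights = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[-1.0, 0.5], [-0.5, -1.0], [1.0, 0.2], [0.7, -0.6]]
labels = [0, 0, 1, 1]
out_w, out_b = train_readout(inputs, labels, feature_weights)
predictions = [predict(x, feature_weights, out_w, out_b) for x in inputs]
```

In a fuller pipeline, `feature_weights` would come from the feature-accumulation stage rather than being hand-picked, and the output layer would have one unit per class.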
There are a lot of interesting future possibilities for DDFA and we are happy to hear your thoughts!