Re: CPPN IO encoding for fully connected/recurrent networks using HyperNEAT
- Hi Oliver, interesting topic. I am not aware of any published work with such fully-connected recurrent grid-like structures. Just for thinking about it, you might even ask the same for a 2D recurrent grid to simplify the issue without really changing the fundamental question.
One problem with such a structure that is not exactly the same as the one you raised is the "spaghetti" type of connectivity that you can get if anything can connect to literally anything. That can probably be reduced effectively with the HyperNEAT-LEO (Link Expression Output) that we recently introduced, which helps reduce connectivity:
However, of course, you are raising a bigger issue about how to best structure a CPPN to encode such a pattern in general. There may be a clever way to do it, i.e. something analogous to the "one output per layer" concept, but modified to be more efficient and thereby amenable to large recurrent networks. I'd leave that possibility open, since I don't have an idea for it right now.
But my guess is that what is really instrumental in this discussion is the issue of what is "easier to generate" with certain kinds of CPPNs. You point out that it seems "easier" to generate divergent layer patterns if they are represented by different CPPN outputs. But the word "easier" here is relative to a particular fitness function. What it really is saying is that the stepping stones that might lead to divergent patterns are somehow not rewarded (and therefore not preserved) when there is a single output.
However, that does not really mean that such stepping stones don't exist, or that they are even in principle "hard" to discover. It's just that the fitness function doesn't recognize them. So I think the real trick here might be with how stepping stones are rewarded rather than with manipulating the encoding, though of course there's still room for that. But if you look for example at CPPN-generated images on Picbreeder, you can see that areas of differentiation (even hard differentiation) emerge consistently. You can get one part of an image doing one thing, and the other doing something else. Yet I think the problem is that if you look at the ancestry of such images, you will find the ancestors hardly resemble their descendents (at least visually). So we're really facing a stepping stone problem, or more fundamentally, the paradox of objectives in general.
So I'd guess that single-output (or few output) CPPNs could actually produce the kind of differentiated layer structures we'd want in such networks, but that the path to them is likely not going to be followed by a typical objective function. Some kind of non-objective search that is more open-ended is more likely to accumulate the prerequisites we would need.
This perspective does not offer a solution so much as pose the question a different way. Instead of a question of how to structure the CPPN, perhaps the question is, what kind of non-objective process can lead to such structures emerging systematically?
--- In firstname.lastname@example.org, "olivercoleman04" <oliver.coleman@...> wrote:
> I've been thinking about methods for evolving recurrent networks or network modules using HyperNEAT, and have been wondering about the input and output representation used by the CPPN.
> Most papers that evolve networks with multiple layers use 2D layers (and so 2 inputs for the CPPN) and an output in the CPPN per every pair of connected layers (call it a weight layer). An alternative is to use an additional input to specify the connection layer, or z-axis coordinates; it's just another dimension orthogonal to the two dimensions of the layers. Or if you're allowing connections between arbitrary layers (rather than just the next layer), then you would use two additional inputs to specify the z-axis coordinates for source and target neurons.
> An output for each weight layer, as is done in most/all published experiments to date, seems to make sense because we generally want quite different patterns of weights between different pairs of layers (even if different connection layer patterns are geometrically correlated they are still typically quite different), and it seems to be easier to generate CPPNs that produce very different (but correlated) patterns from different outputs than it is to generate CPPNs that produce very different patterns depending on the value of an input (at least, I tried both encoding schemes and this is what I found).
> Getting back to recurrent networks; what does the above mean for a collection of neurons arranged in some N-dimensional space that are potentially fully recurrent/connected? Let's say the substrate neurons are arranged in a 3-dimensional space. We would create CPPNS with 6 inputs (2 for each dimension for source and target neuron coordinates) and one output. However, allowing recurrent connections essentially means that we have many different 2D (and 1D) layers connecting to many other layers (both parallel and orthogonal in two ways), so according to the above discussion perhaps we need an output for each possible combination of layers? This doesn't seem like a very good solution as we would quickly end up with vast numbers of outputs.
> So how could we make it easier to generate CPPNs that have an input (or some constant number of inputs) for each dimension and one output for encoding fully connected/recurrent substrate networks? If we assume the reason having multiple outputs works is because the CPPN can easily encode largely disparate (but correlated) functions for each output, the challenge for a single output is to enable "piping" or multiplexing largely disparate functions through it. Perhaps one method is to introduce a transfer function available to the CPPN that produces a high/on output only for a specific range of input value, and low/off output for everything else. Seems plausible, except we already commonly use tthe Gaussian function with a bias, which can already perform this function (especially when combined serially with a step function). Perhaps introducing a specialised transfer function (eg akin to a Gaussian and step function in series, or something more like an entire multiplexer in a node) for this task would help things along?
> Thoughts and/or experimental results, anyone? :)