
## Re: [neat] Re: IEX musings

Message 1 of 15 , Sep 1, 2004
John Arrowwood wrote:

>>From: Colin Green <cgreen@...>
>>
>>John Arrowwood wrote:
>>
>>
>>
>>> If someone else
>>> would be willing to give this experiment a try with their version of
>>> NEAT
>>> (just this 3x3 grid prediction) and let it run for a few days to see if
>>>
>>The problem seems straightforward enough. Yeah, OK, I'll have a look at
>>implementing it in SharpNEAT. What do you reckon, should I use a fixed
>>number of activations per evaluation or should I relax the network -
>>which allows for open-ended recursiveness. Maybe I'll try a few
>>variations and see what happens, eh. I'll try to kick it off tomorrow eve.
>>
>>
>
>Cool. Thanks!
>
>Yeah, it's pretty straightforward. I'll upload mux.pl to the group web
>site so you can have a reference to make sure you are using the same inputs
>and the same fitness function.
>
>
[thinking cap on] With a 3x3 grid there are 14 different possible sets
of inputs, right? So 9 individual squares + 4 (groups of 4) + 1 for the
whole 9 together. With such a small number of possibilities it's highly
likely that a solution will be found by classifying each of these 14
groups, e.g. 14 hidden nodes each representing one of the groups.
However, this classification approach doesn't scale well: with a 4x4 grid
the number of possible groupings becomes 30 (hope I got that right) and
with a 5x5 there are 55, so we can see that this will quickly become an
infeasible technique.
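
Just to check that counting: the number of square groupings in an n-by-n grid is the sum of squares 1^2 + 2^2 + ... + n^2 (one (n-k+1)^2 term for each k-by-k block size). A quick sketch, with a hypothetical helper name:

```python
def square_groupings(n):
    """Count the axis-aligned k-by-k sub-squares of an n-by-n grid."""
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))

for n in (3, 4, 5):
    print(n, square_groupings(n))  # 3 -> 14, 4 -> 30, 5 -> 55
```

So the 30 and 55 above do check out.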

My point is that this problem will probably be solved using a structure
that takes advantage of the small number of inputs and that wouldn't
work with a larger number of inputs. Anyway, I'll give it a go and try to
code it so I can vary the grid size and experiment with larger grids as
well.

Colin.
Message 2 of 15 , Sep 1, 2004
Colin Green wrote:

>[...]
>My point is that this problem will probably be solved using a structure
>that takes advantage of the small number of inputs and that wouldn't
>work with a larger number of inputs.
>
>
Here are some results using the very basic test domain John described.
This is after about 40k gens, say 30-45 mins (not keeping track of time)...

Coords
(0.0, 0.0)(0.3, 0.3) expected= 1.00, actual=1.00, diff=0.000
(0.3, 0.0)(0.7, 0.3) expected= 2.00, actual=2.00, diff=0.003
(0.7, 0.0)(1.0, 0.3) expected= 3.00, actual=2.94, diff=0.057
(0.0, 0.3)(0.3, 0.7) expected= 2.00, actual=2.00, diff=0.000
(0.3, 0.3)(0.7, 0.7) expected= 2.00, actual=1.99, diff=0.008
(0.7, 0.3)(1.0, 0.7) expected= 3.00, actual=3.00, diff=0.002
(0.0, 0.7)(0.3, 1.0) expected= 3.00, actual=2.99, diff=0.008
(0.3, 0.7)(0.7, 1.0) expected= 3.00, actual=3.00, diff=0.003
(0.7, 0.7)(1.0, 1.0) expected= 3.00, actual=3.05, diff=0.049
(0.0, 0.0)(0.7, 0.7) expected= 1.75, actual=1.77, diff=0.017
(0.3, 0.0)(1.0, 0.7) expected= 2.50, actual=2.60, diff=0.103
(0.0, 0.3)(0.7, 1.0) expected= 2.50, actual=2.48, diff=0.024
(0.3, 0.3)(1.0, 1.0) expected= 2.75, actual=2.65, diff=0.103
(0.0, 0.0)(1.0, 1.0) expected= 2.44, actual=2.46, diff=0.013

The figures are rounded, so don't worry about the 0.7; that is two thirds :)
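
For reference, the expected column can be reproduced as an area-weighted average of the fixed 3x3 grid from John's earlier message. A sketch (the helper name is hypothetical):

```python
# Fixed test grid from John's message; GRID[row][col], where cell
# (col, row) covers [col/3, (col+1)/3) x [row/3, (row+1)/3).
GRID = [[1, 2, 3],
        [2, 2, 3],
        [3, 3, 3]]

def region_average(x1, y1, x2, y2):
    """Area-weighted average intensity over the query rectangle."""
    total = area = 0.0
    for row in range(3):
        for col in range(3):
            # overlap of this cell with the query rectangle
            ox = max(0.0, min(x2, (col + 1) / 3) - max(x1, col / 3))
            oy = max(0.0, min(y2, (row + 1) / 3) - max(y1, row / 3))
            total += GRID[row][col] * ox * oy
            area += ox * oy
    return total / area

print(round(region_average(0, 0, 2/3, 2/3), 6))  # cells 1,2,2,2 -> 1.75
print(round(region_average(0, 0, 1, 1), 3))      # whole grid -> 2.444
```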

The answers are pretty close. The network in question has 20 hidden
nodes and 58 connections. Given that there are only 14 test cases, I would
say that this network has just learned to give the correct answers
rather than actually calculating the average of the inputs. So we can
address this problem by varying the test data and/or using a larger grid
size - but for now I would say your NEAT code has some problems if it's
not finding a solution. A couple of quick questions: what activation
fn are you using? And are you scaling the outputs to the expected range?

Colin
Message 3 of 15 , Sep 2, 2004
At 17:17 31/08/2004, John wrote:
>So I am currently running an experiment that just learns to handle
>
>1 2 3
>2 2 3
>3 3 3
>
>(0,0 - 1/3,1/3) = 1
>(1/3, 0 - 2/3, 1/3 ) = 2
>(2/3, 0 - 1, 1/3 ) = 3

Err, you are using neurones which can output more than 1? Or scaling?

Also, do you have bias on your nodes, I expect you do, but I thought I'd
check because this definitely needs it.

If you are interested (and it may be getting a bit off base). Does
inputting the coords as row and column selectors help?

E.g. one input for column one, one for column two, one for column three,
only one ever turned on at a time...

ditto on the rows

Then I guess you need to double that up to give you bottom-left and top-right.
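
For what it's worth, that selector encoding is easy to sketch (names are illustrative, not from anyone's code):

```python
def one_hot(index, n):
    """Selector vector: all zeros except the chosen position."""
    v = [0.0] * n
    v[index] = 1.0
    return v

def encode_region(c1, r1, c2, r2, n=3):
    """Bottom-left and top-right corners as four n-way selectors,
    so a 3x3 grid region becomes 12 inputs with exactly 4 active."""
    return one_hot(c1, n) + one_hot(r1, n) + one_hot(c2, n) + one_hot(r2, n)

print(encode_region(0, 0, 2, 2))  # whole-grid query, 12 inputs
```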

Actually that's another thought. How about trying to index a single cell
with one coordinate as a first step?

Ian

Living@Home - Open Source Evolving Organisms -
http://livingathome.sourceforge.net/
Message 4 of 15 , Sep 2, 2004
>
>At 17:17 31/08/2004, John wrote:
> >So I am currently running an experiment that just learns to handle
> >
> >1 2 3
> >2 2 3
> >3 3 3
> >
> >(0,0 - 1/3,1/3) = 1
> >(1/3, 0 - 2/3, 1/3 ) = 2
> >(2/3, 0 - 1, 1/3 ) = 3
>
>Err, you are using neurones which can output more than 1? Or scaling?

Good catch. The output neuron is a summation node, not a sigmoid.

>Also, do you have bias on your nodes, I expect you do, but I thought I'd
>check because this definitely needs it.

Yep!

>If you are interested (and it may be getting a bit off base). Does
>inputting the coords as row and column selectors help?

It might, but it won't help with where I want to get to. For enlargement, I
want to be able to select an area that is smaller than one of the input
pixels, and that is not centered in one of those pixels.

+----+----+----+
| .| | |
+----+----+----+
| | | |
+----+----+----+
| | | |
+----+----+----+

For example, in the above illustration, the area occupied by the little
black dot is much smaller than the whole square it sits in, and it is
off-center of that square. Specifying the inputs as columns or rows being
either active or inactive wouldn't allow me that flexibility. For
enlargement, I *need* to be able to specify any area of any size at any
location, both larger and smaller than the original pixels, and have it
return the average intensity of the 'function' of which the inputs are a
sampling. The job of the network is to interpolate the function and then
allow me to query that graph as a continuous function.

There is a way of doing that already. They do it with something called
'splines'. And it works, it's just very computationally intensive. And
it's not quite as good as I have seen a neural net be. But the neural net
also could only do a fixed enlargement, not a smooth and unlimited scaling,
like splines. I'm trying to get the best of both worlds...if it can be
done.

-- John
Message 5 of 15 , Sep 2, 2004
>From: Colin Green <cgreen@...>
>
>Colin Green wrote:
>
>Here are some results using the very basic test domain John described.
>This is after about 40k gens, say 30-45 mins (not keeping track of time)...
>
>Coords
>(0.0, 0.0)(0.3, 0.3) expected= 1.00, actual=1.00, diff=0.000
>(0.3, 0.0)(0.7, 0.3) expected= 2.00, actual=2.00, diff=0.003
>(0.7, 0.0)(1.0, 0.3) expected= 3.00, actual=2.94, diff=0.057
>(0.0, 0.3)(0.3, 0.7) expected= 2.00, actual=2.00, diff=0.000
>(0.3, 0.3)(0.7, 0.7) expected= 2.00, actual=1.99, diff=0.008
>(0.7, 0.3)(1.0, 0.7) expected= 3.00, actual=3.00, diff=0.002
>(0.0, 0.7)(0.3, 1.0) expected= 3.00, actual=2.99, diff=0.008
>(0.3, 0.7)(0.7, 1.0) expected= 3.00, actual=3.00, diff=0.003
>(0.7, 0.7)(1.0, 1.0) expected= 3.00, actual=3.05, diff=0.049
>(0.0, 0.0)(0.7, 0.7) expected= 1.75, actual=1.77, diff=0.017
>(0.3, 0.0)(1.0, 0.7) expected= 2.50, actual=2.60, diff=0.103
>(0.0, 0.3)(0.7, 1.0) expected= 2.50, actual=2.48, diff=0.024
>(0.3, 0.3)(1.0, 1.0) expected= 2.75, actual=2.65, diff=0.103
>(0.0, 0.0)(1.0, 1.0) expected= 2.44, actual=2.46, diff=0.013
>
>
>The figures are rounded so don't worry about the 0.7, that is two thirds :)
>
>The answers are pretty close, the network in question has 20 hidden
>nodes and 58 connections. Given that there are only 14 test cases I would
>say that this network has just learned to give the correct answers
>rather than actually calculating the average of the inputs.

Agreed. So the next step is to evaluate it multiple times per generation
with different inputs each time. Assign the 9 pixel inputs a random number
from -1 to 1. Then calculate the expected outputs based on those random
inputs. Calculate each network against the same 100 different random inputs
per generation, but each generation select a different set of test
conditions.
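
A sketch of that evaluation scheme (the whole-grid query is used as the example here; a full harness would also draw random sub-regions):

```python
import random

def generation_cases(n_cases=100, seed=None):
    """One fresh test set per generation: random 3x3 pixel grids in
    [-1, 1] plus the expected average for the whole-grid query.
    Every network in a generation is scored on the same cases."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        pixels = [rng.uniform(-1.0, 1.0) for _ in range(9)]
        expected = sum(pixels) / 9.0
        cases.append((pixels, expected))
    return cases

cases = generation_cases(100, seed=2004)
print(len(cases))  # 100
```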

>but for now I would say your NEAT code has some problems if it's
>not finding a solution.

It did have a problem, actually, which I fixed. But remember, my code is
(by design) slower. It tries to be very thorough about the search. That
means it will take longer, in terms of generations, before it finds a
solution. But it is unlikely to fail to find a solution if one exists.
However, I'm still contemplating how I can speed it up without unnecessarily
increasing the risk of not finding a solution. It's easier to find ways of
making it MORE thorough (=slower) than it is to find shortcuts and
equivalences that let us reduce the computational complexity of the search.

>A couple of quick questions - what activation
>fn are you using?

For the hidden nodes, sigmoid. For the output node, a simple summation.
That way the output can be greater than 1.

>and are you scaling the outputs to the expected range?

No. The use of a summation node for the output should take care of that for
me.
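
A minimal sketch of the two node types being described (plain Python, illustrative names):

```python
import math

def sigmoid(x):
    """Hidden-node activation: squashes to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def summation(x):
    """Output-node activation: the identity, so values like 2.75
    come out directly without any rescaling."""
    return x

print(sigmoid(5.0))    # close to 1
print(summation(5.0))  # exactly 5.0
```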
Message 6 of 15 , Sep 2, 2004
--- In neat@yahoogroups.com, "John Arrowwood" <jarrowwx@h...> wrote:
> >
> >At 17:17 31/08/2004, John wrote:
> > >So I am currently running an experiment that just learns to handle
> > >
> > >1 2 3
> > >2 2 3
> > >3 3 3
> > >
> > >(0,0 - 1/3,1/3) = 1
> > >(1/3, 0 - 2/3, 1/3 ) = 2
> > >(2/3, 0 - 1, 1/3 ) = 3
> >
> >Err, you are using neurones which can output more than 1? Or scaling?
>
> Good catch. The output neuron is a summation node, not a sigmoid.
>

Is there any particular reason you don't use a sigmoid at the
outputs? After all, output can always be scaled to any range. But
if you don't use a sigmoid, then your output has an arbitrary range,
and it might be difficult for the bias to be effective in such a
situation, though then again it might not matter.

Also, are you capping your link weights at a max and min value?

ken
Message 7 of 15 , Sep 2, 2004
>From: "Kenneth Stanley" <kstanley@...>
> > >Err, you are using neurones which can output more than 1? Or
>scaling?
> >
> > Good catch. The output neuron is a summation node, not a sigmoid.
> >
>
>Is there any particular reason you don't use a sigmoid at the
>outputs. After all, output can always be scaled to any range.

The raw inputs are scaled before entering them into the network. The output
is then scaled back by reversing the scaling using the same parameters.
However, since the inputs represent an 'average' of the light intensity
within a given area, it is possible that areas within that region may be
brighter or dimmer than the average, and indeed may be brighter or dimmer
than the dynamic range of inputs allows for. Thus, in order for the output
to be able to handle this condition, the output must be at a slightly
different scale than the inputs.

There are, I suppose, two different ways of handling this. You could use
a summation node for the output, which is what I have done and what
the HP team did with their experiments. Or you could use a different (2x)
range for the output scaling. Then, during training, it can learn to limit
its output to within the appropriate range, but still retain the ability to
go outside that range when necessary.

I would suggest that for the output node, either one should work equally
well. If the network learns to keep its outputs in the right range for the
sigmoid function (which is essentially linear within that range), it can
learn to keep it in the right range for the summation function (which is
entirely linear). And since the summation function is less expensive, there
is no reason not to use it. I would not make the same argument for the
hidden nodes, of course.
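
A sketch of the second option, the doubled output range (the raw intensity range here is an assumption, not from the experiment):

```python
IN_MIN, IN_MAX = 0.0, 255.0  # assumed raw intensity range

def scale_input(v):
    """Map a raw intensity into [0, 1] for the network."""
    return (v - IN_MIN) / (IN_MAX - IN_MIN)

def unscale_output_2x(y):
    """Inverse mapping with a doubled (2x) range centred on the input
    range, so the output can go brighter or dimmer than any input."""
    span = IN_MAX - IN_MIN
    return IN_MIN - span / 2 + y * (2 * span)

print(unscale_output_2x(0.25))  # -> 0.0, the bottom of the input range
print(unscale_output_2x(0.75))  # -> 255.0, the top of the input range
```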

>But
>if you don't use a sigmoid, then your output has an arbitrary range,
>and it might be difficult for the bias to be effective in such a
>situation, though then it again it might not matter.

I don't know that the bias really applies that much for the output. Though
it is present.

>Also, are you capping your link weights at a max and min value?

Yes. +/- 10. And sometimes it hits the cap. Other times, it stays well
below.
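
For reference, the cap is just a clamp (a sketch, not John's actual code):

```python
def cap_weight(w, limit=10.0):
    """Clamp a connection weight to [-limit, +limit]."""
    return max(-limit, min(limit, w))

print(cap_weight(42.0))  # 10.0, hits the cap
print(cap_weight(-3.5))  # -3.5, well below it
```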

-- John
Message 8 of 15 , Sep 9, 2004
You said this before but I was too busy to pick up on it.

>There is a way of doing that already. They do it with something called
>'splines'.

Well, they interpolate using a straight line or a "cubic" curve which are
the simplest things you can call splines.

> And it works, it's just very computationally intensive.

They aren't as CPU intensive as evaluating a neural network of any size...

"linear" involves calculating:

a*f + b*(1-f) three times per pixel, and cubic is something like:

a*(1-f)^3 + 3*b*f*(1-f)^2 + 3*c*f^2*(1-f) + d*f^3
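
Those two blends, written out (the cubic here is the standard Bernstein/Bezier form; treat it as one representative cubic, not necessarily the exact spline any given resizer uses):

```python
def lerp(a, b, f):
    """Linear blend of two samples; bilinear resizing does this
    three times per output pixel (two rows, then between them)."""
    return a * (1.0 - f) + b * f

def cubic(a, b, c, d, f):
    """Cubic Bernstein blend of four control values."""
    g = 1.0 - f
    return a * g**3 + 3.0 * b * f * g**2 + 3.0 * c * f**2 * g + d * f**3

print(lerp(1.0, 3.0, 0.5))             # -> 2.0
print(cubic(0.0, 0.0, 1.0, 1.0, 0.5))  # -> 0.5
```

Either way, it is a handful of multiplies per pixel, which is the point about cost.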

> And
>it's not quite as good as I have seen a neural net be. But the neural net
>also could only do a fixed enlargement, not a smooth and unlimited scaling,
>like splines. I'm trying to get the best of both worlds...if it can be
>done.

Right, this is where you may not be comparing apples with apples. The
"spline" techniques are merely interpolation. They are not trying to
introduce any missing information into the enlarged image. They're just
trying to interpolate values between the original pixels in clever
ways. If that's what you want to do, then you need to frame your
experiment a little differently (by training for the ability to reproduce
some sort of high-end interpolation).

However, from our previous discussions, you want networks which will
"estimate" the missing values, not merely by maths from the surrounding
pixels, but by building in some sort of assumptions about the picture
domain (yes?).

The more I think about that, the more I wonder if this whole "multiplexing"
angle is a serious impediment. E.g. with your experiment phrased the way it
is, the net is going to need a way to focus its attention on the area
indicated by the coordinates. Even putting aside the idea of that being a
region (TL - BR) and thinking about a point, the network really has its
work cut out. E.g. for any point (x,y) it is going to have to consider
some input values around the point (say 5 or 6 of them) and for each of
those values it will need a complete copy of the multiplexer (maybe 100 -
200 neurones). So that's around 1000 neurones, _before_ building in any
assumptions about the image content...

I'm not saying such a system cannot be evolved, I'm just wondering whether,
even with the clever staged fitness scheme you have, the entry probability
barrier is too steep.

I know it's not to your liking, but how about abandoning coordinates as a
form of input data and trying to evolve a network which generates a fixed
interpolation when given a small region around it?

e.g. you could have a net which, given:

X X X X
X X.X X
X X X X

("X" is any existing pixel)

Interpolates the value for the dotted location (e.g. half way between two
existing pixels).

Repeated applications of that serve to double the X dimension (stage I).

X.X.X.X
X.X.X.X
X.X.X.X

Rotate it and you can double the Y dimension (stage J):

X X X X
. . . .
X X X X
. . . .
X X X X
. . . .
X X X X

Combine both of those for:

X.X.X.X
. . . .
X.X.X.X
. . . .
X.X.X.X
. . . .
X.X.X.X

And now you just have 25% of the pixels missing. Apply it again (stage
"K") ...

And you can repeat the whole thing for more enlargement.
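
The staged doubling is easy to sketch with a placeholder interpolator standing in for the evolved network (plain averaging here, purely for illustration):

```python
def interp(a, b):
    """Placeholder for the evolved net: midpoint of two neighbours."""
    return (a + b) / 2.0

def double_x(img):
    """Stage I: insert an interpolated column between neighbours."""
    out = []
    for row in img:
        new_row = [row[0]]
        for a, b in zip(row, row[1:]):
            new_row += [interp(a, b), b]
        out.append(new_row)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def double_y(img):
    """Stage J: rotate, reuse stage I, rotate back."""
    return transpose(double_x(transpose(img)))

img = [[1.0, 3.0],
       [3.0, 5.0]]
print(double_y(double_x(img)))  # 2x2 -> 3x3
```

With this naive placeholder, the centre points (stage K) happen to get filled by the second pass, because it averages columns that already contain interpolated values; in the scheme above, stage K would be its own network.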

Now, you are building up errors here (because you later use outputs
as inputs to the next application). But once you have a single network
with some degree of functionality, you can clone it into a separate
population and start specialising that population to a part of the
process. E.g. stage K is clearly a different process to I and J, so a first
step might be to break that out into a different network. After that,
well, are I and J really the same, or is a rotated image losing something
which could be gained from the fact that the light is generally from above...

Ian

(From a signal processing point of view, spline interpolation is naive and
introduces new frequencies which were never in the original image
(frequencies across the image, that is). There are those who advocate
changing image resolution by transforming the image into the frequency
domain (FFT), and resampling at the new resolution. Now that's CPU intensive.)
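
A 1D sketch of that frequency-domain approach, upsampling by zero-padding the spectrum (the Nyquist-bin bookkeeping is glossed over, so treat this as illustrative):

```python
import numpy as np

def fft_upsample(x, factor):
    """Band-limited upsampling: transform, zero-pad the spectrum
    above the original Nyquist frequency, transform back."""
    n = len(x)
    m = n * factor
    spectrum = np.fft.rfft(x)
    padded = np.zeros(m // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    return np.fft.irfft(padded, m) * factor  # rescale for the new length

x = np.cos(2 * np.pi * np.arange(8) / 8.0)  # one cycle, 8 samples
y = fft_upsample(x, 2)                      # 16 samples of the same cosine
```

For a 2D image the same padding is applied along both axes, which is where the CPU cost comes from.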

Living@Home - Open Source Evolving Organisms -
http://livingathome.sourceforge.net/