Re: [neat] Re: IEX musings

  • Colin Green
    Message 1 of 15 , Sep 1, 2004
      John Arrowwood wrote:

      >>From: Colin Green <cgreen@...>
      >>
      >>John Arrowwood wrote:
      >>
      >>
      >>
      >>> If someone else
      >>> would be willing to give this experiment a try with their version of
      >>> NEAT
      >>> (just this 3x3 grid prediction) and let it run for a few days to see if
      >>>
      >>The problem seems straightforward enough. Yeah, OK, I'll have a look at
      >>implementing it in SharpNEAT. What do you reckon, should I use a fixed
      >>number of activations per evaluation or should I relax the network -
      >>which allows for open-ended recursiveness? Maybe I'll try a few
      >>variations and see what happens, eh. I'll try and kick it off tomorrow eve.
      >>
      >>
      >
      >Cool. Thanks!
      >
      >Yeah, it's pretty straightforward. I'll upload mux.pl to the group web
      >site so you can have a reference to make sure you are using the same inputs
      >and the same fitness function.
      >
      >
      [thinking cap on] With a 3x3 grid there are 14 different possible sets
      of inputs, right? So 9 individual squares + 4 (groups of 4) + 1 for the
      whole 9 together. With such a small number of possibilities it's highly
      likely that a solution will be found by classifying each of these 14
      groups, e.g. 14 hidden nodes each representing one of the groups.
      However, this classification approach doesn't scale well: with a 4x4 grid
      the number of possible groupings becomes 30 (hope I got that right) and
      with a 5x5 there are 55, so we can see that this will quickly become an
      infeasible technique.
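      Colin's counts can be double-checked: the groups he lists are the
      axis-aligned square sub-grids of the input, and an n-by-n grid contains
      (n-k+1)^2 squares of side k. A quick sketch (function name is mine):

```python
def n_square_groups(n):
    """Count the axis-aligned square sub-grids of an n-by-n grid:
    there are (n - k + 1)**2 squares of side k, for k = 1..n."""
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))
```

      This gives 14 for 3x3, 30 for 4x4, and 55 for 5x5, so the counts above
      are right.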

      My point is that this problem will probably be solved using a structure
      that takes advantage of the small number of inputs and that wouldn't
      work with a larger number of inputs. Anyway, I'll give it a go and try to
      code it so I can vary the grid size and experiment with larger grids as
      well.

      Colin.
    • Colin Green
      Message 2 of 15 , Sep 1, 2004
        Colin Green wrote:

        >[...]
        >My point is that this problem will probably be solved using a structure
        >that takes advantage of the small number of inputs and that wouldn't
        >work with a larger number of inputs.
        >
        >
        Here are some results using the very basic test domain John described.
        This is after about 40k gens, say 30-45 mins (not keeping track of time)...

        Coords
        (0.0, 0.0)(0.3, 0.3) expected= 1.00, actual=1.00, diff=0.000
        (0.3, 0.0)(0.7, 0.3) expected= 2.00, actual=2.00, diff=0.003
        (0.7, 0.0)(1.0, 0.3) expected= 3.00, actual=2.94, diff=0.057
        (0.0, 0.3)(0.3, 0.7) expected= 2.00, actual=2.00, diff=0.000
        (0.3, 0.3)(0.7, 0.7) expected= 2.00, actual=1.99, diff=0.008
        (0.7, 0.3)(1.0, 0.7) expected= 3.00, actual=3.00, diff=0.002
        (0.0, 0.7)(0.3, 1.0) expected= 3.00, actual=2.99, diff=0.008
        (0.3, 0.7)(0.7, 1.0) expected= 3.00, actual=3.00, diff=0.003
        (0.7, 0.7)(1.0, 1.0) expected= 3.00, actual=3.05, diff=0.049
        (0.0, 0.0)(0.7, 0.7) expected= 1.75, actual=1.77, diff=0.017
        (0.3, 0.0)(1.0, 0.7) expected= 2.50, actual=2.60, diff=0.103
        (0.0, 0.3)(0.7, 1.0) expected= 2.50, actual=2.48, diff=0.024
        (0.3, 0.3)(1.0, 1.0) expected= 2.75, actual=2.65, diff=0.103
        (0.0, 0.0)(1.0, 1.0) expected= 2.44, actual=2.46, diff=0.013


        The figures are rounded so don't worry about the 0.7, that is two thirds :)
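        The expected values in the table are plain cell averages of John's
        fixed grid (1 2 3 / 2 2 3 / 3 3 3). A sketch of how they can be
        reproduced, assuming region corners sit on cell boundaries (names are
        mine):

```python
GRID = [[1, 2, 3],
        [2, 2, 3],
        [3, 3, 3]]  # GRID[row][col]; rows indexed by y, columns by x

def expected_average(x1, y1, x2, y2, n=3):
    """Mean intensity over an axis-aligned region with coords in [0, 1],
    where every corner lies on a cell boundary (a multiple of 1/n)."""
    c1, c2 = round(x1 * n), round(x2 * n)  # column range covered
    r1, r2 = round(y1 * n), round(y2 * n)  # row range covered
    cells = [GRID[r][c] for r in range(r1, r2) for c in range(c1, c2)]
    return sum(cells) / len(cells)
```

        For example the (0.0, 0.0)(0.7, 0.7) row is really (0, 0)(2/3, 2/3),
        and averaging the four covered cells (1, 2, 2, 2) gives the expected
        1.75.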

        The answers are pretty close. The network in question has 20 hidden
        nodes and 58 connections. Given that there are only 14 test cases I would
        say that this network has just learned to give the correct answers
        rather than actually calculating the average of the inputs. So we can
        address this problem by varying the test data and/or using a larger grid
        size - but for now I would say your NEAT code has some problems if it's
        not finding a solution. A couple of quick questions: what activation
        fn are you using? And are you scaling the outputs to the expected range?

        Colin
      • Ian Badcoe
        Message 3 of 15 , Sep 2, 2004
          At 17:17 31/08/2004, John wrote:
          >So I am currently running an experiment that just learns to handle
          >
          >1 2 3
          >2 2 3
          >3 3 3
          >
          >(0,0 - 1/3,1/3) = 1
          >(1/3, 0 - 2/3, 1/3 ) = 2
          >(2/3, 0 - 1, 1/3 ) = 3

          Err, you are using neurones which can output more than 1? Or scaling?

          Also, do you have bias on your nodes, I expect you do, but I thought I'd
          check because this definitely needs it.

          If you are interested (and it may be getting a bit off base): does
          inputting the coords as row and column selectors help?

          E.g. one input for column one, one for column two, one for column three,
          only one ever turned on at a time...

          ditto on the rows

          Then I guess you need to double that up to give you bottom-left and top-right.
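          The selector encoding could be sketched like this (a minimal
          illustration; the helper name is mine):

```python
def selector_inputs(col, row, n=3):
    """One-hot column selectors followed by one-hot row selectors:
    exactly one column input and one row input are on at a time."""
    return ([1.0 if i == col else 0.0 for i in range(n)]
            + [1.0 if i == row else 0.0 for i in range(n)])
```

          Doubling it up for a region would concatenate two such vectors, one
          for the bottom-left cell and one for the top-right.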

          Actually that's another thought. How about trying to index a single cell
          with one coordinate as a first step?

          Ian


          Living@Home - Open Source Evolving Organisms -
          http://livingathome.sourceforge.net/
        • John Arrowwood
          Message 4 of 15 , Sep 2, 2004
            >From: Ian Badcoe <ian_badcoe@...>
            >
            >At 17:17 31/08/2004, John wrote:
            > >So I am currently running an experiment that just learns to handle
            > >
            > >1 2 3
            > >2 2 3
            > >3 3 3
            > >
            > >(0,0 - 1/3,1/3) = 1
            > >(1/3, 0 - 2/3, 1/3 ) = 2
            > >(2/3, 0 - 1, 1/3 ) = 3
            >
            >Err, you are using neurones which can output more than 1? Or scaling?

            Good catch. The output neuron is a summation node, not a sigmoid.

            >Also, do you have bias on your nodes, I expect you do, but I thought I'd
            >check because this definitely needs it.

            Yep!

            >If you are interested (and it may be getting a bit off base). Does
            >inputting the coords as row and column selectors help?

            It might, but it won't help with where I want to get to. For enlargement, I
            want to be able to select an area that is smaller than one of the input
            pixels, and that is not centered in one of those pixels.

            +----+----+----+
            | .  |    |    |
            +----+----+----+
            |    |    |    |
            +----+----+----+
            |    |    |    |
            +----+----+----+

            For example, in the above illustration, the area occupied by the little
            black dot is much smaller than the whole square it sits in, and it is
            off-center of that square. Specifying the inputs as columns or rows being
            either active or inactive wouldn't allow me that flexibility. For
            enlargement, I *need* to be able to specify any area of any size at any
            location, both larger and smaller than the original pixels, and have it
            return the average intensity of the 'function' of which the inputs are a
            sampling. The job of the network is to interpolate the function and then
            allow me to query that graph as a continuous function.

            There is a way of doing that already. They do it with something called
            'splines'. And it works, it's just very computationally intensive. And
            it's not quite as good as I have seen a neural net be. But the neural net
            also could only do a fixed enlargement, not a smooth and unlimited scaling,
            like splines. I'm trying to get the best of both worlds...if it can be
            done.

            -- John
          • John Arrowwood
            Message 5 of 15 , Sep 2, 2004
              >From: Colin Green <cgreen@...>
              >
              >Colin Green wrote:
              >
              >Here are some results using the very basic test domain John described.
              >This is after about 40k gens, say 30-45 mins (not keeping track of time)...
              >
              >Coords
              >(0.0, 0.0)(0.3, 0.3) expected= 1.00, actual=1.00, diff=0.000
              >(0.3, 0.0)(0.7, 0.3) expected= 2.00, actual=2.00, diff=0.003
              >(0.7, 0.0)(1.0, 0.3) expected= 3.00, actual=2.94, diff=0.057
              >(0.0, 0.3)(0.3, 0.7) expected= 2.00, actual=2.00, diff=0.000
              >(0.3, 0.3)(0.7, 0.7) expected= 2.00, actual=1.99, diff=0.008
              >(0.7, 0.3)(1.0, 0.7) expected= 3.00, actual=3.00, diff=0.002
              >(0.0, 0.7)(0.3, 1.0) expected= 3.00, actual=2.99, diff=0.008
              >(0.3, 0.7)(0.7, 1.0) expected= 3.00, actual=3.00, diff=0.003
              >(0.7, 0.7)(1.0, 1.0) expected= 3.00, actual=3.05, diff=0.049
              >(0.0, 0.0)(0.7, 0.7) expected= 1.75, actual=1.77, diff=0.017
              >(0.3, 0.0)(1.0, 0.7) expected= 2.50, actual=2.60, diff=0.103
              >(0.0, 0.3)(0.7, 1.0) expected= 2.50, actual=2.48, diff=0.024
              >(0.3, 0.3)(1.0, 1.0) expected= 2.75, actual=2.65, diff=0.103
              >(0.0, 0.0)(1.0, 1.0) expected= 2.44, actual=2.46, diff=0.013
              >
              >
              >The figures are rounded so don't worry about the 0.7, that is two thirds :)
              >
              >The answers are pretty close, the network in question has 20 hidden
              >nodes and 58 connections. Given that there are only 14 test cases I would
              >say that this network has just learned to give the correct answers
              >rather than actually calculating the average of the inputs.

              Agreed. So the next step is to evaluate it multiple times per generation
              with different inputs each time. Assign the 9 pixel inputs a random number
              from -1 to 1. Then calculate the expected outputs based on those random
              inputs. Calculate each network against the same 100 different random inputs
              per generation, but each generation select a different set of test
              conditions.
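              That scheme might look something like this (a sketch; names are
              mine, and the expected output here is just the whole-grid
              average, whereas the full experiment would also draw random
              query regions):

```python
import random

def generation_test_cases(n_cases=100, n_pixels=9, seed=None):
    """Build one generation's shared test set: every network is scored
    against the same random cases, and a fresh set is drawn next
    generation."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        pixels = [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]
        cases.append((pixels, sum(pixels) / n_pixels))
    return cases
```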

              >but for now I would say your NEAT code has some problems if it's
              >not finding a solution.

              It did have a problem, actually, which I fixed. But remember, my code is
              (by design) slower. It tries to be very thorough about the search. That
              means it will take longer, in terms of generations, before it finds a
              solution. But it is unlikely to fail to find a solution if one exists.
              However, I'm still contemplating how I can speed it up without unnecessarily
              increasing the risk of not finding a solution. It's easier to find ways of
              making it MORE thorough (=slower) than it is to find shortcuts and
              equivalences that let us reduce the computational complexity of the search.

              >A couple of quick questions - what activation
              >fn are you using?

              For the hidden nodes, sigmoid. For the output node, a simple summation.
              That way the output can be greater than 1.

              >and are you scaling the outputs to the expected range?

              No. The use of a summation node for the output should take care of that for
              me.
            • Kenneth Stanley
              Message 6 of 15 , Sep 2, 2004
                --- In neat@yahoogroups.com, "John Arrowwood" <jarrowwx@h...> wrote:
                > >From: Ian Badcoe <ian_badcoe@y...>
                > >
                > >At 17:17 31/08/2004, John wrote:
                > > >So I am currently running an experiment that just learns to handle
                > > >
                > > >1 2 3
                > > >2 2 3
                > > >3 3 3
                > > >
                > > >(0,0 - 1/3,1/3) = 1
                > > >(1/3, 0 - 2/3, 1/3 ) = 2
                > > >(2/3, 0 - 1, 1/3 ) = 3
                > >
                > >Err, you are using neurones which can output more than 1? Or scaling?
                >
                > Good catch. The output neuron is a summation node, not a sigmoid.
                >

                Is there any particular reason you don't use a sigmoid at the
                outputs? After all, output can always be scaled to any range. But
                if you don't use a sigmoid, then your output has an arbitrary range,
                and it might be difficult for the bias to be effective in such a
                situation, though then again it might not matter.

                Also, are you capping your link weights at a max and min value?

                ken
              • John Arrowwood
                Message 7 of 15 , Sep 2, 2004
                  >From: "Kenneth Stanley" <kstanley@...>
                  > > >Err, you are using neurones which can output more than 1? Or scaling?
                  > >
                  > > Good catch. The output neuron is a summation node, not a sigmoid.
                  > >
                  >
                  >Is there any particular reason you don't use a sigmoid at the
                  >outputs. After all, output can always be scaled to any range.

                  The raw inputs are scaled before entering them into the network. The output
                  is then scaled back by reversing the scaling using the same parameters.
                  However, since the inputs represent an 'average' of the light intensity
                  within a given area, it is possible that areas within that region may be
                  brighter or dimmer than the average, and indeed may be brighter or dimmer
                  than the dynamic range of inputs allows for. Thus, in order for the output
                  to be able to handle this condition, the output must be at a slightly
                  different scale than the inputs.

                  There are, I suppose, two different ways of handling this. You could use
                  a summation node for the output, which is what I have done, and what
                  the HP team did with their experiments. Or you can use a different (2x)
                  range for the output scaling. Then, during training it can learn to limit
                  its output to within the appropriate range, but still retain the ability to
                  go outside that range when necessary.

                  I would suggest that for the output node, either one should work equally
                  well. If the network learns to keep its outputs in the right range for the
                  sigmoid function (which is essentially linear within that range), it can
                  learn to keep it in the right range for the summation function (which is
                  entirely linear). And since the summation function is less expensive, there
                  is no reason not to use it. I would not make the same argument for the
                  hidden nodes, of course.

                  >But
                  >if you don't use a sigmoid, then your output has an arbitrary range,
                  >and it might be difficult for the bias to be effective in such a
                  >situation, though then again it might not matter.

                  I don't know that the bias really applies that much for the output. Though
                  it is present.

                  >Also, are you capping your link weights at a max and min value?

                  Yes. +/- 10. And sometimes it hits the cap. Other times, it stays well
                  below.

                  -- John
                • Ian Badcoe
                  Message 8 of 15 , Sep 9, 2004
                    You said this before but I was too busy to pick up on it.

                    >There is a way of doing that already. They do it with something called
                    >'splines'.

                    Well, they interpolate using a straight line or a "cubic" curve which are
                    the simplest things you can call splines.

                    > And it works, it's just very computationally intensive.

                    They aren't as CPU intensive as evaluating a neural network of any size...

                    "linear" involves calculating:

                    a*f + b*(1-f)

                    three times per pixel, and cubic is something like:

                    a*f^3 + b*f*(1-f)^2 + c*f^2*(1-f) + d*(1-f)^3
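                    In runnable form (per axis and per channel; f is the
                    fractional position, with f = 1 selecting a):

```python
def linear(a, b, f):
    """Ian's linear blend: f weights a, (1 - f) weights b."""
    return a * f + b * (1.0 - f)

def cubic(a, b, c, d, f):
    """The cubic form quoted above; the b and c coefficients absorb
    the usual binomial factors of a Bernstein/Bezier basis."""
    g = 1.0 - f
    return a * f ** 3 + b * f * g ** 2 + c * f ** 2 * g + d * g ** 3
```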

                    > And
                    >it's not quite as good as I have seen a neural net be. But the neural net
                    >also could only do a fixed enlargement, not a smooth and unlimited scaling,
                    >like splines. I'm trying to get the best of both worlds...if it can be
                    >done.

                    Right, this is where you may not be comparing apples with apples. The
                    "spline" techniques are merely interpolation. They are not trying to
                    introduce any missing information into the enlarged image. They're just
                    trying to interpolate values between the original pixels in clever
                    ways. If that's what you want to do, then you need to frame your
                    experiment a little differently (by training for the ability to reproduce
                    some sort of high-end interpolation).

                    However, from our previous discussions, you want networks which will
                    "estimate" the missing values, not merely by maths from the surrounding
                    pixels, but by building in some sort of assumptions about the picture
                    domain (yes?).

                    The more I think about that, the more I wonder if this whole "multiplexing"
                    angle is a serious impediment. E.g. with your experiment phrased the way it
                    is, the net is going to need a way to focus its attention in the area
                    indicated by the coordinates. Even putting aside the idea of that being a
                    region (TL - BR) and thinking about a point, the network really has its
                    work cut out. e.g. for any point (x,y) it is going to have to consider
                    some input values around the point (say 5 or 6 of them) and for each of
                    those values it will need a complete copy of the multiplexer (maybe 100 -
                    200 neurones). So that's around 1000 neurones, _before_ building in any
                    assumptions about the image content...

                    I'm not saying such a system cannot be evolved, I'm just wondering whether,
                    even with the clever staged fitness scheme you have, the entry probability
                    barrier is too steep.

                    I know it's not to your liking, but how about abandoning coordinates as a
                    form of input data and trying to evolve a network which generates a fixed
                    interpolation when given a small region around it?

                    e.g. you could have a net which, given:

                    X X X X
                    X X.X X
                    X X X X

                    ("X" is any existing pixel)

                    Interpolates the value for the dotted location (e.g. half way between two
                    existing pixels).

                    Repeated applications of that serve to double the X dimension (stage I).

                    X.X.X.X
                    X.X.X.X
                    X.X.X.X

                    Rotate it and you can double the Y dimension (stage J):

                    X X X X
                    . . . .
                    X X X X
                    . . . .
                    X X X X
                    . . . .
                    X X X X

                    Combine both of those for:

                    X.X.X.X
                    . . . .
                    X.X.X.X
                    . . . .
                    X.X.X.X
                    . . . .
                    X.X.X.X

                    And now you just have 25% of the pixels missing. Apply it again (stage
                    "K") ...

                    And you can repeat the whole thing for more enlargement.
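                    Stages I and J could be sketched like this (interp stands
                    in for whatever the evolved network computes; a plain
                    average is used in the example):

```python
def double_x(img, interp):
    """Stage I: insert an interpolated pixel between each horizontal
    pair, taking a row of n pixels to 2n - 1."""
    out = []
    for row in img:
        new_row = [row[0]]
        for a, b in zip(row, row[1:]):
            new_row += [interp(a, b), b]
        out.append(new_row)
    return out

def double_y(img, interp):
    """Stage J: the same operation rotated -- transpose, reuse stage I,
    transpose back."""
    t = [list(col) for col in zip(*img)]
    return [list(col) for col in zip(*double_x(t, interp))]
```

                    Note that feeding stage I's output straight into stage J
                    fills the diagonal centres by naive composition; in Ian's
                    scheme above, those centres are instead left for a
                    specialised stage K.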

                    Now, you are building up errors here (because you later use outputs
                    as inputs to the next application). But once you have a single network
                    with some degree of functionality, you can clone it into a separate
                    population and start specialising that population to a part of the
                    process. E.g. stage K is clearly a different process to I and J, so a
                    first step might be to break that out into a different network. After
                    that, well, are I and J really the same, or is a rotated image losing
                    something which could be gained from the fact that the light is
                    generally from above...

                    Ian

                    (From a signal processing point of view, spline interpolation is naive and
                    introduces new frequencies which were never in the original image
                    (frequencies across the image, that is). There are those who advocate
                    changing image resolution by transforming the image into the frequency
                    domain (FFT), and resampling at the new resolution. Now that's CPU intensive.)



                    Living@Home - Open Source Evolving Organisms -
                    http://livingathome.sourceforge.net/