
Re: [neat] Re: High rez input (i.e. Video) generalization

  • Derek James
    Message 1 of 9, Dec 1, 2004
      > --- In neat@yahoogroups.com, "Reuben" <reuben.grinberg@y...> wrote:
      >
      > > Ken - when you were running NEAT in the car domain, did you try to
      > > perturb the way the video was presented? While it would certainly
      > > increase the number of generations needed to get something
      > > acceptable, it might also make a solution that is able to
      > > generalize. That is, it might make a solution where there isn't a
      > > pixel-feature correspondence.
      > >
      > > Any thoughts?
      > >
      >
      > I think that's a totally valid approach and should be looked into.
      > It may help a lot. We just had results today suggesting that
      > longer, more diverse training experiences lead to better
      > generalization, which suggests that the more varied the examples,
      > the better it generalizes. That in turn suggests that adding some
      > noise might help.
      >
      > ken

      I hadn't actually read the "Automobile Warning System" portion of your
      dissertation, but I did so after Reuben's mention of it. It's
      interesting stuff. But I did want to clarify a few things.

      This is the domain Reuben's referring to when he talks about the "car
      domain", correct?

      And according to the paper, you're not using video data, right?

      The dissertation says:

      "Using raw data provided by RARS, two kinds of sensor systems were
      developed for this
      project and provided as input to NEAT neural networks. First,
      rangefinder sensors project rays at several angles relative to the
      car's heading to the edge of the road (figure 8.2). The rangefinders
      give the car an indication of its position and heading relative to the
      sides of the road, and also of the curvature of the road.

      Second, several simulated radar sensors detect other cars or obstacles
      (figure 8.3). The radar sensors are convenient for detecting discrete
      objects by locating them inside of one of several slices that
      represent relative angles and positions. The radars return a value
      equivalent to the distance of the nearest car in each slice, or a
      maximum value if there is no car."

      So is the first type of sensor analogous to feedback from laser sensors?

      And it sounded from the first experiment, just learning to drive on an
      empty track, that only one track was used...is that correct?

      Does RARS support randomization of track designs? This is a type of
      noise that could have been added to the environment. You also talked
      here about possibly adding noise to the controls, in this case the
      gas, brake, and steering. I guess one could also artificially add
      noise to the sensor data that the cars are receiving. But none of
      these were actually done, right? Would you think any or all of them
      would be effective?

      Derek
    • Kenneth Stanley
      Message 2 of 9, Dec 1, 2004
        The first thing I should clarify is that when my dissertation (and
        the chapter on RARS) was written, we had *not* used raw visual
        input, as you quoted. However, just within the last couple weeks we
        have been experimenting with such input and that's what Reuben is
        referring to with his quotes.

        The first set of sensors we used previously were like laser
        rangefinder sensors, and yes, we did evolution on one track for the
        most part. (We may have evolved on a few different tracks, but each
        individual evolution was on one track.)

        In answer to your questions about noise, there are several types of
        noise that can potentially be added, but my discussions with Reuben
        were specifically about adding noise to prevent NEAT from learning
        that specific pixels in a raw visual field are indicators of
        specific states or necessary actions. That kind of correlation is
        bad because it's probably just a fluke if some pixel came up intense
        during a certain type of crash or situation in a particular training
        run, and we don't want NEAT learning such correlations. So one way
        to prevent that might be to add noise to the pixels. It's only a
        suggestion as we have not tried it yet.
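
        Just to make the idea concrete, a rough Python sketch of what
        per-pixel noise might look like; the frame format, noise level, and
        function name are placeholders, not anything we have implemented:

            import numpy as np

            def add_pixel_noise(frame, sigma=0.05, rng=np.random):
                # Add zero-mean Gaussian noise to a grayscale frame scaled
                # to [0, 1], so no individual pixel stays a reliable
                # indicator of a particular state across evaluations.
                noisy = frame + rng.normal(0.0, sigma, size=frame.shape)
                return np.clip(noisy, 0.0, 1.0)  # keep inputs in range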

        In general, it has been shown that for real world transfer "model
        noise" is quite effective, meaning that the actions requested by the
        outputs result in slightly unpredictable results. Faustino Gomez
        showed this here at UT.
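
        Schematically, model noise amounts to something like the following
        sketch; the control names and noise range are placeholders, not the
        actual setup used in that work:

            import random

            def apply_model_noise(steering, throttle, noise=0.1):
                # Perturb the requested actions before the simulator
                # executes them, so identical network outputs don't always
                # produce identical results.
                steering += random.uniform(-noise, noise)
                throttle += random.uniform(-noise, noise)
                return (max(-1.0, min(1.0, steering)),
                        max(0.0, min(1.0, throttle)))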

        However, in the particular case of raw visual input, we are talking
        about a specific kind of problem - learning pixel correlations to
        states - and that might be best served by some noise in the
        display.

        ken




      • Derek James
        Message 3 of 9, Dec 2, 2004
          On Thu, 02 Dec 2004 02:51:53 -0000, Kenneth Stanley
          <kstanley@...> wrote:
          >
          > The first thing I should clarify is that when my dissertation (and
          > the chapter on RARS) was written, we had *not* used raw visual
          > input, as you quoted. However, just within the last couple weeks we
          > have been experimenting with such input and that's what Reuben is
          > referring to with his quotes.

          Ah...okay. Mystery solved.

          > In answer to your questions about noise, there are several types of
          > noise that can potentially be added, but my discussions with Reuben
          > were specifically about adding noise to prevent NEAT from learning
          > that specific pixels in a raw visual field are indicators of
          > specific states or necessary actions.

          I guess my next question is how you are feeding pixel input from the
          RARS model into the neural network. Are you using an active vision
          approach? Is there just a front-end view? Or also a rear view?

          Are you inputting grayscale values, or including color? How visually
          complex is the RARS model (i.e., are walls always the same color and
          brightness? Is there shading in this environment?)? It would seem
          like more richness and variation in the simulation environment would
          lead to more robustness.

          > In general, it has been shown that for real world transfer "model
          > noise" is quite effective, meaning that the actions requested by the
          > outputs result in slightly unpredictable results. Faustino Gomez
          > showed this here at UT.
          >
          > However, in the particular case of raw visual input, we are talking
          > about a specific kind of problem- learning pixel correlations to
          > states - and that might be best served by some noise in the
          > display.

          Well, for car driving, if what you're saying is true, then it would
          make sense to add noise to the control outputs. But for the case of
          just correlating pixel information with particular states or
          situations, that makes sense.

          For the fingerprint domain, as we did with the basic shapes, we
          randomize the images in the evaluation set by slightly moving them a
          random number of pixels up/down, right/left, scaling them up/down by
          10-20%, and rotating them in either direction 10-20 degrees. We
          thought about adding random pixel noise, as Floreano did, and also
          considered randomizing other visual features, such as brightness, but
          in the end decided that the noise we were adding was probably
          sufficient.
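
          Roughly, the randomization amounts to something like the sketch
          below; the exact ranges and the use of PIL here are just
          illustrative, not our actual code:

              import random
              from PIL import Image

              def randomize_example(img, max_shift=4):
                  # Shift a few pixels in each axis
                  dx = random.randint(-max_shift, max_shift)
                  dy = random.randint(-max_shift, max_shift)
                  img = img.transform(img.size, Image.AFFINE,
                                      (1, 0, dx, 0, 1, dy))

                  # Scale up or down by 10-20%, recentered on the canvas
                  scale = 1.0 + random.choice([-1, 1]) * random.uniform(0.10, 0.20)
                  w, h = img.size
                  scaled = img.resize((max(1, int(w * scale)),
                                       max(1, int(h * scale))))
                  canvas = Image.new(img.mode, (w, h))
                  canvas.paste(scaled, ((w - scaled.width) // 2,
                                        (h - scaled.height) // 2))

                  # Rotate 10-20 degrees in either direction
                  angle = random.choice([-1, 1]) * random.uniform(10, 20)
                  return canvas.rotate(angle)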

          Again, I don't know how sophisticated the RARS visual modeling is, but
          that would seem to be one of the big challenges in such an accident
          warning system...the complexity of real-world visual scenes,
          complicated by the fact that people drive all hours of the day and
          night, so there's a huge amount of variance in lighting conditions.
          Does RARS, for example, have night driving?

          Derek
        • Reuben Grinberg
          Message 4 of 9, Dec 2, 2004
            Hi Derek,

            Yes, the pole-balancing approach is a proof-of-concept. I'm only going
            to be using 1 pole. I want to see whether training on high rez video is
            possible.

            However, since I don't have a physical pole balancing system (and don't
            have the time to evolve a physical system), I'm going to do the
            following:
            Take many still-frames of a block (the car) and a dowel rod or a ruler
            (the pole) in many different positions. I'm going to simulate the
            physics of the system, find the still-frame that corresponds to the
            current pole and cart position, and use that frame as the input. A
            simpler way would be to use OpenGL output from the simulator - wouldn't
            be as "neat" though ;)

            Eventually, I'd like it to be able to generalize to different lighting
            conditions, backgrounds, pole and cart starting positions and colors,
            and perhaps even to viewing angles and zoom.

            To simplify things, I could break the problem up: one network to
            get pole angles and cart position from the video, and then a second,
            trivial network to do the balancing based on these values. However,
            it's my hypothesis that the combined approach (although it may take
            longer to train) will yield a smaller network. There might be cues in
            the video feed that allow pole-balancing without having to explicitly
            calculate these values. Risto (Ken's advisor) mentioned to me that the
            pole-balancing problem might be too simple to find differences between
            the two approaches.

            A lot of vision today is done by explicitly labeling the world and then
            operating on those labels. However, animals don't really operate that
            way.

            Using the roving eye for this application might work - I'll try it out.

            Thanks,
            Reuben

            P.S. Just submitted my first CS Grad School application yesterday!
            Several more to go...

            On Nov 30, 2004, at 2:04 PM, Derek James wrote:

            > On Tue, 30 Nov 2004 17:00:27 -0000, Reuben <reuben.grinberg@...>
            > wrote:
            > >
            > > I just recently discovered NEAT and am about to try to use it to
            > > evolve a vision and control system for pole-balancing. That is,
            > > instead of feeding in the angles and cart position, I'm going to
            > > use a "video feed" of the system. I say "video feed" in quotes
            > > because I'm going to use an inverted pendulum simulator to keep
            > > track of the physics and feed corresponding still frames as input.
            > >
            > > Over email, Ken told me about some work they've done with low-rez
            > > video. It seems that the results don't generalize well.
            >
            > I guess my first question is: Why are you taking this particular
            > approach?  Are you wanting to apply this approach to another
            > real-world domain, and this is a first cut?
            >
            > To what extent are you wanting it to generalize? 
            >
            > We're currently experimenting with fingerprint classification, so
            > there might be some overlap in the sorts of issues we're interested
            > in.  For our domain, we're applying an active vision approach, in
            > order to drastically reduce the visual input for a given time step,
            > and to more closely simulate biological vision (there's plenty in the
            > message archives not only on this, but on plenty of interesting
            > domains).
            >
            > You say you're going to feed in still frames as input.  Could you be a
            > little more specific on how you intend to do this?  With pole
            > balancing (by the way, are you going to try single and double?), you
            > wouldn't need to input the entire scene.  Especially since, if you're
            > talking about "hi-resolution", a given scene could have thousands of
            > pixels, or more.
            >
            > The only things you care about in the scene are the angles of the
            > poles and the position and velocity of the cart, right?  So if you set
            > up your virtual camera so that it is viewing the profile of the cart,
            > you could just feed in pixel values from small windows on either side
            > of the starting pole position, and from a thin strip along the path of
            > the cart.  But then, this is hand-picking what the system sees, and
            > wouldn't really be much different from just directly inputting the
            > angles and other information.
            >
            > I would imagine that this would be the way a human would solve the
            > problem, by moving the cart with their hand, while positioning their
            > eyes to be in profile with the cart to watch the angle of the pole.
            > You might want such a system to be robust to slight changes in the
            > visual input, but you probably wouldn't need a system that could, for
            > example, balance the pole(s) by only looking at a top view of the
            > scene.  Is this what you're going for?
            >
            > By the way, these double pole-balancing movies using the ESP technique
            > are pretty cool:
            >
            > http://nn.cs.utexas.edu/pages/research/espdemo/
            >
            > Derek
            >
            > > Ken wrote: "However, we noticed an interesting problem.  It is
            > learning which pixel means
            > > what.  In other words, it is not learning an abstraction at all. 
            > It's just learning off the
            > > specific pixels.  That means you get really bad generalization
            > performance if you test it on
            > > stuff it hasn't seen before. "
            > >
            > > Ken - when you were running NEAT in the car domain, did you try to
            > perturb the way the
            > > video was presented? While it would certainly increase the number
            > of generations needed
            > > to get something acceptable, it might also make a solution that is
            > able to generalize. That
            > > is, it might make a solution where there isn't a pixel-feature
            > correspondance.
            > >
            > > Any thoughts?
            > >
            > > -Reuben
            > > --------------
            > > Reuben Grinberg
            > > reuben.grinberg@...
            > > Trumbull College, Yale University
            > > Computer Science, Class of '05
            > >
            > > Yale Social Robotics Lab - http://gundam.cs.yale.edu
          • Kenneth Stanley
              Message 5 of 9, Dec 5, 2004
              --- In neat@yahoogroups.com, Derek James <djames@g...> wrote:
              > I guess my next question is how you are feeding pixel input from the
              > RARS model into the neural network. Are you using an active vision
              > approach? Is there just a front-end view? Or also a rear view?
              >

              Just front end, non-active vision. Basically, we feed the view out
              the front window.

              > Are you inputting grayscale values, or including color? How visually
              > complex is the RARS model (i.e., are walls always the same color and
              > brightness? Is there shading in this environment?)? It would seem
              > like more richness and variation in the simulation environment would
              > lead to more robustness.
              >

              True, the richer the better. RARS gives a decent, though a bit
              video-gamish, first-person driver view. It's not photorealistic by
              any stretch.

              > > In general, it has been shown that for real world transfer "model
              > > noise" is quite effective, meaning that the actions requested by the
              > > outputs result in slightly unpredictable results. Faustino Gomez
              > > showed this here at UT.
              > >
              > > However, in the particular case of raw visual input, we are talking
              > > about a specific kind of problem - learning pixel correlations to
              > > states - and that might be best served by some noise in the
              > > display.
              >
              > Well, for car driving, if what you're saying is true, then it would
              > make sense to add noise to the control outputs. But for the case of
              > just correlating pixel information with particular states or
              > situations, that makes sense.
              >
              > For the fingerprint domain, as we did with the basic shapes, we
              > randomize the images in the evaluation set by slightly moving them a
              > random number of pixels up/down, right/left, scaling them up/down by
              > 10-20%, and rotating them in either direction 10-20 degrees. We
              > thought about adding random pixel noise, as Floreano did, and also
              > considered randomizing other visual features, such as brightness, but
              > in the end decided that the noise we were adding was probably
              > sufficient.
              >
              > Again, I don't know how sophisticated the RARS visual modeling is, but
              > that would seem to be one of the big challenges in such an accident
              > warning system...the complexity of real-world visual scenes,
              > complicated by the fact that people drive all hours of the day and
              > night, so there's a huge amount of variance in lighting conditions.
              > Does RARS, for example, have night driving?
              >

              No I don't believe it does. We aren't really trying to solve the
              visual recognition task though. What we are doing is trying various
              levels of input from processed radars to raw visual input to see
              what NEAT can figure out from different kinds of input. A truly
              industrial strength warning system would include some visual
              preprocessing algorithms at the front end before things get to NEAT.

              We haven't really waded into active vision territory yet, and
              whether we will has not yet been determined.

              ken
            • Derek James
                Message 6 of 9, Dec 6, 2004
                On Mon, 06 Dec 2004 05:39:20 -0000, Kenneth Stanley
                <kstanley@...> wrote:
                >
                > Just front end, non-active vision. Basically, we feed the view out
                > the front window.

                How big a scene is that? How many pixels are we talking about? I
                would imagine that the front-end visual scene of a simulated car would
                be at least something like 500x400 pixels. You aren't inputting
                200,000 pixel inputs into a neural network, are you? If not, then how
                are you limiting the input, if not by active vision? Greatly reducing
                the resolution?
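
                If it is resolution reduction, I would picture something like
                block-averaging the frame down to a tiny grid, as in the sketch
                below; the target size here is just a guess on my part:

                    import numpy as np

                    def downsample(frame, out_h=10, out_w=16):
                        # Average non-overlapping blocks of pixels so a large
                        # frame becomes a small grid of inputs
                        # (here 10x16 = 160 values).
                        h, w = frame.shape
                        bh, bw = h // out_h, w // out_w
                        trimmed = frame[:bh * out_h, :bw * out_w]
                        return trimmed.reshape(out_h, bh, out_w, bw).mean(axis=(1, 3))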

                > > Again, I don't know how sophisticated the RARS visual modeling is, but
                > > that would seem to be one of the big challenges in such an accident
                > > warning system...the complexity of real-world visual scenes,
                > > complicated by the fact that people drive all hours of the day and
                > > night, so there's a huge amount of variance in lighting conditions.
                > > Does RARS, for example, have night driving?
                > >
                >
                > No I don't believe it does. We aren't really trying to solve the
                > visual recognition task though.

                Well, I understand that. But you are trying to solve the accident
                warning system problem, right? And for any system to work reasonably
                well, it would have to be robust to highly variant road conditions,
                including weather, lighting, etc. Rain, for example, is going to add
                an extreme amount of noise to any sensory input (radar or visual), and
                I would imagine that lots of accidents happen in rainy weather.

                Derek