
Re: [neat] I think this guy has rediscovered Novelty search

  • Colin Green
    Message 1 of 8, Apr 27, 2013
      On 25 April 2013 23:52, Colin Green <colin.green1@...> wrote:
      > On 25 April 2013 19:12, Jeff Clune <jclune@...> wrote:
      >>
      >> 2) More importantly: how do they figure out what actions maximize the
      >> number of future actions?
      >
      > I figure you can get an approximate measure using a number of monte
      > carlo simulations.

      The other way of looking at this is as follows. Think about a typical
      entropic force such as the diffusion of a gas in a tube (see the
      diagram on the wiki page http://en.wikipedia.org/wiki/Entropic_force):
      we have a barrier in the tube with some gas to the left of it and a
      vacuum to the right. We remove the barrier and the gas as a whole
      moves from left to right as if there is a force pulling (or pushing)
      it, but in reality each particle is just exhibiting Brownian (random)
      motion. Random movements to the left are more likely to hit another
      particle and be retarded, so the overall effect is that the gas as a
      whole moves to the right.

      Another representation of this idea is in figure 1(b) of the paper:

      http://math.mit.edu/~freer/papers/PhysRevLett_110-168702.pdf
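
      For reference, the causal entropic force the paper defines is (as far
      as I can tell) the gradient of the path entropy over a horizon tau:

          F(X_0, \tau) = T_c \nabla_X S_c(X, \tau) |_{X_0}

      where S_c(X, \tau) is the entropy of the distribution over possible
      paths of duration tau starting from state X, and T_c is just a
      strength constant. So the 'force' points towards states from which
      more distinct futures remain reachable.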

      In the EC world we could see this as being akin to applying very light
      selection pressure by removing some of the offspring genomes that end
      up in higher-entropy states, e.g. where the inverted pendulum falls.
      So we aren't directly measuring the number of possible future paths
      for each genome (as this would be expensive), but nudging the
      population away from genomes we know have fewer future paths. The end
      ('macroscopic') result is maximising future paths in the population
      as a whole.

      So in an EC context my takeaway point here is that instead of
      selecting a few of the fittest genomes to seed the next generation,
      we (A) eliminate a small proportion of poor genomes and (B) measure
      fitness as avoidance of known high-entropy states (collapsed pendulum,
      fallen walker, stuck in a 'dumb' part of the maze, etc.). Or in other
      words, we apply light pressure away from dumb regions of the fitness
      space rather than strong pressure towards good regions - which we know
      tends to lead to local optima that are globally poor.
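
      As a rough illustration of (A)+(B), a toy Python sketch might look
      something like the following - the genome representation, mutation
      and the 'dead end' score are just placeholders for whatever the real
      domain provides, not anything taken from the paper:

      import random

      POP_SIZE = 100
      GENOME_LEN = 8
      CULL_FRACTION = 0.1   # light pressure: cull only the worst 10%

      def random_genome():
          return [random.gauss(0.0, 1.0) for _ in range(GENOME_LEN)]

      def mutate(genome):
          return [g + random.gauss(0.0, 0.1) for g in genome]

      def dead_end_score(genome):
          # Toy stand-in for "how close does this genome's behaviour get
          # to a known high-entropy failure state" (collapsed pendulum,
          # fallen walker, ...). Here distance from the origin plays that
          # role.
          return sum(g * g for g in genome)

      def next_generation(population):
          # (B) rank genomes by how strongly they approach dead-end states
          ranked = sorted(population, key=dead_end_score)
          # (A) eliminate only a small proportion of the worst offspring
          survivors = ranked[:int(len(ranked) * (1 - CULL_FRACTION))]
          # refill with mutated copies of random survivors (no elitism)
          while len(survivors) < POP_SIZE:
              survivors.append(mutate(random.choice(survivors)))
          return survivors

      population = [random_genome() for _ in range(POP_SIZE)]
      for generation in range(50):
          population = next_generation(population)

      The only selection here is the culling step; reproduction is uniform
      over the survivors, so the pressure away from known failure states
      stays deliberately weak.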

      Colin
    • Jeff Clune
      Message 2 of 8, Apr 29, 2013
        Thanks for the ideas Colin.

        It seems you are trying to come up with your own new algorithm for how you could maximize entropy of future states. It is telling that none of us can seem to figure out how the authors did it, though! A paper should do a better job of explaining its innovations/methods, in my opinion. 

        Your idea could work, but the authors suggest that their mechanism may explain things in nature... and I don't think nature does Monte Carlo simulations to try to determine which actions will keep the number of future paths open (at least, not until it had already invented a brain smart enough to simulate future actions). 

        I remain in the dark about how the paper achieves its results. That, of course, is only a comment on the original paper, not your ideas, which are interesting. 


        Best regards,
        Jeff Clune

        Assistant Professor
        Computer Science
        University of Wyoming
        jeffclune@...
        jeffclune.com

        On Apr 25, 2013, at 4:52 PM, Colin Green <colin.green1@...> wrote:

         

        On 25 April 2013 19:12, Jeff Clune <jclune@...> wrote:
        >
        > 2) More importantly: how do they figure out what actions maximize the
        > number of future actions?

        I figure you can get an approximate measure using a number of monte
        carlo simulations. If there are small variations in initial state (x)
        for each run then the terminal states will vary. If all terminal
        states are similar (clustered) then the number of possible future
        states for init state x is low, otherwise it is high(er). So it's a
        relative and approximate metric rather than an absolute one.

        Any state which has lots of diverse terminal states is low entropy and
        has the maximum number of future states (or pathways, or future
        histories), e.g. a broken mirror is high entropy and does not have an
        unbroken mirror in its set of possible terminal states.

        So in our case I figure we would make a number of offspring and we
        would evaluate each one by running multiple monte carlo sims, and we
        would assign fitness based on the above 'maximum future possibilities'
        metric.
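
        To make that a bit more concrete, a rough Python sketch of the
        Monte Carlo estimate might look like this - the random-walk
        'simulator' is only a stand-in for running a genome's controller
        in the real task, not anything from the paper:

        import random
        import statistics

        N_ROLLOUTS = 20
        HORIZON = 50        # steps simulated into the future
        INIT_NOISE = 0.01   # small variation in the initial state x

        def rollout(x0):
            # Stand-in simulator: a noisy scalar trajectory from x0.
            # A real version would run the genome's controller in the
            # task domain and return its terminal state.
            x = x0
            for _ in range(HORIZON):
                x += random.gauss(0.0, 0.1)
            return x

        def future_diversity(x):
            # Approximate the "number of reachable future states" from
            # init state x as the spread of terminal states over many
            # perturbed rollouts: tightly clustered terminals -> low
            # score, diverse terminals -> high score.
            terminals = [rollout(x + random.gauss(0.0, INIT_NOISE))
                         for _ in range(N_ROLLOUTS)]
            return statistics.pstdev(terminals)

        # fitness of an offspring = how many futures it keeps open
        fitness = future_diversity(0.0)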

        Or I may have the wrong end of the stick entirely :)

        Thoughts?

        Colin


      • martin_pyka
        Message 3 of 8, Apr 29, 2013
          Here are my two cents on this paper (I hope I correctly understood the paper and Colin's explanations).

          The idea of searching for solutions with maximal entropy in the offspring of a solution is very interesting, and Colin's translation to our field should be easy to implement (my suggestion is to name this new algorithm "Entropy Search" ;)).

          But I agree with Jeff that the merit of this paper lies more in the algorithmic idea for exploring the search space than in a new fundamental insight relevant to all life sciences (as they claim).

          Some examples are not clear to me, for instance the tool-use puzzle.

          Fig. 3 of the paper indicates that disk 1 releases disk 3. However, in the video you can see that disks 1, 2 and 3 remain at their locations (after a short journey of disk 2). This makes more sense to me, as it seems to be the state that offers the most future directions (highest entropy in future states).

          What does the validation in the stock-market example look like?

          An important variable in their model seems to be tau (the time interval for which future states are evaluated). The decisions of the algorithm seem to heavily depend on tau. If tau is too short, the algorithm might fail to see the future entropy of a given solution. If tau is too long, the search becomes impractical.

          The authors associate their algorithm with "adaptive behavior" and the "cognitive niche", which is problematic for several reasons:

          The term "cognitive niche" itself is controverse. It refers more to goal-oriented action planning and execution, involving communication (language in particular) and cooperation. The algorithm by Wissner-Gross might show behaviour that resembles natural behaviour in some limited simulations but it does not mechanistically explain (and this is what they somehow claim) natural behaviour (of course, implicitly I assume that natural organisms are not driven by causal entropic forces).

          I suppose that while a human artist would collect all the colours and brushes needed to paint a picture and then paint it, an Entropica artist would collect all the colours and brushes and then remain in a stable state to keep all possible future histories alive (unless you increase tau enough to see that painting a picture brings you into a state whose future histories have even higher entropy).

          Best,
          Martin