SLEAT TTT first results

  • Ian Badcoe
    Message 1 of 7, May 16, 2006
      Hi,

      I've uploaded a zip file of my first results to
      the SLEAT folder in the file area. It is "TTT 1
      Analysis.zip" and all you need to do is unzip it into
      a folder and open "results.htm".

      I'll explain more what it means when I have
      time...

      Ian

      p.s. Quiet in here....

      Send instant messages to your online friends http://uk.messenger.yahoo.com
    • Kenneth Stanley
      Message 2 of 7, May 17, 2006
        Hi Ian, looks like you did a pretty comprehensive analysis. I don't
        know what all the variables mean so I'll wait for your explanation,
        but one thing I noticed that was surprising is a graph showing that
        not keeping champions climbed faster than keeping champions. That
        seems strange, but it's possible it has something to do with SLEAT
        being steady state? In a steady state GA of course there is no need
        for explicit elitism, and it may even hurt. In any case, I'm not
        certain that's what's going on.

        ken

      • Ian Badcoe
        Message 3 of 7, May 17, 2006
          Hi,

          OK so I finally have the results to ponder. Let me tell you what I
          think in no particular order...

          ...and ask everyone for anything they can think of to add.

          These experiments were to create TicTacToe players, using the
          "cheating" method of evaluating with the "Perfect Player Score"
          (TTT-PPS) that I outlined before.

          What I've been doing is fishing for reasonable parameters. So I set
          up a basic configuration, and (to avoid a combinatorial explosion of
          tests) just varied one parameter at a time.

          --

          I was also using:

          SLEAT - where multiple chromosomes communicate to one another via
          rewritten versions of the input data.

          Variance - where all weights and FIDs (FIDs are a detail of SLEAT)
          have an accompanying "variance" and every time they are reproduced
          they are varied within a normal distribution centered on their
          current value and with a width based on the variance (this is like
          Evolutionary Computation (EC)).

          Control of genome size via only 50% chance of inclusion of unpaired
          genes in cross breeding.

          Multiple node modes (sigmoid, and, or, xor, ...)

          The simplest form of my "advanced speciation"
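
          The "variance" mechanism above can be sketched roughly as follows.
          This is a minimal Python sketch, not the actual SLEAT code; the
          function names and the TAU constant are illustrative assumptions.

```python
import math
import random

# Each weight carries its own variance; on reproduction the weight is
# perturbed within a normal distribution centred on its current value,
# with a width based on that variance (as in evolution strategies).
TAU = 0.2  # assumed meta-rate for adapting the variance itself

def mutate_weight(weight, variance, rng=None):
    rng = rng or random
    # Self-adapt the variance first (log-normal step), then draw the
    # new weight around the old one using the adapted variance.
    new_variance = variance * math.exp(TAU * rng.gauss(0.0, 1.0))
    new_weight = rng.gauss(weight, new_variance)
    return new_weight, new_variance
```

          A variance of zero leaves the weight untouched, so well-converged
          genes can effectively freeze themselves in place.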

          And I ran three different random number seeds for each one. And ran
          each run up to 10,000,000 cycles (call that 100,000-ish generations).

          --

          The parameters I varied were:

          1) population size
          2) how much to mutate to create a new species
          3) mutation rate
          4) how long to keep trying with a stalled species
          5) whether to mostly crossover/mutate or mostly clone/mutate
          6) whether to keep champions when merging species (preserving merge)
          7) whether to punish illegal moves (PPS parameter)
          8) whether to reward partial success (PPS parameter)

          And the results of this are in the file I uploaded. For each there
          are two graphs (i) the whole thing, (ii) zoomed on the 0.9 to 0.95
          region of fitness. The latter is because it is clear that my biggest
          problem is the way the search slows at high scores.

          --

          Conclusions and puzzlements:

          1) population of 50 is best (this is per species, I use up to 2
          species at a time at the moment). You can see how small population
          sizes work, but are unreliable and a size of 100 works but is slower...

          2) there is no overall conclusion from the degree of mutation for a
          new species. "20 mutations" is highest at the end, so we could call
          that a winner, but as another of my conclusions is going to be a
          more general "mutate more", I think I'll reserve judgement on this.

          3) low mutation just failed, I'm not 100% sure why, but I'm thinking
          that individual added genes were so rare they never met one another...

          High mutation is better, so my conclusion is to try higher mutation
          rates still and try to find an optimum.

          4) no overall conclusion about how long to wait on a stalled species
          before spawning a new one, less patient might be better but overall I
          would say this wasn't conclusive yet. Trying higher mutation rates
          might change the scale of these anyway.

          5) Too much breeding versus just clone/mutate is slightly worse, but
          only slightly.

          6) Not keeping champions during species merges is slightly better. I
          have no idea why, except that maybe the species merge doesn't tend to
          lose anything even without this (thus it doesn't help) and keeping
          champions occasionally gets in the way of a less fit hybrid that has
          the _potential_ to be more fit. I'm going to keep champions until I
          have firmed up the other params, and then try this again in more detail.

          7/8) These change the scoring system and cannot be simply compared
          with the other absolute score values. I will recalculate their final
          results in the same system as the others as a guide. Generally,
          however, neither of these curves looks especially better or
          worse. Punishing illegal moves shifts the whole curve down,
          rewarding partial success pushes the whole curve up, but generally my
          guess is this is just due to the change in the score system and the
          results are about the same.

          --

          One embarrassment I have from this is I underestimated how much I
          needed to change the random numbers to get a genuinely different
          run. e.g. for my data set #2 I just advanced the random number
          sequence by a couple of hundred thousand, thinking that although the
          sequences would overlap, every number would be used in a different
          context and hence very different. In fact, some of my curves in
          these two sets start as time-shifted versions of one another, and
          manage to hold on to that similarity for up to half the run!

          So the lesson there is "always use completely different random number seeds"!
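
          A sketch of the safer approach (illustrative Python; the seed
          string is an arbitrary example, not from the actual runs):

```python
import random

# The pitfall above: offsetting one stream by a few hundred thousand
# draws still yields correlated runs. Instead, give every run its own
# independently seeded generator; distinct string seeds keep the
# streams unrelated and each run reproducible.
def make_run_rngs(n_runs, base_seed="sleat-ttt"):
    return [random.Random(f"{base_seed}/run-{run}") for run in range(n_runs)]
```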

          --

          This represents about 18 CPU-weeks of work...

          --

          Special thanks to Colin for providing 1/3 of the CPU. My next step
          is a smaller series of higher mutation studies, then look at
          speciation params again, then look at keeping champions in some
          detail... The current best player I have seen has a PPS of 94.8%. I
          need to upgrade some software before I can take it on man to man...

          Ian
          In 15 minutes everybody will be in the future.
        • Kenneth Stanley
            Message 4 of 7, May 21, 2006
            Ian, it sounds like some results are inconclusive, but that may not
            be a bad thing because it may imply that the algorithm is somewhat
            robust to parameter variation, which would be good.

            Are you considering a writeup similar to what Colin has put up about
            SharpNEAT and some of his experiments? They aren't academic
            publications, but they are easy to read, include text and figures,
            and reveal to the reader the key issues. It would be great to have
            a reference like that for SLEAT.

            Also, I haven't spent much time in the TTT world; can people remind
            me how good 94.8% is? How is PPS measured again? Is that the same
            measure as in past research such as Derek and Philip's ANJI results?

            ken



          • Ian Badcoe
              Message 5 of 7, May 22, 2006
              --- Kenneth Stanley <kstanley@...> wrote:

              > Ian, it sounds like some results are inconclusive, but that
              > may not be a bad thing because it may imply that the algorithm
              > is somewhat robust to parameter variation, which would be good.

              This is true. I'm now going to vary the ones that do
              have an effect a bit further, to try and find a rough
              optimum. Just to improve the speed of further
              experiments.

              > Are you considering a writeup similar to what Colin has put up
              > about SharpNEAT and some of his experiments?  They aren't
              > academic publications, but they are easy to read, include text
              > and figures, and reveal to the reader the key issues.  It would
              > be great to have a reference like that for SLEAT.

              I am. I also have a promise of assistance from a
              couple of tame academics, so I can write it up
              properly...

              > Also, I haven't spent much time in the TTT world; can people
              > remind me how good 94.8% is?  How is PPS measured again?  Is
              > that the same measure as in past research such as Derek and
              > Philip's ANJI results?

              No. PPS was the system I invented for myself a while
              back. It works by feeding the player every possible
              board and assessing their response against a full
              knowledge of the game tree. Thus if a win is
              possible, they score 1.0 for picking any move that
              eventually leads to a win. Optionally they score less
              for a move that leads to a stalemate (partial
              success). Also optionally, illegal moves attract -ve
              scores.

              I'm actually using the "normalised" version of PPS,
              which treats all rotations/reflections of the same
              board as a single canonical form.
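
              The normalisation can be sketched as picking one
              representative per symmetry class (illustrative Python; a
              board here is a 9-tuple in row-major order, which is an
              assumption, not SLEAT's actual representation):

```python
# The 8 symmetries of the square, expressed as index permutations over
# a 3x3 board stored row-major (indices 0..8).
SYMMETRIES = [
    (0, 1, 2, 3, 4, 5, 6, 7, 8),  # identity
    (6, 3, 0, 7, 4, 1, 8, 5, 2),  # rotate 90 clockwise
    (8, 7, 6, 5, 4, 3, 2, 1, 0),  # rotate 180
    (2, 5, 8, 1, 4, 7, 0, 3, 6),  # rotate 270 clockwise
    (2, 1, 0, 5, 4, 3, 8, 7, 6),  # mirror left-right
    (6, 7, 8, 3, 4, 5, 0, 1, 2),  # mirror top-bottom
    (0, 3, 6, 1, 4, 7, 2, 5, 8),  # transpose (main diagonal)
    (8, 5, 2, 7, 4, 1, 6, 3, 0),  # transpose (anti-diagonal)
]

def canonical(board):
    # Map every rotation/reflection of a board to the same canonical
    # representative: the lexicographically smallest image.
    return min(tuple(board[i] for i in perm) for perm in SYMMETRIES)
```

              Scoring only canonical boards shrinks the evaluation set
              roughly eightfold without losing any distinct positions.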

              Ian

            • Derek James
                Message 6 of 7, May 22, 2006
                On 5/22/06, Ian Badcoe <ian_badcoe@...> wrote:

                No.  PPS was the system I invented for myself a while
                back.  It works by feeding the player every possible
                board and assessing their response against a full
                knowledge of the game tree.  Thus if a win is
                possible, they score 1.0 for picking any move that
                eventually leads to a win.  Optionally they score less
                for a move that leads to a stalemate (partial
                success).  Also optionally, illegal moves attract -ve
                scores.

                I may have to dig around in the archives and see if there's a more detailed explanation of this technique (did you post one before?).

                This certainly seems like a better evaluation method than some I've seen (if I'm understanding it correctly).  Seems like you're rewarding very generic, and not very good, play, though.  Let's say this is the board:

                123
                456
                789

                Let's say the ANN is playing X's.  It plays at 9.  You give it 1 point for that move, correct?  Then for all other possible next moves, you evaluate its responses.  What about the branch where O plays at 5?  Does the ANN receive the same score for a move at 4 as it does for a move at 7, 8, 6, or 3?  4 could eventually lead to a win, but it's a poorer move (in general) than one that makes a potential win for the next move, right?

                It would seem if you're not making these distinctions that the quality of evolved play could actually be very bad.  Have you then tested the player after using this kind of evaluation, e.g., by playing it yourself or putting it against a random player?

                Derek
              • Ian Badcoe
                  Message 7 of 7, May 22, 2006
                  --- Derek James <djames@...> wrote:

                  > On 5/22/06, Ian Badcoe <ian_badcoe@...> wrote:
                  > >
                  > > No. PPS was the system I invented for myself a while
                  > > back. It works by feeding the player every possible
                  > > board and assessing their response against a full
                  > > knowledge of the game tree. Thus if a win is
                  > > possible, they score 1.0 for picking any move that
                  > > eventually leads to a win. Optionally they score less
                  > > for a move that leads to a stalemate (partial
                  > > success). Also optionally, illegal moves attract -ve
                  > > scores.
                  >
                  > I may have to dig around in the archives and see if there's
                  > a more detailed explanation of this technique (did you post
                  > one before?).

                  Yup, look for something with "TTT PPS" in the title, I
                  think.

                  I think I even posted example code at one point...

                  > This certainly seems like a better evaluation method than
                  > some I've seen (if I'm understanding it correctly).  Seems
                  > like you're rewarding very generic, and not very good,
                  > play, though.  Let's say this is the board:
                  >
                  > 123
                  > 456
                  > 789
                  >
                  > Let's say the ANN is playing X's.  It plays at 9.  You give
                  > it 1 point for that move, correct?  Then for all other
                  > possible next moves, you evaluate its responses.  What
                  > about the branch where O plays at 5?  Does the ANN receive
                  > the same score for a move at 4 as it does for a move at 7,
                  > 8, 6, or 3?  4 could eventually lead to a win, but it's a
                  > poorer move (in general) than one that makes a potential
                  > win for the next move, right?

                  I don't care how far away in the future the win might
                  be...

                  But I do make the assumption that neither player will
                  ever make a mistake (beyond the one that might be in
                  the move we are assessing). So only moves that
                  lead to a _certain_ win are rewarded, because if the
                  win is not certain then we know that the opponent will
                  take advantage.

                  You can do this in TTT because the whole game graph is
                  relatively small...
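
                  The "certain win" test can be sketched with a small
                  game-tree search (a minimal Python sketch, not Ian's
                  actual PPS code; a board here is a 9-character string,
                  '.' for empty, assumed for illustration):

```python
from functools import lru_cache

# value(board, player) is the game-theoretic outcome for `player`, who
# is next to move: +1 certain win, 0 draw, -1 loss, assuming neither
# side makes a further mistake. A move earns full credit only if the
# position it creates is a certain win for the mover.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = [i for i, cell in enumerate(board) if cell == '.']
    if not moves:
        return 0  # board full with no line: a draw
    other = 'O' if player == 'X' else 'X'
    # Negamax: my best outcome is the worst I can inflict on the opponent.
    return max(-value(board[:i] + player + board[i + 1:], other)
               for i in moves)

def certain_win_move(board, player, move):
    # True iff playing `move` leaves `player` with a guaranteed win.
    other = 'O' if player == 'X' else 'X'
    child = board[:move] + player + board[move + 1:]
    return -value(child, other) == 1
```

                  The whole TTT game graph is small enough that this
                  exhaustive search is cheap, especially with caching.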

                  > It would seem if you're not making these distinctions that
                  > the quality of evolved play could actually be very bad.
                  > Have you then tested the player after using this kind of
                  > evaluation, e.g., by playing it yourself or putting it
                  > against a random player?

                  I did in the past. I have to upgrade the code to the
                  latest SLEAT before I can do it again, but I must do
                  that....

                  I expect it to play well, but with the odd blindspot.
                  Unlike a human, of course, the blindspots could easily
                  be for some of the simplest situations. I can also
                  get a print out of all the boards that a player gets
                  wrong...

                  Ian

