A New Competition?

  • Marty
    Message 1 of 2 , Jan 8, 2010
      With the source code distributed and some early results distributed, at this time I would like to open a fairly broad discussion here both about the competition that we have had and about competitions we could have in the future.

      I propose three topics for discussion, which of course overlap.

      1. What have we learned from this competition?
      2. Should we have another competition? If so, when? What should the timeline be? Will you compete? Who should run the competition?
      3. Should the new competition be different than the present one? If so, how?

      I will begin the discussion by presenting my feelings about these topics. I encourage others to do the same. If we decide on a new competition, at some point we should shift over to a more directed discussion.

      1. What have we learned from this competition?
      Although there is a lot of data to be sifted through, and a lot of bots that I need to understand more thoroughly, I think my first impression of this competition is best captured by the performance of the top three bots (Soton, Pujara, and RL3) and the last-place bot, Kuhlmann.

      In a gutsy move, Greg Kuhlmann took a nihilistic approach to the competition. Denying the existence of any structure to the game, he submitted a bot that played uniformly at random, arguing it should have as good a chance as any at winning. The interesting part about his point is that had other players thought similarly, he would have been right: in particular, if everyone else is playing randomly, you may as well play randomly too. I was hoping to avoid everyone submitting random bots, so I provided example bots that demonstrated how a random bot would lose if others used more sophisticated means. The random strategy in the real competition lost, and lost badly. I don't want to trivialize this result: it is a credit to people's faith in the intelligence of others. Without this, the game would have been very uninteresting. In fact, the game reminds me of that quote from Star Wars V about the cave on Dagobah:
      "That place…is strong with the dark side of the Force."
      "What's in there?"
      "Only what you take with you."
      Suffice it to say, a lot of people "brought it". :-)

      The second bot I wish to describe is Jay Pujara's. Pujara was a "modified constant" bot: it stayed in one place until it saw enough bad rounds there, at which point it moved to a new random spot. Note that once it moves, it stays still for at least two more rounds: in other words, it is completely exploitable. This is not the tit-for-tat strategy of Axelrod's competition. Two bots in collusion, having agreed beforehand on who would sit clockwise and who counterclockwise, could get 9 points each in expectation every round Pujara's bot moved (by playing randomly), and then 11 points each while Pujara's bot stood still (by "sandwiching" him). No one came close to this performance.
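      To make the sandwich numbers concrete, here is a small sketch of the per-round payoff computation. I am assuming the standard utility here: twelve beach spots, two customers per spot, each customer buying from the nearest stand (circle distance), with ties split evenly. This is my own restatement, not the official scoring code.

```python
def payoffs(positions, spots=12, customers_per_spot=2):
    """Per-round payoffs for lemonade stands on a circle of `spots` spots.

    Each spot holds `customers_per_spot` customers, who buy from the
    nearest stand (by circle distance); ties are split evenly.
    """
    scores = [0.0] * len(positions)
    for spot in range(spots):
        dists = [min(abs(spot - p), spots - abs(spot - p)) for p in positions]
        best = min(dists)
        winners = [i for i, d in enumerate(dists) if d == best]
        for i in winners:
            scores[i] += customers_per_spot / len(winners)
    return scores

# A "sandwich": colluders at spots 0 and 2 pin a constant bot at spot 1.
print(payoffs([0, 1, 2]))  # -> [11.0, 2.0, 11.0]
```

      Note that the payoffs always sum to 24, so the average per bot per round is 8, which is why playing "above average" means more than eight points per round.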

      The bot currently in the lead, Soton (or EASquared) from the University of Southampton and University College London, gained through "co-opetition", cooperating with its competitors. In particular, it attempted to predict its opponents using very loose models: a bot could be "sticky" (moving very little, or rarely moving far) or "followy" (tracking another agent's previous movements). Basically, Soton tried to determine which bot was the easiest to predict, and then tried to sit directly opposite that bot (in other words, Soton itself played "followy"). This strategy works well if one (or two) of the bots plays a constant strategy. Thus, Soton works well in the presence of Pujara, as can be seen in the detailed results.
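      A minimal sketch of this "play opposite the most predictable bot" idea. The predictability score below (total movement so far) is a placeholder of my own for illustration; Soton's actual models were more refined.

```python
def circle_dist(a, b, spots=12):
    """Shortest distance between two spots on the circle."""
    d = abs(a - b) % spots
    return min(d, spots - d)

def play_opposite(history, spots=12):
    """Pick the spot opposite the opponent who has moved the least.

    `history` maps opponent name -> list of past positions. Total
    movement is used here as a crude stand-in for predictability.
    """
    def movement(positions):
        return sum(circle_dist(p, q, spots)
                   for p, q in zip(positions, positions[1:]))
    target = min(history.values(), key=movement)
    return (target[-1] + spots // 2) % spots

# If opponent A sat at spot 3 the whole time, sit at spot 9.
print(play_opposite({"A": [3, 3, 3], "B": [1, 7, 2]}))  # -> 9
```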

      RL3 submitted a bot that had a variety of different attacks. Like Soton, it could play opposite, but it could also initiate or participate in a sandwich attack. Apparently, this was ineffective against Pujara. Statistically speaking, RL3 played above average (more than eight points per round) in matches where Pujara was playing, but Pujara actually fared better in these matches. I could see three reasons for this, but this is still conjecture as I have not done a detailed analysis of their confrontation. The first is, judging from their description, RL3 engages in a sandwich attack only if it finds itself bumping heads with player B on the far side of the island from player A who is playing a constant strategy. It is very possible this just didn't happen that often. The second is that I didn't see anyone else describing anything resembling a sandwich attack: one important thing about this particular attack is that it doesn't arise from conventional game theoretic analysis (more on this later). Finally, RL3 (fearing a sandwich attack) begins by playing randomly. Given the inclination of a lot of players to play opposite a player that looks constant, playing randomly initially is probably not optimal.

      In summary, there are several "options" that people implemented:

      1. Play constant
      2. Play opposite someone
      3. Initiate a sandwich (play next to someone)
      4. Complete a sandwich (play on the other side)

      I think there is a lot of work left in deciding which of these to try. EASquared had an interesting metric, a sort of distance from stickiness: the L1 distance moved between rounds (under the circle metric), discounted over time.
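      One plausible reading of that metric is an exponentially discounted sum of per-round movement; the discount factor below is my own illustrative choice, not EASquared's actual value.

```python
def non_stickiness(positions, discount=0.9, spots=12):
    """Discounted sum of per-round circle-distance moved.

    Recent moves count more than old ones; a perfectly constant bot
    scores 0. The discount factor 0.9 is illustrative only.
    """
    score = 0.0
    for prev, curr in zip(positions, positions[1:]):
        d = abs(curr - prev) % spots
        score = discount * score + min(d, spots - d)
    return score

print(non_stickiness([5, 5, 5, 5]))  # -> 0.0 (a constant bot)
```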

      I noticed that there were those who tried "kitchen sink" approaches, where they tried to mix between a variety of strategies. This did not seem to be very effective: I think that this highlights one principle of regret minimization. Your set of experts is like a belief, and a very vague belief will not do well in this short time period. I do not believe that this means that you have to do something hacky to decide which strategies to include. It isn't less principled to make decisions before the match begins if those decisions themselves are principled. Online learning during a match should be considered a phase of the "lifelong learning" of the bot. Otherwise, it is like someone who shows up to the World Series of Poker ready to learn if a straight beats a flush.


      In summary, I believe that there were some interesting strategies. In the poker competitions I have run before, a good benchmark is reached when the competition is no longer won by a "rule-based" method. What "rule-based" means is very much a matter of judgement. However, I think that this is the phase this competition is in.

      2. Should we have another competition? If so, when? What should the timeline be? Will you compete? Who should run the competition?

      Yes, I think we should have another competition. I think that the submission deadline should be either in the summer or on November 30th (December 31st was a bad idea). It would be nice to have at least a rough idea of this competition by the end of the month (Jan 31st).


      Personally, I would like to compete. However, I am also very interested in seeing this competition go in the right direction, so I would like to participate in running it in some capacity. More people could join me in running it, though.


      3. Should the new competition be different than the present one? If so, how?

      As I said earlier, rule-based bots are winning. There is no significant automated tuning. To some degree, that indicates that there is research to be done, and there would be a benefit to keeping the rules as is. However, there are several different alternatives.

      Choice 1: Number of rounds in a game.
      There were at least two people who felt more rounds would make a qualitatively different game, and I agree. However, I think it may be less interesting rather than more. Nonetheless, I don't want to be tyrannical about this (or, for that matter, anything), so 100 and 1000 rounds have both been mentioned as options. 100 rounds forces people to come up with a plan of cooperation before the game begins.

      Choice 2: Continuous submission versus one deadline.
      In theory, one could submit new bots (or updates to your current bot) over time. This could be done either in a run-up to the competition, or as the competition itself. On the other hand, it brings up a fairly complex "meta-game" that may distract from the game at hand.

      Alternatively, one could imagine the bots themselves getting feedback. There could be virtual "days", and at the end of every day, the bots can change their strategies based upon the observed behavior of the last day.

      Choice 3: Cheap talk
      It is conceivable that some amount of cheap talk could be allowed. There could be a very primitive semantics, such as requiring statements to be joint actions or sets of joint actions. Again, I think that people are managing coordination without cheap talk. However, what if, instead of trying to sandwich, RL3 had stated its sandwich plan as cheap talk? Would that drive more sophisticated cooperation?


      Choice 4: A fixed prespecified utility versus a randomly generated utility.

      First of all, let me say that I am pretty happy with the utility as is. The game seems to have a variety of unsolved problems, especially in terms of trying to figure out what other people are doing.

      However, the best strategies are rule-based. A randomly generated game would make that MUCH harder to do. My thought was to generate a random simultaneous-action repeated game with three players and twelve actions that was:
      1. Symmetric (between players)
      2. Constant-sum

      These properties capture the basic properties of the Lemonade Game. What I am afraid of is that such a problem is still too hard. Also, a big question is prep time: how much time do you give each agent to learn the game before starting to play? Five minutes? An hour? A day? In poker, our bots were trained in self-play for weeks. Finally, this really begins to become an engineering problem, for better or worse.
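      For concreteness, here is one way such a game could be generated (a sketch of my own, not a settled design): draw a random payoff table, average over the two opponents' ordering to make payoffs symmetric between players, then shift the payoffs at each joint action so they sum to a constant. I use 24 as the constant here, matching the Lemonade game's per-round total; that choice is arbitrary.

```python
import random

def random_symmetric_constant_sum_game(actions=12, total=24.0, seed=0):
    """Generate u(my_action, opp_a, opp_b) for a 3-player game.

    Symmetric between players: every player uses the same payoff
    function, and it ignores which opponent is which.
    Constant-sum: the three payoffs at any joint action sum to `total`.
    """
    rng = random.Random(seed)
    raw = {(i, j, k): rng.uniform(0, 10)
           for i in range(actions)
           for j in range(actions)
           for k in range(actions)}
    # Symmetrize over opponent order so u(i, j, k) == u(i, k, j).
    g = {(i, j, k): (raw[i, j, k] + raw[i, k, j]) / 2 for (i, j, k) in raw}
    # Shift payoffs at each joint action so the three sum to `total`.
    u = {}
    for (i, j, k) in g:
        s = g[i, j, k] + g[j, k, i] + g[k, i, j]
        u[i, j, k] = g[i, j, k] - s / 3 + total / 3
    return u

u = random_symmetric_constant_sum_game()
# The three players' payoffs at any joint action sum to 24
# (up to floating-point rounding):
print(u[0, 1, 2] + u[1, 2, 0] + u[2, 0, 1])
```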

      Note that Choice 3 and Choice 4 are related. Cheap talk could make random games more tractable.

      Choice 5: Continuum of actions versus discrete (and number?).
      This is a choice that I thought about. I think that discrete actions just make things easier coding-wise: a continuum can be approximated with, say, 1 million actions. Anyway, I think that this is a secondary issue, and it should be decided after the other major decisions are resolved.

      This set of choices is not exhaustive: it merely represents my thoughts and a selection of the thoughts of others I have heard. In conclusion, I am not convinced that we have seen the best, most sophisticated agents for the competition in its present form. But, as in mathematics, we may have to make the game more general in order to see simpler, more elegant solutions.
    • Archie Chapman
      Message 2 of 2 , Jan 27, 2010
        Hi folks,
        A couple of weeks ago, Marty posted three questions for discussion regarding the future of the lemonade game, and here I will respond to two of them (these are my own opinions, not a Soton team consensus):
         
        2. Should we have another competition? If so, when? What should the timeline be? Will you compete? Who should run the competition?
         
        Yes, we should definitely run another competition.  
         
        When doesn't matter so much, as long as it doesn't have a deadline of New Year's Eve ;)  Beyond that, I don't think the entrants would need more than three months' notice (would anyone honestly start work on their refined bots before then?), so, if we could agree on a date soon, any time after the beginning of May is suitable.  Also, aligning the competition with a conference or workshop is a good idea.  I know that EC has been suggested -- this seems as reasonable a time as any, and a good venue to boot.
         
        We (Soton) will, of course, enter the competition again.  As for who runs it, Marty has expressed his desire to enter, and obviously it would be difficult for him to run the competition at the same time.  Other competitions, such as TAC, ART and the various forms of RoboCup, are administered by committee, which spreads the burden and removes some of the moral hazard involved in designing and entering the same competition (not that I'm doubting your moral fortitude, Marty!).  Clearly those competitions are much more developed than the Lemonade game tournament is now, but if you agree with me that the Lemonade game tournament has the potential to grow, then I hope you also agree that this would be a good way to make it happen.  I would be more than happy to get involved in running the tournament.
         
         
        3. Should the new competition be different than the present one? If so, how?
        I'll respond to Marty's suggestions:
         
        Choice 1: Number of rounds in a game.
        It seems that quite a few entrants were interested in a longer version of the game.  Personally, I like the shorter time frame: it forces the agents' learning to be quick (and dirty) and matches well with interaction in many domains -- not to mention that it also encourages entrants to think about how real people manage the task in such short periods of time.  We all have a good understanding of asymptotic learning, but clever heuristics for short time frames seem to be less well investigated.  On the other hand, the unique problem of identifying collaborators presented in the lemonade game is a new problem for both long and short horizons, so why not have a competition for each?  Alternatively, the duration of the game could be randomly drawn at the start of each episode, or each round could have some small probability of being the game's last.
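        The last variant is simple to run: with a per-round stopping probability p, the game length is geometric with mean 1/p, and no bot knows which round is the last.  A quick sketch (p = 0.01, targeting roughly 100 rounds, is an illustrative value of mine):

```python
import random

def play_match(bots, stop_prob=0.01, max_rounds=10**6, seed=None):
    """Run rounds until a coin with probability `stop_prob` ends the game.

    The expected number of rounds is 1 / stop_prob (about 100 here),
    but the final round is never known in advance.
    """
    rng = random.Random(seed)
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # ... play one round among `bots` here ...
        if rng.random() < stop_prob:
            break
    return rounds

lengths = [play_match([], seed=s) for s in range(2000)]
print(sum(lengths) / len(lengths))  # close to 100 on average
```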
         
        Choice 2: Continuous submission versus one deadline.
        I think that the "meta-game" induced by continuous submission would distract from the fundamental problems of collaboration presented by the lemonade game.  In contrast, the RoboCup Rescue agents championship has continuous submission, but the entries do not compete against each other; rather, they compete in a common, but unknown, scenario.  In that case, continuous submission does make sense.  However, in our case, where the entries directly compete against each other in a game situation, I feel a single deadline is best.
         
        Choice 3: Cheap talk
        Cheap talk would be fun! But maybe we hold off on introducing it until we have a firm handle on what works without it?  Or could we have two versions of the comp, with or without communication? Could we introduce some noise in the communication channel?  Like the duration of the game, the noise level could be randomly set at the start of the game.  As for semantics, joint action sets seem suitable and sufficient. 
         
        Choice 4: A fixed prespecified utility versus a randomly generated utility.
        I also think the current utility structure is good.  But it could be elaborated on, by adding more agents, possibly on a bigger island, or incorporating payoff asymmetries, say for different locations, which may not be known at the outset of the game -- these types of things present quite a challenge.  
         
        Choice 5: Continuum of actions versus discrete (and number?).
        A continuous action version of the Lemonade game is not a suggestion I would like to see adopted.  I feel that it would just add another level of computational difficulty, rather than make the problem any more interesting.  
         
        On other questions of tournament structure, i.e. round-robin vs knockout, cumulative payoffs vs # wins, etc, I have no strong opinion.  
         
        Marty says, 
        "I am not convinced that we have seen the best, most sophisticated agents for the competition in its present form.  But, as in mathematics, we may have to make the game more general in order to see simpler, more elegant solutions".  
        I know our bot is way too finely crafted to handle any of the potential new features discussed above, and I suspect some others' are too, so I can only whole-heartedly agree with Marty's sentiment.  However, at the same time, I don't think we should rush in and make the problem as general as possible straight away.  An incremental approach to expanding the game will give us time to learn, and, hopefully, lead to less fumbling in the dark for a solution when we do face some seriously complicated versions of the game.  We humans need time to learn, just like our bots!
         
        Personally, I would like to see only a few new features included in the next version, with most left for future competitions.  In particular, I would suggest that varying the game's duration is the most suitable extension for the next tournament, as it is the most straightforward and seems to be the extension with the greatest level of demand.  Such an extension will give us time to absorb the lessons of the first tournament.

        Archie
