
Evolving "Picking Soccer Match Winners" Strategies

  • Mattias Fagerlund
    Message 1 of 6 , Jan 11, 2005
      Hi,

      I started this as a part of the stock market email, but I decided to
      create a separate email;

      I recently finished a run of experiments on predicting soccer games, and
      it performed "admirably", picking the winner in a fair percentage of the
      matches. Home/draw/away results usually have a spread of about
      50%/25%/25%. My system was able to pick games where the away team wins
      over 60% of the time, which is a nice improvement over 25%. However:

      The downside is that when the strategy was able to pick a winner, the
      odds were usually very poor (1.5 times the money or less on average),
      and in the end the system couldn't be profitable. If you're able to
      predict the result (home, draw, away) 67% of the time, the payback must
      be 1.5 to break even. And the more predictable games are, the lower the
      odds are on the winner - that's what odds are. At best, a strategy
      would be able to find errors in the oddsmakers' strategies, which is
      fairly hard.
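
      In code, that break-even relation is just the reciprocal of the hit
      rate; a quick Python sketch (mine, not code from the thread):

```python
def break_even_odds(hit_rate):
    """Minimum decimal odds needed to break even at a given hit rate."""
    return 1.0 / hit_rate

break_even_odds(0.67)  # ≈ 1.49, matching the 1.5 figure above
```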

      Further, if you invest 100% of your capital on each predictable game, one
      loss will wipe out all your funds. So I played 20% of the (simulated)
      capital per game. While this limits the downside, it severely limits the
      upside.

      Sadly, it wasn't as good at picking winners out of bounds (out of
      sample) as it was in bounds, which kinda killed the usefulness even if
      it had been able to predict profitably.

      Has anyone else done any work in this vein?

      cheers,
      mattias
    • mike woodhouse
      Message 2 of 6 , Jan 11, 2005
        On Tue, 11 Jan 2005 11:08:50 +0100, Mattias Fagerlund
        <mattias@...> wrote:
        >
        > Hi,
        >
        > I started this as a part of the stock market email, but I decided to
        > create a separate email;
        >
        > I recently finished a run of experiments on predicting soccer games, and
        > it performed "admirably", picking the winner in a fair percentage of the
        > matches. home/draw/away usually has a spread of about 50%/25%/25%. My
        > system was able to pick games where the away team wins over 60% of the
        > time, which is a nice improvement over 25%. However;
        >
        > The downside is that when the strategy was able to pick a winner, it was
        > usually very poor odds (1.5 times the money or less on average) and in
        > the end, the system couldn't be profitable. If you're able to predict the
        > result (home, draw, away) 67% of the time, the payback must be 1.5 to
        > break even. And the more predictable games are, the lower the odds are on
        > the winner - that's what odds are. At best, a strategy would be able to
        > find errors in the odds makers strategies which is fairly hard.

        Ah - my other (and more recent) hobby-horse!

        > Further, if you invest 100% of your capital on each predictable game, one
        > loss will wipe out all your funds. So I played 20% of the (simulated)
        > capital per game. While this limits the downside, it severely limits the
        > upside.

        20% is a scary number. Bear in mind that you may have simultaneous
        bets running. What if 40 games kick off at 3pm on Saturday and your
        system wants to bet on 10 of them?

        If you can correctly estimate your "edge" then the Kelly Criterion can
        be applied to determine stake size. I rather like the idea of having
        the model tell me how much it thinks I should bet (or tell me its
        estimate of the edge and I'll work out how much to bet).

        Say you estimate the likelihood of an away win at 70% and the bookies
        have it at 60% (odds are 1.67). You have an edge of about 16% and
        Kelly would suggest you bet an amount that increases your bank by that
        amount: about 25% in this case. A more conservative measure might be
        to bet half that amount (a "Half-Kelly" staking plan).
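
        That staking rule sketches out like this in Python (the function and
        variable names are mine, not from any real betting library):

```python
def kelly_fraction(p, decimal_odds):
    """Kelly stake as a fraction of bankroll.
    p: your estimated win probability; decimal_odds: the bookmaker's price."""
    b = decimal_odds - 1.0              # net winnings per unit staked
    return (p * b - (1.0 - p)) / b

edge = 0.70 * 1.67 - 1.0                # ≈ 0.17, the ~16% edge above
stake = kelly_fraction(0.70, 1.67)      # ≈ 0.25 of the bank; Half-Kelly: stake / 2
```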

        But the bookies are very accurate in pricing Home-Draw-Away results:
        if your model is predicting a large mispricing, be careful: the model
        may be wrong.

        > Sadly, it wasn't as good picking winners out of bounds as it was in
        > bounds, which kinda killed the usefulness even if it had been able to
        > predict profitably.
        >
        > Has anyone else done any work in this vein?

        I did have some success with identifying matches where the likelihood
        of half-time/full-time draw-draw was greater than implied by the
        bookmakers' odds: it was profitable the first season I operated it
        (for small stakes while I tested it) and blew up spectacularly the
        following season. If the correct scores market were more competitively
        priced then I think it could be profitable: 1-1 draws would probably
        be the place to start as they're the most frequent result (in English
        games at least). There's a colossal overround in that market though,
        which is hard to overcome. There may be some advantage to using the
        exchanges over the fixed-odds bookmakers. I do know of at least one
        guy who has a Ward Systems Predictor-based system that just takes the
        last six scores for each team. He seems to get some promising results,
        although I don't think he's made his first million yet...
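
        The overround mentioned above is easy to compute from a set of
        decimal odds (the three prices below are made up for illustration):

```python
def overround(decimal_odds):
    """Bookmaker margin: the implied probabilities sum to more than 1."""
    return sum(1.0 / o for o in decimal_odds) - 1.0

margin = overround([1.9, 3.3, 4.0])  # hypothetical H/D/A prices -> ~0.08 (8%)
```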

        I was evolving the weights on a fixed-topology network working from
        heavily-preprocessed data. I also played with a version of GEP
        (interesting but fruitless). I think NEAT is massively attractive
        here: topology can vary and a larger range of input data can be
        presented (I'd be inclined to start with an un- or sparsely-connected
        network and let the program work out what's useful).

        As with the stock market stuff, this is a topic about which I can go
        on and on and on.... So I'd better stop for now. If you're brave
        enough to want to ask questions I am more than happy to bore you some
        more!

        > cheers,
        > mattias
      • Kenneth Stanley
        Message 3 of 6 , Jan 11, 2005
          --- In neat@yahoogroups.com, mike woodhouse <mikewoodhouse@g...>
          wrote:
          > (interesting but fruitless). I think NEAT is massively attractive
          > here: topology can vary and a larger range of input data can be
          > presented (I'd be inclined to start with an un- or sparsely-connected
          > network and let the program work out what's useful).
          >

          Mike, from recent work here, starting sparsely looks like a really
          good idea. We are working on a paper right now that reports great
          results from starting this way as opposed to starting fully-connected.
          An earlier workshop paper from last year on the topic is here:

          http://nn.cs.utexas.edu/keyword?whiteson:geccows04

          This year's paper simplifies the technique and shows more definitive
          results in a more significant domain. So I'd say your inclination is
          promising.

          ken
        • mike woodhouse
          Message 4 of 6 , Jan 12, 2005
            On Wed, 12 Jan 2005 00:01:33 -0000, Kenneth Stanley
            <kstanley@...> wrote:
            >
            >
            > --- In neat@yahoogroups.com, mike woodhouse <mikewoodhouse@g...>
            > > (interesting but fruitless). I think NEAT is massively attractive
            > > here: topology can vary and a larger range of input data can be
            > > presented (I'd be inclined to start with an un- or sparsely-connected
            > > network and let the program work out what's useful).
            > >
            >
            > Mike, from recent work here, starting sparsely looks like a really
            > good idea. We are working on a paper right now that reports great
            > results from starting this way as opposed to starting fully-connected.
            > An earlier workshop paper from last year on the topic is here:

            I'm delighted to read it :)

            I confess to being largely motivated by laziness - there are problems
            where I can conceive of a large number of possible inputs that may be
            "raw", normalised, or more expensively derived (e.g. decorrelated
            pairs, or other stuff I didn't properly understand). I didn't want to
            have to perform the up-front analysis to figure out what was most
            appropriate: I just don't have the time, what with the day job, kids,
            those kinds of things. But I do have computers and they do have time
            (less of it now the kids are old enough to use a mouse, but that's
            another story). So the sparse start appeals (presumably, in the limit
            we could start with no links, but I assume a sensible starting point
            is to link one input to one output and let NEAT take it from there).
            Over time, the model should be able to decide for itself what is
            useful and what's not. And of course, the early runs are going to be
            insanely fast!
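
            That minimal sparse start (one input wired to one output, the rest
            left for evolution to discover) might look like this; the genome
            representation is a stand-in of my own, not actual NEAT library code:

```python
import random

def sparse_start_genome(n_inputs, n_outputs):
    """A starting genome with a single randomly chosen input-output
    connection, instead of a fully-connected network.
    Each connection is (in_node, out_node, weight, enabled)."""
    i = random.randrange(n_inputs)
    o = n_inputs + random.randrange(n_outputs)  # output ids follow input ids
    return [(i, o, random.uniform(-1.0, 1.0), True)]

genome = sparse_start_genome(8, 3)  # one connection; evolution adds the rest
```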

            I lean towards being "constructively" lazy - I'll go to a lot of
            effort to save myself work... I seek to expend more energy up-front
            creating models that are able to do more of that tedious searching.
            If it takes them a week to come up with a solution, that's cool:
            I'll QA the results when I get them. In the meantime I can be doing
            something else.

            The thing that most excites me about NEAT is the additional potential
            for laziness: I no longer have to expend energy cooking up and
            tweaking network topologies. Having never ventured into recurrent
            networks before, I'm also now seeing some potential uses for "memory"
            as provided in NEAT.

            Now if I could just get the kids off the PC long enough to get some
            code written...


            > http://nn.cs.utexas.edu/keyword?whiteson:geccows04
            >
            > This year's paper simplifies the technique and shows more definitive
            > results in a more significant domain. So I'd say your inclination is
            > promising.
            >
            > ken
          • Mattias Fagerlund
            Message 5 of 6 , Jan 12, 2005
              Heya,

              > > find errors in the odds makers strategies which is fairly hard.
              >
              > Ah - my other (and more recent) hobby-horse!

              Really? Tell me you're also into evolving physically simulated critters
              and I'll start getting nervous...

              > > Further, if you invest 100% of your capital on each
              > > predictable game, one loss will wipe out all your funds. So I
              > > played 20% of the (simulated) capital per game. While this
              > > limits the downside, it severely limits the upside.
              >
              > 20% is a scary number. Bear in mind that you may have simultaneous
              > bets running. What if 40 games kick off at 3pm on Saturday and your
              > system wants to bet on 10 of them?

              My system wasn't confident all that often - it would suggest a
              gamble maybe one game per week. The rest of the time, it'd
              recommend staying out.

              > If you can correctly estimate your "edge" then the Kelly Criterion can
              > be applied to determine stake size. I rather like the idea of having
              > the model tell me how much it thinks I should bet (or tell me its
              > estimate of the edge and I'll work out how much to bet).
              >
              > Say you estimate the likelihood of an away win at 70% and the bookies
              > have it at 60% (odds are 1.67). You have an edge of about 16% and
              > Kelly would suggest you bet an amount that increases your bank by that
              > amount: about 25% in this case. A more conservative measure might be
              > to bet half that amount (a "Half-Kelly" staking plan).

              Ah, clever.

              > But the bookies are very accurate in pricing Home-Draw-Away results:
              > if your model is predicting a large mispricing, be careful: the model
              > may be wrong.

              It usually is - when out of bounds that is ;)

              > > Sadly, it wasn't as good picking winners out of bounds as it was in
              > > bounds, which kinda killed the usefulness even if it had been able to
              > > predict profitably.
              > >
              > > Has anyone else done any work in this vein?
              >
              > I did have some success with identifying matches where the likelihood
              > of half-time/full-time draw-draw was greater than implied by the
              > bookmakers' odds: it was profitable the first season I operated it
              > (for small stakes while I tested it) and blew up spectacularly the
              > following season. If the correct scores market were more competitively
              > priced then I think it could be profitable: 1-1 draws would probably
              > be the place to start as they're the most frequent result (in English
              > games at least).

              Right, I have no idea what the odds are on those. I looked up a
              few games on Ladbrokes; the odds seem to be around 5.5 to 6.5 for
              the games that I reviewed, indicating that about 17% of the games
              end 1-1. If you could pick games that end 1-1 more often than
              17%, you'd be in the money. I'll give it a shot and see what
              happens.

              > There's a colossal overround in that market though,
              > which is hard to overcome.

              Too bad :(

              > There may be some advantage to using the
              > exchanges over the fixed-odds bookmakers. I do know of at least one
              > guy who has a Ward Systems Predictor-based system that just takes the
              > last six scores for each team. He seems to get some promising results,
              > although I don't think he's made his first million yet...

              I'm not sure what you mean by "last six scores"; I tried the
              following indicators:

              (result = home, draw, or away)
              results in the last 3 games for home team,
              results in the last 3 games for away team,

              results in the last 3 HOME games for home team,
              results in the last 3 AWAY games for away team,

              results in the last 3 games for home vs away

              Next I tried goal differences, where I counted

              (home goals scored - home goals conceded) - (away goals scored -
              away goals conceded)

              and related indicators. One problem is that I don't have league positions
              in my dataset, so a goal conceded to the top team counts as much as a
              goal conceded to the bottom team. I could calculate my own "league score"
              from the last 10 games per team or some such, but I never got to that.
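
              Those indicators are simple to compute from a result history; a
              small sketch (the function names and the W/D/L encoding are my
              own, not from the actual dataset):

```python
def recent_results(history, n=3):
    """The last n results for a team, e.g. ['W', 'D', 'L']."""
    return history[-n:]

def goal_diff_indicator(home, away):
    """(home scored - home conceded) - (away scored - away conceded);
    home and away are (scored, conceded) tuples over some window of games."""
    return (home[0] - home[1]) - (away[0] - away[1])

recent_results(['W', 'W', 'L', 'D', 'W'])   # -> ['L', 'D', 'W']
goal_diff_indicator((10, 4), (6, 7))        # -> 7
```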

              > I was evolving the weights on a fixed-topology network working from
              > heavily-preprocessed data.

              I could send you my XML dataset for training, it contains a few thousand
              games with results, my indicators and the best odds that were available -
              if you're interested.

              > As with the stock market stuff, this is a topic about which I can go
              > on and on and on.... So I'd better stop for now. If you're brave
              > enough to want to ask questions I am more than happy to bore you some
              > more!

              Well, as long as you don't actually talk about football, I'm
              very interested. If Sweden made it to a combined Olympics and
              world games final, I _might_ watch parts of the game, but I'd
              surely not enjoy it much ;)

              cheers,
              mattias
            • mike woodhouse
              Message 6 of 6 , Jan 12, 2005
                > Really? Tell me you're also into evolving physically simulated critters
                > and I'll start getting nervous...

                Nah. I can sit and watch someone else's code evolving for a while, but
                that's as far as it goes. No money in it... :)

                > and related indicators. One problem is that I don't have league positions
                > in my dataset, so a goal conceded to the top team counts as much as a
                > goal conceded to the bottom team. I could calculate my own "league score"
                > from the last 10 games per team or some such, but I never got to that.

                I don't think league position is as good a measure as average
                points per game. I'd take it from the start of the season and
                also over a recent window (5? 6? games). A linear regression of
                season points-per-game against the bookie's odds (expressed as
                a probability) will give a surprisingly good fit on its own.
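
                Points-per-game and the odds-to-probability conversion are both
                one-liners (3/1/0 scoring assumed; the regression itself is
                left out):

```python
def points_per_game(results):
    """Average league points over a list of 'W'/'D'/'L' results (3/1/0 scoring)."""
    points = {'W': 3, 'D': 1, 'L': 0}
    return sum(points[r] for r in results) / len(results)

def implied_probability(decimal_odds):
    """A bookie's price expressed as a probability (overround ignored)."""
    return 1.0 / decimal_odds

points_per_game(['W', 'D', 'L', 'W'])  # -> 1.75
implied_probability(2.0)               # -> 0.5
```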

                I developed a couple of rating systems, loosely based on the chess Elo
                rating system, the idea being to take account of the strength of the
                opposition: a home draw is generally considered slightly negative, but
                if the team at the bottom of the league manages to draw with the top
                team then that probably ought to be considered a positive. Using the
                difference or ratio of the two teams' current ratings also gave a good
                fit to bookie odds and allowed the addition of rating derivatives:
                trend, for example, that I hoped might help the model to spot teams
                whose improvement was not factored into their price. I think I have
                some Delphi code that calculated inputs (to my GEP implementation)
                that may cast some light (if you promise not to laugh at my somewhat
                confused OO, not to mention my inexperience with Object Pascal...)
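
                An Elo-style update of the kind described, where a bottom side
                drawing with the top side gains rating (the k-factor, home
                advantage, and 400 scale are standard chess-Elo choices, not
                his actual parameters):

```python
def elo_update(r_home, r_away, outcome, k=20.0, home_adv=60.0):
    """Update two ratings after a match.
    outcome: 1.0 home win, 0.5 draw, 0.0 away win."""
    expected = 1.0 / (1.0 + 10 ** ((r_away - (r_home + home_adv)) / 400.0))
    delta = k * (outcome - expected)
    return r_home + delta, r_away - delta

# A 1300-rated home side drawing with a 1700-rated visitor gains rating:
new_home, new_away = elo_update(1300, 1700, 0.5)
```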

                I don't have them to hand at the moment, but I'll dig out my
                notes on my latest (100% virtual) ideas, inspired by the
                combination of past failures (!) and the innovations available
                in NEAT.

                > Well, as long as you don't actually talk about football, I'm very
                > interested. If sweden made it to a combined Olympics and world games
                > final, I _might_ watch parts of the game, but I'd surely not enjoy it much ;)

                While I do actually attend matches, my conversation about the game
                itself is mostly limited to my own club, which makes it a minority
                interest!

                Mike