Using standard deviation to determine expected winning percentage

  • monepeterson
    Message 1 of 6 , Feb 10, 2004
      Forgive me if this has already been done, and as usual, pardon my
      inarticulate method of explaining all things mathematical.

      Standard deviation is something I've been playing around with. I've
      been calculating "z-scores" (standard deviations above or below the
      mean -- the "standardize" function in Excel) of winning percentages
      and point differential for every BAA/NBA team just to see if I
      discovered anything interesting.

      I'm not sure I did, but one thing that occurred to me is that you
      could use the z-score for point differential to determine an expected
      winning percentage, rather than using an exponent of 13.5 (or whatever
      it is) which should be a mutable number depending on league conditions
      anyway. You could really do this for any level of basketball, or any
      sport for that matter without having to work out an exponent.

      You determine the point differential z-score for each team, multiply
      it by the standard deviation of winning percentage (.142 last year)
      and add it to the mean (.500) and you'll come up with an "Expected
      winning percentage" based on point differential. For example, here it
      is for the 2002-03 season. This will probably look like crap, but the
      columns are supposed to be: team, point differential per game, point
      differential z-score, expected win-loss record, and actual win-loss
      record (in paren).


      Team  Pt Dif.  Z-score  Expected W-L  (Actual W-L)
      ATL      -3.6    -0.85  31-51         (35-47)
      BOS      -0.4    -0.09  40-42         (44-38)
      CHI      -5.1    -1.22  27-55         (30-52)
      CLE      -9.6    -2.29  14-68         (17-65)
      DAL      +7.8    +1.85  62-20         (60-22)
      DEN      -8.3    -1.97  18-64         (17-65)
      DET      +3.7    +0.88  51-31         (50-32)
      GSW      -1.1    -0.27  38-44         (38-44)
      HOU      +1.5    +0.35  45-37         (43-39)
      IND      +3.5    +0.83  51-31         (48-34)
      LAC      -4.1    -0.98  30-52         (27-55)
      LAL      +2.3    +0.55  47-35         (50-32)
      MEM      -3.2    -0.77  32-50         (28-54)
      MIA      -5.0    -1.20  27-55         (25-57)
      MIL      +0.2    +0.06  42-40         (42-40)
      MIN      +2.1    +0.49  47-35         (51-31)
      NJN      +5.2    +1.24  55-27         (49-33)
      NOH      +2.1    +0.50  47-35         (47-35)
      NYK      -1.4    -0.32  37-45         (37-45)
      ORL      +0.1    +0.03  41-41         (42-40)
      PHI      +2.3    +0.55  47-35         (48-34)
      PHO      +1.1    +0.27  44-38         (44-38)
      POR      +2.6    +0.62  48-34         (50-32)
      SAC      +6.5    +1.55  59-23         (59-23)
      SAN      +5.4    +1.29  56-26         (60-22)
      SEA      -0.1    -0.03  41-41         (40-42)
      TOR      -5.9    -1.40  25-57         (24-58)
      UTA      +2.4    +0.57  48-34         (47-35)
      WSW      -1.0    -0.24  38-44         (37-45)
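
The calculation described above can be sketched in Python roughly as follows. This is only an illustration of the steps in the post, not the spreadsheet that produced the table. The function name `expected_records` and its parameters are made up for the example, and the use of the population standard deviation is an assumption (the post does not say which standard deviation was fed to Excel's "standardize" function).

```python
from statistics import mean, pstdev


def expected_records(point_diffs, win_pct_sd=0.142, games=82):
    """Return {team: (z_score, expected_wins, expected_losses)}.

    point_diffs maps team -> point differential per game.
    win_pct_sd is the league standard deviation of winning percentage
    (.142 is the 2002-03 figure quoted in the post).
    """
    league_mean = mean(point_diffs.values())
    # Assumption: population standard deviation; the post does not specify.
    league_sd = pstdev(point_diffs.values())
    records = {}
    for team, diff in point_diffs.items():
        z = (diff - league_mean) / league_sd       # point differential z-score
        expected_pct = 0.500 + z * win_pct_sd      # scale to winning percentage
        wins = round(expected_pct * games)
        records[team] = (round(z, 2), wins, games - wins)
    return records
```

Calling it with a dictionary of all 29 point differentials should give records close to the table, though small differences are possible because the posted figures are rounded. Nothing in the method keeps the result between .000 and 1.000, so an extreme enough z-score would produce an impossible record.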


      I would be curious to find out how this matches up with the 13.5
      exponent. So what do you think? Is there some bias that I'm missing
      that invalidates this method? Too much work for too little payoff?
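
For reference, the exponent method mentioned here is usually written as points scored raised to a power, divided by the sum of points scored and points allowed each raised to that power. A minimal sketch, assuming that standard form; the function name and the default of 13.5 are taken from the discussion, not from any confirmed implementation:

```python
def pythagorean_win_pct(points_for, points_against, exponent=13.5):
    """Expected winning percentage from the points ratio carried to a power."""
    if points_for <= 0 or points_against <= 0:
        raise ValueError("point totals must be positive")
    # Work with the ratio so large exponents do not produce huge numbers.
    ratio = (points_for / points_against) ** exponent
    return ratio / (ratio + 1.0)
```

Comparing the two methods for a season would need each team's points scored and allowed, which the table above does not include.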

      Moné
    • Michael Tamada
      Message 2 of 6 , Feb 10, 2004
        It's a fine idea -- not sure that it will work out to be superior
        but it might. The basic issue is that although point differential
        has nice predictive value, there are other, less linear, functional
        forms that are possible and which might provide better fit.

        Such as the points ratio, carried to some power. Or to look
        at z-scores instead of raw points.

        This is somewhat following along the lines of what DeanO does with
        his "basketball's bell curve stuff", although instead of looking at
        z-scores per se he looks at points scored means and standard deviations,
        points allowed means and standard deviations, and perhaps most crucially,
        the covariance between them.

        That use of covariance means that he's taking into account a parameter
        that the other measures do not, and should in general get an extra degree
        of accuracy. But of course at the expense of having another parameter
        to have to look up and measure.

        Z-scores of course are based on means and standard deviations, so my guess
        is that this approach will be similar to DeanO's, except it looks at
        point differential instead of off pts and def pts separately, and it doesn't
        use covariance. As such it'll probably be less accurate than DeanO's measure,
        but easier to calculate and find the data for.
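
One plausible reading of the bell-curve approach described above is to treat a team's per-game margin as normally distributed and take the probability that the margin is positive. The sketch below follows that reading. It is an assumption for illustration, not a formula confirmed anywhere in this thread, and the function and parameter names are invented.

```python
from math import sqrt
from statistics import NormalDist


def bell_curve_win_pct(mean_pts, mean_dpts, var_pts, var_dpts, cov):
    """Probability that points scored exceed points allowed in a game.

    mean_pts, var_pts:   mean and variance of points scored per game
    mean_dpts, var_dpts: mean and variance of points allowed per game
    cov:                 covariance between points scored and allowed
    """
    # Variance of the margin (pts - dpts) includes the covariance term.
    margin_variance = var_pts + var_dpts - 2.0 * cov
    if margin_variance <= 0:
        raise ValueError("margin variance must be positive")
    z = (mean_pts - mean_dpts) / sqrt(margin_variance)
    return NormalDist().cdf(z)
```

The covariance is the extra parameter: for the same averages, a larger positive covariance shrinks the spread of the margin and pushes the estimate further from .500.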

        How it'll compare to point differential or point ratios carried to a power,
        I don't know but it might have better accuracy, which would be a nice thing.


        In a separate post I excoriated Bill James for coining a new term for an
        old concept, the Plexiglas principle. But his Pythagorean formula is something
        which I have not seen done earlier, and it was a very nice innovation. The
        points-ratios-carried-to-a-power formulas are simply variations on his
        Pythagorean formula, so I give my props to him here. I believe that the
        Pythagorean formula with suitable exponents outperforms a simple point
        differential formula; it'll be interesting to see how the z-score approach does.


        --MKT

      • monepeterson
        Message 3 of 6 , Feb 10, 2004
          --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>
          wrote:
          > It's a fine idea -- not sure that it will work out to be superior
          > but it might. The basic issue is that although point differential
          > has nice predictive value, there are other, less linear, functional
          > forms that are possible and which might provide better fit.
          >
          > Such as the points ratio, carried to some power. Or to look
          > at z-scores instead of raw points.

          Out of curiosity, why would point ratios be more accurate than point
          differential? I'd think something like that would be severely impacted
          by game pace. Winning 100-90 isn't equivalent to winning 90-80. What
          justifies that?

          > This is somewhat following along the lines of what DeanO does with
          > his "basketball's bell curve stuff", although instead of looking at
          > z-scores per se he looks at points scored means and standard
          > deviations, points allowed means and standard deviations, and
          > perhaps most crucially, the covariance between them.

          Ah, interesting. Time to dig out the book again. Dean, do you use
          points per 100 possessions or raw points for the means and SDs (anyone
          who knows may answer)? I like the idea of using pts/possession for
          this sort of thing, because I'm thinking it would give you a better
          idea of how much of a team's success or failure to attribute to either
          end of the court.

          Now I just have to figure out how to covariance.
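
For anyone else wondering the same thing, covariance is the average product of each pair's deviations from the two means. A minimal sketch in plain Python, using the population form (dividing by the number of games); the function name is just for illustration:

```python
def covariance(xs, ys):
    """Population covariance between two equal-length lists of numbers,
    e.g. points scored and points allowed in each game."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Average product of the paired deviations from each mean.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
```

A positive result means high-scoring games for the team also tend to be high-scoring games for its opponents, which is what pace would be expected to produce with raw points.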

          Moné
        • Michael Tamada
          Message 4 of 6 , Feb 10, 2004
            -----Original Message-----
            From: monepeterson [mailto:mone@...]
            Sent: Tuesday, February 10, 2004 8:10 PM


            --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>
            wrote:

            [...]

            >> Such as the points ratio, carried to some power. Or to look
            >> at z-scores instead of raw points.
            >
            >Out of curiosity, why would point ratios be more accurate than point
            >differential? I'd think something like that would be severely impacted
            >by game pace. Winning 100-90 isn't equivalent to winning 90-80. What
            >justifies that?

            Oh it's not a guarantee that it's more accurate, just as there's no
            guarantee that the z-score approach will be more, or less, accurate
            than the alternatives.

            I'd imagine it's a question of which games are the most important
            or influential in determining a team's points scored and allowed
            stats, compared to their won-loss record. E.g. a 75-60 win
            is probably more similar to a 120-96 win than it is to a 120-105 win.
            If so, then ratios are better than point differentials. On the
            other hand, a 72-70 win is probably more similar to a 107-105 win
            than to a 108-105 win. In which case, point differentials are
            better than ratios.
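
The scores in these examples were evidently picked so that one pair shares a ratio and the other pair shares a margin. A short sketch that just does the arithmetic (the helper names are made up for the example):

```python
def points_ratio(winner_pts, loser_pts):
    """Winner's points divided by loser's points."""
    return winner_pts / loser_pts


def points_margin(winner_pts, loser_pts):
    """Winner's points minus loser's points."""
    return winner_pts - loser_pts


# The six scores used in the examples above.
games = [(75, 60), (120, 96), (120, 105), (72, 70), (107, 105), (108, 105)]
for winner_pts, loser_pts in games:
    print(f"{winner_pts}-{loser_pts}: "
          f"ratio {points_ratio(winner_pts, loser_pts):.4f}, "
          f"margin {points_margin(winner_pts, loser_pts)}")
```

75-60 and 120-96 have the same ratio (1.25) but different margins, while 75-60 and 120-105 have the same margin (15) but different ratios. Likewise 72-70 and 107-105 share a margin of 2, while 72-70 and 108-105 share a ratio of 36/35.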

            Where would z-scores fit? I don't know, maybe it'd be a happy
            medium, with better accuracy than either technique. Or maybe
            not. It'd be an interesting comparison (point differential vs
            Pythagorean vs z-score). I seem to recall that DeanO did some
            comparisons already, but I forget the results.


            --MKT
          • Michael Tamada
            Message 5 of 6 , Feb 10, 2004
              I should clarify that by "similar" I mean "similar in terms of
              telling us about the likely true strength differential between
              the two teams".

              --MKT


              -----Original Message-----
              From: Michael Tamada
              Sent: Tuesday, February 10, 2004 8:26 PM


              --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>
              wrote:

              [...]

              > E.g. a 75-60 win
              > is probably more similar to a 120-96 win than it is to a 120-105 win.
              > If so, then ratios are better than point differentials. On the
              > other hand, a 72-70 win is probably more similar to a 107-105 win
              > than to a 108-105 win. In which case, point differentials are
              > better than ratios.
            • Dean Oliver
              Message 6 of 6 , Feb 12, 2004
                --- In APBR_analysis@yahoogroups.com, "monepeterson" <mone@s...> wrote:
                > --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>

                > > This is somewhat following along the lines of what DeanO does with
                > > his "basketball's bell curve stuff", although instead of looking at
                > > z-scores per se he looks at points scored means and standard
                > > deviations, points allowed means and standard deviations, and
                > > perhaps most crucially, the covariance between them.
                >
                > Ah, interesting. Time to dig out the book again. Dean, do you use
                > points per 100 possessions or raw points for the means and SDs (anyone
                > who knows may answer)?

                Both work equivalently. These days, I prefer using the offensive and
                defensive ratings because pace adds a significant correlation to pts
                and dpts. That then hides whether turnovers lead to points or whether
                offensive rebounds hurt a defense (both of which are rather important
                to know for teams).

                DeanO
                www.basketballonpaper.com
                "Dean Oliver looks at basketball with a fresh perspective. If you
                want a new way to analyze the game, this book is for you. You'll
                never watch a game the same way again. We use his stuff and it helps
                us." Yvan Kelly, Scout, Seattle SuperSonics