Using standard deviation to determine expected winning percentage

Message 1 of 6 , Feb 10, 2004
Forgive me if this has already been done, and as usual, pardon my
inarticulate method of explaining all things mathematical.

Standard deviation is something I've been playing around with. I've
been calculating "z-scores" (standard deviations above or below the
mean -- the "standardize" function in Excel) of winning percentages
and point differential for every BAA/NBA team just to see if I
discovered anything interesting.

I'm not sure I did, but one thing that occurred to me is that you
could use the z-score for point differential to determine an expected
winning percentage, rather than using an exponent of 13.5 (or whatever
it is) which should be a mutable number depending on league conditions
anyway. You could really do this for any level of basketball, or any
sport for that matter without having to work out an exponent.

You determine the point differential z-score for each team, multiply
it by the standard deviation of winning percentage (.142 last year)
and add it to the mean (.500) and you'll come up with an "Expected
winning percentage" based on point differential. For example, here it
is for the 2002-03 season. This will probably look like crap, but the
columns are supposed to be: team, point differential per game, point
differential z-score, expected win-loss record, and actual win-loss
record (in paren).

Team  Pt Dif.  Z-score  Expected W-L  (Actual W-L)
ATL      -3.6    -0.85         31-51  (35-47)
BOS      -0.4    -0.09         40-42  (44-38)
CHI      -5.1    -1.22         27-55  (30-52)
CLE      -9.6    -2.29         14-68  (17-65)
DAL      +7.8    +1.85         62-20  (60-22)
DEN      -8.3    -1.97         18-64  (17-65)
DET      +3.7    +0.88         51-31  (50-32)
GSW      -1.1    -0.27         38-44  (38-44)
HOU      +1.5    +0.35         45-37  (43-39)
IND      +3.5    +0.83         51-31  (48-34)
LAC      -4.1    -0.98         30-52  (27-55)
LAL      +2.3    +0.55         47-35  (50-32)
MEM      -3.2    -0.77         32-50  (28-54)
MIA      -5.0    -1.20         27-55  (25-57)
MIL      +0.2    +0.06         42-40  (42-40)
MIN      +2.1    +0.49         47-35  (51-31)
NJN      +5.2    +1.24         55-27  (49-33)
NOH      +2.1    +0.50         47-35  (47-35)
NYK      -1.4    -0.32         37-45  (37-45)
ORL      +0.1    +0.03         41-41  (42-40)
PHI      +2.3    +0.55         47-35  (48-34)
PHO      +1.1    +0.27         44-38  (44-38)
POR      +2.6    +0.62         48-34  (50-32)
SAC      +6.5    +1.55         59-23  (59-23)
SAN      +5.4    +1.29         56-26  (60-22)
SEA      -0.1    -0.03         41-41  (40-42)
TOR      -5.9    -1.40         25-57  (24-58)
UTA      +2.4    +0.57         48-34  (47-35)
WSW      -1.0    -0.24         38-44  (37-45)
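
For anyone who would rather script this than use Excel, the steps described above can be sketched in Python. This is only a rough sketch: the function names are made up for illustration, the population standard deviation is an assumption about how the z-scores were computed, and the .142 and .500 defaults are simply the figures quoted in the post.

```python
from statistics import mean, pstdev

def expected_win_pct(point_diffs, win_pct_sd=0.142, win_pct_mean=0.500):
    """Map each team's point differential to an expected winning percentage.

    point_diffs: dict of team -> point differential per game.
    win_pct_sd / win_pct_mean: league-wide SD and mean of winning percentage
    (defaults are the 2002-03 figures quoted in the post).
    """
    mu = mean(point_diffs.values())
    # Assumption: population SD; swap in statistics.stdev for the sample SD.
    sigma = pstdev(point_diffs.values())
    return {
        team: win_pct_mean + win_pct_sd * (diff - mu) / sigma
        for team, diff in point_diffs.items()
    }

def expected_wins(win_pct, games=82):
    """Convert a winning percentage into a whole number of wins."""
    return round(win_pct * games)

# Toy three-team league (hypothetical numbers, not from the table above).
toy = expected_win_pct({"A": 5.0, "B": 0.0, "C": -5.0})
for team, pct in toy.items():
    print(team, round(pct, 3), expected_wins(pct))
```

Feeding in the 29 differentials from the table should give records close to the ones listed, though small differences are likely because the table shows rounded inputs.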

I would be curious to find out how this matches up with the 13.5
exponent. So what do you think? Is there some bias that I'm missing
that invalidates this method? Too much work for too little payoff?

Moné
Message 2 of 6 , Feb 10, 2004
It's a fine idea -- not sure that it will work out to be superior
but it might. The basic issue is that although point differential
has nice predictive value, there are other, less linear, functional
forms that are possible and which might provide better fit.

Such as the points ratio, carried to some power. Or to look
at z-scores instead of raw points.

This is somewhat following along the lines of what DeanO does with
his "basketball's bell curve stuff", although instead of looking at
z-scores per se he looks at points scored means and standard deviations,
points allowed means and standard deviations, and perhaps most crucially,
the covariance between them.

That use of covariance means that he's taking into account a parameter
that the other measures do not, and should in general get an extra degree
of accuracy. But of course at the expense of having another parameter
to have to look up and measure.

Z-scores of course are based on means and standard deviations, so my guess
is that this approach will be similar to DeanO's, except it looks at
point differential instead of off pts and def pts separately, and it doesn't
use covariance. As such it'll probably be less accurate than DeanO's measure,
but easier to calculate and find the data for.
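
To make the role of the covariance concrete, here is a rough Python sketch of one way those five quantities could be combined. It assumes a team's per-game margin is roughly normally distributed and may well differ from DeanO's exact formulation; the function name and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def bell_curve_win_pct(mean_for, mean_against, sd_for, sd_against, cov):
    """Probability that points scored exceed points allowed in a game.

    Assumes the margin (scored minus allowed) is approximately normal, with
    Var(margin) = Var(scored) + Var(allowed) - 2 * Cov(scored, allowed).
    """
    margin_mean = mean_for - mean_against
    margin_var = sd_for ** 2 + sd_against ** 2 - 2 * cov
    if margin_var <= 0:
        raise ValueError("margin variance must be positive")
    return NormalDist().cdf(margin_mean / sqrt(margin_var))

# Hypothetical team: scores 103, allows 100, SD of 10 on each side.
print(bell_curve_win_pct(103, 100, 10, 10, cov=0))
print(bell_curve_win_pct(103, 100, 10, 10, cov=20))
```

With a positive margin, a larger covariance shrinks the variance of the margin and so raises the estimate, which is the extra information the simpler measures leave out.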

How it'll compare to point differential or point ratios carried to a power,
I don't know but it might have better accuracy, which would be a nice thing.

In a separate post I excoriated Bill James for coining a new term for an
old concept, the Plexiglas principle. But his Pythagorean formula is something
which I have not seen done earlier, and it was a very nice innovation. The
points-ratios-carried-to-a-power formulas are simply variations on his
Pythagorean formula, so I give my props to him here. I believe that the
Pythagorean formula with suitable exponents outperforms a simple point
differential formula; it'll be interesting to see how the z-score approach does.
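
For reference, a minimal Python sketch of the points-ratio-to-a-power form. The function name is made up, and the 13.5 default is just the figure floated earlier in this thread, not a fitted value.

```python
def pythagorean_win_pct(points_for, points_against, exponent=13.5):
    """Expected winning percentage from points scored and allowed.

    Equivalent to PF**x / (PF**x + PA**x), written in terms of the ratio
    PA / PF so the intermediate numbers stay small.
    """
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)

# Hypothetical team scoring 100 and allowing 95 per game.
print(pythagorean_win_pct(100, 95))
```

Because only the ratio matters, per-game averages and season totals give the same answer.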

--MKT

Message 3 of 6 , Feb 10, 2004
wrote:
> It's a fine idea -- not sure that it will work out to be superior
> but it might. The basic issue is that although point differential
> has nice predictive value, there are other, less linear, functional
> forms that are possible and which might provide better fit.
>
> Such as the points ratio, carried to some power. Or to look
> at z-scores instead of raw points.

Out of curiosity, why would point ratios be more accurate than point
differential? I'd think something like that would be severely impacted
by game pace. Winning 100-90 isn't equivalent to winning 90-80. What
justifies that?

> This is somewhat following along the lines of what DeanO does with
> his "basketball's bell curve stuff", although instead of looking at
> z-scores per se he looks at points scored means and standard
> deviations, points allowed means and standard deviations, and
> perhaps most crucially, the covariance between them.

Ah, interesting. Time to dig out the book again. Dean, do you use
points per 100 possessions or raw points for the means and SDs (anyone
who knows may answer)? I like the idea of using pts/possession for
this sort of thing, because I'm thinking it would give you a better
idea of how much of a team's success or failure to attribute to either
end of the court.

Now I just have to figure out how to calculate covariance.
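
For what it's worth, covariance can be computed from a team's game-by-game points scored and allowed. A minimal Python sketch, using the sample (n - 1) version and a hypothetical four-game log:

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical game log: points scored and points allowed in four games.
scored = [100, 95, 110, 88]
allowed = [98, 90, 105, 92]
print(covariance(scored, allowed))
```

A positive value means the team tends to allow more points in the games where it scores more, which is what you would expect if pace drives both numbers.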

Moné
Message 4 of 6 , Feb 10, 2004
-----Original Message-----
From: monepeterson [mailto:mone@...]
Sent: Tuesday, February 10, 2004 8:10 PM

wrote:

[...]

>> Such as the points ratio, carried to some power. Or to look
>> at z-scores instead of raw points.
>
>Out of curiosity, why would point ratios be more accurate than point
>differential? I'd think something like that would be severely impacted
>by game pace. Winning 100-90 isn't equivalent to winning 90-80. What
>justifies that?

Oh it's not a guarantee that it's more accurate, just as there's no
guarantee that the z-score approach will be more, or less, accurate
than the alternatives.

I'd imagine it's a question of which games are the most important
or influential in determining a team's points scored and allowed
stats, compared to their won-loss record. E.g. a 75-60 win
is probably more similar to a 120-96 win than it is to a 120-105 win.
If so, then ratios are better than point differentials. On the
other hand, a 72-70 win is probably more similar to a 107-105 win
than to a 108-105 win. In which case, point differentials are
better than ratios.
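
The arithmetic behind those two comparisons can be checked with a few lines of Python (the helper name is made up for illustration):

```python
def margin_and_ratio(winner_pts, loser_pts):
    """Return (point differential, points ratio) for a single game score."""
    return winner_pts - loser_pts, winner_pts / loser_pts

# The game scores mentioned above.
for score in [(75, 60), (120, 96), (120, 105), (72, 70), (107, 105), (108, 105)]:
    margin, ratio = margin_and_ratio(*score)
    print(f"{score[0]}-{score[1]}: margin {margin}, ratio {ratio:.4f}")
```

75-60 and 120-96 share a ratio of 1.25 but differ in margin (15 vs 24), while 75-60 and 120-105 share a margin of 15 but differ in ratio. Likewise 72-70 and 108-105 share a ratio, and 72-70 and 107-105 share a margin.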

Where would z-scores fit? I don't know, maybe it'd be a happy
medium, with better accuracy than either technique. Or maybe
not. It'd be an interesting comparison (point differential vs
Pythagorean vs z-score). I seem to recall that DeanO did some
comparisons already, but I forget the results.

--MKT
Message 5 of 6 , Feb 10, 2004
I should clarify that by "similar" I mean "similar in terms of
telling us about the likely true strength differential between
the two teams".

--MKT

-----Original Message-----
Sent: Tuesday, February 10, 2004 8:26 PM

wrote:

[...]

> E.g. a 75-60 win
> is probably more similar to a 120-96 win than it is to a 120-105 win.
> If so, then ratios are better than point differentials. On the
> other hand, a 72-70 win is probably more similar to a 107-105 win
> than to a 108-105 win. In which case, point differentials are
> better than ratios.
Message 6 of 6 , Feb 12, 2004
--- In APBR_analysis@yahoogroups.com, "monepeterson" <mone@s...> wrote:

> > This is somewhat following along the lines of what DeanO does with
> > his "basketball's bell curve stuff", although instead of looking at
> > z-scores per se he looks at points scored means and standard
> > deviations, points allowed means and standard deviations, and
> > perhaps most crucially, the covariance between them.
>
> Ah, interesting. Time to dig out the book again. Dean, do you use
> points per 100 possessions or raw points for the means and SDs (anyone

Both work equivalently. These days, I prefer using the offensive and
defensive ratings because pace adds a significant correlation to pts
and dpts. That then hides whether turnovers lead to points or whether
offensive rebounds hurt a defense (both of which are rather important to
know for teams).
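
As a rough illustration of why ratings take pace out of the picture, here is a minimal Python sketch. It assumes the possession count is already known (how to estimate possessions is not covered in this thread), and the function name and numbers are hypothetical.

```python
def per_100_possessions(points, possessions):
    """Points per 100 possessions, given a known possession count."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions

# Two hypothetical teams with identical efficiency but different pace.
fast = (per_100_possessions(110, 100), per_100_possessions(100, 100))
slow = (per_100_possessions(99, 90), per_100_possessions(90, 90))
print("fast team off/def rating:", fast)
print("slow team off/def rating:", slow)
```

The fast team wins 110-100 and the slow team 99-90, yet both come out with about the same offensive and defensive ratings, so the pace-driven correlation between raw pts and dpts drops out.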

DeanO