## My version of WINVAL

Message 1 of 30 , Feb 24, 2004
Previously, I have started a discussion with DeanO, Kevin, and
Roland about WINVAL, but I would like to take the opportunity to
open this conversation up to the rest of you.

Roland Beech of 82games has been kind enough to provide me with data
on the outcome and players playing in every second of every regular
season game in the 2002-03 season. With these data one can compute
the plus/minus statistics that Roland reports on his web-site. In
addition, one can compute adjusted plus/minus statistics that
account for the quality of the players on the floor for both teams.
This is analogous to the WINVAL system that Jeff
Sagarin and Wayne Winston have created for the Dallas Mavericks at a
cost of more than \$100,000 per year.

Here is the set-up. Every observation is a unit of time in a game
where no substitutions are made. There are more than 30,000 such
observations in 2002-03. With these data I run the following
regression.

(1.1) MARGIN = B0 + B1*X1 + B2*X2 + ... + BK*XK + U, where

MARGIN = home team points - away team points

(MARGIN is measured during the period of time over which there is no
substitution made. This MARGIN is normalized so that it measures
the point difference per 40 minutes.)

X1 = 1 if player 1 is playing at home
__ = -1 if player 1 is playing away
__ = 0 if player 1 is not playing
XK = 1 if player K is playing at home
__ = -1 if player K is playing away
__ = 0 if player K is not playing

U = i.i.d. error term

B0 = measures average home court advantage across all teams
B1 = measures the difference between player 1 and the omitted players
BK = measures the difference between player K and the omitted players

In this case the omitted players are all players playing less than
513 minutes in 2002-03, a group that represents the bottom 4 percent
of minutes played in 2002-03.

Observations are weighted by the number of seconds of the particular
observation to get my first estimate (WV1). I also report a second
estimate (WV2) that gives more weight to crunch time and less weight
during garbage time.
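
A minimal Python sketch of regression (1.1), for readers who want to see the set-up concretely. This is not the code used for the posted results; the function name, the stint tuple layout, and the use of NumPy are assumptions made for illustration. Players in the omitted group are simply left out of the id lists, so every column is zero for them.

```python
import numpy as np

def fit_adjusted_plus_minus(stints, n_players):
    """Weighted least squares for MARGIN = B0 + sum(Bk * Xk) + U.

    stints: iterable of (home_ids, away_ids, margin_per_40, seconds),
            one entry per stretch of play with no substitutions.
    n_players: number of rated (non-omitted) players, ids 0..n_players-1.
    Returns (B0, array of B1..BK).
    """
    stints = list(stints)
    X = np.zeros((len(stints), n_players + 1))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for i, (home_ids, away_ids, margin, seconds) in enumerate(stints):
        X[i, 0] = 1.0                 # intercept: home court advantage
        for p in home_ids:
            X[i, 1 + p] = 1.0         # player on the floor for the home team
        for p in away_ids:
            X[i, 1 + p] = -1.0        # player on the floor for the away team
        y[i] = margin
        w[i] = seconds                # WV1-style weight: length of the stint
    # Weighted least squares: scale rows by sqrt(weight), then ordinary lstsq.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]
```

Swapping the `seconds` weight for a crunch-time weight would give a WV2-style estimate under the same assumptions.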

For comparison purposes, I also report the results for an index that
is normalized to have the same mean and standard deviation as WV1
and WV2 (WV2 is normalized to be the same as WV1). This index
(before normalization) is the following.

INDEX40 = [40/MIN] * [2*(ST-TO) + 1.5*(PTS+AS+BK) + 1*(RB-FGA-PF) -
0.44*FTA]
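
For reference, the index above can be written as a short Python function. This is only a sketch: the argument names are made up for the example, the inputs are assumed to be totals over the same span as the minutes, and the rescaling helper assumes the population standard deviation (the post does not say which one was used).

```python
from statistics import mean, pstdev

def index40(mins, st, to, pts, ast, bk, rb, fga, pf, fta):
    """INDEX40 = [40/MIN] * [2*(ST-TO) + 1.5*(PTS+AS+BK) + (RB-FGA-PF) - 0.44*FTA]."""
    raw = (2.0 * (st - to)
           + 1.5 * (pts + ast + bk)
           + 1.0 * (rb - fga - pf)
           - 0.44 * fta)
    return (40.0 / mins) * raw

def rescale_to(values, target_mean, target_sd):
    """Shift and scale values so they have the given mean and standard deviation."""
    m, s = mean(values), pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]
```

Calling `rescale_to` with the mean and standard deviation of WV1 would put the index on the same scale as WV1, which is the comparison described above.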

Here are the results that I get from this process.

http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~

After this process, I regress the WV1 estimates onto various per 40
minute statistics. At the very end, I run some tests that I will
discuss later.

Well, this is probably enough of description for now. In my next
post, I will discuss what I think we can learn from these adjusted
plus/minus statistics. (But now I need to head to the store.)

Best wishes,
Dan
Message 2 of 30 , Feb 24, 2004
There are a couple of things I forgot to mention about my previous
results.

1. They are different than those results that I had reported to
DeanO, Kevin, and Roland earlier - mostly because I have changed the
omitted group.

2. I have not reported a standard error for my index, but a standard
error of 1 is a good approximation for all but the handful of
players playing less than 600 minutes per season.
Message 3 of 30 , Feb 24, 2004
There are several issues to consider about these adjusted plus/minus
statistics, but one I would hope not to spend forever on is whether
or not I am exactly replicating what Winston and Sagarin are doing.
I am sure that I am not, but other than the fact that they have more
years of data at their disposal (which in this case is an important
difference), I highly doubt what they are doing is using the data in
a significantly more efficient manner than I am. What they are
doing is surely different (and the results are probably quite a bit
different due to the noisiness of this methodology), but the general
theme of my work has to be very close to theirs.

So let me talk about a few issues related to these adjusted
plus/minus statistics.

1. The single most important feature of these results is how noisy
the estimates are. Relative to my Tendex-like index, the standard
errors for these adjusted plus/minus statistics are 3.5 to 5.5 times
larger. What that means is that the precision in these adjusted
plus/minus statistics in a whole season is about equivalent to what
you get with a Tendex-like index in three to seven games. That is
why DeanO says in his book that they don't pass the laugh test;
these estimates are really, really noisy.

I should add, however, that another season or two worth of data will
help more than adding an equivalent number of games for a Tendex-
like index, because a new season will bring lots of new player
combinations, which will help break up the very strong
multicollinearity that sharply reduces the variation that can
identify the value of these hundreds of players. There are issues
in making use of more than one year of this type of data, but I
suspect it will help things a lot. But at the end of the day, I
suspect it will still be a lot more noisy than something like a
Tendex index.
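
The "three to seven games" figure in point 1 follows from simple arithmetic, sketched here in Python. The sketch assumes an 82-game season and that standard errors shrink with the square root of the number of games; the function name is made up for the example.

```python
def equivalent_games(season_games, se_ratio):
    """Games of the more precise measure that match a full season of the noisier one.

    If the noisy measure's standard error is se_ratio times larger, and
    standard errors fall as 1/sqrt(games), the precise measure needs only
    season_games / se_ratio**2 games to reach the same precision.
    """
    return season_games / se_ratio ** 2

# Standard errors 3.5 to 5.5 times larger over an 82-game season:
low = equivalent_games(82, 5.5)   # roughly 2.7 games
high = equivalent_games(82, 3.5)  # roughly 6.7 games
```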

2. So is the conclusion that something like WINVAL is completely
useless? No. The great advantage of this approach is that IMO it
is the least biased (in the strict statistical sense) methodology of
gauging player value of any methodology that I have seen proposed.
Unlike other methods that we know leave out important features of player
value, such as defense and a lot of non-assist passing, this method
in theory captures much closer to everything that is relevant.
(There are still things it misses, but IMO those things are second
order relative to the things other methods miss.)

That said, being the least biased is only part of the equation. The
other part is how precisely can we estimate player value with this
methodology? What is the variance of the estimates? Well, the
upshot is that this methodology is tremendously noisy relative to
other methods, which makes it very hard to use.

For example, using 2002-03 data, there are only about 50 players for
whom this method lets us say, with the usual level of certainty used
in statistics (a 5% rejection region), that they are
significantly better than the average replacement player (players
who played less than 513 minutes). On the other hand, using my
Tendex-like index we can say that about nearly 200 players.

3. So how can these adjusted plus/minus statistics be used, given
how noisy they are? With more data they might become precise enough
to be used on their own, but I think the best way to use them is how
I used them in the regressions at the very end of my link.

http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~

Here I regressed these adjusted plus/minus statistics onto per 40
minute statistics. The coefficients on these per 40 minute
statistics are estimates of the weights that we should use in
indexes that weight various statistics. Now I am not arguing that
the linear weights should be pulled straight from these regressions,
but I think we can learn some things from these regressions.

a. The first lesson seems to be that specifications with attempts
rather than misses seem to fit better. The second specification
which uses two point field goal attempts, three point attempts, and
free throw attempts has a higher R-squared than the first
specification, even though it uses one less explanatory variable.
(Perhaps David Berri was right about that.)

b. It appears that rebounds have much less value than usually
assumed (especially relative to what Berri assumed) and that steals
and blocks are much more valuable. Perhaps these defensive
statistics are highly correlated with other unmeasured defensive
qualities.

Also, even after accounting for points scored, it appears that three
point misses are far less costly than two point misses. Perhaps
having three point shooters on the floor spreads the floor,
resulting in fewer turnovers and higher field goal percentages for
other players, i.e. things that are not picked up in the three point
players' own statistics.

Note that I tested whether the cost of a two pointer was the same as
that of a three pointer and the p-value for the test was 0.0024,
suggesting that three point attempts for whatever reason are less
costly than two point attempts, even after accounting for the points scored
on those attempts.
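
To make the second-stage idea concrete, here is a hedged Python sketch of regressing player estimates on per 40 minute statistics and testing whether two coefficients are equal. It is not the code behind the linked output: the function names and the NumPy-based approach are assumptions, and it returns a t statistic rather than a p-value.

```python
import numpy as np

def ols_with_cov(X, y):
    """Ordinary least squares with an intercept.

    X: (n, k) array of per 40 minute statistics (columns are hypothetical,
       e.g. two point attempts and three point attempts).
    y: (n,) array of adjusted plus/minus estimates.
    Returns (beta, covariance matrix of beta, residual degrees of freedom).
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov, dof

def equality_t_stat(beta, cov, j, k):
    """t statistic for the null hypothesis beta[j] == beta[k]."""
    diff = beta[j] - beta[k]
    var = cov[j, j] + cov[k, k] - 2.0 * cov[j, k]
    return diff / np.sqrt(var)
```

Comparing the statistic to a t distribution with `dof` degrees of freedom gives the p-value for the equality test. Note that index 0 of `beta` is the intercept, so the first column of `X` is index 1.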

Well, that is probably enough for now, since I suspect other things
will come up later. I hope this was interesting.

Best wishes,
Dan
Message 4 of 30 , Feb 24, 2004
----- Original Message -----
From: "dan_t_rosenbaum" <rosenbaum@...>
To: <APBR_analysis@yahoogroups.com>
Sent: Tuesday, February 24, 2004 7:19 PM
Subject: [APBR_analysis] My version of WINVAL

<snipping throughout>

>
> (1.1) MARGIN = B0 + B1*X1 + B2*X2 + ... + BK*XK + U, where
>
> MARGIN = home team points - away team points
>
> (MARGIN is measured during the period of time over which there is no
> substitution made. This MARGIN is normalized so that is measures
> the point difference per 40 minutes.)
>
> X1 = 1 if player 1 is playing at home
> __ = -1 if player 1 is playing away
> __ = 0 if player 1 is not playing
> XK = 1 if player K is playing at home
> __ = -1 if player K is playing away
> __ = 0 if player K is not playing
>
> U = i.i.d. error term
>
> B0 = measures average home court advantage across all teams
> B1 = measures the difference between player 1 and the omitted players
> BK = measures the difference between player K and the omitted players
>
...
>
> Observations are weighted by the number of seconds of the particular
> observation to get my first estimate (WV1). I also report a second
> estimate (WV2) that gives more weight to crunch time and less weight
> during garbage time.
>

What is your definition of crunch and garbage?

> For comparison purposes, I also report the results for an index that
> is normalized to have the same mean and standard deviation as WV1
> and WV2 (WV2 is normalized to be the same as WV1). This index
> (before normalization) is the following.
>
> INDEX40 = [40/MIN] * [2*(ST-TO) + 1.5*(PTS+AS+BK) + 1*(RB-FGA-PF) -
> 0.44*FTA]
>
> Here are the results that I get from this process.
>
> http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~
>

Is there a pace adjustment in there that I can't see? If not, I think a few
players would benefit: a quick calculation shows that among your top 48,
McGrady, Boykins, and Ilgauskas all gain from playing fast paced teams,
while Francis, Mike James, and Billups pay for playing on slow paced teams.

ed
Message 5 of 30 , Feb 24, 2004
> What is your definition of crunch and garbage?

Here is my exact code. Clock measures the minutes elapsed in the
game at the beginning of the observation. Three minutes left in the
game (in regulation or in overtime) is counted as 45. Margin is the
absolute value of the difference in scores at the beginning of the
observation.

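/* ptime: fraction of the fourth quarter elapsed (0 in the first three quarters) */
ptime=max(0,(clock-36)/12);
/* marg10: full-weight margin, shrinking from 10 to 3 over the fourth quarter */
marg10=10-ptime*7;
/* wgt: full weight up to marg10, phased linearly to zero at 2*marg10 */
wgt=10*(1+ptime)*max(0,min(1,(1-(margin/marg10-1))));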

Basically, in the first three quarters, full weight is given to any
part of a game where the margin is less than 10 and no weight is
given if it is more than 20. Between 10 and 20, the weight is
phased from full to zero.

This is basically what happens in the fourth quarter as well, except
that I decrease the margin from 10 (20) to 3 (6) from the beginning
to the end of the fourth quarter. Also, ceteris paribus, the end of
the quarter counts more than the beginning of the quarter.

At the end of all of this, I renormalize the weights so that on
average minutes in the fourth quarter count the same as those in the
first three quarters.
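
For anyone who wants to plot the weights, here is a direct Python translation of the three lines of code above. The function name is made up for the example, and the final renormalization step (so that fourth quarter minutes count the same on average as earlier minutes) is not included.

```python
def crunch_weight(clock, margin):
    """Crunch-time weight for one observation, before the final renormalization.

    clock: minutes elapsed at the start of the observation (0 to 48, with
           overtime mapped onto the fourth quarter as described above).
    margin: absolute score difference at the start of the observation.
    """
    # Fraction of the fourth quarter elapsed; zero in the first three quarters.
    ptime = max(0.0, (clock - 36.0) / 12.0)
    # Margin that still gets full weight: 10 points, shrinking to 3 by the end.
    marg10 = 10.0 - 7.0 * ptime
    # 1 up to marg10, falling linearly to 0 at twice marg10.
    phase = max(0.0, min(1.0, 2.0 - margin / marg10))
    # Later in the fourth quarter counts more, up to double at the end.
    return 10.0 * (1.0 + ptime) * phase
```

For example, a 15-point margin in the first quarter gets half weight, while a 6-point margin at the very end of regulation gets none.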

> Is there a pace adjustment in there that I can't see? If not, I
> think a few players would benefit: a quick calculation shows that
> among your top 48, McGrady, Boykins, and Ilgauskas all gain from
> playing fast paced teams, while Francis, Mike James, and Billups pay
> for playing on slow paced teams.

No, there is no pace adjustment in here. This could be added, but
there seem to be some bugs in the possessions variables that would
be the ideal way to adjust for this, so in this pass I ignored this
issue.

> ed

Thanks Ed.

Best wishes,
Dan
Message 6 of 30 , Feb 25, 2004
Dan,

Thanks a lot for doing this. Quite interesting, though I wish the
collinearity wasn't so damaging. Perhaps you could run a model that
could contribute to another discussion going on? What happens when
you separate offensive and defensive rebounds? Other work would
suggest we'd get a larger coefficient for offensive rebounds.

Ben

Message 7 of 30 , Feb 25, 2004
(continuing from my previous message on different points)

2. Your method for estimating crunch time is interesting. I need to
plot that up, but it looks like a pretty decent estimate on first glance.

3. I'll echo Ed's plea to consider pace as it does have some impact.
I'm not sure it's all that big with what you're doing, but I'm not
sure it's small either.

4. I believe WINVAL had Scottie Pippen as the 2nd best player last
year. That is closer to your WV2. It's amazing to me how much
weighting by clutch time matters in your system, which makes me
reconsider whether it is a good measure. Perhaps it exaggerates a
bit. Regardless, it implies that perhaps the method -- KF vs
regression -- isn't as important as how you treat crunch time. (You
can say the same thing about human perception, of course. We seem to
give a lot of credit to players who hit clutch shots, perhaps more
than we should.)

5. What is the R2 for your estimates of wv1 and wv2? One thing I
tend to think of a KF doing is predicting the winner of a basketball
game about 2/3rds of the time. I wonder if there is a way of seeing
how your measure does on such a thing.

6. How did you develop this index? Why do you put the stats together
that way?

7. Adding entirely new seasons may reduce the collinearity, but it
also weakens the inherent assumption in these methods that player
ability remains constant over the estimation period. If Gilbert
Arenas played so much more poorly early this season just because of
his injury, sheesh, that's a tough thing to deal with.

8. Your point 3 in message 3268 (below) -- I like what it's saying
but I'm not seeing how it comes out of the results. I don't see, for
instance, where you regress against field goal attempts, two pt fga vs
three pt fga, etc. The R2 on those regressions isn't very good
either. What's PTS_AST, FG2_TP, STL_TO?

DeanO

Dean Oliver
When basketball teams start playing Moneyball, this is the book
they'll use!

--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
<rosenbaum@u...> wrote:
> statistics, but one I would hope not to spend forever on is whether
> or not I am exactly replicating what Winston and Sagarin are doing.
> I am sure that I am not, but other than the fact that they have more
> years of data at their disposal (which in this case is an important
> difference), I highly doubt what they are doing is using the data in
> a significantly more efficient manner than I am. What they are
> doing is surely different (and the results are probably quite a bit
> different due to the noisiness of this methodology), but the general
> theme of my work has to be very close to theirs.
>
> So let me talk about a few issues related to these adjusted
> plus/minus statistics.
>
> 1. The single most important feature of these results is how noisy
> the estimates are. Relative to my Tendex-like index, the standard
> errors for these adjusted plus/minus statistics are 3.5 to 5.5 times
> larger. What that means is that the precision in these adjusted
> plus/minus statistics in a whole season is about equivalent to what
> you get with a Tendex-like index in three to seven games. That is
> why DeanO says in his book that they don't pass the laugh test;
> these estimates are really, really noisy.
>
> I should add, however, that another season or two worth of data will
> help more than adding an equivalent number of games for a Tendex-
> like index, because a new season will bring lots of new player
> combinations, which will help break up the very strong
> multicollinearity that sharply reduces the variation that can
> identify the value of these hundreds of players. There are issues
> in making use of more than one year of this type of data, but I
> suspect it will help things a lot. But at the end of the day, I
> suspect it will still be a lot more noisy than something like a
> Tendex index.
>
> 2. So is the conclusion that something like WINVAL is completely
> useless? No. The great advantage of this approach is that IMO it
> is the least biased (in the strict statistical sense) methodology of
> gauging player value of any methodology that I have seen proposed.
> Unlike other methods that we know leave important features of player
> value, such as defense and a lot of non-assist passing, this method
> in theory captures much closer to everything that is relevant.
> (There are still things it misses, but IMO those things are second
> order relative to the things other methods miss.)
>
> That said, being the least biased is only part of the equation. The
> other part is how precisely can we estimate player value with this
> methodology? What is the variance of the estimates? Well, the
> upshot is that this methodology is tremendously noisy relative to
> other methods, which makes it very hard to use.
>
> For example, using 2002-03 data there are only about 50 players that
> using this method, we can say with the usual level of certainty used
> in statistics (a 5% rejection region) that those players are
> significantly better than the average replacement player (players
> who played less than 513 minutes). On the other hand, using my
> Tendex-like index we can say that about nearly 200 players.
>
> 3. So how can these adjusted plus/minus statistics be used, given
> how noisy they are? With more data they might become precise enough
> to be used on their own, but I think the best way to use them is how
> I used them in the regressions at the very end of my link.
>
> http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~
>
> Here I regressed these adjusted plus/minus statistics onto per 40
> minute statistics. The coefficients on these per 40 minute
> statistics are estimates of the weights that we should use in
> indexes that weight various statistics. Now I am not arguing that
> the linear weights should be pulled straight from these regressions,
> but I think we can learn some things from these regressions.
>
> a. The first lesson seems to be that specifications with attempts
> rather than misses seem to fit better. The second specification
> which uses two point field goal attempts, three point attempts, and
> free throw attempts has a higher R-squared that the first
> specification, even though it uses one less explanatory variable.
> (Perhaps David Berri was right about that.)
>
> b. It appears that rebounds have much less value than usually
> assumed (especially relative to what Berri assumed) and that steals
> and blocks are much more valuable. Perhaps these defensive
> statistics are highly correlated with other unmeasured defensive
> qualities.
>
> Also, even after accounting for points scored, it appears that three
> point misses are far less costly than two point misses. Perhaps
> having three point shooters on the floor spreads the floor,
> resulting in fewer turnovers and higher field goal percentages for
> other players, i.e. things that are not picked up in the three point
> players' own statistics.
>
> Note that I tested whether the cost of a two pointer was the same as
> that of a three pointer and the p-value for the test was 0.0024,
> suggesting that three point attempts for whatever reason are less
> costly than two points, even after accounting for the points scored
> on those attempts.
>
> Well, that is probably enough for now, since I suspect other things
> will come up later. I hope this was interesting.
>
> Best wishes,
> Dan
Message 8 of 30 , Feb 25, 2004
--- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> Dan,
>
> Thanks a lot for doing this. Quite interesting, though I wish the
> collinearity wasn't so damaging. Perhaps you could run a model that
> could contribute to another discussion going on? What happens when
> you separate offensive and defensive rebounds? Other work would
> suggest we'd get a larger coefficient for offensive rebounds.
>

Let me add one little thing on this point. I do a regression of team
net pts vs offensive rebounding and defensive rebounding PERCENTAGE
and the weights end up pretty equivalent. So the value of any rebound
is about the same. But I think what causes some of the interpretation
problem is two things.

1. Difficulty in accomplishing the two rebounds. It's easier to get
a defensive rebound than an offensive rebound, roughly twice as
likely. Whether you account for that ease of accomplishment dictates
"value" in some sense.

2. Imbalance in information from offense to defense. On the
offensive side, an offensive rebound generates another attempt and, in
some cases, a made field goal WHICH IS RECORDED. On the defensive
side, a defensive rebound indicates that some defender forced a missed
shot, but who did that IS NOT RECORDED. The offensive side is tending
to give credit all over the place to a lot of people. The defensive
side is recording just a defensive board, which is frankly the easier
part of stopping the offense. As long as you have this huge imbalance
in information, it's tough to narrow down the things you want to talk
about.

DeanO
Dean Oliver
"Dean Oliver looks at basketball with a fresh perspective. If you
want a new way to analyze the game, this book is for you. You'll
never watch a game the same way again. We use his stuff and it helps
us." Yvan Kelly, Scout, Seattle SuperSonics

> Ben
>
>
> --- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
> <rosenbaum@u...> wrote:
> plus/minus
> > statistics, but one I would hope not to spend forever on is
> whether
> > or not I am exactly replicating what Winston and Sagarin are
> doing.
> > I am sure that I am not, but other than the fact that they have
> more
> > years of data at their disposal (which in this case is an
> important
> > difference), I highly doubt what they are doing is using the data
> in
> > a significantly more efficient manner than I am. What they are
> > doing is surely different (and the results are probably quite a
> bit
> > different due to the noisiness of this methodology), but the
> general
> > theme of my work has to be very close to theirs.
> >
> > So let me talk about a few issues related to these adjusted
> > plus/minus statistics.
> >
> > 1. The single most important feature of these results is how noisy
> > the estimates are. Relative to my Tendex-like index, the standard
> > errors for these adjusted plus/minus statistics are 3.5 to 5.5
> times
> > larger. What that means is that the precision in these adjusted
> > plus/minus statistics in a whole season is about equivalent to
> what
> > you get with a Tendex-like index in three to seven games. That is
> > why DeanO says in his book that they don't pass the laugh test;
> > these estimates are really, really noisy.
> >
> > I should add, however, that another season or two worth of data
> will
> > help more than adding an equivalent number of games for a Tendex-
> > like index, because a new season will bring lots of new player
> > combinations, which will help break up the very strong
> > multicollinearity that sharply reduces the variation that can
> > identify the value of these hundreds of players. There are issues
> > in making use of more than one year of this type of data, but I
> > suspect it will help things a lot. But at the end of the day, I
> > suspect it will still be a lot more noisy than something like a
> > Tendex index.
> >
> > 2. So is the conclusion that something like WINVAL is completely
> > useless? No. The great advantage of this approach is that IMO it
> > is the least biased (in the strict statistical sense) methodology
> of
> > gauging player value of any methodology that I have seen
> proposed.
> > Unlike other methods that we know leave important features of
> player
> > value, such as defense and a lot of non-assist passing, this
> method
> > in theory captures much closer to everything that is relevant.
> > (There are still things it misses, but IMO those things are second
> > order relative to the things other methods miss.)
> >
> > That said, being the least biased is only part of the equation.
> The
> > other part is how precisely can we estimate player value with this
> > methodology? What is the variance of the estimates? Well, the
> > upshot is that this methodology is tremendously noisy relative to
> > other methods, which makes it very hard to use.
> >
> > For example, using 2002-03 data there are only about 50 players
> that
> > using this method, we can say with the usual level of certainty
> used
> > in statistics (a 5% rejection region) that those players are
> > significantly better than the average replacement player (players
> > who played less than 513 minutes). On the other hand, using my
> > Tendex-like index we can say that about nearly 200 players.
> >
> > 3. So how can these adjusted plus/minus statistics be used, given
> > how noisy they are? With more data they might become precise
> > enough to be used on their own, but I think the best way to use
> > them is how I used them in the regressions at the very end of my
> > link.
> >
> > http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~
> >
> > Here I regressed these adjusted plus/minus statistics onto per 40
> > minute statistics. The coefficients on these per 40 minute
> > statistics are estimates of the weights that we should use in
> > indexes that weight various statistics. Now I am not arguing that
> > the linear weights should be pulled straight from these
> > regressions, but I think we can learn some things from these
> > regressions.
> >
> > a. The first lesson seems to be that specifications with attempts
> > rather than misses seem to fit better. The second specification,
> > which uses two point field goal attempts, three point attempts,
> > and free throw attempts, has a higher R-squared than the first
> > specification, even though it uses one fewer explanatory variable.
> > (Perhaps David Berri was right about that.)
> >
> > b. It appears that rebounds have much less value than usually
> > assumed (especially relative to what Berri assumed) and that
> steals
> > and blocks are much more valuable. Perhaps these defensive
> > statistics are highly correlated with other unmeasured defensive
> > qualities.
> >
> > Also, even after accounting for points scored, it appears that
> > three point misses are far less costly than two point misses.
> > Perhaps having three point shooters on the floor spreads the
> > floor, resulting in fewer turnovers and higher field goal
> > percentages for other players, i.e. things that are not picked up
> > in the three point players' own statistics.
> >
> > Note that I tested whether the cost of a two pointer was the same
> > as that of a three pointer and the p-value for the test was
> > 0.0024, suggesting that three point attempts for whatever reason
> > are less costly than two point attempts, even after accounting for
> > the points scored on those attempts.
> >
> > Well, that is probably enough for now, since I suspect other
> > things will come up later. I hope this was interesting.
> >
> > Best wishes,
> > Dan
Message 9 of 30 , Feb 25, 2004
>
> 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> year. That is closer to your WV2. It's amazing to me how much
> weighting by clutch time matters in your system, which makes me
> reconsider whether it is a good measure. Perhaps it exaggerates a
> bit. Regardless, it implies that perhaps the method -- KF vs
> regression -- isn't as important as how you treat crunch time. (You
> can say the same thing about human perception, of course. We seem to
> give a lot of credit to players who hit clutch shots, perhaps more
> than we should.)

WINVAL's top5:

1. Garnett
2. Duncan
3. Nowitzki
4. Rip Hamilton
5. Pippen

(source: April 27, 2003 NY Times)
Message 10 of 30 , Feb 25, 2004
----- Original Message -----
From: "Dean Oliver" <deano@...>
To: <APBR_analysis@yahoogroups.com>
Sent: Wednesday, February 25, 2004 11:41 AM
Subject: [APBR_analysis] Re: My version of WINVAL (my analysis of it)

<snipping>

> (continuing from my previous message on different points)
>
....
> 3. I'll echo Ed's plea to consider pace as it does have some impact.
> I'm not sure it's all that big with what you're doing, but I'm not
> sure it's small either.
>

I think it's pretty small.

> 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> year. That is closer to your WV2. It's amazing to me how much
> weighting by clutch time matters in your system, which makes me
> reconsider whether it is a good measure. Perhaps it exaggerates a
> bit.

With the 82games data, can't we just get a probability of winning in
situation x, and assign a weight based on that? For example, if teams in the
past have come back from 10 down with 3 minutes left 10% of the time, why
not weight the "clutchiness" of the situation based on that?
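A minimal sketch of that weighting idea, assuming a made-up table of past game states rather than the actual 82games data. The `history` rows, the function names, and the particular weight formula (largest when the game is a coin flip, smallest when the outcome is nearly decided) are all illustrative assumptions, not the scheme behind WV2.

```python
from collections import defaultdict

# Hypothetical records: (team's margin, minutes left, did that team win?).
# These rows are invented for illustration only.
history = (
    [(-10, 3, True)] + [(-10, 3, False)] * 9
    + [(-2, 3, True)] * 2 + [(-2, 3, False)] * 2
)


def win_probabilities(records):
    """Empirical win frequency for each (margin, minutes_left) state."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for margin, minutes_left, won in records:
        games[(margin, minutes_left)] += 1
        wins[(margin, minutes_left)] += int(won)
    return {state: wins[state] / games[state] for state in games}


def clutch_weight(p_win):
    """One possible weight: 1.0 at p = 0.5, falling to 0.0 as p nears 0 or 1."""
    return 4.0 * p_win * (1.0 - p_win)


probs = win_probabilities(history)
for state, p in sorted(probs.items()):
    print(state, round(p, 3), round(clutch_weight(p), 3))
```

In this toy table the team down 10 with 3 minutes left wins 10% of the time, so that stretch gets a much smaller weight than the 2-point game. Other weight functions would be just as defensible.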

> 5. What is the R2 for your estimates of wv1 and wv2?

r2 = .776

Biggest differences between WV1 and WV2:

Jeff McInnis         +7.3
Danny Ferry          +6.3
Kevin Willis         +5.4
Marcus Fizer         +5.1
DeShawn Stevenson    +5.1
LaPhonso Ellis       +4.6
Zydrunas Ilgauskas   +4.5
Lonny Baxter         +4.4
Alan Henderson       +4.4
J.R. Bremer          +4.1

Fred Hoiberg         -4.2
Melvin Ely           -4.3
Earl Watson          -4.5
John Stockton        -4.7
Chris Jefferies      -5.0
Darvin Ham           -5.6
Dirk Nowitzki        -5.7
Juan Dixon           -5.7
Dale Davis           -6.3
Arvydas Sabonis      -8.6

ed
Message 11 of 30 , Feb 25, 2004
--- In APBR_analysis@yahoogroups.com, igor eduardo küpfer
<edkupfer@r...> wrote:
> ----- Original Message -----
> From: "Dean Oliver" <deano@r...>
> To: <APBR_analysis@yahoogroups.com>
> Sent: Wednesday, February 25, 2004 11:41 AM
> Subject: [APBR_analysis] Re: My version of WINVAL (my analysis of it)
> > 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> > year. That is closer to your WV2. It's amazing to me how much
> > weighting by clutch time matters in your system, which makes me
> > reconsider whether it is a good measure. Perhaps it exaggerates a
> > bit.
>
> With the 82games data, can't we just get a probability of winning in
> situation x, and assign a weight based on that? For example, if teams in the
> past have come back from 10 down with 3 minutes left 10% of the time, why
> not weight the "clutchiness" of the situation based on that?

I've wanted to do this for a very long time. I have these odds at
each quarter, but not in the 4th, where the odds are hard to estimate
(you also have to account for who has possession in the last minute or
so).

>
> > 5. What is the R2 for your estimates of wv1 and wv2?
>
> r2 = .776
>

Good, but not great.

> Biggest differences between WV1 and WV2:
>
> Jeff McInnis         +7.3
> Danny Ferry          +6.3
> Kevin Willis         +5.4
> Marcus Fizer         +5.1
> DeShawn Stevenson    +5.1
> LaPhonso Ellis       +4.6
> Zydrunas Ilgauskas   +4.5
> Lonny Baxter         +4.4
> Alan Henderson       +4.4
> J.R. Bremer          +4.1
>
> Fred Hoiberg         -4.2
> Melvin Ely           -4.3
> Earl Watson          -4.5
> John Stockton        -4.7
> Chris Jefferies      -5.0
> Darvin Ham           -5.6
> Dirk Nowitzki        -5.7
> Juan Dixon           -5.7
> Dale Davis           -6.3
> Arvydas Sabonis      -8.6
>

Pretty sizeable differences. This is per 40 minutes? What's the
average deviation between wv1 and wv2? A couple points per 40 minutes
is a lot to attribute to handling of clutch play.

DeanO

Dean Oliver
Message 12 of 30 , Feb 25, 2004
----- Original Message -----
From: "Dean Oliver" <deano@...>
To: <APBR_analysis@yahoogroups.com>
Sent: Wednesday, February 25, 2004 2:15 PM
Subject: [APBR_analysis] Re: My version of WINVAL (my analysis of it)

> Pretty sizeable differences. This is per 40 minutes? What's the
> average deviation between wv1 and wv2?

Avg Deviation = 1.78.

ed
Message 13 of 30 , Feb 25, 2004
Sagarin had Pip as the 4th best player in the NBA in '03 (behind KG,
Duncan and Nowitzki) but 2nd in "impact," a measurement of a
player's contributions when the game is close. Anyone who followed
Portland last year saw that the Blazers' offense ran much more
smoothly and the defense was much sounder when Pip was playing point
guard. Unfortunately, he suffered a knee injury in the second half
of the season that required surgery and he has not been the same
since (even after a subsequent procedure in the summer). He is
currently on the injured list, hoping to get healthy enough to
finish the season (and his career) as an active player.

During the playoff series against Dallas he was only available for
spot duty but he helped spark Portland's comeback from a 3-1 deficit
with a tremendous 4th quarter in game 5. His final stats (9 points,
5 assists and 2 rebounds in 16 minutes) are not spectacular but he
put up most of his numbers in the last 10 min. of the fourth quarter
and he also made several "non-statistical" contributions (taking
charges, deflecting passes, orchestrating the team on offense and
defense)--not bad for a guy playing on one knee who spent most of
the game trying to get loose by riding an exercise bike.

Coach Maurice Cheeks was monitoring his minutes even before the knee
injury, so that game just represented a concentrated, dramatic
example of what Pip did all season--have a major impact on his
team's success in a short period of time during a stretch when the
game was close. That is why Sagarin and Rosenbaum rate Pip so
highly. It would be interesting to see how those rating systems
would evaluate Pip's contributions during his prime, particularly
during the playoffs. I don't believe that Pip "suddenly" became a
clutch player as a 37 year old with a bad knee and I don't think
that his production was a fluke. I do think that the numbers are
somehow surprising or disturbing to his detractors, who no doubt
will try to find some way to "explain away" this "anomaly."

> 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> year. That is closer to your WV2. It's amazing to me how much
> weighting by clutch time matters in your system, which makes me
> reconsider whether it is a good measure. Perhaps it exaggerates a
> bit.
Message 14 of 30 , Feb 25, 2004
I've got the article right in front of me and in the sidebar chart
Pip is 4th and Rip is 5th.

--- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> >
> > 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> > year. That is closer to your WV2. It's amazing to me how much
> > weighting by clutch time matters in your system, which makes me
> > reconsider whether it is a good measure. Perhaps it exaggerates a
> > bit. Regardless, it implies that perhaps the method -- KF vs
> > regression -- isn't as important as how you treat crunch time. (You
> > can say the same thing about human perception, of course. We seem to
> > give a lot of credit to players who hit clutch shots, perhaps more
> > than we should.)
>
> WINVAL's top5:
>
> 1. Garnett
> 2. Duncan
> 3. Nowitzki
> 4. Rip Hamilton
> 5. Pippen
>
> (source: April 27, 2003 NY Times)
Message 15 of 30 , Feb 26, 2004
*******************************************************************
DeanO writes:

2. Your method for estimating crunch time is interesting. I need to
plot that up, but it looks like a pretty decent estimate on first
glance.

3. I'll echo Ed's plea to consider pace as it does have some impact.
I'm not sure it's all that big with what you're doing, but I'm not
sure it's small either.
*******************************************************************

DanR replies:

I am working on a new version that incorporates pace; in fact, it is
possession-based rather than time-based. Like practically every
other seemingly minor change that I make with these data, it appears
to change the results quite a bit.

With these results, I can also compute offensive and defensive
ratings.
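To make the time-based versus possession-based distinction concrete, here is a small sketch of the two normalizations. The per-40-minute scaling follows the definition of MARGIN given at the start of the thread; the per-100-possession scaling, the function names, and the stint numbers are assumptions chosen for illustration, not a description of the new version.

```python
def margin_per_40_minutes(point_margin, seconds):
    """Scale a stint's point margin to a 40-minute rate."""
    return point_margin * (40 * 60) / seconds


def margin_per_100_possessions(point_margin, possessions):
    """Scale a stint's point margin to a per-100-possession rate."""
    return point_margin * 100 / possessions


# Two hypothetical stints: same margin and length, different pace.
slow = {"margin": 4, "seconds": 240, "possessions": 8}
fast = {"margin": 4, "seconds": 240, "possessions": 12}

for name, s in (("slow", slow), ("fast", fast)):
    print(
        name,
        margin_per_40_minutes(s["margin"], s["seconds"]),
        round(margin_per_100_possessions(s["margin"], s["possessions"]), 2),
    )
```

The time-based rate treats the two stints identically, while the possession-based rate credits the slow-paced stint more, since the same margin was built in fewer possessions.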

*******************************************************************
DeanO writes:

4. I believe WINVAL had Scottie Pippen as the 2nd best player last
year. That is closer to your WV2. It's amazing to me how much
weighting by clutch time matters in your system, which makes me
reconsider whether it is a good measure. Perhaps it exaggerates a
bit. Regardless, it implies that perhaps the method -- KF vs
regression -- isn't as important as how you treat crunch time. (You
can say the same thing about human perception, of course. We seem to
give a lot of credit to players who hit clutch shots, perhaps more
than we should.)
*******************************************************************

DanR replies:

Pretty much anything you do with this system changes the results
quite a bit. There just is not enough variation to estimate player
value precisely.

*******************************************************************
DeanO writes:

5. What is the R2 for your estimates of wv1 and wv2? One thing I
tend to think of a KF doing is predicting the winner of a basketball
game about 2/3rds of the time. I wonder if there is a way of seeing
how your measure does on such a thing.
*******************************************************************

DanR replies:

Were you wanting the R-squared from these regressions or the squared
correlation between these two measures? With the big standard
errors I am not sure there is a lot we could learn from constructing
the predictions you mention above.
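The two quantities in that question are computed from different things: the R-squared of a regression comes from that regression's own residuals, while the squared correlation between wv1 and wv2 only needs the two columns of ratings. A minimal sketch of the latter, using made-up ratings (the lists below are hypothetical, not values from wv1.lst):

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical ratings for five players under two weighting schemes.
wv1 = [8.0, 3.5, 0.0, -2.0, -6.5]
wv2 = [6.0, 5.0, -1.0, 0.5, -7.0]
print(round(squared_correlation(wv1, wv2), 3))
```

For a simple regression of one measure on the other (with an intercept), this squared correlation equals that regression's R-squared, which is one reason the two are easy to conflate.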

*******************************************************************
DeanO writes:

6. How did you develop this index? Why do you put the stats together
that way?
*******************************************************************

DanR replies:

This is just a simple index I use for comparison purposes. I would
hate to spend a lot of time defending it, since I don't think there
is anything particularly special about it.

*******************************************************************
DeanO writes:

7. Adding entirely new seasons may reduce the collinearity, but it
also weakens the inherent assumption in these methods that player
ability remains constant over the estimation period. If Gilbert
Arenas played so much more poorly early this season just because of
his injury, sheesh, that's a tough thing to deal with.
*******************************************************************

DanR replies:

That is surely part of the tradeoff, but given how noisy the estimates
are currently, it is a tradeoff I am more than willing to make.

*******************************************************************
DeanO writes:

8. Your point 3 in message 3268 (below) -- I like what it's saying
but I'm not seeing how it comes out of the results. I don't see, for
instance, where you regress against field goal attempts, two pt fga
vs three pt fga, etc. The r2 on those regressions isn't very good
either. What's PTS_AST, FG2_TP, STL_TO?
*******************************************************************

DanR replies:

The second regression has all three of those variables in it
(fg2p40, tpp40, ftp40). That regression has a higher R-squared than
the first regression, which is based upon made and missed field goals
and free throws. And what is interesting is that the R-squared is
higher despite the fact that the second regression has one fewer
parameter.
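One way to see why a higher R-squared with fewer parameters is notable: within nested models, plain R-squared cannot fall when a regressor is added, so comparisons across specifications often use adjusted R-squared, which charges for each extra parameter. The sketch below uses the standard adjusted R-squared formula; the sample size and R-squared values are made up for illustration and are not taken from the wv1.lst output.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables
    (not counting the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: the second specification fits better with one
# fewer explanatory variable.
first = adjusted_r_squared(0.40, n=300, k=10)
second = adjusted_r_squared(0.42, n=300, k=9)
print(round(first, 4), round(second, 4))
```

A specification that wins on plain R-squared while using fewer variables necessarily wins on the adjusted measure too, so no penalty argument can explain the difference away.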

Yes, you said before that the R-squared isn't very good, but I
don't know why you say this. If I were regressing team outcomes on
team statistics, I would expect a much higher R-squared, but for an
individual-level regression like this, I don't have much of an
expectation for what the R-squared should be. (BTW, economists tend
to place far less importance on R-squared than other folks
who use statistics do.)

PTS_AST is a test in the second regression that the beta for points
per 40 minutes is the same as the beta for assists per 40 minutes.
(This hypothesis is not rejected.)

FG2_TP is a test in the second regression that the beta for two point
field goal attempts per 40 minutes is the same as the beta for three
point attempts per 40 minutes. (This hypothesis is soundly
rejected. Three point attempts appear to be less costly, even after
accounting for points scored.)

STL_TO is a test in the second regression that the beta for steals
per 40 minutes is the negative of the beta for turnovers per 40
minutes. (This hypothesis is in the ballpark of being rejected, but
it is not.)
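Tests like PTS_AST and FG2_TP are tests of a linear restriction on the coefficients. A minimal sketch of one standard way to run such a test, assuming an F test that compares the restricted and unrestricted sums of squared residuals. The data are synthetic, the helper names are hypothetical, and this is not necessarily how the statistics package behind wv1.lst computes its tests.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def ssr(rows, y, beta):
    """Sum of squared residuals for the given coefficients."""
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))


def f_stat_equal_betas(x1, x2, y):
    """F statistic for H0: beta1 == beta2 in y = b0 + b1*x1 + b2*x2 + u."""
    n = len(y)
    unrestricted = [[1.0, a, b] for a, b in zip(x1, x2)]
    restricted = [[1.0, a + b] for a, b in zip(x1, x2)]  # imposes beta1 == beta2
    ssr_u = ssr(unrestricted, y, ols(unrestricted, y))
    ssr_r = ssr(restricted, y, ols(restricted, y))
    q, k = 1, 3  # one restriction, three unrestricted parameters
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


# Synthetic data: in y_same the two betas are equal, in y_diff they are not.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
noise = [0.1, -0.1, 0.05, -0.05, -0.1, 0.1, -0.05, 0.1]
y_same = [1 + 2 * a + 2 * b + e for a, b, e in zip(x1, x2, noise)]
y_diff = [1 + 2 * a - 1 * b + e for a, b, e in zip(x1, x2, noise)]
f_same = f_stat_equal_betas(x1, x2, y_same)
f_diff = f_stat_equal_betas(x1, x2, y_diff)
print(f_same, f_diff)
```

The statistic is compared against an F(q, n - k) critical value: a small value (as for `y_same`) means the restriction is not rejected, a large one (as for `y_diff`) means it is. A test like STL_TO, where one beta is the negative of the other, works the same way with `a - b` as the restricted regressor.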
Message 16 of 30 , Feb 26, 2004
--- In APBR_analysis@yahoogroups.com, "doc319" <doc319@y...> wrote:
> Sagarin had Pip as the 4th best player in the NBA in '03 (behind KG,
> Duncan and Nowitzki) but 2nd in "impact," a measurement of a
> player's contributions when the game is close. Anyone who followed
> Portland last year saw that the Blazers offense ran much more
> smoothly and the defense was much sounder when Pip was playing ...

Does this imply that a straight-up trade of Pippen for -- say --
Shaq would have made the Blazers less successful?

Or were the Blazers a singularly dysfunctional team, with whom
Pippen just happened to fill their gaps particularly well?

I'm still waiting for the other shoe to drop. Aren't the Lakers a
shadow of themselves with Shaq out of the lineup? I'd think he
would be near the top of any 'impact' measurement.

Maybe we're seeing a true "impact to team" measurement, and there's
no league-wide comparison forthcoming: Ostertag-without-a-backup is
just more 'impactful' than Shaq-backed-by-Medvedenko (or whomever).

> ...I do think that the numbers are
> somehow surprising or disturbing to his detractors, who no doubt
> will try to find some way to "explain away" this "anomaly."

Well, I'm not a Pip detractor by any means. Neither do I think
his 'impact' is that great, at this stage in his career, over a full
season.
Message 17 of 30 , Feb 26, 2004
Right, my mistake.

--- In APBR_analysis@yahoogroups.com, "doc319" <doc319@y...> wrote:
> I've got the article right in front of me and in the sidebar chart
> Pip is 4th and Rip is 5th.
>
> --- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> > >
> > > 4. I believe WINVAL had Scottie Pippen as the 2nd best player last
> > > year. That is closer to your WV2. It's amazing to me how much
> > > weighting by clutch time matters in your system, which makes me
> > > reconsider whether it is a good measure. Perhaps it exaggerates a
> > > bit. Regardless, it implies that perhaps the method -- KF vs
> > > regression -- isn't as important as how you treat crunch time. (You
> > > can say the same thing about human perception, of course. We seem to
> > > give a lot of credit to players who hit clutch shots, perhaps more
> > > than we should.)
> >
> > WINVAL's top5:
> >
> > 1. Garnett
> > 2. Duncan
> > 3. Nowitzki
> > 4. Rip Hamilton
> > 5. Pippen
> >
> > (source: April 27, 2003 NY Times)
Message 18 of 30 , Feb 26, 2004
I'm not suggesting anything of the sort; I was just responding to
the post about where Pippen's "impact" ranked in Sagarin as opposed
to Rosenbaum. Shaq could not provide for Portland the exact
qualities that Pip did (playing point guard, man to man defense on
the perimeter, etc.) but of course he would more than offset that by
providing a dominant post presence.

The Blazers were surely dysfunctional but when Pip was at full speed
they actually were very "functional" (When Pip was healthy and
starting at point guard they went 22-5 over one stretch of the
season, including wins against Minn, Sac, Spurs,
Dall); "dysfunction" returned when Pip injured his left knee and the
team finished 8-10 down the stretch.

>
> Does this imply that a straight-up trade of Pippen for -- say --
> Shaq would have made the Blazers less successful?
>
> Or were the Blazers a singularly dysfunctional team, with whom
> Pippen just happened to fill their gaps particularly well?

That's kind of my point: many people seem to not believe that he was
that "impactful" last year (or during his prime), but Sagarin and
Rosenbaum's numbers suggest otherwise. Regular observers of the
Blazers last year would also acknowledge his "impact."

>
> Well, I'm not a Pip detractor by any means. Neither do I think
> his 'impact' is that great, at this stage in his career, over a full
> season.
Message 19 of 30 , Feb 26, 2004
In one sense, you are absolutely correct: Pip's "impact" over a full
season is not as great as that of some of the other players because
he no longer is able to play a full season without suffering
injuries that take him out of the lineup or curtail his
effectiveness. Sagarin and Rosenbaum's numbers suggest that last
year Pip had a significant "impact" down the stretch of close ball
games. That is borne out by Portland's record when he was healthy
last year, by his performance in the fourth quarter of game five
versus Dallas and by any number of games last year in which he
played a major role in Portland winning. So, while he did
not "impact" a full season, he did have quite an "impact" in the
games that he played in when he was healthy. Of course, Sagarin and
Rosenbaum did not "penalize" Pip for the games that he missed, the
majority of which Portland lost.

>
> Well, I'm not a Pip detractor by any means. Neither do I think
> his 'impact' is that great, at this stage in his career, over a full
> season.
Message 20 of 30 , Feb 27, 2004
I think the problem in WINVAL is relating "impactful" to "good." If
I'm on a team that doesn't have any smart team defenders and needs a
point guard, and I happen to provide both of those things, then I
will show up as very impactful in these rankings, even though I might
not be that good. The Blazers played great once they moved Pippen to
point and got Stoudamire out of there, for both of those reasons.

If they swapped Pippen for Shaq they wouldn't get the same kind of
impact right away because Sabonis and Davis were quality centers last
year, but in the real world they could trade those two for a good
point guard very easily and end up way ahead on the game.

--- In APBR_analysis@yahoogroups.com, "doc319" <doc319@y...> wrote:
> I'm not suggesting anything of the sort; I was just responding to
> the post about where Pippen's "impact" ranked in Sagarin as opposed
> to Rosenbaum. Shaq could not provide for Portland the exact
> qualities that Pip did (playing point guard, man to man defense on
> the perimeter, etc.) but of course he would more than offset that by
> providing a dominant post presence.
>
> The Blazers were surely dysfunctional but when Pip was at full speed
> they actually were very "functional" (When Pip was healthy and
> starting at point guard they went 22-5 over one stretch of the
> season, including wins against Minn, Sac, Spurs,
> Dall); "dysfunction" returned when Pip injured his left knee and the
> team finished 8-10 down the stretch.
>
> > Does this imply that a straight-up trade of Pippen for -- say --
> > Shaq would have made the Blazers less successful?
> >
> > Or were the Blazers a singularly dysfunctional team, with whom
> > Pippen just happened to fill their gaps particularly well?
>
> That's kind of my point: many people seem to not believe that he was
> that "impactful" last year (or during his prime), but Sagarin and
> Rosenbaum's numbers suggest otherwise. Regular observers of the
> Blazers last year would also acknowledge his "impact."
>
> > Well, I'm not a Pip detractor by any means. Neither do I think
> > his 'impact' is that great, at this stage in his career, over a
> > full season.
Message 21 of 30 , Feb 27, 2004
Thanks Dan, very interesting, and confirms a lot of my own suspicions
about WINVAL.

On the 3-point attempts being less costly, I think one thing you have
to consider is free throws. In other words (as I casually make an
assumption which may not be true), if you're not treating a two-point
attempt where the shooter is fouled as a two-point attempt, then it's
bound to lower the expected value of a two-point shot. Since hardly
anybody gets fouled on a 3, it makes the 3 look better by comparison.
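A quick arithmetic sketch of that point, under the stated assumption that a missed shot on which the shooter is fouled is not recorded as a field goal attempt. All of the rates below are hypothetical round numbers, and made baskets with a foul (and-ones) are ignored to keep the example small.

```python
def points_per_try(recorded_attempts, fg_pct, fouled_tries, ft_pct, shot_value):
    """Return (points per recorded attempt, points per try including fouls).

    A fouled try is assumed to yield `shot_value` free throws and no
    recorded field goal attempt.
    """
    field_goal_points = recorded_attempts * fg_pct * shot_value
    free_throw_points = fouled_tries * shot_value * ft_pct
    per_recorded = field_goal_points / recorded_attempts
    per_try = (field_goal_points + free_throw_points) / (recorded_attempts + fouled_tries)
    return per_recorded, per_try


# Hypothetical: 100 two point tries, 10 of them fouled; 100 three point
# tries, none fouled.
two = points_per_try(90, 0.45, 10, 0.75, shot_value=2)
three = points_per_try(100, 0.35, 0, 0.75, shot_value=3)
print(two, three)
```

With these made-up rates the two pointer is worth about 0.90 points per recorded attempt but about 0.96 per try once the fouled tries are counted, while the three pointer is 1.05 either way. Counting only recorded attempts therefore understates the two relative to the three, which is the direction of the bias suggested above.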

--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
<rosenbaum@u...> wrote:
> plus/minus statistics, but one I would hope not to spend forever on
> is whether or not I am exactly replicating what Winston and Sagarin
> are doing. I am sure that I am not, but other than the fact that
> they have more years of data at their disposal (which in this case
> is an important difference), I highly doubt what they are doing is
> using the data in a significantly more efficient manner than I am.
> What they are doing is surely different (and the results are
> probably quite a bit different due to the noisiness of this
> methodology), but the general theme of my work has to be very close
> to theirs.
>
> So let me talk about a few issues related to these adjusted
> plus/minus statistics.
>
> 1. The single most important feature of these results is how noisy
> the estimates are. Relative to my Tendex-like index, the standard
> errors for these adjusted plus/minus statistics are 3.5 to 5.5 times
> larger. What that means is that the precision in these adjusted
> plus/minus statistics in a whole season is about equivalent to what
> you get with a Tendex-like index in three to seven games. That is
> why DeanO says in his book that they don't pass the laugh test;
> these estimates are really, really noisy.
>
> I should add, however, that another season or two worth of data will
> help more than adding an equivalent number of games for a
> Tendex-like index, because a new season will bring lots of new
> player combinations, which will help break up the very strong
> multicollinearity that sharply reduces the variation that can
> identify the value of these hundreds of players. There are issues
> in making use of more than one year of this type of data, but I
> suspect it will help things a lot. But at the end of the day, I
> suspect it will still be a lot more noisy than something like a
> Tendex index.
>
> 2. So is the conclusion that something like WINVAL is completely
> useless? No. The great advantage of this approach is that IMO it
> is the least biased (in the strict statistical sense) methodology
> for gauging player value of any that I have seen proposed.
> Unlike other methods that we know leave out important features of
> player value, such as defense and a lot of non-assist passing,
> this method in theory captures much closer to everything that is
> relevant. (There are still things it misses, but IMO those things
> are second order relative to the things other methods miss.)
>
> That said, being the least biased is only part of the equation.
> The other part is how precisely we can estimate player value with
> this methodology. What is the variance of the estimates? Well,
> the upshot is that this methodology is tremendously noisy relative
> to other methods, which makes it very hard to use.
>
> For example, using 2002-03 data there are only about 50 players
> for whom, using this method, we can say with the usual level of
> certainty used in statistics (a 5% rejection region) that they are
> significantly better than the average replacement player (players
> who played less than 513 minutes). On the other hand, using my
> Tendex-like index we can say that about nearly 200 players.
>
> 3. So how can these adjusted plus/minus statistics be used, given
> how noisy they are? With more data they might become precise
> enough to be used on their own, but I think the best way to use
> them is how I used them in the regressions at the very end of my
> link.
>
> http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~
>
> Here I regressed these adjusted plus/minus statistics onto per 40
> minute statistics. The coefficients on these per 40 minute
> statistics are estimates of the weights that we should use in
> indexes that weight various statistics. Now I am not arguing that
> the linear weights should be pulled straight from these
> regressions, but I think we can learn some things from these
> regressions.
>
> a. The first lesson seems to be that specifications with attempts
> rather than misses seem to fit better. The second specification,
> which uses two point field goal attempts, three point attempts, and
> free throw attempts, has a higher R-squared than the first
> specification, even though it uses one fewer explanatory variable.
> (Perhaps David Berri was right about that.)
>
> b. It appears that rebounds have much less value than usually
> assumed (especially relative to what Berri assumed) and that steals
> and blocks are much more valuable. Perhaps these defensive
> statistics are highly correlated with other unmeasured defensive
> qualities.
>
> Also, even after accounting for points scored, it appears that
> three point misses are far less costly than two point misses.
> Perhaps having three point shooters on the floor spreads the
> floor, resulting in fewer turnovers and higher field goal
> percentages for other players, i.e. things that are not picked up
> in the three point players' own statistics.
>
> Note that I tested whether the cost of a two pointer was the same
> as that of a three pointer and the p-value for the test was 0.0024,
> suggesting that three point attempts for whatever reason are less
> costly than two point attempts, even after accounting for the
> points scored on those attempts.
>
> Well, that is probably enough for now, since I suspect other things
> will come up later. I hope this was interesting.
>
> Best wishes,
> Dan
Message 22 of 30 , Feb 28, 2004
My latest version does exactly what you asked, and the evidence suggests
that the values for offensive and defensive rebounds probably are
about the same. If anything, there is a slightly higher value for
defensive rebounds. Perhaps guys who specialize in getting
offensive rebounds tend to not be offensive threats, which makes it
harder for their teammates to score. These WINVAL statistics would
pick up such externalities, while the accounting-type approach
inherent in Tendex, PER, and DeanO's stuff would not pick up such
externalities.

--- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> Dan,
>
> Thanks a lot for doing this. Quite interesting, though I wish the
> collinearity wasn't so damaging. Perhaps you could run a model that
> could contribute to another discussion going on? What happens when
> you separate offensive and defensive rebounds? Other work would
> suggest we'd get a larger coefficient for offensive rebounds.
>
> Ben
Message 23 of 30 , Feb 28, 2004
I think people are making too big of a deal out of this Shaq vs.
Pippen comparison. WINVAL in an 82 game season produces player
value estimates that are about as precise as those in 4-9 games for a
Tendex estimate. It is easy to imagine Pippen being as
productive as Shaq over a 4-9 game stretch, which is all that these
data are really able to say.
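The link between "standard errors several times larger" and "equivalent to a handful of games" can be sketched with the usual square-root rule, assuming a standard error shrinks in proportion to one over the square root of the number of games (which in turn assumes roughly independent games). The function names are hypothetical; the 3.5 to 5.5 range is the one quoted from the earlier message in this thread.

```python
def equivalent_games(se_ratio, season_games=82):
    """Games of the more precise index that match a full season of the
    noisier one, if its standard errors are `se_ratio` times larger."""
    return season_games / se_ratio ** 2


def implied_se_ratio(games, season_games=82):
    """Standard error ratio implied by an equivalence of `games` games."""
    return (season_games / games) ** 0.5


for ratio in (3.5, 5.5):
    print(ratio, round(equivalent_games(ratio), 1))
for games in (4, 9):
    print(games, round(implied_se_ratio(games), 2))
```

Under this rule, standard errors 3.5 to 5.5 times larger correspond to roughly three to seven games, and a 4-9 game equivalence corresponds to standard errors roughly 3 to 4.5 times larger.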

That said, regular plus/minus statistics are heavily influenced by
how good a given player's substitutes might be. WINVAL accounts for
this, so this is really just a criticism of regular plus/minus
statistics and not WINVAL.
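
To make the distinction concrete, here is a minimal Python sketch of the regression set-up from the first message in this thread, run on invented data. Everything in it is an assumption for illustration: the "true" player values, the home edge, the two-on-two stints, and the single omitted replacement-level player are all made up, and the margins are noise-free so the regression can be checked exactly. It is not the actual WINVAL or 82games calculation.

```python
import random

random.seed(0)

# Hypothetical "true" values in points per 40 minutes (made-up numbers).
TRUE_VALUE = [6.0, 2.0, 0.0, -1.0, 3.0, -2.0, 1.0, -4.0]
HOME_EDGE = 3.0            # made-up home court advantage (B0)
N_RATED = len(TRUE_VALUE)  # one extra player is omitted as the zero baseline
N_STINTS = 300
ALL_VALUES = TRUE_VALUE + [0.0]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


rows, margins = [], []
for _ in range(N_STINTS):
    # Toy stint: two home players against two away players.
    home1, home2, away1, away2 = random.sample(range(N_RATED + 1), 4)
    x = [0.0] * (N_RATED + 1)
    x[home1] = x[home2] = 1.0     # +1 if playing at home
    x[away1] = x[away2] = -1.0    # -1 if playing away
    margin = HOME_EDGE + sum(v * xi for v, xi in zip(ALL_VALUES, x))
    rows.append([1.0] + x[:N_RATED])  # intercept, then rated players only
    margins.append(margin)

# Ordinary least squares through the normal equations (A'A) b = A'y.
k = N_RATED + 1
ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
aty = [sum(r[i] * y for r, y in zip(rows, margins)) for i in range(k)]
coef = solve(ata, aty)

# Raw on-court plus/minus: average margin seen from the player's own side.
raw = []
for p in range(N_RATED):
    signed = [r[p + 1] * y for r, y in zip(rows, margins) if r[p + 1] != 0.0]
    raw.append(sum(signed) / len(signed))

print("home edge estimate:", round(coef[0], 3))
for p in range(N_RATED):
    print(p, "true", TRUE_VALUE[p], "regression", round(coef[p + 1], 3),
          "raw +/-", round(raw[p], 3))
```

In this toy the regression column recovers the invented values because nothing but the lineups drives the margin, while the raw column also absorbs the home edge and whoever happened to share the floor. Real data add noise and collinear lineups, which is where the imprecision discussed in this thread comes from.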

--- In APBR_analysis@yahoogroups.com, "John Hollinger" <alleyoop2@y...> wrote:
> I think the problem in WINVAL is relating "impactful" to "good." If
> I'm on a team that doesn't have any smart team defenders and needs a
> point guard, and I happen to provide both of those things, then I
> will show up as very impactful in these rankings, even though I might
> not be that good. The Blazers played great once they moved Pippen to
> point and got Stoudamire out of there, for both of those reasons.
>
> If they swapped Pippen for Shaq they wouldn't get the same kind of
> impact right away because Sabonis and Davis were quality centers last
> year, but in the real world they could trade those two for a good
> point guard very easily and end up way ahead on the game.
>
> --- In APBR_analysis@yahoogroups.com, "doc319" <doc319@y...> wrote:
> > I'm not suggesting anything of the sort; I was just responding to
> > the post about where Pippen's "impact" ranked in Sagarin as opposed
> > to Rosenbaum. Shaq could not provide for Portland the exact
> > qualities that Pip did (playing point guard, man to man defense on
> > the perimeter, etc.) but of course he would more than offset that by
> > providing a dominant post presence.
> >
> > The Blazers were surely dysfunctional but when Pip was at full speed
> > they actually were very "functional" (When Pip was healthy and
> > starting at point guard they went 22-5 over one stretch of the
> > season, including wins against Minn, Sac, Spurs,
> > Dall); "dysfunction" returned when Pip injured his left knee and the
> > team finished 8-10 down the stretch.
> >
> > > Does this imply that a straight-up trade of Pippen for -- say --
> > > Shaq would have made the Blazers less successful?
> > >
> > > Or were the Blazers a singularly dysfunctional team, with whom
> > > Pippen just happened to fill their gaps particularly well?
> >
> > That's kind of my point: many people seem to not believe that he was
> > that "impactful" last year (or during his prime), but Sagarin and
> > Rosenbaum's numbers suggest otherwise. Regular observers of the
> > Blazers last year would also acknowledge his "impact."
> >
> > > Well, I'm not a Pip detractor by any means. Neither do I think
> > > his 'impact' is that great, at this stage in his career, over a
> > > full season.
• Well, let's compare three players. Mr. Three - generates 10 points per game on 10 three point attempts Mr. Two - generates 10 points per game on 10 two point
Message 24 of 30 , Feb 28, 2004
Well, let's compare three players.

Mr. Three - generates 10 points per game on 10 three point attempts
Mr. Two - generates 10 points per game on 10 two point attempts
Mr. Free Throw - generates 10 points per game on 6 two point
attempts and 9 free throw attempts

Assuming 0.44 FGA for a free throw, Mr. Three and Mr. Two each have
10 field goal attempts, while Mr. Free Throw has 9.96.

According to my Table 3, here are the "values" for the three players.

http://www.uncg.edu/bae/people/rosenbaum/NBA/winval1.htm#table3

Mr. Three = 10*0.935 - 10*0.128 = 8.07
Mr. Two = 10*0.935 - 10*0.727 = 2.08
Mr. Free Throw = 10*0.935 - 6*0.727 + 9*0.105 = 5.93

This regression seems to suggest that Mr. Three is the most
valuable, even more valuable than Mr. Free Throw who generates lots
of free throws with his two point attempts (and shoots a slightly
higher true shooting percentage).
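
The arithmetic above can be reproduced with a short Python sketch. The four coefficients are the ones used in the calculations in this message (which cites Table 3); their signs are inferred from that arithmetic, and the function and variable names are hypothetical, chosen only for this illustration.

```python
# Coefficients as used in the arithmetic above (the message cites Table 3);
# the signs are inferred from that arithmetic, not read from the table.
COEF_POINTS = 0.935
COEF_3PA = -0.128
COEF_2PA = -0.727
COEF_FTA = 0.105
FTA_WEIGHT = 0.44  # field goal attempts charged per free throw attempt


def value(points, threes, twos, free_throws):
    """Index value implied by the four coefficients."""
    return (COEF_POINTS * points + COEF_3PA * threes
            + COEF_2PA * twos + COEF_FTA * free_throws)


def equivalent_fga(threes, twos, free_throws):
    """Shot attempts with free throws converted at 0.44 FGA each."""
    return threes + twos + FTA_WEIGHT * free_throws


# (points, three point attempts, two point attempts, free throw attempts)
players = {
    "Mr. Three": (10, 10, 0, 0),
    "Mr. Two": (10, 0, 10, 0),
    "Mr. Free Throw": (10, 0, 6, 9),
}

for name, (pts, threes, twos, fts) in players.items():
    shots = equivalent_fga(threes, twos, fts)
    true_shooting = pts / (2 * shots)
    print(f"{name}: value={value(pts, threes, twos, fts):.2f}, "
          f"attempts={shots:.2f}, TS%={true_shooting:.3f}")
```

Changing a player's shot mix in the `players` table shows how quickly the index moves with the split between two point and three point attempts, which is the whole effect under discussion.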

Now, do I believe these regression coefficients are exactly right?
Of course not, but I think it might tell us something about how
valuable it is to have good three point shooters on the court.

In keeping the floor spread, three point shooters provide
valuable externalities to their teammates who post up down low and
to those who slash to the basket.

--- In APBR_analysis@yahoogroups.com, "John Hollinger" <alleyoop2@y...> wrote:
> Thanks Dan, very interesting, and confirms a lot of my own suspicions
>
> On the 3-point attempts being less costly, I think one thing you have
> to consider is free throws. In other words (as I casually make an
> assumption which may not be true), if you're not treating a two-point
> attempt where the shooter is fouled as a two-point attempt, then it's
> bound to lower the expected value of a two-point shot. Since hardly
> anybody gets fouled on a 3, it makes the 3 look better by comparison.
• And here is the link to the relevant section of my latest version. http://www.uncg.edu/bae/people/rosenbaum/NBA/winval1.htm#table3 ... suggests ... it ...
Message 25 of 30 , Feb 28, 2004

And here is the link to the relevant section of my latest version.

http://www.uncg.edu/bae/people/rosenbaum/NBA/winval1.htm#table3

--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum" <rosenbaum@u...> wrote:
> ... suggests
> that the values for offensive and defensive rebounds probably are
> about the same. If anything, there is a slightly higher value for
> defensive rebounds. Perhaps guys who specialize in getting
> offensive rebounds tend to not be offensive threats, which makes it
> harder for their teammates to score. These WINVAL statistics would
> pick up such externalities, while the accounting-type approach
> inherent in Tendex, PER, and DeanO's stuff would not pick up such
> externalities.
>
> --- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> > Dan,
> >
> > Thanks a lot for doing this. Quite interesting, though I wish the
> > collinearity wasn't so damaging. Perhaps you could run a model that
> > could contribute to another discussion going on? What happens when
> > you separate offensive and defensive rebounds? Other work would
> > suggest we'd get a larger coefficient for offensive rebounds.
> >
> > Ben
• You make a small mistake in your text where you claim that the offensive rebound coefficient is higher than that for defensive rebounds. Just out of
Message 26 of 30 , Feb 29, 2004
You make a small mistake in your text where you claim that the
offensive rebound coefficient is higher than that for defensive
rebounds. Just out of curiosity, what does it look like if you
force the intercept to zero? [If you have the time and inclination
to run the model.]

Thanks again.

--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum" <rosenbaum@u...> wrote:
> ... suggests
> that the values for offensive and defensive rebounds probably are
> about the same. If anything, there is a slightly higher value for
> defensive rebounds. Perhaps guys who specialize in getting
> offensive rebounds tend to not be offensive threats, which makes it
> harder for their teammates to score. These WINVAL statistics would
> pick up such externalities, while the accounting-type approach
> inherent in Tendex, PER, and DeanO's stuff would not pick up such
> externalities.
>
> --- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> > Dan,
> >
> > Thanks a lot for doing this. Quite interesting, though I wish the
> > collinearity wasn't so damaging. Perhaps you could run a model that
> > could contribute to another discussion going on? What happens when
> > you separate offensive and defensive rebounds? Other work would
> > suggest we'd get a larger coefficient for offensive rebounds.
> >
> > Ben
• My latest version does exactly what was asked and the evidence suggests that the values for offensive and defensive rebounds probably are about the same. If
Message 27 of 30 , Feb 29, 2004
> My latest version does exactly what was asked and the evidence suggests that the values for offensive and defensive rebounds probably are about the same. If anything, there is a slightly higher value for defensive rebounds.

they are the same, in terms of value towards a team, as each either starts (def) or regains (off) a team possession, and a team possession has an inherent average value each season (unless you want to get overly technical, as someone else pointed out, and look at it not from the overall league perspective but from the fact that certain teams have a higher value for a team possession than others because they are more efficient at scoring)...

however as to whether a def reb has a slightly higher value, what i can point out is if you look at a number of the great players (olajuwon, k.malone, bird, magic, jordan, etc) you'll see that as they got older, their percentage of def to off rebounds increases as they learn how to win. that tells me def rebs are "more important" at least to them, i'm guessing because they learn it's more important to get back to play D than it is to go after an off reb, but that's only an assumption...

bob chaikin
bchaikin@...

• Thanks for catching the mistake. It should be fixed now. If I force the intercept to be zero, the offensive and defensive rebound coefficients change to the
Message 28 of 30 , Feb 29, 2004
Thanks for catching the mistake. It should be fixed now. If I
force the intercept to be zero, the offensive and defensive rebound
coefficients change to the following. (Note that forcing the
intercept to be equal to zero implies an index value for replacement
players that equals zero.)

NAME COEFF (ST ERROR)
ORP40 0.125913 (0.44470215)
DRP40 0.264566 (0.22329696)
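
One rough way to read these two rows is to ask whether the gap between them is large relative to their standard errors. The Python sketch below does that under a loud assumption: it treats the two estimates as uncorrelated, which the output quoted here does not tell us (the covariance between the two coefficients is not reported, and with collinear regressors it is probably not zero). The variable names are hypothetical.

```python
import math

# Coefficients and standard errors as quoted in the table above.
orb_coef, orb_se = 0.125913, 0.44470215   # ORP40
drb_coef, drb_se = 0.264566, 0.22329696   # DRP40

# Each coefficient relative to its own standard error.
orb_t = orb_coef / orb_se
drb_t = drb_coef / drb_se

# Difference between the two coefficients.
diff = drb_coef - orb_coef

# ASSUMPTION: zero covariance between the two estimates, so the variance
# of the difference is just the sum of the two variances.
se_diff = math.sqrt(orb_se ** 2 + drb_se ** 2)
z = diff / se_diff

print(f"ORP40 t-ratio: {orb_t:.2f}")
print(f"DRP40 t-ratio: {drb_t:.2f}")
print(f"difference: {diff:.6f}, approx. SE: {se_diff:.4f}, z: {z:.2f}")
```

Under that assumption the gap is well inside one standard error, which fits the reading that the two rebound values are "probably about the same."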

--- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> You make a small mistake in your text where you claim that the
> offensive rebound coefficient is higher than that for defensive
> rebounds. Just out of curiosity, what does it look like if you
> force the intercept to zero? [If you have the time and inclination
> to run the model.]
>
> Thanks again.
• ... I've been thinking about this statement a bit. Inherently, these play-by-play data do feature all the interactions of a player. They do have
Message 29 of 30 , Mar 4, 2004
--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
<rosenbaum@u...> wrote:
> harder for their teammates to score. These WINVAL statistics would
> pick up such externalities, while the accounting-type approach
> inherent in Tendex, PER, and DeanO's stuff would not pick up such
> externalities.
>

I've been thinking about this statement a bit. Inherently, these
play-by-play data do feature all the interactions of a player. They
do have "externalities." In general, we think that it's good to
capture more and not less, but I guess the reason that there is so
much noise is that all those externalities really add it. Capturing
all that information can be hiding true "talent" by capturing also:

- decisions to go to a certain player in the offense
- system, such as man or zone defense
- decisions on what man covers who

I'm sure there are other things, but I note that these are actually
very important in making translations to other teams. These are, like
the lineups themselves, very collinear. The defense of Sam Cassell
looked a lot worse in Milwaukee than in Minnesota where he has an
effective zone to protect him. Minnesota couldn't have seen that
really with these stats until it actually put him in a zone. How
would Sam do as the first option in an offense? He's not been that,
so if the Bobcats pick him up in the expansion draft (doubtful, of
course), would he be able to do that? Or, how would Flip Murray be
defensively if the Sonics didn't play their trap all the time?

The belief of how important these factors are in performance then
matters. If you believe that talent is 99% of the game, the
externalities don't matter much. But just the fact that there is a
fair amount of noise in the estimates helps support my idea that they
do matter (even with the new season of data that Dan has). It's hard
putting a number on it, but I tend to go with 70% of performance is
talent.

DeanO

Dean Oliver
"Oliver goes beyond stats to dissect what it takes to win. His breezy
style makes for enjoyable reading, but there are plenty of points of
wisdom as well. This book can be appreciated by fans, players,
coaches and executives, but more importantly it can be used as a text
book for all these groups. You are sure to learn something you didn't
Baseball and Hidden Game of Football

> --- In APBR_analysis@yahoogroups.com, "wimpds" <wimpds@y...> wrote:
> > Dan,
> >
> > Thanks a lot for doing this. Quite interesting, though I wish the
> > collinearity wasn't so damaging. Perhaps you could run a model that
> > could contribute to another discussion going on? What happens when
> > you separate offensive and defensive rebounds? Other work would
> > suggest we'd get a larger coefficient for offensive rebounds.
> >
> > Ben
• I probably think of this a little differently. Let's think of three kinds of contributions that a player might make to winning. 1. Measured statistics, i.e.
Message 30 of 30 , Mar 4, 2004
I probably think of this a little differently. Let's think of three
kinds of contributions that a player might make to winning.

1. Measured statistics, i.e. points, rebounds, assists, etc.
- These measured statistics are the grist of most statistical
measures of player quality.

2. Unmeasured contributions to winning that occur during a given
block of a game
- This is what the WINVAL play-by-play method is ideal for
measuring. It picks up things like good entry passes that don't
lead to assists, good hands that keep a bad pass from being a
turnover for the passer, a good pick, not spreading the floor
effectively, checking a guy on defense, providing good help defense,
disrupting a passer so that he misses an open cutter, etc.

3. Unmeasured contributions to winning that occur outside a block of
a game
- For the most part, WINVAL misses these. This would include fouls
picked up in one block of a game that make a player more tentative
in another part of the game, offensive or defensive plays during one
block of a game that have effects on later blocks of the game, team
cancers or team leaders that affect the effort level of players in
games and practices (these cancers and leaders may not even play),
etc.

When I referred to "externalities," I probably was thinking of some
combination of those contributions in categories two and three.

In my opinion, talent is involved in the first two categories and
possibly in the third as well, so it is misleading to think of
WINVAL as mostly picking up the effort contributions not picked up
by the talent contributions inherent in the measured statistics. It
is a measured versus unmeasured issue, not a talent versus effort
issue.

DeanO also makes a point about interactions between players, citing
Sam Cassell as an example. WINVAL is not much better than
traditional statistics at picking this up - at least in the context
DeanO brings it up. I guess Minnesota could have looked at how
effective Cassell was in cases where he was surrounded by
Milwaukee's best defenders. But this is tough, and I bet this would
not be very helpful in extrapolating exactly how Cassell would
perform in the situation he is now in with Minnesota or how he would
do with Charlotte next season (although that may be a lot like
Milwaukee).

So some of these interactions are almost a fourth consideration
separate from the three kinds of contributions that I outline
above.

--- In APBR_analysis@yahoogroups.com, "Dean Oliver" <deano@r...> wrote:
> play-by-play data do feature all the interactions of a player. They
> do have "externalities." In general, we think that it's good to
> capture more and not less, but I guess the reason that there is so
> much noise is that all those externalities really add it. Capturing
> all that information can be hiding true "talent" by capturing also:
>
> - decisions to go to a certain player in the offense
> - system, such as man or zone defense
> - decisions on what man covers who
>
> I'm sure there are other things, but I note that these are actually
> very important in making translations to other teams. These are, like
> the lineups themselves, very collinear. The defense of Sam Cassell
> looked a lot worse in Milwaukee than in Minnesota where he has an
> effective zone to protect him. Minnesota couldn't have seen that
> really with these stats until it actually put him in a zone. How
> would Sam do as the first option in an offense? He's not been that,
> so if the Bobcats pick him up in the expansion draft (doubtful, of
> course), would he be able to do that? Or, how would Flip Murray be
> defensively if the Sonics didn't play their trap all the time?
>
> The belief of how important these factors are in performance then
> matters. If you believe that talent is 99% of the game, the
> externalities don't matter much. But just the fact that there is a
> fair amount of noise in the estimates helps support my idea that they
> do matter (even with the new season of data that Dan has). It's hard
> putting a number on it, but I tend to go with 70% of performance is
> talent.
>
> DeanO