Feb 24, 2004

There are several issues to consider about these adjusted plus/minus statistics, but one I would hope not to spend forever on is whether or not I am exactly replicating what Winston and Sagarin are doing. I am sure that I am not, but other than the fact that they have more years of data at their disposal (which in this case is an important difference), I highly doubt that they are using the data in a significantly more efficient manner than I am. What they are doing is surely different (and the results are probably quite a bit different, given how noisy this methodology is), but the general theme of my work has to be very close to theirs.

So let me talk about a few issues related to these adjusted plus/minus statistics.

1. The single most important feature of these results is how noisy the estimates are. Relative to my Tendex-like index, the standard errors for these adjusted plus/minus statistics are 3.5 to 5.5 times larger. This means that the precision of these adjusted plus/minus statistics over a whole season is about equivalent to what you get from a Tendex-like index in three to seven games. That is why DeanO says in his book that they don't pass the laugh test; these estimates are really, really noisy.
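
The "three to seven games" figure follows from the standard-error ratio under two assumptions that are not stated above: an 82-game season, and standard errors that shrink with the square root of the number of games. An estimate whose standard error is k times larger needs k^2 times as much data to match, so a season of adjusted plus/minus is worth roughly 82 / k^2 games of the Tendex-like index. A minimal sketch of that arithmetic:

```python
def equivalent_games(season_games: int, se_ratio: float) -> float:
    """Games of the less noisy index that match one season of the noisier one.

    Assumes standard errors scale as 1/sqrt(games), so variance scales as 1/games.
    """
    return season_games / se_ratio ** 2


# Assumed 82-game season; 3.5 and 5.5 are the standard-error ratios quoted above.
for ratio in (3.5, 5.5):
    print(f"SE ratio {ratio}: about {equivalent_games(82, ratio):.1f} games")
```

This gives about 6.7 and 2.7 games, consistent with the three-to-seven range.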

I should add, however, that another season or two of data will help more than adding an equivalent number of games for a Tendex-like index, because a new season will bring lots of new player combinations, which will help break up the very strong multicollinearity that sharply reduces the variation available to identify the value of these hundreds of players. There are issues in making use of more than one year of this type of data, but I suspect it will help a lot. At the end of the day, though, I suspect it will still be a lot noisier than something like a Tendex index.
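
A toy illustration of the multicollinearity point, using the two-regressor variance inflation factor, VIF = 1/(1 - r^2), where r is the correlation between two players' on-court indicators. The real model has hundreds of regressors and the indicators below are hypothetical, so this only shows the direction of the effect: stints that mix the two players in new combinations lower r, which shrinks the factor by which the variance of each player's estimate is inflated.

```python
import math


def correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def vif_two_regressors(x: list[float], y: list[float]) -> float:
    """Variance inflation factor when a regression has exactly two regressors."""
    r = correlation(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical on-court indicators (1 = on the floor), one entry per stint.
# In "season 1" players A and B almost always play together.
a_season1 = [1] * 9 + [0] * 9 + [1, 0]
b_season1 = [1] * 9 + [0] * 9 + [0, 1]

# Hypothetical extra stints from a new season that mix them in new combinations.
a_both = a_season1 + [1, 1, 0, 0]
b_both = b_season1 + [1, 0, 1, 0]

print(round(vif_two_regressors(a_season1, b_season1), 2))  # 2.78
print(round(vif_two_regressors(a_both, b_both), 2))  # 1.8
```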

2. So is the conclusion that something like WINVAL is completely useless? No. The great advantage of this approach is that, IMO, it is the least biased (in the strict statistical sense) methodology for gauging player value of any that I have seen proposed. Other methods, we know, leave out important features of player value, such as defense and a lot of non-assist passing; this method, in theory, comes much closer to capturing everything that is relevant. (There are still things it misses, but IMO those things are second order relative to the things other methods miss.)
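
Part of why the method can pick up defense and passing is that it never looks at box-score categories: it regresses the scoring margin of each stint on who was on the floor. The exact setup is not described in this post; a common way to build it, assumed here, gives each stint one row with +1 for every home player on the floor, -1 for every away player, and 0 for everyone else, with the home margin per 100 possessions as the dependent variable. The helper names and player labels are hypothetical.

```python
def stint_row(players: list[str], home_on: set[str], away_on: set[str]) -> list[int]:
    """One design-matrix row for a stint (hypothetical helper).

    +1 for each home player on the floor, -1 for each away player, 0 otherwise.
    """
    return [1 if p in home_on else -1 if p in away_on else 0 for p in players]


def margin_per_100(home_pts: int, away_pts: int, possessions: float) -> float:
    """Dependent variable for the stint: home point margin per 100 possessions."""
    return 100.0 * (home_pts - away_pts) / possessions


# Hypothetical four-player league and a single stint.
players = ["A", "B", "C", "D"]
print(stint_row(players, home_on={"A", "C"}, away_on={"D"}))  # [1, 0, 1, -1]
print(margin_per_100(12, 8, 20))  # 20.0
```

Stacking one row per stint and regressing the margins on the rows (typically weighting each stint by its possessions) yields one coefficient per player; those coefficients are the adjusted plus/minus estimates.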

That said, being the least biased is only part of the equation. The other part is how precisely we can estimate player value with this methodology: what is the variance of the estimates? The upshot is that this methodology is tremendously noisy relative to other methods, which makes it very hard to use.
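
One standard way to combine the two parts is the mean squared error of an estimate \hat\theta of a player's true value \theta, which splits into squared bias plus variance. A method can have the smaller bias and still have the larger overall error if its variance is large enough:

```latex
\operatorname{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \big(\mathbb{E}[\hat\theta] - \theta\big)^2 + \operatorname{Var}(\hat\theta)
```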

For example, using 2002-03 data, this method lets us say with the usual level of certainty used in statistics (a 5% rejection region) that only about 50 players are significantly better than the average replacement player (players who played fewer than 513 minutes). Using my Tendex-like index, on the other hand, we can say that about nearly 200 players.
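
Why the counts differ so much: the same gap above replacement level is divided by a much larger standard error, so it rarely clears the critical value. A minimal sketch, assuming a two-sided 5% z-test (reject when z exceeds about 1.96 in the "better" direction) and treating the replacement level as known; the player's gap and both standard errors are hypothetical, with the second set 4.5 times the first to fall inside the 3.5 to 5.5 range quoted earlier.

```python
from statistics import NormalDist

# Two-sided 5% test: critical value is the 97.5th percentile of the standard normal.
CRITICAL = NormalDist().inv_cdf(0.975)


def significantly_better(estimate: float, replacement: float, se: float) -> bool:
    """True if estimate is significantly above the replacement level (z-test)."""
    z = (estimate - replacement) / se
    return z > CRITICAL


# Same hypothetical gap of 4.0 above replacement, two different standard errors.
print(significantly_better(4.0, 0.0, 1.0))  # True  (z = 4.0)
print(significantly_better(4.0, 0.0, 4.5))  # False (z is about 0.89)
```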

3. So how can these adjusted plus/minus statistics be used, given how noisy they are? With more data they might become precise enough to be used on their own, but I think the best way to use them is the way I used them in the regressions at the very end of my link:

http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~

Here I regressed these adjusted plus/minus statistics on per-40-minute statistics. The coefficients on these per-40-minute statistics are estimates of the weights that we should use in indexes that weight various statistics. I am not arguing that the linear weights should be pulled straight from these regressions, but I think we can learn some things from them.
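
For readers who want to try this kind of second-stage regression, here is a minimal, dependency-free least-squares sketch. It is unweighted and runs on made-up numbers; the regressions at the link may weight players (for example by minutes) or use different variables, and this sketch does not try to reproduce them. Each row holds one player's per-40-minute statistics, y holds that player's adjusted plus/minus, and the fitted coefficients play the role of the index weights.

```python
def ols(rows: list[list[float]], y: list[float]) -> tuple[list[float], float]:
    """Ordinary least squares with an intercept; returns (coefficients, R-squared).

    coefficients[0] is the intercept; the rest follow the column order of rows.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k + 1) matrix.
    aug = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    coefs = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(c * xi for c, xi in zip(coefs, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - fh) ** 2 for t, fh in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical per-40 statistics (two columns) for five players, and a made-up
# "adjusted plus/minus" that is exactly 1 + 2 * col1 - 0.5 * col2.
rows = [[10, 5], [12, 3], [8, 7], [15, 2], [9, 9]]
y = [18.5, 23.5, 13.5, 30.0, 14.5]
coefs, r2 = ols(rows, y)
print([round(c, 6) for c in coefs], round(r2, 6))
```

Because the made-up y is an exact linear function of the two columns, the fit recovers the intercept 1, the weights 2 and -0.5, and an R-squared of 1; with real, noisy adjusted plus/minus data the R-squared would be below 1.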

a. The first lesson is that specifications with attempts rather than misses seem to fit better. The second specification, which uses two-point field goal attempts, three-point attempts, and free throw attempts, has a higher R-squared than the first specification, even though it uses one fewer explanatory variable. (Perhaps David Berri was right about that.)
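
A specification with fewer regressors and a higher R-squared also wins on any measure that penalizes extra regressors. Adjusted R-squared is one such measure; the sketch below computes it. The sample size, variable counts, and R-squared values are hypothetical, not the figures from the regressions at the link.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k explanatory variables plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: the first specification uses one more variable and fits slightly worse.
first = adjusted_r2(0.40, n=300, k=8)
second = adjusted_r2(0.42, n=300, k=7)
print(f"first: {first:.3f}, second: {second:.3f}")
```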

b. It appears that rebounds have much less value than usually assumed (especially relative to what Berri assumed) and that steals and blocks are much more valuable. Perhaps these defensive statistics are highly correlated with other unmeasured defensive qualities.

Also, even after accounting for points scored, it appears that three-point misses are far less costly than two-point misses. Perhaps having three-point shooters on the floor spreads the floor, resulting in fewer turnovers and higher field goal percentages for other players, i.e. things that are not picked up in the three-point shooters' own statistics.

Note that I tested whether the cost of a two-point attempt was the same as that of a three-point attempt, and the p-value for the test was 0.0024, suggesting that three-point attempts, for whatever reason, are less costly than two-point attempts, even after accounting for the points scored on those attempts.
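
A test of this kind compares two regression coefficients using their variances and their covariance. The sketch below is a standard Wald test under a normal approximation, assuming that is the kind of test meant here; the coefficients, variances, and covariance in the example are hypothetical and do not reproduce the 0.0024 p-value. In practice the variances and covariance come from the regression's estimated coefficient covariance matrix.

```python
import math
from statistics import NormalDist


def equal_coef_test(b1: float, b2: float, var1: float, var2: float,
                    cov12: float) -> tuple[float, float]:
    """Wald test of H0: b1 == b2; returns (z statistic, two-sided p-value).

    Uses a normal approximation to the sampling distribution of b1 - b2.
    """
    se_diff = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (b1 - b2) / se_diff
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical per-attempt costs: two-point attempt -1.2, three-point attempt -0.4.
z, p = equal_coef_test(-1.2, -0.4, var1=0.04, var2=0.04, cov12=0.01)
print(f"z = {z:.2f}, p = {p:.4f}")
```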

Well, that is probably enough for now, since I suspect other things will come up later. I hope this was interesting.

Best wishes,

Dan