3290 Re: My version of WINVAL (my analysis of it)
- Feb 27, 2004
Thanks Dan, very interesting, and it confirms a lot of my own suspicions.
On the 3-point attempts being less costly, I think one thing you have
to consider is free throws. In other words (as I casually make an
assumption which may not be true), if you're not treating a two-point
attempt where the shooter is fouled as a two-point attempt, then it's
bound to lower the expected value of a two-point shot. Since hardly
anybody gets fouled on a 3, it makes the 3 look better by comparison.
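To make the accounting point concrete, here is a minimal sketch with made-up rates. The shooting percentages, foul rates, and the helper name `expected_points_per_trip` are all hypothetical and are not taken from Dan's data. The sketch assumes that a fouled miss is not recorded as a field goal attempt, that it produces as many free throws as the shot was worth, and that and-ones can be ignored. It only illustrates the direction of the bias.

```python
def expected_points_per_trip(shot_value, fg_pct, foul_rate, ft_pct):
    """Expected points per shooting trip (hypothetical helper).

    Assumptions (illustrative only):
      - a fraction `foul_rate` of trips end with the shooter fouled on a
        miss, which is NOT recorded as a field goal attempt;
      - that foul yields `shot_value` free throws, each made at `ft_pct`;
      - and-ones are ignored.
    With foul_rate = 0 this reduces to points per field goal attempt.
    """
    points_from_shots = (1 - foul_rate) * shot_value * fg_pct
    points_from_fouls = foul_rate * shot_value * ft_pct
    return points_from_shots + points_from_fouls


# Hypothetical rates: two point shooters get fouled far more often than
# three point shooters.
two_fga_only = expected_points_per_trip(2, 0.47, 0.0, 0.75)
two_with_fouls = expected_points_per_trip(2, 0.47, 0.10, 0.75)
three_fga_only = expected_points_per_trip(3, 0.35, 0.0, 0.75)
three_with_fouls = expected_points_per_trip(3, 0.35, 0.01, 0.75)

print(f"2PT per FGA: {two_fga_only:.3f}, per trip incl. fouls: {two_with_fouls:.3f}")
print(f"3PT per FGA: {three_fga_only:.3f}, per trip incl. fouls: {three_with_fouls:.3f}")
```

With these made-up numbers, counting fouled trips raises the two point value by 0.056 points per trip but the three point value by only about 0.012. Comparing the two on a per-FGA basis therefore flatters the 3.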
--- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
> There are several issues to consider about these adjusted plus/minus
> statistics, but one I would hope not to spend forever on is whether
> or not I am exactly replicating what Winston and Sagarin are doing.
> I am sure that I am not, but other than the fact that they have more
> years of data at their disposal (which in this case is an important
> difference), I highly doubt that they are using the data in
> a significantly more efficient manner than I am. What they are
> doing is surely different (and the results are probably quite a bit
> different due to the noisiness of this methodology), but the general
> theme of my work has to be very close to theirs.
>
> So let me talk about a few issues related to these adjusted
> plus/minus statistics.
>
> 1. The single most important feature of these results is how noisy
> the estimates are. Relative to my Tendex-like index, the standard
> errors for these adjusted plus/minus statistics are 3.5 to 5.5 times
> larger. What that means is that the precision in these adjusted
> plus/minus statistics in a whole season is about equivalent to what
> you get with a Tendex-like index in three to seven games. That is
> why DeanO says in his book that they don't pass the laugh test;
> these estimates are really, really noisy.
>
> I should add, however, that another season or two worth of data will
> help more than adding an equivalent number of games for a Tendex-like
> index, because a new season will bring lots of new player
> combinations, which will help break up the very strong
> multicollinearity that sharply reduces the variation that can
> identify the value of these hundreds of players. There are issues
> in making use of more than one year of this type of data, but I
> suspect it will help things a lot. But at the end of the day, I
> suspect it will still be a lot more noisy than something like a
> Tendex index.
>
> 2. So is the conclusion that something like WINVAL is completely
> useless? No. The great advantage of this approach is that IMO it
> is the least biased (in the strict statistical sense) methodology of
> gauging player value of any methodology that I have seen proposed.
> Unlike other methods that we know leave out important features of player
> value, such as defense and a lot of non-assist passing, this method
> in theory comes much closer to capturing everything that is relevant.
> (There are still things it misses, but IMO those things are second
> order relative to the things other methods miss.)
>
> That said, being the least biased is only part of the equation. The
> other part is how precisely we can estimate player value with this
> methodology. What is the variance of the estimates? Well, the
> upshot is that this methodology is tremendously noisy relative to
> other methods, which makes it very hard to use.
>
> For example, using 2002-03 data there are only about 50 players that
> (using this method) we can say with the usual level of certainty used
> in statistics (a 5% rejection region) are significantly better than
> the average replacement player (players who played fewer than 513
> minutes). On the other hand, using my Tendex-like index we can say
> the same about nearly 200 players.
>
> 3. So how can these adjusted plus/minus statistics be used, given
> how noisy they are? With more data they might become precise enough
> to be used on their own, but I think the best way to use them is how
> I used them in the regressions at the very end of my link.
> Here I regressed these adjusted plus/minus statistics onto per 40
> minute statistics. The coefficients on these per 40 minute
> statistics are estimates of the weights that we should use in
> indexes that weight various statistics. Now I am not arguing that
> the linear weights should be pulled straight from these regressions,
> but I think we can learn some things from these regressions.
>
> a. The first lesson seems to be that specifications with attempts
> rather than misses fit better. The second specification,
> which uses two point field goal attempts, three point attempts, and
> free throw attempts, has a higher R-squared than the first
> specification, even though it uses one fewer explanatory variable.
> (Perhaps David Berri was right about that.)
>
> b. It appears that rebounds have much less value than usually
> assumed (especially relative to what Berri assumed) and that steals
> and blocks are much more valuable. Perhaps these defensive
> statistics are highly correlated with other unmeasured defensive
> contributions.
>
> Also, even after accounting for points scored, it appears that three
> point misses are far less costly than two point misses. Perhaps
> having three point shooters on the floor spreads the floor,
> resulting in fewer turnovers and higher field goal percentages for
> other players, i.e. things that are not picked up in the three point
> players' own statistics.
>
> Note that I tested whether the cost of a two pointer was the same as
> that of a three pointer, and the p-value for the test was 0.0024,
> suggesting that three point attempts for whatever reason are less
> costly than two point attempts, even after accounting for the points
> scored on those attempts.
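The test described above (whether the cost of a two point attempt equals that of a three point attempt) amounts to testing that two regression coefficients are equal. Here is a minimal sketch, assuming you already have the two coefficient estimates, their variances, and their covariance from the fitted regression. The function name `equal_coefficients_test` and all of the numbers below are hypothetical and are not Dan's estimates. The sketch also uses a large-sample normal approximation rather than an exact t distribution, so it will not reproduce the 0.0024 figure.

```python
import math


def equal_coefficients_test(b1, b2, var1, var2, cov12):
    """Wald-type z test of H0: b1 == b2 (large-sample normal approximation).

    Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2), so the covariance
    term matters: dropping it would misstate the standard error.
    Returns (z statistic, two-sided p-value).
    """
    se_diff = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (b1 - b2) / se_diff
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical per-attempt costs (adjusted plus/minus per attempt),
# NOT taken from the regression in the post.
b_2pa, b_3pa = -1.00, -0.55
z, p = equal_coefficients_test(b_2pa, b_3pa, var1=0.010, var2=0.020, cov12=0.002)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The covariance term is why the test needs more than the two standard errors printed in a regression table. Two point and three point attempts are correlated across players, so their coefficient estimates are generally correlated as well.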
> Well, that is probably enough for now, since I suspect other things
> will come up later. I hope this was interesting.
> Best wishes,