
3290 Re: My version of WINVAL (my analysis of it)

  • John Hollinger
    Feb 27, 2004
      Thanks Dan, very interesting, and confirms a lot of my own suspicions
      about WINVAL.

      On the 3-point attempts being less costly, I think one thing you have
      to consider is free throws. In other words (as I casually make an
      assumption which may not be true), if you're not treating a two-point
      attempt where the shooter is fouled as a two-point attempt, then it's
      bound to lower the expected value of a two-point shot. Since hardly
      anybody gets fouled on a 3, it makes the 3 look better by comparison.
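
A minimal sketch of the free throw point above, using made-up numbers (the shot counts and the 75% free throw rate are hypothetical and do not come from this thread). It compares points per recorded two-point attempt, where tries on which the shooter was fouled and missed are not counted as attempts, with points per try once those fouled tries and their free throws are included. And-one plays are ignored for simplicity.

```python
# Hypothetical shot log for illustration only; none of these counts come from the thread.
made_twos = 460      # made two-point shots, no foul
missed_twos = 440    # missed two-point shots, no foul
fouled_trips = 100   # two-point tries where the shooter was fouled and missed
ft_pct = 0.75        # assumed free throw percentage


def points_per_recorded_attempt(made, missed, shot_value=2):
    """Points per attempt when fouled tries are not counted as attempts."""
    return shot_value * made / (made + missed)


def points_per_try_with_fouls(made, missed, fouled, ft_pct, shot_value=2):
    """Points per try when fouled tries and their free throws are included."""
    # Each fouled try earns `shot_value` free throws, each made with probability ft_pct.
    points = shot_value * made + fouled * shot_value * ft_pct
    return points / (made + missed + fouled)


box_score_value = points_per_recorded_attempt(made_twos, missed_twos)
full_value = points_per_try_with_fouls(made_twos, missed_twos, fouled_trips, ft_pct)
print(f"recorded attempts only: {box_score_value:.3f}")
print(f"including fouled tries: {full_value:.3f}")
```

Under these assumed numbers, the two-pointer looks less valuable when fouled tries are left out. That is the direction of the effect described above. How large it is in real data would depend on actual foul rates.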

      --- In APBR_analysis@yahoogroups.com, "dan_t_rosenbaum"
      <rosenbaum@u...> wrote:
      > There are several issues to consider about these adjusted
      > statistics, but one I would hope not to spend forever on is whether
      > or not I am exactly replicating what Winston and Sagarin are doing.
      > I am sure that I am not, but other than the fact that they have
      > years of data at their disposal (which in this case is an important
      > difference), I highly doubt what they are doing is using the data in
      > a significantly more efficient manner than I am. What they are
      > doing is surely different (and the results are probably quite a bit
      > different due to the noisiness of this methodology), but the
      > theme of my work has to be very close to theirs.
      > So let me talk about a few issues related to these adjusted
      > plus/minus statistics.
      > 1. The single most important feature of these results is how noisy
      > the estimates are. Relative to my Tendex-like index, the standard
      > errors for these adjusted plus/minus statistics are 3.5 to 5.5 times
      > larger. What that means is that the precision in these adjusted
      > plus/minus statistics in a whole season is about equivalent to what
      > you get with a Tendex-like index in three to seven games. That is
      > why DeanO says in his book that they don't pass the laugh test;
      > these estimates are really, really noisy.
      > I should add, however, that another season or two worth of data will
      > help more than adding an equivalent number of games for a
      > Tendex-like index, because a new season will bring lots of new player
      > combinations, which will help break up the very strong
      > multicollinearity that sharply reduces the variation that can
      > identify the value of these hundreds of players. There are issues
      > in making use of more than one year of this type of data, but I
      > suspect it will help things a lot. But at the end of the day, I
      > suspect it will still be a lot more noisy than something like a
      > Tendex index.
      > 2. So is the conclusion that something like WINVAL is completely
      > useless? No. The great advantage of this approach is that IMO it
      > is the least biased (in the strict statistical sense) methodology for
      > gauging player value of any methodology that I have seen proposed.
      > Unlike other methods that we know leave out important features of
      > value, such as defense and a lot of non-assist passing, this method
      > in theory captures much closer to everything that is relevant.
      > (There are still things it misses, but IMO those things are second
      > order relative to the things other methods miss.)
      > That said, being the least biased is only part of the equation. The
      > other part is: how precisely can we estimate player value with this
      > methodology? What is the variance of the estimates? Well, the
      > upshot is that this methodology is tremendously noisy relative to
      > other methods, which makes it very hard to use.
      > For example, using 2002-03 data there are only about 50 players for
      > whom, using this method, we can say with the usual level of certainty
      > used in statistics (a 5% rejection region) that those players are
      > significantly better than the average replacement player (players
      > who played less than 513 minutes). On the other hand, using my
      > Tendex-like index we can say that about nearly 200 players.
      > 3. So how can these adjusted plus/minus statistics be used, given
      > how noisy they are? With more data they might become precise enough
      > to be used on their own, but I think the best way to use them is as
      > I used them in the regressions at the very end of my link.
      > http://www.uncg.edu/bae/people/rosenbaum/NBA/wv1.lst~
      > Here I regressed these adjusted plus/minus statistics onto per 40
      > minute statistics. The coefficients on these per 40 minute
      > statistics are estimates of the weights that we should use in
      > indexes that weight various statistics. Now I am not arguing that
      > the linear weights should be pulled straight from these regressions,
      > but I think we can learn some things from these regressions.
      > a. The first lesson seems to be that specifications with attempts
      > rather than misses seem to fit better. The second specification
      > which uses two point field goal attempts, three point attempts, and
      > free throw attempts has a higher R-squared than the first
      > specification, even though it uses one less explanatory variable.
      > (Perhaps David Berri was right about that.)
      > b. It appears that rebounds have much less value than usually
      > assumed (especially relative to what Berri assumed) and that steals
      > and blocks are much more valuable. Perhaps these defensive
      > statistics are highly correlated with other unmeasured defensive
      > qualities.
      > Also, even after accounting for points scored, it appears that three
      > point misses are far less costly than two point misses. Perhaps
      > having three point shooters on the floor spreads the floor,
      > resulting in fewer turnovers and higher field goal percentages for
      > other players, i.e. things that are not picked up in the three point
      > shooters' own statistics.
      > Note that I tested whether the cost of a two pointer was the same as
      > that of a three pointer and the p-value for the test was 0.0024,
      > suggesting that three point attempts for whatever reason are less
      > costly than two point attempts, even after accounting for the points
      > scored on those attempts.
      > Well, that is probably enough for now, since I suspect other things
      > will come up later. I hope this was interesting.
      > Best wishes,
      > Dan
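
As a rough check on point 1 of the quoted message, the arithmetic behind "three to seven games" can be sketched as follows. This assumes that standard errors shrink with the square root of the number of games (independent, equally informative games) and that a season is 82 games. Neither assumption is stated in the message, and the function name is made up for illustration.

```python
def equivalent_games(season_games, se_ratio):
    """Games a lower-noise index needs to match the precision that a
    noisier estimate reaches over a full season.

    Assumes standard error is proportional to 1 / sqrt(games), so a
    standard error that is se_ratio times larger corresponds to
    se_ratio ** 2 times less information per game.
    """
    return season_games / se_ratio ** 2


SEASON_GAMES = 82  # assumed length of a regular season

for ratio in (3.5, 5.5):
    games = equivalent_games(SEASON_GAMES, ratio)
    print(f"SE ratio {ratio}: about {games:.1f} games")
```

With ratios of 3.5 and 5.5 this gives roughly 7 and 3 games, which is consistent with the range quoted in the message.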