Please pardon me for this argument that I am going to make. I have
not been at this as long as most of you have, so I apologize if I
step on any toes. This is a pretty long post, and hopefully folks
will find it of some value.
The linear weights approaches generally come in three forms. The
first, such as the NBA's efficiency statistic, simply counts up the
good things and subtracts the bad things. Since +1 and -1 are not
the only possible linear weights one could choose to use, there are
not many who defend such a weighting scheme.
The second approach is what I will call the possessions-based
approach. The essence of this approach is to count every
contribution to either points scored or a failed possession and to
count it only once. This is certainly the approach used to
construct John Hollinger's PER, and it lies behind the construction
of Dean Oliver's offensive and defensive ratings. Also, a large
fraction of the arguments on this board are about the proper way to
do this possessions-based accounting.
So what is wrong with this approach? The problem is that there are
numerous contributions to successful or failed possessions for which
there are no statistics - a good pick, an ineffective blockout, a
good entry pass that leads to a score but not an assist, the
presence of a shot blocker that keeps his opponents from driving to
the hoop. One could easily argue that the unmeasured contributions
to successful or failed possessions are more than the measured
contributions, e.g. points, assists, steals, etc.
This is one place where basketball statistics diverge from baseball
statistics. In baseball, the unmeasured contributions to a run
scored are a much smaller proportion of the total contributions.
Things like good baserunning, the effect of a base-stealing threat
on the pitches a batter sees, the effect a strong-armed right
fielder has on a batter in a sacrifice fly situation are difficult
to measure, at best. But these things are far less important in
measuring run production than the unmeasured contributions in
basketball are in measuring point production.
This possessions-based point production approach is one of the
concepts we have borrowed from baseball and Bill James. IMO it
does not fit as well in basketball. So many things are
unmeasured in basketball that when coming up with a linear weight to
put in front of blocks, we don't want to only account for the missed
shot it produces (and its appropriate probability of becoming a
defensive rebound). We also want that weight to reflect the lower
field goal percentage for those guys who arch the ball higher when
coming in the lane or who don't come in the lane for fear that their
shot will be blocked. On the other hand, we want it to reflect that
guys who block shots may be more susceptible to pump fakes.
Offensive rebounds without question help a team, but that is not the
only consideration when trying to figure out what linear weight to
put in front of offensive rebounds. What about if guys who get lots
of offensive rebounds tend to be non-factors on offense who clog the
lane and make it harder for their teammates to score? If that is
the case, it is possible that the proper linear weight for an
offensive rebound could even be negative.
Similar arguments could be made for negative linear weights on
steals or positive linear weights on free throw attempts or three
point attempts or personal fouls. The point here is that the proper
linear weights boil down to being an empirical question and not a
matter of logic. The question is how we can estimate the proper
linear weights.
(And this ignores the whole issue of the proper functional form, but
we will leave that for another day.)
This gets us to the third approach, using some type of estimation
approach to estimate the proper linear weights. This is the
approach that David Berri used to estimate linear weights. He ran
regressions using team-level data and then applied the weights
estimated from that data to individual players. This is another
technique borrowed from baseball, and IMO another case where an
approach that works well in baseball does not work very well in
basketball.
At the team level, the benefit of an assisted basket is fully
subsumed in a better field goal percentage. The issue of how much
of a field goal to attribute to the assist just doesn't come up.
The benefit of having players who use a lot of possessions fairly
efficiently, which allows their teammates to stick to
higher-percentage field goal attempts, also does not come up. Ironically,
much of what counts as team play is ignored using this team-level
approach to estimating linear weights. (And this ignores other
complications that this approach entails.) All told, these problems
are pretty severe and I would not be surprised if the results of
this approach were worse than those for even something like the
NBA's efficiency statistic.
So this gets me to the approach that I outlined in my WINVAL stuff.
The approach basically estimates plus/minus ratings adjusted for
home court advantage and for the other players sharing the floor
with the given player. Then I regress these adjusted plus/minus ratings
on various statistics (adjusted for pace).
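For anyone who wants to try something similar, here is a minimal
sketch of that second step in Python with NumPy. It assumes plain
unweighted least squares, which may not match the actual WINVAL
estimation, and the function name fit_linear_weights, the three
made-up statistics, and the synthetic ratings are all hypothetical and
exist only for illustration.

```python
import numpy as np

def fit_linear_weights(stats, adj_plus_minus):
    """Ordinary least squares of adjusted plus/minus on per-40 statistics.

    stats: (n_players, n_stats) array of pace-adjusted per-40 statistics.
    adj_plus_minus: (n_players,) array of adjusted plus/minus ratings.
    Returns (intercept, weights). Unweighted OLS is an assumption here.
    """
    stats = np.asarray(stats, dtype=float)
    y = np.asarray(adj_plus_minus, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(stats)), stats])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic, noiseless data invented for illustration (not real NBA data).
rng = np.random.default_rng(0)
stats = rng.uniform(0.0, 10.0, size=(50, 3))
true_weights = np.array([1.0, -0.5, 2.0])
ratings = -3.0 + stats @ true_weights

intercept, weights = fit_linear_weights(stats, ratings)
print(intercept, weights)
```

With real data the ratings would be noisy, so the fitted weights would
come with standard errors like the ones in a regression printout.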
Here is what I get. (The estimated linear weights are in the
PARAMETER ESTIMATE column.)
PTSP40  = points scored per 40 minutes
FG2AP40 = two point field goals attempted per 40 minutes
TAP40   = three pointers attempted per 40 minutes
FTAP40  = free throws attempted per 40 minutes
ASP40   = assists per 40 minutes
ORP40   = offensive rebounds per 40 minutes
DRP40   = defensive rebounds per 40 minutes
TOP40   = turnovers per 40 minutes
STP40   = steals per 40 minutes
BKP40   = blocks per 40 minutes
PFP40   = personal fouls per 40 minutes

Root MSE   183.20337    R-square   0.4407
Dep Mean     4.88730    Adj R-sq   0.4201

                 Parameter      Standard     T for H0:
Variable   DF     Estimate         Error   Parameter=0   Prob > |T|
INTERCEP    1   -11.792794    2.20556250        -5.347       0.0001
PTSP40      1     0.935041    0.25573422         3.656       0.0003
FG2AP40     1    -0.726939    0.25623989        -2.837       0.0049
TAP40       1    -0.128389    0.33902488        -0.379       0.7052
FTAP40      1     0.104796    0.32231296         0.325       0.7453
ASP40       1     0.795411    0.19939588         3.989       0.0001
ORP40       1     0.458335    0.43008193         1.066       0.2874
DRP40       1     0.570479    0.22121421         2.579       0.0104
TOP40       1    -0.717767    0.56808664        -1.263       0.2074
STP40       1     2.482385    0.57789004         4.296       0.0001
BKP40       1     2.021543    0.41889403         4.826       0.0001
PFP40       1    -0.004383    0.32491006        -0.013       0.9892

As you can tell, these estimates differ in a lot of ways from the
linear weights commonly used. Steals and blocks are much more
heavily weighted. Rebounds are weighted less. Free throw attempts
seem to have the wrong sign, and personal fouls almost do.
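To make the weights concrete, the sketch below plugs a stat line into
the estimates from the PARAMETER ESTIMATE column. The function name
predicted_rating and the example stat line are hypothetical and
invented purely for illustration. The result is in whatever units the
adjusted plus/minus ratings are expressed in.

```python
# Estimated linear weights copied from the PARAMETER ESTIMATE column.
WEIGHTS = {
    "INTERCEP": -11.792794,
    "PTSP40": 0.935041,
    "FG2AP40": -0.726939,
    "TAP40": -0.128389,
    "FTAP40": 0.104796,
    "ASP40": 0.795411,
    "ORP40": 0.458335,
    "DRP40": 0.570479,
    "TOP40": -0.717767,
    "STP40": 2.482385,
    "BKP40": 2.021543,
    "PFP40": -0.004383,
}

def predicted_rating(per40):
    """Intercept plus the weighted sum of per-40-minute statistics."""
    total = WEIGHTS["INTERCEP"]
    for name, weight in WEIGHTS.items():
        if name != "INTERCEP":
            total += weight * per40[name]
    return total

# Hypothetical stat line, made up for illustration (not a real player).
example_player = {
    "PTSP40": 18.0, "FG2AP40": 12.0, "TAP40": 3.0, "FTAP40": 5.0,
    "ASP40": 4.0, "ORP40": 2.0, "DRP40": 6.0, "TOP40": 2.5,
    "STP40": 1.5, "BKP40": 1.0, "PFP40": 3.0,
}
print(round(predicted_rating(example_player), 2))
```

Holding everything else constant, one extra steal per 40 minutes moves
the predicted rating by 2.482385, while one extra personal foul barely
moves it at all.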
But again the approach here is different. The question I am asking
here is not whether a miss should count against a guy when he is at
the line. (In this regression it does not.) The question is
whether players who get to the foul line and miss tend, holding the
other statistics constant, to be on the floor when their team
outscores its opponents. That is a very different question from the
one the possessions-based approach asks.
Now this approach is not the be-all and end-all, and I think there
are hybrids of the two approaches that may be better. For example,
the possessions-based statistics may be better to put in this
regression than what I currently have in there. And there are
things that focusing on little chunks of time misses, such as fouls
generated that help in later chunks of the game.
But the bigger point is this. Given the large fraction of
unmeasured contributions in basketball, IMO linear weights are
really an empirical question, and logic can only get us so far.
And with that, the prosecution rests. :)