--- In APBR_analysis@yahoogroups.com, "roland_beech" <roland@t...> wrote:
> well, on the subject of the accuracy of the 82games data, I can say
> the 03-04 season information is much cleaner than the 02-03.
> there are a few known bugs in our databases for this year that we
> are slowly (but hopefully surely) getting around to fixing, and some
> known bugs in the 02-03 databases that we are unlikely to fix in the
> near term.
> there are also a number of issues on how to handle things. For
> instance with regards to plus/minus currently the numbers are
> displayed on a "literalist" basis, which is to say that whatever
> five players are on the court for each team when points are scored,
> are the players to whom the +/- is credited (debited?).
> now this might seem logical but it is arguably not the optimal or
> right way to do it. The most common circumstance that could be
> challenged is when a player is fouled and going to the free throw
> line, but substitutions are made before he completes his two shots.
> In such event players coming into the game are 'on the hook' for his
> freebies even though they were not on court when he was fouled, and
> likewise the players leaving may be let off the hook (particularly
> if the guy who committed the foul fouled out!) We have the
> capability to track the plus/minus "the right way" and probably will
> at some point, but it hasn't happened yet. Here's another fun
> case: (my memory may be erratic) Welsch was fouled but hurt, so he
> had to leave the game and Pierce entered and shot the free throws
> for him...now do you give Pierce's plus/minus free throw points to
> Welsch and not to Pierce even though Pierce is the one actually
> making or missing the shot? I'm not even sure if the NBA box score
> would credit Welsch with the points somehow??
> the theory on some levels is that the plus/minus oddities average
> out over the course of a season (although they very well might not
> for some players who are only subbed in a certain way). Another
> issue is at the end of a close game you commonly see players subbed
> in on offense and out on defense (and vice versa) on timeouts.
> a raw plus/minus already has a number of drawbacks to begin with:
> who was on the court for the opponents, and your team with you, did
> you come into the game when the other team/your team was in the
> penalty situation, garbage time minutes versus clutch, etc...
> As a result we may move away from the +/- towards an on court team
> offensive points per possession and defensive points per possession
> delta, where we actually count the real possessions as opposed to
> estimating with a formula. You can see an estimated version on the
> on court/off court page for each player.
> To sum it up, basically we try hard to make the data, both in the
> databases and on the web pages (two different things) as accurate as
> possible, but undoubtedly there will be some errors that creep in.
> We are always happy (well, usually happy) for people to point out
> errors they might come across.
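
To make the two plus/minus conventions in the quoted message concrete, here is a minimal sketch. It is not 82games' actual code: the event layout, the function name `plus_minus`, the mode names, and the players A1..A5 and B1..B6 are all assumptions made for illustration. "literal" charges the points to whoever is on court when they are scored; "at_foul" charges free throw points to whoever was on court when the foul occurred.

```python
from collections import defaultdict

def plus_minus(events, mode="literal"):
    """Sum plus/minus per player over a list of scoring events.

    Each event is a dict (hypothetical layout) with:
      'team'             - the team that scored
      'points'           - points scored on this event
      'on_court'         - {team: set of players} when the points were scored
      'on_court_at_foul' - same shape, lineups when the foul occurred,
                           or None if the points are not free throws
    """
    totals = defaultdict(int)
    for ev in events:
        lineups = ev["on_court"]
        # In "at_foul" mode, free throws go to the lineups present at the foul.
        if mode == "at_foul" and ev.get("on_court_at_foul") is not None:
            lineups = ev["on_court_at_foul"]
        for team, players in lineups.items():
            sign = 1 if team == ev["team"] else -1
            for player in players:
                totals[player] += sign * ev["points"]
    return dict(totals)

# Hypothetical case: A1 is fouled and makes both free throws, but B6
# replaces B5 between the first and the second shot.
team_a = {"A1", "A2", "A3", "A4", "A5"}
b_at_foul = {"B1", "B2", "B3", "B4", "B5"}
b_after_sub = {"B1", "B2", "B3", "B4", "B6"}
at_foul = {"A": team_a, "B": b_at_foul}

events = [
    {"team": "A", "points": 1,
     "on_court": {"A": team_a, "B": b_at_foul},
     "on_court_at_foul": at_foul},
    {"team": "A", "points": 1,
     "on_court": {"A": team_a, "B": b_after_sub},
     "on_court_at_foul": at_foul},
]

literal = plus_minus(events, mode="literal")
fair = plus_minus(events, mode="at_foul")
print(literal["B5"], literal["B6"])    # B5 and B6 each charged one point
print(fair["B5"], fair.get("B6", 0))   # B5 charged both, B6 none
```

Under the literal convention the incoming B6 is "on the hook" for a free throw he had no part in conceding, while B5 is let off for half of it; the at-foul convention puts both points on B5.
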
Let me again preface my remarks by singing the praises of the 82games site. I could
hardly have hoped for a more wonderful resource for the stats-interested NBA fan.
That said, let me try to clarify (and elaborate) my concerns. My overarching concern is
that I want to be sure that appearances are reality. Toward that end two separate
things would offer such assurance:
First, a detailed disclosure of methods (perhaps a permanent link on the homepage?)
Call it an academic desire (necessity really) to read the footnotes and not just the
abridged version of the story. Obviously, given the simplicity of the +/- stat, one
need not go into great detail on how that is calculated. More to the point, I am
interested in seeing how the raw game data is gathered, and the methods and
number of people involved in transcribing it into the format used. Additionally, for
stats involving some degree of interpretation, such as distance from the basket on a shot
attempt or an estimate of the degree of defensive opposition to a given
shot, I would like to see a written formal definition of these terms and how they are
Second, the ability to check the data (hence the previous request for the stats form).
This is of course time consuming for those "attending the seminar", but ultimately
necessary. And let me be clear, there is nothing that has been said or unsaid that
would lead me to believe that the posted results are inaccurate; to the contrary, the
apparent attention to detail in definitions, as expressed above, would suggest that the
data are darn good. However, I would like to see the point confirmed and also to see
whether measurement error is a factor (especially for the statistical categories
where some discretion is involved) and, if so, how much.
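
As one illustration of how measurement error in a discretionary category might be gauged, here is a minimal sketch that compares two independent codings of the same shots. Everything in it is an assumption for illustration: the function name `coding_agreement`, the 2-foot tolerance, and the sample distances are made up, and nothing here describes how 82games actually records or checks its data.

```python
def coding_agreement(coder_a, coder_b, tolerance_ft=2.0):
    """Compare two independent codings of shot distance (in feet).

    Returns (mean absolute difference, share of shots on which the two
    coders agree to within tolerance_ft).
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty codings of the same shots")
    diffs = [abs(a - b) for a, b in zip(coder_a, coder_b)]
    mean_abs_diff = sum(diffs) / len(diffs)
    share_within = sum(d <= tolerance_ft for d in diffs) / len(diffs)
    return mean_abs_diff, share_within

# Made-up distances for the same four shots, as logged by two people.
first_coding = [3, 15, 22, 8]
second_coding = [4, 15, 25, 8]

mean_abs_diff, share_within = coding_agreement(first_coding, second_coding)
print(mean_abs_diff, share_within)
```

A small mean difference and a high share within tolerance would suggest the category is being recorded consistently; a large gap would point to where a formal definition is most needed.
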