## 229 Re: Similarity Scores

• Sep 10, 2001
> Just some thoughts off the top of my head: I agree that this is something
> that ought to be done for basketball. I'm a little leery of the work
> that's been done in baseball, as Bill James' formula looks like a
> hand-cooked one. I'm sure its results are reasonable, but I'd like to see
> an approach that's more systematic.
>

I've asked Bill about the approach he took and why he took it. I
don't expect to hear back anytime soon.

> There are several statistical techniques which I think are directly useful
> here. Cluster analysis (the computer looks at the data and divides them
> into groups); discriminant analysis and logistic regression (determine
> the factors which predict which group a player will be in, e.g. Hall of
> Fame vs non Hall of Fame); and probably the most useful of all:
> Mahalanobis distance, or some variation thereof.

> However, Mahalanobis distance does NOT take into account the second
> problem, that certain variables might deserve more weight than others.
> Maybe we WOULD want to "double count" points scored, given that it's
> probably the single most important statistic for a player, or at least one
> of the most important ones for distinguishing all-stars and hall of famers
> from journeymen.

First -- I'm not familiar with Mahalanobis distance. It sounds like
multivariate regression, though. You may have to do this analysis
yourself or point me to software that does it. What is it going to
spit out? Weights on the different stats?
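
As a rough illustration of what the method "spits out", here is a minimal sketch of Mahalanobis distance in Python with NumPy. It returns a single distance for each pair of stat lines rather than a list of weights. The implied weighting comes from the inverse covariance matrix of the player population, which scales each stat by its spread and discounts stats that move together. The population below is HYPOTHETICAL (randomly generated, only so that a covariance matrix exists), and the choice of three stats is an assumption made for brevity. The two stat lines are the per-game PTS, AST and TR figures for Iverson '99 and Bryant 2001 that appear in this post.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between stat lines x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# HYPOTHETICAL population of player-seasons (rows) by stats (columns:
# PTS, AST, TR per game). These numbers are made up for illustration only.
rng = np.random.default_rng(0)
population = rng.normal(loc=[12.0, 3.0, 5.0], scale=[6.0, 2.0, 3.0], size=(200, 3))
cov = np.cov(population, rowvar=False)

# Per-game PTS, AST, TR from the Iverson '99 and Bryant 2001 lines in this post.
iverson = [26.8, 4.6, 4.9]
bryant = [28.5, 5.0, 5.9]

d = mahalanobis(iverson, bryant, cov)
print(f"Mahalanobis distance, Iverson '99 vs Bryant 2001: {d:.3f}")
```

A smaller distance means more similar seasons. With a real covariance matrix estimated from actual player-seasons the number would differ, so the value printed here carries no meaning beyond showing the mechanics.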

>
> Anyway, that's how I would approach the problem. For a first pass, I'd do
> statistics per game, rather than per 48 minutes, per year, or per career
> (career stats would be useless for comparing, say, Steve Francis to Isiah
> Thomas, because Francis' career stats are still so low).

We could compare Thomas' first 2 years to Francis', a very useful
comparison.

> I'd do straight
> Mahalanobis distance at first, throwing in all variables (FTA, FTM, AND
> FT%) and see if the results looked reasonable. If not, then at least I'd
> have some coefficients to start with, and could start doubling or halving
> some.
>

I agree, I think (not knowing exactly what the coefficients mean).
Getting a common starting point is the most important thing I want to
get out of this discussion. Similarity scores are ultimately
somewhat subjective, but if we can all start with the same set of
numbers, at least we have a foundation.
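
One way to picture "coefficients" that can be doubled or halved is a weighted distance on standardized stats. The sketch below is an assumption about how that could look, not the method proposed in the quoted message: each difference is divided by a league standard deviation and then multiplied by a per-stat weight. The standard deviations are HYPOTHETICAL placeholders. The stat lines are the per-game PTS, AST and TR for Iverson '99 and Bryant 2001 from this post.

```python
import numpy as np

def weighted_distance(x, y, stdevs, weights):
    """Euclidean distance on standardized differences, with a weight per stat."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    z = diff / np.asarray(stdevs, dtype=float)   # put every stat on a common scale
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * z ** 2)))

# Columns: PTS, AST, TR per game.
iverson = [26.8, 4.6, 4.9]
bryant = [28.5, 5.0, 5.9]

# HYPOTHETICAL league standard deviations, for illustration only.
stdevs = [6.0, 2.0, 3.0]

equal = weighted_distance(iverson, bryant, stdevs, [1.0, 1.0, 1.0])
pts_doubled = weighted_distance(iverson, bryant, stdevs, [2.0, 1.0, 1.0])
print(f"equal weights: {equal:.3f}, PTS double-counted: {pts_doubled:.3f}")
```

With all weights equal to 1 and no correlation between stats, this reduces to Mahalanobis distance with a diagonal covariance matrix, so it gives a common starting point from which individual weights can be adjusted by hand.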

> Also, after the initial analysis, I'd want to put in some sort of
> correction for era or game pace. Bob Cousy's 43% career FG% (or whatever
> it was, I'm saying this off the top of my head) reminds me more of Isiah
> Thomas's 46% than it does Allen Iverson's 43%, despite the superficial
> similarity of Cousy's and Iverson's FG%. (Again I'm not vouching for
> those specific numbers, just saying that I'd rather see the numbers in
> context, i.e. corrected for era and/or game pace.)
>

I think we want to keep the era correction separate. I can't find
where he said it, but I know James wanted to keep it separate.

My one attempt at basketball similarity scores is buried somewhere.
I looked at player-season comparisons back in '98. The motivation
was identifying who was similar to Kobe Bryant, since there was so
much controversy at the time about how good he was going to be.
I remember finding a lot more self-similarity across players than
cross-similarity (Bryant's 2nd year resembled his first more than
it resembled a lot of other players' seasons, for example). The
player who seemed most similar to Bryant at the time was Allen
Iverson, but my #'s were weird. I'd say now that Bryant and Iverson
aren't as similar as Bryant and Jordan or Bryant and Vince Carter, which
I think points to factors I didn't consider -- height and position. (It
may also point to false impressions. Jordan's early career numbers
are MUCH better than Bryant's, even if you account for the 10% drop
in pace and the 5-8% drop in offensive efficiency.)
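
To make the size of that adjustment concrete, here is a small sketch that scales a later-era scoring number up to the earlier environment. It assumes the two drops combine multiplicatively and apply directly to points per game, which is a simplification and not something the post spells out. The function name `era_adjust` is hypothetical. The 28.5 figure is Bryant's 2001 points per game from the table in this post. No Jordan numbers are given here, so none are used.

```python
def era_adjust(value, pace_drop, efficiency_drop):
    """Scale a later-era per-game value up to an earlier, faster, more efficient era.

    Assumes the pace and efficiency drops combine multiplicatively.
    """
    return value / ((1.0 - pace_drop) * (1.0 - efficiency_drop))

bryant_pts_2001 = 28.5

# 10% pace drop combined with the low and high ends of the 5-8% efficiency drop.
low = era_adjust(bryant_pts_2001, 0.10, 0.05)    # about 33.3
high = era_adjust(bryant_pts_2001, 0.10, 0.08)   # about 34.4
print(f"Bryant 2001 PTS in the earlier environment: {low:.1f} to {high:.1f}")
```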

For kicks, here is Iverson 99 and Bryant 2001:

          GS   MIN    FG   FGA    FG%  fg3m  fg3a   fg3%
Iverson  1.0  41.5   9.1  22.0  0.412   1.2   4.1  0.291
Bryant   1.0  40.9  10.3  22.2  0.464   0.9   2.9  0.305

          FT   FTA    FT%   OR   DR   TR  AST   PF   DQ
Iverson  7.4   9.9  0.751  1.4  3.5  4.9  4.6  2.0  0.0
Bryant   7.0   8.2  0.853  1.5  4.3  5.9  5.0  3.3  0.0

          STL    TO   BLK   PTS
Iverson  2.29  3.48  0.15  26.8
Bryant   1.68  3.24  0.63  28.5

A priori, I'd put these two seasons in the 700-750 range on
similarity scores: easily identifiable similarities, but significant
and obvious differences.

For my #'s, I have (in per game stats) for Iverson and Bryant, resp:

                                           Defensive Stops   Def.    Net
         ScPoss  Poss    Fl%   Ortg  PtsProd   /Min  /Poss   Rtg.   Win%
Iverson    12.8  25.0  0.511  106.6     26.7  0.182  0.484   97.3  0.820
Bryant     12.8  24.4  0.524  110.9     27.0  0.191  0.494  103.0  0.773

At about the same age, Bryant's offensive skills are more efficient
than Iverson's, which we would probably all agree on. Defensively,
it's hard to say.
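
For readers who want to see how the offensive columns in the table above hang together, the sketch below recomputes two of them from the others. It assumes that Fl% (floor percentage) is scoring possessions divided by possessions and that Ortg (offensive rating) is points produced per 100 possessions. The post itself does not define them, so treat both formulas as assumptions. The inputs are the per-game figures from the table, and the small gaps from the listed values are consistent with those inputs having been rounded.

```python
def floor_pct(scoring_poss, poss):
    """Assumed definition: share of possessions on which the player scores."""
    return scoring_poss / poss

def off_rating(pts_produced, poss):
    """Assumed definition: points produced per 100 possessions."""
    return 100.0 * pts_produced / poss

# Per-game figures copied from the table in this post.
rows = {
    "Iverson": {"ScPoss": 12.8, "Poss": 25.0, "Fl%": 0.511, "Ortg": 106.6, "PtsProd": 26.7},
    "Bryant":  {"ScPoss": 12.8, "Poss": 24.4, "Fl%": 0.524, "Ortg": 110.9, "PtsProd": 27.0},
}

for name, r in rows.items():
    fl = floor_pct(r["ScPoss"], r["Poss"])
    ortg = off_rating(r["PtsProd"], r["Poss"])
    print(f"{name}: Fl% {fl:.3f} (listed {r['Fl%']}), Ortg {ortg:.1f} (listed {r['Ortg']})")
```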

Dean Oliver