
229 Re: Similarity Scores

  • Dean Oliver
    Sep 10, 2001
      --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
      > Just some thoughts off the top of my head: I agree that this is
      > that ought to be done for basketball. I'm a little leery of the
      > that's been done in baseball, as Bill James' formula looks like a
      > hand-cooked one. I'm sure its results are reasonable, but I'd like to see
      > an approach that's more systematic.

      I've asked Bill about the approach he took and why he took it. I
      don't expect to hear back anytime soon.

      > There are several statistical techniques which I think are directly
      > here. Cluster analysis (the computer looks at the data and divides
      > into groups); discriminant analysis and logistic regression
      > the factors which predict which group a player will be in, e.g. Hall of
      > Fame vs non Hall of Fame); and probably the most useful of all:
      > Mahalanobis distance, or some variation thereof.

      > However, Mahalanobis distance does NOT take into account the second
      > problem, that certain variables might deserve more weight than
      > Maybe we WOULD want to "double count" PTs scored, given that it's
      > probably the single most important statistic for a player, or at least one
      > of the most important ones for distinguishing all-stars and hall of
      > from journeymen.

      First -- I'm not familiar with Mahalanobis distance. It sounds like
      multivariate regression, though. You may have to do this analysis or
      point me to software that does it. What is it going to spit out?
      Weights on the different stats?
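
      For reference, the textbook definition of Mahalanobis distance is
      sketched below. It returns a single distance for each pair of
      players rather than a list of weights; the inverse covariance matrix
      does the weighting implicitly. The symbols x and y (the two players'
      stat vectors) and S (the covariance matrix of those stats across all
      players) are notation introduced here, not taken from the thread.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} S^{-1} (\mathbf{x} - \mathbf{y})}

% Special case: if the stats are uncorrelated, S is diagonal with
% variances s_i^2 and the distance reduces to a standardized Euclidean one.
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_i \frac{(x_i - y_i)^2}{s_i^2}}
```

      Correlated stats such as FTA and FTM are not double counted under
      this measure, because the off-diagonal terms of S discount the shared
      information.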

      > Anyway, that's how I would approach the problem. For a first pass, I'd do
      > statistics per game, rather than per 48 minutes, per year, or per
      > (career stats would be useless for comparing, say, Steve Francis to
      > Thomas, because Francis' career stats are still so low).

      We could compare Thomas' first 2 years to Francis', a very useful

      > I'd do
      > Mahalanobis distance at first, throwing in all variables (FTA, FTM,
      > FT%) and see if the results looked reasonable. If not, then at least I'd
      > have some coefficients to start with, and could start doubling or
      > some.

      I agree, I think (not knowing exactly what the coefficients mean).
      Getting a common starting point is the most important thing I want to
      get out of this discussion. Similarity scores are ultimately
      somewhat subjective, but if we can all start with the same set of
      numbers, at least we have a foundation.

      > Also, after the initial analysis, I'd want to put in some sort of
      > correction for era or game pace. Bob Cousy's 43% career FG% (or
      > it was, I'm saying this off the top of my head) reminds me more of
      > Thomas's 46% than it does Allen Iverson's 43%. Despite the
      > similarity of Cousy's and Iverson's FG%. (Again I'm not vouching
      > those specific numbers, just saying that I'd rather see the numbers
      > context, i.e. corrected for era and/or game pace.)

      I think we want to keep the era correction separate. I can't find
      where he said it, but I know James wanted to keep it separate.

      My one attempt at basketball similarity scores is buried somewhere.
      I looked at player-season comparisons back in '98. The motivation
      was identifying who was similar to Kobe Bryant, since there was so
      much controversy at the time about how good he was going to be.
      I remember finding a lot more self-similarity across players than
      cross-similarity (Bryant's 2nd year resembled his first more than
      it resembled a lot of other players' seasons, for example). The
      player who seemed most similar to Bryant at the time was Allen
      Iverson, but my numbers were weird. I'd say now that Bryant and Iverson
      aren't as similar as Bryant and Jordan or Bryant and Vince Carter, which I
      think points to factors I didn't consider -- height and position. (It
      may also point to false impressions. Jordan's early career numbers
      are MUCH better than Bryant's, even if you account for the 10% drop
      in pace and the 5-8% drop in offensive efficiency.)
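
      A rough sketch of what that kind of adjustment looks like for a
      scoring stat, assuming the pace and efficiency drops simply multiply
      (points = possessions x points per possession). The function name
      and the 30.0 input are hypothetical placeholders, not anyone's
      actual numbers.

```python
def era_adjust(per_game_stat, pace_drop, efficiency_drop):
    """Scale a per-game scoring stat down to a slower, less efficient era.

    Assumes the two effects are independent and multiplicative, which is
    a simplification made for this sketch.
    """
    return per_game_stat * (1.0 - pace_drop) * (1.0 - efficiency_drop)


# HYPOTHETICAL placeholder input, used only to show the arithmetic.
raw_points = 30.0

# 10% drop in pace, and the 5% and 8% ends of the efficiency range.
low = era_adjust(raw_points, pace_drop=0.10, efficiency_drop=0.08)
high = era_adjust(raw_points, pace_drop=0.10, efficiency_drop=0.05)
print(f"{low:.2f} to {high:.2f}")
```

      Under these assumptions the combined discount is roughly 14 to 17
      percent, so an early-career gap larger than that survives the
      adjustment.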

      For kicks, here is Iverson 99 and Bryant 2001:

                   GS    MIN     FG    FGA    FG%   fg3m   fg3a   fg3%
      Iverson     1.0   41.5    9.1   22.0  0.412    1.2    4.1  0.291
      Bryant      1.0   40.9   10.3   22.2  0.464    0.9    2.9  0.305

                   FT    FTA    FT%     OR     DR     TR    AST     PF     DQ
      Iverson     7.4    9.9  0.751    1.4    3.5    4.9    4.6    2.0    0.0
      Bryant      7.0    8.2  0.853    1.5    4.3    5.9    5.0    3.3    0.0

                  STL     TO    BLK    PTS
      Iverson    2.29   3.48   0.15   26.8
      Bryant     1.68   3.24   0.63   28.5

      A priori, I'd like to call these two seasons in the 700-750 range on
      similarity scores. Easily identifiable similarities, but significant
      and obvious differences.
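
      To make the scale concrete, here is a minimal sketch of a
      James-style score: start at 1000 and subtract points for each
      difference between the two lines. The per-game numbers are the ones
      posted above; the penalty weights are HYPOTHETICAL placeholders
      chosen only to show the mechanics, not weights anyone in this thread
      has proposed.

```python
# Per-game lines copied from the post (Iverson 1999, Bryant 2001).
IVERSON = {"MIN": 41.5, "FG": 9.1, "FGA": 22.0, "FG%": 0.412}
BRYANT = {"MIN": 40.9, "FG": 10.3, "FGA": 22.2, "FG%": 0.464}

# HYPOTHETICAL penalties: points subtracted per unit of difference.
PENALTY_PER_UNIT = {"MIN": 5.0, "FG": 20.0, "FGA": 10.0, "FG%": 1000.0}


def similarity_score(a, b, penalties, start=1000.0):
    """Start at `start` and subtract a weighted absolute difference per stat."""
    score = start
    for stat, per_unit in penalties.items():
        score -= per_unit * abs(a[stat] - b[stat])
    return max(score, 0.0)  # floor at zero so very different players don't go negative


score = similarity_score(IVERSON, BRYANT, PENALTY_PER_UNIT)
print(round(score, 1))
```

      Only four stats are penalized here, so the result lands well above
      the 700-750 range; adding the remaining columns, or heavier weights,
      would pull it down. Picking those weights is exactly the part that
      needs a common starting point.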

      For my #'s, I have (in per game stats) for Iverson and Bryant, resp:

                                                           Defensive Stops   Def.    Net
                      ScPoss   Poss    Fl%   Ortg  PtsProd    /Min   /Poss   Rtg.   Win%
      Iverson           12.8   25.0  0.511  106.6     26.7   0.182   0.484   97.3  0.820
      Bryant            12.8   24.4  0.524  110.9     27.0   0.191   0.494  103.0  0.773

      At about the same age, Bryant's offensive skills are more efficient
      than Iverson's, which we would probably all agree on. Defensively,
      it's hard to say.
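
      As a quick consistency check on the offensive columns, the sketch
      below recomputes Fl% and Ortg from the other posted columns. It
      assumes Fl% is scoring possessions divided by possessions and Ortg
      is points produced per 100 possessions; the function names are
      introduced here for illustration.

```python
# Offensive columns copied from the table above (per game).
ROWS = {
    "Iverson": {"sc_poss": 12.8, "poss": 25.0, "pts_prod": 26.7},
    "Bryant": {"sc_poss": 12.8, "poss": 24.4, "pts_prod": 27.0},
}


def floor_pct(sc_poss, poss):
    """Share of a player's possessions that end in a score (assumed definition)."""
    return sc_poss / poss


def off_rating(pts_prod, poss):
    """Points produced per 100 possessions (assumed definition)."""
    return 100.0 * pts_prod / poss


for name, row in ROWS.items():
    fl = floor_pct(row["sc_poss"], row["poss"])
    ortg = off_rating(row["pts_prod"], row["poss"])
    print(f"{name}: Fl% ~ {fl:.3f}, Ortg ~ {ortg:.1f}")
```

      The recomputed values differ slightly from the posted ones because
      the inputs here are already rounded to one decimal place.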

      Dean Oliver
      Journal of Basketball Studies