
Similarity Scores (message #226)

  Dean Oliver
    Sep 10, 2001
      The concept of similarity scores is one that Bill James viewed as one
      of his most important. His introduction to the method in the '86
      Baseball Abstract reads:

      The most important new method to be introduced this year is that of
      similarity scores. Similarity scores are a way of objectively fixing
      the "degree of resemblance" between two players or between two teams.
      Among all the methods that I have developed over the years, this
      method is the most flexible, the most adaptable, the most useful in
      many different contexts....

      The similarity scores begin with the assumption that players who are
      identical in all respects considered will have a similarity score of
      1000. For each difference between the two, there is a "penalty", or
      reduction from the 1000. Similarity scores are designed so that:

      <500 -- players who would not usually be perceived as being
      essentially similar.

      ~600 -- Slight similarities, but major differences

      ~700 -- Important, easily identifiable similarities but also
      significant and obvious differences

      ~800 -- very prominent, obvious similarities, but easily identifiable
      differences

      >850 -- substantially similar

      >900 -- very similar

      >950 -- rare, indicating that true similarities are emphasized by
      random chance.
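A minimal sketch of the arithmetic just described, assuming the per-difference penalties have already been computed (how to compute them is the open question in this thread). The names `similarity_score`, `describe` and `BANDS` are hypothetical. Treating each approximate (~) value as a lower bound, and lumping everything under 600 into the lowest band, is an assumption; the quote leaves the 500 to 600 range unlabeled.

```python
# Hypothetical sketch: James-style similarity score as 1000 minus penalties,
# plus a lookup of the rough interpretive bands quoted above.

# (lower bound, label); treating the "~" values as lower bounds is an assumption.
BANDS = [
    (950, "rare"),
    (900, "very similar"),
    (850, "substantially similar"),
    (800, "very prominent, obvious similarities"),
    (700, "important similarities, significant differences"),
    (600, "slight similarities, major differences"),
]


def similarity_score(penalties):
    """Start from 1000 (identical players) and subtract each penalty."""
    return 1000 - sum(penalties)


def describe(score):
    """Return the label of the first band whose lower bound the score reaches."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    # Scores below 600 land here; the quote itself only labels the <500 range.
    return "not usually perceived as similar"
```

For example, penalties of 10, 20 and 30 give a score of 940, which falls in the "very similar" band.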

      Among the uses James describes:

      1. When discussing whether or not a player should be elected to the
      Hall of Fame, one of the key questions to focus on -- probably the
      most important question -- is who are the most similar other players
      and are they in the Hall?

      2. How to measure consistency from season to season

      3. How do we measure the accuracy of career projection methods?

      4. How to make career projections by comparing players of similar
      age to others

      5. Salary negotiations

      6. (A baseball-specific use, involving park factors)

      7. Setting control groups for studies

      8. Constructing theoretical models of players/teams and identifying
      real players/teams similar to the model.
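Use 1 boils down to a nearest-neighbour lookup: score the target against everyone else, keep the closest few, and count how many are Hall of Famers. A minimal sketch, assuming some pairwise `score` function already exists and is passed in; `most_similar`, `hall_count` and the default of ten neighbours are hypothetical choices, not part of James's method.

```python
# Hypothetical sketch of use 1: who are the most similar players, and are
# they in the Hall?
from typing import Callable, Dict, Iterable, List, Set, Tuple


def most_similar(target: str,
                 players: Dict[str, dict],
                 score: Callable[[dict, dict], float],
                 k: int = 10) -> List[Tuple[str, float]]:
    """Return the k players most similar to `target`, highest score first."""
    scored = [(name, score(players[target], stats))
              for name, stats in players.items() if name != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def hall_count(neighbors: Iterable[Tuple[str, float]],
               hall_of_fame: Set[str]) -> int:
    """Count how many of the neighbours are in the Hall of Fame set."""
    return sum(1 for name, _ in neighbors if name in hall_of_fame)
```

The same two steps cover the Kemp-style question later in the thread: if `hall_count` comes back near zero for a player's closest comparables, that is the "all out of the HOF" case.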


      We started discussing this over in APBR, but I think the details of
      making this work can get technical, so I brought it here. I do think
      this is a major missing tool in basketball, and it frustrates me
      that, as easy as it seems, I haven't been able to develop
      something like this.

      Baseballreference.com lists, for each player, the players most
      similar to him -- something that Robert and I have talked about doing
      for basketball eventually. For instance, here is Roberto Alomar's page:


      That page lists the players most similar to him over his career, 3 of
      whom are in the Hall. When players are compared to Alomar at age 32,
      the list shows 8 HOFers; the other two are Pete Rose and Ryne
      Sandberg. That suggests Alomar is on track for the Hall (even on
      hitting alone, since these scores don't account for defense).
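The age-32 comparison works because both careers are cut off at the same age before they are scored. A minimal sketch of that truncation step, assuming each season is stored as a dict with an `age` key plus counting stats (a hypothetical layout; `totals_through_age` is likewise a made-up name). How Baseball-Reference assigns a season's age is not covered here.

```python
# Hypothetical sketch: career totals through a given age, so that a
# current player can be compared with others "at the same age".
from collections import defaultdict
from typing import Dict, List


def totals_through_age(seasons: List[dict], max_age: int) -> Dict[str, float]:
    """Sum every counting stat over seasons played at or before max_age."""
    totals: Dict[str, float] = defaultdict(float)
    for season in seasons:
        if season["age"] <= max_age:
            for stat, value in season.items():
                if stat != "age":
                    totals[stat] += value
    return dict(totals)
```

Run it on the target player and on every comparison player with the same `max_age`, then feed the truncated totals to whatever career score is being used.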

      This page describes how the scores are calculated for baseball
      career #s. I'd think that we could come up with a similar method for
      basketball. The '86 book has the method for comparing seasons.
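A basketball version could follow the same pattern of one penalty point per fixed-size difference in each career total. A minimal sketch, assuming that pattern; the stat list and every divisor in the hypothetical `CAREER_DIVISORS` table are placeholders, not a calibration, and choosing them so that scores land in James's bands is the real work.

```python
# Hypothetical sketch of a career similarity score for basketball.
# Each divisor is the size of difference that costs one point.
# These values are placeholders, not tested or calibrated numbers.
from typing import Dict

CAREER_DIVISORS: Dict[str, float] = {
    "games": 20,
    "points": 200,
    "rebounds": 100,
    "assists": 50,
}


def career_similarity(a: Dict[str, float], b: Dict[str, float],
                      divisors: Dict[str, float] = CAREER_DIVISORS) -> float:
    """1000 minus one point per `divisor` units of difference in each stat."""
    penalty = sum(abs(a[stat] - b[stat]) / d for stat, d in divisors.items())
    return 1000 - penalty
```

For two made-up careers that differ by 20 games, 1,000 points, 500 rebounds and 100 assists, the penalty is 1 + 5 + 5 + 2 = 13, for a score of 987.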

      One of the problems I had was with redundancy of stats. FG%, for
      example, is fully determined by FG and FGA (FG% = FG/FGA). James didn't
      worry about this much, but I do in basketball.
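The redundancy problem is easy to see with numbers. In this minimal sketch (made-up players, placeholder divisors, hypothetical helper names), two players take the same number of shots and differ only in makes. Scoring FG, FGA and FG% charges that one difference twice; dropping FG, which FGA and FG% already determine together, charges it once. Which stat to drop is a judgment call that this example does not settle.

```python
# Hypothetical illustration: FG% = FG / FGA, so scoring FG, FGA and FG%
# together counts the same difference twice.
from typing import Dict


def penalty(a: Dict[str, float], b: Dict[str, float],
            divisors: Dict[str, float]) -> float:
    """Sum of |difference| / divisor over the chosen stats."""
    return sum(abs(a[s] - b[s]) / d for s, d in divisors.items())


def with_fg_pct(p: Dict[str, float]) -> Dict[str, float]:
    """Add FG%, which is fully determined by FG and FGA."""
    return {**p, "fg_pct": p["fg"] / p["fga"]}


# Same attempts, 600 fewer makes for player b (made-up careers).
a = with_fg_pct({"fg": 6000, "fga": 12000})   # .500
b = with_fg_pct({"fg": 5400, "fga": 12000})   # .450

redundant = {"fg": 100, "fga": 200, "fg_pct": 0.01}   # placeholder divisors
non_redundant = {"fga": 200, "fg_pct": 0.01}

print(round(penalty(a, b, redundant), 6))       # 11.0
print(round(penalty(a, b, non_redundant), 6))   # 5.0
```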

      To be clear, this is not a rating tool. It doesn't tell you who is
      better or worse; it tells you who is similar. In the old argument over
      Shawn Kemp, perhaps we'd find that the players most similar to him are
      all outside the HOF, which would suggest he isn't that great. Or maybe
      his best season compares with seasons put up by Wilt, Karl Malone,
      etc., suggesting he had some great seasons.

      Finally, Greg Thomas took a stab at a method for player careers back
      in the spring issue of Cage Chronicles:


      He did something like what MikeT was suggesting: scaling points, etc.,
      by some average, then subtracting differences. Kinda interesting and
      not a bad attempt, but there are some things I'd change or review:

      1. It uses a different scale than the one James specified, I think.
      The scores run particularly high (Wilt Chamberlain and Arvydas Sabonis
      have a similarity score of 952!!).

      2. Uses only points, assists, and rebounds.

      3. I think he looks at per-minute #s, not career totals.

      4. He standardized by era.
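For comparison, here is a minimal sketch of an approach along the lines of items 2 through 4: per-minute points, rebounds and assists, each divided by the league per-minute average for the player's era, with the score being 1000 minus a multiple of the summed differences. This is not Thomas's actual formula, which isn't given here; the data layout, the function names and the `SCALE` multiplier are all assumptions.

```python
# Hypothetical sketch: per-minute rates standardized by era averages,
# then 1000 minus a penalty on the differences. Not Thomas's formula.
from typing import Dict

STATS = ("points", "rebounds", "assists")
SCALE = 100.0  # placeholder: points deducted per 1.0 gap in relative rate


def relative_rates(player: Dict[str, float],
                   league_per_minute: Dict[str, float]) -> Dict[str, float]:
    """Each per-minute rate divided by the era's league per-minute average."""
    minutes = player["minutes"]
    return {s: (player[s] / minutes) / league_per_minute[s] for s in STATS}


def era_adjusted_similarity(p1: Dict[str, float], era1: Dict[str, float],
                            p2: Dict[str, float], era2: Dict[str, float],
                            scale: float = SCALE) -> float:
    """1000 minus `scale` times the summed gaps in era-relative rates."""
    r1 = relative_rates(p1, era1)
    r2 = relative_rates(p2, era2)
    return 1000 - scale * sum(abs(r1[s] - r2[s]) for s in STATS)
```

The multiplier effectively sets the scale: with a small `SCALE`, even quite different players keep scores in the 900s, which is one possible way to end up with numbers like the 952 in item 1.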

      It's a good first attempt, but I think there is room for improvement.

      Has anyone done anything like this?

      (Another difficulty I've had is getting my Access db to do these
      calculations easily.)

      Dean Oliver
      Journal of Basketball Studies