Similarity Scores: First Cut (long)

  • Dean Oliver
    Message 1 of 2, Sep 13, 2001
      I looked a little more at James' rules and figured that, rather than
      take a more rigorous cut, I would just adapt what he had. I decided
      to start with season comparisons, since they are the easiest (and I
      have a number of player seasons back to '89 in my db).

      Specifically, I noted that James compared player-seasons by first
      starting with games. For every 5-game difference between
      player-seasons, James took off 1 point. Well, a basketball season is
      roughly half the length of a baseball season, so I used a 2.5-game
      difference for basketball player-seasons. From there, many of James'
      numbers appear to be derived from per-game averages. For instance,
      there are about 3.5 at-bats per game, and he seemed to multiply the
      factor of 5 games by 3.5, because for at-bats every difference of 20
      took off 1 point. He did this for runs, hits, etc. It appears that
      he skewed things a little toward the high side, though: just as he
      rounded 17.5 up to 20, he rounded up a lot elsewhere. For correlated
      numbers (doubles and triples, for instance, are correlated with
      hits), he rounded up much more. An average baseball player probably
      has 20-25 doubles, but James subtracted 1 point for every difference
      of 1.5 doubles (1.5 per 5 games works out to 1.5/5*160 ~ 50 doubles
      over a 160-game season, well above 20). Then he adds his own
      subjective weights, too. Differences in strikeouts aren't as
      important to him as one might expect.
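      The game-average reasoning above can be sketched as a small
      calculation: a per-point threshold is roughly (games per point) x
      (per-game average), then rounded up. The round-up-to-a-multiple-of-5
      rule and the name `implied_threshold` are assumptions for
      illustration, not James' documented procedure.

```python
import math


def implied_threshold(games_per_point: float, per_game_avg: float, step: float = 5.0) -> float:
    """Estimate a per-point threshold from a per-game average.

    Assumption (not stated in the post): the raw value is rounded UP to the
    next multiple of `step`, mimicking James' tendency to round up.
    """
    raw = games_per_point * per_game_avg
    return math.ceil(raw / step) * step


# At-bats: 5 games per point x ~3.5 AB per game = 17.5, which rounds up to 20.
print(implied_threshold(5, 3.5))  # 20.0

# Doubles check from the text: 1.5 doubles per 5 games, scaled to a
# 160-game season, is far more than an average player's 20-25 doubles.
print(1.5 * 160 / 5)  # 48.0
```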

      So here is the first set of assignments I made:

      Stat   Difference per point
      Pos    ???
      G      2.5
      Min    75
      fgm    12
      fga    27
      fg%    0.001
      fg3m   3
      fg3a   9
      fg3%   0.0035
      ftm    7.5
      fta    10
      ft%    0.002
      oreb   7.5
      dreb   19
      ast    7.5
      stl    1
      tov    6
      blk    1
      pf     10

      (I make no adjustment for position at this point, in part because
      it's less well-defined for basketball, in part because some defensive
      stats are listed, which was not the case for baseball.)

      For every 2.5-game difference between two player-seasons, I subtract
      1 point from 1000. For every 75-minute difference, I subtract another
      point. And so on.
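      The scoring rule amounts to: start at 1000 and, for each stat,
      subtract the absolute difference divided by that stat's threshold. A
      minimal sketch, assuming fractional points are kept and only the final
      score is rounded (the post doesn't say how partial points are
      handled); the stat key names such as `fg_pct` are hypothetical.

```python
# Thresholds from the table above: the difference in each stat that costs 1 point.
# Position ("Pos") is omitted because no position adjustment is made yet.
THRESHOLDS = {
    "g": 2.5, "min": 75, "fgm": 12, "fga": 27, "fg_pct": 0.001,
    "fg3m": 3, "fg3a": 9, "fg3_pct": 0.0035, "ftm": 7.5, "fta": 10,
    "ft_pct": 0.002, "oreb": 7.5, "dreb": 19, "ast": 7.5, "stl": 1,
    "tov": 6, "blk": 1, "pf": 10,
}


def similarity(a: dict, b: dict, thresholds: dict = THRESHOLDS) -> int:
    """Similarity score between two player-seasons (1000 = identical)."""
    score = 1000.0
    for stat, per_point in thresholds.items():
        # Every `per_point` units of difference in this stat costs 1 point.
        score -= abs(a[stat] - b[stat]) / per_point
    return round(score)


# Usage with hypothetical numbers, scoring only games and minutes:
# 5 games apart costs 2 points; 150 minutes apart costs 2 more.
print(similarity({"g": 82, "min": 3000}, {"g": 77, "min": 2850},
                 {"g": 2.5, "min": 75}))  # 996
```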

      The hard ones to define were the values for percentages. I basically
      estimated standard deviations across all my players and saw that the
      SD for fg3% was about 3.5 times the SD for fg%, so I multiplied the
      fg% value (0.001) by 3.5 to get 0.0035.
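      The percentage scaling can be sketched as multiplying a base threshold
      by a ratio of standard deviations. This is a reconstruction from the
      description, not Oliver's code; the function name and sample values
      are hypothetical, and whether population or sample SDs were used isn't
      stated (the ratio is the same either way when both samples have the
      same size).

```python
import statistics


def scaled_pct_threshold(base_threshold: float, base_pcts: list, other_pcts: list) -> float:
    """Scale a percentage threshold by the ratio of standard deviations.

    If `other_pcts` varies 3.5x as much as `base_pcts`, the threshold is
    made 3.5x larger, so a "typical" difference costs about the same points.
    """
    ratio = statistics.pstdev(other_pcts) / statistics.pstdev(base_pcts)
    return base_threshold * ratio


# Hypothetical league samples: fg% spread is 0.01, fg3% spread is 0.035.
fg_pcts = [0.45, 0.47]
fg3_pcts = [0.35, 0.42]
print(round(scaled_pct_threshold(0.001, fg_pcts, fg3_pcts), 6))  # 0.0035
```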

      Frankly, it seems like a decent start. I first worked with Kobe
      Bryant's 2001 season. (You'll notice that comparisons to the 1999
      shortened season never show up.) Here is the list of most similar
      seasons:

      Player           Team  Season  Score
      Hill,Grant       det   2000    871
      richmond,mitch   gol   1991    865
      Bryant,Kobe      lal   2000    865
      wilkins,dominiq  lac   1994    849
      robinson,glenn   mil   2001    842
      richmond,mitch   gol   1990    832
      richmond,mitch   sac   1994    831
      tripucka,kelly   cha   1989    830
      mullin,chris     gol   1990    828
      barkley,charles  pho   1996    822

      A lot of decent players here, but no perfect matches (no Jordan or
      Vince Carter, either). You'll recall the scale:

      <500 -- players who would not usually be perceived as being
      essentially similar.
      ~600 -- Slight similarities, but major differences
      ~700 -- Important, easily identifiable similarities but also
      significant and obvious differences
      ~800 -- very prominent, obvious similarities, but easily identifiable
      distinctions.
      >850 -- substantially similar
      >900 -- very similar
      >950 -- rare, indicating that true similarities are emphasized by
      random chance.
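      The scale can be turned into a lookup from score to label. A hedged
      sketch: the ~600/~700/~800 levels are approximate in the source, so
      splitting them at the midpoints (650 and 750) and treating the ">"
      cutoffs as inclusive are assumptions; `describe` is a hypothetical
      helper name.

```python
# Bands for the scale above, checked from the highest floor down.
BANDS = [
    (950, "rare; true similarity emphasized by random chance"),
    (900, "very similar"),
    (850, "substantially similar"),
    (750, "very prominent, obvious similarities, but easily identifiable distinctions"),
    (650, "important, easily identifiable similarities, but significant differences"),
    (500, "slight similarities, but major differences"),
]


def describe(score: int) -> str:
    """Return the verbal label for a similarity score."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "not usually perceived as essentially similar"


print(describe(871))  # substantially similar
print(describe(715))  # important, easily identifiable similarities, but significant differences
```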

      Now here is the comparison for Shaq's 2001 season:

      Player           Team  Season  Score
      o'neal,shaquill  orl   1995    915
      O'Neal,Shaquill  lal   2000    914
      o'neal,shaquill  orl   1994    855
      webber,chris     gsw   1994    774
      mutombo,dikembe  den   1994    762
      o'neal,shaquill  lal   1998    752
      duncan,tim       san   1998    738
      robinson,david   san   1990    725
      o'neal,shaquill  orl   1993    715
      Mourning,Alonzo  mia   2000    715


      He's similar to himself a lot of times. But really, no one is that
      similar to him. Webber, Mutombo, Duncan, Robinson, and Mourning have
      some similarity. I kinda like that.
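      Each list presumably comes from scoring the target season against
      every other season in the database and keeping the top ten, excluding
      only the target season itself, which would explain why a player's own
      other seasons can crowd the top. A hedged sketch of that ranking step;
      `most_similar`, the toy `games_only` score, and the data are all
      hypothetical.

```python
def most_similar(target_key, seasons, score_fn, n=10):
    """Rank every other player-season by similarity to the target season.

    Only the target season itself is excluded, so the same player's other
    seasons compete like anyone else's.
    """
    target = seasons[target_key]
    scored = [
        (key, score_fn(target, stats))
        for key, stats in seasons.items()
        if key != target_key
    ]
    # Highest score first; ties keep database order (the sort is stable).
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


def games_only(a, b):
    """Toy score using just the games rule (1 point per 2.5 games)."""
    return round(1000 - abs(a["g"] - b["g"]) / 2.5)


# Hypothetical seasons keyed by (player, season).
seasons = {
    ("A", 2001): {"g": 82},
    ("A", 2000): {"g": 80},
    ("B", 1999): {"g": 50},
    ("C", 1995): {"g": 75},
}
print(most_similar(("A", 2001), seasons, games_only, n=3))
# [(('A', 2000), 999), (('C', 1995), 997), (('B', 1999), 987)]
```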

      Jamal Mashburn's 2001 season (he is a good but not great player):

      Player           Team  Season  Score
      payton,gary      sea   2001    899
      jackson,jimmy    njn   1997    879
      hardaway,tim     mia   1998    868
      drexler,clyde    hou   1998    863
      marbury,stephon  min   1997    862
      finley,michael   dal   1998    860
      johnson,larry    cha   1995    860
      finley,michael   dal   2001    858
      hardaway,tim     gsw   1993    856
      anderson,kenny   njn   1995    856

      Here is Eric Williams' 2001 season:

      Player           Team  Season  Score
      Fisher,Derek     lal   2000    907
      cummings,vontee  gsw   2001    889
      rivers,doc       san   1996    884
      horry,robert     lal   2001    877
      johnson,anthony  sac   1998    875
      roth,scott       min   1990    869
      rivers,doc       san   1995    866
      johnson,dermarr  atl   2001    864
      hunter,lindsey   det   1996    854
      mashburn,jamal   mia   1997    853

      Williams is a fairly ordinary player, and I'd want higher scores to
      reflect general similarity to an ordinary guy. In general, I think my
      scores are a little too low. (That's not even mentioning what an
      eclectic group shows up as similar to him here.)

      I need to do some side-by-side comparison of the seasonal numbers
      (something I didn't do here for you) to see where my subjective
      comparisons would change the scores. But I am generally pretty happy
      with this first cut. There will be some tweaking, I'm sure, to make
      me a lot happier -- and, given that my travel plans have been trashed
      this week, I might be able to work on the 2nd cut.

      Some other comparisons for fun.

      Jordan '91:

      Player           Team  Season  SimScore
      jordan,michael   chi   1992    891
      jordan,michael   chi   1990    844
      chambers,tom     pho   1990    842
      jordan,michael   chi   1989    831
      jordan,michael   chi   1993    820
      mullin,chris     gol   1991    805
      jordan,michael   chi   1996    799
      Malone,Karl      uta   2000    792
      mullin,chris     gsw   1992    787
      richmond,mitch   gol   1990    783

      Olajuwon '94:

      Player           Team  Season  SimScore
      ewing,patrick    nyk   1993    823
      ewing,patrick    nyk   1994    819
      ewing,patrick    nyk   1995    807
      ewing,patrick    nyk   1990    804
      ewing,patrick    nyk   1992    795
      olajuwon,hakeem  hou   1996    787
      kemp,shawn       sea   1997    781
      malone,karl      uta   1995    780
      baker,vin        mil   1997    779
      mourning,alonzo  cha   1995    779

      Kemp '96:

      Player           Team  Season  SimScore
      kemp,shawn       sea   1997    837
      kemp,shawn       sea   1995    819
      kemp,shawn       sea   1994    798
      malone,karl      uta   1990    779
      thorpe,otis      hou   1991    773
      mourning,alonzo  cha   1995    763
      olajuwon,hakeem  hou   1994    761
      ewing,patrick    nyk   1990    747
      malone,karl      uta   1989    745
      mourning,alonzo  mia   1996    742

      Notice no similarity to Kemp post-'97. Hmmm.

      Reggie Miller '96:

      Player           Team  Season  SimScore
      hawkins,hersey   phi   1993    901
      miller,reggie    ind   1993    897
      hawkins,hersey   phi   1992    895
      richmond,mitch   sac   1998    894
      porter,terry     por   1992    889
      elliott,sean     san   1996    882
      miller,reggie    ind   1995    878
      Allen,Ray        mil   2000    875
      miller,reggie    ind   1998    875
      hawkins,hersey   sea   1996    868


      Last, but not least, Iverson 2001:

      Player           Team  Season  SimScore
      Iverson,Allen    phi   2000    865
      sprewell,latrel  gsw   1996    825
      mashburn,jamal   dal   1995    813
      Stackhouse,Jerr  det   2000    798
      marbury,stephon  njn   2001    795
      stoudamire,damo  por   1998    795
      sprewell,latrel  gsw   1995    794
      sprewell,latrel  gsw   1994    793
      Marbury,Stephon  njn   2000    790
      dumars,joe       det   1995    783

      Weird mix, but no other MVP-like seasons...


      Dean Oliver
      Journal of Basketball Studies
    • Dean Oliver
      Message 2 of 2, Sep 13, 2001
        There were slight errors in my calculations in the previous post.
        I'm working on small tweaks to the method (the weights on blocks
        and steals are too heavy, I think, and I may add points). I'll fix
        everything when the modifications are done.


        --- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote: