## Re: Similarity Scores: First Cut (long)

Message 1 of 2 , Sep 13, 2001
There were slight errors in my calcs on the previous post. I'm
working on slight tweaks to the method (weights are too heavy on
blocks and steals, I think, and maybe adding in points). I'll fix
everything when modifications are done.

>
> I looked a little more at James' rules and figured that I wouldn't
> take a more rigorous cut and would just adapt what he had. I decided
> to start with season comparisons, since those are the easiest (and I
> have a number of player seasons back to '89 in my db).
>
> Specifically, I noted that James compared player-seasons by first
> starting with games. For every 5 games difference in player season,
> James took off 1 point. Well, a baseball season is roughly half the
> length, so I said 2.5 games difference in basketball player season.

> From there, many of James' numbers appear to be game-average related.
> For instance, there are about 3.5 at-bats per game, and he seemed to
> multiply the factor of 5 times 3.5 because, for at-bats, every
> difference of 20 took off 1 point. He did this for runs, hits, etc.
> It appears that he skewed things a little toward the high side,
> though. Just as he rounded up to 20, he would round up a lot. For
> correlated numbers (like doubles and triples, which are correlated
> with hits), he rounded up much more. An average baseball player
> probably has 20-25 doubles, but James subtracted 1 point for every
> difference of 1.5 doubles (1.5/5*160 ~ 50 > 20). Then he puts his own
> subjective weights in there, too. Differences in strikeouts aren't as
> important to him as one might expect.
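
The arithmetic in the paragraph above can be checked with a short sketch. The function names are hypothetical, and the 82-game and 162-game schedule lengths are an assumption added here to illustrate "roughly half the length"; the post itself uses 160 for the baseball season.

```python
def units_per_point(per_game_rate, games_per_point):
    """Unrounded stat difference that would cost one similarity point."""
    return per_game_rate * games_per_point


def implied_season_total(step, games_per_point, season_games):
    """Season total implied if `step` were a pure per-game scaling."""
    return step * season_games / games_per_point


# At-bats: 3.5 per game times the 5-game step, before rounding up to 20.
print(units_per_point(3.5, 5))             # 17.5
# Doubles: the post's 1.5/5*160, compared with a typical 20-25 doubles.
print(implied_season_total(1.5, 5, 160))   # 48.0
# Assumed schedule lengths: 82 NBA games vs. 162 MLB games.
print(5 * 82 / 162)                        # about 2.53, close to the 2.5 used
```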
>
> So here is the first set of assignments I made:
>
> Stat   Difference per 1 point
> Pos    ???
> G      2.5
> Min    75
> fgm    12
> fga    27
> fg%    0.001
> fg3m   3
> fg3a   9
> fg3%   0.0035
> ftm    7.5
> fta    10
> ft%    0.002
> oreb   7.5
> dreb   19
> ast    7.5
> stl    1
> tov    6
> blk    1
> pf     10
>
> (I make no adjustment for position at this point, in part because
> it's less well-defined for basketball, in part because some defensive
> stats are listed, which was not the case for baseball.)
>
> For every 2.5 game difference in a player's season, I subtract 1
> point off of 1000. For every 75 minutes, subtract 1 point. And so
> on.
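
A minimal sketch of the scoring rule just described, assuming fractional points are subtracted (the post does not say whether partial steps are rounded). The dictionary keys, the function name, and the two example seasons are hypothetical; the numbers in `season_a` are made up and do not belong to any real player. "Pos" is left out because no value is assigned to it.

```python
# Difference in each stat that costs one point, copied from the list above.
WEIGHTS = {
    "g": 2.5, "min": 75, "fgm": 12, "fga": 27, "fg_pct": 0.001,
    "fg3m": 3, "fg3a": 9, "fg3_pct": 0.0035, "ftm": 7.5, "fta": 10,
    "ft_pct": 0.002, "oreb": 7.5, "dreb": 19, "ast": 7.5, "stl": 1,
    "tov": 6, "blk": 1, "pf": 10,
}


def similarity_score(a, b, weights=WEIGHTS, base=1000.0):
    """Start at `base` and subtract one point per weighted unit of difference."""
    penalty = sum(abs(a[stat] - b[stat]) / step for stat, step in weights.items())
    return base - penalty


# Made-up season totals, for illustration only.
season_a = dict(g=80, min=3000, fgm=600, fga=1300, fg_pct=0.462,
                fg3m=60, fg3a=180, fg3_pct=0.333, ftm=400, fta=500,
                ft_pct=0.800, oreb=100, dreb=300, ast=350, stl=100,
                tov=200, blk=40, pf=200)
# Same season except 5 fewer games and 10 more steals.
season_b = dict(season_a, g=75, stl=110)

print(similarity_score(season_a, season_a))  # 1000.0
print(similarity_score(season_a, season_b))  # 988.0 (2 points for games, 10 for steals)
```

With a step of 1, every single steal or block costs a full point, which is one way to see how those two stats can dominate the score.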
>
> The hard ones to define were the values for percentages. I basically
> estimated standard deviations for all my players, saw that the SD
> for fg3% was about 3.5x the SD for fg%, and multiplied the fg% value
> by that ratio.
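
One way to read the paragraph above is that the step for one percentage is the step for another multiplied by the ratio of their standard deviations. The sketch below assumes that reading; `scaled_step` is a hypothetical name and the sample percentages are invented so that the ratio comes out to 3.5.

```python
from statistics import stdev


def scaled_step(base_step, base_values, other_values):
    """Scale a per-point step by the ratio of sample standard deviations."""
    return base_step * stdev(other_values) / stdev(base_values)


# Invented samples: fg3% is spread 3.5 times as widely as fg%.
fg_pct = [0.40, 0.45, 0.50]
fg3_pct = [0.20, 0.375, 0.55]

print(round(scaled_step(0.001, fg_pct, fg3_pct), 4))  # 0.0035
```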
>
> Frankly, it seems like a decent start. I first worked with Kobe
> Bryant's 2001 season. (You'll notice that comparisons to the 1999
> shortened season never show up.) Here is the list of most similar
> seasons:
>
> Player           Team  Season  Score
> Hill,Grant       det   2000    871
> richmond,mitch   gol   1991    865
> Bryant,Kobe      lal   2000    865
> wilkins,dominiq  lac   1994    849
> robinson,glenn   mil   2001    842
> richmond,mitch   gol   1990    832
> richmond,mitch   sac   1994    831
> tripucka,kelly   cha   1989    830
> mullin,chris     gol   1990    828
> barkley,charles  pho   1996    822
>
> A lot of decent players here, but no perfect matches (no Jordan or
> Vince Carter, either). You'll recall the scale:
>
> <500 -- players who would not usually be perceived as being
> essentially similar.
> ~600 -- Slight similarities, but major differences
> ~700 -- Important, easily identifiable similarities but also
> significant and obvious differences
> ~800 -- very prominent, obvious similarities, but easily
> identifiable distinctions.
> >850 -- substantially similar
> >900 -- very similar
> >950 -- rare, indicating that true similarities are emphasized by
> random chance.
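
For reading the result lists, the scale can be turned into a small lookup. This is only a sketch: the ">" thresholds come from the scale above, but the cutoffs between the "~" bands (750, 650, 550) are assumed midpoints, and everything below 550 is lumped into the lowest band. The function name is hypothetical.

```python
def describe(score):
    """Map a similarity score to the verbal scale quoted above."""
    if score > 950:
        return "rare; true similarities emphasized by random chance"
    if score > 900:
        return "very similar"
    if score > 850:
        return "substantially similar"
    # Assumed midpoints between the approximate ~600/~700/~800 bands.
    if score >= 750:
        return "very prominent, obvious similarities, but easily identifiable distinctions"
    if score >= 650:
        return "important, easily identifiable similarities but also significant and obvious differences"
    if score >= 550:
        return "slight similarities, but major differences"
    return "not usually perceived as essentially similar"


print(describe(915))  # very similar
print(describe(871))  # substantially similar
```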
>
> Now here is the Shaq 2001 comparison:
>
> Player           Team  Season  Score
> o'neal,shaquill  orl   1995    915
> O'Neal,Shaquill  lal   2000    914
> o'neal,shaquill  orl   1994    855
> webber,chris     gsw   1994    774
> mutombo,dikembe  den   1994    762
> o'neal,shaquill  lal   1998    752
> duncan,tim       san   1998    738
> robinson,david   san   1990    725
> o'neal,shaquill  orl   1993    715
> Mourning,Alonzo  mia   2000    715
>
>
> He's similar to himself a lot of times. But really, no one is that
> similar to him. Webber, Mutombo, Duncan, Robinson, and Mourning have
> some similarity. I kinda like that.
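
Lists like the ones in this post could be produced by scoring one target season against every other season and keeping the highest scores. The sketch below assumes that approach; it uses only three of the stat steps for brevity, and the keys, function names, and season totals are all hypothetical placeholders rather than real player data.

```python
import heapq

# Subset of the per-point steps, enough to illustrate the ranking.
STEPS = {"g": 2.5, "min": 75, "ast": 7.5}


def score(a, b):
    """1000 minus one point per weighted unit of difference."""
    return 1000.0 - sum(abs(a[k] - b[k]) / step for k, step in STEPS.items())


def most_similar(target_key, seasons, n=10):
    """Return the n highest (score, key) pairs, excluding the target itself."""
    target = seasons[target_key]
    scored = ((score(target, s), key)
              for key, s in seasons.items() if key != target_key)
    return heapq.nlargest(n, scored)


# Placeholder seasons keyed by (player, team, season).
seasons = {
    ("player_a", "aaa", 2001): {"g": 80, "min": 3000, "ast": 400},
    ("player_a", "aaa", 2000): {"g": 75, "min": 2925, "ast": 385},
    ("player_b", "bbb", 1999): {"g": 50, "min": 1875, "ast": 250},
    ("player_c", "ccc", 1995): {"g": 80, "min": 2400, "ast": 100},
}

for s, key in most_similar(("player_a", "aaa", 2001), seasons, n=2):
    print(key, round(s))
```

In this toy data the 50-game season loses 12 points on games alone, which hints at why seasons from a shortened schedule rarely rank near a full-length one.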
>
> Jamal Mashburn's 2001 season (he is a good, but not great, player):
>
> Player           Team  Season  Score
> payton,gary      sea   2001    899
> jackson,jimmy    njn   1997    879
> hardaway,tim     mia   1998    868
> drexler,clyde    hou   1998    863
> marbury,stephon  min   1997    862
> finley,michael   dal   1998    860
> johnson,larry    cha   1995    860
> finley,michael   dal   2001    858
> hardaway,tim     gsw   1993    856
> anderson,kenny   njn   1995    856
>
> Here is Eric Williams' 2001 season:
>
> Player           Team  Season  Score
> Fisher,Derek     lal   2000    907
> cummings,vontee  gsw   2001    889
> rivers,doc       san   1996    884
> horry,robert     lal   2001    877
> johnson,anthony  sac   1998    875
> roth,scott       min   1990    869
> rivers,doc       san   1995    866
> johnson,dermarr  atl   2001    864
> hunter,lindsey   det   1996    854
> mashburn,jamal   mia   1997    853
>
> Williams is kind of an ordinary player. I think I'd want higher
> scores to reflect general similarity with an ordinary guy. I think,
> in general, my scores are a little too low. (Not even mentioning
> that it is a very eclectic group that is similar to him here.)
>
> I need to do a little bit of side-by-side seasonal comparison of #'s
> (something I didn't do up here for you) to see where my subjective
> comparisons would change the scores. But I am generally pretty happy
> with this first cut. There will be some tweaking, I'm sure, to make
> me a lot happier -- and, given that my travel plans have been trashed
> this week, I might be able to work on the 2nd cut.
>
> Some other comparisons for fun.
>
> Jordan '91:
>
> Player           Team  Season  SimScore
> jordan,michael   chi   1992    891
> jordan,michael   chi   1990    844
> chambers,tom     pho   1990    842
> jordan,michael   chi   1989    831
> jordan,michael   chi   1993    820
> mullin,chris     gol   1991    805
> jordan,michael   chi   1996    799
> Malone,Karl      uta   2000    792
> mullin,chris     gsw   1992    787
> richmond,mitch   gol   1990    783
>
> Olajuwon '94:
>
> Player           Team  Season  SimScore
> ewing,patrick    nyk   1993    823
> ewing,patrick    nyk   1994    819
> ewing,patrick    nyk   1995    807
> ewing,patrick    nyk   1990    804
> ewing,patrick    nyk   1992    795
> olajuwon,hakeem  hou   1996    787
> kemp,shawn       sea   1997    781
> malone,karl      uta   1995    780
> baker,vin        mil   1997    779
> mourning,alonzo  cha   1995    779
>
> Kemp '96:
>
> Player           Team  Season  SimScore
> kemp,shawn       sea   1997    837
> kemp,shawn       sea   1995    819
> kemp,shawn       sea   1994    798
> malone,karl      uta   1990    779
> thorpe,otis      hou   1991    773
> mourning,alonzo  cha   1995    763
> olajuwon,hakeem  hou   1994    761
> ewing,patrick    nyk   1990    747
> malone,karl      uta   1989    745
> mourning,alonzo  mia   1996    742
>
> Notice no similarity to Kemp post-'97. Hmmm.
>
> Reggie Miller '96:
>
> Player           Team  Season  SimScore
> hawkins,hersey   phi   1993    901
> miller,reggie    ind   1993    897
> hawkins,hersey   phi   1992    895
> richmond,mitch   sac   1998    894
> porter,terry     por   1992    889
> elliott,sean     san   1996    882
> miller,reggie    ind   1995    878
> Allen,Ray        mil   2000    875
> miller,reggie    ind   1998    875
> hawkins,hersey   sea   1996    868
>
>
> Last, but not least, Iverson 2001:
>
> Player           Team  Season  SimScore
> Iverson,Allen    phi   2000    865
> sprewell,latrel  gsw   1996    825
> mashburn,jamal   dal   1995    813
> Stackhouse,Jerr  det   2000    798
> marbury,stephon  njn   2001    795
> stoudamire,damo  por   1998    795
> sprewell,latrel  gsw   1995    794
> sprewell,latrel  gsw   1994    793
> Marbury,Stephon  njn   2000    790
> dumars,joe       det   1995    783
>
> Weird mix, but no other MVP-like seasons....
>
>
> Dean Oliver