## Re: [APBR_analysis] Re: Similarity Scores

Message 1 of 16 , Sep 12, 2001
On Tue, 11 Sep 2001, Dean Oliver wrote:

[...]

> First -- I don't know of Mahalanobis distance stuff. Sounds like
> multivariate regression, though. You may have to do this analysis or
> point me at software that does it. What is it going to spit out?
> Weights on the different stats?

The weights are not scalar, but are instead implicitly contained in a
matrix. I usually use a stat package called SPSS but I just checked and
surprisingly, although Mahalanobis distance is calculated and used in a
number of statistics that it calculates, it doesn't have a command for
simply computing good ol' Mahalanobis distance.

However, the formula for Mahalanobis distance is pretty simple. Let x and
y be vectors of the variables that we're measuring, for two different
players. E.g. "x" might be Magic's pts per game, assists per game, FG%,
asst/TO ratio, min/game, etc. etc etc. "y" would be the same stats, but
for Larry Bird.

Let S stand for the covariance matrix of all players' stats. (For an
example of how to calculate the elements of the covariance matrix, see

http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc541.htm

).

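Then the Mahalanobis distance is simply sqrt( (x-y)' S^-1 (x-y) ) in matrix
notation. (x-y is the column vector of differences, (x-y)' is its transpose,
and S^-1 is the inverse of S; many packages report the squared form, without
the sqrt.)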
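
A minimal sketch of that calculation in Python, using only the standard library. The function names (covariance_matrix, solve, mahalanobis) and the four-row sample are made up for illustration and are not real player data; in practice S would be estimated from all players' stats.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix S of a list of equal-length stat vectors."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for i in reversed(range(k)):
        z[i] = (m[i][k] - sum(m[i][j] * z[j] for j in range(i + 1, k))) / m[i][i]
    return z

def mahalanobis(x, y, s):
    """sqrt((x-y)' S^-1 (x-y)); S^-1 (x-y) is found by solving S z = (x-y)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    z = solve(s, d)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

# Made-up sample of four "players" with two stats each (not real data).
sample = [[20.0, 5.0], [25.0, 4.0], [15.0, 9.0], [18.0, 7.0]]
S = covariance_matrix(sample)
print(mahalanobis(sample[0], sample[1], S))
```

When S is the identity matrix this reduces to ordinary Euclidean distance, which is one way to sanity-check an implementation.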

Here's a web-page with some other distance metrics:

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/pdist.shtml

However, all of these treat the variables as being essentially equally
important. Hence the possible need for weighting, or for the use of some
outside rating system or external criterion (e.g. Hall of Fame vs. non-Hall
of Fame status, where we could use discriminant analysis or logistic
regression to estimate the coefficients for predicting HoF status).

[...career comparisons]

>We could compare Thomas' first 2 years to Francis', a very useful
>comparison.

Yes, good point. Although if we're doing career totals, we'd presumably
still want a correction for 82-game seasons vs 72-game seasons.
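
One simple way to make that correction is to prorate totals to a common schedule length. This is only a sketch: the function name prorate_total and the 1,800-point figure are hypothetical, and a straight proportional scaling is an assumption rather than an agreed method.

```python
def prorate_total(total, games_in_season, target_games=82):
    """Scale a season total to a common schedule length."""
    return total * target_games / games_in_season

# Hypothetical example: 1,800 points scored in a 72-game season.
print(prorate_total(1800, 72))  # 2050.0
```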

> I think we want to keep the era correction separate. I can't find
> where he said it, but I know James wanted to keep it separate.

Yes, probably best done, as you suggest elsewhere, by having two sets of
similarity stats: "absolute" and "relative" (or "corrected", or
"standardized" or whatever we want to call them).

--MKT
Message 2 of 16 , Sep 12, 2001
--- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
> We started discussing this over in APBR, but I think the details of
> making this work can get technical, so I brought it here.

Excellent move, Dean
>
> One of the problems I had was with redundancy of stats. FG% is
> reflected in FG and FGA, for example. James didn't worry about it
> too much, but I do in basketball.
>
This is one reason I have concentrated on combining all scoring-related
data into one "scoring ability" number. It seems quite clear
to me that "points is points", and likewise, attempts are attempts
(or possessions used up). "Scoring efficiency" is, I believe, a term
already used in another way, and to me it implies including turnovers
incurred while attempting to score, offensive fouls, and the "ability
to get a shot off"; so I might prefer to call Pts/(Attempts*2)
something like "scoring percentage".
I also feel comfortable using a player's ScoPct/.527 (.527 being the
historical standard ScoPct) as a number to factor into his
points-per-minute rate. I justify this by noting that a high-scoring,
low-percentage scorer on a weak team would just have to shoot
less (and take higher-percentage shots) on a better team.
Conversely, a low-scoring, high-percentage shooter on a good team
would almost certainly be asked to take more shots on a weaker team.
Generally, his percentage would go down, but possibly his "scoring
ability" number would stay fairly constant as he moves from team to
team.
Ty Corbin had such a career spell, as he went from a go-to guy on
the woeful Wolves, to a contributor on the contending Jazz; his
minutes and ppg went rollercoastering, but his measurable 'scoring
ability' was pretty constant.
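
A rough sketch of the arithmetic described above. The post does not say how "attempts" are counted or exactly how the ScoPct/.527 factor is applied, so the function names, the plain multiplication, and the example numbers are all assumptions for illustration.

```python
HISTORICAL_SCO_PCT = 0.527  # historical standard ScoPct quoted in the post

def scoring_pct(points, attempts):
    """Pts / (Attempts * 2), the "scoring percentage" from the post."""
    return points / (attempts * 2)

def scoring_ability(points, attempts, minutes):
    """Points per minute scaled by ScoPct relative to the .527 standard.

    Multiplying the two is an assumed reading of "factor into".
    """
    return (points / minutes) * (scoring_pct(points, attempts) / HISTORICAL_SCO_PCT)

# Made-up season line: 1,054 points on 1,000 attempts in 2,000 minutes.
print(scoring_pct(1054, 1000))
print(scoring_ability(1054, 1000, 2000))
```

A player exactly at the .527 standard keeps his raw points-per-minute rate; one above it is scaled up, one below it scaled down.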

> ....In the old argument of
> Shawn Kemp, perhaps we find that the most similar players to him are
> all out of the HOF -- then that suggests he isn't that great. Maybe
> his best season compares with those put up by Wilt, KMalone, etc.,
> suggesting great seasons.
>
In one member's standardized numbers, Kemp's career 'abilities' are:
21 pts, 12 reb, 2 ast, 2 blk. This compares to Artis Gilmore and Moses
Malone, but with many fewer minutes for Kemp, and lesser totals.
Message 3 of 16 , Sep 12, 2001
> ....discriminant analysis and logistic regression .... Euclidean distance,
> sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
> between, say, Magic Johnson and Larry Bird ....., Mahalanobis
> distance does NOT take into account the second
> problem, that certain variables might deserve more weight than others.
>
A friend of mine says "Anyone who drives faster than me is a fukkin
maniac, and anyone who drives slower is a goddamn asshole".
Similarly, I say, anyone who uses less math than me is some kind of
moron, and whoever uses more must be some kind of geek.

> Also, after the initial analysis, I'd want to put in some sort of
> correction for era or game pace. Bob Cousy's 43% career FG% (or whatever
> it was, I'm saying this off the top of my head) reminds me more of Isiah
> Thomas's 46% than it does Allen Iverson's 43%, despite the superficial
> similarity of Cousy's and Iverson's FG%. (Again I'm not vouching for
> those specific numbers, just saying that I'd rather see the numbers in
> context, i.e. corrected for era and/or game pace.)

Cousy never once managed to make 40% of his FG during a season; his
career scoring pct. was .440. (Iverson's is .500; Isiah's was .508).
>
> For Hall of Fame purposes, I think discriminant analysis or logistic or
> probit regressions are better than merely measuring distance. I did this
> once for NBA all-stars one season, the predictions were not 100% accurate
> but you could at least separate the players into three groups: clear
> all-stars, clear non-stars, and the "on the bubble" players.
>
> --MKT
>
Last season, the West selected my top 11 Western players to the
All-Star team, but skipped #12 Nowitzki in favor of teammate Michael [...].
Meanwhile, the East seemed to pick at random, ignoring most forwards
as they had ignored all point guards the year before.
Message 4 of 16 , Sep 14, 2001
> .... Euclidean distance,
> sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
> between, say, Magic Johnson and Larry Bird in whatever variables we choose
> to look at.
>
> But there are problems with Euclidean distance, specifically one that
> Dean Oliver alludes to: some variables are redundant or
> partially redundant with each other,
> e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds. Another
> problem is that not all variables are equally important: some probably
> should be given greater weight than others ...

I tried my hand at a variation of the Euclidean distance, since I can
understand the formula (and pronounce it, too).
I took 5 stats: scoring, rebounding, assists, steals, blocks. I used
my normalized (standardized) versions. Because points are much more
abundant than, say, steals, I reduced this difference by taking the
square root of each stat. I compared the top 31 players on my
infamous "alltime" list to the other 514 in the list. (I actually
ran out of columns in Excel, for the first time.)
The formula is drudgery to type, but it starts like this:
E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 + ... and so on, up
to a5 and b5, for players a and b, and variables 1-5.
I did not take the square root of the whole thing, since everything
was already square-rooted once.
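
The formula can be sketched in a few lines of Python. The function name sqrt_scaled_distance is made up for illustration; the two stat lines are the rounded Sco/Reb/Ast/Stl/Blk figures posted for Olajuwon and Robinson in the listing that follows, so results computed from them will not match the unrounded originals exactly.

```python
import math

def sqrt_scaled_distance(a, b):
    """Sum of squared differences of square-rooted stats (no final root)."""
    return sum((math.sqrt(ai) - math.sqrt(bi)) ** 2 for ai, bi in zip(a, b))

# Sco, Reb, Ast, Stl, Blk, as posted (rounded to one decimal).
olajuwon = [23.7, 11.7, 2.6, 1.8, 3.2]
robinson = [26.1, 11.8, 2.8, 1.5, 3.3]

e = sqrt_scaled_distance(olajuwon, robinson)
print(round(e, 3), round(math.sqrt(e), 3))
```

From these rounded inputs the sum comes out near 0.076 and its square root near 0.276. The latter is close to the .275 posted for this pair, so the posted E values may be on the square-root scale; that is an inference from the rounded figures, not something the post states.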
Not surprisingly, the best players only correspond to other great
players, but some players have much more unique statistical profiles.
In order of "greatest distance from the next-closest profile", we
have:
Sco Reb Ast Stl Blk E
Michael Jordan 33.5 6.5 5.1 2.3 .9
Jerry West 25.1 4.2 6.0 (2.7 .9) .945 (estimated)
No real surprise that Jordan is the "most unique" statistically.
Others scored more than West, but didn't have quality numbers beyond
that.
(Iverson is next, then Karl Malone(!), Kobe, Gervin, Erving, Bird,
Wilkins, Dantley, Barry)

Bill Russell 11.8 14.6 3.8 (1.5 4.0)
Bill Walton 15.9 12.8 4.0 1.0 2.7 .743
Really not very similar, but as close as anyone comes to Russell's
combination of skills.
(Thurmond is close 2nd, then Sam Lacey, Elmore Smith, Mutombo)

Magic Johnson 20.6 7.5 10.4 1.9 .4
Oscar Robertson 22.4 5.3 8.0 (1.5 .3) .644
Magic was "the next Oscar", and then some.
(Grant Hill, Payton, Penny, Strickland, Isiah, Drexler, KJ, Frazier)

John Stockton 17.1 3.3 11.9 2.4 .2
Isiah Thomas 18.0 3.7 8.8 2.0 .3 .543
Stockton is just a giant in the assists category.
(Tim Hardaway, KJ, Strickland, Cousy, Kenny Anderson, Brandon)

Jerry West 25.1 4.2 6.0 (2.7 .9)
Allen Iverson 25.1 3.9 5.5 2.1 .2 .517
Now we have some real across-the-board similarity.
(Barry, Penny, Kobe, Drexler, Maravich, Oscar, Westphal)

Oscar Robertson 22.4 5.3 8.0 (1.5 .3)
Penny Hardaway 20.2 5.1 6.2 1.9 .6 .486
(KJ, Payton, Frazier, Cassell, Tim Hardaway, Price, Brandon, Magic)

Moses Malone 21.6 13.2 1.3 .9 1.4
Shawn Kemp 20.9 11.8 2.2 1.4 1.6 .470
(Parish, Gilmore, Reed, McDyess, Ewing, Hayes, Haywood, McAdoo)

Shaquille O'Neal 29.7 12.7 2.8 .7 2.6
Tim Duncan 25.1 12.0 3.0 .8 2.3 .466
(Kareem, Robinson, Mikan, Pettit, Ewing, Mourning, Wilt, Hakeem)

Artis Gilmore 20.3 11.9 2.3 .6 2.3
Patrick Ewing 23.5 11.1 2.0 1.0 2.6 .446
(Hayes, Parish, Derrick Coleman, Sabonis, McDyess, Kemp, Gallatin)

The remainder of the top 31 (and their closest match)

Kareem AbdulJab. 25.9 10.6 3.4 1.0 2.7
Tim Duncan 25.1 12.0 3.0 .8 2.3 .288
(Robinson, Pettit, Mikan, Ewing, Neil Johnston, Shaq, Hakeem)

Wilt Chamberlain 23.5 14.7 3.5 (1.5 3.0)
George Mikan 24.8 13.1 2.9 (1.3 2.0) .432
(Hakeem, Robinson, Duncan, Pettit, Kareem, Ewing)

Karl Malone 28.1 11.2 3.4 1.4 .8
Charles Barkley 24.2 12.4 3.8 1.6 .8 .444
(Pettit, Johnston, Mikan, Baylor, Jeff Ruland, Bird, Duncan, McAdoo)

Hakeem Olajuwon 23.7 11.7 2.6 1.8 3.2
David Robinson 26.1 11.8 2.8 1.5 3.3 .275

Julius Erving 23.0 7.8 4.0 1.9 1.7
Elgin Baylor 22.5 9.6 3.9 (1.6 1.5) .347
(Webber, Marques Johnson, Shareef, Johnston, Lanier, Ed Macauley,
Schayes, Garnett, Bird, Drexler)

Patrick Ewing 23.5 11.1 2.0 1.0 2.6
Alonzo Mourning 24.5 10.9 1.6 .7 3.2 .332

Bob Pettit 24.2 11.7 2.8 (1.3 1.8)
George Mikan 24.8 13.1 2.9 (1.3 2.0) .231

Elgin Baylor 22.5 9.6 3.9 (1.6 1.5)
Chris Webber 21.1 10.1 4.2 1.5 1.8 .215
(Lanier, Erving, Schayes, Johnston, Shareef, Garnett, Pettit, McAdoo)

Scottie Pippen 18.4 7.5 5.4 2.1 .9
Clyde Drexler 20.6 6.7 5.5 2.1 .7 .306
(Alvan Adams, Connie Hawkins, Toni Kukoc, Billy C., Grant Hill,
Antoine Walker, Marques Johnson, Penny, Cliff Hagan)

Clyde-Scottie likewise

Robert Parish 18.1 11.4 1.5 .9 1.8
Elvin Hayes 17.8 10.9 1.7 1.0 2.6 .161
(Gallatin, McDyess, Seikaly, Reed, Larry Foust, Dan Roundfield,
Sampson, Haywood, Brian Grant)

Bob Lanier 21.4 10.5 3.3 1.2 1.7
Dolph Schayes 20.0 10.1 3.1 (1.4 1.6) .194

(Elvin Hayes-Robert Parish match)

Rick Barry 21.9 5.5 4.5 2.1 .5
Kobe Bryant 23.0 5.2 4.2 1.4 .8 .345
(Chris Mullin, Drexler, Hagan, Moncrief, Penny, Ray Allen)

Kevin McHale 22.1 8.6 1.8 .4 2.0
Rik Smits 19.9 8.3 1.8 .6 1.6 .306
(Lovellette, Darryl Dawkins, Haywood, McAdoo, Yardley, McDyess)

(George Mikan-Bob Pettit)

Dan Issel 21.1 8.5 2.2 1.1 .6
Terry Cummings 19.1 9.3 2.2 1.3 .7 .280
(Chambers, Ceballos, Calvin Natt, Shareef, Yardley, Glenn Robinson)

Clearly, as one goes down the list into more "ordinary" players,
there is a proliferation of close profiles.
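
The "greatest distance from the next-closest profile" ordering can be sketched as a nearest-neighbor search followed by a sort. The helper names are hypothetical, and only five of the posted profiles are included here, so the ordering is illustrative and will not reproduce the full 545-player comparison.

```python
import math

def sqrt_scaled_distance(a, b):
    """Sum of squared differences of square-rooted stats."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def closest_match(name, profiles):
    """Return (distance, other_name) for the nearest other profile."""
    return min((sqrt_scaled_distance(profiles[name], stats), other)
               for other, stats in profiles.items() if other != name)

def rank_by_uniqueness(profiles):
    """Players ordered by distance to their closest match, largest first."""
    ranked = []
    for name in profiles:
        dist, match = closest_match(name, profiles)
        ranked.append((dist, name, match))
    return sorted(ranked, reverse=True)

# Sco, Reb, Ast, Stl, Blk, as posted above (a small subset of the list).
profiles = {
    "Michael Jordan":  [33.5, 6.5, 5.1, 2.3, 0.9],
    "Jerry West":      [25.1, 4.2, 6.0, 2.7, 0.9],
    "Allen Iverson":   [25.1, 3.9, 5.5, 2.1, 0.2],
    "Hakeem Olajuwon": [23.7, 11.7, 2.6, 1.8, 3.2],
    "David Robinson":  [26.1, 11.8, 2.8, 1.5, 3.3],
}

for dist, name, match in rank_by_uniqueness(profiles):
    print(f"{name:16s} closest: {match:16s} {dist:.3f}")
```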

Mike Goodman

Message 5 of 16 , Sep 15, 2001
It occurred to me to ask: when comparing players through their
statistics, should we be weighting the comparisons so that some
statistics are more important based on position? For example, when
comparing point guards, the assist category might be more important
for weighing similarity than the rebound category. Conversely, do we
really care whether two centers have similar assist numbers if their
points, rebounds, and FG% are similar? I think this sounds somewhat
right, with some notable exceptions. The counterargument, of course,
is that centers who pass well (a la Walton) or shoot 3s well
(Laimbeer and Sikma) are unique, and the similarity scores will help
identify players with similar rare skill sets. (To digress, I wonder
if Jason Kidd and some of the Darrell Walker early-90s seasons are
comparable.) I am beginning to babble, but I think the question
I am asking is whether positional demands should change how we weight
statistical categories when we try to apply similarity scores.
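
To make the question concrete, here is a sketch of a position-weighted distance. The weights are entirely hypothetical (nobody in the thread proposed specific values), and the two stat lines are the Stockton and Isiah Thomas figures posted earlier in the thread.

```python
def weighted_distance(a, b, weights):
    """Weighted sum of squared differences between two stat lines."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

# Hypothetical weights, in (Sco, Reb, Ast, Stl, Blk) order.
POSITION_WEIGHTS = {
    "unweighted":  [1.0, 1.0, 1.0, 1.0, 1.0],
    "point guard": [1.0, 0.5, 2.0, 1.0, 0.5],
    "center":      [1.0, 2.0, 0.5, 0.5, 2.0],
}

stockton = [17.1, 3.3, 11.9, 2.4, 0.2]
thomas = [18.0, 3.7, 8.8, 2.0, 0.3]

for label, weights in POSITION_WEIGHTS.items():
    print(label, round(weighted_distance(stockton, thomas, weights), 3))
```

With point-guard weights the assist gap counts double, so the two look less alike than they do unweighted; with center weights the same pair looks more alike.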

Message 6 of 16 , Sep 16, 2001
--- In APBR_analysis@y..., harlanzo@y... wrote:
> It occurred to me that when comparing players through their
> statistics should we be weighting the comparisons so that some
> statistics are more important based on positions?

Yes and No. What we're trying to come up with here is a general set
of rules that can be applied at default (as a basis for studies,
that can be modified). James always said that the method's blessing
and curse was its flexibility. We SHOULD modify it for specific
comparisons -- perhaps among point guards. There will always be a
lot of different versions around, but we want one set for general
comparisons, in part because, using your example, we can't
necessarily identify who point guards are.

I also thought of a reason not to use Euclidean distance -- it
weights big differences too much. At least that is the subjective
opinion a lot of times. It's the old argument between standard
deviation and mean absolute difference -- the first weights big
differences a lot but is mathematically easier, but the second seems
to reflect more of what we want. The similarity scores, as James did
them and as I modified them, fit into the mean absolute difference
category. In Mike's categories, then, this implies that there is
likely one very big difference between Jordan's numbers and everyone
else (probably scoring average) -- that gets emphasized, making him
the most unique player. I'd like to take a stab at career similarity
scores using the approach I've outlined to see whether it id's Jordan
as most unique, too.
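
A small sketch of why squared differences emphasize one big gap more than absolute differences do. It uses the Jordan and West figures Mike posted, taken directly (without his square-root step) to isolate the effect; the function name shares is made up for illustration.

```python
def shares(a, b, power):
    """Each category's share of the total |difference| ** power."""
    parts = [abs(x - y) ** power for x, y in zip(a, b)]
    total = sum(parts)
    return [p / total for p in parts]

# Sco, Reb, Ast, Stl, Blk, as posted.
jordan = [33.5, 6.5, 5.1, 2.3, 0.9]
west = [25.1, 4.2, 6.0, 2.7, 0.9]

print(round(shares(jordan, west, 1)[0], 2))  # scoring share, absolute differences
print(round(shares(jordan, west, 2)[0], 2))  # scoring share, squared differences
```

On these figures the scoring gap accounts for roughly 70% of the total under absolute differences and roughly 92% under squared differences.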

MikeG -- While I like the comparisons you did, there are 2 comments I
would make:

1. I'd like to see some non-standardized comparisons. I do like the
standardized because they make some sense, but I think
non-standardized will also tell a story.

2. You really need some comparison of shooting percentages and
turnovers. It really caught my eye with the Duncan-Kareem
comparison. I see some similarity between these two, but there are
big differences in offensive efficiency. Kareem was nearly
unstoppable offensively - my floor%'s and offensive efficiencies
reflect that. Duncan is very stoppable, his offensive rating and
floor percentage blending in to be about average. Kareem fell to
average offensively only in his last year. (I also don't think that
Kareem was the defensive force that Duncan is, but my memories are
biased by the Kareem post-'80, when he wasn't as good as he was when
younger.)

Dean Oliver

Message 7 of 16 , Sep 16, 2001
Personally, I don't ever consider 'position' to be a quantifiable
statistic. Many forwards have been forced to play center; many
forwards are not clearly 'power' or 'small' forwards; many players
are not exclusively guards or forwards; many versatile guards do
plenty of scoring and passing, and rebounding.
The possible fragmenting of these lists is virtually infinite. An
assist from a center is exactly as important as an assist from a
guard. A rebounding guard, a center who gets steals as well as
blocks, all these things make a player unique, or at least
differentiate him from the norm.
The issue of 3-point shooting might be worth looking into. How one
goes about racking up one's scoring totals is of some interest. Then
again, it might invite breaking down points into dunks, layups, etc.
In the end, points are points. A player's scoring may come from
inside moves when he is young, and from outside shots later. The
contribution is still the same.
One thing these similarity indexes do reveal, is that there are
some 'classic' profiles by position. Wilt, Kareem, Hakeem, Shaq,
Robinson, Ewing, Moses, Gilmore, all averaged 22-28 pts, 12-15 reb,
2-3 blocks. But the well-rounded centers seem to have enjoyed more
success.
The demands of one's position are somewhat situational. The best
players can usually do whatever is most needed.

Message 8 of 16 , Sep 16, 2001
--- In APBR_analysis@y..., deano@t... wrote:
> ..... a reason not to use Euclidean distance -- it
> weights big differences too much. At least that is the subjective
> opinion a lot of times. It's the old argument between standard
> deviation and mean absolute difference -- the first weights big
> differences a lot but is mathematically easier, but the second seems
> to reflect more of what we want.

I operate under the assumption that points and rebounds are equally
important as contributions; so are steals and blocks, but almost
everyone gets fewer than 2-3 of these, so it seems fair to weigh them
less. Taking the standard deviation from the mean gives you the
burden of assigning a weight to the statistical category. I avoid
this by presuming that bigger numbers imply bigger weights. That
is, scoring is and should be more important than, say, steals.
(I did reduce the 'difference' factor by taking their square roots.)
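
A minimal Python sketch of the distance argument above. The stat lines are made up for illustration (they are not real players), the function names are hypothetical, and `sqrt_damped` is only one possible reading of the square-root step, not the actual formula used in these comparisons.

```python
import math

def euclidean(a, b):
    """Root of summed squared differences: one large gap dominates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_abs_diff(a, b):
    """Average absolute difference: each gap counts in proportion."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def sqrt_damped(a, b):
    """Sum of square roots of the absolute differences.
    Assumed interpretation of 'taking their square roots'."""
    return sum(math.sqrt(abs(x - y)) for x, y in zip(a, b))

# Hypothetical per-game lines: (pts, reb, ast, stl, blk)
base       = (20.0, 8.0, 4.0, 1.5, 1.0)
one_big    = (28.0, 8.0, 4.0, 1.5, 1.0)   # a single 8-point scoring gap
many_small = (22.0, 10.0, 6.0, 2.5, 2.0)  # gaps of 2, 2, 2, 1, 1

for name, other in (("one_big", one_big), ("many_small", many_small)):
    print(name,
          round(euclidean(base, other), 3),
          round(mean_abs_diff(base, other), 3),
          round(sqrt_damped(base, other), 3))
```

Both comparison lines differ from `base` by the same total amount, so mean absolute difference treats them as equally distant. Euclidean distance rates the single big gap as much farther away, while the square-root version goes the other way and rates several small gaps as the larger difference.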

> The similarity scores, as James did
> them and as I modified them, fit into the mean absolute difference
> category. In Mike's categories, then, this implies that there is
> likely one very big difference between Jordan's numbers and everyone
> else (probably scoring average) -- that gets emphasized, making him
> the most unique player. I'd like to take a stab at career similarity
> scores using the approach I've outlined to see whether it id's Jordan
> as most unique, too.
>
> MikeG -- While I like the comparisons you did, there are 2 comments I
> would make:
>
> 1. I'd like to see some non-standardized comparisons. I do like the
> standardized because they make some sense, but I think
> non-standardized will also tell a story.

Dean, you could do raw averages, but players from the 60s would only
compare to players in the 60s. Actually, a great rebounder in the
90s would seem to compare to an average rebounder in the 60s, for
example.
I don't have a ready database of raw averages.
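
A rough sketch of why raw averages only compare within an era. The league figures below are placeholders chosen for round arithmetic, not real league data, and `standardize` is a hypothetical helper; the real standardized rates also account for pace and shooting, which this ignores.

```python
def standardize(raw_per_game, league_avg_that_season, reference_avg):
    """Scale a raw per-game figure to a common reference environment."""
    return raw_per_game * reference_avg / league_avg_that_season

# Placeholder team rebounds per game (illustrative only, not real data)
LEAGUE_REB_60S = 70.0
LEAGUE_REB_90S = 42.0
REFERENCE_REB = 50.0

raw_60s = 12.0  # hypothetical 60s player, rebounds per game
raw_90s = 12.0  # hypothetical 90s player, same raw figure

std_60s = standardize(raw_60s, LEAGUE_REB_60S, REFERENCE_REB)
std_90s = standardize(raw_90s, LEAGUE_REB_90S, REFERENCE_REB)

print(round(std_60s, 2), round(std_90s, 2))
```

With identical raw numbers the two players look the same, but once each is scaled to his own league's rebounding environment the 90s player comes out well ahead. That is the sense in which a great 90s rebounder resembles an average 60s one on raw averages.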

> 2. You really need some comparison of shooting percentages and
> turnovers. It really caught my eye with the Duncan-Kareem
> comparison. I see some similarity between these two, but there are
> big differences in offensive efficiency. Kareem was nearly
> unstoppable offensively - my floor%'s and offensive efficiencies
> reflect that. Duncan is very stoppable, his offensive rating and
> floor percentage blending in to be about average. Kareem fell to
> average offensively only in his last year. (I also don't think that
> Kareem was the defensive force that Duncan is, but my memories are
> biased by the Kareem post-'80, when he wasn't as good as he was when
> younger.)
>
> Dean Oliver

Shooting percentages are part of what determines my standardized
scoring rate, along with game pace (defined as points allowed). I
only did career totals, so Kareem's incredibly long career has been
smoothed over, and his very dominant early seasons are not truly
reflected. Maybe Duncan has peaked, and his career averages really
won't rank close to Kareem's.
Further, Duncan's offensive numbers, in my system, get a big boost
from his being on a great defensive team. You have to agree his
offensive strength is way above average on his team. In other words,
the go-to guy on the championship Spurs is going to rate favorably to
the go-to guy on the champion Bucks from 30 years before, in my
system.

Mike Goodman
• ... only ... I think this is what I was interested in. I was curious who from today would fit in the 60 s. Or, more interestingly, who from the 70 s might
Message 9 of 16 , Sep 17, 2001
--- In APBR_analysis@y..., msg_53@h... wrote:
> > 1. I'd like to see some non-standardized comparisons. I do like the
> > standardized because they make some sense, but I think
> > non-standardized will also tell a story.
>
> Dean, you could do raw averages, but players from the 60s would only
> compare to players in the 60s. Actually, a great rebounder in the
> 90s would seem to compare to an average rebounder in the 60s, for
> example.
> I don't have a ready database of raw averages.
>
>

I think this is what I was interested in. I was curious who from
today would fit in the '60's. Or, more interestingly, who from the
'70's might fit in today's game. Are West's raw #'s similar to
Iverson's or to Richmond's? What happens in baseball is that
outstanding players tend to be dissimilar to other players in their
era, but similar to outstanding players of other eras. I doubt
that this would happen in basketball, using raw #'s, because of the
style change. You seem to be saying the same thing.

(I didn't realize that you don't have a db of raw#'s.)

> > 2. You really need some comparison of shooting percentages and
> > turnovers. It really caught my eye with the Duncan-Kareem
> > comparison. I see some similarity between these two, but there are
> > big differences in offensive efficiency. Kareem was nearly
> > unstoppable offensively - my floor%'s and offensive efficiencies
> > reflect that. Duncan is very stoppable, his offensive rating and
> > floor percentage blending in to be about average. Kareem fell to
> > average offensively only in his last year. (I also don't think that
> > Kareem was the defensive force that Duncan is, but my memories are
> > biased by the Kareem post-'80, when he wasn't as good as he was when
> > younger.)
>
> Shooting percentages are part of what determines my standardized
> scoring rate, along with game pace (defined as points allowed). I
> only did career totals, so Kareem's incredibly long career has been
> smoothed over, and his very dominant early seasons are not truly
> reflected.

One of my personal quibbles with all the tendex-like rating systems
out there is that they do combine offensive with defensive
contributions. There is a big difference in my mind between Moses
Malone, who was an offensive force, and Hakeem Olajuwon, who has been
dominant defensively. Both were good in the other thing, but
dominant in just one. Kareem was dominant offensively (and probably
defensively) early on. Duncan has been dominant defensively, not
offensively. (Duncan appears to have more of the competitive fight
than Kareem, but, again, I missed the early Kareem.)

> Maybe Duncan has peaked, and his career averages really
> won't rank close to Kareem's.

I don't think I'd say that Duncan's peaked. He's been pretty
remarkably consistent since entering the league. Maybe it's only
remarkable that he stayed in school long enough to actually be ready
for the league when entering.

> Further, Duncan's offensive numbers, in my system, get a big boost
> from his being on a great defensive team. You have to agree his
> offensive strength is way above average on his team.

Depending on how you define "average", but, yeah, Duncan looks better
offensively than he really is because he plays on a great defensive
team. (He would make most teams better defensively, too.)

> Personally, I don't ever consider 'position' to be a quantifiable
> statistic.

James defined numbers to positions for defensive purposes (a
shortstop is much more valuable to a defense than a 1st baseman, for
example). That might be necessary for some of the older guys because
defensive stats really don't exist in the '60's and early '70's. But
we can probably still assume that a center was the most important
defensive player back then, as he is now. This gets adequately
reflected in blocks, steals, and defensive boards, but you do need
those #'s.

> assist from a center is exactly as important as an assist from a
> guard.

Only a minor point here -- this is not precisely true (though
probably true enough for government work). Assists from guards tend
to be more valuable. This is because they often have to make the
tougher pass than big men. The weight on an assist is proportional
to the expected FG% of the guy he passes to. Historically, big men
have had higher FG% than guards -- hence their assists are weighted
less. (The assists of the best shooting player on a team are less
valuable than the assists of the guys getting him the ball.) This
has changed with the 3 pt shot, but it's a conversion from FG% to
effective FG%...
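
One way the teammate-FG% weighting could be sketched in Python. The roster and its numbers are hypothetical, and using the attempt-weighted effective FG% of a passer's teammates as a proxy for "the expected FG% of the guy he passes to" is an assumption made for illustration, not the actual formula.

```python
def assist_weight(passer, roster):
    """Attempt-weighted effective FG% of everyone except the passer.
    Effective FG% counts a made three as 1.5 made field goals."""
    mates = [stats for name, stats in roster.items() if name != passer]
    made = sum(s["fgm"] + 0.5 * s["fg3m"] for s in mates)
    attempts = sum(s["fga"] for s in mates)
    return made / attempts

# Hypothetical three-man roster (season totals, illustrative only)
roster = {
    "center":  {"fgm": 550, "fg3m": 0,   "fga": 1000},
    "guard":   {"fgm": 420, "fg3m": 100, "fga": 1000},
    "forward": {"fgm": 460, "fg3m": 0,   "fga": 1000},
}

guard_weight = assist_weight("guard", roster)
center_weight = assist_weight("center", roster)
print(round(guard_weight, 3), round(center_weight, 3))
```

Because the guard's likely targets include the high-percentage center, his assists carry the larger weight here. The center's passes go to lower-percentage shooters, so his weight is smaller, though crediting the guard's threes through effective FG% narrows the gap.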

Dean Oliver
• ... My raw totals and per-game averages are contained in my season files, along with team totals and averages for that season. My composite lists only have
Message 10 of 16 , Sep 18, 2001
--- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
> (I didn't realize that you don't have a db of raw#'s.)
>
My raw totals and per-game averages are contained in my 'season'
files, along with team totals and averages for that season. My
composite lists only have the 'standardized' rates. From those
rates, I can generate 'equivalent totals'. For 'average'
scoring/rebounding teams, these would be equal to raw season totals.

>
> One of my personal quibbles with all the tendex-like rating systems
> out there is that they do combine offensive with defensive
> contributions. There is a big difference in my mind between Moses
> Malone, who was an offensive force, and Hakeem Olajuwon, who has been
> dominant defensively. Both were good in the other thing, but
> dominant in just one. Kareem was dominant offensively (and probably
> defensively) early on. Duncan has been dominant defensively, not
> offensively. (Duncan appears to have more of the competitive fight
> than Kareem, but, again, I missed the early Kareem.)

I get your point, Dean, but your examples don't seem the clearest.
Olajuwon is better than Malone because he has all the offense Malone
had PLUS defense. Never seen the Dream shake?
Duncan has virtually all the offense Kareem had, averaged over their
careers, according to my numbers. Kareem did maintain a great
shooting pct., but Duncan plays in an era of universally-tough D.

> I don't think I'd say that Duncan's peaked. He's been pretty
> remarkably consistent since entering the league. Maybe it's only
> remarkable that he stayed in school long enough to actually be ready
> for the league when entering.

Some guys enter the league at full strength: Wilt, Oscar, Kareem,
Robinson, never improved beyond their first 3 years. Others start as
near-superstars, then several years along suddenly shift into true
superstar mode: Magic, Bird, Olajuwon, ...

>
> Depending on how you define "average", but, yeah, Duncan looks better
> offensively than he really is because he plays on a great defensive
> team. (He would make most teams better defensively, too.)

Don't know how a guy 'looks better than he really is', DeanO.

>Assists from guards tend
> to be more valuable. This is because they often have to make the
> tougher pass than big men. The weight on an assist is proportional
> to the expected FG% of the guy he passes to. Historically, big men
> have had higher FG% than guards -- hence their assists are weighted
> less. (The assists of the best shooting player on a team are less
> valuable than the assists of the guys getting him the ball.) This
> has changed with the 3 pt shot, but it's a conversion from FG% to
> effective FG%...
>
> Dean Oliver

This is fun, splitting hairs!
If your center kicks out 3 nice passes to guards, who only hit one of
the 3 shots, the center only gets one assist.
The guard can make 3 nice passes inside, 2 of which may be converted,
so he gets 2 assists.
So an equally valid argument is that assists from guards
are 'easier', and assists from centers are 'undercounted'.
I say they are equal.
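
The counting argument above, written out as a small Python sketch. The pass counts and conversion rates are the ones from the example; the function names are hypothetical.

```python
from fractions import Fraction

def credited_assists(good_passes, conversion_rate):
    """Assists actually recorded: only converted passes count."""
    return good_passes * conversion_rate

def implied_passes(assists, conversion_rate):
    """Good passes needed, on average, to earn that many assists."""
    return assists / conversion_rate

center_assists = credited_assists(3, Fraction(1, 3))  # guards hit 1 of 3
guard_assists = credited_assists(3, Fraction(2, 3))   # bigs convert 2 of 3

print(center_assists, guard_assists)
print(implied_passes(center_assists, Fraction(1, 3)),
      implied_passes(guard_assists, Fraction(2, 3)))
```

Both players made the same three good passes, yet the box score credits them differently. Dividing assists by the recipients' conversion rate recovers the equal pass counts, which is the 'undercounted' reading; weighting assists by recipient FG% pushes in the opposite direction.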

Perhaps more to the issue, evaluate which players make those
practical passes which may or may not get them an assist, versus
those who will not give up the ball unless it gets them an assist. I
can't discern the 2 types from the statistics, but I know it when I
see it. (It might be partly discernible in that old assist/turnover
ratio.)

Mike Goodman
• ... systems ... defensive ... fight ... Olajuwon was very solid offensively (not stellar, like Kareem) -- I didn t mean to imply otherwise. Malone was just
Message 11 of 16 , Sep 18, 2001
--- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
> > One of my personal quibbles with all the tendex-like rating systems
> > out there is that they do combine offensive with defensive
> > contributions. There is a big difference in my mind between Moses
> > Malone, who was an offensive force, and Hakeem Olajuwon, who has been
> > dominant defensively. Both were good in the other thing, but
> > dominant in just one. Kareem was dominant offensively (and probably
> > defensively) early on. Duncan has been dominant defensively, not
> > offensively. (Duncan appears to have more of the competitive fight
> > than Kareem, but, again, I missed the early Kareem.)
>
> I get your point, Dean, but your examples don't seem the clearest.
> Olajuwon is better than Malone because he has all the offense Malone
> had PLUS defense. Never seen the Dream shake?
> Duncan has virtually all the offense Kareem had, averaged over their
> careers, according to my numbers. Kareem did maintain a great
> shooting pct., but Duncan plays in an era of universally-tough D.

Olajuwon was very solid offensively (not stellar, like Kareem) -- I
didn't mean to imply otherwise. Malone was just the epitome of a good
offensive center who wasn't that good defensively. Rik Smits is
another example of the poor defensive type who can score (not as well
as Olajuwon/Moses). Olajuwon is very DISSIMILAR to these guys because
he is much better defensively. Similarity is all I'm trying to
capture, not quality.

I looked at Duncan's offensive #'s last night and his offensive rating
has been between about 104 and 108 since entering the league, when
average offensive ratings have been between about 100 and 103. He's a
little more efficient than average. My recollection is that Kareem's
#'s were about 115 in the early '80s, when average was about 106-108 --
relatively higher than Duncan's. Again, these two players just don't
seem very SIMILAR to me. I would think of David Robinson as more
similar to Kareem. Or possibly Olajuwon. Probably Wilt. Not
Russell.
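
The comparison above can be made explicit with a few lines of Python. The ratings are the approximate figures quoted in this message; taking the midpoint of each range is a simplifying assumption, and the variable names are hypothetical.

```python
def midpoint(low, high):
    """Centre of a quoted range, used as a rough single value."""
    return (low + high) / 2

duncan_rating = midpoint(104, 108)      # Duncan's offensive rating range
duncan_league = midpoint(100, 103)      # league average in his seasons
kareem_rating = 115.0                   # Kareem, early '80s (recollection)
kareem_league = midpoint(106, 108)      # league average at that time

duncan_edge = duncan_rating - duncan_league
kareem_edge = kareem_rating - kareem_league

print(duncan_edge, kareem_edge)
print(round(duncan_rating / duncan_league, 3),
      round(kareem_rating / kareem_league, 3))
```

On these rough numbers Kareem sits about 8 points per 100 possessions above his league, against about 4.5 for Duncan, so the gap holds whether it is measured as a difference or as a ratio.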

> >
> > Depending on how you define "average", but, yeah, Duncan looks better
> > offensively than he really is because he plays on a great defensive
> > team. (He would make most teams better defensively, too.)
>
> Don't know how a guy 'looks better than he really is', DeanO.
>

Another way of saying that the hype on Duncan has been a little
extreme. Put him on the Hawks last year and, while he's better than
Mutombo offensively, the team still wouldn't have scored much. They
would have been pretty close to as good defensively as they were with
Mutombo (or better), but they wouldn't be an offensive threat. I
don't think Kareem ever played on a weak offensive team.

> This is fun, splitting hairs!
> If your center kicks out 3 nice passes to guards, who only hit one of
> the 3 shots, the center only gets one assist.
> The guard can make 3 nice passes inside, 2 of which may be converted,
> so he gets 2 assists.
> So an equally valid argument is that assists from guards
> are 'easier', and assists from centers are 'undercounted'.
> I say they are equal.
>
> Perhaps more to the issue, evaluate which players make those
> practical passes which may or may not get them an assist, versus
> those who will not give up the ball unless it gets them an assist.

The goal is to identify when a good pass is made. Generally a better
pass is one made to a better shooter. That's all I try to capture. I
capture it in formulas with teammate FG%. For years, I didn't worry
about it and it really didn't matter much. Now I've got more
sophisticated calculation devices. I've actually found that this
adjustment makes the most difference when evaluating different levels
of basketball (high school, college, women's).

Dean Oliver