Re: Similarity Scores

  • Dean Oliver
    Message 1 of 16 , Sep 10, 2001
      --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
      > Just some thoughts off the top of my head: I agree that this is something
      > that ought to be done for basketball. I'm a little leery of the work
      > that's been done in baseball, as Bill James' formula looks like a
      > hand-cooked one. I'm sure its results are reasonable, but I'd like to see
      > an approach that's more systematic.
      >

      I've asked Bill about the approach he took and why he took it. I
      don't expect to hear back anytime soon.

      > There are several statistical techniques which I think are directly useful
      > here. Cluster analysis (the computer looks at the data and divides them
      > into groups); discriminant analysis and logistic regression (determine
      > the factors which predict which group a player will be in, e.g. Hall of
      > Fame vs non Hall of Fame); and probably the most useful of all:
      > Mahalanobis distance, or some variation thereof.

      > However, Mahalanobis distance does NOT take into account the second
      > problem, that certain variables might deserve more weight than others.
      > Maybe we WOULD want to "double count" points scored, given that it's
      > probably the single most important statistic for a player, or at least one
      > of the most important ones for distinguishing all-stars and hall of famers
      > from journeymen.

      First -- I don't know the Mahalanobis distance stuff. Sounds like
      multivariate regression, though. You may have to do this analysis or
      point me at software that does it. What is it going to spit out?
      Weights on the different stats?

      >
      > Anyway, that's how I would approach the problem. For a first pass, I'd do
      > statistics per game, rather than per 48 minutes, per year, or per career
      > (career stats would be useless for comparing, say, Steve Francis to Isiah
      > Thomas, because Francis' career stats are still so low).

      We could compare Thomas' first 2 years to Francis', a very useful
      comparison.

      > I'd do straight
      > Mahalanobis distance at first, throwing in all variables (FTA, FTM, AND
      > FT%) and see if the results looked reasonable. If not, then at least I'd
      > have some coefficients to start with, and could start doubling or halving
      > some.
      >

      I agree, I think (not knowing exactly what the coefficients mean).
      Getting a common starting point is the most important thing I want to
      get out of this discussion. Similarity scores are ultimately
      somewhat subjective, but if we can all start with the same set of
      numbers, at least we have a foundation.

      > Also, after the initial analysis, I'd want to put in some sort of
      > correction for era or game pace. Bob Cousy's 43% career FG% (or whatever
      > it was, I'm saying this off the top of my head) reminds me more of Isiah
      > Thomas's 46% than it does Allen Iverson's 43%, despite the superficial
      > similarity of Cousy's and Iverson's FG%. (Again I'm not vouching for
      > those specific numbers, just saying that I'd rather see the numbers in
      > context, i.e. corrected for era and/or game pace.)
      >

      I think we want to keep the era correction separate. I can't find
      where he said it, but I know James wanted to keep it separate.
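One way to keep an era correction separate from the raw numbers is to carry two stat lines per player: the "absolute" one as recorded, and a "relative" one scaled by the league average for that season. The sketch below is only an illustration of that idea. The function name `relative_stats`, the player labels, and every number in it are hypothetical, not actual player or league figures.

```python
def relative_stats(player, league_avg):
    """Return each stat as a ratio to the league average for the same season."""
    return {stat: player[stat] / league_avg[stat] for stat in player}


# Hypothetical example: two guards with the same raw FG%, in leagues that
# shot very differently. None of these figures are real.
older_guard = {"fg_pct": 0.430, "pts": 20.0}
older_league = {"fg_pct": 0.380, "pts": 10.0}

modern_guard = {"fg_pct": 0.430, "pts": 20.0}
modern_league = {"fg_pct": 0.450, "pts": 8.0}

older_rel = relative_stats(older_guard, older_league)
modern_rel = relative_stats(modern_guard, modern_league)

# The absolute lines are identical; the relative lines are not.
print(older_rel["fg_pct"] > 1.0, modern_rel["fg_pct"] < 1.0)
```

Similarity could then be computed twice, once on the absolute lines and once on the relative ones, without mixing the two.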

      My one attempt at basketball similarity scores is buried somewhere.
      I looked at player-season comparisons back in '98. The motivation
      was identifying who was similar to Kobe Bryant, since there was so
      much controversy at the time about how good he was going to be.
      I remember finding a lot more self-similarity within a player than
      similarity across players (Bryant's 2nd year resembled his first more
      than it resembled a lot of other players' seasons, for example). The
      player who seemed most similar to Bryant at the time was Allen
      Iverson, but my #'s were weird. I'd say now that Bryant and Iverson
      aren't as similar as Bryant and Jordan or Bryant and VCarter, which I
      think points to factors I didn't consider -- height and position. (It
      may also point to false impressions. Jordan's early career numbers
      are MUCH better than Bryant's, even if you account for the 10% drop
      in pace and the 5-8% drop in offensive efficiency.)

      For kicks, here is Iverson 99 and Bryant 2001:

                 GS   MIN    FG   FGA    FG%  FG3M  FG3A   FG3%
      Iverson   1.0  41.5   9.1  22.0  0.412   1.2   4.1  0.291
      Bryant    1.0  40.9  10.3  22.2  0.464   0.9   2.9  0.305

                 FT   FTA    FT%   OR   DR   TR  AST   PF   DQ
      Iverson   7.4   9.9  0.751  1.4  3.5  4.9  4.6  2.0  0.0
      Bryant    7.0   8.2  0.853  1.5  4.3  5.9  5.0  3.3  0.0

                 STL    TO   BLK   PTS
      Iverson   2.29  3.48  0.15  26.8
      Bryant    1.68  3.24  0.63  28.5

      A priori, I'd like to call these two seasons in the 700-750 range on
      similarity scores. Easily identifiable similarities, but significant
      and obvious differences.

      For my #'s, I have (in per game stats) for Iverson and Bryant, resp:

               ScPoss  Poss    Fl%   Ortg  PtsProd  DefStops/Min  DefStops/Poss  DefRtg  NetWin%
      Iverson    12.8  25.0  0.511  106.6     26.7         0.182          0.484    97.3    0.820
      Bryant     12.8  24.4  0.524  110.9     27.0         0.191          0.494   103.0    0.773

      At about the same age, Bryant's offensive skills are more efficient
      than Iverson's, which we would probably all agree on. Defensively,
      it's hard to say.

      Dean Oliver
      Journal of Basketball Studies
    • Dean Oliver
      Message 2 of 16 , Sep 10, 2001
        --- In APBR_analysis@y..., harlanzo@y... wrote:
        > This being the case a study might still be
        > interesting. I would suggest that because of how the game has
        > changed that any model should really be limited to a specific era (ie
        > since 90-91 season).

        I think we should start here, but not limit it this way.

        Your example of Houston and Blackman is interesting. I think players
        like Blackman did evolve into players like Houston, but their styles
        were/are different. There weren't many Houston-types in the '80's.
        We want to show that. In some cases, we may want to hide that, but
        we don't want to hide it all the time. It points out the problem we
        always have -- that players from the '60's are more similar to
        themselves than they are to today's players. As much as we might
        like to compare Bob Pettit to Karl Malone, I'm sure more players in
        the '50's-'60's are similar to Pettit than Malone is.

        I would call the similarity (on a per game basis) between Houston and
        Blackman about an 800, just gut feel. Career-wise, Houston has a
        ways to go to get to Blackman's level. Even at age 30, it appears
        that Blackman had a bit better numbers.

        Here is the list of comparisons done for the newsletter with the
        original scores assigned and some of my subjective scores:

        Orig  MyEst  Players
        990   850    Isiah Thomas & Tim Hardaway
        986   800    Julius Erving & Elgin Baylor
        985          Mark Aguirre & Alex English
        984   850    Patrick Ewing & Alonzo Mourning
        976          Kareem Abdul-Jabbar & Bob Pettit
        976   800    David Robinson & Tim Duncan
        969          Willis Reed & Walt Bellamy
        968   850    Reggie Miller & Allan Houston
        968          Kevin Johnson & Stephon Marbury
        967          Oscar Robertson & Sam Cassell
        967   800    Bill Russell & Wes Unseld
        964   750    Karl Malone & Kareem Abdul-Jabbar
        964          Kareem Abdul-Jabbar & Bob Lanier
        963          David Robinson & Hakeem Olajuwon
        961          Isiah Thomas & Kevin Johnson
        959          Jo Jo White & Hal Greer
        958          Jerry West & Pete Maravich
        955          Walt Frazier & Penny Hardaway
        952   600    Wilt Chamberlain & Arvydas Sabonis
        951          Dominique Wilkins & John Drew
        949          Vince Carter & Kobe Bryant
        949   800    Isiah Thomas & Stephon Marbury
        943          Larry Bird & Chris Webber
        943          Kobe Bryant & Allen Iverson
        942          Rick Barry & John Havlicek
        938          Karl Malone & David Robinson
        938          Bill Laimbeer & Dikembe Mutombo
        935          Jerry West & Paul Westphal
        930          Larry Bird & Billy Cunningham
        929          Kareem Abdul-Jabbar & Charles Barkley
        929          Charles Barkley & Kareem Abdul-Jabbar
        927          Walt Frazier & Gary Payton
        927          Karl Malone & Bob Pettit
        924          Vince Carter & Allen Iverson
        922          Grant Hill & Elgin Baylor
        919          Bill Russell & Bill Walton
        918          Shaquille O'Neal & David Robinson
        917          Wilt Chamberlain & Kareem Abdul-Jabbar
        914          George Gervin & David Thompson
        903          Wilt Chamberlain & David Robinson
        902          Larry Bird & Elgin Baylor
        897   750    Jason Kidd & Magic Johnson
        888          Shaquille O'Neal & Hakeem Olajuwon
        887          Michael Jordan & Allen Iverson
        885          Charles Barkley & Karl Malone
        884          Michael Jordan & Vince Carter
        882          John Stockton & Larry Brown
        875          Jerry West & Oscar Robertson
        858   850    Shaquille O'Neal & Wilt Chamberlain
        852   800    Oscar Robertson & Magic Johnson
        848   750    Michael Jordan & Kobe Bryant
        830   750    Michael Jordan & Julius Erving
        263          Shaquille O'Neal & John Stockton
      • Michael K. Tamada
        Message 3 of 16 , Sep 12, 2001
          On Tue, 11 Sep 2001, Dean Oliver wrote:

          > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:

          [...]

          > First -- I don't know of Mahalanobis distance stuff. Sounds like
          > multivariate regression, though. You may have to do this analysis or
          > point me at software that does it. What is it going to spit out?
          > Weights on the different stats?

          The weights are not scalar, but are instead implicitly contained in a
          matrix. I usually use a stat package called SPSS but I just checked and
          surprisingly, although Mahalanobis distance is calculated and used in a
          number of statistics that it calculates, it doesn't have a command for
          simply computing good ol' Mahalanobis distance.

          However, the formula for Mahalanobis distance is pretty simple. Let x and
          y be vectors of the variables that we're measuring, for two different
          players. E.g. "x" might be Magic's pts per game, assists per game, FG%,
          asst/TO ratio, min/game, etc. etc etc. "y" would be the same stats, but
          for Larry Bird.

          Let S stand for the covariance matrix of all players' stats. (For an
          example of how to calculate the elements of the covariance matrix, see

          http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc541.htm

          ).


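          Then the Mahalanobis distance is simply sqrt( (x-y)' S^-1 (x-y) ) in
          matrix notation. (The "x-y" are column vectors, (x-y)' is the transpose,
          and S^-1 is the inverse of S. Without the square root, the expression
          gives the squared distance.)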
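A minimal sketch of the calculation described above, in plain Python with no stat package assumed. The function names (`covariance_matrix`, `solve`, `mahalanobis`) and the sample rows are hypothetical and chosen only for illustration. The sketch follows the usual convention of taking the square root of the quadratic form, so the result reduces to ordinary Euclidean distance when S is the identity matrix.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n-1 denominator) of equal-length stat rows."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]  # augmented copy of a | b
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular (redundant stats?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][c] * z[c] for c in range(i + 1, k))
        z[i] = (m[i][k] - tail) / m[i][i]
    return z


def mahalanobis(x, y, s):
    """sqrt((x-y)' S^-1 (x-y)), computed without forming S^-1 explicitly."""
    d = [xi - yi for xi, yi in zip(x, y)]
    z = solve(s, d)  # z = S^-1 (x - y)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


# Hypothetical per-game rows [pts, reb, ast]; not real player data.
players = [
    [25.0, 5.0, 5.0], [27.0, 4.0, 6.0], [12.0, 11.0, 1.5],
    [17.0, 3.0, 12.0], [22.0, 13.0, 1.0], [9.0, 2.5, 3.0],
]
S = covariance_matrix(players)
print(mahalanobis(players[0], players[1], S))
```

Solving S z = (x-y) rather than inverting S is a design choice for the sketch. It also fails loudly when S is singular, which is what happens when one stat is an exact linear combination of others.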

          Here's a web-page with some other distance metrics:

          http://www.mathworks.com/access/helpdesk/help/toolbox/stats/pdist.shtml


          However all of these view all the variables as being essentially equally
          important. Hence the possible need for weighting. Or the use of some
          outside rating system or external criteria (e.g. Hall of Fame vs non Hall
          of Fame status, and we could use discriminant analysis or logistic
          regression to calculate the coefficients for predicting HoF status).
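The logistic-regression idea mentioned above can be sketched as follows. This is a toy illustration only: the fitting routine is plain gradient descent rather than what a stat package would use, the names `fit_logistic` and `predict` are hypothetical, and the rows and Hall of Fame labels are invented standardized values, not real players.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(rows, labels, lr=0.1, steps=5000):
    """Fit weights (bias first) by gradient descent on the mean log-loss."""
    k = len(rows[0])
    n = len(rows)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability of being in the labeled group (e.g. HoF)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Invented standardized stats [scoring_z, rebounding_z] and invented labels
# (1 = in the Hall of Fame group, 0 = not). Purely illustrative.
rows = [[1.5, 0.5], [1.0, 1.2], [0.8, -0.2],
        [-0.5, 0.3], [-1.0, -0.8], [-1.2, 0.1]]
labels = [1, 1, 1, 0, 0, 0]

w = fit_logistic(rows, labels)
print("bias and coefficients:", w)
print("P(group) for a high scorer:", predict(w, [1.2, 0.0]))
```

The fitted coefficients play the role of the weights discussed in the thread: a larger coefficient means that stat does more to separate the two groups in this toy data.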


          [...career comparisons]

          >We could compare Thomas' first 2 years to Francis', a very useful
          >comparison.

          Yes, good point. Although if we're doing career totals, we'd presumably
          still want a correction for 82-game seasons vs 72-game seasons.

          > I think we want to keep the era correction separate. I can't find
          > where he said it, but I know James wanted to keep it separate.

          Yes, probably best done, as you suggest elsewhere, by having two sets of
          similarity stats: "absolute" and "relative" (or "corrected", or
          "standardized" or whatever we want to call them).


          --MKT
        • Mike Goodman
          Message 4 of 16 , Sep 12, 2001
            --- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
            > We started discussing this over in APBR, but I think the details of
            > making this work can get technical, so I brought it here.

            Excellent move, Dean
            >
            > One of the problems I had was with redundancy of stats. FG% is
            > reflected in FG and FGA, for example. James didn't worry about it
            > too much, but I do in basketball.
            >
            This is one reason I have concentrated on combining all
            scoring-related data into one "scoring ability" number. It seems
            quite clear to me that "points is points", and likewise, attempts
            are attempts (or possessions used up). I could call the ratio
            "scoring efficiency", but I believe that term is also used in
            another way, and to me it implies turnovers incurred while
            attempting to score, offensive fouls, and the "ability to get a
            shot off". So I might prefer to call Pts/(Attempts*2) something
            like "scoring percentage".
            I also feel comfortable with using a player's ScoPct/.527
            (.527 being the historical standard ScoPct) as a number to factor
            into a player's points-per-minute rate. I justify this by noting
            that a high-scoring, low-percent scorer on a weak team would just
            have to shoot less (and take higher-percentage shots) on a better
            team. Conversely, a low-scoring, high-percentage shooter on a good
            team would almost certainly be asked to take more shots on a weaker
            team. Generally, his percentage would go down, but possibly his
            "scoring ability" number would be fairly constant as he moves from
            team to team.
            Ty Corbin had such a career spell, as he went from a go-to guy on
            the woeful Wolves, to a contributor on the contending Jazz; his
            minutes and ppg went rollercoastering, but his measurable 'scoring
            ability' was pretty constant.
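A small sketch of the arithmetic described above. The message does not spell out how "Attempts" is counted, so the sketch simply takes it as an input. The function names and the sample totals are hypothetical; only the Pts/(Attempts*2) ratio and the .527 standard come from the message.

```python
HISTORICAL_SCO_PCT = 0.527  # the "historical standard ScoPct" quoted above


def scoring_pct(points, attempts):
    """Pts / (Attempts * 2); how 'attempts' is counted is left to the caller."""
    return points / (attempts * 2)


def scoring_ability(points_per_min, points, attempts):
    """Points-per-minute rate scaled by ScoPct relative to the standard."""
    return points_per_min * scoring_pct(points, attempts) / HISTORICAL_SCO_PCT


# Hypothetical season totals, for illustration only.
volume_scorer = scoring_ability(points_per_min=0.60, points=1800, attempts=1900)
efficient_role_player = scoring_ability(points_per_min=0.45, points=900, attempts=780)
print(volume_scorer, efficient_role_player)
```

A player whose ScoPct equals .527 keeps his raw points-per-minute rate; anyone above or below the standard is scaled up or down in proportion.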

            > ....In the old argument of
            > Shawn Kemp, perhaps we find that the most similar players to him are
            > all out of the HOF -- then that suggests he isn't that great. Maybe
            > his best season compares with those put up by Wilt, KMalone, etc.,
            > suggesting great seasons.
            >
            In one member's standardized numbers, Kemp's career 'abilities'
            are :
            21 pts, 12 reb, 2 ast, 2 blk. This compares to Artis Gilmore, Moses
            Malone. But many fewer minutes for Kemp, and lesser totals.
          • Mike Goodman
            Message 5 of 16 , Sep 12, 2001
              --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
              > ....discriminant analysis and logistic regression .... Euclidean distance,
              > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
              > between, say, Magic Johnson and Larry Bird ....., Mahalanobis
              > distance does NOT take into account the second
              > problem, that certain variables might deserve more weight than others.
              >
              A friend of mine says "Anyone who drives faster than me is a fukkin
              maniac, and anyone who drives slower is a goddamn asshole".
              Similarly, I say, anyone who uses less math than me is some kind of
              moron, and whoever uses more must be some kind of geek.

              > Also, after the initial analysis, I'd want to put in some sort of
              > correction for era or game pace. Bob Cousy's 43% career FG% (or whatever
              > it was, I'm saying this off the top of my head) reminds me more of Isiah
              > Thomas's 46% than it does Allen Iverson's 43%, despite the superficial
              > similarity of Cousy's and Iverson's FG%. (Again I'm not vouching for
              > those specific numbers, just saying that I'd rather see the numbers in
              > context, i.e. corrected for era and/or game pace.)

              Cousy never once managed to make 40% of his FG during a season; his
              career scoring pct. was .440. (Iverson's is .500; Isiah's was .508).
              >
              > For Hall of Fame purposes, I think discriminant analysis or logistic or
              > probit regressions are better than merely measuring distance. I did this
              > once for NBA all-stars one season, the predictions were not 100% accurate
              > but you could at least separate the players into three groups: clear
              > all-stars, clear non-stars, and the "on the bubble" players.
              >
              >
              > --MKT
              >
              >
              Last season, the West selected my top 11 Western players to the
              allstar team, but skipped #12 Nowitzki in favor of teammate Michael
              Finley (#30 or thereabouts).
              Meanwhile the East seemed to pick at random, ignoring most forwards
              as they had ignored all point guards the year before.
            • Mike Goodman
              Message 6 of 16 , Sep 14, 2001
                --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                > .... Euclidean distance,
                > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
                > between, say, Magic Johnson and Larry Bird in whatever variables we choose
                > to look at.
                >
                > But there are problems with Euclidean distance, specifically one that
                > Dean Oliver alludes to: some variables are redundant or
                > partially redundant with each other,
                > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds. Another
                > problem is that not all variables are equally important: some probably
                > should be given greater weight than others ...

                I tried my hand at a variation of the Euclidean distance, since I can
                understand the formula (and pronounce it, too).
                I took 5 stats: scoring, rebounding, assists, steals, blocks. I used
                my normalized (standardized) versions. Because points are much more
                abundant than, say, steals, I reduced this difference by taking the
                square root of each stat. I compared the top 31 players on my
                infamous "alltime" list to the other 514 in the list. (I actually
                ran out of columns in Excel, for the first time.)
                The formula is drudgery to type, but it starts like this:
                E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 +... and so on, up
                to a5 and b5, for players a and b, and variables 1-5.
                I did not take the square root of the whole thing, since everything
                was already square-rooted once.
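The formula above is short enough to sketch in a few lines. The function name `goodman_distance` is hypothetical, and the optional `take_root` flag is an addition for experimenting; the message itself says no final square root was taken. The two stat lines are the rounded Jordan and West figures from the table in this message, so the result will not necessarily reproduce the listed E value.

```python
import math


def goodman_distance(a, b, take_root=False):
    """Sum over stats of (sqrt(a_i) - sqrt(b_i))^2 for two players' stat lines.

    take_root is an optional extra, off by default to match the description.
    """
    total = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))
    return math.sqrt(total) if take_root else total


# Order: scoring, rebounding, assists, steals, blocks (rounded table values).
jordan = [33.5, 6.5, 5.1, 2.3, 0.9]
west = [25.1, 4.2, 6.0, 2.7, 0.9]

print(goodman_distance(jordan, west))
print(goodman_distance(jordan, west, take_root=True))
```

Taking the square root of each stat first compresses the large categories, so a gap of a few points per game counts for less than the same gap in steals or blocks.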
                Not surprisingly, the best players only correspond to other great
                players, but some players have much more unique statistical profiles.
                In order of "greatest distance from the next-closest profile", we
                have:
                Sco Reb Ast Stl Blk E
                Michael Jordan 33.5 6.5 5.1 2.3 .9
                Jerry West 25.1 4.2 6.0 (2.7 .9) .945 (estimated)
                No real surprise that Jordan is the "most unique" statistically.
                Others scored more than West, but didn't have quality numbers beyond
                that.
                (Iverson is next, then Karl Malone(!), Kobe, Gervin, Erving, Bird,
                Wilkins, Dantley, Barry)

                Bill Russell 11.8 14.6 3.8 (1.5 4.0)
                Bill Walton 15.9 12.8 4.0 1.0 2.7 .743
                Really not very similar, but as close as anyone comes to Russell's
                combination of skills.
                (Thurmond is close 2nd, then Sam Lacey, Elmore Smith, Mutombo)

                Magic Johnson 20.6 7.5 10.4 1.9 .4
                Oscar Robertson 22.4 5.3 8.0 (1.5 .3) .644
                Magic was "the next Oscar", and then some.
                (Grant Hill, Payton, Penny, Strickland, Isiah, Drexler, KJ, Frazier)

                John Stockton 17.1 3.3 11.9 2.4 .2
                Isiah Thomas 18.0 3.7 8.8 2.0 .3 .543
                Stockton is just a giant in the assists category.
                (Tim Hardaway, KJ, Strickland, Cousy, Kenny Anderson, Brandon)

                Jerry West 25.1 4.2 6.0 (2.7 .9)
                Allen Iverson 25.1 3.9 5.5 2.1 .2 .517
                Now we have some real across-the-board similarity.
                (Barry, Penny, Kobe, Drexler, Maravich, Oscar, Westphal)

                Oscar Robertson 22.4 5.3 8.0 (1.5 .3)
                Penny Hardaway 20.2 5.1 6.2 1.9 .6 .486
                (KJ, Payton, Frazier, Cassell, Tim Hardaway, Price, Brandon, Magic)

                Moses Malone 21.6 13.2 1.3 .9 1.4
                Shawn Kemp 20.9 11.8 2.2 1.4 1.6 .470
                (Parish, Gilmore, Reed, McDyess, Ewing, Hayes, Haywood, McAdoo)

                Shaquille O'Neal 29.7 12.7 2.8 .7 2.6
                Tim Duncan 25.1 12.0 3.0 .8 2.3 .466
                (Kareem, Robinson, Mikan, Pettit, Ewing, Mourning, Wilt, Hakeem)

                Artis Gilmore 20.3 11.9 2.3 .6 2.3
                Patrick Ewing 23.5 11.1 2.0 1.0 2.6 .446
                (Hayes, Parish, Derrick Coleman, Sabonis, McDyess, Kemp, Gallatin)

                The remainder of the top 31 (and their closest match)

                Kareem AbdulJab. 25.9 10.6 3.4 1.0 2.7
                Tim Duncan 25.1 12.0 3.0 .8 2.3 .288
                (Robinson, Pettit, Mikan, Ewing, Neil Johnston, Shaq, Hakeem)

                Wilt Chamberlain 23.5 14.7 3.5 (1.5 3.0)
                George Mikan 24.8 13.1 2.9 (1.3 2.0) .432
                (Hakeem, Robinson, Duncan, Pettit, Kareem, Ewing)

                Karl Malone 28.1 11.2 3.4 1.4 .8
                Charles Barkley 24.2 12.4 3.8 1.6 .8 .444
                (Pettit, Johnston, Mikan, Baylor, Jeff Ruland, Bird, Duncan, McAdoo)

                Hakeem Olajuwon 23.7 11.7 2.6 1.8 3.2
                David Robinson 26.1 11.8 2.8 1.5 3.3 .275

                Julius Erving 23.0 7.8 4.0 1.9 1.7
                Elgin Baylor 22.5 9.6 3.9 (1.6 1.5) .347
                (Webber, Marques Johnson, Shareef, Johnston, Lanier, Ed Macauley,
                Schayes, Garnett, Bird, Drexler)

                Patrick Ewing 23.5 11.1 2.0 1.0 2.6
                Alonzo Mourning 24.5 10.9 1.6 .7 3.2 .332

                Bob Pettit 24.2 11.7 2.8 (1.3 1.8)
                George Mikan 24.8 13.1 2.9 (1.3 2.0) .231

                Elgin Baylor 22.5 9.6 3.9 (1.6 1.5)
                Chris Webber 21.1 10.1 4.2 1.5 1.8 .215
                (Lanier, Erving, Schayes, Johnston, Shareef, Garnett, Pettit, McAdoo)

                Scottie Pippen 18.4 7.5 5.4 2.1 .9
                Clyde Drexler 20.6 6.7 5.5 2.1 .7 .306
                (Alvan Adams, Connie Hawkins, Toni Kukoc, Billy C., Grant Hill,
                Antoine Walker, Marques Johnson, Penny, Cliff Hagan)

                Clyde-Scottie likewise

                Robert Parish 18.1 11.4 1.5 .9 1.8
                Elvin Hayes 17.8 10.9 1.7 1.0 2.6 .161
                (Gallatin, McDyess, Seikaly, Reed, Larry Foust, Dan Roundfield,
                Sampson, Haywood, Brian Grant)

                Bob Lanier 21.4 10.5 3.3 1.2 1.7
                Dolph Schayes 20.0 10.1 3.1 (1.4 1.6) .194

                (Elvin Hayes-Robert Parish match)

                Rick Barry 21.9 5.5 4.5 2.1 .5
                Kobe Bryant 23.0 5.2 4.2 1.4 .8 .345
                (Chris Mullin, Drexler, Hagan, Moncrief, Penny, Ray Allen)

                Kevin McHale 22.1 8.6 1.8 .4 2.0
                Rik Smits 19.9 8.3 1.8 .6 1.6 .306
                (Lovellette, Darryl Dawkins, Haywood, McAdoo, Yardley, McDyess)

                (George Mikan-Bob Pettit)

                Dan Issel 21.1 8.5 2.2 1.1 .6
                Terry Cummings 19.1 9.3 2.2 1.3 .7 .280
                (Chambers, Ceballos, Calvin Natt, Shareef, Yardley, Glenn Robinson)

                Clearly, as one goes down the list into more "ordinary" players,
                there is a proliferation of close profiles.


                Mike Goodman

                > >
                > >
              • harlanzo@yahoo.com
                Message 7 of 16 , Sep 15, 2001
                  It occurred to me that when comparing players through their
                  statistics, we might weight the comparisons so that some
                  statistics matter more depending on position. For example, when
                  comparing point guards, the assist category might be more important
                  for weighing similarity than the rebound category. Conversely, do we
                  really care whether two centers have similar assist numbers if their
                  points, rebounds, and FG% are similar? I think this sounds somewhat
                  right, with some notable exceptions. The counterargument, of course,
                  is that centers who pass well (a la Walton) or shoot 3s well
                  (Laimbeer and Sikma) are unique, and the similarity scores will help
                  identify players with similar rare skill sets. (To digress, I wonder
                  if Jason Kidd and some of the Darrell Walker early 90s seasons are
                  comparable.) I am beginning to babble, but I think the question
                  I am asking is whether positional demands should change how we weight
                  statistical categories when we apply similarity scores.
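To make the question concrete, here is a sketch of a position-weighted distance. Everything in it is an assumption for illustration: the weights in `POSITION_WEIGHTS` are made up, the function name is hypothetical, and the two stat lines are invented centers who differ only in assists.

```python
import math

# Made-up weights, purely to illustrate the idea of position-specific emphasis.
POSITION_WEIGHTS = {
    "PG": {"sco": 1.0, "reb": 0.5, "ast": 2.0, "stl": 1.0, "blk": 0.5},
    "C": {"sco": 1.0, "reb": 2.0, "ast": 0.5, "stl": 0.5, "blk": 2.0},
}


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a weight."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))


# Invented per-game lines for two centers, identical except for assists.
center_1 = {"sco": 20.0, "reb": 12.0, "ast": 2.0, "stl": 1.0, "blk": 2.0}
center_2 = {"sco": 20.0, "reb": 12.0, "ast": 4.0, "stl": 1.0, "blk": 2.0}

as_centers = weighted_distance(center_1, center_2, POSITION_WEIGHTS["C"])
as_guards = weighted_distance(center_1, center_2, POSITION_WEIGHTS["PG"])
print(as_centers, as_guards)
```

Under the center weights the assist gap barely registers, which is exactly the trade-off raised above: a passing center would look ordinary unless the weights are relaxed for him.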


                  --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                  > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                  > >.... Euclidean distance,
                  > > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the
                  difference
                  > > between, say, Magic Johnson and Larry Bird in whatever variables
                  we
                  > choose
                  > > to look at.
                  > >
                  > > But there are problems with Euclidean distance, specfically one
                  that
                  > > Dean Oliver alludes to: some variables are redundant or
                  > > partially redundant with each other,
                  > > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds.
                  > Another
                  > > problem is that not all variables are equally important: some
                  > probably
                  > > should be given greater weight than others ...
                  >
                  > I tried my hand at a variation of the Euclidian distance, since I
                  can
                  > understand the formula (and pronounce it, too).
                  > I took 5 stats: scoring, rebounding, assists, steals, blocks. I
                  used
                  > my normalized (standardized) versions. Because points are much
                  more
                  > abundant than, say, steals, I reduced this difference by taking the
                  > square root of each stat. I compared the top 31 players on my
                  > infamous "alltime" list to the other 514 in the list. (I actually
                  > ran out of columns in Excel, for the first time.)
                  > The formula is drudgery to type, but it starts like this:
                  > E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 +... and so on,
                  up
                  > to a5 and b5, for players a and b, and variables 1-5.
                  > I did not take the square root of the whole thing, since everything
                  > was already square-rooted once.
                  > Not surprisingly, the best players only correspond to other great
                  > players, but some players have much more unique statistical
                  profiles.
                  > In order of "greatest distance from the next-closest profile", we
                  > have:
                  > Sco Reb Ast Stl Blk E
                  > Michael Jordan 33.5 6.5 5.1 2.3 .9
                  > Jerry West 25.1 4.2 6.0 (2.7 .9) .945 (estimated)
                  > No real surprise that Jordan is the "most unique" statistically.
                  > Others scored more than West, but didn't have quality numbers
                  beyond
                  > that.
                  > (Iverson is next, then Karl Malone(!), Kobe, Gervin, Erving, Bird,
                  > Wilkins, Dantley, Barry)
                  >
                  > Bill Russell 11.8 14.6 3.8 (1.5 4.0)
                  > Bill Walton 15.9 12.8 4.0 1.0 2.7 .743
                  > Really not very similar, but as close as anyone comes to Russell's
                  > combination of skills.
                  > (Thurmond is close 2nd, then Sam Lacey, Elmore Smith, Mutombo)
                  >
                  > Magic Johnson 20.6 7.5 10.4 1.9 .4
                  > Oscar Robertson 22.4 5.3 8.0 (1.5 .3) .644
                  > Magic was "the next Oscar", and then some.
                  > (Grant Hill, Payton, Penny, Strickland, Isiah, Drexler, KJ, Frazier)
                  >
                  > John Stockton 17.1 3.3 11.9 2.4 .2
                  > Isiah Thomas 18.0 3.7 8.8 2.0 .3 .543
                  > Stockton is just a giant in the assists category.
                  > (Tim Hardaway, KJ, Strickland, Cousy, Kenny Anderson, Brandon)
                  >
                  > Jerry West 25.1 4.2 6.0 (2.7 .9)
                  > Allen Iverson 25.1 3.9 5.5 2.1 .2 .517
                  > Now we have some real across-the-board similarity.
                  > (Barry, Penny, Kobe, Drexler, Maravich, Oscar, Westphal)
                  >
                  > Oscar Robertson 22.4 5.3 8.0 (1.5 .3)
                  > Penny Hardaway 20.2 5.1 6.2 1.9 .6 .486
                  > (KJ, Payton, Frazier, Cassell, Tim Hardaway, Price, Brandon, Magic)
                  >
                  > Moses Malone 21.6 13.2 1.3 .9 1.4
                  > Shawn Kemp 20.9 11.8 2.2 1.4 1.6 .470
                  > (Parish, Gilmore, Reed, McDyess, Ewing, Hayes, Haywood, McAdoo)
                  >
                  > Shaquille O'Neal 29.7 12.7 2.8 .7 2.6
                  > Tim Duncan 25.1 12.0 3.0 .8 2.3 .466
                  > (Kareem, Robinson, Mikan, Pettit, Ewing, Mourning, Wilt, Hakeem)
                  >
                  > Artis Gilmore 20.3 11.9 2.3 .6 2.3
                  > Patrick Ewing 23.5 11.1 2.0 1.0 2.6 .446
                  > (Hayes, Parish, Derrick Coleman, Sabonis, McDyess, Kemp, Gallatin)
                  >
                  > The remainder of the top 31 (and their closest match)
                  >
                  > Kareem AbdulJab. 25.9 10.6 3.4 1.0 2.7
                  > Tim Duncan 25.1 12.0 3.0 .8 2.3 .288
                  > (Robinson, Pettit, Mikan, Ewing, Neil Johnston, Shaq, Hakeem)
                  >
                  > Wilt Chamberlain 23.5 14.7 3.5 (1.5 3.0)
                  > George Mikan 24.8 13.1 2.9 (1.3 2.0) .432
                  > (Hakeem, Robinson, Duncan, Pettit, Kareem, Ewing)
                  >
                  > Karl Malone 28.1 11.2 3.4 1.4 .8
                  > Charles Barkley 24.2 12.4 3.8 1.6 .8 .444
                  > (Pettit, Johnston, Mikan, Baylor, Jeff Ruland, Bird, Duncan, McAdoo)
                  >
                  > Hakeem Olajuwon 23.7 11.7 2.6 1.8 3.2
                  > David Robinson 26.1 11.8 2.8 1.5 3.3 .275
                  >
                  > Julius Erving 23.0 7.8 4.0 1.9 1.7
                  > Elgin Baylor 22.5 9.6 3.9 (1.6 1.5) .347
                  > (Webber, Marques Johnson, Shareef, Johnston, Lanier, Ed Macauley,
                  > Schayes, Garnett, Bird, Drexler)
                  >
                  > Patrick Ewing 23.5 11.1 2.0 1.0 2.6
                  > Alonzo Mourning 24.5 10.9 1.6 .7 3.2 .332
                  >
                  > Bob Pettit 24.2 11.7 2.8 (1.3 1.8)
                  > George Mikan 24.8 13.1 2.9 (1.3 2.0) .231
                  >
                  > Elgin Baylor 22.5 9.6 3.9 (1.6 1.5)
                  > Chris Webber 21.1 10.1 4.2 1.5 1.8 .215
                  > (Lanier, Erving, Schayes, Johnston, Shareef, Garnett, Pettit,
                  > McAdoo)
                  >
                  > Scottie Pippen 18.4 7.5 5.4 2.1 .9
                  > Clyde Drexler 20.6 6.7 5.5 2.1 .7 .306
                  > (Alvan Adams, Connie Hawkins, Toni Kukoc, Billy C., Grant Hill,
                  > Antoine Walker, Marques Johnson, Penny, Cliff Hagan)
                  >
                  > Clyde-Scottie likewise
                  >
                  > Robert Parish 18.1 11.4 1.5 .9 1.8
                  > Elvin Hayes 17.8 10.9 1.7 1.0 2.6 .161
                  > (Gallatin, McDyess, Seikaly, Reed, Larry Foust, Dan Roundfield,
                  > Sampson, Haywood, Brian Grant)
                  >
                  > Bob Lanier 21.4 10.5 3.3 1.2 1.7
                  > Dolph Schayes 20.0 10.1 3.1 (1.4 1.6) .194
                  >
                  > (Elvin Hayes-Robert Parish match)
                  >
                  > Rick Barry 21.9 5.5 4.5 2.1 .5
                  > Kobe Bryant 23.0 5.2 4.2 1.4 .8 .345
                  > (Chris Mullin, Drexler, Hagan, Moncrief, Penny, Ray Allen)
                  >
                  > Kevin McHale 22.1 8.6 1.8 .4 2.0
                  > Rik Smits 19.9 8.3 1.8 .6 1.6 .306
                  > (Lovellette, Darryl Dawkins, Haywood, McAdoo, Yardley, McDyess)
                  >
                  > (George Mikan-Bob Pettit)
                  >
                  > Dan Issel 21.1 8.5 2.2 1.1 .6
                  > Terry Cummings 19.1 9.3 2.2 1.3 .7 .280
                  > (Chambers, Ceballos, Calvin Natt, Shareef, Yardley, Glenn Robinson)
                  >
                  > Clearly, as one goes down the list into more "ordinary" players,
                  > there is a proliferation of close profiles.
                  >
                  >
                  > Mike Goodman
                  >
                  > > >
                  > > >
                • deano@tsoft.com
                  Message 8 of 16 , Sep 16, 2001
                    --- In APBR_analysis@y..., harlanzo@y... wrote:
                    > It occurred to me that when comparing players through their
                    > statistics should we be weighting the comparisons so that some
                    > statistics are more important based on positions?

                    Yes and No. What we're trying to come up with here is a general set
                    of rules that can be applied at default (as a basis for studies,
                    that can be modified). James always said that the method's blessing
                    and curse was its flexibility. We SHOULD modify it for specific
                    comparisons -- perhaps among point guards. There will always be a
                    lot of different versions around, but we want one set for general
                    comparisons, in part because, using your example, we can't
                    necessarily identify who point guards are.
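One way to picture a general default that can still be modified for a specific comparison is to attach a weight to each category. The sketch below is only an illustration: it borrows the square-root distance described elsewhere in this thread, the per-game figures are the rounded ones posted for Stockton and Isiah Thomas, and the `guard_weights` values are invented for the example rather than anything proposed here.

```python
from math import sqrt

# Category order used in the thread: scoring, rebounds, assists, steals, blocks.
def weighted_distance(a, b, weights):
    """Sum of weighted squared differences between square-rooted stats."""
    return sum(w * (sqrt(x) - sqrt(y)) ** 2 for x, y, w in zip(a, b, weights))

# Rounded figures as posted in the thread.
stockton = (17.1, 3.3, 11.9, 2.4, 0.2)
thomas = (18.0, 3.7, 8.8, 2.0, 0.3)

default_weights = (1.0, 1.0, 1.0, 1.0, 1.0)   # general-purpose default
guard_weights = (1.0, 0.5, 2.0, 1.0, 0.5)     # hypothetical point-guard emphasis

general = weighted_distance(stockton, thomas, default_weights)
guards = weighted_distance(stockton, thomas, guard_weights)

print(f"default weights: {general:.3f}")
print(f"guard weights:   {guards:.3f}")
```

With assists counted double, the Stockton-Thomas gap grows, since assists are where the two differ most. The catch is unchanged: someone still has to decide which players get the guard weights.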

                    I also thought of a reason not to use Euclidean distance -- it
                    weights big differences too much. At least that is the subjective
                    opinion a lot of times. It's the old argument between standard
                    deviation and mean absolute difference -- the first weights big
                    differences a lot but is mathematically easier, but the second seems
                    to reflect more of what we want. The similarity scores, as James did
                    them and as I modified them, fit into the mean absolute difference
                    category. In Mike's categories, then, this implies that there is
                    likely one very big difference between Jordan's numbers and everyone
                    else (probably scoring average) -- that gets emphasized, making him
                    the most unique player. I'd like to take a stab at career similarity
                    scores using the approach I've outlined to see whether it id's Jordan
                    as most unique, too.
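A small sketch of the squared-versus-absolute point, under stated assumptions: it uses the rounded Jordan and West figures posted in this thread (West's steals and blocks are the thread's own estimates), puts them on the square-root scale used in Mike's listing, and asks how much of each total comes from the single largest gap. It is not the similarity-score method itself, only a way to see the weighting effect.

```python
from math import sqrt

# Rounded figures as posted in the thread (scoring, rebounds, assists,
# steals, blocks). West's steals and blocks are the thread's estimates.
jordan = (33.5, 6.5, 5.1, 2.3, 0.9)
west = (25.1, 4.2, 6.0, 2.7, 0.9)

# Per-category gaps on the square-root scale used in the listing.
gaps = [abs(sqrt(x) - sqrt(y)) for x, y in zip(jordan, west)]

squared_total = sum(g ** 2 for g in gaps)    # Euclidean-style (squared) sum
absolute_total = sum(gaps)                   # absolute-difference sum

# Share of each total that comes from the largest single gap (scoring here).
largest = max(gaps)
share_squared = largest ** 2 / squared_total
share_absolute = largest / absolute_total

print(f"largest gap share, squared:  {share_squared:.2f}")
print(f"largest gap share, absolute: {share_absolute:.2f}")
```

On these inputs the scoring gap accounts for roughly two thirds of the squared total but only about half of the absolute total, which is the sense in which squaring emphasizes one big difference.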

                    MikeG -- While I like the comparisons you did, there are 2 comments I
                    would make:

                    1. I'd like to see some non-standardized comparisons. I do like the
                    standardized because they make some sense, but I think
                    non-standardized will also tell a story.

                    2. You really need some comparison of shooting percentages and
                    turnovers. It really caught my eye with the Duncan-Kareem
                    comparison. I see some similarity between these two, but there are
                    big differences in offensive efficiency. Kareem was nearly
                    unstoppable offensively - my floor%'s and offensive efficiencies
                    reflect that. Duncan is very stoppable, his offensive rating and
                    floor percentage blending in to be about average. Kareem fell to
                    average offensively only in his last year. (I also don't think that
                    Kareem was the defensive force that Duncan is, but my memories are
                    biased by the Kareem post-'80, when he wasn't as good as he was when
                    younger.)

                    Dean Oliver
                    Journal of Basketball Studies


                    > For example, when comparing point guards the assist category might
                    > be more important for weighing similarity than rebound category.
                    > Conversely, do we really care whether two centers have similar
                    > assist numbers if their points, rebounds, and fg % are similar? I
                    > think this sounds somewhat right with some notable exceptions. The
                    > counter argument of course is that centers who pass well (a la
                    > Walton) or shoot 3s well (Laimbeer and Sikma) are unique and the
                    > similarity scores will help identify players with similar rare
                    > skill sets. (To digress, I wonder if Jason Kidd and some of the
                    > Darrell Walker early 90s seasons are comparable). I am beginning to
                    > babble but I think that the question I am asking is whether
                    > positional demands should change how we weight statistical
                    > categories when we try to apply similarity scores?
                    >
                    >
                    > --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                    > > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                    > > >.... Euclidean distance,
                    > > > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the
                    > > > difference between, say, Magic Johnson and Larry Bird in whatever
                    > > > variables we choose to look at.
                    > > >
                    > > > But there are problems with Euclidean distance, specifically one
                    > > > that Dean Oliver alludes to: some variables are redundant or
                    > > > partially redundant with each other,
                    > > > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds.
                    > > > Another problem is that not all variables are equally important:
                    > > > some probably should be given greater weight than others ...
                    > >
                    > > I tried my hand at a variation of the Euclidean distance, since I
                    > > can understand the formula (and pronounce it, too).
                    > > I took 5 stats: scoring, rebounding, assists, steals, blocks. I
                    > > used my normalized (standardized) versions. Because points are much
                    > > more abundant than, say, steals, I reduced this difference by taking
                    > > the square root of each stat. I compared the top 31 players on my
                    > > infamous "alltime" list to the other 514 in the list. (I actually
                    > > ran out of columns in Excel, for the first time.)
                    > > The formula is drudgery to type, but it starts like this:
                    > > E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 +... and so on,
                    > > up to a5 and b5, for players a and b, and variables 1-5.
                    > > I did not take the square root of the whole thing, since everything
                    > > was already square-rooted once.
                  • msg_53@hotmail.com
                    Message 9 of 16 , Sep 16, 2001
                      Personally, I don't ever consider 'position' to be a quantifiable
                      statistic. Many forwards have been forced to play center; many
                      forwards are not clearly 'power' or 'small' forwards; many players
                      are not exclusively guards or forwards; many versatile guards do
                      plenty of scoring and passing, and rebounding.
                      The possible fragmenting of these lists is virtually infinite. An
                      assist from a center is exactly as important as an assist from a
                      guard. A rebounding guard, a center who gets steals as well as
                      blocks, all these things make a player unique, or at least
                      differentiate him from the norm.
                      The issue of 3-point shooting might be worth looking into. How one
                      goes about racking up one's scoring totals is of some interest. Then
                      again, it might invite breaking down points into dunks, layups, etc.
                      In the end, points are points. A player's scoring may come from
                      inside moves when he is young, and from outside shots later. The
                      contribution is still the same.
                      One thing these similarity indexes do reveal is that there are
                      some 'classic' profiles by position. Wilt, Kareem, Hakeem, Shaq,
                      Robinson, Ewing, Moses, and Gilmore all averaged 22-28 pts,
                      12-15 reb, and 2-3 blocks. But the well-rounded centers seem to
                      have enjoyed more success.
                      The demands of one's position are somewhat situational. The best
                      players can usually do whatever is most needed.

                      --- In APBR_analysis@y..., harlanzo@y... wrote:
                      > It occurred to me that when comparing players through their
                      > statistics should we be weighting the comparisons so that some
                      > statistics are more important based on positions? For example, when
                      > comparing point guards the assist category might be more important
                      > for weighing similarity than rebound category. Conversely, do we
                      > really care whether two centers have similar assist numbers if their
                      > points, rebounds, and fg % are similar? I think this sounds somewhat
                      > right with some notable exceptions. The counter argument of course
                      > is that centers who pass well (a la Walton) or shoot 3s well
                      > (Laimbeer and Sikma) are unique and the similarity scores will help
                      > identify players with similar rare skill sets. (To digress, I wonder
                      > if Jason Kidd and some of the Darrell Walker early 90s seasons are
                      > comparable). I am beginning to babble but I think that the question
                      > I am asking is whether positional demands should change how we weight
                      > statistical categories when we try to apply similarity scores?
                      >
                      >
                    • msg_53@hotmail.com
                      Message 10 of 16 , Sep 16, 2001
                        --- In APBR_analysis@y..., deano@t... wrote:
                        >..... a reason not to use Euclidean distance -- it
                        > weights big differences too much. At least that is the subjective
                        > opinion a lot of times. It's the old argument between standard
                        > deviation and mean absolute difference -- the first weights big
                        > differences a lot but is mathematically easier, but the second
                        > seems to reflect more of what we want.

                        I operate under the assumption that points and rebounds are equally
                        important as contributions; so are steals and blocks, but almost
                        everyone gets fewer than 2-3 of these, so it seems fair to weigh them
                        less. Taking the standard deviation from the mean gives you the
                        burden of assigning a weight to the statistical category. I avoid
                        this by presuming that bigger numbers imply bigger weights. That
                        is, scoring is and should be more important than, say, steals.
                        (I did reduce the 'difference' factor by taking their square roots.)
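For readers who want to try the arithmetic, here is a minimal sketch of the E formula as it is written out earlier in the thread, plus a closest-match lookup over a handful of players. The inputs are the rounded per-game figures from the listing, so results will not match the posted E values exactly; the `players` table and the function names are assumptions made for the example.

```python
from math import sqrt

def profile_distance(a, b):
    """E as stated in the thread: sum of (sqrt(a_i) - sqrt(b_i))^2."""
    return sum((sqrt(x) - sqrt(y)) ** 2 for x, y in zip(a, b))

# Rounded figures from the listing: scoring, rebounds, assists, steals, blocks.
players = {
    "Hakeem Olajuwon": (23.7, 11.7, 2.6, 1.8, 3.2),
    "David Robinson": (26.1, 11.8, 2.8, 1.5, 3.3),
    "Patrick Ewing": (23.5, 11.1, 2.0, 1.0, 2.6),
    "Tim Duncan": (25.1, 12.0, 3.0, 0.8, 2.3),
}

def closest_match(name):
    """Return (other_name, E) for the nearest profile in `players`."""
    target = players[name]
    others = ((other, profile_distance(target, stats))
              for other, stats in players.items() if other != name)
    return min(others, key=lambda pair: pair[1])

match, e = closest_match("Hakeem Olajuwon")
print(match, round(e, 3), round(sqrt(e), 3))
```

On these rounded inputs the closest match for Olajuwon is Robinson, with a sum of about 0.076 and a square root of about 0.276. The listing shows .275 for that pair, so the posted E values may include a final square root after all, though that is only a guess from rounded numbers.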

                        > The similarity scores, as James did
                        > them and as I modified them, fit into the mean absolute difference
                        > category. In Mike's categories, then, this implies that there is
                        > likely one very big difference between Jordan's numbers and everyone
                        > else (probably scoring average) -- that gets emphasized, making him
                        > the most unique player. I'd like to take a stab at career similarity
                        > scores using the approach I've outlined to see whether it id's Jordan
                        > as most unique, too.
                        >
                        > MikeG -- While I like the comparisons you did, there are 2 comments I
                        > would make:
                        >
                        > 1. I'd like to see some non-standardized comparisons. I do like the
                        > standardized because they make some sense, but I think
                        > non-standardized will also tell a story.

                        Dean, you could do raw averages, but players from the 60s would only
                        compare to players in the 60s. Actually, a great rebounder in the
                        90s would seem to compare to an average rebounder in the 60s, for
                        example.
                        I don't have a ready database of raw averages.

                        > 2. You really need some comparison of shooting percentages and
                        > turnovers. It really caught my eye with the Duncan-Kareem
                        > comparison. I see some similarity between these two, but there are
                        > big differences in offensive efficiency. Kareem was nearly
                        > unstoppable offensively - my floor%'s and offensive efficiencies
                        > reflect that. Duncan is very stoppable, his offensive rating and
                        > floor percentage blending in to be about average. Kareem fell to
                        > average offensively only in his last year. (I also don't think that
                        > Kareem was the defensive force that Duncan is, but my memories are
                        > biased by the Kareem post-'80, when he wasn't as good as he was when
                        > younger.)
                        >
                        > Dean Oliver
                        > Journal of Basketball Studies

                        Shooting percentages are part of what determines my standardized
                        scoring rate, along with game pace (defined as points allowed). I
                        only did career totals, so Kareem's incredibly long career has been
                        smoothed over, and his very dominant early seasons are not truly
                        reflected. Maybe Duncan has peaked, and his career averages really
                        won't rank close to Kareem's.
                        Further, Duncan's offensive numbers, in my system, get a big boost
                        from his being on a great defensive team. You have to agree his
                        offensive strength is way above average on his team. In other words,
                        the go-to guy on the championship Spurs is going to rate favorably to
                        the go-to guy on the champion Bucks from 30 years before, in my
                        system.

                        Mike Goodman
                        >
                        >
                        > > > > >
                      • Dean Oliver
                        Message 11 of 16 , Sep 17, 2001
                          --- In APBR_analysis@y..., msg_53@h... wrote:
                          > > 1. I'd like to see some non-standardized comparisons. I do like
                          > > the standardized because they make some sense, but I think
                          > > non-standardized will also tell a story.
                          >
                          > Dean, you could do raw averages, but players from the 60s would only
                          > compare to players in the 60s. Actually, a great rebounder in the
                          > 90s would seem to compare to an average rebounder in the 60s, for
                          > example.
                          > I don't have a ready database of raw averages.
                          >

                          I think this is what I was interested in. I was curious who from
                          today would fit in the '60's. Or, more interestingly, who from the
                          '70's might fit in today's game. Are West's raw #'s similar to
                          Iverson's or to Richmond's? What happens in baseball is that
                          outstanding players tend to be dissimilar to other players in their
                          era, but similar to outstanding players of other eras. I doubt
                          that this would happen in basketball, using raw #'s, because of the
                          style change. You seem to be saying the same thing.

                          (I didn't realize that you don't have a db of raw#'s.)

                          > > 2. You really need some comparison of shooting percentages and
                          > > turnovers. It really caught my eye with the Duncan-Kareem
                          > > comparison. I see some similarity between these two, but there are
                          > > big differences in offensive efficiency. Kareem was nearly
                          > > unstoppable offensively - my floor%'s and offensive efficiencies
                          > > reflect that. Duncan is very stoppable, his offensive rating and
                          > > floor percentage blending in to be about average. Kareem fell to
                          > > average offensively only in his last year. (I also don't think that
                          > > Kareem was the defensive force that Duncan is, but my memories are
                          > > biased by the Kareem post-'80, when he wasn't as good as he was when
                          > > younger.)
                          >
                          > Shooting percentages are part of what determines my standardized
                          > scoring rate, along with game pace (defined as points allowed). I
                          > only did career totals, so Kareem's incredibly long career has been
                          > smoothed over, and his very dominant early seasons are not truly
                          > reflected.

                          One of my personal quibbles with all the tendex-like rating systems
                          out there is that they combine offensive with defensive
                          contributions. There is a big difference in my mind between Moses
                          Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                          dominant defensively. Both were good in the other thing, but
                          dominant in just one. Kareem was dominant offensively (and probably
                          defensively) early on. Duncan has been dominant defensively, not
                          offensively. (Duncan appears to have more of the competitive fight
                          than Kareem, but, again, I missed the early Kareem.)

                          > Maybe Duncan has peaked, and his career averages really
                          > won't rank close to Kareem's.

                          I don't think I'd say that Duncan's peaked. He's been pretty
                          remarkably consistent since entering the league. Maybe it's only
                          remarkable that he stayed in school long enough to actually be ready
                          for the league when entering.

                          > Further, Duncan's offensive numbers, in my system, get a big boost
                          > from his being on a great defensive team. You have to agree his
                          > offensive strength is way above average on his team.

                          Depending on how you define "average", but, yeah, Duncan looks better
                          offensively than he really is because he plays on a great defensive
                          team. (He would make most teams better defensively, too.)

                          > Personally, I don't ever consider 'position' to be a quantifiable
                          > statistic.

                          James assigned numbers to positions for defensive purposes (a
                          shortstop is much more valuable to a defense than a 1st baseman, for
                          example). That might be necessary for some of the older guys because
                          defensive stats really didn't exist in the '60's and early '70's. But
                          we can probably still assume that a center was the most important
                          defensive player back then, as he is now. This gets adequately
                          reflected in blocks, steals, and defensive boards, but you do need
                          those #'s.

                          > assist from a center is exactly as important as an assist from a
                          > guard.

                          Only a minor point here -- this is not precisely true (though
                          probably true enough for government work). Assists from guards tend
                          to be more valuable. This is because they often have to make the
                          tougher pass than big men. The weight on an assist is proportional
                          to the expected FG% of the guy he passes to. Historically, big men
                          have had higher FG% than guards -- hence their assists are weighted
                          less. (The assists of the best shooting player on a team are less
                          valuable than the assists of the guys getting him the ball.) This
                          has changed with the 3 pt shot, but it's a conversion from FG% to
                          effective FG%...
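
A minimal sketch of the assist-weighting idea described above, assuming each assist is simply multiplied by the recipient's shooting percentage. The function names and the shooting percentages are hypothetical and chosen only for illustration; the actual formulas are not given in this thread. The effective FG% helper uses the standard definition, in which a made three counts as 1.5 made field goals.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Effective FG%: a made three counts as 1.5 made field goals."""
    return (fgm + 0.5 * fg3m) / fga


def weighted_assists(assists_by_recipient, recipient_pct):
    """Sum assists, each weighted by the recipient's shooting percentage."""
    return sum(count * recipient_pct[name]
               for name, count in assists_by_recipient.items())


# Hypothetical shooting percentages, for illustration only.
shooting = {"big_man": 0.55, "guard": 0.45}

# Ten assists each: the guard feeds the big man, the big man kicks out to the guard.
guard_credit = weighted_assists({"big_man": 10}, shooting)
big_man_credit = weighted_assists({"guard": 10}, shooting)

# Under this weighting the guard's assists carry more credit.
print(guard_credit > big_man_credit)
```

Swapping FG% for effective FG% in the `shooting` table is the only change needed to account for the 3 pt shot under this assumption.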

                          Dean Oliver
                          Journal of Basketball Studies
                        • Mike Goodman
                          Message 12 of 16, Sep 18, 2001
                            --- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
                            > (I didn't realize that you don't have a db of raw#'s.)
                            >
                            My raw totals and per-game averages are contained in my 'season'
                            files, along with team totals and averages for that season. My
                            composite lists only have the 'standardized' rates. From those
                            rates, I can generate 'equivalent totals'. For 'average'
                            scoring/rebounding teams, these would be equal to raw season totals.
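
One way the relationship between standardized rates and equivalent totals might look, as a rough sketch only. It assumes a per-game figure is scaled by the ratio of league-average points allowed to the team's points allowed (pace defined as points allowed, as mentioned earlier in the thread). The function names and the scaling rule are assumptions for illustration, not the actual method.

```python
def standardized_rate(per_game, team_pts_allowed, league_pts_allowed):
    """Scale a raw per-game figure to a league-average pace (assumed rule)."""
    return per_game * league_pts_allowed / team_pts_allowed


def equivalent_total(rate, games):
    """Turn a standardized per-game rate back into a season total."""
    return rate * games


# Hypothetical numbers: a 20 ppg scorer over 82 games.
average_team = equivalent_total(standardized_rate(20.0, 100.0, 100.0), 82)
fast_team = equivalent_total(standardized_rate(20.0, 110.0, 100.0), 82)

# On an average-pace team the equivalent total equals the raw total (20 * 82);
# on a faster-paced team the same raw scoring is scaled down.
print(average_team, fast_team)
```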

                            >
                            > One of my personal quibbles with all the tendex-like rating systems
                            > out there is that they combine offensive with defensive
                            > contributions. There is a big difference in my mind between Moses
                            > Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                            > dominant defensively. Both were good in the other thing, but
                            > dominant in just one. Kareem was dominant offensively (and probably
                            > defensively) early on. Duncan has been dominant defensively, not
                            > offensively. (Duncan appears to have more of the competitive fight
                            > than Kareem, but, again, I missed the early Kareem.)

                            I get your point, Dean, but your examples don't seem the clearest.
                            Olajuwon is better than Malone because he has all the offense Malone
                            had PLUS defense. Never seen the Dream shake?
                            Duncan has virtually all the offense Kareem had, averaged over their
                            careers, according to my numbers. Kareem did maintain a great
                            shooting pct., but Duncan plays in an era of universally-tough D.

                            > I don't think I'd say that Duncan's peaked. He's been pretty
                            > remarkably consistent since entering the league. Maybe it's only
                            > remarkable that he stayed in school long enough to actually be ready
                            > for the league when entering.

                            Some guys enter the league at full strength: Wilt, Oscar, Kareem,
                            and Robinson never improved beyond their first 3 years. Others start as
                            near-superstars, then several years along suddenly shift into true
                            superstar mode: Magic, Bird, Olajuwon, ...

                            >
                            > Depending on how you define "average", but, yeah, Duncan looks better
                            > offensively than he really is because he plays on a great defensive
                            > team. (He would make most teams better defensively, too.)

                            Don't know how a guy 'looks better than he really is', DeanO.

                            > Assists from guards tend
                            > to be more valuable. This is because they often have to make the
                            > tougher pass than big men. The weight on an assist is proportional
                            > to the expected FG% of the guy he passes to. Historically, big men
                            > have had higher FG% than guards -- hence their assists are weighted
                            > less. (The assists of the best shooting player on a team are less
                            > valuable than the assists of the guys getting him the ball.) This
                            > has changed with the 3 pt shot, but it's a conversion from FG% to
                            > effective FG%...
                            >
                            > Dean Oliver
                            > Journal of Basketball Studies

                            This is fun, splitting hairs!
                            If your center kicks out 3 nice passes to guards, who only hit one of
                            the 3 shots, the center only gets one assist.
                            The guard can make 3 nice passes inside, 2 of which may be converted,
                            so he gets 2 assists.
                            So an equally valid argument is that assists from guards
                            are 'easier', and assists from centers are 'undercounted'.
                            I say they are equal.

                            Perhaps more to the issue, evaluate which players make those
                            practical passes which may or may not get them an assist, versus
                            those who will not give up the ball unless it gets them an assist. I
                            can't discern the 2 types from the statistics, but I know it when I
                            see it. (It might be partly discernible in that old assist/turnover
                            ratio.)


                            Mike Goodman
                          • Dean Oliver
                            Message 13 of 16, Sep 18, 2001
                              --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                              > > One of my personal quibbles with all the tendex-like rating systems
                              > > out there is that they combine offensive with defensive
                              > > contributions. There is a big difference in my mind between Moses
                              > > Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                              > > dominant defensively. Both were good in the other thing, but
                              > > dominant in just one. Kareem was dominant offensively (and probably
                              > > defensively) early on. Duncan has been dominant defensively, not
                              > > offensively. (Duncan appears to have more of the competitive fight
                              > > than Kareem, but, again, I missed the early Kareem.)
                              >
                              > I get your point, Dean, but your examples don't seem the clearest.
                              > Olajuwon is better than Malone because he has all the offense Malone
                              > had PLUS defense. Never seen the Dream shake?
                              > Duncan has virtually all the offense Kareem had, averaged over their
                              > careers, according to my numbers. Kareem did maintain a great
                              > shooting pct., but Duncan plays in an era of universally-tough D.

                              Olajuwon was very solid offensively (not stellar, like Kareem) -- I
                              didn't mean to imply otherwise. Malone was just the epitome of a good
                              offensive center who wasn't that good defensively. Rik Smits is
                              another example of the poor defensive type who can score (not as well
                              as Olajuwon/Moses). Olajuwon is very DISSIMILAR to these guys because
                              he is much better defensively. Similarity is all I'm trying to
                              capture, not quality.
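
A small sketch of why a combined rating can hide dissimilarity, assuming each player is described by a separate offensive and defensive score. The two players and their scores are entirely hypothetical, and Euclidean distance is just one possible dissimilarity measure, not one proposed in this thread.

```python
import math


def combined_rating(player):
    """Tendex-style single number: offense and defense added together."""
    offense, defense = player
    return offense + defense


def dissimilarity(p, q):
    """Euclidean distance between two (offense, defense) profiles."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical (offense, defense) scores, for illustration only.
scorer = (8.0, 2.0)     # dominant offensively, ordinary defensively
defender = (2.0, 8.0)   # ordinary offensively, dominant defensively

# Same combined rating, yet far apart once offense and defense are kept separate.
print(combined_rating(scorer) == combined_rating(defender))
print(dissimilarity(scorer, defender))
```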

                              I looked at Duncan's offensive #'s last night and his offensive rating
                              has been between about 104 and 108 since entering the league, when
                              average offensive ratings have been between about 100 and 103. He's a
                              little more efficient than average. My recollection is that Kareem's #'s
                              were about 115 in the early '80s, when average was about 106-108 --
                              relatively higher than Duncan's. Again, these two players just don't
                              seem very SIMILAR to me. I would think of David Robinson as more
                              similar to Kareem. Or possibly Olajuwon. Probably Wilt. Not
                              Russell.
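
The comparison above can be worked through as simple arithmetic. This sketch uses the approximate figures quoted in the message and, as an assumption, takes the midpoint of each quoted range; the helper name is hypothetical.

```python
def midpoint(low, high):
    """Middle of a quoted range, used here as a rough point estimate."""
    return (low + high) / 2


# Duncan: offensive rating about 104-108, league average about 100-103.
duncan_edge = midpoint(104, 108) - midpoint(100, 103)

# Kareem (early '80s): about 115, league average about 106-108.
kareem_edge = 115 - midpoint(106, 108)

print(duncan_edge)  # 4.5 points above average
print(kareem_edge)  # 8.0 points above average
```

On these rough numbers Kareem's margin over the league average is nearly twice Duncan's.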

                              > >
                              > > Depending on how you define "average", but, yeah, Duncan looks better
                              > > offensively than he really is because he plays on a great defensive
                              > > team. (He would make most teams better defensively, too.)
                              >
                              > Don't know how a guy 'looks better than he really is', DeanO.
                              >

                              Another way of saying that the hype on Duncan has been a little
                              extreme. Put him on the Hawks last year and, while he's better than
                              Mutombo offensively, the team still wouldn't have scored much. They
                              would have been pretty close to as good defensively as they were with
                              Mutombo (or better), but they wouldn't be an offensive threat. I
                              don't think Kareem ever played on a weak offensive team.

                              > This is fun, splitting hairs!
                              > If your center kicks out 3 nice passes to guards, who only hit one of
                              > the 3 shots, the center only gets one assist.
                              > The guard can make 3 nice passes inside, 2 of which may be converted,
                              > so he gets 2 assists.
                              > So an equally valid argument is that assists from guards
                              > are 'easier', and assists from centers are 'undercounted'.
                              > I say they are equal.
                              >
                              > Perhaps more to the issue, evaluate which players make those
                              > practical passes which may or may not get them an assist, versus
                              > those who will not give up the ball unless it gets them an assist.

                              The goal is to identify when a good pass is made. Generally a better
                              pass is one made to a better shooter. That's all I try to capture. I
                              capture it in formulas with teammate FG%. For years, I didn't worry
                              about it and it really didn't matter much. Now I've got more
                              sophisticated calculation devices. I've actually found that this
                              adjustment makes the most difference when evaluating different levels
                              of basketball (high school, college, women's).

                              Dean Oliver
                              Journal of Basketball Studies