
Similarity Scores

  • Dean Oliver
    Message 1 of 16 , Sep 10, 2001
      The concept of similarity scores is one that Bill James viewed as one
      of his most important. His introduction to the method in the '86
      Abstract:

      The most important new method to be introduced this year is that of
      similarity scores. Similarity scores are a way of objectively fixing
      the "degree of resemblance" between two players or between two teams.
      Among all the methods that I have developed over the years, this
      method is the most flexible, the most adaptable, the most useful in
      many different contexts....

      The similarity scores begin with the assumption that players who are
      identical in all respects considered will have a similarity score of
      1000. For each difference between the two, there is a "penalty", or
      reduction from the 1000. Similarity scores are designed so that:

      <500 -- players who would not usually be perceived as being
      essentially similar.

      ~600 -- Slight similarities, but major differences

      ~700 -- Important, easily identifiable similarities but also
      significant and obvious differences

      ~800 -- very prominent, obvious similarities, but easily identifiable
      distinctions.

      >850 -- substantially similar

      >900 -- very similar

      >950 -- rare, indicating that true similarities are emphasized by
      random chance.

      Uses:

      1. When discussing whether or not a player should be elected to the
      Hall of Fame, one of the key questions to focus on -- probably the
      most important question -- is who are the most similar other players
      and are they in the Hall?

      2. How to measure consistency from season to season

      3. How do we measure the accuracy of career projection methods?

      4. How to make career projections by comparing players of similar
      age to others

      5. Salary negotiations

      6. (A baseball specific thing, involving park factors)

      7. Setting control groups for studies

      8. Constructing theoretical models of players/teams and identifying
      real players/teams similar to the model.
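
      The 1000-minus-penalties scheme in the excerpt above can be sketched in
      a few lines. This is only an illustration: the stat names and the "one
      point off per so many units of difference" divisors are made up for the
      example. They are not James's baseball values and not a proposal for
      basketball.

```python
# Hypothetical penalty divisors: one point comes off the score for every
# `divisor` units of difference in that stat. These values are placeholders.
PENALTY_DIVISORS = {
    "games": 20,
    "points": 150,
    "rebounds": 50,
    "assists": 40,
}


def similarity_score(a, b, divisors=PENALTY_DIVISORS):
    """Start at 1000 and subtract a penalty for each difference."""
    score = 1000.0
    for stat, divisor in divisors.items():
        score -= abs(a[stat] - b[stat]) / divisor
    return score


# Invented career totals for two made-up players.
player_a = {"games": 800, "points": 15000, "rebounds": 4000, "assists": 3000}
player_b = {"games": 760, "points": 13500, "rebounds": 4500, "assists": 2600}

print(similarity_score(player_a, player_a))  # identical players keep all 1000
print(similarity_score(player_a, player_b))
```

      Where scores land on the <500 to >950 scale depends entirely on the
      divisors chosen.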

      ------------------------------------------

      We started discussing this over in APBR, but I think the details of
      making this work can get technical, so I brought it here. I do think
      this is a major missing factor in basketball and it frustrates me
      that, as easy as it seems to do, I haven't been able to develop
      something like this.

      Baseballreference.com has a list of players and who they are similar
      to -- something that Robert and I have talked about doing for
      basketball eventually. For instance, here is Roberto Alomar

      http://www.baseballreference.com/a/alomaro01.shtml

      Within that page is the list of players most similar to him overall, 3 of
      whom are in the Hall. When comparing players at age 32 to Alomar,
      the list shows 8 HOFers, the other two being Pete Rose and Ryne
      Sandberg, suggesting that Alomar is on track to be in the Hall (even
      as a batter, since these scores don't account for defense).

      This page

      http://www.baseballreference.com/about/similarity.shtml

      describes how the scores are calculated for baseball career #s. I'd
      think that we could come up with a similar method for basketball.
      The '86 book has the method for comparing seasons.

      One of the problems I had was with redundancy of stats. FG% is
      reflected in FG and FGA, for example. James didn't worry about it
      too much, but I do in basketball.

      To be clear, this is not a rating tool. It doesn't tell you who is
      better or worse; it tells you who is similar. In the old argument of
      Shawn Kemp, perhaps we find that the most similar players to him are
      all out of the HOF -- then that suggests he isn't that great. Maybe
      his best season compares with those put up by Wilt, KMalone, etc.,
      suggesting great seasons.

      Finally, Greg Thomas took a stab at a method for player-careers back
      in the spring Cage Chronicles:

      http://members.aol.com/bradleyrd/feb2001.html

      He did something like MikeT was suggesting, scaling points, etc. by
      some average, then subtracting differences. Kinda interesting and
      not a bad attempt, but some things I'd change/review:

      1. Different scale than specified by James, I think. The scores are
      particularly high (Wilt Chamberlain and Arvydas Sabonis have a
      similarity score of 952!!)

      2. Uses only points, assists, and rebounds.

      3. I think he looks at per minute #'s, not career totals.

      4. He standardized by era.

      It's a good first attempt, but I think there is room for improvement.

      Has anyone done anything like this?

      (Another difficulty I've had is in making my Access db do these
      calculations easily.)


      Dean Oliver
      Journal of Basketball Studies
    • Michael K. Tamada
      Message 2 of 16 , Sep 10, 2001
        Just some thoughts off the top of my head: I agree that this is something
        that ought to be done for basketball. I'm a little leery of the work
        that's been done in baseball, as Bill James' formula looks like a
        hand-cooked one. I'm sure its results are reasonable, but I'd like to see
        an approach that's more systematic.

        There are several statistical techniques which I think are directly useful
        here. Cluster analysis (the computer looks at the data and divides them
        into groups); discriminant analysis and logistic regression (determine
        the factors which predict which group a player will be in, e.g. Hall of
        Fame vs non Hall of Fame); and probably the most useful of all:
        Mahalanobis distance, or some variation thereof.

        Without going deeply into the nitty gritty, here's an intuitive
        description of it: it's easy enough to understand Euclidean distance,
        sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the differences
        between, say, Magic Johnson and Larry Bird in whatever variables we choose
        to look at.

        But there are problems with Euclidean distance, specifically one that
        Dean Oliver alludes to: some variables are redundant or partially
        redundant with each other, e.g. FG Made and Points Scored, or even Off
        Rebds and Def Rebds. Another problem is that not all variables are
        equally important: some probably should be given greater weight than
        others (or maybe not; even something like technical fouls might be of
        interest; maybe the most similar player to Rasheed Wallace would turn
        out to be Charles Barkley thanks to their techs).

        Mahalanobis distance corrects for the first problem by measuring the
        extent to which the variables are correlated with each other, and reducing
        their weight in the distance measure. I.e. once you've got FT Made and FG
        Made and 3PT FG made in the measure, you don't want to be adding PTs
        scored as yet another variable with full weight.
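
        To make the contrast concrete, here is a minimal sketch of both
        distances in plain Python. The per-game numbers are invented, the
        three columns (FG made, points, assists) are just a convenient
        example, and the helper names are mine. The covariance matrix is
        estimated from the same toy sample.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix for a list of equal-length stat rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    k = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def euclidean(a, b):
    """Plain distance: every variable counts fully, redundant or not."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis(a, b, cov):
    """sqrt(diff^T * inverse(cov) * diff): correlated variables count less."""
    diff = [x - y for x, y in zip(a, b)]
    z = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, z))))


# Invented per-game lines: [FG made, points, assists].
players = [
    [8.0, 21.0, 3.0],
    [6.0, 16.0, 7.0],
    [9.5, 25.0, 4.0],
    [4.0, 10.5, 9.0],
    [7.0, 19.0, 2.5],
    [5.0, 12.0, 5.0],
]
cov = covariance_matrix(players)

print(euclidean(players[0], players[1]))
print(mahalanobis(players[0], players[1], cov))
```

        One caveat for this approach: if a column is an exact linear
        combination of others (points are fully determined by made field
        goals, made threes and made free throws), the covariance matrix is
        singular and this sketch raises an error instead of down-weighting
        the redundant column.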

        However, Mahalanobis distance does NOT take into account the second
        problem, that certain variables might deserve more weight than others.
        Maybe we WOULD want to "double count" PTs scored, given that it's
        probably the single most important statistic for a player, or at least one
        of the most important ones for distinguishing all-stars and hall of famers
        from journeymen.

        Anyway, that's how I would approach the problem. For a first pass, I'd do
        statistics per game, rather than per 48 minutes, per year, or per career
        (career stats would be useless for comparing, say, Steve Francis to Isiah
        Thomas, because Francis' career stats are still so low). I'd do straight
        Mahalanobis distance at first, throwing in all variables (FTA, FTM, AND
        FT%) and see if the results looked reasonable. If not, then at least I'd
        have some coefficients to start with, and could start doubling or halving
        some.

        Also, after the initial analysis, I'd want to put in some sort of
        correction for era or game pace. Bob Cousy's 43% career FG% (or whatever
        it was, I'm saying this off the top of my head) reminds me more of Isiah
        Thomas's 46% than it does Allen Iverson's 43%, despite the superficial
        similarity of Cousy's and Iverson's FG%. (Again I'm not vouching for
        those specific numbers, just saying that I'd rather see the numbers in
        context, i.e. corrected for era and/or game pace.)
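
        One simple way to put numbers "in context" is to express each stat
        relative to the league average of its era, and to rescale per-game
        counts to a common pace. The sketch below assumes that approach; the
        function names are mine, and every number is invented for
        illustration rather than taken from real league data.

```python
def relative_to_league(player_value, league_average):
    """Express a rate stat as a multiple of its era's league average."""
    return player_value / league_average


def pace_adjusted(per_game_value, team_pace, reference_pace):
    """Rescale a per-game count to a common number of possessions."""
    return per_game_value * reference_pace / team_pace


# Same raw 43% shooting, two hypothetical eras (league averages invented).
early_era = relative_to_league(0.43, 0.40)  # above the league of its time
later_era = relative_to_league(0.43, 0.45)  # below the league of its time
print(round(early_era, 3), round(later_era, 3))

# 20 points per game at a fast hypothetical pace, rescaled to a slower one.
print(pace_adjusted(20.0, team_pace=125.0, reference_pace=100.0))
```

        The two identical raw percentages end up on opposite sides of 1.0,
        which is the point of the Cousy/Iverson comparison.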

        For Hall of Fame purposes, I think discriminant analysis or logistic or
        probit regressions are better than merely measuring distance. I did this
        once for NBA all-stars one season; the predictions were not 100% accurate,
        but you could at least separate the players into three groups: clear
        all-stars, clear non-stars, and the "on the bubble" players.
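
        A rough sketch of that three-group split, assuming a logistic model
        has already been fit. The coefficients and the 0.25/0.75 cutoffs
        below are hypothetical placeholders, not the values from my all-star
        model.

```python
import math

# Hypothetical fitted coefficients on per-game stats (placeholders only).
COEFFS = {"intercept": -9.0, "points": 0.35, "rebounds": 0.15, "assists": 0.20}


def all_star_probability(stats, coeffs=COEFFS):
    """Logistic model: probability = 1 / (1 + exp(-linear score))."""
    z = coeffs["intercept"] + sum(
        coeffs[name] * stats[name] for name in coeffs if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def bucket(probability, low=0.25, high=0.75):
    """Split predicted probabilities into three groups."""
    if probability >= high:
        return "clear all-star"
    if probability <= low:
        return "clear non-star"
    return "on the bubble"


# Invented per-game lines for three made-up players.
star = {"points": 27.0, "rebounds": 8.0, "assists": 6.0}
reserve = {"points": 6.0, "rebounds": 3.0, "assists": 1.0}
borderline = {"points": 18.0, "rebounds": 6.0, "assists": 4.0}

for line in (star, reserve, borderline):
    print(bucket(all_star_probability(line)))
```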


        --MKT



      • harlanzo@yahoo.com
        Message 3 of 16 , Sep 10, 2001
          To me, the problem that most jumps out in trying to create similarity
          scores in basketball is that the sum total of a basketball player's
          contributions is not necessarily reflected in his stats. In
          baseball each hit can be neatly quantified (i.e. a double is worth .5, a
          single .25, etc.). Even so, a study might still be
          interesting. I would suggest that, because of how the game has
          changed, any model should really be limited to a specific era (i.e.
          since the 90-91 season). The problem with cross-era comparisons is
          evident if you take an example of, say, Rolando Blackman and Allan
          Houston. They both seem like good shooter types who are very good at
          what they do but lacking in some other areas. Indeed, both scored
          18-20 ppg on average and both had similar assist and rebound numbers
          per game. However, because (I think) the style of play in the two eras
          was different, Houston has a slightly lower shooting pct and many more
          3s (before 1989, Blackman never hit more than 6 threes). It is
          conceivable that if Houston played in the 80s or Blackman in the late
          90s, their numbers in these categories would be similar. While the
          points, assists, rebounds numbers might seem similar superficially,
          the road by which they achieved these stats is very different,
          and I would think that any model that called them truly similar
          (without era adjustment) is not particularly accurate.

          So two important issues jump out at me. The first is deciding what
          areas are pertinent to weight when deciding whether players are
          similar. The second is the cross-era comparison, which is a
          whole other thorny issue. I will think about them, but no answer
          jumps out at me right this second.



        • Dean Oliver
          Message 4 of 16 , Sep 10, 2001
            --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
            > Just some thoughts off the top of my head: I agree that this is something
            > that ought to be done for basketball. I'm a little leery of the work
            > that's been done in baseball, as Bill James' formula looks like a
            > hand-cooked one. I'm sure its results are reasonable, but I'd like to see
            > an approach that's more systematic.
            >

            I've asked Bill about the approach he took and why he took it. I
            don't expect to hear back anytime soon.

            > There are several statistical techniques which I think are directly useful
            > here. Cluster analysis (the computer looks at the data and divides them
            > into groups); discriminant analysis and logistic regression (determine
            > the factors which predict which group a player will be in, e.g. Hall of
            > Fame vs non Hall of Fame); and probably the most useful of all:
            > Mahalanobis distance, or some variation thereof.

            > However, Mahalanobis distance does NOT take into account the second
            > problem, that certain variables might deserve more weight than others.
            > Maybe we WOULD want to "double count" PTs scored, given that it's
            > probably the single most important statistic for a player, or at least one
            > of the most important ones for distinguishing all-stars and hall of famers
            > from journeymen.

            First -- I don't know the Mahalanobis distance stuff. Sounds like
            multivariate regression, though. You may have to do this analysis or
            point me at software that does it. What is it going to spit out?
            Weights on the different stats?

            >
            > Anyway, that's how I would approach the problem. For a first pass, I'd do
            > statistics per game, rather than per 48 minutes, per year, or per career
            > (career stats would be useless for comparing, say, Steve Francis to Isiah
            > Thomas, because Francis' career stats are still so low).

            We could compare Thomas' first 2 years to Francis', a very useful
            comparison.

            > I'd do straight Mahalanobis distance at first, throwing in all
            > variables (FTA, FTM, AND FT%) and see if the results looked
            > reasonable. If not, then at least I'd have some coefficients to
            > start with, and could start doubling or halving some.
            >

            I agree, I think (not knowing exactly what the coefficients mean).
            Getting a common starting point is the most important thing I want to
            get out of this discussion. Similarity scores are ultimately
            somewhat subjective, but if we can all start with the same set of
            numbers, at least we have a foundation.

            > Also, after the initial analysis, I'd want to put in some sort of
            > correction for era or game pace. Bob Cousy's 43% career FG% (or
            > whatever it was, I'm saying this off the top of my head) reminds me
            > more of Isiah Thomas's 46% than it does Allen Iverson's 43%. Despite
            > the superficial similarity of Cousy's and Iverson's FG%. (Again I'm
            > not vouching for those specific numbers, just saying that I'd rather
            > see the numbers in context, i.e. corrected for era and/or game
            > pace.)
            >

            I think we want to keep the era correction separate. I can't find
            where he said it, but I know James wanted to keep it separate.
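
One way to keep the era correction separate is to carry each stat in two forms, the raw value and the value relative to the league average of the same season. The following is a minimal sketch under that assumption; the league averages in the example are made-up placeholders, not real league figures, and the function names are hypothetical.

```python
def relative_stat(player_value, league_average):
    """Express a stat as a ratio to the league average of the same season."""
    if league_average <= 0:
        raise ValueError("league average must be positive")
    return player_value / league_average


def both_views(player_value, league_average):
    """Return the (absolute, relative) pair for one stat."""
    return player_value, relative_stat(player_value, league_average)


# Two players with the same raw FG%, in leagues with different
# (hypothetical) average FG%. The raw values match; the relative ones do not.
old_era = both_views(0.430, 0.390)   # placeholder league average
new_era = both_views(0.430, 0.450)   # placeholder league average
print(old_era, new_era)
```

Similarity could then be computed twice, once on the first element of each pair and once on the second, without mixing the two.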

            My one attempt at basketball similarity scores is buried somewhere.
            I looked at player-season comparisons back in '98. The motivation
            was identifying who was similar to Kobe Bryant, since there was so
            much controversy at the time about how good he was going to be.
            I remember finding a lot more self-similarity across players than
            cross-similarity (Bryant's 2nd year resembled his first more than
            it resembled a lot of other players' seasons, for example). The
            player who seemed most similar to Bryant at the time was Allen
            Iverson, but my numbers were weird. I'd say now that Bryant and Iverson
            aren't as similar as Bryant and Jordan or Bryant and Vince Carter, which I
            think points to factors I didn't consider -- height and position. (It
            may also point to false impressions. Jordan's early career numbers
            are MUCH better than Bryant's, even if you account for the 10% drop
            in pace and the 5-8% drop in offensive efficiency.)

            For kicks, here is Iverson 99 and Bryant 2001:

                      GS    MIN    FG    FGA   FG%    fg3m  fg3a  fg3%
            Iverson   1.0   41.5   9.1   22.0  0.412  1.2   4.1   0.291
            Bryant    1.0   40.9   10.3  22.2  0.464  0.9   2.9   0.305

                      FT    FTA   FT%    OR    DR    TR    AST   PF    DQ
            Iverson   7.4   9.9   0.751  1.4   3.5   4.9   4.6   2.0   0.0
            Bryant    7.0   8.2   0.853  1.5   4.3   5.9   5.0   3.3   0.0

                      STL   TO    BLK   PTS
            Iverson   2.29  3.48  0.15  26.8
            Bryant    1.68  3.24  0.63  28.5

            A priori, I'd like to call these two seasons in the 700-750 range on
            similarity scores. Easily identifiable similarities, but significant
            and obvious differences.
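
To make the "start at 1000 and subtract penalties" idea concrete, here is a minimal sketch applied to the two stat lines above. The penalty units are purely hypothetical placeholders chosen for illustration; they are not Bill James's values and nobody in this thread has proposed them. Only five of the listed stats are used.

```python
# Hypothetical penalty scale: one point is deducted for each `unit` of
# absolute difference in a per-game stat. Illustrative assumptions only.
HYPOTHETICAL_UNITS = {
    "PTS": 0.1,
    "TR": 0.05,
    "AST": 0.05,
    "FG%": 0.001,
    "FT%": 0.001,
}


def similarity_score(a, b, units=HYPOTHETICAL_UNITS, start=1000.0):
    """Start at `start` and subtract a penalty for every stat in `units`."""
    penalty = sum(abs(a[stat] - b[stat]) / unit for stat, unit in units.items())
    return max(0.0, start - penalty)


# Per-game lines copied from the table above.
iverson_1999 = {"PTS": 26.8, "TR": 4.9, "AST": 4.6, "FG%": 0.412, "FT%": 0.751}
bryant_2001 = {"PTS": 28.5, "TR": 5.9, "AST": 5.0, "FG%": 0.464, "FT%": 0.853}

print(round(similarity_score(iverson_1999, bryant_2001)))
```

The number this prints depends entirely on the chosen units, which is the point of agreeing on a common set before comparing results.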

            For my #'s, I have (in per game stats) for Iverson and Bryant, resp:

                      ScPoss  Poss  Fl%    Ortg   PtsProd  DefStops/Min  DefStops/Poss  Def.Rtg.  NetWin%
            Iverson   12.8    25.0  0.511  106.6  26.7     0.182         0.484          97.3      0.820
            Bryant    12.8    24.4  0.524  110.9  27.0     0.191         0.494          103.0     0.773

            At about the same age, Bryant's offensive skills are more efficient
            than Iverson's, which we would probably all agree on. Defensively,
            it's hard to say.

            Dean Oliver
            Journal of Basketball Studies
          • Dean Oliver
            Message 5 of 16 , Sep 10, 2001
              --- In APBR_analysis@y..., harlanzo@y... wrote:
              > This being the case a study might still be
              > interesting. I would suggest that because of how the game has
              > changed that any model should really be limited to a specific era
              > (ie since 90-91 season).

              I think we should start here, but not limit it this way.

              Your example of Houston and Blackman is interesting. I think players
              like Blackman did evolve into players like Houston, but their styles
              were/are different. There weren't many Houston-types in the '80's.
              We want to show that. In some cases, we may want to hide that, but
              we don't want to hide it all the time. It points out the problem we
              always have -- that players from the '60's are more similar to
              each other than they are to today's players. As much as we might
              like to compare Bob Pettit to Karl Malone, I'm sure more players in
              the '50's-'60's are similar to Pettit than Malone is.

              I would call the similarity (on a per game basis) between Houston and
              Blackman about an 800, just gut feel. Career-wise, Houston has a
              ways to go to get to Blackman's level. Even at age 30, it appears
              that Blackman had a bit better numbers.

              Here is the list of comparisons done for the newsletter with the
              original scores assigned and some of my subjective scores:

              Orig  MyEst  Players
              990   850    Isiah Thomas & Tim Hardaway
              986   800    Julius Erving & Elgin Baylor
              985          Mark Aguirre & Alex English
              984   850    Patrick Ewing & Alonzo Mourning
              976          Kareem Abdul-Jabbar & Bob Pettit
              976   800    David Robinson & Tim Duncan
              969          Willis Reed & Walt Bellamy
              968   850    Reggie Miller & Allan Houston
              968          Kevin Johnson & Stephon Marbury
              967          Oscar Robertson & Sam Cassell
              967   800    Bill Russell & Wes Unseld
              964   750    Karl Malone & Kareem Abdul-Jabbar
              964          Kareem Abdul-Jabbar & Bob Lanier
              963          David Robinson & Hakeem Olajuwon
              961          Isiah Thomas & Kevin Johnson
              959          Jo Jo White & Hal Greer
              958          Jerry West & Pete Maravich
              955          Walt Frazier & Penny Hardaway
              952   600    Wilt Chamberlain & Arvydas Sabonis
              951          Dominique Wilkins & John Drew
              949          Vince Carter & Kobe Bryant
              949   800    Isiah Thomas & Stephon Marbury
              943          Larry Bird & Chris Webber
              943          Kobe Bryant & Allen Iverson
              942          Rick Barry & John Havlicek
              938          Karl Malone & David Robinson
              938          Bill Laimbeer & Dikembe Mutombo
              935          Jerry West & Paul Westphal
              930          Larry Bird & Billy Cunningham
              929          Kareem Abdul-Jabbar & Charles Barkley
              929          Charles Barkley & Kareem Abdul-Jabbar
              927          Walt Frazier & Gary Payton
              927          Karl Malone & Bob Pettit
              924          Vince Carter & Allen Iverson
              922          Grant Hill & Elgin Baylor
              919          Bill Russell & Bill Walton
              918          Shaquille O'Neal & David Robinson
              917          Wilt Chamberlain & Kareem Abdul-Jabbar
              914          George Gervin & David Thompson
              903          Wilt Chamberlain & David Robinson
              902          Larry Bird & Elgin Baylor
              897   750    Jason Kidd & Magic Johnson
              888          Shaquille O'Neal & Hakeem Olajuwon
              887          Michael Jordan & Allen Iverson
              885          Charles Barkley & Karl Malone
              884          Michael Jordan & Vince Carter
              882          John Stockton & Larry Brown
              875          Jerry West & Oscar Robertson
              858   850    Shaquille O'Neal & Wilt Chamberlain
              852   800    Oscar Robertson & Magic Johnson
              848   750    Michael Jordan & Kobe Bryant
              830   750    Michael Jordan & Julius Erving
              263          Shaquille O'Neal & John Stockton
            • Michael K. Tamada
              Message 6 of 16 , Sep 12, 2001
                On Tue, 11 Sep 2001, Dean Oliver wrote:

                > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:

                [...]

                > First -- I don't know of Mahalanobis distance stuff. Sounds like
                > multivariate regression, though. You may have to do this analysis or
                > point me at software that does it. What is it going to spit out?
                > Weights on the different stats?

                The weights are not scalar, but are instead implicitly contained in a
                matrix. I usually use a stat package called SPSS but I just checked and
                surprisingly, although Mahalanobis distance is calculated and used in a
                number of statistics that it calculates, it doesn't have a command for
                simply computing good ol' Mahalanobis distance.

                However, the formula for Mahalanobis distance is pretty simple. Let x and
                y be vectors of the variables that we're measuring, for two different
                players. E.g. "x" might be Magic's pts per game, assists per game, FG%,
                asst/TO ratio, min/game, etc. "y" would be the same stats, but
                for Larry Bird.

                Let S stand for the covariance matrix of all players' stats. For an
                example of how to calculate the elements of the covariance matrix, see

                http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc541.htm


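                Then the Mahalanobis distance is simply sqrt( (x-y)' S^-1 (x-y) ) in
                matrix notation. (The "x-y" are vectors, (x-y)' is the transpose of
                x-y, and S^-1 is the inverse of S. Without the square root, the
                expression is the squared distance.)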
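
For anyone without a stat package handy, here is a minimal pure-Python sketch of that calculation. It takes the square root of the quadratic form, as in the usual definition of the distance. The six "league" stat lines are made-up placeholders (pts, reb, ast per game), not real players, and the function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix S of a list of equal-length stat vectors."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def solve(S, d):
    """Solve S z = d by Gauss-Jordan elimination with partial pivoting."""
    k = len(d)
    aug = [list(S[i]) + [d[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def mahalanobis(x, y, S):
    """sqrt( (x-y)' S^-1 (x-y) ) for stat vectors x, y and covariance S."""
    d = [a - b for a, b in zip(x, y)]
    z = solve(S, d)  # z = S^-1 (x - y), without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


# Hypothetical per-game lines (pts, reb, ast) standing in for "all players".
league = [
    [22.0, 7.0, 9.5],
    [24.0, 10.0, 6.5],
    [15.0, 3.0, 8.0],
    [18.0, 12.0, 2.0],
    [27.0, 6.0, 5.0],
    [12.0, 9.0, 1.5],
]
S = covariance_matrix(league)
print(round(mahalanobis(league[0], league[1], S), 3))
```

One caveat with this sketch: if one stat is an exact linear combination of others (total rebounds alongside offensive and defensive rebounds, say), S is singular and the solve step raises an error, so one of those columns would have to be dropped first.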

                Here's a web-page with some other distance metrics:

                http://www.mathworks.com/access/helpdesk/help/toolbox/stats/pdist.shtml


                However all of these view all the variables as being essentially equally
                important. Hence the possible need for weighting. Or the use of some
                outside rating system or external criteria (e.g. Hall of Fame vs non Hall
                of Fame status, and we could use discriminant analysis or logistic
                regression to calculate the coefficients for predicting HoF status).


                [...career comparisons]

                >We could compare Thomas' first 2 years to Francis', a very useful
                >comparison.

                Yes, good point. Although if we're doing career totals, we'd presumably
                still want a correction for 82-game seasons vs 72-game seasons.

                > I think we want to keep the era correction separate. I can't find
                > where he said it, but I know James wanted to keep it separate.

                Yes, probably best done, as you suggest elsewhere, by having two sets of
                similarity stats: "absolute" and "relative" (or "corrected", or
                "standardized" or whatever we want to call them).


                --MKT
              • Mike Goodman
                Message 7 of 16 , Sep 12, 2001
                  --- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
                  > We started discussing this over in APBR, but I think the details of
                  > making this work can get technical, so I brought it here.

                  Excellent move, Dean
                  >
                  > One of the problems I had was with redundancy of stats. FG% is
                  > reflected in FG and FGA, for example. James didn't worry about it
                  > too much, but I do in basketball.
                  >
                  This is one reason I have concentrated on combining all scoring-
                  related data into one "scoring ability" number. It seems quite clear
                  to me that "points is points", and likewise, attempts are attempts
                  (or possessions used up). "Scoring efficiency" would be the natural
                  name, but I believe that term is also used in another way, and it
                  implies to me that it includes turnovers incurred while attempting
                  to score, offensive fouls, and the "ability to get a shot off"; so I
                  might prefer to call Pts/(Attempts*2) something like "scoring
                  percentage".
                  I also feel comfortable with using a player's ScoPct/.527
                  (historical standard ScoPct) as a number to factor into a player's
                  points-per-minute rate. I justify this by noting that a high-
                  scoring, low-percent scorer on a weak team would just have to shoot
                  less (and take higher-percentage shots) on a better team.
                  Conversely, a low-scoring, high-percentage shooter on a good team
                  would almost certainly be asked to take more shots on a weaker team.
                  Generally, his percentage would go down, but possibly his "scoring
                  ability" number would be fairly constant as he moves from team to
                  team.
                  Ty Corbin had such a career spell, as he went from a go-to guy on
                  the woeful Wolves, to a contributor on the contending Jazz; his
                  minutes and ppg went rollercoastering, but his measurable 'scoring
                  ability' was pretty constant.
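
The arithmetic described above can be sketched as follows. Two things are assumptions here: how "attempts" are counted is left to the caller, since the post does not spell it out, and the ScoPct/.527 factor is applied as a simple multiplier on points per minute, which is one reading of "factor into". The function names are hypothetical.

```python
HISTORICAL_SCO_PCT = 0.527  # the "historical standard ScoPct" quoted above


def scoring_pct(points, attempts):
    """Pts / (Attempts * 2); how attempts are counted is up to the caller."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return points / (2.0 * attempts)


def scoring_ability(points, attempts, minutes):
    """Points per minute, scaled by ScoPct relative to the historical standard.

    Multiplying the two is an assumed reading of the post, not a
    confirmed formula.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rate = points / minutes
    return rate * (scoring_pct(points, attempts) / HISTORICAL_SCO_PCT)


# A scorer exactly at the historical standard keeps his raw rate.
print(scoring_ability(points=527, attempts=500, minutes=1000))
```

Under this reading, a high-volume scorer below .527 is marked down and a low-volume scorer above .527 is marked up, which matches the team-to-team argument made above.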

                  >....In the old argument
                  > of Shawn Kemp, perhaps we find that the most similar players to
                  > him are all out of the HOF -- then that suggests he isn't that
                  > great. Maybe his best season compares with those put up by Wilt,
                  > KMalone, etc., suggesting great seasons.
                  >
                  In one member's standardized numbers, Kemp's career 'abilities'
                  are: 21 pts, 12 reb, 2 ast, 2 blk. This compares to Artis Gilmore
                  and Moses Malone, but with many fewer minutes for Kemp, and lesser
                  totals.
                • Mike Goodman
                  Message 8 of 16 , Sep 12, 2001
                    --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                    > ....discriminant analysis and logistic regression .... Euclidean
                    > distance,
                    > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
                    > between, say, Magic Johnson and Larry Bird ....., Mahalanobis
                    > distance does NOT take into account the second
                    > problem, that certain variables might deserve more weight than
                    > others.
                    >
                    A friend of mine says "Anyone who drives faster than me is a fukkin
                    maniac, and anyone who drives slower is a goddamn asshole".
                    Similarly, I say, anyone who uses less math than me is some kind of
                    moron, and whoever uses more must be some kind of geek.

                    > Also, after the initial analysis, I'd want to put in some sort of
                    > correction for era or game pace. Bob Cousy's 43% career FG% (or
                    > whatever it was, I'm saying this off the top of my head) reminds me
                    > more of Isiah Thomas's 46% than it does Allen Iverson's 43%. Despite
                    > the superficial similarity of Cousy's and Iverson's FG%. (Again I'm
                    > not vouching for those specific numbers, just saying that I'd rather
                    > see the numbers in context, i.e. corrected for era and/or game
                    > pace.)

                    Cousy never once managed to make 40% of his FG during a season; his
                    career scoring pct. was .440. (Iverson's is .500; Isiah's was .508).
                    >
                    > For Hall of Fame purposes, I think discriminant analysis or
                    > logistic or probit regressions are better than merely measuring
                    > distance. I did this once for NBA all-stars one season, the
                    > predictions were not 100% accurate but you could at least separate
                    > the players into three groups: clear all-stars, clear non-stars,
                    > and the "on the bubble" players.
                    >
                    >
                    > --MKT
                    >
                    >
                    Last season, the West selected my top 11 Western players to the
                    All-Star team, but skipped #12 Nowitzki in favor of teammate Michael
                    Finley (#30 or thereabouts).
                    Meanwhile the East seemed to pick at random, ignoring most forwards
                    as they had ignored all point guards the year before.
                  • Mike Goodman
                    Message 9 of 16 , Sep 14, 2001
                      --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                      > .... Euclidean distance,
                      > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
                      > between, say, Magic Johnson and Larry Bird in whatever variables we
                      > choose to look at.
                      >
                      > But there are problems with Euclidean distance, specifically one that
                      > Dean Oliver alludes to: some variables are redundant or
                      > partially redundant with each other,
                      > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds.
                      > Another problem is that not all variables are equally important:
                      > some probably should be given greater weight than others ...

                      I tried my hand at a variation of the Euclidean distance, since I can
                      understand the formula (and pronounce it, too).
                      I took 5 stats: scoring, rebounding, assists, steals, blocks. I used
                      my normalized (standardized) versions. Because points are much more
                      abundant than, say, steals, I reduced this difference by taking the
                      square root of each stat. I compared the top 31 players on my
                      infamous "alltime" list to the other 514 in the list. (I actually
                      ran out of columns in Excel, for the first time.)
                      The formula is drudgery to type, but it starts like this:
                      E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 +... and so on, up
                      to a5 and b5, for players a and b, and variables 1-5.
                      I did not take the square root of the whole thing, since everything
                      was already square-rooted once.
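
For reference, the formula as described can be sketched in a few lines. This is only a sketch of the written formula: the normalized inputs behind the E column are not given in full, so running it on the rounded stat lines below may not reproduce the listed E values. The function name is hypothetical.

```python
import math


def sqrt_distance(a, b):
    """Sum over stats of (sqrt(a_i) - sqrt(b_i))^2, with no outer square root."""
    if len(a) != len(b):
        raise ValueError("stat vectors must have the same length")
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))


def closest_match(target, others):
    """Return (name, distance) of the profile nearest to `target`.

    `others` maps a player name to a stat vector in the same order as `target`.
    """
    name = min(others, key=lambda n: sqrt_distance(target, others[n]))
    return name, sqrt_distance(target, others[name])


# Toy check with perfect squares: (2-1)^2 + (3-2)^2 = 2
print(sqrt_distance([4, 9], [1, 4]))
```

Taking square roots first compresses the high-volume stats, so a two-point gap in scoring counts for less than a two-block gap in blocks.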
                      Not surprisingly, the best players only correspond to other great
                      players, but some players have much more unique statistical profiles.
                      In order of "greatest distance from the next-closest profile", we
                      have:
                      Sco Reb Ast Stl Blk E
                      Michael Jordan 33.5 6.5 5.1 2.3 .9
                      Jerry West 25.1 4.2 6.0 (2.7 .9) .945 (estimated)
                      No real surprise that Jordan is the "most unique" statistically.
                      Others scored more than West, but didn't have quality numbers beyond
                      that.
                      (Iverson is next, then Karl Malone(!), Kobe, Gervin, Erving, Bird,
                      Wilkins, Dantley, Barry)

                      Bill Russell 11.8 14.6 3.8 (1.5 4.0)
                      Bill Walton 15.9 12.8 4.0 1.0 2.7 .743
                      Really not very similar, but as close as anyone comes to Russell's
                      combination of skills.
                      (Thurmond is close 2nd, then Sam Lacey, Elmore Smith, Mutombo)

                      Magic Johnson 20.6 7.5 10.4 1.9 .4
                      Oscar Robertson 22.4 5.3 8.0 (1.5 .3) .644
                      Magic was "the next Oscar", and then some.
                      (Grant Hill, Payton, Penny, Strickland, Isiah, Drexler, KJ, Frazier)

                      John Stockton 17.1 3.3 11.9 2.4 .2
                      Isiah Thomas 18.0 3.7 8.8 2.0 .3 .543
                      Stockton is just a giant in the assists category.
                      (Tim Hardaway, KJ, Strickland, Cousy, Kenny Anderson, Brandon)

                      Jerry West 25.1 4.2 6.0 (2.7 .9)
                      Allen Iverson 25.1 3.9 5.5 2.1 .2 .517
                      Now we have some real across-the-board similarity.
                      (Barry, Penny, Kobe, Drexler, Maravich, Oscar, Westphal)

                      Oscar Robertson 22.4 5.3 8.0 (1.5 .3)
                      Penny Hardaway 20.2 5.1 6.2 1.9 .6 .486
                      (KJ, Payton, Frazier, Cassell, Tim Hardaway, Price, Brandon, Magic)

                      Moses Malone 21.6 13.2 1.3 .9 1.4
                      Shawn Kemp 20.9 11.8 2.2 1.4 1.6 .470
                      (Parish, Gilmore, Reed, McDyess, Ewing, Hayes, Haywood, McAdoo)

                      Shaquille O'Neal 29.7 12.7 2.8 .7 2.6
                      Tim Duncan 25.1 12.0 3.0 .8 2.3 .466
                      (Kareem, Robinson, Mikan, Pettit, Ewing, Mourning, Wilt, Hakeem)

                      Artis Gilmore 20.3 11.9 2.3 .6 2.3
                      Patrick Ewing 23.5 11.1 2.0 1.0 2.6 .446
                      (Hayes, Parish, Derrick Coleman, Sabonis, McDyess, Kemp, Gallatin)

                      The remainder of the top 31 (and their closest match)

                      Kareem AbdulJab. 25.9 10.6 3.4 1.0 2.7
                      Tim Duncan 25.1 12.0 3.0 .8 2.3 .288
                      (Robinson, Pettit, Mikan, Ewing, Neil Johnston, Shaq, Hakeem)

                      Wilt Chamberlain 23.5 14.7 3.5 (1.5 3.0)
                      George Mikan 24.8 13.1 2.9 (1.3 2.0) .432
                      (Hakeem, Robinson, Duncan, Pettit, Kareem, Ewing)

                      Karl Malone 28.1 11.2 3.4 1.4 .8
                      Charles Barkley 24.2 12.4 3.8 1.6 .8 .444
                      (Pettit, Johnston, Mikan, Baylor, Jeff Ruland, Bird, Duncan, McAdoo)

                      Hakeem Olajuwon 23.7 11.7 2.6 1.8 3.2
                      David Robinson 26.1 11.8 2.8 1.5 3.3 .275

                      Julius Erving 23.0 7.8 4.0 1.9 1.7
                      Elgin Baylor 22.5 9.6 3.9 (1.6 1.5) .347
                      (Webber, Marques Johnson, Shareef, Johnston, Lanier, Ed Macauley,
                      Schayes, Garnett, Bird, Drexler)

                      Patrick Ewing 23.5 11.1 2.0 1.0 2.6
                      Alonzo Mourning 24.5 10.9 1.6 .7 3.2 .332

                      Bob Pettit 24.2 11.7 2.8 (1.3 1.8)
                      George Mikan 24.8 13.1 2.9 (1.3 2.0) .231

                      Elgin Baylor 22.5 9.6 3.9 (1.6 1.5)
                      Chris Webber 21.1 10.1 4.2 1.5 1.8 .215
                      (Lanier, Erving, Schayes, Johnston, Shareef, Garnett, Pettit, McAdoo)

                      Scottie Pippen 18.4 7.5 5.4 2.1 .9
                      Clyde Drexler 20.6 6.7 5.5 2.1 .7 .306
                      (Alvan Adams, Connie Hawkins, Toni Kukoc, Billy C., Grant Hill,
                      Antoine Walker, Marques Johnson, Penny, Cliff Hagan)

                      Clyde-Scottie likewise

                      Robert Parish 18.1 11.4 1.5 .9 1.8
                      Elvin Hayes 17.8 10.9 1.7 1.0 2.6 .161
                      (Gallatin, McDyess, Seikaly, Reed, Larry Foust, Dan Roundfield,
                      Sampson, Haywood, Brian Grant)

                      Bob Lanier 21.4 10.5 3.3 1.2 1.7
                      Dolph Schayes 20.0 10.1 3.1 (1.4 1.6) .194

                      (Elvin Hayes-Robert Parish match)

                      Rick Barry 21.9 5.5 4.5 2.1 .5
                      Kobe Bryant 23.0 5.2 4.2 1.4 .8 .345
                      (Chris Mullin, Drexler, Hagan, Moncrief, Penny, Ray Allen)

                      Kevin McHale 22.1 8.6 1.8 .4 2.0
                      Rik Smits 19.9 8.3 1.8 .6 1.6 .306
                      (Lovellette, Darryl Dawkins, Haywood, McAdoo, Yardley, McDyess)

                      (George Mikan-Bob Pettit)

                      Dan Issel 21.1 8.5 2.2 1.1 .6
                      Terry Cummings 19.1 9.3 2.2 1.3 .7 .280
                      (Chambers, Ceballos, Calvin Natt, Shareef, Yardley, Glenn Robinson)

                      Clearly, as one goes down the list into more "ordinary" players,
                      there is a proliferation of close profiles.


                      Mike Goodman
                    • harlanzo@yahoo.com
                      Message 10 of 16 , Sep 15, 2001
                        It occurred to me to ask: when comparing players through their
                        statistics, should we be weighting the comparisons so that some
                        statistics are more important based on position? For example, when
                        comparing point guards the assist category might be more important
                        for weighing similarity than rebound category. Conversely, do we
                        really care whether two centers have similar assist numbers if their
                        points, rebounds, and fg % are similar? I think this sounds somewhat
                        right with some notable exceptions. The counter argument of course
                        is that centers who pass well (a la Walton) or shoot 3s well
                        (Laimbeer and Sikma) are unique and the similarity scores will help
                        identify players with similar rare skill sets. (To digress, I wonder
                        if Jason Kidd and some of the Darrell Walker early 90s seasons are
                        comparable). I am beginning to babble but I think that the question
                        I am asking is whether positional demands should change how we weight
                        statistical categories when we try to apply similarity scores?
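
To make the question concrete, here is a minimal sketch of a position-weighted distance. Every weight and every stat line in it is a hypothetical placeholder chosen for illustration, not a value anyone has proposed, and the names are made up.

```python
import math

# Hypothetical per-position weights; illustrative assumptions only.
POSITION_WEIGHTS = {
    "PG": {"pts": 1.0, "reb": 0.5, "ast": 2.0},
    "C": {"pts": 1.0, "reb": 2.0, "ast": 0.5},
}


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the stats named in `weights`."""
    return math.sqrt(sum(w * (a[s] - b[s]) ** 2 for s, w in weights.items()))


# Two made-up centers: similar points and rebounds, different assists.
center_a = {"pts": 18.0, "reb": 11.0, "ast": 1.5}
center_b = {"pts": 18.5, "reb": 11.5, "ast": 4.5}

as_centers = weighted_distance(center_a, center_b, POSITION_WEIGHTS["C"])
as_guards = weighted_distance(center_a, center_b, POSITION_WEIGHTS["PG"])
print(round(as_centers, 3), round(as_guards, 3))
```

With the center weights the assist gap is discounted, so the two look closer; the counter argument above is that this same discount would hide a Walton-type passer.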


                        --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                        > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                        > >.... Euclidean distance,
                        > > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the
                        difference
                        > > between, say, Magic Johnson and Larry Bird in whatever variables
                        > > we choose to look at.
                        > >
                        > > But there are problems with Euclidean distance, specifically one
                        > > that Dean Oliver alludes to: some variables are redundant or
                        > > partially redundant with each other,
                        > > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds.
                        > > Another problem is that not all variables are equally important:
                        > > some probably should be given greater weight than others ...
                        >
                        > I tried my hand at a variation of the Euclidean distance, since I
                        > can understand the formula (and pronounce it, too).
                        > I took 5 stats: scoring, rebounding, assists, steals, blocks. I
                        used
                        > my normalized (standardized) versions. Because points are much
                        more
                        > abundant than, say, steals, I reduced this difference by taking the
                        > square root of each stat. I compared the top 31 players on my
                        > infamous "alltime" list to the other 514 in the list. (I actually
                        > ran out of columns in Excel, for the first time.)
                        > The formula is drudgery to type, but it starts like this:
                        > E = (sqrt(a1)-sqrt(b1))^2 + (sqrt(a2)-sqrt(b2))^2 +... and so on,
                        up
                        > to a5 and b5, for players a and b, and variables 1-5.
                        > I did not take the square root of the whole thing, since everything
                        > was already square-rooted once.
                        > Not surprisingly, the best players only correspond to other great
                        > players, but some players have much more unique statistical
                        profiles.
                        > In order of "greatest distance from the next-closest profile", we
                        > have:
                        > Sco Reb Ast Stl Blk E
                        > Michael Jordan 33.5 6.5 5.1 2.3 .9
                        > Jerry West 25.1 4.2 6.0 (2.7 .9) .945 (estimated)
                        > No real surprise that Jordan is the "most unique" statistically.
                        > Others scored more than West, but didn't have quality numbers
                        beyond
                        > that.
                        > (Iverson is next, then Karl Malone(!), Kobe, Gervin, Erving, Bird,
                        > Wilkins, Dantley, Barry)
                        >
                        > Bill Russell 11.8 14.6 3.8 (1.5 4.0)
                        > Bill Walton 15.9 12.8 4.0 1.0 2.7 .743
                        > Really not very similar, but as close as anyone comes to Russell's
                        > combination of skills.
                        > (Thurmond is close 2nd, then Sam Lacey, Elmore Smith, Mutombo)
                        >
                        > Magic Johnson 20.6 7.5 10.4 1.9 .4
                        > Oscar Robertson 22.4 5.3 8.0 (1.5 .3) .644
                        > Magic was "the next Oscar", and then some.
                        > (Grant Hill, Payton, Penny, Strickland, Isiah, Drexler, KJ, Frazier)
                        >
                        > John Stockton 17.1 3.3 11.9 2.4 .2
                        > Isiah Thomas 18.0 3.7 8.8 2.0 .3 .543
                        > Stockton is just a giant in the assists category.
                        > (Tim Hardaway, KJ, Strickland, Cousy, Kenny Anderson, Brandon)
                        >
                        > Jerry West 25.1 4.2 6.0 (2.7 .9)
                        > Allen Iverson 25.1 3.9 5.5 2.1 .2 .517
                        > Now we have some real across-the-board similarity.
                        > (Barry, Penny, Kobe, Drexler, Maravich, Oscar, Westphal)
                        >
                        > Oscar Robertson 22.4 5.3 8.0 (1.5 .3)
                        > Penny Hardaway 20.2 5.1 6.2 1.9 .6 .486
                        > (KJ, Payton, Frazier, Cassell, Tim Hardaway, Price, Brandon, Magic)
                        >
                        > Moses Malone 21.6 13.2 1.3 .9 1.4
                        > Shawn Kemp 20.9 11.8 2.2 1.4 1.6 .470
                        > (Parish, Gilmore, Reed, McDyess, Ewing, Hayes, Haywood, McAdoo)
                        >
                        > Shaquille O'Neal 29.7 12.7 2.8 .7 2.6
                        > Tim Duncan 25.1 12.0 3.0 .8 2.3 .466
                        > (Kareem, Robinson, Mikan, Pettit, Ewing, Mourning, Wilt, Hakeem)
                        >
                        > Artis Gilmore 20.3 11.9 2.3 .6 2.3
                        > Patrick Ewing 23.5 11.1 2.0 1.0 2.6 .446
                        > (Hayes, Parish, Derrick Coleman, Sabonis, McDyess, Kemp, Gallatin)
                        >
                        > The remainder of the top 31 (and their closest match)
                        >
                        > Kareem AbdulJab. 25.9 10.6 3.4 1.0 2.7
                        > Tim Duncan 25.1 12.0 3.0 .8 2.3 .288
                        > (Robinson, Pettit, Mikan, Ewing, Neil Johnston, Shaq, Hakeem)
                        >
                        > Wilt Chamberlain 23.5 14.7 3.5 (1.5 3.0)
                        > George Mikan 24.8 13.1 2.9 (1.3 2.0) .432
                        > (Hakeem, Robinson, Duncan, Pettit, Kareem, Ewing)
                        >
                        > Karl Malone 28.1 11.2 3.4 1.4 .8
                        > Charles Barkley 24.2 12.4 3.8 1.6 .8 .444
                        > (Pettit, Johnston, Mikan, Baylor, Jeff Ruland, Bird, Duncan, McAdoo)
                        >
                        > Hakeem Olajuwon 23.7 11.7 2.6 1.8 3.2
                        > David Robinson 26.1 11.8 2.8 1.5 3.3 .275
                        >
                        > Julius Erving 23.0 7.8 4.0 1.9 1.7
                        > Elgin Baylor 22.5 9.6 3.9 (1.6 1.5) .347
                        > (Webber, Marques Johnson, Shareef, Johnston, Lanier, Ed Macauley,
                        > Schayes, Garnett, Bird, Drexler)
                        >
                        > Patrick Ewing 23.5 11.1 2.0 1.0 2.6
                        > Alonzo Mourning 24.5 10.9 1.6 .7 3.2 .332
                        >
                        > Bob Pettit 24.2 11.7 2.8 (1.3 1.8)
                        > George Mikan 24.8 13.1 2.9 (1.3 2.0) .231
                        >
                        > Elgin Baylor 22.5 9.6 3.9 (1.6 1.5)
                        > Chris Webber 21.1 10.1 4.2 1.5 1.8 .215
                        > (Lanier, Erving, Schayes, Johnston, Shareef, Garnett, Pettit, McAdoo)
                        >
                        > Scottie Pippen 18.4 7.5 5.4 2.1 .9
                        > Clyde Drexler 20.6 6.7 5.5 2.1 .7 .306
                        > (Alvan Adams, Connie Hawkins, Toni Kukoc, Billy C., Grant Hill,
                        > Antoine Walker, Marques Johnson, Penny, Cliff Hagan)
                        >
                        > Clyde-Scottie likewise
                        >
                        > Robert Parish 18.1 11.4 1.5 .9 1.8
                        > Elvin Hayes 17.8 10.9 1.7 1.0 2.6 .161
                        > (Gallatin, McDyess, Seikaly, Reed, Larry Foust, Dan Roundfield,
                        > Sampson, Haywood, Brian Grant)
                        >
                        > Bob Lanier 21.4 10.5 3.3 1.2 1.7
                        > Dolph Schayes 20.0 10.1 3.1 (1.4 1.6) .194
                        >
                        > (Elvin Hayes-Robert Parish match)
                        >
                        > Rick Barry 21.9 5.5 4.5 2.1 .5
                        > Kobe Bryant 23.0 5.2 4.2 1.4 .8 .345
                        > (Chris Mullin, Drexler, Hagan, Moncrief, Penny, Ray Allen)
                        >
                        > Kevin McHale 22.1 8.6 1.8 .4 2.0
                        > Rik Smits 19.9 8.3 1.8 .6 1.6 .306
                        > (Lovellette, Darryl Dawkins, Haywood, McAdoo, Yardley, McDyess)
                        >
                        > (George Mikan-Bob Pettit)
                        >
                        > Dan Issel 21.1 8.5 2.2 1.1 .6
                        > Terry Cummings 19.1 9.3 2.2 1.3 .7 .280
                        > (Chambers, Ceballos, Calvin Natt, Shareef, Yardley, Glenn Robinson)
                        >
                        > Clearly, as one goes down the list into more "ordinary" players,
                        > there is a proliferation of close profiles.
                        >
                        >
                        > Mike Goodman
                        >
                        > > >
                        > > >
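The distance formula quoted above can be sketched in Python. This is only a minimal sketch, not Mike Goodman's actual spreadsheet: the function name `sqrt_distance` and the tuple layout are assumptions, and the inputs are the rounded figures from the table, so the result need not reproduce the E column.

```python
import math

def sqrt_distance(a, b):
    """Sum of squared differences between square-rooted stats.

    Follows the quoted formula E = sum((sqrt(a_i) - sqrt(b_i))**2),
    with no final square root taken.
    """
    if len(a) != len(b):
        raise ValueError("stat lines must have the same length")
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

# Order: scoring, rebounds, assists, steals, blocks (rounded table values).
olajuwon = (23.7, 11.7, 2.6, 1.8, 3.2)
robinson = (26.1, 11.8, 2.8, 1.5, 3.3)

print(round(sqrt_distance(olajuwon, robinson), 3))
```

A smaller value means a closer statistical profile; identical stat lines give exactly 0.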
                      • deano@tsoft.com
                        Message 11 of 16 , Sep 16, 2001
                          --- In APBR_analysis@y..., harlanzo@y... wrote:
                          > It occurred to me that when comparing players through their
                          > statistics should we be weighting the comparisons so that some
                          > statistics are more important based on positions?

                          Yes and No. What we're trying to come up with here is a general set
                          of rules that can be applied at default (as a basis for studies,
                          that can be modified). James always said that the method's blessing
                          and curse was its flexibility. We SHOULD modify it for specific
                          comparisons -- perhaps among point guards. There will always be a
                          lot of different versions around, but we want one set for general
                          comparisons, in part because, using your example, we can't
                          necessarily identify who point guards are.
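One way to keep a single default rule set while still allowing position-specific variants is to make the weights a parameter. The sketch below is illustrative only: the weight values, the name `POINT_GUARD_WEIGHTS`, and the choice of absolute differences are assumptions made for the example, not anything proposed in this thread.

```python
STATS = ("scoring", "rebounds", "assists", "steals", "blocks")

# Default: every category counts equally (hypothetical values).
DEFAULT_WEIGHTS = {stat: 1.0 for stat in STATS}

# Hypothetical override for a point-guard-only study.
POINT_GUARD_WEIGHTS = {**DEFAULT_WEIGHTS, "assists": 2.0, "rebounds": 0.5}

def weighted_abs_distance(a, b, weights=DEFAULT_WEIGHTS):
    """Weighted sum of absolute differences between two stat lines (dicts)."""
    return sum(weights[s] * abs(a[s] - b[s]) for s in STATS)

# Rounded figures from Mike Goodman's table.
stockton = dict(zip(STATS, (17.1, 3.3, 11.9, 2.4, 0.2)))
thomas = dict(zip(STATS, (18.0, 3.7, 8.8, 2.0, 0.3)))

print(weighted_abs_distance(stockton, thomas))
print(weighted_abs_distance(stockton, thomas, POINT_GUARD_WEIGHTS))
```

The general comparison uses the defaults; a specialized study passes its own weights without changing the function.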

                          I also thought of a reason not to use Euclidean distance -- it
                          weights big differences too much. At least that is the subjective
                          opinion a lot of times. It's the old argument between standard
                          deviation and mean absolute difference -- the first weights big
                          differences a lot but is mathematically easier, but the second seems
                          to reflect more of what we want. The similarity scores, as James did
                          them and as I modified them, fit into the mean absolute difference
                          category. In Mike's categories, then, this implies that there is
                          likely one very big difference between Jordan's numbers and everyone
                          else (probably scoring average) -- that gets emphasized, making him
                          the most unique player. I'd like to take a stab at career similarity
                          scores using the approach I've outlined to see whether it id's Jordan
                          as most unique, too.
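The difference between the two approaches can be illustrated with a short Python sketch. It uses the rounded Jordan and West figures from Mike's table and compares raw differences, without his square-root step; the function name `category_shares` is an assumption made for the example.

```python
def category_shares(a, b, power):
    """Each category's share of the total of |a_i - b_i| ** power.

    power=1 gives absolute differences, power=2 gives squared differences.
    Assumes the two stat lines differ in at least one category.
    """
    terms = [abs(x - y) ** power for x, y in zip(a, b)]
    total = sum(terms)
    return [t / total for t in terms]

# Order: scoring, rebounds, assists, steals, blocks (rounded table values).
jordan = (33.5, 6.5, 5.1, 2.3, 0.9)
west = (25.1, 4.2, 6.0, 2.7, 0.9)

abs_shares = category_shares(jordan, west, power=1)
sq_shares = category_shares(jordan, west, power=2)
print(f"scoring share, absolute: {abs_shares[0]:.2f}")
print(f"scoring share, squared:  {sq_shares[0]:.2f}")
```

With these figures the scoring gap makes up roughly 70% of the total under absolute differences and roughly 92% under squared differences, which is the sense in which squaring emphasizes the single biggest difference.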

                          MikeG -- While I like the comparisons you did, there are 2 comments I
                          would make:

                          1. I'd like to see some non-standardized comparisons. I do like the
                          standardized because they make some sense, but I think
                          non-standardized will also tell a story.

                          2. You really need some comparison of shooting percentages and
                          turnovers. It really caught my eye with the Duncan-Kareem
                          comparison. I see some similarity between these two, but there are
                          big differences in offensive efficiency. Kareem was nearly
                          unstoppable offensively - my floor%'s and offensive efficiencies
                          reflect that. Duncan is very stoppable, his offensive rating and
                          floor percentage blending in to be about average. Kareem fell to
                          average offensively only in his last year. (I also don't think that
                          Kareem was the defensive force that Duncan is, but my memories are
                          biased by the Kareem post-'80, when he wasn't as good as he was when
                          younger.)

                          Dean Oliver
                          Journal of Basketball Studies


                          > For example, when
                          > comparing point guards the assist category might be more important
                          > for weighing similarity than rebound category. Conversely, do we
                          > really care whether two centers have similar assist numbers if their
                          > points, rebounds, and fg % are similar? I think this sounds somewhat
                          > right with some notable exceptions. The counter argument of course
                          > is that centers who pass well (a la Walton) or shoot 3s well
                          > (Laimbeer and Sikma) are unique and the similarity scores will help
                          > identify players with similar rare skill sets. (To digress, I wonder
                          > if Jason Kidd and some of the Darrell Walker early 90s seasons are
                          > comparable). I am beginning to babble but I think that the question
                          > I am asking is whether positional demands should change how we weight
                          > statistical categories when we try to apply similarity scores?
                          >
                          >
                          > --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                          > > --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                          > > >.... Euclidean distance,
                          > > > sqrt( X^2 + Y^2 + Z^2 + ...) where X, Y, Z, etc. are the difference
                          > > > between, say, Magic Johnson and Larry Bird in whatever variables we
                          > > > choose to look at.
                          > > >
                          > > > But there are problems with Euclidean distance, specifically one that
                          > > > Dean Oliver alludes to: some variables are redundant or
                          > > > partially redundant with each other,
                          > > > e.g. FG Made and Points Scored, or even Off Rebds and Def Rebds.
                          > > > Another problem is that not all variables are equally important: some
                          > > > probably should be given greater weight than others ...
                          > >
                          > > I tried my hand at a variation of the Euclidean distance, since I can
                          > > understand the formula (and pronounce it, too).
                        • msg_53@hotmail.com
                          Message 12 of 16 , Sep 16, 2001
                            Personally, I don't ever consider 'position' to be a quantifiable
                            statistic. Many forwards have been forced to play center; many
                            forwards are not clearly 'power' or 'small' forwards; many players
                            are not exclusively guards or forwards; many versatile guards do
                            plenty of scoring, passing, and rebounding.
                            The possible fragmenting of these lists is virtually infinite. An
                            assist from a center is exactly as important as an assist from a
                            guard. A rebounding guard, a center who gets steals as well as
                            blocks, all these things make a player unique, or at least
                            differentiate him from the norm.
                            The issue of 3-point shooting might be worth looking into. How one
                            goes about racking up one's scoring totals is of some interest. Then
                            again, it might invite breaking down points into dunks, layups, etc.
                            In the end, points are points. A player's scoring may come from
                            inside moves when he is young, and from outside shots later. The
                            contribution is still the same.
                            One thing these similarity indexes do reveal is that there are
                            some 'classic' profiles by position. Wilt, Kareem, Hakeem, Shaq,
                            Robinson, Ewing, Moses, Gilmore, all averaged 22-28 pts, 12-15 reb,
                            2-3 blocks. But the well-rounded centers seem to have enjoyed more
                            success.
                            The demands of one's position are somewhat situational. The best
                            players can usually do whatever is most needed.

                            --- In APBR_analysis@y..., harlanzo@y... wrote:
                            > It occurred to me that when comparing players through their
                            > statistics should we be weighting the comparisons so that some
                            > statistics are more important based on positions? For example, when
                            > comparing point guards the assist category might be more important
                            > for weighing similarity than rebound category. Conversely, do we
                            > really care whether two centers have similar assist numbers if their
                            > points, rebounds, and fg % are similar? I think this sounds somewhat
                            > right with some notable exceptions. The counter argument of course
                            > is that centers who pass well (a la Walton) or shoot 3s well
                            > (Laimbeer and Sikma) are unique and the similarity scores will help
                            > identify players with similar rare skill sets. (To digress, I wonder
                            > if Jason Kidd and some of the Darrell Walker early 90s seasons are
                            > comparable). I am beginning to babble but I think that the question
                            > I am asking is whether positional demands should change how we weight
                            > statistical categories when we try to apply similarity scores?
                            >
                            >
                          • msg_53@hotmail.com
                            Message 13 of 16 , Sep 16, 2001
                              --- In APBR_analysis@y..., deano@t... wrote:
                              >..... a reason not to use Euclidean distance -- it
                              > weights big differences too much. At least that is the subjective
                              > opinion a lot of times. It's the old argument between standard
                              > deviation and mean absolute difference -- the first weights big
                              > differences a lot but is mathematically easier, but the second seems
                              > to reflect more of what we want.

                              I operate under the assumption that points and rebounds are equally
                              important as contributions; so are steals and blocks, but almost
                              everyone gets fewer than 2-3 of these, so it seems fair to weigh them
                              less. Taking the standard deviation from the mean gives you the
                              burden of assigning a weight to the statistical category. I avoid
                              this by presuming that bigger numbers imply bigger weights. That
                              is, scoring is and should be more important than, say, steals.
                              (I did reduce the 'difference' factor by taking their square roots.)
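The effect of the square-root step on this implicit weighting can be sketched in Python. The figures are hypothetical round numbers (25 points and 2 steals per game, each with a 10% gap), and the helper name `squared_gap` is an assumption made for the example.

```python
import math

def squared_gap(value, ratio, transform=lambda v: v):
    """Squared difference between value and value * ratio after transform."""
    return (transform(value * ratio) - transform(value)) ** 2

# How much more a 10% scoring gap counts than a 10% steals gap.
raw_ratio = squared_gap(25.0, 1.1) / squared_gap(2.0, 1.1)
sqrt_ratio = squared_gap(25.0, 1.1, math.sqrt) / squared_gap(2.0, 1.1, math.sqrt)

print(round(raw_ratio, 2), round(sqrt_ratio, 2))
```

Because (sqrt(k*x) - sqrt(x))^2 = x * (sqrt(k) - 1)^2, the square-rooted version penalizes the same proportional gap in proportion to the size of the stat (25/2 = 12.5 here), while raw squared differences penalize it in proportion to the square of that size (156.25). Bigger numbers still carry bigger weights, but far less steeply.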

                              > The similarity scores, as James did
                              > them and as I modified them, fit into the mean absolute difference
                              > category. In Mike's categories, then, this implies that there is
                              > likely one very big difference between Jordan's numbers and everyone
                              > else (probably scoring average) -- that gets emphasized, making him
                              > the most unique player. I'd like to take a stab at career similarity
                              > scores using the approach I've outlined to see whether it id's Jordan
                              > as most unique, too.
                              >
                              > MikeG -- While I like the comparisons you did, there are 2 comments I
                              > would make:
                              >
                              > 1. I'd like to see some non-standardized comparisons. I do like the
                              > standardized because they make some sense, but I think
                              > non-standardized will also tell a story.

                              Dean, you could do raw averages, but players from the 60s would only
                              compare to players in the 60s. Actually, a great rebounder in the
                              90s would seem to compare to an average rebounder in the 60s, for
                              example.
                              I don't have a ready database of raw averages.

                              > 2. You really need some comparison of shooting percentages and
                              > turnovers. It really caught my eye with the Duncan-Kareem
                              > comparison. I see some similarity between these two, but there are
                              > big differences in offensive efficiency. Kareem was nearly
                              > unstoppable offensively - my floor%'s and offensive efficiencies
                              > reflect that. Duncan is very stoppable, his offensive rating and
                              > floor percentage blending in to be about average. Kareem fell to
                              > average offensively only in his last year. (I also don't think that
                              > Kareem was the defensive force that Duncan is, but my memories are
                              > biased by the Kareem post-'80, when he wasn't as good as he was when
                              > younger.)
                              >
                              > Dean Oliver
                              > Journal of Basketball Studies

                              Shooting percentages are part of what determines my standardized
                              scoring rate, along with game pace (defined as points allowed). I
                              only did career totals, so Kareem's incredibly long career has been
                              smoothed over, and his very dominant early seasons are not truly
                              reflected. Maybe Duncan has peaked, and his career averages really
                              won't rank close to Kareem's.
                              Further, Duncan's offensive numbers, in my system, get a big boost
                              from his being on a great defensive team. You have to agree his
                              offensive strength is way above average on his team. In other words,
                              the go-to guy on the championship Spurs is going to rate favorably to
                              the go-to guy on the champion Bucks from 30 years before, in my
                              system.

                              Mike Goodman
                            • Dean Oliver
                              Message 14 of 16 , Sep 17, 2001
                                --- In APBR_analysis@y..., msg_53@h... wrote:
                                > > 1. I'd like to see some non-standardized comparisons. I do like the
                                > > standardized because they make some sense, but I think
                                > > non-standardized will also tell a story.
                                >
                                > Dean, you could do raw averages, but players from the 60s would only
                                > compare to players in the 60s. Actually, a great rebounder in the
                                > 90s would seem to compare to an average rebounder in the 60s, for
                                > example.
                                > I don't have a ready database of raw averages.
                                >

                                I think this is what I was interested in. I was curious who from
                                today would fit in the '60's. Or, more interestingly, who from the
                                '70's might fit in today's game. Are West's raw #'s similar to
                                Iverson's or to Richmond's? What happens in baseball is that
                                outstanding players tend to be dissimilar to other players in their
                                era, but similar to outstanding players of other eras. I doubt that
                                this would happen in basketball, using raw #'s, because of the
                                change in playing style. You seem to be saying the same thing.

                                (I didn't realize that you don't have a db of raw#'s.)

                                > > 2. You really need some comparison of shooting percentages and
                                > > turnovers. It really caught my eye with the Duncan-Kareem
                                > > comparison. I see some similarity between these two, but there are
                                > > big differences in offensive efficiency. Kareem was nearly
                                > > unstoppable offensively - my floor%'s and offensive efficiencies
                                > > reflect that. Duncan is very stoppable, his offensive rating and
                                > > floor percentage blending in to be about average. Kareem fell to
                                > > average offensively only in his last year. (I also don't think that
                                > > Kareem was the defensive force that Duncan is, but my memories are
                                > > biased by the Kareem post-'80, when he wasn't as good as he was when
                                > > younger.)
                                >
                                > Shooting percentages are part of what determines my standardized
                                > scoring rate, along with game pace (defined as points allowed). I
                                > only did career totals, so Kareem's incredibly long career has been
                                > smoothed over, and his very dominant early seasons are not truly
                                > reflected.

                                One of my personal quibbles with all the tendex-like rating systems
                                out there is that they do combine offensive with defensive
                                contributions. There is a big difference in my mind between Moses
                                Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                                dominant defensively. Both were good in the other thing, but
                                dominant in just one. Kareem was dominant offensively (and probably
                                defensively) early on. Duncan has been dominant defensively, not
                                offensively. (Duncan appears to have more of the competitive fight
                                than Kareem, but, again, I missed the early Kareem.)

                                > Maybe Duncan has peaked, and his career averages really
                                > won't rank close to Kareem's.

                                I don't think I'd say that Duncan's peaked. He's been pretty
                                remarkably consistent since entering the league. Maybe it's only
                                remarkable that he stayed in school long enough to actually be ready
                                for the league when entering.

                                > Further, Duncan's offensive numbers, in my system, get a big boost
                                > from his being on a great defensive team. You have to agree his
                                > offensive strength is way above average on his team.

                                Depending on how you define "average", but, yeah, Duncan looks better
                                offensively than he really is because he plays on a great defensive
                                team. (He would make most teams better defensively, too.)

                                > Personally, I don't ever consider 'position' to be a quantifiable
                                > statistic.

                                James assigned numbers to positions for defensive purposes (a
                                shortstop is much more valuable to a defense than a 1st baseman, for
                                example). That might be necessary for some of the older guys because
                                defensive stats really don't exist in the '60's and early '70's. But
                                we can probably still assume that a center was the most important
                                defensive player back then, as he is now. This gets adequately
                                reflected in blocks, steals, and defensive boards, but you do need
                                those #'s.

                                > assist from a center is exactly as important as an assist from a
                                > guard.

                                Only a minor point here -- this is not precisely true (though
                                probably true enough for government work). Assists from guards tend
                                to be more valuable. This is because they often have to make the
                                tougher pass than big men. The weight on an assist is proportional
                                to the expected FG% of the guy he passes to. Historically, big men
                                have had higher FG% than guards -- hence their assists are weighted
                                less. (The assists of the best shooting player on a team are less
                                valuable than the assists of the guys getting him the ball.) This
                                has changed with the 3 pt shot, but it's a conversion from FG% to
                                effective FG%...
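The weighting described above can be sketched roughly as follows. This is only an illustration, not the actual formula behind the ratings discussed in this thread: the function names, the sample shooting lines, and the choice to scale by a league-average figure are all assumptions. The one standard piece is effective FG%, which counts a made three as 1.5 made field goals.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Effective FG%: (FGM + 0.5 * 3PM) / FGA."""
    if fga == 0:
        return 0.0
    return (fgm + 0.5 * fg3m) / fga


def assist_weight(recipient_efg, reference_efg):
    """Weight on an assist, proportional to the recipient's expected eFG%.

    Dividing by a reference (e.g. league-average) eFG% is an assumed
    scaling so that a pass to an average shooter gets weight 1.0.
    """
    return recipient_efg / reference_efg


# Hypothetical shooting lines, for illustration only.
big_man_efg = effective_fg_pct(fgm=550, fg3m=0, fga=1000)   # 0.55
guard_efg = effective_fg_pct(fgm=400, fg3m=100, fga=1000)   # 0.45
league_efg = 0.48                                           # assumed reference

# A guard feeding the big man earns more per assist than the reverse.
weight_guard_to_big = assist_weight(big_man_efg, league_efg)
weight_big_to_guard = assist_weight(guard_efg, league_efg)
print(round(weight_guard_to_big, 3), round(weight_big_to_guard, 3))
```

With these made-up numbers the pass into the post is weighted more heavily than the kick-out, which is the direction of the effect described above. The three-point shot narrows the gap because it raises the guard's eFG% relative to his plain FG%.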

                                Dean Oliver
                                Journal of Basketball Studies
                              • Mike Goodman
                                Message 15 of 16 , Sep 18, 2001
                                  --- In APBR_analysis@y..., "Dean Oliver" <deano@t...> wrote:
                                  > (I didn't realize that you don't have a db of raw#'s.)
                                  >
                                  My raw totals and per-game averages are contained in my 'season'
                                  files, along with team totals and averages for that season. My
                                  composite lists only have the 'standardized' rates. From those
                                  rates, I can generate 'equivalent totals'. For 'average'
                                  scoring/rebounding teams, these would be equal to raw season totals.

                                  >
                                  > One of my personal quibbles with all the tendex-like rating systems
                                  > out there is that they do combine offensive with defensive
                                  > contributions. There is a big difference in my mind between Moses
                                  > Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                                  > dominant defensively. Both were good in the other thing, but
                                  > dominant in just one. Kareem was dominant offensively (and probably
                                  > defensively) early on. Duncan has been dominant defensively, not
                                  > offensively. (Duncan appears to have more of the competitive fight
                                  > than Kareem, but, again, I missed the early Kareem.)

                                  I get your point, Dean, but your examples don't seem the clearest.
                                  Olajuwon is better than Malone because he has all the offense Malone
                                  had PLUS defense. Never seen the Dream shake?
                                  Duncan has virtually all the offense Kareem had, averaged over their
                                  careers, according to my numbers. Kareem did maintain a great
                                  shooting pct., but Duncan plays in an era of universally-tough D.

                                  > I don't think I'd say that Duncan's peaked. He's been pretty
                                  > remarkably consistent since entering the league. Maybe it's only
                                  > remarkable that he stayed in school long enough to actually be ready
                                  > for the league when entering.

                                  Some guys enter the league at full strength: Wilt, Oscar, Kareem, and
                                  Robinson never improved beyond their first 3 years. Others start as
                                  near-superstars, then several years along suddenly shift into true
                                  superstar mode: Magic, Bird, Olajuwon, ...

                                  >
                                  > Depending on how you define "average", but, yeah, Duncan looks better
                                  > offensively than he really is because he plays on a great defensive
                                  > team. (He would make most teams better defensively, too.)

                                  Don't know how a guy 'looks better than he really is', DeanO.

                                  > Assists from guards tend
                                  > to be more valuable. This is because they often have to make the
                                  > tougher pass than big men. The weight on an assist is proportional
                                  > to the expected FG% of the guy he passes to. Historically, big men
                                  > have had higher FG% than guards -- hence their assists are weighted
                                  > less. (The assists of the best shooting player on a team are less
                                  > valuable than the assists of the guys getting him the ball.) This
                                  > has changed with the 3 pt shot, but it's a conversion from FG% to
                                  > effective FG%...
                                  >
                                  > Dean Oliver
                                  > Journal of Basketball Studies

                                  This is fun, splitting hairs!
                                  If your center kicks out 3 nice passes to guards, who only hit one of
                                  the 3 shots, the center only gets one assist.
                                  The guard can make 3 nice passes inside, 2 of which may be converted,
                                  so he gets 2 assists.
                                  So an equally valid argument is that assists from guards
                                  are 'easier', and assists from centers are 'undercounted'.
                                  I say they are equal.
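The counting argument above can be written out as a tiny sketch. It assumes, purely for illustration, that every pass leads to a shot and that the shooter converts at a fixed rate; the function name is hypothetical.

```python
from fractions import Fraction


def credited_assists(passes, recipient_fg_pct):
    """Assists credited when each pass leads to a shot made at the given rate."""
    return passes * recipient_fg_pct


# Center kicks out 3 passes to guards who hit 1 of 3.
center_assists = credited_assists(3, Fraction(1, 3))
# Guard makes 3 passes inside, 2 of which are converted.
guard_assists = credited_assists(3, Fraction(2, 3))
print(center_assists, guard_assists)
```

Same number of good passes, different assist totals: the assist count already depends on who receives the ball, before any weighting is applied.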

                                  Perhaps more to the issue, evaluate which players make those
                                  practical passes which may or may not get them an assist, versus
                                  those who will not give up the ball unless it gets them an assist. I
                                  can't discern the 2 types from the statistics, but I know it when I
                                  see it. (It might be partly discernible in that old assist/turnover
                                  ratio.)


                                  Mike Goodman
                                • Dean Oliver
                                  Message 16 of 16 , Sep 18, 2001
                                    --- In APBR_analysis@y..., "Mike Goodman" <msg_53@h...> wrote:
                                    > > One of my personal quibbles with all the tendex-like rating systems
                                    > > out there is that they do combine offensive with defensive
                                    > > contributions. There is a big difference in my mind between Moses
                                    > > Malone, who was an offensive force, and Hakeem Olajuwon, who has been
                                    > > dominant defensively. Both were good in the other thing, but
                                    > > dominant in just one. Kareem was dominant offensively (and probably
                                    > > defensively) early on. Duncan has been dominant defensively, not
                                    > > offensively. (Duncan appears to have more of the competitive fight
                                    > > than Kareem, but, again, I missed the early Kareem.)
                                    >
                                    > I get your point, Dean, but your examples don't seem the clearest.
                                    > Olajuwon is better than Malone because he has all the offense Malone
                                    > had PLUS defense. Never seen the Dream shake?
                                    > Duncan has virtually all the offense Kareem had, averaged over their
                                    > careers, according to my numbers. Kareem did maintain a great
                                    > shooting pct., but Duncan plays in an era of universally-tough D.

                                    Olajuwon was very solid offensively (not stellar, like Kareem) -- I
                                    didn't mean to imply otherwise. Malone was just the epitome of a good
                                    offensive center who wasn't that good defensively. Rik Smits is
                                    another example of the poor defensive type who can score (not as well
                                    as Olajuwon/Moses). Olajuwon is very DISSIMILAR to these guys because
                                    he is much better defensively. Similarity is all I'm trying to
                                    capture, not quality.

                                    I looked at Duncan's offensive #'s last night and his offensive rating
                                    has been between about 104 and 108 since entering the league, when
                                    average offensive ratings have been between about 100 and 103. He's a
                                    little more efficient than average. My recollection is that Kareem's
                                    #'s were about 115 in the early '80s, when average was about 106-108 --
                                    relatively higher than Duncan's. Again, these two players just don't
                                    seem very SIMILAR to me. I would think of David Robinson as more
                                    similar to Kareem. Or possibly Olajuwon. Probably Wilt. Not
                                    Russell.
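To make the "relatively higher" comparison concrete, here is a rough sketch using only the approximate figures quoted above. Taking the midpoint of each quoted range is an assumption made for illustration, as are the function names; these are recollected ballpark numbers, not looked-up values.

```python
def midpoint(low, high):
    """Midpoint of a quoted range (an assumed stand-in for the true value)."""
    return (low + high) / 2.0


def relative_rating(player_rating, league_rating):
    """Return (points above league average, ratio to league average)."""
    return player_rating - league_rating, player_rating / league_rating


# Duncan: rating about 104-108 when league average was about 100-103.
duncan_diff, duncan_ratio = relative_rating(midpoint(104, 108), midpoint(100, 103))
# Kareem, early '80s: about 115 when league average was about 106-108.
kareem_diff, kareem_ratio = relative_rating(115.0, midpoint(106, 108))

print(duncan_diff, round(duncan_ratio, 3))
print(kareem_diff, round(kareem_ratio, 3))
```

On either measure (difference or ratio), the Kareem figure sits further above its league average than the Duncan figure does, under these midpoint assumptions.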

                                    > >
                                    > > Depending on how you define "average", but, yeah, Duncan looks better
                                    > > offensively than he really is because he plays on a great defensive
                                    > > team. (He would make most teams better defensively, too.)
                                    >
                                    > Don't know how a guy 'looks better than he really is', DeanO.
                                    >

                                    Another way of saying that the hype on Duncan has been a little
                                    extreme. Put him on the Hawks last year and, while he's better than
                                    Mutombo offensively, the team still wouldn't have scored much. They
                                    would have been pretty close to as good defensively as they were with
                                    Mutombo (or better), but they wouldn't be an offensive threat. I
                                    don't think Kareem ever played on a weak offensive team.

                                    > This is fun, splitting hairs!
                                    > If your center kicks out 3 nice passes to guards, who only hit one
                                    of
                                    > the 3 shots, the center only gets one assist.
                                    > The guard can make 3 nice passes inside, 2 of which may be
                                    converted,
                                    > so he gets 2 assists.
                                    > So an equally valid argument is that assists from guards
                                    > are 'easier', and assists from centers are 'undercounted'.
                                    > I say they are equal.
                                    >
                                    > Perhaps more to the issue, evaluate which players make those
                                    > practical passes which may or may not get them an assist, versus
                                    > those who will not give up the ball unless it gets them an assist.

                                    The goal is to identify when a good pass is made. Generally a better
                                    pass is one made to a better shooter. That's all I try to capture. I
                                    capture it in formulas with teammate FG%. For years, I didn't worry
                                    about it and it really didn't matter much. Now I've got more
                                    sophisticated calculation devices. I've actually found that this
                                    adjustment makes the most difference when evaluating different levels
                                    of basketball (high school, college, women's).

                                    Dean Oliver
                                    Journal of Basketball Studies