On Fri, 14 Mar 2003, Michael Tamada wrote:

> -----Original Message-----

> From: igorkupfer@... [mailto:igorkupfer@...]
> Sent: Friday, March 14, 2003 2:41 PM
>
> >From: "John W. Craven" <john1974@...>
>
> [...]
>
> >> Okay, time for me to enter the fray here. I'm going to use a stat that I just made up and for that reason will call Craven Scores. Basically, they use the same methodology as Bill James Similarity Scores, meaning that for every game you begin with a score of 1000 and deduct the following number of points for every difference (positive OR negative) in numbers from a player's seasonal average:
> >>
> >> FG made: 2
> >> 3FG made: 1
> >> FT made: 1
> >> Rebounds: 1
> >> Assists: 1
> >> Blocks: 2
> >> Steals: 1
> >> Turnovers: 1
>
> One thing that bugs me about the similarity scores that have sprung up for basketball over the past year or so: I haven't seen any theoretical or statistical foundation for them... well, there's a bit of a statistical foundation for MikeG's Euclidean distance, but I've got a couple of concerns about that metric as well.
>
> The one above has double weight for FGM. Reasonable enough, but why double weight for blocks? And single weight for everything else? If Derrick McKey and Bob Love are both identical to Jerome Kersey, but McKey gets two blocks a game whereas Kersey only gets one, and Kersey gets 2 assists per game but Love only gets one, would we want to say that McKey is more distant from Kersey than Love is (indeed twice as distant, if one chooses to interpret the numbers that way)? What makes blocks such a key determinant in measuring similarity?

It was just a "game time decision", so to speak. Blocks stop a shot that was probably going to go in from even having a chance at going in, so I gave it 2. Probably I could have given it a weight of 1.7535321 or something, but again, I was interested in simplicity, not absolute accuracy.
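For concreteness, here is a minimal sketch of the scoring rule as described in the quote above, with the listed weights. The stat keys and the sample numbers are mine, not from the original post:

```python
# Weights as listed in the quoted post: 2 for FGM and blocks, 1 elsewhere.
WEIGHTS = {
    "fgm": 2, "tpm": 1, "ftm": 1, "reb": 1,
    "ast": 1, "blk": 2, "stl": 1, "tov": 1,
}

def craven_score(game, season_avg):
    """Start at 1000 and deduct the weighted absolute difference
    between a single game line and the player's seasonal averages."""
    score = 1000
    for stat, weight in WEIGHTS.items():
        score -= weight * abs(game[stat] - season_avg[stat])
    return score
```

Averaging the per-game deductions (1000 minus the score) over a season would produce small per-game figures; I'd guess that's roughly how numbers like the "Payton 4.31" quoted later were arrived at, though the post doesn't spell that step out.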

>
> >If you included missed FTs and FGs here, it would give you a better handle on FG%.
>
> Yes, I think that's pretty huge. To ignore FGMissed is to throw away valuable information.

As reflected in the brief comment about Allen Iverson's score, not including them was a conscious decision on my part. To me, a guy like Iverson, who consistently scores 20+ a night no matter how many times he misses, is a more consistent player than a guy like Paul Pierce, who will occasionally stop shooting once he gets on a cold streak. Like all statistics, this one includes the creator's bias.

>
> I don't recall if MikeG's Euclidean distance measures include FGMiss; my impression is that they do not, because he doesn't seem to include them when he lists players' similar stats.
>
> But even after FGMisses and FTMisses are included, these measures are still leaving out a hugely important variable for measuring player quality: FG% (and, though less important, FT% also).
>
> And simply counting a player's total FGA and FGM is NOT an adequate substitute for looking at FG%.

Well, then feel free to design your own metric. I've outlined my reasons for why I chose not to include something for low-percentage shooters.

> This was my beef with Tony Minkoff's Minkoff Player Ratings from several years ago (ingenious regressions with minutes played as the dependent variable and standard NBA counting stats as the explanatory variables). They included only the stats which represented counts of actions such as FGM, rebounds, etc. Minkoff (actually he was away for a period of time, so I was emailing one of his friends or co-researchers) resisted the idea of putting FG% in as an explanatory variable because it's not something which represents a cumulative count. But they did agree to try adding FG% as an explanatory variable to the regression, as an experiment. It had a highly significant coefficient and improved the regression's r-squared. In other words, it belonged in the regression -- because FG% is an important factor in player quality, big surprise. But because they pre-emptively refused to look at ratio variables such as FG% or TO/Min or PTS/TO, they never did make it part of their player ratings.
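The experiment described in that quote is easy to reproduce in miniature. The sketch below is a toy version with synthetic data (generated so that minutes really do depend on FG%, so the improvement in fit is built in by construction); it is not the actual Minkoff regression or data, just the mechanics of comparing R-squared with and without the ratio variable:

```python
import random

# Synthetic player-games: minutes depend on FGM and on FG% directly.
random.seed(1)
fgm, fg_pct, minutes = [], [], []
for _ in range(200):
    made = random.uniform(2, 10)
    att = made + random.uniform(2, 10)   # attempts always exceed makes
    pct = made / att
    fgm.append(made)
    fg_pct.append(pct)
    minutes.append(10 + 1.5 * made + 25 * pct + random.gauss(0, 2))

def _demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def r2_one(x, y):
    """R^2 of OLS (with intercept) on a single regressor."""
    xd, yd = _demean(x), _demean(y)
    b = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)
    sse = sum((yv - b * xv) ** 2 for xv, yv in zip(xd, yd))
    sst = sum(yv ** 2 for yv in yd)
    return 1 - sse / sst

def r2_two(x1, x2, y):
    """R^2 of OLS (with intercept) on two regressors, via the
    closed-form solution of the 2x2 normal equations."""
    x1d, x2d, yd = _demean(x1), _demean(x2), _demean(y)
    s11 = sum(p * p for p in x1d)
    s22 = sum(p * p for p in x2d)
    s12 = sum(p * q for p, q in zip(x1d, x2d))
    s1y = sum(p * q for p, q in zip(x1d, yd))
    s2y = sum(p * q for p, q in zip(x2d, yd))
    b2 = (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 * s12)
    b1 = (s1y - s12 * b2) / s11
    sse = sum((yv - b1 * p - b2 * q) ** 2
              for p, q, yv in zip(x1d, x2d, yd))
    sst = sum(yv ** 2 for yv in yd)
    return 1 - sse / sst

r2_counts = r2_one(fgm, minutes)            # counting stat only
r2_with_pct = r2_two(fgm, fg_pct, minutes)  # counting stat plus FG%
```

Note that adding a regressor can never lower R-squared; the interesting part of the original experiment was that the FG% coefficient was highly significant, not merely that the fit improved.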

> Same with similarity scores. Career totals are fine things to include in the regression, but key ratios such as FG% and min/game are also important stats for evaluating players or determining similarity.

Actually, these were season-to-date totals. Sorry if that wasn't clear.

Actually, I did control for minutes. Without that control, Brent Barry would have been near the top of the list and Paul Pierce would have been at the bottom.

On the other side of things, players like Ricky Davis or Bonzi Wells are more likely to get extra minutes *because* they are on a hot streak or whatever (actually, "hot streak" is not precisely accurate - Bonzi's more likely to get minutes when matchups dictate that he'll outproduce Ruben Patterson or Derek Anderson - but the outcome's the same). Essentially, then, by looking at minutes per game as a factor to reduce similarity, you're counting things twice.

>
> If I'm comparing various trips and routes, it simply is not adequate to only look at distance travelled and time used; it's also important to look at distance/time, i.e. speed. And for basketball, it's not enough to look at FGM and FGMiss; we have to look at FG% too.

Depending on what you're looking for, sure.

> I suspect the reason that many people are hesitant or resistant is because FG% is measured in units which are on a totally different scale from FGM, rebounds, etc. But the seeming equivalence of counting FGM and assists and rebounds is illusory to begin with; we're comparing apples to oranges, and in order to properly do this comparison we need some way of translating or equivalencing apples and oranges. In other words, determining what weights to put on apples, oranges, rebounds, assists, etc.

Again, if that's what you want to do, fine. Here, I was going for 1. ease of use, 2. ease of assessing results, and only 3. straight accuracy. My take on things is that if I made a complex Tendex or vector-based system and judged players along those lines, the rankings of the players at the end of the studies would be approximately if not exactly the same anyway.

>
> Given that FG% is always between 0 and 1, whereas career points can be 20,000 or more, the optimum weight on FG% will undoubtedly be a large one. There's a variety of possible approaches here: z-scores (i.e. use standard deviations as the units of measurement); making the weight on FG% a function of FGA (which often leads to FG% dropping out of the similarity formula and FGM appearing with a higher weight); etc. Best would be an approach using multivariate statistical methods: principal components analysis, factor analysis, cluster analysis, etc. This is especially easy for another new (to basketball) measure which is related to Similarity Scores: Hall of Fame Monitor scores. Discriminant analysis, probit or logistic (aka logit) regression, or even regular multivariate regression can be used to easily come up with weights which can be used to measure Hall of Fame probability, and which have some optimal statistical properties (such as least squared error or maximum likelihood) as well.

I've got to say that one of the things that I *really* like about baseball's HOF Monitor scores in particular and Bill James's statistics in general is that they are very accessible. I know that's not always a popular thing among statheads (accessibility), but I truly believe that the primary difference between James and the guys who came immediately before and after him is that his methods were straightforward and easy to comprehend. As time went on, he factored more and more things into stats like Runs Created, but without that ease of use, baseball stats would essentially be where secondary basketball stats are now: outside of a couple that have been used for eons, nobody trusts 'em.

> >> Well, there you go. If Allen Iverson looks remarkably consistent, that's because the system makes him that way; my opinion watching him is that he'll nearly always get his 25 points a night, whether it takes him 15 or 30 attempts to do so. That's not an error so much as a known bias of this metric.
> >>
> >> Here they are in order:
>
> [...]
>
> >> Payton 4.31
> >> Bryant 4.84
>
> [...]
>
> >Here's my list, using the same players. Note that Iverson sits in the middle. Shown in order of decreasing consistency.
> >
> >Gary Payton 20.8 4.7 8.7 21.7 2.6
> >Kobe Bryant 30.7 6.9 6.2 28.8 2.9
>
> In addition to the Iverson discrepancy, Ed's list has Payton and Bryant as two of the most consistent players, whereas JohnC's list has them as two of the less consistent players.

Payton's one of the more consistent players according to my list, actually. He isn't close to Cuttino Mobley, but then again, nobody is.

> I don't have much to say about these discrepancies; I haven't thought much about how to measure consistency, except for my comments a couple of weeks ago as Ed was tinkering with his consistency formula. As Ed says:
>
> >I still don't know what a consistency rating would be good for, except maybe debunking
> >the notion that player consistency is valuable.
>
> I.e. it's a kind of fun but not real critical thing to investigate.

I wonder... it seems to me that year-to-year inconsistency is actually probably *more* valuable to have in terms of achieving the Ultimate Goal. Again, I'll refer to a study James talked about in "The Politics of Glory," where he ran a simulation pitting a couple of 200-win pitchers against each other, one of whom had a high, brief peak, the other a lower but longer one. The guy with the higher peak ended up winning a few more titles over the 1000+ seasons he ran.

That, of course, raises several questions, including, to me, the biggie: what simulation truly simulates any professional team sport? Still, the logical progression makes sense.

As this applies to game-to-game consistency, that's tougher to say. As Dean said a long while back, inconsistency is more valuable to bad teams than to good ones, and the same would seem to apply in the playoffs; a very inconsistent team like the Celtics, for example, could ride a couple of favorable matchups into the Eastern Conference Finals (or be bounced in the first round), while a very consistent team of the same talent level couldn't reasonably expect to advance past the second. On the other hand, the 1993-94 Sonics were, to me, the epitome of a great but woefully inconsistent team, and we all saw what happened to them.

>
> Similarity scores, however, have a variety of potentially important and useful applications, but I do not find the similarity measures that I've seen to be very convincing. E.g. MikeG's list showing Steve Hawes to be the 4th most similar player to Charles Oakley.

On basketballreference.com, he doesn't even appear in the Top 10 on Oakley's list. Different methods of achieving similarity, I know, but you've got to bear in mind that:

a. Basketball has been around nowhere near as long as baseball has, so there simply isn't a huge pool of retired players each person can be similar to, which is compounded by

b. The game itself has changed dramatically over the past 30 years, and even more dramatically over the past 50. I am *not* attempting to rehash the "which era was better" debate; I am simply pointing out that George Mikan's career numbers really are somewhat similar to, say, Larry Jones's or Geoff Petrie's (okay, really only barely to Petrie's), even if they achieved them in radically different ways. Once we get a larger database of players to choose from, we'll be able to tweak those scores and come up with guys who are truly similar to each other.

c. In addition, several key statistics (3-pointers, turnovers, blocks, steals) have only been tracked for the last 20-30 years, depending on the stat. Turnovers alone, for example, make it very, very hard to compare pre-1979 players to their post-1979 counterparts, *even if* we hadn't entered a dead-ball era around 1995-96.

> His list also included the likes of Curtis Perry, Horace Grant, AC Green, etc. -- all instantly recognizable as kindred of the PF banger Oakley. But Hawes was a spot-up-shooting (79% on FTs -- few non-Malone PFs achieve that), mediocre-rebounding center, nothing like the Oakleys, Silases, Grants, Greens, etc. on that list.

Mike's stat is far from perfect. However, we're still in relative infancy regarding basketball sabermetrics (there has to be a better word for this).

John Craven