## RE: [APBR_analysis] Re: similarity scores

Message 1 of 42 , Dec 8, 2004
That pretty well sums up the question, yes.

--MKT

-----Original Message-----
From: Gabe Farkas [mailto:gabefark@...]
Sent: Wednesday, December 08, 2004 7:51 AM
To: APBR_analysis@yahoogroups.com
Subject: RE: [APBR_analysis] Re: similarity scores

I think then we have to first define our question. Are we asking

"What are the standard deviations for statistics
accumulated by NBA players?"

or

"What are the standard deviations for statistics
accumulated in NBA games?"

> The problem is that for every Kobe, there's
> not just one, there's more like, oh, maybe
> FIVE scrubs like Sundov. (It's always easier
> to find mass quantities of the untalented,
> whereas the most talented are extremely rare.)
> So the unweighted average will heavily reflect
> those scrubby players, because there's a lot
> more of them than there are Kobes in the league.
>
> But when you look at who's actually on the
> court, Kobe probably plays MORE minutes
> than all of those five PUT TOGETHER!
>
> So the "average player", in terms of what the
> fans see, in terms of who the opponents
> play against, in terms of who's generating
> the statistics on the court, should reflect
> Kobe's stats more heavily (assuming that he
> indeed plays more minutes than the five Sundovs
> combined), and the combined scrubs' stats less
> heavily.
>
> And the individual scrubs would get their
> weights divided by about 5 (if there's five
> of them, playing approximately equal minutes).
> That's a small weight indeed, but one which
> properly reflects their small impact on what
> happens on the court, due to their small minutes.
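To make the Kobe-versus-five-scrubs point concrete, here is a minimal Python sketch comparing an unweighted mean with a minutes-weighted mean. The 28 and 2 ppg figures come from the thread; the minutes totals (3000 for the star, 500 for each scrub) and the function names are assumptions invented purely for illustration.

```python
def unweighted_mean(values):
    """Every player counts once, regardless of playing time."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each player counts in proportion to his weight (here, minutes)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# One star and five scrubs. The minutes are hypothetical, chosen so the
# star plays more than the five scrubs combined (3000 > 5 * 500).
ppg = [28.0, 2.0, 2.0, 2.0, 2.0, 2.0]
minutes = [3000, 500, 500, 500, 500, 500]

print(unweighted_mean(ppg))         # about 6.33, dominated by the scrubs
print(weighted_mean(ppg, minutes))  # about 16.18, dominated by the star
```

Under these made-up minutes, each scrub carries one sixth of the star's weight, which is the "small weight indeed" described above.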
>
> --MKT
>
>
> -----Original Message-----
> From: Gabe Farkas [mailto:gabefark@...]
> Sent: Wednesday, December 08, 2004 6:12 AM
> To: APBR_analysis@yahoogroups.com
> Subject: RE: [APBR_analysis] Re: similarity scores
>
>
>
> I definitely understand your example, but I'm still
> not sure I agree.
>
> If our original challenge is to establish a standard
> dev for all players in the NBA, then what would
> weighting by minutes accomplish? I would think that
> for every 28 ppg scorer like Kobe, a 2 ppg scrub like
> Bruno Sundov would balance him out on the other tail.
> Of course, until someone does the analysis, we won't
> see which way makes more sense...
>
>
>
> > -----Original Message-----
> > From: Gabe Farkas [mailto:gabefark@...]
> > Sent: Wednesday, December 08, 2004 4:54 AM
> >
> > >--- Justin Kubatko <jkubatko@...> wrote:
> > >
> > >> My idea is to do what you suggested: weight by
> > >> minutes played. You must use some weighting factor,
> > >> otherwise you have the big problem Michael mentioned:
> > >> a 10-day replacement carries the same weight as
> > >> Kobe Bryant.
> > >
> > >
> > >Isn't that part of the point of a normal distribution?
> > >That all measures carry the same weight?
> >
> > No, normality doesn't have anything to do with weights, not
> > directly. Sometimes in fact it is necessary to weight the
> > observations in order to achieve normality, or something
> > along the lines of normality (homogeneity, or homoscedasticity).
> > Not that normality in and of itself is usually a critical
> > characteristic to achieve.
> >
> >
> > The problem with not weighting the players is this: out of the
> > 400 or so people who will play in the NBA this year, only
> > 24 will be literal all-stars. A little over 1/3 will be
> > what we would call "starters". The other 2/3 will be subs,
> > benchwarmers, replacement players, etc.
> >
> > If we calculate unweighted averages (and standard deviations)
> > based on those 400 observations, the average player will
> > be a pretty crummy one. For example, the median player
> > (out of 400, this would be the one who's about the 200th
> > best) won't even be of starting quality, being ranked 200th,
> > with only about 150 starters in the league.
> >
> > Now, as you say, maybe that's what we want. Compare the Kobes
> > to that mediocre median guy.
> >
> > But the problem with this is that this yields
> > player". Because
> > those players below the median, indeed those players below the
> > level of quality of the starters, are all playing significantly
> > fewer minutes on average than the top 1/3 players are. More minutes
> > go to the starters.
> >
> > So most of the time, what we see on an NBA court are not guys whose
> > average quality is that of the 200th best player. Rather we see
> > guys whose average quality is better than that.
> >
> >
> > If that's not clear, here's an example. Suppose the Sonics have 12
> > players on their roster, 5 starters and 7 subs. Moreover, assume
> > that they give the starters ALL of the minutes, and never play the
> > subs at all.
> >
> > Question: what is the quality of the median Sonic player? Using
> > unweighted statistics, we'd find the 6th and 7th players out of
> > the 12, and the mean of their stats would yield the median Sonic
> > -- the middle guy out of the 12.
> >
> > But, under the extreme minutes assumption, that median Sonic
> > NEVER plays! What the Sonics put onto the court are their 5
> > starters, and the median quality of the Sonics players who
> > are actually playing is the median of those 5, i.e. their
> > 3rd best player (who arguably could be Ridnour, Fortson,
> > or Antonio Daniels).
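The Sonics thought experiment above can be sketched in Python as a plain median versus a minutes-weighted median. This is only an illustration under stated assumptions: the 12-to-1 "quality" ratings, the 48 minutes per starter, and the `weighted_median` helper are all hypothetical, and the helper returns the lower weighted median when the running weight lands exactly on half the total.

```python
from statistics import median


def weighted_median(values, weights):
    """Smallest value at which the running weight reaches half the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= total / 2:
            return value


# Hypothetical quality ratings: 12 = best player, 1 = worst.
quality = list(range(12, 0, -1))
# Extreme assumption from the example: starters get ALL of the minutes.
minutes = [48] * 5 + [0] * 7

print(median(quality))                    # 6.5, mean of the 6th and 7th best
print(weighted_median(quality, minutes))  # 10, the 3rd best player
```

With zero minutes for every sub, the weighted median lands on the third-best starter, matching the argument in the message.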
> >
> >
> > Of course, teams are not that extreme in allocating minutes
> > to their players. But this example illustrates the problem with
> > looking at all 400 players weighted equally. It gives too
> > much prominence to players who are rarely on the court, and
> > who are rarely doing their lousy shooting, high turnovers,
> > lack of blocks, etc. The resulting means and standard
> > deviations would not reflect what's actually happening on
> > the court.
> >
> >
> > I guess I've convinced myself not only that weighting is
> > the way to go, but the weights should be based on minutes
> > played, in order for the means to correspond to
>
=== message truncated ===


Message 42 of 42 , Dec 20, 2004
Thank you. I am an idiot sometimes. It is a challenge sometimes to
write out correct and clear instructions. And to think this is what
I do for a living. :)

1. Compute the weighted mean.
2. Compute the squared deviations from the weighted mean.
3. Weight the squared deviations by minutes played.
4. Sum the weighted squared deviations.
5. Divide this sum by the sum of minutes played.
6. Take the square root of this weighted average of the squared
deviations.
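
For anyone who would rather not do this by hand in Excel, the six steps above can be sketched in Python roughly as follows. The function name and the sample numbers are made up for illustration; the weights are used as-is (not squared), and the divisor is the plain sum of minutes with no degrees-of-freedom adjustment.

```python
import math


def weighted_std(values, minutes):
    """Minutes-weighted standard deviation, following steps 1-6 above."""
    total_minutes = sum(minutes)
    # Step 1: weighted mean.
    mean = sum(v * m for v, m in zip(values, minutes)) / total_minutes
    # Step 2: squared deviations from the weighted mean.
    squared_devs = [(v - mean) ** 2 for v in values]
    # Step 3: weight the squared deviations by minutes played.
    weighted = [d * m for d, m in zip(squared_devs, minutes)]
    # Step 4: sum the weighted squared deviations.
    total = sum(weighted)
    # Step 5: divide by the sum of minutes played.
    variance = total / total_minutes
    # Step 6: square root.
    return math.sqrt(variance)


# Hypothetical rebounds per 48 minutes and minutes played for three players.
print(weighted_std([10.0, 6.0, 2.0], [3000, 1000, 1000]))  # about 3.2
```

When every player has the same minutes, this reduces to the ordinary population standard deviation.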

And I do not think there is any need to square the weights. I do
not believe this is what is done in most typical regression packages.

Best wishes,
Dan

wrote:
> Yup, although if it's a standard deviation that we're
> calculating, I think what you want in step 2 is to
> SQUARE the deviations.
>
> And in 5, although dividing by the sum of minutes played
> is good, arguably better might be to reduce that
> figure slightly to correct for degrees of freedom,
> by multiplying the minutes played by (N-1)/N.
>
> And given the squaring that I describe in step 2,
> then there of course needs to be a step 6: take
> the square root, after you finish step 5.
>
> The procedure that I describe is, e.g., the one
> described in the National Institute of Standards
> and Technology's nifty statistics website:
>
> http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf
>
> Come to think of it, in a regression framework,
> wouldn't we square the weights too, in Step 2?
> Ah, I'll worry about that later. DanR's procedure
> is the one I'd follow, but with the amendments listed
> above.
>
>
> --MKT
>
>
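Written out, the amended procedure in the message above (squared deviations, minutes as weights, and the (N-1)/N adjustment to the divisor) would look something like the following. The symbols are assumptions chosen here: x_i is a player's statistic, w_i his minutes played, and N the number of players. As I understand the NIST write-up, it counts only observations with nonzero weight in N, so treating N as the full player count is a simplification.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i},
\qquad
s_w = \sqrt{\frac{\sum_{i=1}^{N} w_i \,(x_i - \bar{x}_w)^2}
                 {\frac{N-1}{N}\,\sum_{i=1}^{N} w_i}}
```

Dropping the (N-1)/N factor gives the version that divides by the plain sum of minutes. With several hundred players the two differ by well under one percent.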
> -----Original Message-----
> From: dan_t_rosenbaum [mailto:rosenbaum@u...]
> Sent: Saturday, December 18, 2004 10:04 AM
>
>
> I just compute standard deviations weighted by minutes played. I
> could not find where Excel does this, but what you could do is the
> following.
>
> 1. Compute the weighted mean.
> 2. Compute the deviations from the weighted mean.
> 3. Weight the deviations by minutes played.
> 4. Sum the weighted deviations.
> 5. Divide this sum by the sum of minutes played.
>
> --- In APBR_analysis@yahoogroups.com, "thedawgsareout"
> <kpelton08@h...> wrote:
> >
> > > The notion of the average representing the range of
> > > players that you'd actually see play is an interesting
> > > one.
> > >
> > > I think what it comes down to is this: do we want an
> > > average of what happens during NBA games, or an
> > > average of what NBA players do? You're advocating the
> > > former, and I guess I am asking about the latter.
> > >
> > > Either way is fine, I guess it comes down to
> > > semantics.
> >
> > Maybe someone's mentioned this and I've missed it, but what do you
> > guys plan to do about standard deviation if you use some sort of
> > weighted system?
> >
> > I would argue that in this case, standard deviation is far more
> > important than average. You're not going to change average very much
> > depending on what population you use, but standard deviation changes
> > quite significantly. The reason you don't use low-minutes guys isn't
> > because they're not NBA players; it's because their stats are
> > obviously not significant.
> >
> > Let's use rebounds per 48 minutes last year as an example.
> >
> > If you take the pure average of everyone in the league, you get
> > 8.38. If you weight by minutes, you get 8.38. If you cut off at 250
> > minutes and take the pure average (which is what I do), you get 8.52.
> >
> > There's a difference there, but not an enormous one.
> >
> > If you take the standard deviation of guys with 250 minutes or more,
> > it's 3.52. The standard deviation of everyone is 3.76. That's a
> > bigger difference (though you could argue that because changing
> > average takes guys from above average to below it, it's more
> > significant).
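The three populations compared above (everyone unweighted, everyone weighted by minutes, and a 250-minute cutoff) can be put side by side with a short Python sketch. Everything here is illustrative: the helper names and the four-player sample are invented, and the sketch uses the population form of the standard deviation, which may not be the form behind the 3.52 and 3.76 figures in the message.

```python
import math


def mean_sd(values, weights=None):
    """Mean and population SD; equal weights unless weights are given."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


def compare(players, cutoff=250):
    """players: list of (rebounds_per_48, minutes) pairs."""
    rates = [rate for rate, _ in players]
    mins = [m for _, m in players]
    kept = [rate for rate, m in players if m >= cutoff]
    return {
        "everyone": mean_sd(rates),
        "minutes_weighted": mean_sd(rates, mins),
        "cutoff": mean_sd(kept),
    }


# Hypothetical league: three regulars and one low-minutes outlier.
sample = [(12.0, 2000), (8.0, 2000), (4.0, 2000), (20.0, 100)]
for label, (mean, sd) in compare(sample).items():
    print(f"{label}: mean={mean:.2f} sd={sd:.2f}")
```

In this toy sample the low-minutes outlier inflates the unweighted spread, while both the cutoff and the minutes weighting damp it.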