Re: similarity scores

  • Justin Kubatko
    Message 1 of 42 , Dec 6, 2004
      Kevin Pelton (I think) wrote:

      > I'm not sure, however, that it makes sense to do the same
      > for two-point percentage, in that you don't have the kind
      > of sample size issues with twos you have with threes (i.e.
      > Player X goes 2-3 -- that obviously isn't sufficient to
      > conclude he's a 67% three-point shooter) and in that
      > two-point attempts are already contained somewhat in
      > possession rate.

      The sample size issue was the reason I multiplied 3FG by 3FG%. So
      you're saying you would go with straight two-point shooting
      percentage, correct? How about free throw shooting?

      > I'd also suggest reporting your scores out of 100 instead
      > of 1000, since that's the format we're generally used to.

      That would be easy enough. As I think I mentioned here before, I know
      much more about baseball analysis than basketball analysis, which is
      where the 1000 for a perfect similarity score comes from. I've been
      trying to catch up as fast as I can in the basketball analysis world,
      though.

      --
      Regards,
      Justin Kubatko
      Basketball Stats! http://www.basketball-reference.com
    • dan_t_rosenbaum
      Message 42 of 42 , Dec 20, 2004
        Thank you. I am an idiot sometimes. It is a challenge sometimes to
        write out correct and clear instructions. And to think this is what
        I do for a living. :)

        1. Compute the weighted mean.
        2. Compute the squared deviations from the weighted mean.
        3. Weight the squared deviations by minutes played.
        4. Sum the weighted squared deviations.
        5. Divide this sum by the sum of minutes played.
        6. Take the square root of this weighted average of the squared
        deviations.
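
A minimal Python sketch of the six steps above, for anyone who would rather not build this in Excel. The function name `weighted_std` and the sample numbers are hypothetical and only illustrate the arithmetic; they are not from any real player data.

```python
import math


def weighted_std(values, minutes):
    """Minutes-weighted standard deviation, following the six steps above.

    values  -- one stat per player (e.g. rebounds per 48 minutes)
    minutes -- minutes played by each player, used as the weights
    """
    total_minutes = sum(minutes)
    if total_minutes <= 0:
        raise ValueError("total minutes must be positive")

    # Step 1: weighted mean.
    mean = sum(v * m for v, m in zip(values, minutes)) / total_minutes

    # Steps 2-4: square each deviation from the weighted mean,
    # weight it by minutes played, and sum.
    weighted_ss = sum(m * (v - mean) ** 2 for v, m in zip(values, minutes))

    # Step 5: divide by the sum of minutes played.
    # Step 6: take the square root.
    return math.sqrt(weighted_ss / total_minutes)


if __name__ == "__main__":
    # Made-up illustration: three players with different minutes totals.
    rebounds_per_48 = [6.0, 8.0, 12.0]
    minutes_played = [500, 2000, 1500]
    print(weighted_std(rebounds_per_48, minutes_played))
```

Note that the weights enter linearly (not squared), and that with equal weights this reduces to the ordinary population standard deviation.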

        And I do not think there is any need to square the weights. I do
        not believe this is what is done in most typical regression packages.

        Best wishes,
        Dan

        --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>
        wrote:
        > Yup, although if it's a standard deviation that we're
        > calculating, I think what you want in step 2 is to
        > SQUARE the deviations.
        >
        > And in 5, although dividing by the sum of minutes played
        > is good, arguably better might be to reduce that
        > figure slightly to correct for degrees of freedom,
        > by multiplying the minutes played by (N-1)/N.
        >
        > And given the squaring that I describe in step 2,
        > then there of course needs to be a step 6: take
        > the square root, after you finish step 5.
        >
        > The procedure that I describe is, e.g., the one
        > described in the National Institute of Standards
        > and Technology's nifty statistics website:
        >
        > http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf
        >
        > Come to think of it, in a regression framework,
        > wouldn't we square the weights too, in Step 2?
        > Ah, I'll worry about that later. DanR's procedure
        > is the one I'd follow, but with the amendments listed
        > above.
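
A hedged Python sketch of the amended procedure described in this quoted message: squared deviations, the divisor scaled by (N-1)/N, then a square root. The function name `weighted_std_corrected` is hypothetical, and counting N as the number of players with nonzero minutes is an assumption about how the correction is meant to be applied.

```python
import math


def weighted_std_corrected(values, minutes):
    """Minutes-weighted standard deviation with an (N-1)/N correction.

    Assumption: N is the number of observations with nonzero weight.
    """
    # Keep only players who actually have nonzero minutes.
    pairs = [(v, m) for v, m in zip(values, minutes) if m > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two players with nonzero minutes")

    total_minutes = sum(m for _, m in pairs)

    # Weighted mean.
    mean = sum(v * m for v, m in pairs) / total_minutes

    # Sum of minutes-weighted squared deviations.
    weighted_ss = sum(m * (v - mean) ** 2 for v, m in pairs)

    # Divide by the sum of minutes scaled by (N-1)/N, then take the root.
    return math.sqrt(weighted_ss / (total_minutes * (n - 1) / n))
```

With equal weights this reduces to the usual sample standard deviation (the N-1 divisor), which is one quick sanity check on the correction.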
        >
        >
        > --MKT
        >
        >
        > -----Original Message-----
        > From: dan_t_rosenbaum [mailto:rosenbaum@u...]
        > Sent: Saturday, December 18, 2004 10:04 AM
        >
        >
        > I just compute standard deviations weighted by minutes played. I
        > could not find where Excel does this, but what you could do is the
        > following.
        >
        > 1. Compute the weighted mean.
        > 2. Compute the deviations from the weighted mean.
        > 3. Weight the deviations by minutes played.
        > 4. Sum the weighted deviations.
        > 5. Divide this sum by the sum of minutes played.
        >
        > --- In APBR_analysis@yahoogroups.com, "thedawgsareout"
        > <kpelton08@h...> wrote:
        > >
        > > > The notion of the average representing the range of
        > > > players that you'd actually see play is an interesting
        > > > one.
        > > >
        > > > I think what it comes down to is this: do we want an
        > > > average of what happens during NBA games, or an
        > > > average of what NBA players do? You're advocating the
        > > > former, and I guess I am asking about the latter.
        > > >
        > > > Either way is fine, I guess it comes down to
        > > > semantics.
        > >
        > > Maybe someone's mentioned this and I've missed it, but what do you
        > > guys plan to do about standard deviation if you use some sort of
        > > weighted system?
        > >
        > > I would argue that in this case, standard deviation is far more
        > > important than average. You're not going to change average very much
        > > depending on what population you use, but standard deviation changes
        > > quite significantly. The reason you don't use low-minutes guys isn't
        > > because they're not NBA players; it's because their stats are
        > > obviously not significant.
        > >
        > > Let's use rebounds per 48 minutes last year as an example.
        > >
        > > If you take the pure average of everyone in the league, you get
        > > 8.38. If you weight by minutes, you get 8.38. If you cut off at 250
        > > minutes and take the pure average (which is what I do), you get 8.52.
        > >
        > > There's a difference there, but not an enormous one.
        > >
        > > If you take the standard deviation of guys with 250 minutes or more,
        > > it's 3.52. The standard deviation of everyone is 3.76. That's a
        > > bigger difference (though you could argue that because changing
        > > average takes guys from above average to below it, it's more
        > > significant).