First of all, please excuse the inarticulate nature with which I'm

about to explain my problem:

I'm tracking team performances from year to year in the NBA and NHL.

The methods I'm using are simple but have some problems. The first is

winning percentage minus the mean (.500) divided by the standard

deviation for winning percentages that year. The second is point (or

goal) differential divided by the standard deviation for point

differential that year. Rob Neyer and Eddie Epstein did this for their

"Baseball Dynasties" book, although they made the mistake of doing

separate deviations for runs scored and runs allowed and then adding

them together.

I know that the maximum standard deviations for any team in any given

year is square root (n-1) where n = the number of teams in the league

that year, and I've factored that in.

But there's another factor, and I don't know how to resolve it.

There's also a maximum SD possible for a team depending on how high or

low the league standard deviation is. Um, right?

For winning percentage, I think it's one divided by the league SD. Is

that right?

But for linear numbers like point and goal differential, I have no

clue where to start. In 1976, the standard deviation for point

differential in the league was historically low, making the Golden

State Warriors of that year look like one of the best teams of all

time. While the Warriors were really good, they have no control over

how even the rest of the league is, so I'm trying to account for that.

Is there a way to account for that? Preferably in the form of a

formula I can toss into Excel?

Hope the question is clear.

Moné

----- Original Message -----

From: "monepeterson" <mone@...>

To: <APBR_analysis@yahoogroups.com>

Sent: Thursday, November 27, 2003 3:38 PM

Subject: [APBR_analysis] Maximum standard deviations (math help needed)

>

>First of all, please excuse the inarticulate nature with which I'm

>about to explain my problem:

>

>I'm tracking team performances from year to year in the NBA and NHL.

>

>The methods I'm using are simple but have some problems. The first is

>winning percentage minus the mean (.500) divided by the standard

>deviation for winning percentages that year. The second is point (or

>goal) differential divided by the standard deviation for point

>differential that year. Rob Neyer and Eddie Epstein did this for their

>"Baseball Dynasties" book, although they made the mistake of doing

>separate deviations for runs scored and runs allowed and then adding

>them together.

>

Point differential is directly related to winning percentages, a la the

Pythagorean method. Since the Pyth is scaled to the overall points scored

environment, you'd probably be better off using that instead of pts diff.
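A minimal sketch of the Pythagorean idea mentioned above, in Python. The exponent is an assumption: values around 14 are often quoted for the NBA, but no value is given in this thread, so treat it as a tunable parameter. The per-game averages are made up for illustration.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected winning percentage from points scored and points allowed.

    exponent=14.0 is an assumed value, not one stated in this thread.
    Written as 1 / (1 + (PA/PF)^x), which equals PF^x / (PF^x + PA^x).
    """
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)

# Hypothetical teams: the same +5 point differential in two scoring environments
high_scoring = pythagorean_win_pct(105.0, 100.0)
low_scoring = pythagorean_win_pct(95.0, 90.0)
print(round(high_scoring, 3), round(low_scoring, 3))
```

The same +5 differential is worth a little more in the lower-scoring environment, which is the sense in which the Pythagorean figure is scaled to the scoring level.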

>I know that the maximum standard deviations for any team in any given

>year is square root (n-1) where n = the number of teams in the league

>that year, and I've factored that in.

>

I'm not sure what you mean here. Standard deviations are a measure of

variation, which is a property of groups (like a league), not individual

teams. You can have a maximum possible SD of winning percentages for a

league (~ .500 I think for a 29 team league), but there is no maximum

possible SD for a team.

>But there's another factor, and I don't know how to resolve it.

>There's also a maximum SD possible for a team depending on how high or

>low the league standard deviation is. Um, right?

>

I think I see what you mean now. Are you using "standard deviation" to mean

the result of your equation (win% - .500)/SD? If so, the maximum possible

result in an 82-game schedule is a little more than 5. (That number comes

from a 29 team league in which one team has 82 wins, 15 teams have 40 wins,

and 13 teams have 39 wins. That is, a league with a low SD and one team with

an extreme win%.)
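The 29-team example above can be checked with a short Python sketch. It uses only the win totals given in the message; the choice between population SD and sample SD is an assumption, since the thread does not say which one is meant, so both are shown. In Excel these correspond to STDEVP and STDEV.

```python
import math
import statistics

GAMES = 82
# The league described above: one 82-win team, fifteen 40-win teams, thirteen 39-win teams
wins = [82] + [40] * 15 + [39] * 13

win_pcts = [w / GAMES for w in wins]
mean_pct = statistics.mean(win_pcts)        # league mean, .500 here
league_sd = statistics.pstdev(win_pcts)     # population SD (Excel STDEVP)
sample_sd = statistics.stdev(win_pcts)      # sample SD (Excel STDEV)

# Standard deviations above the mean for the 82-win team
best_z = (max(win_pcts) - mean_pct) / league_sd
best_z_sample = (max(win_pcts) - mean_pct) / sample_sd

# With the population SD, no value in a group of n can sit more than
# sqrt(n - 1) standard deviations from the mean
bound = math.sqrt(len(wins) - 1)

print(round(best_z, 2), round(best_z_sample, 2), round(bound, 2))
```

The result lands just under the sqrt(n-1) ceiling rather than exactly on it because the other 28 teams cannot all have identical records: the remaining 1107 wins do not divide evenly among 28 teams.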

>For winning percentage, I think it's one divided by the league SD. Is

>that right?

>

>But for linear numbers like point and goal differential, I have no

>clue where to start. In 1976, the standard deviation for point

>differential in the league was historically low, making the Golden

>State Warriors of that year look like one of the best teams of all

>time. While the Warriors were really good, they have no control over

>how even the rest of the league is, so I'm trying to account for that.

>Is there a way to account for that? Preferably in the form of a

>formula I can toss into Excel?

>

>Hope the question is clear.

>

>Moné

>

Perhaps you can tell us why you need to know the maximum SD?

ed

-----Original Message-----

From: igor eduardo küpfer [mailto:igorkupfer@...]

Sent: Friday, November 28, 2003 9:13 PM

----- Original Message -----

From: "monepeterson" <mone@...>

To: <APBR_analysis@yahoogroups.com>

Sent: Thursday, November 27, 2003 3:38 PM

>>I know that the maximum standard deviations for any team in any given

>>year is square root (n-1) where n = the number of teams in the league

>>that year, and I've factored that in.

>>

>

>I'm not sure what you mean here. Standard deviations are a measure of

>variation, which is a property of groups (like a league), not individual

>teams. You can have a maximum possible SD of winning percentages for a

>league (~ .500 I think for a 29 team league), but there is no maximum

>possible SD for a team.

Ed's answer is right on.

>>But there's another factor, and I don't know how to resolve it.

>>There's also a maximum SD possible for a team depending on how high or

>>low the league standard deviation is. Um, right?

>>

>

>I think I see what you mean now. Are you using "standard deviation" to mean

>the result of your equation (win% - .500)/SD? If so, the maximum possible

I think again Ed is probably correctly interpreting what Moné is trying to

do. The (win%-.500)/SD formula is known as a "standardized score" or "z-score".

It provides a way to statistically mix apples and oranges by putting them all

on a single, standardized scale, namely the number of standard deviations above

or below the mean. It would indeed be a pretty good way of looking at teams'

performances over the years -- simply looking at raw win percentages would

be pretty good as well, but some people might be concerned that a .600 record

may be more meaningful in a competitive, low standard deviation league (the

NBA in 1977 e.g., where no team won more than 53 games nor won fewer than 22)

than in a league with several teams around 60 wins.
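As a rough illustration of the standardized score described above, here is a small Python sketch. The two six-team leagues are invented numbers, not real seasons, and the population SD is an assumed choice. The same function works for point or goal differential, since it only needs one number per team.

```python
import statistics

def z_scores(values):
    """Standardized scores: (value - league mean) / league SD (population SD)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Two made-up leagues, both averaging .500, each containing one .600 team
tight_league = [0.600, 0.520, 0.500, 0.480, 0.460, 0.440]
spread_league = [0.750, 0.600, 0.550, 0.450, 0.400, 0.250]

z_tight = z_scores(tight_league)[0]     # the .600 team in the tight league
z_spread = z_scores(spread_league)[1]   # the .600 team in the spread-out league
print(round(z_tight, 2), round(z_spread, 2))
```

The .600 team rates much higher in the tightly bunched league, which is exactly the effect the original question is worried about: a low league SD inflates the best team's score.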

>result in an 82-game schedule is a little more than 5. (That number comes

>from a 29 team league in which one team has 82 wins, 15 teams have 40 wins,

>and 13 teams have 39 wins. That is, a league with a low SD and one team with

>an extreme win%.)

Again a very good answer, that's the way to get the maximum number of standard

deviations above the mean. BTW, the Nobel prize-winning economist Paul

Samuelson once published an article with a title something like "How Many

Standard Deviations Above the Mean Can You Be?" which worked through that

and other similar calculations. The rumor was that Samuelson was inspired

to write the article because he was fond of dismissing inferior thinkers by

proclaiming that they had IQs which were "a million standard deviations

below the mean". Eventually he wondered if, even with x billion human

beings, it was mathematically possible to be one million standard deviations

below the mean.
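The arithmetic behind that question can be sketched in Python, assuming the population-SD bound of sqrt(n-1) for a group of n. The 6 billion figure is a rough assumed world population, used only for scale.

```python
import math

def min_group_size(z):
    """Smallest n for which sqrt(n - 1) >= z, assuming the population-SD bound."""
    return math.ceil(z * z) + 1

# People needed before anyone could be a million SDs from the mean
needed = min_group_size(1_000_000)

# Largest possible distance from the mean, in SDs, with roughly 6 billion people
max_z = math.sqrt(6_000_000_000 - 1)

print(needed, round(max_z))
```

Under that bound, a few billion people is far too few: the group would need about a trillion members.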

--MKT

P.S. On a different note, I got back from vacation to find two delightful

deliveries at my door: JohnH's _Basketball Prospectus_ and DeanO's

_Basketball on Paper_. Due to returning late Monday night, I absolutely

did not and still do not have time to really look at them, but DeanO's

chapter on Cummings, Kemp, and Sikma was one I couldn't resist reading

(I think it's fair to say the chapter was very much inspired by our

APBR discussions of these players, especially Cummings). I also wanted

to see how he used the defensive scoresheets from his WNBA Defensive

Scoresheet project; there's been some interesting work by Michael

Humphreys which he's partially revealed at baseballprimer.com on

defense by baseball players and it could be an interesting contest to

see which sport is able to make more progress in shedding light into

one of their biggest black holes: evaluating defense by individual

players. Baseball of course does have zone ratings and other related

detailed observational measures ... but DeanO has defensive

scoresheets, at least for one WNBA season anyway.

--- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>

wrote:

> ...DeanO's

> chapter on Cummings, Kemp, and Sikma was one I couldn't resist reading

> (I think it's fair to say the chapter was very much inspired by our

> APBR discussions of these players, especially Cummings)...

Well, I hope he dedicated the chapter to moi !

Sikma hasn't been hammered (as a subject) like the other 2. Maybe

he's due.

Without the book, I can only guess DeanO doesn't like Kemp's

turnovers nor Cummings' so-so shooting %. Hey, Kemp got plenty of

MVP votes in spite of his middling minutes.

My own efforts have been to give credit where it's due, and

popularity be damned. It's more gratifying to recognize and

acknowledge, than to join a chorus of yeas or nays.

It's worth noting that in a couple of years, DeanO may have come to

recognize that backward-looking analysis is actually fun and

interesting to many people. It doesn't predict anything useful,

perhaps. Maybe it helps sell books.

--- In APBR_analysis@yahoogroups.com, "Mike G" <msg_53@h...> wrote:

> --- In APBR_analysis@yahoogroups.com, "Michael Tamada" <tamada@o...>

> wrote:

> > ...DeanO's

> > chapter on Cummings, Kemp, and Sikma was one I couldn't resist reading

> > (I think it's fair to say the chapter was very much inspired by our

> > APBR discussions of these players, especially Cummings)...

>

> Well, I hope he dedicated the chapter to moi !

>

You are certainly thanked for putting out lists that people hammer on.

> Sikma hasn't been hammered (as a subject) like the other 2. Maybe

> he's due.

>

> Without the book, I can only guess DeanO doesn't like Kemp's

> turnovers nor Cummings' so-so shooting %. Hey, Kemp got plenty of

> MVP votes in spite of his middling minutes.

You should probably read it. It's a lot easier to see where my take

comes from with the full context. At least add it to your Christmas

list...

> My own efforts have been to give credit where it's due, and

> popularity be damned. It's more gratifying to recognize and

> acknowledge, than to join a chorus of yeas or nays.

>

> It's worth noting that in a couple of years, DeanO may have come to

> recognize that backward-looking analysis is actually fun and

> interesting to many people. It doesn't predict anything useful,

> perhaps. Maybe it helps sell books.

In the book, you'll see why I looked at these players. Specifically,

I looked at different classes of players to understand how large a

contribution they could make to teams. I _know_ that Kalb's book on

the 50 Greatest is going to sell a helluva lot more than mine because

his focus is on who is better than who (pub debate material), whereas

mine is how to build a better team (front office debate material).

When I look back, I do so to understand how best to go forward, not

just to discuss who's better. In doing that, identifying the types of

players that add wins and losses in various amounts is quite useful.

But rank 'em? I provide wins and losses (which I think is better than

Bill James' win shares), you can rank how you like.

DeanO