> Ah, the disadvantage of using purely empirical numbers. They're numbers
> from the real world, but such numbers are always contaminated with random
> errors, hence spikes. The theoretical numbers would almost certainly show
> a smooth pattern. If our theories are good enough (I don't know if they
> are in this case) then the best result is usually obtained by combining
> theory and data: start with the raw data but then smooth it in accordance
> with theory.

> But if we don't have a good theoretical reason for smoothing (maybe the
> real NBA percentages really are supposed to show a spike) then we
> shouldn't.

No, I think smoothing is OK and that the spiking is random. I just
haven't done the smoothing. One way we could get a sense for whether
the spiking is random is to have DeanL generate the curve for a
different set of bins and see if the spikes move. That would also
give us a fair way to smooth, rather than my arbitrary hand.
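A minimal sketch of that re-binning check, on made-up data. Everything here is an assumption for illustration: the win% values are synthetic draws from a smooth bell curve (not DeanL's data), and `histogram`, the bin width, and the half-bin offset are hypothetical choices. The idea is only that a spike produced by noise should land at a different win% once the bin edges move, while a real one should stay put.

```python
import random

def histogram(values, lo, hi, width, offset=0.0):
    """Count values in equal-width bins whose edges are shifted down by `offset`."""
    start = lo - offset
    n_bins = int((hi - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) / width)] += 1
    # Return (left edge of bin, count) pairs.
    return [(start + i * width, c) for i, c in enumerate(counts)]

# Synthetic win% values (an assumption, not real NBA data): smooth underlying
# shape plus sampling noise, clipped to the range [0, 1].
random.seed(0)
win_pcts = [min(1.0, max(0.0, random.gauss(0.5, 0.15))) for _ in range(300)]

original_bins = histogram(win_pcts, 0.0, 1.0, 0.05)
shifted_bins = histogram(win_pcts, 0.0, 1.0, 0.05, offset=0.025)

for label, bins in (("original", original_bins), ("shifted", shifted_bins)):
    edge, count = max(bins, key=lambda pair: pair[1])
    print(f"{label}: tallest bin starts at {edge:.3f} with {count} teams")
```

Averaging the counts from the two binnings would be one mechanical way to smooth, in place of smoothing by hand.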

> > numbers don't add up because of rounding (though I'll double check
> > later). The win% makes sense to me. If a win streak of 1G leads to
> > an expected win% of just over 0.500 (the average of the prior), every
> > longer streak goes a little higher. This doesn't seem out of norm.

>
> We may be using different definitions of win streaks. I'm thinking
> that if I'm told that a team had a 15 game winning streak, that means
> that that was the LONGEST winning streak that the team achieved. And if,
> over an 82-game season, I am told that a team's longest winning streak
> was 1 game, then I'd expect that that was a very very bad team, not a
> .500 one. Even the Bulls this year have had a 2-game winning streak
> (Dec 29 and Dec 31).

We're going at the same thing, but the method has to be used in
different ways to get at the answers we're looking for. I wouldn't
use the method to test what a 1G winning streak means. I would look
at the team's longest losing streak and plug that in. If a team's
longest winning streak is 5G and its longest losing streak is 5G,
that generally brackets its likely win% pretty well. If a team's
longest winning streak is 5G, but its longest losing streak is 10G, I
use the 10G streak to ascertain that the team is likely a 27 win
team. But I'd also assume that the two streaks are independent events
and multiply the probabilities together. That would suggest that the
most likely range of winning %'s is 35-40%, which is a little better
than a 27 win season. (This assumes that the tables I gave you can be
flipped to work with losing streaks, which, strictly, they cannot. I
would strictly have to smooth the distribution so that it's
symmetric. I'm lazy.)
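A rough sketch of the "multiply the probabilities" step, not the tables themselves. It swaps in a toy model: every game is an independent coin flip with the same win probability p, and the prior over p is flat on a grid. Both are assumptions made here for illustration, as are the function names. Treating the two streaks as independent is the assumption stated above. Under the coin-flip model the losing-streak likelihood is just the winning-streak one evaluated at 1 - p, so the flipping problem doesn't arise.

```python
def prob_no_run(n_games, run_len, p):
    """P(no run of `run_len` straight successes in n_games), success prob p."""
    if run_len <= 0:
        return 0.0
    # state[j] = P(current run is j long and no run of run_len has occurred)
    state = [0.0] * run_len
    state[0] = 1.0
    for _ in range(n_games):
        new = [0.0] * run_len
        new[0] = sum(state) * (1.0 - p)      # a failure resets the run
        for j in range(1, run_len):
            new[j] = state[j - 1] * p        # a success extends the run
        state = new
    return sum(state)

def prob_longest_equals(n_games, longest, p):
    """P(the longest run of successes is exactly `longest`)."""
    return prob_no_run(n_games, longest + 1, p) - prob_no_run(n_games, longest, p)

def combined_posterior(n_games, longest_win, longest_loss, grid):
    """Flat prior times both streak likelihoods, normalized over the grid."""
    weights = []
    for p in grid:
        like_win = prob_longest_equals(n_games, longest_win, p)
        like_loss = prob_longest_equals(n_games, longest_loss, 1.0 - p)
        weights.append(like_win * like_loss)  # independence assumed
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 100 for i in range(5, 96)]        # candidate win% values
posterior = combined_posterior(82, 5, 10, grid)
mode = grid[posterior.index(max(posterior))]
print(f"most likely win% under the toy model: {mode:.2f}")
```

The numbers from this model need not match the empirical tables, since real teams are not coin flips with a fixed p.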

>
> I suppose you might be using a definition of a win streak something like
> this: if we're told that a team had a 15 game winning streak, then we
> know that it had at least one streak of at least 15 games. That seems to
> me to lead to harder probability calculations. If nothing else, the
> number contains less information now. That 15G team could for all we know
> also have had a 33 game winning streak that same season. Whereas if we
> know that 15G was their longest winning streak, we've got a much better
> idea of what sort of team it's likely to be.

The longest streak has the most info.
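To make the difference between the two definitions concrete, here is a small sketch under an assumed toy model (independent games with a fixed win probability p; the function and variable names are hypothetical). It compares the likelihood of "longest streak is exactly 15" with "at least one streak of 15 or more" across a range of p.

```python
def prob_no_run(n_games, run_len, p):
    """P(no run of `run_len` straight wins in n_games), win prob p per game."""
    if run_len <= 0:
        return 0.0
    # state[j] = P(current win run is j long and no run of run_len so far)
    state = [0.0] * run_len
    state[0] = 1.0
    for _ in range(n_games):
        new = [0.0] * run_len
        new[0] = sum(state) * (1.0 - p)      # a loss resets the run
        for j in range(1, run_len):
            new[j] = state[j - 1] * p        # a win extends the run
        state = new
    return sum(state)

N_GAMES, STREAK = 82, 15
grid = [i / 100 for i in range(30, 100)]     # candidate win% values

# "At least one streak of 15 or more" vs "longest streak is exactly 15".
at_least = [1.0 - prob_no_run(N_GAMES, STREAK, p) for p in grid]
exactly = [
    prob_no_run(N_GAMES, STREAK + 1, p) - prob_no_run(N_GAMES, STREAK, p)
    for p in grid
]

best_exact = grid[exactly.index(max(exactly))]
print(f"'exactly {STREAK}' is most likely at win% {best_exact:.2f}")
print(f"'at least {STREAK}' is most likely at win% {grid[at_least.index(max(at_least))]:.2f}")
```

In this model the "at least" likelihood only keeps rising with p, so it cannot rule out an even stronger team, while the "exactly" likelihood peaks and then falls off.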

DeanO