## Winning streak / team strength question

Message 1 of 12, Jan 7, 2002
If one were told that a certain NBA team had an X-game winning streak
where X is the length of that winning streak, would one have enough
information to make a reasonable calculation of that team's minimum
strength? For instance, if I learned that the Spurs had a 15-game
winning streak in one season, I would tend to infer that they had 60+
wins in that season. How do I arrive at a precise figure, if possible?

Message 2 of 12, Jan 14, 2002
I took a stab at this but didn't have time to come close to getting the
correct formula. There's one key part of the inference which I think
requires some assumptions or actual data. Anyway, there's about three
steps (I couldn't even finish step 1):

1. Take a team that plays 82 games and wins W of them. For simplicity, let's
assume that it has an equal chance of winning each game, p = W/82,
although realistically a team will have a higher probability of beating
the 2002 Bulls than the 2002 Lakers -- but how would one incorporate THAT
into the calculation, without knowing every single team's schedule?
Anyway, for any such team with W wins, what is the probability that its
longest winning streak at any point during the season is at most X games?
For X=2 this is fairly easy to calculate...
extremely easy for a 2 game season, not bad for a 3 game season, a little
worse for a 4 game season but you can see a pattern that develops. So
there's the formula for X=2, once you take it out to 82 games.

But I haven't even begun to figure it out for X=3, X=4, or arbitrary X.
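
One way past X=2 is to let a computer do the bookkeeping. The sketch below is a small dynamic program, assuming (as in step 1) that every game is an independent trial with the same win probability p. The function name `prob_longest_streak_at_most` is made up for illustration; it is not from any library or from the thread.

```python
def prob_longest_streak_at_most(n_games, max_streak, p):
    """P(no winning streak longer than max_streak in n_games independent games)."""
    # state[k] = probability that the current winning streak is exactly k games
    # long and no streak so far has exceeded max_streak.
    state = [0.0] * (max_streak + 1)
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * (max_streak + 1)
        new_state[0] = sum(state) * (1.0 - p)   # a loss resets the streak to 0
        for k in range(max_streak):
            new_state[k + 1] = state[k] * p     # a win extends the streak by one
        # A win from state[max_streak] would exceed the cap, so that mass is dropped.
        state = new_state
    return sum(state)


# Example: a team with p = 55/82 over an 82-game season, streak cap of 15.
print(prob_longest_streak_at_most(82, 15, 55 / 82))
```

The small cases mentioned above are a handy check: for a 2-game season and X=1 the only excluded outcome is win-win, so a coin-flip team gives 3/4.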

All of that is the solid, deductive part. But then there's step 2.

2. With the formula above, you can calculate, for any team with W wins,
the probability that its longest winning streak was at most X games. (I
assume in your question about a team with a 15-game winning streak
that that was the team's LONGEST winning streak. Obviously if a team won
15 games in a row, and later in the season won 33 games in a row, the
likely win total is going to be very different.)

Unfortunately, this is not the probability that you need. What you're
really asking for is the opposite: for a team that won X games in a row,
what is the most likely W associated with that team?

And those are very different probabilities. These are called conditional
probabilities, usually denoted p(W|X) and p(X|W). Another example:
what's the probability that a crack cocaine addict started out by first
smoking marijuana? Very high. But before we start talking about how
marijuana puts you on the fast track to crack h*ll, we need to know the
truly relevant probability: what percent of marijuana smokers later go on
to become crack addicts?
[None of the above should be construed as an endorsement of either
activity.]

So all that work in Step 1, though important, is giving us the "wrong"
probability, so to speak: it gave us p(X|W) instead of p(W|X).

How do we find the most likely W? It's an example of Bayes' Rule, for
those of you who have studied probability. Here's a key example: start
by looking at the "wrong" probabilities. A team with W=82 wins will have
as its longest winning streak X=82. So clearly it is no candidate; ditto
a team with W=81 wins, etc., down to W=78. A team with W=77 probably has
some really lengthy winning streaks, but it does have a chance (a small
one) that its longest winning streak was only X=15, if its losses were
distributed just right.

Etc. etc. A team with W=15 of course has almost no chance of having an
X=15 streak. A team with W=14 has, like the W=78 team, literally no
chance of having X=15 be its longest winning streak.
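
The cutoffs at W=78 and W=14 can be checked with a counting argument: the losses split the season into at most (losses + 1) blocks of wins, so the longest block can be no shorter than wins / (losses + 1), rounded up, and no longer than the win total. The helper names below are hypothetical, just a sketch of that argument.

```python
import math


def longest_streak_bounds(wins, games=82):
    """Smallest and largest possible longest winning streak for a given win total."""
    if wins == 0:
        return (0, 0)
    losses = games - wins
    # The losses cut the season into at most (losses + 1) blocks of wins,
    # so at least one block holds ceil(wins / (losses + 1)) wins.
    shortest_possible = math.ceil(wins / (losses + 1))
    return (shortest_possible, wins)


def can_have_longest_streak(wins, streak, games=82):
    """True if some arrangement of the season has `streak` as the longest run."""
    low, high = longest_streak_bounds(wins, games)
    return low <= streak <= high


for w in (82, 78, 77, 15, 14):
    print(w, longest_streak_bounds(w), can_have_longest_streak(w, 15))
```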

Maybe we find that teams with W=55 have the highest probability of
having an X=15 win streak. Remember however that this is not the
conditional probability that we need!

Because an important point is that whereas teams with say W=53 wins
may have a lower probability of winning X=15 games in a row, it is also
true that W=53 teams are more common (have a higher probability of
occurring) than W=55 teams. And that larger probability of occurrence can
override the fact that they may have a harder time (lower probability) of
achieving X=15.

That tradeoff, between teams such as W=55 with high "wrong" probabilities
but low probabilities of existing at all, vs teams such as W=53, is what
Bayes' Rule allows us to precisely measure.

But you may see the problem: what IS the probability of a team achieving
W=53? W=55? W=72?

That brings us to step 3.

3. You've either got to make assumptions about the likelihoods of the
various W levels; or make assumptions about the distribution of teams'
underlying probabilities of winning p, and deduce the resulting W's we'd
observe; or use direct observation on actual NBA data and see how often
W=53, W=55, etc. occurred.

The advantage of direct observation is that
it's real data, not theoretical calculations. The disadvantage is that it
doesn't give you the real probabilities, it only gives you estimates of
the probabilities. E.g., if you literally rely on the data, the
probability of an NBA team winning exactly W=71 games is 0 -- it's never
been done before. Realistically there is SOME probability of it occurring;
it's just that that probability is so low we've never seen it happen.

Anyway, by hook or by crook you need to come up with estimates of how
likely it is to observe W=55, W=69, etc.

Once you've got those, you combine them with the probabilities of the
various X streaks from step 1, apply Bayes' Rule, and you've got your
answer.

Here's a highly simplified example: suppose there are only two kinds of
teams: teams that win W=55 games and teams that win W=41 games.
Assume that teams that win W=55 games have a probability p=.1 of winning
at most X=15 games in a row. In other words, p(X=15|W=55) = .1

In contrast, the W=41 teams have only half the chance of achieving X=15:
p(X=15|W=41) = .05.

But suppose that we know, or assume, or find out, that W=41 teams are
three times more common than W=55 teams: p(W=41) = .75
whereas p(W=55) = .25.

Now suppose we observe that a team has X=15. Now we want to know: what
is the probability that it's a W=41 team, and what is the probability that
it's a W=55 team? Bayes' Rule says that

p(W=41|X=15) =

p(X=15|W=41)*p(W=41) / ( p(X=15|W=41)*p(W=41) + p(X=15|W=55)*p(W=55) ) =

.05*.75 / ( .05*.75 + .1*.25 ) =

.0375 / (.0375+.0250) = .6

So even though 55-win teams are twice as likely to have a 15-game longest
streak as 41-win teams, the fact that 41-win teams are so much more common
makes it more likely that the X=15 team you observed was in fact a 41-win
team: p = .6.
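
The same arithmetic as a few lines of code, for anyone who wants to try other numbers. This is only a sketch of Bayes' Rule over a finite set of hypotheses; the function name `posterior` and the dictionary layout are illustrative choices, and the inputs are the made-up figures from the example above.

```python
def posterior(prior, likelihood):
    """Bayes' Rule: p(W|X) is proportional to p(X|W) * p(W), renormalised to sum to 1."""
    joint = {w: prior[w] * likelihood[w] for w in prior}
    total = sum(joint.values())
    return {w: joint[w] / total for w in joint}


prior = {41: 0.75, 55: 0.25}          # p(W): assumed share of each kind of team
likelihood = {41: 0.05, 55: 0.10}     # p(X=15 | W): assumed streak probabilities

post = posterior(prior, likelihood)
print(post)
```

With more than two kinds of teams the function is unchanged; only the two dictionaries grow.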

Anyway, THAT'S how you'd find a team's probability of having any given W
value, based on observing their longest winning streak X. Easy? No.
Do-able? Yes I think, by using the Three Steps.

It occurs to me that given that you'd almost certainly have to look at
actual NBA data to estimate the probability of W=41, W=69, etc., you might
as well look up each team's longest winning streak while you're at it.
And the resulting data set would show you all the W's and all the X's, and
you could make good estimates straight from that data.

--MKT


Message 3 of 12, Jan 14, 2002

I didn't fully read MikeT's explanation, but I saw Bayes and I saw a
formula that looked almost exactly like one I worked on during my
last plane flight, so I think he's right. Here's what we need.

1. Someone find out the historical distribution of winning
percentages in NBA history. What's the probability of a team winning

10-15% (0.100-0.150)
15.1-20%
20.1-25%
...
85.1-90% (0.851-0.900)

That will give us the prior distribution we need. It should be
clustered around the 0.500 record.

2. I have an excel sheet that calculates the odds of a team with a
given winning percentage having at least 1 streak of X games. I will
look into reconstructing it with the prior distribution from step 1
so that it will automatically calculate the P(win% | streak). If I
can't, I'll generate a rough table.

Someone do step 1 for me. I'll try to find time to work on my part.

Dean Oliver

Message 4 of 12, Jan 14, 2002

> 1. Someone find out the historical distribution of winning
> percentages in NBA history. What's the probability of a team winning

Range         Count
10   - 15 %       5
15.1 - 20 %      22
20.1 - 25 %      25
25.1 - 30 %      66
30.1 - 35 %      52
35.1 - 40 %     101
40.1 - 45 %      75
45.1 - 50 %     152
50.1 - 55 %     119
55.1 - 60 %     122
60.1 - 65 %      92
65.1 - 70 %      90
70.1 - 75 %      47
75.1 - 80 %      25
80.1 - 85 %      10
85.1 - 90 %       2

DeanL

Message 5 of 12, Jan 14, 2002
--- In APBR_analysis@y..., "Dean LaVergne" <deanlav@y...> wrote:
>
> 1. Someone find out the historical distribution of winning
> percentages in NBA history. What's the probability of a team winning
>
> 10 - 15 % 5
> 15.1 - 20 % 22
> 20.1 - 25 % 25
> 25.1 - 30 % 66
> 30.1 - 35 % 52
> 35.1 - 40 % 101
> 40.1 - 45 % 75
> 45.1 - 50 % 152
> 50.1 - 55 % 119
> 55.1 - 60 % 122
> 60.1 - 65 % 92
> 65.1 - 70 % 90
> 70.1 - 75 % 47
> 75.1 - 80 % 25
> 80.1 - 85 % 10
> 85.1 - 90 % 2

Quick calculation based on DeanL's stuff here. If a team has a 10
game winning streak in an 82 game season (number of games in a season
matters), then these are the probabilities of the team's winning
record at the end of the season:

Range Equiv% P(win% | 10 g win streak 82 g)
10    - 15 % 0.125 0%
15.1 - 20 %  0.175 0%
20.1 - 25 %  0.225 0%
25.1 - 30 %  0.275 0%
30.1 - 35 %  0.325 0%
35.1 - 40 %  0.375 0%
40.1 - 45 %  0.425 0%
45.1 - 50 %  0.475 0%
50.1 - 55 %  0.525 1%
55.1 - 60 %  0.575 3%
60.1 - 65 %  0.625 7%
65.1 - 70 %  0.675 12%
70.1 - 75 %  0.725 16%
75.1 - 80 %  0.775 18%
80.1 - 85 %  0.825 20%
85.1 - 90 %  0.875 21%

This is unlikely to be an under 0.500 team. Its expected winning
percentage is 0.762, or a 62 win ballclub.

It will take me longer to make the other tables (any requests?) or
make the spreadsheet make sense to others....

Dean Oliver

Message 6 of 12, Jan 14, 2002
--- In APBR_analysis@y..., "HoopStudies" <deano@r...> wrote:
>
> Range Equiv% P(win% | 10 g win streak 82 g)
> 10    - 15 % 0.125 0%
> 15.1 - 20 %  0.175 0%
> 20.1 - 25 %  0.225 0%
> 25.1 - 30 %  0.275 0%
> 30.1 - 35 %  0.325 0%
> 35.1 - 40 %  0.375 0%
> 40.1 - 45 %  0.425 0%
> 45.1 - 50 %  0.475 0%
> 50.1 - 55 %  0.525 1%
> 55.1 - 60 %  0.575 3%
> 60.1 - 65 %  0.625 7%
> 65.1 - 70 %  0.675 12%
> 70.1 - 75 %  0.725 16%
> 75.1 - 80 %  0.775 18%
> 80.1 - 85 %  0.825 20%
> 85.1 - 90 %  0.875 21%
>
> This is unlikely to be an under 0.500 team. Its expected winning
> percentage is 0.762, or a 62 win ballclub.
>

Sorry. The above #'s are wrong (that's why I said "Quick"). I need
to do some QC. I think the following are right.

Range P(win% | 10 g win streak in 82 g)
10   - 15 % 0%
15.1 - 20 %  0%
20.1 - 25 %  0%
25.1 - 30 %  0%
30.1 - 35 %  0%
35.1 - 40 %  0%
40.1 - 45 %  0%
45.1 - 50 %  3%
50.1 - 55 %  5%
55.1 - 60 %  12%
60.1 - 65 %  18%
65.1 - 70 %  28%
70.1 - 75 %  18%
75.1 - 80 %  10%
80.1 - 85 %  4%
85.1 - 90 %  1%

Expected win% = 0.666 (or 55-27).
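
For readers without the spreadsheet, here is a rough sketch of how a table like this can be built. It is not the spreadsheet itself, and several things are assumptions: each bin is represented by its midpoint (as in the "Equiv%" column of the earlier table), DeanL's counts are used unsmoothed as the prior, games are independent with a constant win probability, and the event is "at least one streak of 10 or more wins". The function names are invented for illustration, and the output need not match the figures above exactly.

```python
# Team counts per win% bin (10-15% up to 85-90%), copied from DeanL's message.
COUNTS = [5, 22, 25, 66, 52, 101, 75, 152, 119, 122, 92, 90, 47, 25, 10, 2]
MIDPOINTS = [0.125 + 0.05 * i for i in range(16)]


def prob_streak_at_least(n_games, streak, p):
    """P(at least one run of `streak` straight wins in n_games independent games)."""
    # state[k] = P(current run is k wins long and no run of `streak` has happened yet)
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * streak
        new_state[0] = sum(state) * (1.0 - p)
        for k in range(streak - 1):
            new_state[k + 1] = state[k] * p
        state = new_state
    return 1.0 - sum(state)


def posterior_by_bin(streak, n_games=82):
    """P(win% bin | streak observed), using the raw bin counts as the prior."""
    weights = [c * prob_streak_at_least(n_games, streak, p)
               for c, p in zip(COUNTS, MIDPOINTS)]
    total = sum(weights)
    return [w / total for w in weights]


post = posterior_by_bin(10)
expected = sum(p * q for p, q in zip(MIDPOINTS, post))
for p, q in zip(MIDPOINTS, post):
    print(f"{p:.3f}  {q:6.1%}")
print(f"expected win% = {expected:.3f}")
```

Calling `posterior_by_bin(5)` or `posterior_by_bin(15)` gives the corresponding tables for other streak lengths under the same assumptions.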

We should expect that a winning streak of 5 games won't tell us much,
i.e., that the expected winning % is closer to 0.500 and the
distribution will be more spread.

Dean Oliver

Message 7 of 12, Jan 14, 2002
Hmmm, here are the implications of a 5-game winning streak:

Range P(win% | 5 g win streak 82 g)
10    - 15 % 0%
15.1 - 20 %  0%
20.1 - 25 %  0%
25.1 - 30 %  1%
30.1 - 35 %  2%
35.1 - 40 %  6%
40.1 - 45 %  7%
45.1 - 50 %  17%
50.1 - 55 %  15%
55.1 - 60 %  16%
60.1 - 65 %  12%
65.1 - 70 %  12%
70.1 - 75 %  6%
75.1 - 80 %  3%
80.1 - 85 %  1%
85.1 - 90 %  0%

Expected win% = 0.559 (46-36)

Here it is for a 15-game win streak:

Range P(win% | 15 g win streak 82 g)
10    - 15 % 0%
15.1 - 20 %  0%
20.1 - 25 %  0%
25.1 - 30 %  0%
30.1 - 35 %  0%
35.1 - 40 %  0%
40.1 - 45 %  0%
45.1 - 50 %  0%
50.1 - 55 %  1%
55.1 - 60 %  3%
60.1 - 65 %  7%
65.1 - 70 %  21%
70.1 - 75 %  27%
75.1 - 80 %  26%
80.1 - 85 %  13%
85.1 - 90 %  3%

Expected win% = 0.732 (60-22)

I think that, then, a 15-game losing streak flips all of these
around. So Houston's 15-game losing streak likely projects to a 22
win season. I frankly think they won't be that bad because they have
Steve Francis back. The assumption made in all of these calculations
is that the winning% is a constant thing. Basically, if the Rockets
win >40% of their games, that will mean that random chance is
unlikely to be involved, which we know....

Dean Oliver

Message 8 of 12, Jan 14, 2002
On Tue, 15 Jan 2002, HoopStudies wrote:

> --- In APBR_analysis@y..., "HoopStudies" <deano@r...> wrote:

[...]

> Sorry. The above #'s are wrong (that's why I said "Quick"). I need
> to do some QC. I think the following are right.

Something did look suspicious about those numbers -- the way the
probabilities monotonically increased for the better win-loss records.

Impressively fast work by Dean L and Dean O to get the empirical
parameters collected and the program running.

[...]

> 60.1 - 65 %  18%
> 65.1 - 70 %  28%
> 70.1 - 75 %  18%

> Expected win% = 0.666 (or 55-27).

Sounds plausible. I'm assuming that you got the expected win% by finding
the expected value over all (non-zero probability) outcomes? The thing
that I notice is that the win-loss percent with the highest probability of
occurring also seems to be around 67% -- I wonder if this is a case where
we can use what statisticians call the Principle of Maximum Likelihood
and, rather than calculating the expected value, simply find the win-loss
percent with the highest likelihood, and use that as the best estimate of
the team's Won-Loss percentage.
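
The two summaries are easy to compare directly from the posted table. The sketch below uses the rounded percentages from the corrected 10-game table and bin midpoints as stand-ins for each range (an assumption). One caveat: since those percentages already fold in the prior, picking the peak is strictly a maximum a posteriori estimate rather than a maximum likelihood one.

```python
# Rounded percentages from the corrected 10-game table in this thread,
# for the bins 10-15% through 85-90%.
PERCENT = [0, 0, 0, 0, 0, 0, 0, 3, 5, 12, 18, 28, 18, 10, 4, 1]
MIDPOINTS = [0.125 + 0.05 * i for i in range(16)]

total = sum(PERCENT)                       # 99 rather than 100, because of rounding
probs = [x / total for x in PERCENT]

# Expected value over all bins versus the single most probable bin.
mean_win_pct = sum(p * q for p, q in zip(MIDPOINTS, probs))
mode_win_pct = MIDPOINTS[PERCENT.index(max(PERCENT))]

print(f"mean = {mean_win_pct:.3f}, mode = {mode_win_pct:.3f}")
```

With bins this coarse the peak can only land on a bin midpoint, so the two estimates will rarely agree to more than a couple of decimal places.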

> We should expect that a winning streak of 5 games won't tell us much,
> i.e., that the expected winning % is closer to 0.500 and the
> distribution will be more spread.

I haven't seen your spreadsheet which does the calculations; off the top
of my head I would expect that a team with a max win streak of 6 games
would be the most likely to be a .500 team. Because a .500 team has a
1/64 chance of winning all 6 of any set of 6 games. With 77 chances to
start such a 6-game winning streak during the season, chances seem good
that a .500 team will indeed achieve such a streak on average.
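
A quick check of that back-of-envelope figure, under the same assumption of independent coin-flip games. Note that 77/64 is the expected number of all-win 6-game windows, not a probability: the windows overlap, so one long streak is counted several times, and the chance of at least one such streak comes out lower than the count suggests. The helper name is hypothetical.

```python
def prob_streak_at_least(n_games, streak, p):
    """P(at least one run of `streak` straight wins in n_games independent games)."""
    # state[k] = P(current run is k wins long and no run of `streak` has happened yet)
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * streak
        new_state[0] = sum(state) * (1.0 - p)
        for k in range(streak - 1):
            new_state[k + 1] = state[k] * p
        state = new_state
    return 1.0 - sum(state)


n_games, streak, p = 82, 6, 0.5
windows = n_games - streak + 1              # 77 possible starting games
expected_windows = windows * p ** streak    # 77/64: expected number of all-win windows
prob_any = prob_streak_at_least(n_games, streak, p)

print(f"windows = {windows}, expected all-win windows = {expected_windows:.3f}")
print(f"P(at least one {streak}-game streak) = {prob_any:.3f}")
```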

--MKT

Message 9 of 12, Jan 14, 2002
On Tue, 15 Jan 2002, HoopStudies wrote:

>
> Hmmm, here are the implications of a 5-game winning streak:
>
> Range P(win% | 5 g win streak 82 g)
> 10   - 15 %  0%
> 15.1 - 20 %  0%
> 20.1 - 25 %  0%
> 25.1 - 30 %  1%
> 30.1 - 35 %  2%
> 35.1 - 40 %  6%
> 40.1 - 45 %  7%
> 45.1 - 50 %  17%
> 50.1 - 55 %  15%
> 55.1 - 60 %  16%
> 60.1 - 65 %  12%
> 65.1 - 70 %  12%
> 70.1 - 75 %  6%
> 75.1 - 80 %  3%
> 80.1 - 85 %  1%
> 85.1 - 90 %  0%
>
> Expected win% = 0.559 (46-36)

[...]

These numbers look funny: the probabilities don't show the nice rising
then falling pattern, and they only add up to 98%. Which could be
rounding error but could be a typo; should the 50.1-55% probability be 17%
instead of 15%? The 55.9% expected win% looks a bit high too.

--MKT

Message 10 of 12, Jan 14, 2002
>
> > Range P(win% | 5 g win streak 82 g)
> > 10    - 15 % 0%
> > 15.1 - 20 %  0%
> > 20.1 - 25 %  0%
> > 25.1 - 30 %  1%
> > 30.1 - 35 %  2%
> > 35.1 - 40 %  6%
> > 40.1 - 45 %  7%
> > 45.1 - 50 %  17%
> > 50.1 - 55 %  15%
> > 55.1 - 60 %  16%
> > 60.1 - 65 %  12%
> > 65.1 - 70 %  12%
> > 70.1 - 75 %  6%
> > 75.1 - 80 %  3%
> > 80.1 - 85 %  1%
> > 85.1 - 90 %  0%
> >
> > Expected win% = 0.559 (46-36)
>
> [...]
>
> These numbers look funny: the probabilities don't show the nice rising
> then falling pattern, and they only add up to 98%. Which could be
> rounding error but could be a typo; should the 50.1-55% probability be 17%
> instead of 15%? The 55.9% expected win% looks a bit high too.

The reason they show spikes is because DeanL's numbers for the prior
are spiky (notice the decrease in probability in the 50-55% grp) and
I didn't try to smooth them. If I did smooth the prior (which I've
thought about), the above numbers wouldn't show the spikes. The
numbers don't add up because of rounding (though I'll double check
later). The win% makes sense to me. If a win streak of 1G leads to
an expected win% of just over 0.500 (the average of the prior), every
longer streak goes a little higher. This doesn't seem out of norm.

I will _try_ to get this in sendable form Tuesday, but I'm already
looking at tomorrow's schedule thinking it's unlikely.

DeanO

Message 11 of 12, Jan 16, 2002
On Tue, 15 Jan 2002, HoopStudies wrote:

[...]

> > These numbers look funny: the probabilities don't show the nice rising
> > then falling pattern, and they only add up to 98%. Which could be
> > rounding error but could be a typo; should the 50.1-55% probability be 17%
> > instead of 15%? The 55.9% expected win% looks a bit high too.
>
> The reason they show spikes is because DeanL's numbers for the prior
> are spiky (notice the decrease in probability in the 50-55% grp) and
> I didn't try to smooth them. If I did smooth the prior (which I've
> thought about), the above numbers wouldn't show the spikes. The

Ah, the disadvantage of using purely empirical numbers. They're numbers
from the real world, but such numbers are always contaminated with random
errors, hence spikes. The theoretical numbers would almost certainly show
a smooth pattern. If our theories are good enough (I don't know if they
are in this case) then the best result is usually obtained by combining
theory and data: start with the raw data but then smooth it in accordance
with theory.

But if we don't have a good theoretical reason for smoothing (maybe the
real NBA percentages really are supposed to show a spike) then we
shouldn't.

I'm not sure which case this falls into. Can we think of a good
theoretical reason why there should or should not be that spike in the win
percentages? If we chalk it up just to plain old random chance, then
that's saying that random errors are messing up the data and smoothing
should be done.
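
To see what smoothing would do to the prior, here is a minimal sketch using a 1-2-1 weighted moving average over DeanL's bin counts. The choice of kernel is arbitrary and not derived from any theory about NBA records; it only illustrates the idea. Edge bins are padded by repeating the end value, which keeps the total count unchanged.

```python
# Team counts per win% bin (10-15% up to 85-90%), copied from DeanL's message.
COUNTS = [5, 22, 25, 66, 52, 101, 75, 152, 119, 122, 92, 90, 47, 25, 10, 2]


def smooth_121(values):
    """Weighted 3-point moving average with weights 1/4, 1/2, 1/4."""
    padded = [values[0]] + list(values) + [values[-1]]   # repeat the end values
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(values))]


smoothed = smooth_121(COUNTS)
for raw, s in zip(COUNTS, smoothed):
    print(f"{raw:4d} {s:8.2f}")
```

On these counts a single pass is enough to remove the dips and leave one peak, but whether that is the right amount of smoothing is exactly the open question above.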

> numbers don't add up because of rounding (though I'll double check
> later). The win% makes sense to me. If a win streak of 1G leads to
> an expected win% of just over 0.500 (the average of the prior), every
> longer streak goes a little higher. This doesn't seem out of norm.

We may be using different definitions of win streaks. I'm thinking
that if I'm told that a team had a 15 game winning streak, that means
that that was the LONGEST winning streak that the team achieved. And if,
over an 82-game season, I am told that a team's longest winning streak was
1 game, then I'd expect that that was a very very bad team, not a .500 one.
Even the Bulls this year have had a 2-game winning streak (Dec 29 and Dec
31).

I suppose you might be using a definition of a win streak something like
this: if we're told that a team had a 15 game winning streak, then we
know that it had at least one streak of at least 15 games. That seems to
me to lead to harder probability calculations. If nothing else, the
number contains less information now. That 15G team could for all we know
also have had a 33 game winning streak that same season. Whereas if we
know that 15G was their longest winning streak, we've got a much better
idea of what sort of team it's likely to be.
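
The two definitions can be put side by side in code. This is a sketch assuming independent games with a fixed win probability p; the function names are made up for illustration. Both quantities come from the same building block, the probability that no streak exceeds a given cap.

```python
def prob_longest_at_most(n_games, cap, p):
    """P(every winning streak in n_games is at most `cap` games long)."""
    if cap < 0:
        return 0.0
    # state[k] = P(current run is k wins long and the cap has not been exceeded)
    state = [0.0] * (cap + 1)
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * (cap + 1)
        new_state[0] = sum(state) * (1.0 - p)
        for k in range(cap):
            new_state[k + 1] = state[k] * p
        state = new_state
    return sum(state)


def prob_longest_exactly(n_games, x, p):
    """Definition 1: the LONGEST streak of the season is exactly x games."""
    return prob_longest_at_most(n_games, x, p) - prob_longest_at_most(n_games, x - 1, p)


def prob_some_streak_at_least(n_games, x, p):
    """Definition 2: the team had at least one streak of x or more games."""
    return 1.0 - prob_longest_at_most(n_games, x - 1, p)


for p in (0.3, 0.5, 0.7):
    print(p, prob_longest_exactly(82, 15, p), prob_some_streak_at_least(82, 15, p))
```

Under definition 1, a longest streak of 1 game in 82 is close to impossible for a .500 team, which is why it points to a very bad team rather than an average one.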

--MKT

Message 12 of 12, Jan 16, 2002
> Ah, the disadvantage of using purely empirical numbers. They're numbers
> from the real world, but such numbers are always contaminated with random
> errors, hence spikes. The theoretical numbers would almost certainly show
> a smooth pattern. If our theories are good enough (I don't know if they
> are in this case) then the best result is usually obtained by combining
> theory and data: start with the raw data but then smooth it in accordance
> with theory.
>
> But if we don't have a good theoretical reason for smoothing (maybe the
> real NBA percentages really are supposed to show a spike) then we
> shouldn't.
>

No, I think smoothing is OK and that the spiking is random. I just
haven't done the smoothing. One way we could get a sense for whether
the spiking is random is to have DeanL generate the curve for a
different set of bins and see if the spikes move. That would also
give us a fair way to smooth, rather than my arbitrary hand.

> > numbers don't add up because of rounding (though I'll double check
> > later). The win% makes sense to me. If a win streak of 1G leads to
> > an expected win% of just over 0.500 (the average of the prior), every
> > longer streak goes a little higher. This doesn't seem out of norm.
>
> We may be using different definitions of win streaks. I'm thinking
> that if I'm told that a team had a 15 game winning streak, that means
> that that was the LONGEST winning streak that the team achieved. And if,
> over an 82-game season, I am told that a team's longest winning streak was
> 1 game, then I'd expect that that was a very very bad team, not a .500 one.
> Even the Bulls this year have had a 2-game winning streak (Dec 29 and Dec
> 31).
>

We're going at the same thing, but the method has to be used
different ways to get at the answers we're looking for. I wouldn't
use the method to test what a 1G winning streak means. I would look
at the team's longest losing streak and plug that in. If a team's
longest winning streak is 5 G and its longest losing streak is 5G,
that generally brackets its likely win% pretty well. If a team's
longest winning streak is 5G, but its longest losing streak is 10G, I
use the 10G streak to ascertain that the team is likely a 27 win
team. But I'd also assume that they are independent events and
multiply probabilities together. That would suggest that the most
likely range of winning %'s is 35-40%, which is a little better than
a 27 win season. (This assumes that the tables I gave you can be
flipped to work with losing streaks, which, strictly, they cannot. I
would strictly have to smooth the distribution so that it's
symmetric. I'm lazy.)
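
The bracketing idea can be tried on the tables already posted. The sketch below takes the 5-game winning-streak table, flips the corrected 10-game winning-streak table end for end to stand in for a 10-game losing streak, multiplies bin by bin and renormalises. It inherits the caveats above (independence, and flipping an unsymmetric table), plus one more: each table already contains the prior, so multiplying them counts the prior twice.

```python
# Rounded percentages posted in this thread, for bins 10-15% through 85-90%.
WIN_5 = [0, 0, 0, 1, 2, 6, 7, 17, 15, 16, 12, 12, 6, 3, 1, 0]
WIN_10 = [0, 0, 0, 0, 0, 0, 0, 3, 5, 12, 18, 28, 18, 10, 4, 1]
LABELS = [f"{lo}-{lo + 5}%" for lo in range(10, 90, 5)]

# Treat a 10-game losing streak as the mirror image of a 10-game winning streak.
lose_10 = WIN_10[::-1]

product = [a * b for a, b in zip(WIN_5, lose_10)]
total = sum(product)
combined = [x / total for x in product]

best_bin = LABELS[product.index(max(product))]
for label, q in zip(LABELS, combined):
    print(f"{label:>7}  {q:6.1%}")
print("most likely range:", best_bin)
```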

>
> I suppose you might be using a definition of a win streak something like
> this: if we're told that a team had a 15 game winning streak, then we
> know that it had at least one streak of at least 15 games. That seems to
> me to lead to harder probability calculations. If nothing else, the
> number contains less information now. That 15G team could for all we know
> also have had a 33 game winning streak that same season. Whereas if we
> know that 15G was their longest winning streak, we've got a much better
> idea of what sort of team it's likely to be.

The longest streak has the most info.

DeanO