Winning streak / team strength question

  • dlirag
    Message 1 of 12 , Jan 7, 2002
      If one were told that a certain NBA team had an X-game winning streak
      where X is the length of that winning streak, would one have enough
      information to make a reasonable calculation of that team's minimum
      strength? For instance, if I learned that the Spurs had a 15-game
      winning streak in one season, I would tend to infer that they had 60+
      wins in that season. How do I arrive at a precise figure, if possible?
    • Michael K. Tamada
      Message 2 of 12 , Jan 14, 2002
        I took a stab at this but didn't have time to come close to getting the
        correct formula. There's one key part of the inference which I think
        requires some assumptions or actual data. Anyway, there's about three
        steps (I couldn't even finish step 1):

        1. For a team that plays 82 games, wins W of them (for simplicity, let's
        assume that it has an equal chance of winning each game, p = W/82,
        although realistically a team will have a higher probability of beating
        the 2002 Bulls than the 2002 Lakers -- but how would one incorporate THAT
        into the calculation, without knowing every single team's schedule?).
        Anyway, for any such team with W wins, what is the probability that its
        longest winning streak during the season is exactly X games? For X=2
        this is fairly easy to calculate...
        extremely easy for a 2 game season, not bad for a 3 game season, a little
        worse for a 4 game season but you can see a pattern that develops. So
        there's the formula for X=2, once you take it out to 82 games.

        But I haven't even begun to figure it out for X=3, X=4, or arbitrary X.
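For arbitrary X, one way to get the Step 1 probability numerically is a small dynamic program over the length of the current winning run. The sketch below is only an illustration under the same simplifying assumption as above (every game won independently with the same p = W/82); the function names are hypothetical. Note that it fixes p rather than the exact win total W, so the season total is itself random in this model.

```python
def prob_streak_at_most(n_games, max_len, p):
    """P(no winning streak longer than max_len in n_games independent games)."""
    # state[j] = probability that the current run of wins has length j
    # and no run so far has exceeded max_len
    state = [0.0] * (max_len + 1)
    state[0] = 1.0
    for _ in range(n_games):
        nxt = [0.0] * (max_len + 1)
        nxt[0] = sum(state) * (1 - p)   # a loss resets the run to 0
        for j in range(max_len):
            nxt[j + 1] = state[j] * p   # a win extends the run by 1
        # a win from state[max_len] would exceed max_len, so that mass is dropped
        state = nxt
    return sum(state)


def prob_streak_at_least(n_games, x, p):
    """P(at least one winning streak of x or more games)."""
    if x <= 0:
        return 1.0
    return 1.0 - prob_streak_at_most(n_games, x - 1, p)


def prob_longest_streak_exactly(n_games, x, p):
    """P(the longest winning streak is exactly x games)."""
    if x == 0:
        return prob_streak_at_most(n_games, 0, p)
    return prob_streak_at_most(n_games, x, p) - prob_streak_at_most(n_games, x - 1, p)


if __name__ == "__main__":
    # Example: a team with p = 55/82, longest streak exactly 15 in 82 games
    print(prob_longest_streak_exactly(82, 15, 55 / 82))
```

The small cases mentioned above are easy to check by hand: for a 2-game season at p = .5, the chance of a 2-game streak is 1/4, and for a 3-game season it is 3/8.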

        All of that is the solid, deductive part. But then there's step 2.


        2. With the formula above, you can calculate, for any team with W wins,
        the probability that its longest winning streak was exactly X games
        (I assume in your question about a team with a 15-game winning streak
        that that was the team's LONGEST winning streak. Obviously if a team won
        15 games in a row, and later in the season won 33 games in a row, the
        likely win total is going to be very different.)

        Unfortunately, this is not the probability that you need. What you're
        really asking for is the opposite: for a team that won X games in a row,
        what is the most likely W associated with that team?

        And those are very different probabilities. These are called conditional
        probabilities, usually denoted p(W|X) and p(X|W). Another example:
        what's the probability that a crack cocaine addict started out by first
        smoking marijuana? Very high. But before we start talking about how
        marijuana puts you on the fast track to crack h*ll, we need to know the
        truly relevant probability: what percent of marijuana smokers later go on
        to become crack addicts?

        [None of the above should be construed as an endorsement of either
        activity.]

        So all that work in Step 1, though important, is giving us the "wrong"
        probability, so to speak: it gave us p(X|W) instead of p(W|X).


        How do we find the most likely W? It's an example of Bayes' Rule, for
        those of you who have studied probability. Here's a key example: start
        by looking at the "wrong" probabilities. A team with W=82 wins will have
        as its longest winning streak X=82. So clearly it is no candidate; ditto
        a team with W=81 wins, etc., down to W=78 (with only 4 losses, the wins
        fall into at most 5 stretches, and 5 stretches of at most 15 wins each
        can hold only 75 wins). A team with W=77 probably has
        some really lengthy winning streaks, but it does have a chance (a small
        one) that its longest winning streak was only X=15, if its losses were
        distributed just right.

        Etc. etc. A team with W=15 of course has almost no chance of having an
        X=15 streak. A team with W=14 has, like the W=78 team, literally no
        chance of having X=15 be its longest winning streak.

        Maybe we find that teams with W=55 have the highest probability of
        having an X=15 win streak. Remember however that this is not the
        conditional probability that we need!

        Because an important point is that whereas teams with say W=53 wins
        may have a lower probability of winning X=15 games in a row, it is also
        true that W=53 teams are more common (have a higher probability of
        occurring) than W=55 teams. And that larger probability of occurrence can
        override the fact that they may have a harder time (lower probability) of
        achieving X=15.

        That tradeoff, between teams such as W=55 with high "wrong" probabilities
        but low probabilities of existing at all, vs teams such as W=53, is what
        Bayes' Rule allows us to precisely measure.

        But you may see the problem: what IS the probability of a team achieving
        W=53? W=55? W=72?

        That brings us to step 3.

        3. You've either got to make assumptions about the likelihoods of the
        various W levels; or make assumptions about the distribution of teams'
        underlying probabilities of winning p, and deduce the resulting W's we'd
        observe; or use direct observation on actual NBA data and see how often
        W=53, W=55, etc. occurred.

        The last has obvious advantages and disadvantages. The advantage is that
        it's real data, not theoretical calculations. The disadvantage is that it
        doesn't give you the real probabilities, it only gives you estimates of
        the probabilities. E.g., if you literally rely on the data, the
        probability of an NBA team winning exactly W=71 games is 0 -- it's never
        been done before. Realistically there is SOME probability of it occurring;
        it's just that that probability is so low we've never seen it happen.

        Anyway, by hook or by crook you need to come up with estimates of how
        likely it is to observe W=55, W=69, etc.

        Once you've got those, you combine them with the probabilities of the
        various X streaks from step 1, apply Bayes' Rule, and you've got your
        answer.

        Here's a highly simplified example: suppose there are only two kinds of
        teams: teams that win W=55 games and teams that win W=41 games.
        Assume that teams that win W=55 games have a probability p=.1 of having
        a longest winning streak of X=15 games. In other words, p(X=15|W=55) = .1

        In contrast, the W=41 teams have only half the chance of achieving X=15:
        p(X=15|W=41) = .05.

        But suppose that we know, or assume, or find out, that W=41 teams are
        three times more common than W=55 teams: p(W=41) = .75
        whereas p(W=55) = .25.

        Now suppose we observe that a team has X=15. Now we want to know: what
        is the probability that it's a W=41 team, and what is the probability that
        it's a W=55 team? Bayes' Rule says that

        p(W=41|X=15) =

        p(X=15|W=41)*p(W=41) / ( p(X=15|W=41)*p(W=41) + p(X=15|W=55)*p(W=55) ) =

        .05*.75 / (.05*.75 + .1*.25) =

        .0375 / (.0375+.0250) = .6


        So even though 55-win teams are twice as likely to win 15 games in a row
        as 41-win teams, the fact that 41-win teams are so much more common makes
        it more likely that the X=15 team you observed was in fact a 41-win team:
        p = .6.
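The two-team arithmetic above can be checked in a few lines. This is just a sketch of Bayes' Rule over a finite set of hypotheses, using the made-up numbers from the example; the helper name `posterior` is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' Rule: p(W|X) is proportional to p(X|W) * p(W), normalized over all W."""
    joint = {w: prior[w] * likelihood[w] for w in prior}
    total = sum(joint.values())
    return {w: joint[w] / total for w in joint}


# Numbers from the simplified example (assumed for illustration, not real NBA data)
prior = {41: 0.75, 55: 0.25}        # p(W)
likelihood = {41: 0.05, 55: 0.10}   # p(X=15 | W)

post = posterior(prior, likelihood)
for wins, prob in post.items():
    print(f"p(W={wins} | X=15) = {prob:.2f}")   # about .60 for W=41, .40 for W=55
```

The same function works unchanged with more than two W values, which is what the full calculation needs.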


        Anyway, THAT'S how you'd find a team's probability of having any given W
        value, based on observing their longest winning streak X. Easy? No.
        Do-able? Yes I think, by using the Three Steps.


        It occurs to me that given that you'd almost certainly have to look at
        actual NBA data to estimate the probability of W=41, W=69, etc., you might
        as well look up each team's longest winning streak while you're at it.
        And the resulting data set would show you all the W's and all the X's, and
        you could make good estimates straight from that data.



        --MKT



      • HoopStudies
        Message 3 of 12 , Jan 14, 2002
          --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:

          I didn't fully read MikeT's explanation, but I saw Bayes and I saw a
          formula that looked almost exactly like one I worked on during my
          last plane flight, so I think he's right. Here's what we need.

          1. Someone find out the historical distribution of winning
          percentages in NBA history. What's the probability of a team winning

          10-15% (0.100-0.150)
          15.1-20%
          20.1-25%
          ...
          85.1-90% (0.851-0.900)

          That will give us the prior distribution we need. It should be
          clustered around a 0.500 record.

          2. I have an Excel sheet that calculates the odds of a team with a
          given winning percentage having at least 1 streak of X games. I will
          look into reconstructing it with the prior distribution from step 1
          so that it will automatically calculate P(win% | streak). If I
          can't, I'll generate a rough table.

          Someone do step 1 for me. I'll try to find time to work on my part.

          Dean Oliver


        • Dean LaVergne
          Message 4 of 12 , Jan 14, 2002

            1.  Someone find out the historical distribution of winning
            percentages in NBA history.  What's the probability of a team winning 

            10    - 15 %      5
            15.1 - 20 %     22
            20.1 - 25 %     25
            25.1 - 30 %     66
            30.1 - 35 %     52
            35.1 - 40 %    101
            40.1 - 45 %     75
            45.1 - 50 %   152
            50.1 - 55 %   119
            55.1 - 60 %   122
            60.1 - 65 %     92
            65.1 - 70 %     90
            70.1 - 75 %     47
            75.1 - 80 %     25
            80.1 - 85 %     10
            85.1 - 90 %       2
             
            DeanL
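To use these figures as the prior in Bayes' Rule they need to be turned into probabilities. The sketch below assumes the numbers are counts of team-seasons in each winning-percentage range (the post does not label them) and represents each range by its midpoint, which is also an assumption.

```python
# Counts per 5-point winning-percentage range, as posted (10-15% up to 85.1-90%)
counts = [5, 22, 25, 66, 52, 101, 75, 152, 119, 122, 92, 90, 47, 25, 10, 2]

# Assumed representative win% for each range: 0.125, 0.175, ..., 0.875
midpoints = [0.125 + 0.05 * i for i in range(len(counts))]

total = sum(counts)                     # 1005 with the posted figures
prior = [c / total for c in counts]     # empirical estimate of p(win% range)

for mid, pr in zip(midpoints, prior):
    print(f"{mid:.3f}  {pr:.3f}")
```

With the posted counts the most common range is 45.1-50%, and the counts are not perfectly symmetric around .500 or smooth from one range to the next, which carries through to anything computed from them.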
             
          • HoopStudies
            Message 5 of 12 , Jan 14, 2002
              --- In APBR_analysis@y..., "Dean LaVergne" <deanlav@y...> wrote:
              >
              > 1. Someone find out the historical distribution of winning
              > percentages in NBA history. What's the probability of a team
              winning
              >
              > 10 - 15 % 5
              > 15.1 - 20 % 22
              > 20.1 - 25 % 25
              > 25.1 - 30 % 66
              > 30.1 - 35 % 52
              > 35.1 - 40 % 101
              > 40.1 - 45 % 75
              > 45.1 - 50 % 152
              > 50.1 - 55 % 119
              > 55.1 - 60 % 122
              > 60.1 - 65 % 92
              > 65.1 - 70 % 90
              > 70.1 - 75 % 47
              > 75.1 - 80 % 25
              > 80.1 - 85 % 10
              > 85.1 - 90 % 2

              Quick calculation based on DeanL's stuff here. If a team has a 10
              game winning streak in an 82 game season (number of games in a season
              matters), then these are the probabilities of the team's winning
              record at the end of the season:

              Range Equiv% P(win% | 10 g win streak 82 g)
              10    - 15 % 0.125 0%
              15.1 - 20 %  0.175 0%
              20.1 - 25 %  0.225 0%
              25.1 - 30 %  0.275 0%
              30.1 - 35 %  0.325 0%
              35.1 - 40 %  0.375 0%
              40.1 - 45 %  0.425 0%
              45.1 - 50 %  0.475 0%
              50.1 - 55 %  0.525 1%
              55.1 - 60 %  0.575 3%
              60.1 - 65 %  0.625 7%
              65.1 - 70 %  0.675 12%
              70.1 - 75 %  0.725 16%
              75.1 - 80 %  0.775 18%
              80.1 - 85 %  0.825 20%
              85.1 - 90 %  0.875 21%

              This is unlikely to be an under 0.500 team. Its expected winning
              percentage is 0.762, or a 62 win ballclub.

              It will take me longer to make the other tables (any requests?) or
              make the spreadsheet make sense to others....

              Dean Oliver
              Journal of Basketball Studies
            • HoopStudies
              Message 6 of 12 , Jan 14, 2002
                --- In APBR_analysis@y..., "HoopStudies" <deano@r...> wrote:
                >
                > Range Equiv% P(win% | 10 g win streak 82 g)
                > 10    - 15 % 0.125 0%
                > 15.1 - 20 %  0.175 0%
                > 20.1 - 25 %  0.225 0%
                > 25.1 - 30 %  0.275 0%
                > 30.1 - 35 %  0.325 0%
                > 35.1 - 40 %  0.375 0%
                > 40.1 - 45 %  0.425 0%
                > 45.1 - 50 %  0.475 0%
                > 50.1 - 55 %  0.525 1%
                > 55.1 - 60 %  0.575 3%
                > 60.1 - 65 %  0.625 7%
                > 65.1 - 70 %  0.675 12%
                > 70.1 - 75 %  0.725 16%
                > 75.1 - 80 %  0.775 18%
                > 80.1 - 85 %  0.825 20%
                > 85.1 - 90 %  0.875 21%
                >
                > This is unlikely to be an under 0.500 team. Its expected winning
                > percentage is 0.762, or a 62 win ballclub.
                >

                Sorry. The above #'s are wrong (that's why I said "Quick"). I need
                to do some QC. I think the following are right.

                Range P(win% | 10 g win streak in 82 g)
                10   - 15 % 0%
                15.1 - 20 %  0%
                20.1 - 25 %  0%
                25.1 - 30 %  0%
                30.1 - 35 %  0%
                35.1 - 40 %  0%
                40.1 - 45 %  0%
                45.1 - 50 %  3%
                50.1 - 55 %  5%
                55.1 - 60 %  12%
                60.1 - 65 %  18%
                65.1 - 70 %  28%
                70.1 - 75 %  18%
                75.1 - 80 %  10%
                80.1 - 85 %  4%
                85.1 - 90 %  1%

                Expected win% = 0.666 (or 55-27).

                We should expect that a winning streak of 5 games won't tell us much,
                i.e., that the expected winning % is closer to 0.500 and the
                distribution will be more spread.
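For anyone who wants to reproduce this kind of table without the spreadsheet, here is a rough end-to-end sketch. It is not the spreadsheet's actual method, which was not posted. It assumes (a) every game is won independently with a constant probability, (b) each range is represented by its midpoint, (c) DeanL's figures are counts of team-seasons, and (d) the evidence is "at least one streak of 10 or more wins in 82 games". The function name is hypothetical, and the output should land in the same neighborhood as the table above without necessarily matching it digit for digit.

```python
def prob_streak_at_least(n_games, x, p):
    """P(at least one run of x or more wins in n_games independent games)."""
    # state[j]: probability the current run is j wins long and no run has reached x
    state = [1.0] + [0.0] * (x - 1)
    for _ in range(n_games):
        nxt = [0.0] * x
        nxt[0] = sum(state) * (1 - p)    # a loss resets the run
        for j in range(x - 1):
            nxt[j + 1] = state[j] * p    # a win extends the run
        state = nxt                      # mass reaching x wins leaves the "no streak" set
    return 1.0 - sum(state)


counts = [5, 22, 25, 66, 52, 101, 75, 152, 119, 122, 92, 90, 47, 25, 10, 2]
midpoints = [0.125 + 0.05 * i for i in range(len(counts))]

prior = [c / sum(counts) for c in counts]                         # p(win%)
like = [prob_streak_at_least(82, 10, p) for p in midpoints]       # p(streak | win%)
joint = [a * b for a, b in zip(prior, like)]
post = [j / sum(joint) for j in joint]                            # p(win% | streak)

expected = sum(m * w for m, w in zip(midpoints, post))

for mid, w in zip(midpoints, post):
    print(f"{mid:.3f}  {100 * w:5.1f}%")
print(f"Expected win% = {expected:.3f} (about {round(expected * 82)} wins)")
```

Changing the 10 to 5 or 15 gives the corresponding tables for other streak lengths.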

                Dean Oliver
                Journal of Basketball Studies
              • HoopStudies
                Message 7 of 12 , Jan 14, 2002
                  Hmmm, here are the implications of a 5-game winning streak:

                  Range P(win% | 5 g win streak 82 g)
                  10    - 15 % 0%
                  15.1 - 20 %  0%
                  20.1 - 25 %  0%
                  25.1 - 30 %  1%
                  30.1 - 35 %  2%
                  35.1 - 40 %  6%
                  40.1 - 45 %  7%
                  45.1 - 50 %  17%
                  50.1 - 55 %  15%
                  55.1 - 60 %  16%
                  60.1 - 65 %  12%
                  65.1 - 70 %  12%
                  70.1 - 75 %  6%
                  75.1 - 80 %  3%
                  80.1 - 85 %  1%
                  85.1 - 90 %  0%

                  Expected win% = 0.559 (46-36)

                  Here it is for a 15-game win streak:

                  Range P(win% | 15 g win streak 82 g)
                  10    - 15 % 0%
                  15.1 - 20 %  0%
                  20.1 - 25 %  0%
                  25.1 - 30 %  0%
                  30.1 - 35 %  0%
                  35.1 - 40 %  0%
                  40.1 - 45 %  0%
                  45.1 - 50 %  0%
                  50.1 - 55 %  1%
                  55.1 - 60 %  3%
                  60.1 - 65 %  7%
                  65.1 - 70 %  21%
                  70.1 - 75 %  27%
                  75.1 - 80 %  26%
                  80.1 - 85 %  13%
                  85.1 - 90 %  3%

                  Expected win% = 0.732 (60-22)

                  I think that, then, a 15-game losing streak flips all of these
                  around. So Houston's 15-game losing streak likely projects to a 22
                  win season. I frankly think they won't be that bad because they have
                  Steve Francis back. The assumption made in all of these calculations
                  is that the winning% is a constant thing. Basically, if the Rockets
                  win >40% of their games, that will mean that random chance is
                  unlikely to be involved, which we know....
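The "flip" is simple arithmetic on the 15-game result above. The sketch below just mirrors the expected winning percentage around .500; it assumes the prior is symmetric around .500, which DeanL's counts only approximately are (there are more teams in the 60-70% ranges than in the 30-40% ranges), so treat the result as a rough figure.

```python
games = 82
expected_after_win_streak = 0.732        # posted result for a 15-game winning streak

# Mirror around .500: a 15-game losing streak under a symmetric prior
expected_after_losing_streak = 1 - expected_after_win_streak
wins = round(expected_after_losing_streak * games)

print(f"Expected win% = {expected_after_losing_streak:.3f} ({wins}-{games - wins})")
```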

                  Dean Oliver
                  Journal of Basketball Studies
                • Michael K. Tamada
                  Message 8 of 12 , Jan 14, 2002
                    On Tue, 15 Jan 2002, HoopStudies wrote:

                    > --- In APBR_analysis@y..., "HoopStudies" <deano@r...> wrote:

                    [...]

                    > Sorry. The above #'s are wrong (that's why I said "Quick"). I need
                    > to do some QC. I think the following are right.

                    Something did look suspicious about those numbers -- the way the
                    probabilities monotonically increased for the better win-loss records.

                    Impressively fast work by Dean L and Dean O to get the empirical
                    parameters collected and the program running.

                    [...]


                    > 60.1 - 65 %  18%
                    > 65.1 - 70 %  28%
                    > 70.1 - 75 %  18%

                    > Expected win% = 0.666 (or 55-27).

                    Sounds plausible. I'm assuming that you got the expected win% by finding
                    the expected value over all (non-zero probability) outcomes? The thing
                    that I notice is that the win-loss percent with the highest probability of
                    occurring also seems to be around 67% -- I wonder if this is a case where
                    we can use what statisticians call the Principle of Maximum Likelihood
                    and, rather than calculating the expected value, simply find the win-loss
                    percent with the highest likelihood. And use that as the best estimate of
                    the team's Won-Loss percentage.
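The two candidate estimates can be compared directly from the corrected 10-game table posted earlier in the thread. The sketch below takes the rounded percentages from that table, uses range midpoints as representative values (an assumption), and renormalizes because the rounded figures sum to 99 rather than 100.

```python
# Non-zero rows of the corrected 10-game table: 45.1-50% through 85.1-90%
midpoints = [0.475, 0.525, 0.575, 0.625, 0.675, 0.725, 0.775, 0.825, 0.875]
pct = [3, 5, 12, 18, 28, 18, 10, 4, 1]

total = sum(pct)   # 99, because the posted figures are rounded

# Expected value over all non-zero outcomes
mean = sum(m * w for m, w in zip(midpoints, pct)) / total

# Single most probable range (the posterior mode)
mode = midpoints[pct.index(max(pct))]

print(f"expected win% = {mean:.3f}")   # about 0.667
print(f"most likely range midpoint = {mode:.3f}")
```

Here the two land close together (roughly .667 versus the 65.1-70% range) because the distribution is nearly symmetric around its peak. Strictly speaking the peak of this table is a posterior mode, since the prior is already folded in; a pure maximum-likelihood estimate would ignore how common each win% is.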


                    > We should expect that a winning streak of 5 games won't tell us much,
                    > i.e., that the expected winning % is closer to 0.500 and the
                    > distribution will be more spread.

                    I haven't seen your spreadsheet which does the calculations; off the top
                    of my head I would expect that a team with a max win streak of 6 games
                    would be the most likely to be a .500 team. Because a .500 team has a
                    1/64 chance of winning all 6 of any set of 6 games. With 77 chances to
                    start such a 6-game winning streak during the season, chances seem good
                    that a .500 team will indeed achieve such a streak on average.
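The back-of-envelope figures can be made concrete. The sketch below counts the possible starting points and the expected number of all-win 6-game windows, then computes the actual probability of at least one such streak with a small dynamic program (hypothetical function name, independent games with constant p assumed). The expected count is not itself a probability, because overlapping windows are not independent.

```python
def prob_streak_at_least(n_games, x, p):
    """P(at least one run of x or more wins in n_games independent games)."""
    # state[j]: probability the current run is j wins long and no run has reached x
    state = [1.0] + [0.0] * (x - 1)
    for _ in range(n_games):
        nxt = [0.0] * x
        nxt[0] = sum(state) * (1 - p)    # a loss resets the run
        for j in range(x - 1):
            nxt[j + 1] = state[j] * p    # a win extends the run
        state = nxt
    return 1.0 - sum(state)


n_games, streak, p = 82, 6, 0.5

windows = n_games - streak + 1          # 77 possible starting games
p_window = p ** streak                  # 1/64 for any particular 6-game window
expected_windows = windows * p_window   # 77/64, a bit over 1.2

prob = prob_streak_at_least(n_games, streak, p)

print(windows, p_window, expected_windows)
print(f"P(at least one {streak}-game streak) = {prob:.3f}")
```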


                    --MKT
                  • Michael K. Tamada
                    Message 9 of 12 , Jan 14, 2002
                      On Tue, 15 Jan 2002, HoopStudies wrote:

                      >
                      > Hmmm, here are the implications of a 5-game winning streak:
                      >
                      > Range P(win% | 5 g win streak 82 g)
                      > 10   - 15 %  0%
                      > 15.1 - 20 %  0%
                      > 20.1 - 25 %  0%
                      > 25.1 - 30 %  1%
                      > 30.1 - 35 %  2%
                      > 35.1 - 40 %  6%
                      > 40.1 - 45 %  7%
                      > 45.1 - 50 %  17%
                      > 50.1 - 55 %  15%
                      > 55.1 - 60 %  16%
                      > 60.1 - 65 %  12%
                      > 65.1 - 70 %  12%
                      > 70.1 - 75 %  6%
                      > 75.1 - 80 %  3%
                      > 80.1 - 85 %  1%
                      > 85.1 - 90 %  0%
                      >
                      > Expected win% = 0.559 (46-36)

                      [...]

                      These numbers look funny: the probabilities don't show the nice rising
                      then falling pattern, and they only add up to 98%. Which could be
                      rounding error but could be a typo; should the 50.1-55% probability be 17%
                      instead of 15%? The 55.9% expected win% looks a bit high too.
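
Both checks can be redone from the rounded table. This sketch treats each bin as 5 points wide and uses the bin midpoint as its win% (my assumption; the spreadsheet presumably works from unrounded values). It then repeats the calculation with the 50.1-55% bin changed to 17%, to see how much that suspected typo would move the mean.

```python
# Lower edge of each 5-point win% bin, and the quoted probability (in percent).
LOWER_EDGES = list(range(10, 90, 5))
QUOTED = [0, 0, 0, 1, 2, 6, 7, 17, 15, 16, 12, 12, 6, 3, 1, 0]


def total_and_mean(probs):
    """Return (sum of probabilities, midpoint-weighted mean win%)."""
    total = sum(probs)
    weighted = sum((lo + 2.5) * pr for lo, pr in zip(LOWER_EDGES, probs))
    return total, weighted / total


if __name__ == "__main__":
    print("as quoted:", total_and_mean(QUOTED))

    # Hypothetical variant: the 50.1-55% bin set to 17% instead of 15%.
    variant = list(QUOTED)
    variant[LOWER_EDGES.index(50)] = 17
    print("with 17% in the 50.1-55% bin:", total_and_mean(variant))
```

The quoted column sums to 98, and the midpoint mean comes out at roughly 55.5%, a little below the 55.9% quoted. Rounding in the table could plausibly account for a gap of that size.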


                      --MKT
                    • HoopStudies
                      Message 10 of 12 , Jan 14, 2002
                        --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                        >
                        > > Range P(win% | 5 g win streak 82 g)
                        > > 10    - 15 % 0%
                        > > 15.1 - 20 %  0%
                        > > 20.1 - 25 %  0%
                        > > 25.1 - 30 %  1%
                        > > 30.1 - 35 %  2%
                        > > 35.1 - 40 %  6%
                        > > 40.1 - 45 %  7%
                        > > 45.1 - 50 %  17%
                        > > 50.1 - 55 %  15%
                        > > 55.1 - 60 %  16%
                        > > 60.1 - 65 %  12%
                        > > 65.1 - 70 %  12%
                        > > 70.1 - 75 %  6%
                        > > 75.1 - 80 %  3%
                        > > 80.1 - 85 %  1%
                        > > 85.1 - 90 %  0%
                        > >
                        > > Expected win% = 0.559 (46-36)
                        >
                        > [...]
                        >
                        > These numbers look funny: the probabilities don't show the nice rising
                        > then falling pattern, and they only add up to 98%. Which could be
                        > rounding error but could be a typo; should the 50.1-55% probability be 17%
                        > instead of 15%? The 55.9% expected win% looks a bit high too.

                        The reason they show spikes is because DeanL's numbers for the prior
                        are spiky (notice the decrease in probability in the 50-55% grp) and
                        I didn't try to smooth them. If I did smooth the prior (which I've
                        thought about), the above numbers wouldn't show the spikes. The
                        numbers don't add up because of rounding (though I'll double check
                        later). The win% makes sense to me. If a win streak of 1G leads to
                        an expected win% of just over 0.500 (the average of the prior), every
                        longer streak goes a little higher. This doesn't seem out of the norm.

                        I will _try_ to get this in sendable form Tuesday, but I'm already
                        looking at tomorrow's schedule thinking it's unlikely.

                        DeanO
                      • Michael K. Tamada
                        Message 11 of 12 , Jan 16, 2002
                          On Tue, 15 Jan 2002, HoopStudies wrote:

                          [...]

                          > > These numbers look funny: the probabilities don't show the nice rising
                          > > then falling pattern, and they only add up to 98%. Which could be
                          > > rounding error but could be a typo; should the 50.1-55% probability be 17%
                          > > instead of 15%? The 55.9% expected win% looks a bit high too.
                          >
                          > The reason they show spikes is because DeanL's numbers for the prior
                          > are spiky (notice the decrease in probability in the 50-55% grp) and
                          > I didn't try to smooth them. If I did smooth the prior (which I've
                          > thought about), the above numbers wouldn't show the spikes. The

                          Ah, the disadvantage of using purely empirical numbers. They're numbers
                          from the real world, but such numbers are always contaminated with random
                          errors, hence spikes. The theoretical numbers would almost certainly show
                          a smooth pattern. If our theories are good enough (I don't know if they
                          are in this case) then the best result is usually obtained by combining
                          theory and data: start with the raw data but then smooth it in accordance
                          with theory.

                          But if we don't have a good theoretical reason for smoothing (maybe the
                          real NBA percentages really are supposed to show a spike) then we
                          shouldn't.

                          I'm not sure which case this falls into. Can we think of a good
                          theoretical reason why there should or should not be that spike in the win
                          percentages? If we chalk it up just to plain old random chance, then
                          that's saying that random errors are messing up the data and smoothing
                          should be done.


                          > numbers don't add up because of rounding (though I'll double check
                          > later). The win% makes sense to me. If a win streak of 1G leads to
                          > an expected win% of just over 0.500 (the average of the prior), every
                          > longer streak goes a little higher. This doesn't seem out of norm.

                          We may be using different definitions of win streaks. I'm thinking
                          that if I'm told that a team had a 15 game winning streak, that means
                          that that was the LONGEST winning streak that the team achieved. And if,
                          over an 82-game season, I am told that a team's longest winning streak was
                          1 game, then I'd expect that that was a very very bad team, not a .500 one.
                          Even the Bulls this year have had a 2-game winning streak (Dec 29 and Dec
                          31).


                          I suppose you might be using a definition of a win streak something like
                          this: if we're told that a team had a 15 game winning streak, then we
                          know that it had at least one streak of at least 15 games. That seems to
                          me to lead to harder probability calculations. If nothing else, the
                          number contains less information now. That 15G team could for all we know
                          also have had a 33 game winning streak that same season. Whereas if we
                          know that 15G was their longest winning streak, we've got a much better
                          idea of what sort of team it's likely to be.
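
The two definitions can be compared directly. This sketch assumes independent games with a constant win probability p (a simplification), and the function names are hypothetical. "At least X" is the probability of at least one run of X or more wins; "longest is exactly X" is that probability minus the same quantity for X+1.

```python
def prob_streak_at_least(k, n, p):
    """P(at least one run of >= k straight wins in n independent games)."""
    # state[j] = P(no run of k so far, and the current trailing win run is j long)
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        nxt[0] = (1 - p) * sum(state)      # a loss resets the run to 0
        for j in range(k - 1):
            nxt[j + 1] = p * state[j]      # a win extends the run (reaching k drops out)
        state = nxt
    return 1.0 - sum(state)


def prob_longest_exactly(x, n, p):
    """P(the longest winning streak of the season is exactly x games)."""
    return prob_streak_at_least(x, n, p) - prob_streak_at_least(x + 1, n, p)


if __name__ == "__main__":
    print(" p    P(>=15)   P(longest==15)   P(longest==1)")
    for p in (0.10, 0.30, 0.50, 0.70, 0.85, 0.99):
        print(f"{p:.2f}  {prob_streak_at_least(15, 82, p):.4f}"
              f"    {prob_longest_exactly(15, 82, p):.4f}"
              f"          {prob_longest_exactly(1, 82, p):.2e}")
```

Under these assumptions the "at least 15" column keeps rising with p, so it cannot rule out very strong teams. The "longest is exactly 15" column rises and then falls, which is why the longest-streak definition pins the team down more tightly.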



                          --MKT
                        • HoopStudies
                          Message 12 of 12 , Jan 16, 2002
                            --- In APBR_analysis@y..., "Michael K. Tamada" <tamada@o...> wrote:
                            > Ah, the disadvantage of using purely empirical numbers. They're numbers
                            > from the real world, but such numbers are always contaminated with random
                            > errors, hence spikes. The theoretical numbers would almost certainly show
                            > a smooth pattern. If our theories are good enough (I don't know if they
                            > are in this case) then the best result is usually obtained by combining
                            > theory and data: start with the raw data but then smooth it in accordance
                            > with theory.
                            >
                            > But if we don't have a good theoretical reason for smoothing (maybe the
                            > real NBA percentages really are supposed to show a spike) then we
                            > shouldn't.
                            >

                            No, I think smoothing is OK and that the spiking is random. I just
                            haven't done the smoothing. One way we could get a sense for whether
                            the spiking is random is to have DeanL generate the curve for a
                            different set of bins and see if the spikes move. That would also
                            give us a fair way to smooth, rather than my arbitrary hand.


                            > > numbers don't add up because of rounding (though I'll double check
                            > > later). The win% makes sense to me. If a win streak of 1G leads to
                            > > an expected win% of just over 0.500 (the average of the prior), every
                            > > longer streak goes a little higher. This doesn't seem out of norm.
                            >
                            > We may be using different definitions of win streaks. I'm thinking
                            > that if I'm told that a team had a 15 game winning streak, that means
                            > that that was the LONGEST winning streak that the team achieved. And if,
                            > over an 82-game season, I am told that a team's longest winning streak was
                            > 1 game, then I'd expect that that was a very very bad team, not .500 one.
                            > Even the Bulls this year have had a 2-game winning streak (Dec 29 and Dec
                            > 31).
                            >

                            We're going after the same thing, but the method has to be used
                            in different ways to get at the answers we're looking for. I wouldn't
                            use the method to test what a 1G winning streak means. I would look
                            at the team's longest losing streak and plug that in. If a team's
                            longest winning streak is 5G and its longest losing streak is 5G,
                            that generally brackets its likely win% pretty well. If a team's
                            longest winning streak is 5G, but its longest losing streak is 10G, I
                            use the 10G streak to ascertain that the team is likely a 27 win
                            team. But I'd also assume that the two streaks are independent events
                            and multiply the probabilities together. That would suggest that the
                            most likely range of winning %'s is 35-40%, which is a little better
                            than a 27 win season. (This assumes that the tables I gave you can be
                            flipped to work with losing streaks, which, strictly, they cannot. I
                            would strictly have to smooth the distribution so that it's
                            symmetric. I'm lazy.)
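
A rough sketch of this bracketing step, under assumptions that go beyond the post: games are independent with a constant win probability p, a losing streak for a team with win probability p is treated as a winning streak at probability 1 - p, the two streak lengths are treated as independent (as proposed above, though they come from the same season), and no empirical prior is applied. All function names are hypothetical.

```python
def prob_streak_at_least(k, n, p):
    """P(at least one run of >= k straight wins in n independent games)."""
    # state[j] = P(no run of k so far, and the current trailing win run is j long)
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        nxt[0] = (1 - p) * sum(state)      # a loss resets the run to 0
        for j in range(k - 1):
            nxt[j + 1] = p * state[j]      # a win extends the run (reaching k drops out)
        state = nxt
    return 1.0 - sum(state)


def prob_longest_exactly(x, n, p):
    """P(the longest streak of the season is exactly x games)."""
    return prob_streak_at_least(x, n, p) - prob_streak_at_least(x + 1, n, p)


def combined_likelihood(p, win_x, loss_x, n=82):
    """Product of the win-streak and (flipped) losing-streak likelihoods."""
    win_part = prob_longest_exactly(win_x, n, p)
    loss_part = prob_longest_exactly(loss_x, n, 1 - p)   # losses occur with prob 1 - p
    return win_part * loss_part


def most_likely_win_pct(win_x, loss_x, n=82, steps=200):
    """Grid search for the p that maximizes the combined likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: combined_likelihood(p, win_x, loss_x, n))


if __name__ == "__main__":
    print("longest win 5, longest loss 5: ", most_likely_win_pct(5, 5))
    print("longest win 5, longest loss 10:", most_likely_win_pct(5, 10))
```

Weighting each grid point by a prior over win% before taking the maximum would bring this closer to the table-based calculation described above.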

                            >
                            > I suppose you might be using a definition of a win streak something like
                            > this: if we're told that a team had a 15 game winning streak, then we
                            > know that it had at least one streak of at least 15 games. That seems to
                            > me to be lead to harder probability calculations. If nothing else, the
                            > number contains less information now. That 15G team could for all we know
                            > also have had a 33 game winning streak that same season. Whereas if we
                            > know that 15G was their longest winning streak, we've got a much better
                            > idea of what sort of team it's likely to be.

                            The longest streak has the most info.

                            DeanO