
Unbalanced Statistics

  • Clifford Blau
    Message 1 of 6, May 6, 2006
      I'm posting this to both Retrolist and Baseball Databank. Apologies to
      those who get it twice.

      I was aware of the problem of unbalanced statistics in distant seasons, but
      still I was surprised today. Upon commencing some research into walks, I
      found some large discrepancies between batter and pitcher totals. Taking
      1911 for instance, there is a difference of 76 walks between the AL batting
      and pitching totals. In the NL, for which Retrosheet has most PBP and I
      believe all box scores, there is still a difference of 3. Other statistics
      show discrepancies as well. Retrosheet has 2 more runs allowed by NL
      pitchers than NL teams scored. Baseball-Reference.com, on the other hand,
      has the totals agreeing, with several teams having different totals than
      Retrosheet shows. BB-Ref has the same differences in other categories,
      though. Does this mean that someone has reconciled the differences in the
      runs columns, while leaving the others alone? Are incorrect numbers shown
      by Retrosheet because they are the official ones?

      Is anyone trying to reconcile these other differences? Is this something
      that is relatively easy to do now, at least for the seasons for which
      Retrosheet has all box scores?
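
A minimal sketch of the balance check described above: sum a statistic over batting rows and over pitching rows for each league-season, and report any gap. The row layout, the league code "XL", the team codes, and every number below are invented placeholders, not real statistics or an actual Retrosheet or Baseball Databank schema.

```python
from collections import defaultdict


def league_imbalances(batting_rows, pitching_rows, stat):
    """Return {(year, league): batting_total - pitching_total} for unbalanced leagues."""
    totals = defaultdict(lambda: [0, 0])  # [batting sum, pitching sum]
    for row in batting_rows:
        totals[(row["year"], row["league"])][0] += row[stat]
    for row in pitching_rows:
        totals[(row["year"], row["league"])][1] += row[stat]
    return {key: bat - pit for key, (bat, pit) in totals.items() if bat != pit}


# Placeholder team-level rows for a fictional league "XL" (not real data).
batting_rows = [
    {"year": 1911, "league": "XL", "team": "AAA", "BB": 500},
    {"year": 1911, "league": "XL", "team": "BBB", "BB": 450},
]
pitching_rows = [
    {"year": 1911, "league": "XL", "team": "AAA", "BB": 440},
    {"year": 1911, "league": "XL", "team": "BBB", "BB": 507},
]

print(league_imbalances(batting_rows, pitching_rows, "BB"))
```

A positive gap means batters are credited with more of the statistic than pitchers were charged with. The check only locates the imbalance; it does not say which side is wrong.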


      Cliff Blau
      http://mysite.verizon.net/brak2.0
    • dsreyn
      Message 2 of 6, May 9, 2006
        Cliff -

        See post 2977 for some information on discrepancies in the data. In
        short, there are quite a few cases where individual totals don't add
        to the team totals, and where batting and pitching totals for a season
        don't balance.

        I've been working on tracking down some of these problems (though I
        haven't done much of this lately). Posts 2930, 2931, 2932, 2933,
        2935, 2936, 2941, 2943, 2944, and 2951 have some proposed data
        corrections (and follow-up discussion).

        Doug

      • cliffordblau
        Message 3 of 6, May 9, 2006
          --- In baseball-databank@yahoogroups.com, "dsreyn" <dreynolds@...> wrote:
          >
          > Cliff -
          >
          > See post 2977 for some information on discrepancies in the data. In
          > short, there are quite a few cases where individual totals don't add
          > to the team totals, and where batting and pitching totals for a season
          > don't balance.
          >
          > I've been working on tracking down some of these problems (though I
          > haven't done much of this lately). Posts 2930, 2931, 2932, 2933,
          > 2935, 2936, 2941, 2943, 2944, and 2951 have some proposed data
          > corrections (and follow-up discussion).
          >
          > Doug

          So I see, and one of the earlier posts tells how you've been going
          about this. (Not that I really understand what you mean by scripts.)
          My questions remain, then: is anyone besides you working on this
          problem, and what would it take to fix some of the large discrepancies
          in the earlier seasons?

          And don't let me forget: thanks for your efforts, Doug.

          Cliff
        • dsreyn
          Message 4 of 6, May 12, 2006
            Cliff -

            I'm not sure if anyone else (other than Retrosheet efforts) is working
            on this.

            Regarding your other question, I think fixing problems in earlier
            seasons is going to be difficult. I've been focusing on recent years
            because these are generally easy to fix. For example, with the hits
            allowed by Texas pitching in 1981 (a discrepancy of 40 hits), it seems
            clear where the error lies. There are a number of published
            references available (plus Retrosheet and other on-line sources), and
            all of these agree, so I have no doubt that the BDB teams table was in
            error.
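
A minimal sketch of that kind of cross-source check, assuming each source's value for one statistic has already been keyed in by hand. The source names and all values are hypothetical placeholders (the gap of 40 only mirrors the size of the discrepancy mentioned above; these are not the actual Texas totals).

```python
from collections import Counter


def find_outliers(values_by_source):
    """Return (consensus_value, sources_that_disagree).

    Returns (None, []) when no value is reported by a strict majority of sources.
    """
    counts = Counter(values_by_source.values())
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(values_by_source):
        return None, []
    outliers = sorted(s for s, v in values_by_source.items() if v != value)
    return value, outliers


# Hypothetical values for one team-season statistic (not real totals).
reported = {
    "bdb_teams": 1000,
    "retrosheet": 1040,
    "printed_guide_a": 1040,
    "printed_guide_b": 1040,
}

print(find_outliers(reported))
```

Agreement among sources is only evidence, not proof, since sources often copy one another. That is why the sketch refuses to pick a value without a strict majority.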

            With earlier seasons, things are often much less clear. There may be
            fewer sources to consult, but the biggest problem is that there may be
            different values for a particular stat from source to source (in many
            cases, several different values). Trying to figure out which source
            is correct is obviously not a straightforward process. Recomputing
            stats from box scores or play by play data might be the only solution.

            My thought is that it ought to be possible to get everything balanced
            back to perhaps around 1960 (give or take a few years). I'd expect
            there to be another period where many, but not all, of the
            problems may be fixable. Beyond some point (I'm not quite sure
            where), I think the chances of solving many of the problems drop
            significantly.

            Doug

          • Paul Wendt
            Message 5 of 6, May 14, 2006
              Clifford Blau <brak2.0@...> wrote:
              > I'm posting this to both Retrolist and Baseball Databank.
              > Apologies to those who get it twice.
              >
              > I was aware of the problem of unbalanced statistics in distant
              > seasons, but still I was surprised today. Upon commencing some
              > research into walks, I found some large discrepancies between
              > batter and pitcher totals. Taking 1911 for instance, there is a
              > difference of 76 walks between the AL batting and pitching totals.

              I think the question is how to set priorities, including where it is
              worth doing at all.

              We (bb-db) should have a guide to major league statistics that shows
              and tells the sources, not player by player but year by year. In my
              mind's eye, it is largely graphical, relying on color- and
              line-shading. Clerical error in compilation of e-databases would be the
              unstated or once-for-all stated source of all data. For 1871-1968,
              ICI would be one general source but the guide would be finer grained:
              ICI daily newspaper research, ICI sum(by computer) of dailies, ICI sum
              of team pitchers (maybe nowhere used), etc. Historical research since
              1969 would be another general source --or two, one being research by
              Pete Palmer and those who worked directly with him.

              Paul Wendt
            • KJOK
              Message 6 of 6, May 19, 2006
                I've looked at this off and on for a few years, mostly focusing on league totals for hitting and pitching.
                 
                Home Runs and Runs League Totals reconcile perfectly between hitters and pitchers!
                 
                The league "official" totals between batting and pitching are fairly consistent back until 1943, where the NL pitchers issued 4027 walks, while NL batters are credited with 4048 walks.
                 
                In 1942 NL, batters had 10,931 hits, while pitchers allowed 10,386.
                 
                There are many of these small differences in hits, walks and strikeouts until around 1930, where they start to grow slightly larger. 
                 
                Backing up in time, some specific years such as 1917 AL, 1913 AL, 1912 NL have larger differences. 1904-1910 is a relative disaster area.
                 
                Surprisingly, 1903 AL and 1891-1902 in both leagues are relatively error-free in league total differences. 1885-1890 is similar to the 1930-1942 period. And finally, 1871-1884 is relatively free of differences.
                 
                Of course, someone then needs to look at League totals vs. Team Totals, then Team totals vs. player totals....
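
The layered comparison (league vs. team, then team vs. player) can be sketched as one roll-up check applied at each level. This is only an illustration: the function name, the row layout, and the team and player numbers are hypothetical placeholders. The only real figures used are the 1943 NL walk totals quoted above.

```python
def rollup_gaps(parent_totals, child_rows, stat):
    """Compare each parent's published total with the sum of its children.

    parent_totals: {parent_id: published total}
    child_rows: dicts with a "parent" key and the statistic
    Returns {parent_id: published - summed} for parents that do not balance.
    """
    summed = {}
    for row in child_rows:
        summed[row["parent"]] = summed.get(row["parent"], 0) + row[stat]
    return {
        parent: published - summed.get(parent, 0)
        for parent, published in parent_totals.items()
        if published != summed.get(parent, 0)
    }


# League level, using the 1943 NL walk figures quoted in this message.
nl_1943_walk_gap = 4048 - 4027  # batters credited minus pitchers issued

# Team vs. player level, with placeholder numbers (not real data).
team_totals = {"AAA": 520, "BBB": 480}
player_rows = [
    {"parent": "AAA", "BB": 300},
    {"parent": "AAA", "BB": 220},
    {"parent": "BBB", "BB": 250},
    {"parent": "BBB", "BB": 225},
]

print(nl_1943_walk_gap)
print(rollup_gaps(team_totals, player_rows, "BB"))
```

The same function works for league vs. team by passing league totals as the parents and team rows as the children.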
                 
                THANKS,
                KJOK







