
Re: Unbalanced Statistics

  • dsreyn
    Message 1 of 6 , May 9, 2006
      Cliff -

      See post 2977 for some information on discrepancies in the data. In
      short, there are quite a few cases where individual totals don't add
      to the team totals, and where batting and pitching totals for a season
      don't balance.
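      As a rough illustration of the batting/pitching balance check, here is
      a minimal Python sketch. The row layout, the numbers, and the function
      names are all invented for the example; they are not the Baseball
      Databank schema and not the scripts mentioned in this thread.

```python
from collections import defaultdict

# Hypothetical rows: (year, league, walks). Invented numbers, not real data.
batting_rows = [
    (1911, "AL", 300),
    (1911, "AL", 250),
    (1911, "NL", 400),
]
pitching_rows = [
    (1911, "AL", 540),
    (1911, "NL", 400),
]


def league_totals(rows):
    """Sum a stat by (year, league)."""
    totals = defaultdict(int)
    for year, league, value in rows:
        totals[(year, league)] += value
    return totals


def find_imbalances(batting, pitching):
    """Return {(year, league): batting_total - pitching_total} where nonzero."""
    bat = league_totals(batting)
    pit = league_totals(pitching)
    diffs = {}
    for key in sorted(set(bat) | set(pit)):
        diff = bat.get(key, 0) - pit.get(key, 0)
        if diff != 0:
            diffs[key] = diff
    return diffs


imbalances = find_imbalances(batting_rows, pitching_rows)
print(imbalances)
```

      With the made-up rows above, only the 1911 AL entry is reported, since
      the NL totals happen to match.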

      I've been working on tracking down some of these problems (though I
      haven't done much of this lately). Posts 2930, 2931, 2932, 2933,
      2935, 2936, 2941, 2943, 2944, and 2951 have some proposed data
      corrections (and follow-up discussion).

      Doug

      --- In baseball-databank@yahoogroups.com, Clifford Blau <brak2.0@...>
      wrote:
      >
      > I'm posting this to both Retrolist and Baseball Databank. Apologies to
      > those who get it twice.
      >
      > I was aware of the problem of unbalanced statistics in distant seasons,
      > but still I was surprised today. Upon commencing some research into
      > walks, I found some large discrepancies between batter and pitcher
      > totals. Taking 1911 for instance, there is a difference of 76 walks
      > between the AL batting and pitching totals. In the NL, for which
      > Retrosheet has most PBP and I believe all box scores, there is still a
      > difference of 3. Other statistics show discrepancies as well.
      > Retrosheet has 2 more runs allowed by NL pitchers than NL teams scored.
      > Baseball-Reference.com, on the other hand, has the totals agreeing,
      > with several teams having different totals than Retrosheet shows.
      > BB-Ref has the same differences in other categories, though. Does this
      > mean that someone has reconciled the differences in the runs columns,
      > while leaving the others alone? Are incorrect numbers shown by
      > Retrosheet because they are the official ones?
      >
      > Is anyone trying to reconcile these other differences? Is this
      > something that is relatively easy to do now, at least for the seasons
      > Retrosheet has all box scores?
      >
      > Cliff Blau
      > http://mysite.verizon.net/brak2.0
    • cliffordblau
      Message 2 of 6 , May 9, 2006

        So I see, and one of the earlier posts tells how you've been going
        about this. (Not that I really understand what you mean by scripts.)
        Then my questions remain, is anyone besides you working on this
        problem, and what would it take to fix some of the large discrepancies
        in the earlier seasons?

        And don't let me forget: thanks for your efforts, Doug.

        Cliff
      • dsreyn
        Message 3 of 6 , May 12, 2006
          Cliff -

          I'm not sure if anyone else (other than Retrosheet efforts) is working
          on this.

          Regarding your other question, I think fixing problems in earlier
          seasons is going to be difficult. I've been focusing on recent years
          because these are generally easy to fix. For example, with the hits
          allowed by Texas pitching in 1981 (a discrepancy of 40 hits), it seems
          clear where the error lies. There are a number of published
          references available (plus Retrosheet and other on-line sources), and
          all of these agree, so I have no doubt that the BDB teams table was in
          error.
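          A check of this kind (summing the individual pitcher lines and
          comparing against the team table) might look like the sketch below.
          The team IDs "AAA" and "BBB", the numbers, and the function name
          are placeholders made up for illustration, not actual BDB values.

```python
from collections import defaultdict

# Hypothetical pitcher rows: (year, team, hits_allowed). Invented numbers.
pitcher_rows = [
    (1981, "AAA", 500),
    (1981, "AAA", 460),
    (1981, "BBB", 900),
]

# Hypothetical team-table values keyed by (year, team).
team_hits_allowed = {
    (1981, "AAA"): 1000,
    (1981, "BBB"): 900,
}


def team_mismatches(player_rows, team_table):
    """Compare summed player rows with the team table.

    Returns {(year, team): (player_sum, team_value, team_value - player_sum)}
    for every team whose two figures differ.
    """
    summed = defaultdict(int)
    for year, team, hits in player_rows:
        summed[(year, team)] += hits

    mismatches = {}
    for key, team_value in team_table.items():
        player_sum = summed.get(key, 0)
        if player_sum != team_value:
            mismatches[key] = (player_sum, team_value, team_value - player_sum)
    return mismatches


mismatches = team_mismatches(pitcher_rows, team_hits_allowed)
print(mismatches)
```

          The check only says that the two figures disagree. Deciding which
          one is wrong still means going to the published references.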

          With earlier seasons, things are often much less clear. There may be
          fewer sources to consult, but the biggest problem is that there may be
          different values for a particular stat from source to source (in many
          cases, several different values). Trying to figure out which source
          is correct is obviously not a straightforward process. Recomputing
          stats from box scores or play by play data might be the only solution.
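          One way to see how bad the disagreement is for a given stat is to
          tabulate the value each source reports. The sketch below is only an
          illustration: the source names and numbers are hypothetical, and a
          value reported by most sources is not thereby shown to be correct.

```python
from collections import Counter


def summarize_sources(values_by_source):
    """Summarize whether sources agree on one stat.

    Returns ("agree", value) when every source reports the same value,
    otherwise ("conflict", [(value, count), ...]) with the most frequently
    reported value first (ties broken by the smaller value).
    """
    counts = Counter(values_by_source.values())
    if len(counts) == 1:
        return ("agree", next(iter(counts)))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ("conflict", ranked)


# Hypothetical readings of a single team-season stat from four sources.
recent = {"source_a": 512, "source_b": 512, "source_c": 512}
early = {"source_a": 512, "source_b": 509, "source_c": 515, "source_d": 512}

print(summarize_sources(recent))
print(summarize_sources(early))
```

          Anything that comes back as a conflict is a candidate for
          recomputing from box scores or play-by-play data.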

          My thought is that it ought to be possible to get everything balanced
          back to perhaps around 1960 (give or take a few years). I'd expect
          there to be another period where many, but not all, of the
          problems may be fixable. Beyond some point (I'm not quite sure
          where), I think the chances of solving many of the problems drop
          significantly.

          Doug

        • Paul Wendt
          Message 4 of 6 , May 14, 2006
            Clifford Blau <brak2.0@...> wrote:
            > I'm posting this to both Retrolist and Baseball Databank.
            > Apologies to those who get it twice.
            >
            > I was aware of the problem of unbalanced statistics in distant
            > seasons, but still I was surprised today. Upon commencing some
            > research into walks, I found some large discrepancies between
            > batter and pitcher totals. Taking 1911 for instance, there is a
            > difference of 76 walks between the AL batting and pitching totals.

            I think the question is how to set priorities, including where it is
            worth doing at all.

            We (bb-db) should have a guide to major league statistics that shows
            and tells the sources, not player by player but year by year. In my
            mind's eye, it is largely graphical, relying on color- and
            line-shading. Clerical error in compilation of e-databases would be the
            unstated or once-for-all stated source of all data. For 1871-1968,
            ICI would be one general source but the guide would be finer grained:
            ICI daily newspaper research, ICI sum(by computer) of dailies, ICI sum
            of team pitchers (maybe nowhere used), etc. Historical research since
            1969 would be another general source --or two, one being research by
            Pete Palmer and those who worked directly with him.

            Paul Wendt
          • KJOK
            Message 5 of 6 , May 19, 2006
              I've looked at this off and on for a few years, mostly focusing on league totals for hitting and pitching.

              Home Runs and Runs League Totals reconcile perfectly between hitters and pitchers!

              The league "official" totals between batting and pitching are fairly consistent back until 1943, where the NL pitchers issued 4027 walks, while NL batters are credited with 4048 walks.

              In 1942 NL, batters had 10,931 hits, while pitchers allowed 10,386.

              There are many of these small differences in hits, walks and strikeouts until around 1930, where they start to grow slightly larger.

              Backing up in time, some specific years such as 1917 AL, 1913 AL, 1912 NL have larger differences. 1904-1910 is a relative disaster area.

              Surprisingly, 1903 AL and 1891-1902 both leagues are relatively error free in league total differences. 1885-1890 is similar to the 1930-1942 period. And finally, 1871-1884 is relatively free of differences.

              Of course, someone then needs to look at league totals vs. team totals, then team totals vs. player totals....
               
              THANKS,
              KJOK







