
Re: [baseball-databank] Data Question

  • Sean Forman
    Message 1 of 9 , Apr 25 6:18 AM
      Aaron,

      Baseball-Reference is almost certainly more accurate. There are thousands of discrepancies between the official totals, which are largely what the BDB contains, and what actually happened or what the game logs show. B-R uses the Palmer database, which represents a lot of additional work that Pete has done to clarify and correct the official record.

      sean
      ---
      Sean Forman
      Sports Reference LLC, President
      http://www.sports-reference.com/



      On Mon, Apr 25, 2011 at 8:44 AM, aaron_carlisle <aaron_carlisle@...> wrote:
       

      I've googled and checked FAQs, but I can't seem to find the answer to this. If someone could either point me in the direction of an answer, or tell me which one is right, I would appreciate it.

      I had a question not easily answered through the baseball-reference.com website, so I downloaded the BDB-sql-2011-03-28 SQL file. There was a discrepancy when I looked on the baseball-reference.com site for details about one of my results, and I narrowed it down to this:

      From BDB-sql-2011-03-28:
      mysql> select AB from Teams where name = 'New York Yankees' and yearID = '1927';
      +------+
      | AB   |
      +------+
      | 5347 |
      +------+
      1 row in set (0.00 sec)

      Looking at http://www.baseball-reference.com/teams/NYY/1927.shtml , I get 5354 total AB for the '27 Yankees (in the Team Totals line).

      So, which is correct for the '27 Yankees? Did the team have 5354 or 5347 ABs? I doubt Retrosheet would be any help this far back, so I'm not sure how to find this out.


    • Tangotiger
      Message 2 of 9 , Apr 25 6:21 AM
        > ABs? I doubt Retrosheet would be any help this far back, so I'm not sure
        > how to find this out.
        >
        >

        Blasphemy!

        You can do it yourself here:
        http://www.retrosheet.org/boxesetc/1927/VNYA01927.htm

        Or you can look at their compilation here:
        http://www.retrosheet.org/boxesetc/1927/WNYA01927.htm

        You can also download the files yourself.

        Tom


        ---------------------------------------------
        The Book--Playing The Percentages In Baseball
        http://www.InsideTheBook.com
      • mwe55innc@gmail.com
        Message 3 of 9 , Apr 25 6:55 AM
          On Apr 25, 2011 9:21am, Tangotiger wrote:

          >
          > > ABs? I doubt Retrosheet would be any help this far back, so I'm not sure
          > > how to find this out.
          >
          > Blasphemy!
          >
          > You can do it yourself here:
          > http://www.retrosheet.org/boxesetc/1927/VNYA01927.htm
          >
          > Or you can look at their compilation here:
          >
          > http://www.retrosheet.org/boxesetc/1927/WNYA01927.htm

          Realize that you need to take any of these sources with a healthy dose of salt; there are thousands of discrepancies among data sources and they get worse the further back in history that you go. Retrosheet's researchers and Pete Palmer have done what they can to resolve many of them, but I think that the odds that any particular set of statistics for any team prior to about 1975 or so is exactly correct are pretty small. If you're trying to track down every last AB, good luck (and please report back any discrepancies that you find!)

          Mike Emeigh
          MWE55inNC@...
        • aaron_carlisle
          Message 4 of 9 , Apr 25 7:15 AM
            Yeah, I didn't mean to imply I was looking for an absolute truth. I didn't check Retrosheet because I was mostly curious about why the discrepancy exists and which of the two sources is more accurate. I was under the incorrect assumption that BDB and B-R.com use the same source data.

            Anyway, Retrosheet has a different number than either of the other two sources (one fewer AB than baseball-reference.com), so it doesn't really help verify one over the other.

            Mr. Forman completely answered my question. You people are super responsive and friendly. Three courteous and informative responses within minutes of an email almost give me faith in the future of internet dialogue. Almost.

          • Tangotiger
            Message 5 of 9 , Apr 25 7:26 AM
              > Realize that you need to take any of these sources with a healthy dose of
              > salt; there are thousands of discrepancies among data sources and they get
              > worse the further back in history that you go. Retrosheet's researchers
              > and
              > Pete Palmer have done what they can to resolve many of them, but I think
              > that the odds that any particular set of statistics for any team prior to
              > about 1975 or so is exactly correct are pretty small. If you're trying to
              > track down every last AB, good luck (and please report back any
              > discrepancies that you find!)
              >

              I agree. These things are not verifiable facts (which would be true and
              would pass the Reagan test). They are assumptions of fact (which means good
              enough in a court of law that will occasionally find an innocent person guilty).

              Retrosheet researchers are just like any other researchers that try to
              make sense of the various evidence they find.

              Tom
            • Tangotiger
              Message 6 of 9 , Apr 25 8:17 AM
                > Anyway, retrosheet has a different number than either of the other two
                > sources (one fewer AB than baseball-reference.com), so it doesn't really
                > help verify one over the other.
                >

                I would presume(*) that Pete Palmer would use Retrosheet as his source of
                facts if they had it. And if he disagreed, he would either rely on
                Retrosheet, or let Dave know that he disagrees and where he disagrees.
                And Dave would presumably either continue to disagree or change to match
                Pete.

                (*) Something I've inferred based on the way Pete has described things in
                the past.

                So, there is a strong relationship between the two (as best as I
                understand it) to try to get to the truth.

                Note that Retro updates sporadically (say two to four times a year), while
                I presume Pete's update cycle is more frequent.

                Tom
              • Clem Comly
                Message 7 of 9 , Apr 25 4:24 PM
                  First, if you sum the players' ABs from batting.txt they come to 5434, which matches B-R and Retrosheet.  The team total that Retrosheet (and I believe B-R) shows for the 1927 Yankees is the sum of the players' stats from Pete Palmer.  I've never seen the raw Palmer data set, but I believe it does have separate team data from the official stats, which may not show 5434.  The question is which is the correct total for 1927 NYA.  There is no guarantee 5434 is the correct total.

                  As a matter of fact, if one goes through the individual box scores on Retrosheet, one gets 5353 ABs for the 1927 Yankees.  For those of us who are lazy, we can accomplish the same thing by looking at the Total line of the Retrosheet batting splits page for the 1927 Yankees.  Going through the splits of Yankee batters one by one, we find Mike Gazella has 115 ABs per Palmer and 114 per the Retrosheet boxes.  Retrosheet hopes to soon post identified discrepancies, which would allow the curious to see where the issues are and the industrious to attempt to resolve them.

                  I suspect the problem is that Teams.txt is not always updated when Fielding.txt, Pitching.txt or Batting.txt are updated.  There are only a few stats in Teams.txt that aren't simply the sums of individuals (W, L, ties, ER, ERA, SHO, DP).  I see two courses: drop the other stat columns from Teams.txt, or recalculate them all after revisions to the other tables.  The ER total was also a simple sum until the team unearned run was created by what is now rule 10.18 back in the 1960s or 1970s.
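
A minimal sketch of the second course (recomputing a summed column and flagging rows where Teams no longer matches the player sum), written in Python with the standard sqlite3 module. It assumes Teams and Batting both carry yearID, teamID and AB columns and that (yearID, teamID) identifies a team season; the function names and the toy rows are hypothetical placeholders, not real BDB data.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up rows (not real BDB data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Teams (yearID INTEGER, teamID TEXT, AB INTEGER);
        CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT, AB INTEGER);
        -- Team 'AAA' is stale: its stored total is 100 but its players sum to 101.
        INSERT INTO Teams VALUES (1900, 'AAA', 100);
        INSERT INTO Teams VALUES (1900, 'BBB', 90);
        INSERT INTO Batting VALUES ('p1', 1900, 'AAA', 60);
        INSERT INTO Batting VALUES ('p2', 1900, 'AAA', 41);
        INSERT INTO Batting VALUES ('p3', 1900, 'BBB', 50);
        INSERT INTO Batting VALUES ('p4', 1900, 'BBB', 40);
        """
    )
    return conn


def find_stale_team_ab(conn):
    """Return (yearID, teamID, team_AB, summed_player_AB) rows that disagree."""
    query = """
        SELECT t.yearID, t.teamID, t.AB, SUM(b.AB)
        FROM Teams AS t
        JOIN Batting AS b
          ON b.yearID = t.yearID AND b.teamID = t.teamID
        GROUP BY t.yearID, t.teamID, t.AB
        HAVING t.AB <> SUM(b.AB)
        ORDER BY t.yearID, t.teamID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in find_stale_team_ab(build_toy_db()):
        print(row)
```

The same query shape should work for any Teams column that is a plain sum of individual stats; the columns listed above as exceptions (W, L, ties, ER, ERA, SHO, DP) would need separate handling.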
                   
                  Clem Comly 
                   

                • tjruane
                  Message 8 of 9 , Apr 26 5:02 AM
                    First of all, I think Clem meant to write 5354 below instead of 5434.

                    And it is correct that if you look at the batting splits page for the New York Yankees that year, you get one fewer at-bat, or 5353. The difference is in their game on April 20th, where officially Mike Gazella had four at-bats while we think he had only three. Other sources for the game agree that he had three at-bats, and if he did have four at-bats in that game, it would have meant that his spot in the lineup came up five times, while the spots before and after him had four and three plate appearances, respectively. So I'm pretty confident that this is an official error.
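
The lineup argument above can be sketched as a simple sanity check: within one game, a batting-order slot can never come up more often than the slot before it, and no two slots can differ by more than one plate appearance. The function name is hypothetical, and the two inputs are just the three adjacent slots described above (four, then Gazella's spot, then three), under the official count and under Retrosheet's count.

```python
def lineup_slots_consistent(pa_by_slot):
    """Check plate-appearance counts for consecutive batting-order slots.

    pa_by_slot lists plate appearances in batting order (earlier slot first)
    and must not wrap from the ninth slot back to the first.
    """
    if not pa_by_slot:
        raise ValueError("need at least one slot")
    # A later slot cannot bat more often than an earlier one.
    non_increasing = all(a >= b for a, b in zip(pa_by_slot, pa_by_slot[1:]))
    # The order cycles, so counts can differ by at most one.
    spread_ok = max(pa_by_slot) - min(pa_by_slot) <= 1
    return non_increasing and spread_ok


if __name__ == "__main__":
    official = [4, 5, 3]    # Gazella's spot coming up five times
    retrosheet = [4, 4, 3]  # Gazella's spot coming up four times
    print(lineup_slots_consistent(official))
    print(lineup_slots_consistent(retrosheet))
```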

                    In the pre-convention release, we should be releasing discrepancy files for this (and many other years) and incorporating this data on our game, team and player pages. The line in the discrepancy file for this issue looks like:

                    "1927A0103","gazem101",1927,"NYA","O",,"AB","PHA192704200",3,4,,,,

                    It contains a discrepancy ID, a Retrosheet player ID, year, team ID, "O" (for an offensive discrepancy), an empty field, "AB" (stat ID), game ID, our total, the official total, and some note fields (not filled in at present).
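
For anyone who wants to read such a line programmatically, here is a rough sketch using Python's csv module. The field names are my own labels for the fields listed above, not names from any Retrosheet specification; the sixth field is empty in the sample and is not described, so it is kept under the placeholder name "unlabeled".

```python
import csv
import io

# Hypothetical labels for the first ten fields; everything after them is
# treated as note fields.
FIELDS = [
    "discrepancy_id", "player_id", "year", "team_id", "kind",
    "unlabeled", "stat", "game_id", "retrosheet_total", "official_total",
]


def parse_discrepancy(line):
    """Parse one discrepancy line into a dict with integer totals."""
    row = next(csv.reader(io.StringIO(line)))
    record = dict(zip(FIELDS, row))
    record["notes"] = row[len(FIELDS):]
    for key in ("year", "retrosheet_total", "official_total"):
        record[key] = int(record[key])
    # Positive means the official total is higher than Retrosheet's.
    record["difference"] = record["official_total"] - record["retrosheet_total"]
    return record


if __name__ == "__main__":
    sample = '"1927A0103","gazem101",1927,"NYA","O",,"AB","PHA192704200",3,4,,,,'
    rec = parse_discrepancy(sample)
    print(rec["player_id"], rec["stat"], rec["game_id"], rec["difference"])
```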

                    Anyway, I figured some of you might be interested in this level of detail.

                    Tom Ruane

                    --- In baseball-databank@yahoogroups.com, "Clem Comly" <ccomly@...> wrote:
                    >
                    > First, if you sum the players ABs from batting.txt they come to 5434 which matches B-R and Retrosheet. The data Retrosheet (and I believe B-R) shows for the team total for 1927 Yankees is the sum of the players stats from Pete Palmer. I've never seen the raw Palmer data set, but I believe Palmer's data set does have separate team data from the official stats which may not show 5434. The question is which is correct total for 1927 NYA. There is no guarantee 5434 is the correct total.
                    >
                    > As a matter of fact, if one goes through the individual box scores on Retrosheet one gets 5353 ABs for 1927 Yankees. For those of us who are lazy, we can accomplish the same thing by looking at the Total line of the Retrosheet batting splits page for 1927 Yankees. By going through the splits of Yankee batters one-by-one we find Mike Gazella has 115 ABs per Palmer and 114 per Retrosheet boxes. Retrosheet hopes to soon post identified discrepancies which would allow the curious to see where the issues are and the industrious to attempt to resolve.
                    >
                    > I suspect the problem is Teams.txt is not always updated when Fielding.txt, Pitching.txt or Batting.txt are updated. There are only a few stats in Teams.txt that aren't simply the sums of individuals (Wins, L,ties,ER,ERA,SHO,DP). I see 2 courses: drop the other stat columns from Teams.txt or recalculate them all after revisions to other tables. The ER total is also a sum till the team unearned run was created by what is now rule 10.18 back in the 1960s or 1970s.
                    >
                    > Clem Comly