
problems in Lahman2012 database

  • Neal Traven
    Message 1 of 1, Jan 12, 2013
      I've been starting to look at fielding in the just-released 2012 version of the Lahman database (in Access format). Early in the set of queries I ran I saw some very strange results, which led me to examine the tables themselves. I believe I've uncovered some significant problems in a couple of tables.
      • In the Fielding table, there are no rolled-up "OF" rows for either 2011 or 2012. Thus, for example, I can see that Roger Bernadina played 56 games in CF, 36 in LF, and 10 in RF in 2011, but can't tell how many games in total he played in the outfield (B-R says 84 ... he switched positions a lot).
      • The Appearances table appears to have the data for "games started" (not supposed to be a field in the table) in the "G_batting" field, with the numbers for all other fields shifted one column to the right (and "G_pr" data gone). Again using Bernadina in 2011, the upper row below is what actually appears, and the lower row is what I believe it should be (field names shortened for ease of viewing):
        As stored:
        year team lgID playerID  G_all bat def  p  c 1b 2b 3b ss lf cf rf of dh ph pr
        2011 WAS  NL   bernaro01    91  71  91 84  0  0  0  0  0  0 36 56 10 84  0  7

        As it should be:
        year team lgID playerID  G_all bat def  p  c 1b 2b 3b ss lf cf rf of dh ph pr
        2011 WAS  NL   bernaro01    91  91  84  0  0  0  0  0  0 36 56 10 84  0  7  4
      The extra "71" in the upper row matches what B-R shows for Bernadina's GS on defense. The values in the lower row, shifted one column to the left from the upper row, match B-R. I added his actual pinch-running (PR) count from the B-R game logs.
      As best I can tell, that erroneous extra column appears throughout the Appearances table, even in rows where there are no numbers to fill it. So, for instance, the table shows Cy Young as having both pitched and caught. If I can trust that the shift is consistent throughout the table, I can rework it for my needs. I'm not looking at pinch-running at the moment, so I don't care that that info is missing. If that's an appropriate jury-rigged solution to the Appearances problem, then I can also use the correctly placed "G_of" data as a work-around for the Fielding table situation.
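Without the rolled-up OF rows, the Fielding table by itself can only bound a player's outfield total, because a game in which he moves between, say, LF and CF is counted once at each position. A minimal Python sketch using Bernadina's 2011 numbers from the post (the helper `outfield_bounds` is hypothetical): the true total must lie between the largest single-position count and the sum of all three, and the correctly placed G_of value of 84 falls inside that range, whereas simply adding the three Fielding rows would give 102.

```python
def outfield_bounds(position_games):
    """Bounds on total outfield games implied by per-position game counts.

    A player can appear at several outfield spots in one game, so the
    true total is at least the largest single-position count and at
    most the sum of all of them.
    """
    counts = list(position_games.values())
    return max(counts), sum(counts)


# Bernadina, 2011, Fielding table games by position (quoted in the post).
bernadina_2011 = {"LF": 36, "CF": 56, "RF": 10}
low, high = outfield_bounds(bernadina_2011)

# G_of from the realigned Appearances row, which matches B-R.
g_of = 84
print(low, high, low <= g_of <= high)  # 56 102 True
```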

      Correct tables, of course, would be a better solution.

      Thanks!
      -- 
      ---------------------------------------------------------------------------
      neal traven                              beisbol@...
        "You're only young once, but you can be immature forever."
                                     -- Larry Andersen, relief pitcher
      
    Your message has been successfully submitted and would be delivered to recipients shortly.