Re: [baseball-databank] Re: Filling in Gaps in Information

  • Paul Golba
    Message 1 of 7 , May 28, 2010
      Now that I have my question answered, here is the data I can provide.  I am trying to compute Win Shares using the Data-Bank DB.  Some statistics required to compute Win Shares are not in the DB, so I either had to calculate them from Retrosheet (some from the game logs, some from the PBP) or type them in from the STATS, Inc. All-Time Major League Handbook.

      I am generally confident in the accuracy of the following:
      - Team Runs Scored and Surrendered, broken down by Home and Away (game logs).  Note that anything that starts with "Team" is only on the team level, not the player level.
      - Team Home Runs Hit and Surrendered, Home and Away (game logs 1920-1932 & 1952-2009, book 1876-1919 & 1933-1951)
      - Team Games Played, Home and Away (game logs)
      - Player Situational Hitting (PBP) - For Win Shares I need batting average with runners in scoring position, and home runs with runners on base.  I should be able to do some other numbers too, on request.
      - Team SH Allowed (game logs 1931-1932 & 1952-2009, book 1933-1951)
      - Team SF Allowed (game logs 1954-2009)
      - Catcher Earned Runs (PBP, 1987-2009) - This is mainly Retrosheet data, assuming that any run scored while the catcher is in the field belongs to said catcher.  However, I then found all situations where (a) the catcher was changed mid-inning (b) with a runner on base and (c) a run scored later in the inning, and then made manual adjustments to the numbers.
      - Team DP (book, 1883-1899)
      - League Singles Allowed (game logs, 1997-2009) - I needed this for 1997 onward because of inter-league play. (No, I can't fathom why you would want this).
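The catcher attribution described above (charge every run to the catcher in the field, then pull out the innings that need a manual look) might be sketched roughly as follows. This is not the code actually used for these numbers. The `Play` record and its field names are hypothetical stand-ins for whatever gets parsed out of the Retrosheet PBP, and the sketch counts all runs on a play without separating earned from unearned.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Play:
    # Hypothetical, simplified play record; not a Retrosheet format.
    game_id: str
    inning: int
    fielding_team: str
    catcher_id: str
    runners_on: bool  # any runner on base before the play
    runs: int         # runs scored on the play


def attribute_runs(plays):
    """Charge each run to the catcher in the field and flag half-innings
    where the catcher changed with a runner on and a run scored afterwards.

    Assumes plays are in chronological order within each game.
    """
    runs_by_catcher = defaultdict(int)
    needs_review = []

    def half_inning_key(p):
        return (p.game_id, p.inning, p.fielding_team)

    for half_inning, group in groupby(plays, key=half_inning_key):
        group = list(group)
        changed_with_runner = False
        flagged = False
        for prev, play in zip([None] + group[:-1], group):
            # Naive rule: the run belongs to whoever is catching right now.
            runs_by_catcher[play.catcher_id] += play.runs
            if (prev is not None
                    and prev.catcher_id != play.catcher_id
                    and play.runners_on):
                changed_with_runner = True
            if changed_with_runner and play.runs > 0:
                flagged = True
        if flagged:
            needs_review.append(half_inning)
    return dict(runs_by_catcher), needs_review


plays = [
    Play("G1", 1, "BOS", "c1", False, 0),
    Play("G1", 1, "BOS", "c1", True, 0),
    Play("G1", 1, "BOS", "c2", True, 1),   # new catcher, runner on, run scores
    Play("G1", 2, "BOS", "c2", False, 1),
]
totals, review = attribute_runs(plays)
```

The flagged half-innings are only candidates for the manual adjustments; the sketch makes no attempt to decide which catcher an inherited runner should be charged to.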

      I'm not so confident about these numbers:
      - Team Home Wins and Road Losses (game logs) - If the Retrosheet HTBF flag (Home Team Batted First) is set, I treat the home team like the road team, which is probably not what normal people would want.  I can calculate the normal W-L by home and away without too much of an issue.
      - Pitcher Holds (PBP, 1987-2009) - I am confident that the code works as expected, but Holds are an unofficial statistic with multiple definitions.  I am currently using the STATS definition circa 2000, but I can calculate it other ways.  Probably could figure out Blown Saves too.
      - Catcher Caught Stealing (PBP, 1987-2009) - This is the CS excluding the ones that belong to the pitcher.  First, I may be missing some oddball situations like double CS.  Second, the Retrosheet PBP sometimes has highly unlikely CS scorings (43?, 36?) which I am not sure what to do with.  I think the numbers are reasonably close.
      - Unassisted First Baseman Putouts (PBP, 1987-2009) - Again, the Retrosheet fielding data can be wrong in places.  (Actually, there was a bit of a flame war about this on the Retrosheet mailing list.  I would not expect these numbers to be precise, to say the least.)
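To make the HTBF caveat on the home/road W-L numbers concrete, here is a small sketch of the two ways of counting. The `Game` record and the `swap_on_htbf` switch are made up for the illustration; they are not Retrosheet field names, and ties are simply skipped.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical, simplified game-log row.
    visitor: str
    home: str
    visitor_score: int
    home_score: int
    htbf: bool = False  # home team batted first


def home_road_records(games, swap_on_htbf):
    """Tally (team, role, result) counts.

    With swap_on_htbf=True, a home team that batted first is counted as
    the road team (and the visitor as the home team). With False, roles
    follow the venue, which is the conventional home/away split.
    """
    record = Counter()
    for g in games:
        if g.visitor_score == g.home_score:
            continue  # ties are left out of W-L
        home_won = g.home_score > g.visitor_score
        if swap_on_htbf and g.htbf:
            home_role, visitor_role = "road", "home"
        else:
            home_role, visitor_role = "home", "road"
        record[(g.home, home_role, "W" if home_won else "L")] += 1
        record[(g.visitor, visitor_role, "L" if home_won else "W")] += 1
    return record


games = [
    Game("NYA", "BOS", 3, 5),
    Game("NYA", "BOS", 2, 4, htbf=True),
]
by_venue = home_road_records(games, swap_on_htbf=False)
by_batting_order = home_road_records(games, swap_on_htbf=True)
```

In this toy input Boston is 2-0 at home by venue, but 1-0 at home and 1-0 on the road when the HTBF game is swapped.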

      If you would like any of this data, please let me know and I will share it.  I would assume that any data I provide would have to be double checked before it could be added to the DB.


      From: anson2995 <slahman@...>
      To: baseball-databank@yahoogroups.com
      Sent: Thu, May 27, 2010 9:27:55 AM
      Subject: [baseball-databank] Re: Filling in Gaps in Information


      Paul Golba <pgolba2@...> wrote:

      > I am still curious as to why the database that David linked to
      > is not the current Data-Bank database. I'm relatively new
      > here. Can someone clue me in?

      No idea. If David has his own database, that's fine, but that's his own thing.

      You're right about the difference between zero values and null values.

      I confess that I don't know what work has been done to fill in the BFP numbers, either within the BBD database or in other sources. However, it sounds like this data may have been calculated from Retrosheet data, but in some cases calculated incorrectly. I'm hoping someone else can chime in on that. Anyone?

      Sean Lahman
