Re: [baseball-databank] Re: Filling in Gaps in Information
Now that my question has been answered, here is the data I can provide. I am trying to compute Win Shares using the Data-Bank DB. Win Shares requires some statistics that the DB does not have, so I either calculated them from Retrosheet (some from the game logs, some from the play-by-play (PBP) files) or typed them in from the STATS, Inc. All-Time Major League Handbook.
I am generally confident in the accuracy of the following:
- Team Runs Scored and Surrendered, broken down by Home and Away (game logs). Note that anything that starts with "Team" is only on the team level, not the player level.
- Team Home Runs Hit and Surrendered, Home and Away (game logs 1920-1932 & 1952-2009, book 1876-1919 & 1933-1951)
- Team Games Played, Home and Away (game logs)
- Player Situational Hitting (PBP) - For Win Shares I need batting average with runners in scoring position and home runs with runners on base. I can probably produce other situational splits on request.
- Team SH Allowed (game logs 1931-1932 & 1952-2009, book 1933-1951)
- Team SF Allowed (game logs 1954-2009)
- Catcher Earned Runs (PBP, 1987-2009) - This is mainly Retrosheet data, on the assumption that any run scored while a catcher is in the field is charged to that catcher. I then found every situation where (a) the catcher was changed mid-inning, (b) with a runner on base, and (c) a run scored later in the inning, and adjusted those numbers manually.
- Team DP (book, 1883-1899)
- League Singles Allowed (game logs, 1997-2009) - I needed this from 1997 onward because of interleague play. (No, I can't fathom why you would want this.)
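The catcher earned-run attribution described above can be sketched roughly as follows. This is a minimal illustration, not the code I actually ran: it assumes each half-inning has already been parsed from the Retrosheet PBP into a hypothetical `Play` record holding the catcher in the field, the runners on base before the play, and the earned runs scored on the play. Every run is credited to the catcher in the field, and half-innings matching conditions (a) through (c) are flagged for manual review.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Play:
    # Hypothetical, simplified play record parsed from Retrosheet PBP.
    catcher: str      # fielding team's catcher when the play happened
    runners_on: int   # runners on base before the play
    earned_runs: int  # earned runs that scored on the play


def catcher_earned_runs(half_innings):
    """Credit each earned run to the catcher in the field.

    Returns (totals, flagged): totals maps catcher -> earned runs, and
    flagged lists the indices of half-innings where the catcher changed
    with a runner on base and a run scored afterward (manual review).
    """
    totals = Counter()
    flagged = []
    for i, plays in enumerate(half_innings):
        changed_with_runners = False
        needs_review = False
        prev_catcher = None
        for play in plays:
            # (a) mid-inning catcher change, (b) with a runner on base
            if (prev_catcher is not None and play.catcher != prev_catcher
                    and play.runners_on > 0):
                changed_with_runners = True
            # (c) a run scores at or after that change
            if changed_with_runners and play.earned_runs > 0:
                needs_review = True
            totals[play.catcher] += play.earned_runs
            prev_catcher = play.catcher
        if needs_review:
            flagged.append(i)
    return totals, flagged


# Example: three half-innings, catchers "A" and "B" (made-up data).
innings = [
    [Play("A", 0, 0), Play("A", 1, 1)],                   # no change
    [Play("A", 0, 0), Play("B", 2, 0), Play("B", 2, 2)],  # change, runners on, runs later
    [Play("A", 0, 0), Play("B", 0, 1)],                   # change with bases empty
]
totals, flagged = catcher_earned_runs(innings)
print(dict(totals), flagged)  # {'A': 1, 'B': 3} [1]
```

Only the flagged half-innings need a manual look; everywhere else, crediting the catcher in the field is unambiguous.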
I'm not so confident about these numbers:
- Team Home and Road Wins and Losses (game logs) - If the Retrosheet HTBF (Home Team Batted First) flag is set, I treat the home team as the road team, which is probably not what most people would want. I can calculate the conventional home/road W-L without much trouble.
- Pitcher Holds (PBP, 1987-2009) - I am confident that the code works as expected, but Holds are an unofficial statistic with multiple definitions. I am currently using the STATS definition circa 2000, but I can calculate them other ways. I could probably compute Blown Saves too.
- Catcher Caught Stealing (PBP, 1987-2009) - This is CS excluding the ones charged to the pitcher. First, I may be missing some oddball situations like a double CS. Second, the Retrosheet PBP sometimes contains highly unlikely CS scorings (43? 36?), and I am not sure how to handle them. I think the numbers are reasonably close.
- Unassisted First Baseman Putouts (PBP, 1987-2009) - Again, the Retrosheet fielding data can be wrong in places. (Actually, there was a bit of a flame war about this on the Retrosheet mailing list. I would not expect these numbers to be precise, to say the least.)
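To show how the HTBF treatment differs from the conventional home/road split, here is a small sketch. The `Game` record is hypothetical and simplified (it is not the Retrosheet game-log field layout), and the `swap_on_htbf` switch is an illustrative parameter that chooses between the two treatments.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical, simplified game record (not the Retrosheet column layout).
    visitor: str
    home: str
    visitor_runs: int
    home_runs: int
    htbf: bool = False  # True if the home team batted first


def home_road_records(games, swap_on_htbf):
    """Return {team: {"home": [W, L], "road": [W, L]}}.

    swap_on_htbf=True credits an HTBF game as if the nominal home team
    were the road team; False always uses the nominal home team.
    """
    records = defaultdict(lambda: {"home": [0, 0], "road": [0, 0]})
    for g in games:
        if g.visitor_runs == g.home_runs:
            continue  # ties are neither wins nor losses
        winner = g.home if g.home_runs > g.visitor_runs else g.visitor
        home, road = g.home, g.visitor
        if swap_on_htbf and g.htbf:
            home, road = road, home
        for team, side in ((home, "home"), (road, "road")):
            # index 0 = wins, index 1 = losses
            records[team][side][0 if team == winner else 1] += 1
    return dict(records)


games = [
    Game("BOS", "NYA", 3, 5),             # NYA wins at home
    Game("BOS", "NYA", 7, 2, htbf=True),  # BOS wins; NYA batted first
]
print(home_road_records(games, swap_on_htbf=False)["NYA"])  # {'home': [1, 1], 'road': [0, 0]}
print(home_road_records(games, swap_on_htbf=True)["NYA"])   # {'home': [1, 0], 'road': [0, 1]}
```

Whether an HTBF game should count as a home game for the nominal home team is a judgment call; the sketch just makes both choices available from the same data.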
If you would like any of this data, please let me know and I will share it. I assume that any data I provide would have to be double-checked before it could be added to the DB.