Loading ...
Sorry, an error occurred while loading the content.

2012 Database

Expand Messages
  • anson2995
    An updated version of the database is now available, in both the CSV and Access versions. Links are below. CSV:
    Message 1 of 16 , Jan 2, 2013
    • 0 Attachment
      An updated version of the database is now available, in both the CSV and Access versions. Links are below.

      CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
      Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip

      Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.

      I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.

      I'm open for suggestions on other tables that might be added to the supplementary collection.

      Regards,
      Sean Lahman
    • Tangotiger
      For the batter s hand: you should make it as bathand and pithand, especially notable for someone like JT Snow who used to switch hit, and then didn t. We
      Message 2 of 16 , Jan 2, 2013
      • 0 Attachment
        For the batter's hand: you should make it as bathand and pithand,
        especially notable for someone like JT Snow who used to switch hit, and
        then didn't. We shouldn't infer that the batting hand of switch hitters
        always switches. (Indeed, he may not even be listed as a switch hitter.)

        Presuming you want to keep the splits lines to two per split:
        -runners: bases empty v runners on

        -starting: in starting lineup v sub (applies to both pitchers and
        nonpitchers)

        Tom
      • Luc Belanger
        I thought you could be interested ... I thought you could be interested Le 2013-01-02 16:04, anson2995 a écrit :   An updated
        Message 3 of 16 , Jan 2, 2013
        • 0 Attachment

          I thought you could be interested

          Le 2013-01-02 16:04, "anson2995" <seanlahman@...> a écrit :
           

          An updated version of the database is now available, in both the CSV and Access versions. Links are below.

          CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
          Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip

          Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.

          I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.

          I'm open for suggestions on other tables that might be added to the supplementary collection.

          Regards,
          Sean Lahman

        • KJOK
          My suggestion for H/R and left/right is to split first by H/R, THEN by L/R   H and Bathand A and Bathand etc. for both batters and pitchers.  Switch hitters
          Message 4 of 16 , Jan 2, 2013
          • 0 Attachment
            My suggestion for H/R and left/right is to split first by H/R, THEN by L/R
             
            H and Bathand
            A and Bathand
            etc. for both batters and pitchers.  Switch hitters would usually have 4 lines of data instead of two.
             
            THANKS,
            Kevin
            From: anson2995 <seanlahman@...>
            To: baseball-databank@yahoogroups.com
            Sent: Wednesday, January 2, 2013 3:03 PM
            Subject: [baseball-databank] 2012 Database
             
            An updated version of the database is now available, in both the CSV and Access versions. Links are below.

            CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
            Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip

            Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.

            I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.

            I'm open for suggestions on other tables that might be added to the supplementary collection.

            Regards,
            Sean Lahman

          • anson2995
            Hans Van Slooten was kind enough to pull together an SQL version of the updated database, for those who roll that way. Big thanks to Hans. Here s a link.
            Message 5 of 16 , Jan 2, 2013
            • 0 Attachment
              Hans Van Slooten was kind enough to pull together an SQL version of the updated database, for those who roll that way. Big thanks to Hans. Here's a link.

              SQL: http://seanlahman.com/files/database/lahman2012_beta-SQL.zip



              --- In baseball-databank@yahoogroups.com, "anson2995" <seanlahman@...> wrote:
              >
              > An updated version of the database is now available, in both the CSV and Access versions. Links are below.
              >
              > CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
              > Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip
              >
              > Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.
              >
              > I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.
              >
              > I'm open for suggestions on other tables that might be added to the supplementary collection.
              >
              > Regards,
              > Sean Lahman
              >
            • Clem Comly
              I downloaded the Access version but it doesn’t contain the consistency queries. I assume someone has run them. Could they post the queries and the results
              Message 6 of 16 , Jan 2, 2013
              • 0 Attachment
                I downloaded the Access version but it doesn’t contain the consistency queries.  I assume someone has run them.  Could they post the queries and the results please?
                 
                Clem Comly
                 
              • abrennan_jaded
                A few minor corrections to report: In the Managers table: A few new managers have incorrect managerID values: defrato99 - defrato99m venturro01 - venturo01m
                Message 7 of 16 , Jan 2, 2013
                • 0 Attachment
                  A few minor corrections to report:

                  In the Managers table:
                  A few new managers have incorrect managerID values:
                  defrato99 -> defrato99m
                  venturro01 -> venturo01m

                  In the HallOfFame table:
                  A few players have incorrect hofID values:
                  harrisle01 -> harrile01h
                  martiti01h -> martiti02h

                  Also, for all 2011 candidates who were not elected, the inducted column is blank instead of N.

                  There are a few problems with the new ID given to Kevin Youkilis last year:

                  In the AwardsPlayers table (2 entries) and the SchoolsPlayers table:
                  youkike01 -> youklke01


                  Overall, this looks like a really release. Thanks to everyone for all the hard work that's put into this. Looking forward to the new data as well.

                  -Andy


                  --- In baseball-databank@yahoogroups.com, "anson2995" wrote:
                  >
                  > An updated version of the database is now available, in both the CSV and Access versions. Links are below.
                • Jeff
                  Any chance of getting pitchers doubles and triples against this year? -j
                  Message 8 of 16 , Jan 3, 2013
                  • 0 Attachment
                    Any chance of getting pitchers' doubles and triples against this year?

                    -j


                    --- In baseball-databank@yahoogroups.com, "anson2995" wrote:
                    >
                    > An updated version of the database is now available, in both the CSV and Access versions. Links are below.
                    >
                    > CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
                    > Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip
                    >
                    > Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.
                    >
                    > I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.
                    >
                    > I'm open for suggestions on other tables that might be added to the supplementary collection.
                    >
                    > Regards,
                    > Sean Lahman
                    >
                  • Chris Lambrou
                    Question for the master table: On players who debut in 2012 there appears to be a hexadecimal number in an unnamed column at the end (after bbrefID).  Is
                    Message 9 of 16 , Jan 3, 2013
                    • 0 Attachment
                      Question for the master table:
                      On players who debut in 2012 there appears to be a hexadecimal number in an unnamed column at the end (after bbrefID).  Is this data significant at all?

                      Thanks so much!
                       
                      -Chris Lambrou





                      From: anson2995 <seanlahman@...>
                      To: baseball-databank@yahoogroups.com
                      Sent: Wednesday, January 2, 2013 3:03 PM
                      Subject: [baseball-databank] 2012 Database

                       
                      An updated version of the database is now available, in both the CSV and Access versions. Links are below.

                      CSV: http://seanlahman.com/files/database/lahman2012_beta.zip
                      Access: http://seanlahman.com/files/database/lahman2012_beta-MS.zip

                      Would love to get feedback on any issues that may exist. We'll need to make an update after the new HOF class is announced next week. Hopefully we can incorporate any fixes at the same time.

                      I'm also working on a set of supplementary tables, which in concept would contain datasets that many analysts would like without cluttering the main database. My first iteration will include split stats -- home/road and left/right for batters and pitchers. Dan Hirsch did the work to create these tables from the PBP database, and they cover 1945-2012.

                      I'm open for suggestions on other tables that might be added to the supplementary collection.

                      Regards,
                      Sean Lahman



                    • anson2995
                      Great feedback. Thanks, Andy. Chris asked about the stray hexadecimal codes in the CSV version of the master table. Those are the SABR-IDs for those new
                      Message 10 of 16 , Jan 4, 2013
                      • 0 Attachment
                        Great feedback. Thanks, Andy.

                        Chris asked about the stray hexadecimal codes in the CSV version of the master table. Those are the SABR-IDs for those new players. At one point I pondered adding those for all players but I forgot about it. Will remove them for now until we get back to that task.

                        Also, Jeff asked about adding doubles and triples by pitchers. It's something we ought to do for all seasons as far back as we are able. It raises the larger issue of fields that could be added to the main statistical tables, and I suspect there is also room for this type of improvement in the post-season tables as well. This is something I'll commit to tackling, but in as much as it will entail some changes to the structure of the tables, I'd rather wait until after we get this update squared away and do it as a separate project.

                        --Sean

                        --- In baseball-databank@yahoogroups.com, "abrennan_jaded" wrote:
                        >
                        > A few minor corrections to report:
                        >
                        > In the Managers table:
                        > A few new managers have incorrect managerID values:
                        > defrato99 -> defrato99m
                        > venturro01 -> venturo01m
                        >
                        > In the HallOfFame table:
                        > A few players have incorrect hofID values:
                        > harrisle01 -> harrile01h
                        > martiti01h -> martiti02h
                        >
                        > Also, for all 2011 candidates who were not elected, the inducted column is blank instead of N.
                        >
                        > There are a few problems with the new ID given to Kevin Youkilis last year:
                        >
                        > In the AwardsPlayers table (2 entries) and the SchoolsPlayers table:
                        > youkike01 -> youklke01
                        >
                        >
                        > Overall, this looks like a really release. Thanks to everyone for all the hard work that's put into this. Looking forward to the new data as well.
                        >
                        > -Andy
                        >
                      • Charles
                        ... Make it start/relief for pitchers, and PH and either PR or DH for batters.
                        Message 11 of 16 , Jan 4, 2013
                        • 0 Attachment
                          --- In baseball-databank@yahoogroups.com, "Tangotiger" wrote:

                          > -starting: in starting lineup v sub (applies to both pitchers and
                          > nonpitchers)

                          Make it start/relief for pitchers, and PH and either PR or DH for batters.
                        • Tangotiger
                          ... I would include PH, PR, and defensive subs into one. DH is an in-betweener. If you want to count it as starting nine on the field v others , that could
                          Message 12 of 16 , Jan 4, 2013
                          • 0 Attachment
                            > --- In baseball-databank@yahoogroups.com, "Tangotiger" wrote:
                            >
                            >> -starting: in starting lineup v sub (applies to both pitchers and
                            >> nonpitchers)
                            >
                            > Make it start/relief for pitchers, and PH and either PR or DH for batters.
                            >
                            >

                            I would include PH, PR, and defensive subs into one.

                            DH is an in-betweener.

                            If you want to count it as "starting nine on the field" v "others", that
                            could be fine. If you want to count it as "starting nine on batting
                            lineup" v "others" that would make more sense, especially since we are
                            talking about batting stats. So, DH would be part of the starters.

                            Regardless: defensive sub plate appearances would be part of "others".

                            Ted's CWEVENT makes the calculation pretty easy (RESP_BAT_START_FL,
                            RESP_PIT_START_FL), as well as his fantastic CWSUB program. No point in
                            complicating things beyond the simple solution: start or not.

                            Tom


                            ---------------------------------------------
                            The Book--Playing The Percentages In Baseball
                            http://www.InsideTheBook.com
                          • Theodore Turocy
                            ... I do think there are in general a number of design improvements that could be done -- although I am mindful, in mentioning this, that there are plenty of
                            Message 13 of 16 , Jan 5, 2013
                            • 0 Attachment
                              On Jan 4, 2013, at 2:07 PM, anson2995 <seanlahman@...> wrote:

                              > Also, Jeff asked about adding doubles and triples by pitchers. It's something we ought to do for all seasons as far back as we are able. It raises the larger issue of fields that could be added to the main statistical tables, and I suspect there is also room for this type of improvement in the post-season tables as well. This is something I'll commit to tackling, but in as much as it will entail some changes to the structure of the tables, I'd rather wait until after we get this update squared away and do it as a separate project.
                              >

                              I do think there are in general a number of design improvements that could be done -- although I am mindful, in mentioning this, that there are plenty of downstream applications which depend on the existing format. We're probably stuck providing data in the current format for the indefinite future.

                              On items such as doubles and triples against: These aren't official statistics. That doesn't make them unimportant or not useful, or that we shouldn't provide them. But there's a distinction to be drawn between official stats and those that are derived from Retrosheet. For recent seasons -- most of the last two decades I expect -- there's not doing to be much, if any, difference. Before that, the number of variances between the two start to increase quickly.

                              A clean way to do this -- which can also accommodate the idea of providing splits -- is to report separately tables of data derived from Retrosheet. This can then leave it to the downstream user whether to adopt categories like doubles and triples allowed, or other such categories one might dream up. The table can then incorporate a suitable set of scoping columns as metadata, so that there can be "all games", "vs RHP", "vs LHP", "home", "away", etc. as a scope column.

                              Ted
                            • Clem Comly
                              In AwardsManagers it appears the 2012 winners have yearID of 2011. Also, should the value “Mgr of the year” in AwardID column of AwardsManagers be
                              Message 14 of 16 , Jan 8, 2013
                              • 0 Attachment
                                In  AwardsManagers it appears the 2012 winners have yearID of 2011.
                                Also, should the value “Mgr of the year” in AwardID column of AwardsManagers
                                be “BBWAA Mgr of the year”?
                                 
                                Any thought of adding TSN Mgr of the year winners  which go back to 1936? BBWAA Mgr of the year only goes back to 1983.
                                 
                                Clem Comly
                              • anson2995
                                Clem, I ll make this fix. Your suggestion about adding other awards reminds me that my late pal, Greg Spira, submitted an expanded list of awards to be added,
                                Message 15 of 16 , Jan 8, 2013
                                • 0 Attachment
                                  Clem,
                                  I'll make this fix. Your suggestion about adding other awards reminds me that my late pal, Greg Spira, submitted an expanded list of awards to be added, including some that are defunct. It included the TSN awards and many others. I believe Greg oversaw this section in both the ESPN encyclopedia and later editions of Total Baseball.

                                  Regards,
                                  Sean Lahman

                                  --- In baseball-databank@yahoogroups.com, "Clem Comly" wrote:
                                  >
                                  > In AwardsManagers it appears the 2012 winners have yearID of 2011.
                                  > Also, should the value “Mgr of the year” in AwardID column of AwardsManagers
                                  > be “BBWAA Mgr of the year”?
                                  > Any thought of adding TSN Mgr of the year winners which go back to 1936? BBWAA Mgr of the year only goes back to 1983.
                                  > Clem Comly
                                  >
                                • aidanshealy
                                  I run a database where we do a lot of information with splits. I figured I would share the basics of the database design here, and you can see how we handle
                                  Message 16 of 16 , Jan 9, 2013
                                  • 0 Attachment
                                    I run a database where we do a lot of information with splits. I figured I would share the basics of the database design here, and you can see how we handle things. Not everything is perfect here, but it really does allow for you to see just about any information set from a game, and gives you a fairly easy way to query all of that data in whatever way you want to look at it.

                                    Just throwing out some ideas...



                                    CREATE TABLE `splits` (
                                    `season` smallint(2) NOT NULL DEFAULT '18',
                                    `game_month` smallint(2) NOT NULL DEFAULT '0',
                                    `game_day` smallint(2) NOT NULL DEFAULT '0',
                                    `game_year` smallint(4) NOT NULL DEFAULT '0',
                                    `boxscore_id` bigint(13) NOT NULL DEFAULT '0',
                                    `org` int(1) NOT NULL DEFAULT '0',
                                    `type` int(5) NOT NULL,
                                    `ballpark_name` varchar(50) NOT NULL DEFAULT '',
                                    `batter_id` smallint(5) NOT NULL DEFAULT '0',
                                    `batter_team_id` smallint(2) NOT NULL DEFAULT '0',
                                    `batter_position` char(2) DEFAULT '',
                                    `batter_bat` char(1) DEFAULT '',
                                    `home_away` char(1) NOT NULL,
                                    `pitcher_id` smallint(5) NOT NULL DEFAULT '0',
                                    `pitcher_team_id` smallint(2) NOT NULL DEFAULT '0',
                                    `boxscore_ordinal` int(3) NOT NULL DEFAULT '0',
                                    `inning` int(2) NOT NULL DEFAULT '0',
                                    `outs` mediumint(1) NOT NULL DEFAULT '0',
                                    `hitting_batting_order` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_pa` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_ab` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_s` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_db` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_tr` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_hr` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_bb` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_iw` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_k` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_hbp` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_sh` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_sf` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_rbi` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_ro1b` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_ro2b` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_ro3b` tinyint(1) NOT NULL DEFAULT '0',
                                    `hitting_strikes` int(1) NOT NULL DEFAULT '0',
                                    `hitting_balls` int(1) NOT NULL DEFAULT '0',
                                    `hitting_pitch_count` int(2) NOT NULL DEFAULT '0',
                                    `hitting_description` varchar(200) DEFAULT NULL,
                                    `original_notation` char(30) DEFAULT NULL,
                                    `baserunning_run_scored` tinyint(1) NOT NULL DEFAULT '0',
                                    `baserunning_sba` int(1) NOT NULL DEFAULT '0',
                                    `baserunning_sb` int(1) NOT NULL DEFAULT '0',
                                    `baserunning_cs` int(1) NOT NULL DEFAULT '0',
                                    `baserunning_pickoff` int(1) NOT NULL DEFAULT '0',
                                    `baserunning_base` char(1) NOT NULL DEFAULT '',
                                    `pitching_outs` smallint(2) DEFAULT NULL,
                                    `pitching_inn` double(3,1) NOT NULL DEFAULT '0.0',
                                    `pitching_h` int(2) NOT NULL DEFAULT '0',
                                    `pitching_r` int(2) NOT NULL DEFAULT '0',
                                    `pitching_er` int(2) NOT NULL DEFAULT '0',
                                    `pitching_bb` int(2) NOT NULL DEFAULT '0',
                                    `pitching_k` int(2) NOT NULL DEFAULT '0',
                                    `pitching_pch` int(3) NOT NULL DEFAULT '0',
                                    `pitching_str` int(3) NOT NULL DEFAULT '0',
                                    `pitching_era` double(6,2) NOT NULL DEFAULT '0.00',
                                    `pitching_started` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_finished` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_win` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_loss` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_hold` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_save` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_blown_save` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_cg` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_shutout` tinyint(1) NOT NULL DEFAULT '0',
                                    `pitching_order` tinyint(2) DEFAULT NULL,
                                    `fielder_pitcher` varchar(40) DEFAULT NULL,
                                    `fielder_catcher` varchar(40) DEFAULT NULL,
                                    `fielder_first` varchar(40) DEFAULT NULL,
                                    `fielder_second` varchar(40) DEFAULT NULL,
                                    `fielder_third` varchar(40) DEFAULT NULL,
                                    `fielder_short` varchar(40) DEFAULT NULL,
                                    `fielder_left` varchar(40) DEFAULT NULL,
                                    `fielder_center` varchar(40) DEFAULT NULL,
                                    `fielder_right` varchar(40) DEFAULT NULL,
                                    )
                                  Your message has been successfully submitted and would be delivered to recipients shortly.