Re: Short-Time Listener, First-Time Poster

  • truane@vnet.ibm.com
    Message 1 of 15, Apr 2, 2002
      Thanks, Sean, for your quick response.

      > To answer Tom Ruane's first question, many of the fixes are stylistic
      > changes, such as changing player names from all upper case to a mix of
      > upper and lower. Some of what I've seen posted are not errors at all,
      > such as the inclusion of players who don't have
      > batting/pitching/fielding records (folks like umpire Al Barlick and
      > Negro Leaguer Cool Papa Bell). There are some valid issues to be
      > resolved around the consistency of the team/franchise ids and the use of
      > non-alpha characters in the player ids.

      That's what I was hoping. I'm not concerned about most of these. The
      only thing that would have to be fixed is if the same player was
      assigned different IDs in different years (or two different players
      got the same ID). The characters in these IDs don't matter to me,
      and I can probably figure out the team IDs (or at least I could the
      last time I looked at the data).

      > Rather than being distressed at the things that are uncovered, I think
      > it's a refreshing example of the power of peer review. That's the whole
      > point for opening this up and having this discussion. The whole
      > principle of the open source model is that many people can provide
      > feedback and improvements that can be incorporated into the central
      > database. The combined efforts of dozens can be more vigilant, more
      > analytical, and more creative than one person working alone. It's been
      > only five days since the database has been released, and a lot of
      > excellent peer review has taken place. I believe that the end result
      > will be a stronger and more robust database than has ever existed
      > before.

      I agree wholeheartedly. I was amazed at how quickly problems were
      unearthed on Retrosheet's web site once it was opened to the public.
      One problem in particular, affecting only two box scores out of about
      40,000, was discovered in the first few days of release. The one
      thing that worried me about this discussion was the need to apply
      thousands of fixes that were specified only in SQL commands. It
      sounds as though this shouldn't be a concern.

      Thanks again.
      Tom Ruane
    • tom tom
      Message 2 of 15, Apr 3, 2002
        I second Sean L's comment that the "errors" are not
        distressing.

        The thousands of Fielding records with wrong team IDs
        look to me as though Sean L might have used Sean F's
        or Retro's team IDs. I'm sure this can be easily
        corrected with an Xref table.
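
A minimal sketch of the Xref idea, using Python's built-in sqlite3. The `team_xref` table, its column names, and the sample team codes are all invented for illustration; the real tables may be laid out differently.

```python
import sqlite3

def apply_team_xref(conn):
    """Rewrite fielding.teamID through team_xref; unmapped IDs are left alone."""
    cur = conn.execute("""
        UPDATE fielding
        SET teamID = (SELECT x.lahman_team
                      FROM team_xref AS x
                      WHERE x.other_team = fielding.teamID)
        WHERE teamID IN (SELECT other_team FROM team_xref)
    """)
    conn.commit()
    return cur.rowcount

# Hypothetical sample data: 'XX1' stands for a foreign team code that
# should become 'AAA'; 'ZZZ' has no mapping and must not be touched.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team_xref (other_team TEXT PRIMARY KEY, lahman_team TEXT);
    CREATE TABLE fielding  (LahmanID TEXT, yearID INTEGER, teamID TEXT);
    INSERT INTO team_xref VALUES ('XX1', 'AAA');
    INSERT INTO fielding  VALUES ('p1', 1990, 'XX1'),
                                 ('p2', 1990, 'AAA'),
                                 ('p3', 1990, 'ZZZ');
""")
changed = apply_team_xref(conn)
```

Restricting the update with the `WHERE ... IN` clause keeps rows without a mapping from being set to NULL, so they can be reviewed by hand.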

        The errors I have found relate to "referential
        integrity checks". Basically, a record is found in a
        "child" table like batting or pitching but does not
        exist in the "parent" (i.e., the master table). In
        almost all cases, it's because of a mixup between
        the single quote and the period in the Lahman ID.

        There are probably 10 or 20 records in the whole
        database that need to be looked at more closely
        (especially the duplicate keys) to try to correct
        them.
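
The two checks described above (child rows with no parent, and duplicate keys) could be sketched roughly like this with Python's sqlite3. The table layout is an assumption and the sample IDs are made up; only the `master`/`batting` names and the `LahmanID` column come from the discussion.

```python
import sqlite3

def find_orphans(conn):
    """Batting rows whose LahmanID has no matching row in master."""
    return conn.execute("""
        SELECT b.LahmanID, b.yearID
        FROM batting AS b
        WHERE NOT EXISTS (SELECT 1 FROM master AS m
                          WHERE m.LahmanID = b.LahmanID)
        ORDER BY b.LahmanID, b.yearID
    """).fetchall()

def find_duplicate_keys(conn):
    """LahmanIDs that appear more than once in master."""
    return conn.execute("""
        SELECT LahmanID, COUNT(*)
        FROM master
        GROUP BY LahmanID
        HAVING COUNT(*) > 1
        ORDER BY LahmanID
    """).fetchall()

# Invented sample rows: the same (fake) player is keyed with a single
# quote in master but with a period in batting, and one ID is duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master  (LahmanID TEXT);
    CREATE TABLE batting (LahmanID TEXT, yearID INTEGER);
    INSERT INTO master  VALUES ('o''fake01'), ('dupli01'), ('dupli01');
    INSERT INTO batting VALUES ('o.fake01', 1887), ('dupli01', 1927);
""")
orphans = find_orphans(conn)
duplicates = find_duplicate_keys(conn)
```

The same orphan query can be pointed at pitching or fielding by swapping the child table name.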

        I have NOT checked whether the individual pieces of
        data are themselves accurate. Last year, I tried to do
        this by simply taking "sums" of the batting table and
        comparing them to the team table. But, as one of the
        Seans pointed out to me, the team table is what is
        reported by the leagues, and it doesn't necessarily
        add up! That's crazy from my perspective. So I ended
        up creating my own "teams" table that is a sum of the
        hitting/pitching/fielding tables.
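
For anyone wanting to repeat the "sums" comparison, here is a rough sketch with Python's sqlite3. The column names (`yearID`, `teamID`, `H`) and the sample numbers are assumptions for illustration, not the actual schema or data.

```python
import sqlite3

def team_total_mismatches(conn):
    """Team-seasons where the reported hits differ from the summed batting rows."""
    return conn.execute("""
        SELECT t.yearID, t.teamID, t.H AS reported, SUM(b.H) AS summed
        FROM teams AS t
        JOIN batting AS b
          ON b.yearID = t.yearID AND b.teamID = t.teamID
        GROUP BY t.yearID, t.teamID, t.H
        HAVING SUM(b.H) <> t.H
        ORDER BY t.yearID, t.teamID
    """).fetchall()

# Made-up data: team AAA adds up, team BBB is off by one hit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams   (yearID INTEGER, teamID TEXT, H INTEGER);
    CREATE TABLE batting (LahmanID TEXT, yearID INTEGER, teamID TEXT, H INTEGER);
    INSERT INTO teams   VALUES (1901, 'AAA', 30), (1901, 'BBB', 25);
    INSERT INTO batting VALUES ('p1', 1901, 'AAA', 10), ('p2', 1901, 'AAA', 20),
                               ('p3', 1901, 'BBB', 12), ('p4', 1901, 'BBB', 12);
""")
mismatches = team_total_mismatches(conn)
```

A mismatch here is only a flag for review, since the league-reported team totals are not guaranteed to equal the player sums.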

        Thanks, Tom (aka Tangotiger)

      • tom tom
        Message 3 of 15, Apr 3, 2002
          --- truane@... wrote:
          > That's what I was hoping. I'm not concerned about
          > most of these. The
          > only thing that would have to be fixed is if the
          > same player was
          > assigned different IDs in different years (or two
          > different players
          > got the same ID). The characters in these IDs don't
          > matter to me,
          > and I can probably figure out the team IDs (or at
          > least I could the
          > last time I looked at the data).
          >

          So far, the only "crossing-over" of data seems to be
          the two "Ed Diaz" of the 90s. I'm not sure who was
          who, so I collapsed them into a single player and
          will leave it to Sean L to sort out.

          Once the team ID in the fielding table has been
          cleared, I'll be able to look at more of these
          "crossing over" players.

          Thanks, Tom

        • micke.hovmoller@stockholmsborsen.se
          Message 4 of 15, Apr 3, 2002
            On 2002-04-03 17:21:11 tom tom wrote:

            >I have NOT looked if the individual pieces of data are
            >themselves accurate. Last year, I tried to do this by
            >simply doing "sums" of the batting table and compare
            >to the team table. But, as one of the Seans pointed
            >out to me, the team table is what is reported by the
            >leagues, and it doesn't necessarily add up!
            >That's crazy from my perspective. So, I end up
            >creating my own "teams" table that is a sum of the
            >hitting/pitching/fielding tables.

            Actually, for some stats it's correct that they don't add up! For batting stats I
            think they should add up, but for pitching that is not the case. One example is
            unearned runs (and thus also ERA etc.). If a pitcher gets two outs, an error is
            then made on what would have been the third out, and the pitcher is replaced, any
            batters the new pitcher faces who go on to score are earned for that pitcher but
            unearned for the team!

            However, that's not very common, obviously. I used the ML Fact Book to go
            through all the data from the 1997-1999 seasons and found a few errors. They are
            posted on the baseball1 message boards.

            As for the quality of data: I'd assume that there are still several hundred up
            to a few thousand errors in the data. That may seem like a whole lot, but really
            isn't. This database is *HUGE* and some of the data is more than 100 years old,
            so the records are unclear and difficult to validate today. Also, the database
            contains millions of data points, so if one tenth of one percent of these are
            wrong, that is way, way better than most any database I have worked with
            professionally. Also, these errors are typically small and random, so it's
            highly unlikely that any studies made on the material will reach incorrect
            conclusions.

            /Micke
          • tmasc
            Message 5 of 15, Apr 3, 2002
              --- In baseball-databank@y..., truane@v... wrote:
              > I was planning on using the data on baseball1.com as part of a
              > project I'm working on with Retrosheet...

              I noticed that Retro had more pitching categories
              http://www.retrosheet.org/boxesetc/Pgrovl101.htm
              namely, Runs (as opposed to just ER), as well as BFP, WP, HBP, BK.

              I know that Tom R said (in the Retro group) that he only generates
              internal data structures of the data, and the "final form" exists
              only as web pages. And David S of Retro said that Retro is a
              provider of bare data (library), and not compiled data (analysts).

              Tom R: are the programs that created those web pages, as well as the
              source data, property of Retrosheet, and would it be an "issue" to do
              something with them? Or can you (easily) modify your programs to
              generate pitching and hitting tables? I'd love to see some of the
              data from the Retro site incorporated.

              I would think that between the data from Tom R, Sean L, and Sean F,
              we should be able to resolve any discrepancies.

              Thanks, Tom
            • Derek Adair
              Message 6 of 15, Apr 3, 2002
                I'm positive Retrosheet has better data than I do, but I can provide some
                of this data. In my custom version of the DB, in pitching, I have R, BFP,
                GF, BK, IBB, WP. I'd be more than happy to provide that data to the
                "cause" - however, unfortunately, in older years where data was not
                tracked, I have 0 instead of null. My other issue is that I have 34224
                records in my pitching table, while the current Lahman DB has 36468, so
                I'll need to track down those issues.

                In my batting table, I have GIDP. Same issues here as above, except I have
                more batting records (81675 to 73272 - my guess is that I've got a lot of
                empty batting records for AL pitchers in my db).

                In my fielding table, I have PB. My fielding table matches record-wise, so
                that data should import very cleanly.

                I'll be tracking down the discrepancies in the number of records, but
                Sean/Sean/anyone else, let me know if providing this data would help.
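
Both cleanup steps mentioned above (finding which records exist in only one copy of a table, and turning placeholder zeros into nulls for seasons where a stat was not tracked) can be sketched in plain Python. The key shape `(LahmanID, yearID, teamID)` and the cutoff year used in the example are assumptions for illustration, not researched values.

```python
def key_differences(mine, theirs):
    """Return (keys only in mine, keys only in theirs), each sorted."""
    mine, theirs = set(mine), set(theirs)
    return sorted(mine - theirs), sorted(theirs - mine)

def zero_to_null(rows, stat, first_tracked_year):
    """Copy rows, replacing a 0 in `stat` with None for seasons before
    `first_tracked_year` (the caller must supply the real cutoff)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if row["yearID"] < first_tracked_year and row[stat] == 0:
            row[stat] = None
        cleaned.append(row)
    return cleaned

# Hypothetical keys from two copies of a pitching table.
my_keys = [("a01", 1990, "AAA"), ("b01", 1990, "AAA")]
lahman_keys = [("b01", 1990, "AAA"), ("c01", 1990, "BBB")]
only_mine, only_lahman = key_differences(my_keys, lahman_keys)

# 1950 is a placeholder cutoff, not the actual first year IBB was recorded.
rows = [{"yearID": 1900, "IBB": 0}, {"yearID": 1990, "IBB": 0}]
cleaned = zero_to_null(rows, "IBB", first_tracked_year=1950)
```

A genuine zero in a tracked season is left alone, so only the placeholder zeros become nulls.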

                Regards,
                Derek


                Derek Adair
                dadair@...

                And turning on his heel
                He left a trace of bubbles
                Bleeding in his stead
              • Derek Adair
                Message 7 of 15, Apr 3, 2002
                  A few 2000/2001 issues:

                  Hilton Smith is not in the HOF table. Here's my entry:
                  smithhi99,2001,VC,,,,"P",Player

                  The managers table does not have 2001 entries.
                  The postbatting and postpitching tables are two years behind (2000, 2001).
                  The awards table has some problem entries in 2001 - ichiro, pujols,
                  counsell, and pettitte are all used as ID's. Gold gloves are omitted for
                  2001 as well.

                  Regards,
                  Derek
                • Derek Adair
                  Message 8 of 15, Apr 3, 2002
                    My awards table entries for 2001:

                    pettian01,ALCS MVP,,2001,AL,
                    ripkeca01,AS MVP,,2001,ML,
                    clemero02,Cy Young,,2001,AL,
                    johnsra05,Cy Young,,2001,NL,
                    mientdo01,Gold Glove,,2001,AL,1B
                    alomaro01,Gold Glove,,2001,AL,2B
                    chaveer01,Gold Glove,,2001,AL,3B
                    rodriiv01,Gold Glove,,2001,AL,C
                    camermi01,Gold Glove,,2001,AL,OF
                    hunteto01,Gold Glove,,2001,AL,OF
                    suzukic01,Gold Glove,,2001,AL,OF
                    mussimi01,Gold Glove,,2001,AL,P
                    vizquom01,Gold Glove,,2001,AL,SS
                    heltoto01,Gold Glove,,2001,NL,1B
                    vinafe01,Gold Glove,,2001,NL,2B
                    rolensc01,Gold Glove,,2001,NL,3B
                    ausmubr01,Gold Glove,,2001,NL,C
                    edmonji01,Gold Glove,,2001,NL,OF
                    jonesan01,Gold Glove,,2001,NL,OF
                    walkela01,Gold Glove,,2001,NL,OF
                    maddugr01,Gold Glove,,2001,NL,P
                    cabreor01,Gold Glove,,2001,NL,SS
                    pinielo01,Mgr of the year,,2001,AL,
                    brenlbo01,Mgr of the year,,2001,NL,
                    bondsba01,MVP,,2001,NL,
                    suzukic01,MVP,,2001,AL,
                    counscr01,NLCS MVP,,2001,NL,
                    riverma01,Rolaids Relief,,2001,AL,
                    benitar01,Rolaids Relief,,2001,NL,
                    suzukic01,Rookie of the Year,,2001,AL,
                    pujolal01,Rookie of the Year,,2001,NL,
                    johnsra05,WS MVP,Y,2001,ML,
                    schilcu01,WS MVP,Y,2001,ML,

                    I'd suggest spelling out Manager in "Mgr of the year" as well as
                    capitalizing Year (which would also be consistent with "Rookie of the Year").

                    Regards,
                    Derek

                  • Derek Adair
                    Message 9 of 15, Apr 3, 2002
                      UPDATE fielding SET LahmanID = 'smith01'
                      WHERE LahmanID = 'smithed05' AND G = 1;

                      This should affect two records.

                      As noted elsewhere, smithed05 should be added to master.
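
Since the fix above is expected to touch exactly two records, one cautious way to apply it is to check the affected row count before committing. This is only a sketch using Python's sqlite3; the `fielding` column layout and sample rows are assumed, and the `reassign_id` helper is hypothetical.

```python
import sqlite3

def reassign_id(conn, old_id, new_id, games, expected_rows):
    """Re-key fielding rows, rolling back unless exactly expected_rows change."""
    cur = conn.execute(
        "UPDATE fielding SET LahmanID = ? WHERE LahmanID = ? AND G = ?",
        (new_id, old_id, games),
    )
    if cur.rowcount != expected_rows:
        conn.rollback()
        raise ValueError(
            f"expected {expected_rows} rows, update touched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount

# Invented sample rows: two one-game records to re-key, one to leave alone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fielding (LahmanID TEXT, yearID INTEGER, POS TEXT, G INTEGER);
    INSERT INTO fielding VALUES ('smithed05', 1900, 'P',  1),
                                ('smithed05', 1900, 'OF', 1),
                                ('smithed05', 1901, 'P',  20);
""")
changed = reassign_id(conn, "smithed05", "smith01", games=1, expected_rows=2)
```

If the count differs from what the poster predicted, nothing is written and the mismatch can be investigated first.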

                      Regards,
                      Derek
                    • Derek Adair
                      Message 10 of 15, Apr 3, 2002
                        Well, I found where the Gold Gloves went - all 2001 Gold Gloves are coded
                        as 2000.

                        Regards,
                        Derek

                      • Derek Adair
                        Message 11 of 15, Apr 3, 2002
                          Tip O'Neill is no longer in awards as a Triple Crown winner in 1887. I
                          think he should be added back, based on his standings in the leaderboards
                          (as per baseball-reference.com). This could possibly be correct, if it's
                          one of those cases where walks vs. hits were reworked. Not sure if this is
                          even in that time period, though.

                          Managers not in managers:
                          Gene Lamont 1995 CHW
                          Cookie Rojas 1996 FLA

                          Regards,
                          Derek
                        • Sean Forman
                          Message 12 of 15, Apr 3, 2002
                            Derek Adair wrote:
                            >
                            > A few 2000/2001 issues:
                            >
                            > Hilton Smith is not in the HOF table. Here's my entry:
                            > smithhi99,2001,VC,,,,"P",Player
                            >
                            > The managers table does not have 2001 entries.
                            > The postbatting and postpitching tables are two years behind (2000, 2001).
                            > The awards table has some problem entries in 2001 - ichiro, pujols,
                            > counsell, and pettitte are all used as ID's. Gold gloves are omitted for
                            > 2001 as well.
                            >
                            > Regards,
                            > Derek

                            I've got all of these.

                            Sincerely,
                            Sean Forman

                            Baseball Stats! http://www.Baseball-Reference.com/
                            Baseball Analysis! http://www.BaseballPrimer.com/
                          • Paul Wendt
                            Message 13 of 15, Apr 4, 2002
                              On Wed, 3 Apr 2002, Derek Adair wrote:

                              > Tip O'Neill is no longer in awards as a Triple Crown winner in 1887. I
                              > think he should be added back, based on his standings in the leaderboards
                              > (as per baseball-reference.com). This could possibly be correct, if it's
                              > one of those cases where walks vs. hits were reworked. Not sure if this is
                              > even in that time period, though.

                              In 1887, a base on balls was scored as a hit (and an at-bat). It is the one
                              season at issue in the 2001-2002 brouhaha, because official MLB records reverted to
                              contemporary scoring.

                              Fred I-C, John Thorn, and David Nemec all provided some explanation
                              (sharply critical from Nemec) on 19cBB, the egroup I administer here.

                              Why doesn't Tip O'Neill win the Triple Crown by both methods of scoring?
                              I don't know. He leads in batting at .435 or .492 (from memory).

                              ----Paul

                              Paul Wendt, Watertown MA, USA <pgw@...>
                              Chair, 19th Century committee, SABR
                              Owner-Administrator, 19cBB (egroup at Yahoo)