Loading ...
Sorry, an error occurred while loading the content.

Re: [baseball-databank] Re: Newbie Question on IDs

Expand Messages
  • Tangotiger
    Sean, You aren t missing anything. The current proposal in the BDB Design group is to have a single numeric, never-changing, field. This would be a PERSONid,
    Message 1 of 28 , Nov 17, 2004
    • 0 Attachment
      Sean,

      You aren't missing anything. The current proposal in
      the BDB Design group is to have a single numeric,
      never-changing, field. This would be a PERSONid, and
      we'd have a PERSONS table.

      We'd also have a PERSONMAP table that would have all
      the various IDs (including backward compatibility to
      Lahman4, 4.5, 5, 5.1) to map all the various sources
      of data to the PERSONid.

      There are dozens of changes required to the current
      version of the BDB so that we can finally have
      something stable enough that we won't have to revisit
      these issues.

      Michael Westbay had also other fascinating ideas
      regarding LOCATION and NAMES, which will become very
      important once we incorporate Japanese data.

      KJOK also has his PARKS db, which he and I normalized
      2 years ago, and has not yet been incorporated into
      the BDB.

      I suggest that any change you make to the DB be done
      all in one shot, so that you don't have several
      releases over several months.

      Tom




      --- Sean Lahman <slahman@...> wrote:

      > Tom Ruane raised a question about playerIDs last
      > week, the problem that the
      > master table has three different sets of IDs to
      > identify players in various
      > tables. While I initially explained the rationale
      > for the current design, I
      > have to confess that I wasn't involved in the
      > decision to make that design
      > change, and the more I reflect on it, the less sense
      > it makes to me. I'm
      > beginning to see why this current model is unwieldy.
      > Perhaps someone from
      > the design committee can help me to understand why
      > it makes any sense.
      >
      > I understand the idea of mapping tables to integrate
      > data sets from other
      > sources. The current version of the master table
      > has five fields for that
      > purpose right now. That's necessary and should
      > continue. However, what I
      > guess I don't understand is the benefit of having
      > the key value for the
      > master table split into a playerID, managerID, and
      > hofID. As Ruane
      > suggested, this causes problems in some of the other
      > tables. In the HOF
      > table for example, we record every player who
      > received a vote for induction,
      > and these records all have an ID tagged with the "h"
      > suffix. What that
      > means, then, is that a player who received votes but
      > wasn't elected "Lenny
      > Dykstra" has both a playerID and a hofID. I don't
      > know what this
      > accomplishes except creating confusion. It requires
      > new IDs to be created
      > every year for existing players (Juan Samuel, for
      > example, who received HOF
      > votes this year.) As both Ruane and F.X. Flinn
      > argued, this approach
      > violates the basic tenets of a relational database.
      > Why not tag every
      > player with one ID when they make their debut and
      > use that to identify them
      > in every table?
      >
      > Whether the key identifier for someone in the master
      > table is an integer or
      > a string, it's gotta be one value that never
      > changes. I propose that the
      > master table be simplified to remove the managerID
      > and hofID fields and
      > consolidate them into one playerID field. If anyone
      > disagrees, please tell
      > me what I'm missing.
      >
      > Regards,
      > Sean Lahman
      >
      >
      >




      __________________________________
      Do you Yahoo!?
      Meet the all-new My Yahoo! - Try it today!
      http://my.yahoo.com
    • Sean Forman
      ... I think it can be summarized as not everyone in the HOF was a player and many of these people (Yawkey, Barrow, etc.) were never players. And it is
      Message 2 of 28 , Nov 17, 2004
      • 0 Attachment
        > Whether the key identifier for someone in the master table is an integer or
        > a string, it's gotta be one value that never changes. I propose that the
        > master table be simplified to remove the managerID and hofID fields and
        > consolidate them into one playerID field. If anyone disagrees, please tell
        > me what I'm missing.
        >
        > Regards,
        > Sean Lahman


        I think it can be summarized as not everyone in the HOF was a player and
        many of these people (Yawkey, Barrow, etc.) were never players. And it
        is uncertain how the HOF will run things in the future. The HOFid was
        done to be consistent with ideas for adding additional data such as
        manager data, minor league data, japanese data, korean leagues data,
        cuban league data, independent league data, umpire data, coaches data,
        executives data, broadcaster data, negro league data, winter league
        data, spring training data, aagpbl data, college data, hs data and any
        other possible data sets. HOFers do not fit neatly within any other group.

        Also, why does the hofID violate relational db? If the writers ever
        vote for non-players the system as designed will be fine. The playerID
        is for players who appeared in an mlb game (people in the fielding,
        batting, pitching tables). the HOFID is for a person who received an
        HOF vote, there is no guarantee that one set will always be included in
        the other. The managerID is for the people who have managed a game.
        I'd be mildly surprised if a non-player has never appeared on a writer's
        ballot (even if only as an uncounted protest vote).

        We have an imperfect system for imperfect data. And any system we
        choose will have its flaws. The point is not to have a perfectly
        correct system, but to distribute data to the widest possible audience.
        I have asked for feedback on the 2004 data and have received very
        little feedback (thank you emancip8ed for your comments (any nulls are
        data I'm hoping to have filled in)).

        One other thing. I don't really understand the fanaticism for strictly
        numerical keys.

        I know tangotiger (Tom M??) wants to get the parks data into the db
        asap. What other data would you like to see done in the next two
        months? I'll prioritize, and if I don't get anything done in the next
        two months, I'll set up a web account on my server, step aside, and let
        someone else take the reins. I'm under the impression that that
        postseason data is being produced. Is this correct?

        Sorry for the rant.

        Sincerely,
        Sean Forman

        Baseball Stats! http://www.Baseball-Reference.com/
      • Sean Lahman
        ... Yes. I ve also done updates for the four awards tables, HOF, AllStars, and the SeriesPost table. I fixed the problems with place of birth info for people
        Message 3 of 28 , Nov 17, 2004
        • 0 Attachment
          Sean Forman wrote:
          > I'm under the impression that that
          > postseason data is being produced. Is this correct?

          Yes. I've also done updates for the four awards tables, HOF, AllStars, and
          the SeriesPost table. I fixed the problems with place of birth info for
          people who debuted in 2004. I'll be done and post copies to the site
          sometime Thursday. I believe that will complete all of the updates needed
          for the end of 2004 season.

          I think with that, it's time to prepare the release of the full 2004
          database. Other projects, such as additional data tables and any proposed
          re-designs can be tackled over the winter.

          Regards,
          Sean Lahman
        • Tangotiger
          ... It doesn t violate relational DB, but it probably violates 3NF. You shouldn t have raineti01, and raineti01h in the same record. Every field must stand on
          Message 4 of 28 , Nov 17, 2004
          • 0 Attachment
            --- Sean Forman <sean-forman@...>
            wrote:
            > Also, why does the hofID violate relational db? If

            It doesn't violate relational DB, but it probably
            violates 3NF. You shouldn't have
            raineti01, and raineti01h
            in the same record.
            Every field must stand on its own, and not be a
            concatenation of some other set of fields in the same
            record. Certainly not the key field. You also want a
            numeric key for any table that has a child table,
            which just makes it convenient to have a numeric key
            in all tables. I wouldn't have BOS as a park key
            either. My rule of thumb is to avoid any possibility
            of a key change. This is all from an admin DB
            viewpoint, which brings me to...


            > We have an imperfect system for imperfect data.
            > And any system we
            > choose will have its flaws. The point is not to
            > have a perfectly
            > correct system, but to distribute data to the widest
            > possible audience.

            ...The current proposal:
            http://www.baseball-databank.org/purpose.txt
            specifically says:

            "This databank, once it is fully normalized and
            proofed, will be the standard source for those
            professionals creating new data products... But first
            and foremost, the Baseball Databank is a library of
            authoritative baseball statistics and information
            maintained in a simple-to-access format for
            information providers and baseball researchers."

            Fully normalized is presumably 3NF. Reading the
            above, and my impression is that the DB is for
            developers (information providers and researchers) and
            not the widest audience possible.

            It is the information providers (presumably Sean,
            Sean, Westbay, and others) that will make a form of
            this database available to the widest audience
            possible, with likely redundant tables, alphanumeric
            keys, and other things that improve usability, but
            degrade normalization.


            > I have asked for feedback on the 2004 data and
            > have received very
            > little feedback (thank you emancip8ed for your
            > comments (any nulls are
            > data I'm hoping to have filled in)).

            I loaded the data, and have only come across the NULL
            situation. I have not tested the data, but so far so
            good.

            >
            > One other thing. I don't really understand the
            > fanaticism for strictly
            > numerical keys.

            I hope you do now!!

            >
            > I know tangotiger (Tom M??) wants to get the parks
            > data into the db
            > asap. What other data would you like to see done in
            > the next two
            > months? I'll prioritize, and if I don't get

            Probably Ruane's upload, though I haven't verified if
            that was done yet, since I only downloaded the 04
            data.


            > anything done in the next
            > two months, I'll set up a web account on my server,
            > step aside, and let
            > someone else take the reins. I'm under the
            > impression that that
            > postseason data is being produced. Is this correct?

            I supplied KJOK with the fielding data, and presumably
            he will attach IDs to the names.

            >
            > Sorry for the rant.
            >
            > Sincerely,
            > Sean Forman

            Rants are encouraged.

            Tom
            --------------------------------
            http://www.tangotiger.net








            __________________________________
            Do you Yahoo!?
            The all-new My Yahoo! - Get yours free!
            http://my.yahoo.com
          • Derek Adair
            Tom/Sean/anyone else who s interested, I wasn t sure what I was missing either, so I looked through the archives, since we obviously made the changes for a
            Message 5 of 28 , Nov 17, 2004
            • 0 Attachment
              Tom/Sean/anyone else who's interested,

              I wasn't sure what I was missing either, so I looked through the archives,
              since we obviously made the changes for a reason.

              There are at least two distinct issues here. I refer to:

              http://sports.groups.yahoo.com/group/baseball-databank/message/568 (and
              the following few messages in the thread)

              First is the issue of numeric vs. string ID's. I honestly haven't seen a
              compelling reason to switch. The two reasons I've seen are that numeric
              ID's won't need to change, and that that's the way DB's are generally
              designed. As previously addressed, I think we have enough differences from
              a typical business DB to need to handle things a bit differently, and a
              retrosheet approach where string ID's are fixed could work for us as well.
              On the flip side, I see visual inspection of the data will be much more
              difficult with numeric keys. I'm completely open to being convinced here,
              so if anyone has concrete benefits I'm all ears.

              Second (and most important) is the issue of what ID's we'll use in the
              "core" BDB tables. Sean L.'s proposal was to collapse the ID's in Master.
              If we have only a single primary key across the code DB, then we end up
              with the problem that post 568 describes.

              I think we're getting enough secondary source input that having the person
              table makes sense, but I'd also propose that we handle our own tables the
              same way - each logical grouping has its own primary key set which is
              linked to through the person table. So instead of collapsing master, I'd
              shift everything over and keep it as is (except for possible changes under
              the first issue).

              Regards,
              Derek
            • Tangotiger
              Can I suggest that design issues be brought up with in the BDB Design group? Michael has done some tremendous work, and we ve gone through many of these
              Message 6 of 28 , Nov 18, 2004
              • 0 Attachment
                Can I suggest that design issues be brought up with in
                the BDB Design group? Michael has done some
                tremendous work, and we've gone through many of these
                issues. The group itself has been dormant for a
                while, but it might be time for it to erupt again.

                After we've done a good design, we can propose it
                here, and get feedback as to how to proceed.

                This will also stay in line with Lahman's request that
                we hold off on design issues for the next few months.

                Tom





                --- Derek Adair <dadair@...> wrote:

                > Tom/Sean/anyone else who's interested,
                >
                > I wasn't sure what I was missing either, so I looked
                > through the archives,
                > since we obviously made the changes for a reason.
                >
                > There are at least two distinct issues here. I refer
                > to:
                >
                >
                http://sports.groups.yahoo.com/group/baseball-databank/message/568
                > (and
                > the following few messages in the thread)
                >
                > First is the issue of numeric vs. string ID's. I
                > honestly haven't seen a
                > compelling reason to switch. The two reasons I've
                > seen are that numeric
                > ID's won't need to change, and that that's the way
                > DB's are generally
                > designed. As previously addressed, I think we have
                > enough differences from
                > a typical business DB to need to handle things a bit
                > differently, and a
                > retrosheet approach where string ID's are fixed
                > could work for us as well.
                > On the flip side, I see visual inspection of the
                > data will be much more
                > difficult with numeric keys. I'm completely open to
                > being convinced here,
                > so if anyone has concrete benefits I'm all ears.
                >
                > Second (and most important) is the issue of what
                > ID's we'll use in the
                > "core" BDB tables. Sean L.'s proposal was to
                > collapse the ID's in Master.
                > If we have only a single primary key across the code
                > DB, then we end up
                > with the problem that post 568 describes.
                >
                > I think we're getting enough secondary source input
                > that having the person
                > table makes sense, but I'd also propose that we
                > handle our own tables the
                > same way - each logical grouping has its own primary
                > key set which is
                > linked to through the person table. So instead of
                > collapsing master, I'd
                > shift everything over and keep it as is (except for
                > possible changes under
                > the first issue).
                >
                > Regards,
                > Derek
                >




                __________________________________
                Do you Yahoo!?
                The all-new My Yahoo! - Get yours free!
                http://my.yahoo.com
              • Sean Forman
                ... I really, really think this is an invitation for transposition errors being added into the db. For instance, when the SABR biographical committee sends out
                Message 7 of 28 , Nov 19, 2004
                • 0 Attachment
                  > >
                  > > One other thing. I don't really understand the
                  > > fanaticism for strictly
                  > > numerical keys.
                  >
                  > I hope you do now!!


                  I really, really think this is an invitation for transposition errors
                  being added into the db.

                  For instance, when the SABR biographical committee sends out their
                  bi-monthly files. They list about 80 or so players who have died or for
                  whom new data has been located. Here are the steps that would need to
                  be taken for these changes to be incorporated.

                  select ID, nameLast, nameFirst, debut from Master where nameLast='smith'
                  and nameFirst='john';

                  Then figure out which numeric key is the right one.

                  update deathState='NY', deathCountry='USA', deathCity='brooklyn' where
                  ID='015324';

                  I know with the places table being proposed, I might even have to look
                  up numeric values for the state, country and city. Would I have to look
                  up new values for the year, month and day as well? ;-)

                  I just think the likelihood of errors probably goes up by a factor of
                  ten with this scheme.

                  For a change to the batting table which often happens when the records
                  committee sends out their newsletter.

                  select ID, nameLast, nameFirst, debut from Master where nameLast='aaron'
                  and nameFirst like 'h%';

                  select teamID from Teams where teamCity='Milwaukee' and yearID=1961;

                  update Batting set HR=12 where yearID=1961 and teamID=23 and ID='000121'
                  and stint=1;

                  Maybe I'm just being a whiner, but I see this as being less than ideal.

                  --
                  Sincerely,
                  Sean Forman

                  Baseball Stats! http://www.Baseball-Reference.com/
                • Tangotiger
                  ... Au contraire! UPDATE PERSONBIO p INNER JOIN LOCATION l SET p.stateid = l.stateid where l.statename = FL ; You could make it even better by having a
                  Message 8 of 28 , Nov 19, 2004
                  • 0 Attachment
                    --- Sean Forman <sean-forman@...>
                    wrote:
                    > I really, really think this is an invitation for
                    > transposition errors
                    > being added into the db.

                    Au contraire!

                    UPDATE PERSONBIO p
                    INNER JOIN LOCATION l
                    SET p.stateid = l.stateid
                    where l.statename = 'FL';

                    You could make it even better by having a TRANSACTIONS
                    table, and add a third join, instead of the hardcoded
                    'FL'.

                    Tom


                    __________________________________________________
                    Do You Yahoo!?
                    Tired of spam? Yahoo! Mail has the best spam protection around
                    http://mail.yahoo.com
                  • Sean Lahman
                    Just noticed today that the stint IDs for Juan Encarnacion s 2004 season are reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                    Message 9 of 28 , Nov 19, 2004
                    • 0 Attachment
                      Just noticed today that the stint IDs for Juan Encarnacion's 2004 season are
                      reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                      Currently, it is reveresed.

                      Regards,
                      Sean Lahman
                    • Richard Shoults
                      Respectfully, no. In 2004 he was traded from LA to FLA where he ended the season. Rick Sean Lahman wrote: Just noticed today that the
                      Message 10 of 28 , Nov 19, 2004
                      • 0 Attachment
                        Respectfully, no.  In 2004 he was traded from LA to FLA where he ended the season.
                         
                        Rick

                        Sean Lahman <slahman@...> wrote:

                        Just noticed today that the stint IDs for Juan Encarnacion's 2004 season are
                        reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                        Currently, it is reveresed.

                        Regards,
                        Sean Lahman




                        ------------------------ Yahoo! Groups Sponsor --------------------~-->
                        $9.95 domain names from Yahoo!. Register anything.
                        http://us.click.yahoo.com/J8kdrA/y20IAA/yQLSAA/VCUolB/TM
                        --------------------------------------------------------------------~->

                        http://www.baseball-databank.org/
                        Yahoo! Groups Links

                        <*> To visit your group on the web, go to:
                        http://groups.yahoo.com/group/baseball-databank/

                        <*> To unsubscribe from this group, send an email to:
                        baseball-databank-unsubscribe@yahoogroups.com

                        <*> Your use of Yahoo! Groups is subject to:
                        http://docs.yahoo.com/info/terms/




                      • Sean Lahman
                        I noticed today that there are about 800 players in the master table who don t have a debut date. On closer inspection, most of these players made their debut
                        Message 11 of 28 , Nov 19, 2004
                        • 0 Attachment
                          I noticed today that there are about 800 players in the master table who
                          don't have a debut date. On closer inspection, most of these players made
                          their debut between 1998 and 2001. Right now, there are less than two dozen
                          players in the table with a value in the debut field for that five year
                          period.

                          I'll take a look at resolving this issue, but just wanted to alert everybody
                          to the problem.

                          Regards,
                          Sean Lahman
                        • Derek Adair
                          Were these in the fixes from Michael Mavriogannis I asked about a couple of days ago? http://sports.groups.yahoo.com/group/baseball-databank/message/2089
                          Message 12 of 28 , Nov 19, 2004
                          • 0 Attachment
                            Were these in the fixes from Michael Mavriogannis I asked about a couple
                            of days ago?

                            http://sports.groups.yahoo.com/group/baseball-databank/message/2089

                            Regards,
                            Derek

                            P.S> This is where CVS/SVN would come in *very* handy.

                            On Fri, 19 Nov 2004, Sean Lahman wrote:

                            >
                            > I noticed today that there are about 800 players in the master table who
                            > don't have a debut date. On closer inspection, most of these players made
                            > their debut between 1998 and 2001. Right now, there are less than two dozen
                            > players in the table with a value in the debut field for that five year
                            > period.
                            >
                            > I'll take a look at resolving this issue, but just wanted to alert everybody
                            > to the problem.
                            >
                            > Regards,
                            > Sean Lahman
                            >
                            >
                            >
                            >
                            >
                            >
                            > http://www.baseball-databank.org/
                            > Yahoo! Groups Links
                            >
                            >
                            >
                            >
                            >
                            >
                            >
                            >
                          • piratefan1@bellsouth.net
                            ... ended the season. ... season are ... LA #2. ... Rick is correct. Encarnacion was traded by Florida to the Dodgers for a PTBNL before the season started,
                            Message 13 of 28 , Nov 19, 2004
                            • 0 Attachment
                              Richard Shoults wrote:
                              > Respectfully, no. In 2004 he was traded from LA to FLA where he
                              ended the season.
                              >
                              > Rick
                              >
                              > Sean Lahman <slahman@b...> wrote:
                              >
                              > Just noticed today that the stint IDs for Juan Encarnacion's 2004
                              season are
                              > reversed. He was traded from Florida to LA, so Fla should be #1 and
                              LA #2.
                              > Currently, it is reveresed.

                              Rick is correct. Encarnacion was traded by Florida to the Dodgers for
                              a PTBNL before the season started, then traded back to the Marlins in
                              the LoDuca trade.

                              Mike Emeigh
                              piratefan1@...
                            • Sean Lahman
                              Thanks, Mike & Rick, for correcting my mistake. --SL
                              Message 14 of 28 , Nov 19, 2004
                              • 0 Attachment
                                Thanks, Mike & Rick, for correcting my mistake.
                                --SL

                                > -----Original Message-----
                                > From: piratefan1@... [mailto:piratefan1@...]
                                > Sent: Friday, November 19, 2004 3:52 PM
                                > To: baseball-databank@yahoogroups.com
                                > Subject: [baseball-databank] Re: ERROR stint ID for Juan Encarnacion
                                >
                                >
                                >
                                >
                                > Richard Shoults wrote:
                                > > Respectfully, no. In 2004 he was traded from LA to FLA where he
                                > ended the season.
                                > >
                                > > Rick
                                > >
                                > > Sean Lahman <slahman@b...> wrote:
                                > >
                                > > Just noticed today that the stint IDs for Juan Encarnacion's 2004
                                > season are
                                > > reversed. He was traded from Florida to LA, so Fla should be #1 and
                                > LA #2.
                                > > Currently, it is reveresed.
                                >
                                > Rick is correct. Encarnacion was traded by Florida to the Dodgers for
                                > a PTBNL before the season started, then traded back to the Marlins in
                                > the LoDuca trade.
                                >
                                > Mike Emeigh
                                > piratefan1@...
                                >
                              • Tangotiger
                                Here s one more thought on alphanumerics: what do you do with Japanese, Chinese, and french players? We ve always been thinking of this as english and
                                Message 15 of 28 , Nov 19, 2004
                                • 0 Attachment
                                  Here's one more thought on alphanumerics:

                                  what do you do with Japanese, Chinese, and french
                                  players? We've always been thinking of this as
                                  english and unilingual.

                                  This is why having numeric keys is great. Every
                                  language accepts the numeral system. Every table will
                                  then have translation fields. Say you have a LOCATION
                                  table, you'd have something like:
                                  locID,locType,engName,japName,chiname,freName
                                  673,1,USA,%$#,*&^,Etats Unis
                                  or some such.

                                  Same thing with people's names. You could potentially
                                  have numerics for the names, which maps to a table of
                                  names that are translatable to various languages.

                                  What we have the potential for here is a fully
                                  normalized DB that is expandable across the world.
                                  Imagine World Cup 2006 using our database, because
                                  it's the only DB that supports all the languages of
                                  the countries in the World Cup.

                                  Imagine being a french visitor going to b-r.com or
                                  Retrosheet.org, and they click the french link, and in
                                  one fell swoop, have the page redisplayed in the
                                  language of the visitor. HR becomes CC, SP becomes
                                  LP, etc.

                                  Tom

                                  __________________________________________________
                                  Do You Yahoo!?
                                  Tired of spam? Yahoo! Mail has the best spam protection around
                                  http://mail.yahoo.com
                                • Sean Lahman
                                  Derek Adair wrote... ... I m beginning to see that. I have Subversion software installed on my server, and I ll take a look over the weekend at setting things
                                  Message 16 of 28 , Nov 19, 2004
                                  • 0 Attachment
                                    Derek Adair wrote...
                                    >
                                    > This is where CVS/SVN would come in *very* handy.
                                    >

                                    I'm beginning to see that. I have Subversion software installed on my
                                    server, and I'll take a look over the weekend at setting things up for the
                                    baseball database.

                                    Regards,
                                    Sean Lahman
                                  • Sean Lahman
                                    Those of you on the SABR-L list may have seen recent discussion about inaccuracies in the vote totals that were released by MLB for the American League MVP
                                    Message 17 of 28 , Nov 19, 2004
                                    • 0 Attachment
                                      Those of you on the SABR-L list may have seen recent discussion about
                                      inaccuracies in the vote totals that were released by MLB for the American
                                      League MVP award on Monday. Simply put, the number of votes and number of
                                      points didn't add up. Several of us have been working on getting an
                                      accurate reporting of the data, and Bill Deane was finally able to get a
                                      defintive response from the source. Bill writes:

                                      > MLB.com had omitted both Curt Schilling (14 points) and Travis Hafner (2)
                                      > from their chart, and my guess about Melvin Mora (one 8th and one 9th,
                                      > 5 points) was correct. That left only the problem with Mariano Rivera's
                                      > and Ivan Rodriguez's point totals not adding up based on their vote
                                      > breakdowns. This error was included in every published list I saw.
                                      >
                                      > A couple of you referred me to BBWAA Secretary-Treasurer Jack McConnell,
                                      > who solved the mystery. The point totals for Rivera (59) and
                                      > Rodriguez (36) were correct, but Mo had 3 ninth-place votes (not 4),
                                      > while I-Rod had 5 (not 4).

                                      For the record, what appears below is an accurate reporting of the vote
                                      totals. I don't believe any of the other published sources (ESPN, USA Today,
                                      CNN/SI) have it right. Totals for Rivera and Rodriguez need to be updated
                                      in the AwardSharePlayers table that I postyed yesterday.

                                      Regards,
                                      Sean Lahman


                                      2004 AL MVP Voting
                                      Player 1 2 3 4 5 6 7 8 9 10 Points
                                      ----------------------------------------------------
                                      V Guerrero 21 5 1 1 0 0 0 0 0 0 354
                                      G Sheffield 5 8 9 4 2 0 0 0 0 0 254
                                      M Ramirez 1 14 9 2 2 0 0 0 0 0 238
                                      D Ortiz 1 0 5 9 2 5 5 0 0 0 174
                                      M Tejada 0 1 0 6 2 7 2 4 2 1 123
                                      J Santana 0 0 2 1 7 5 4 2 2 1 117
                                      I Suzuki 0 0 1 0 6 2 6 4 1 6 98
                                      M Young 0 0 1 4 2 2 4 4 3 0 92
                                      M Rivera 0 0 0 0 2 3 3 3 3 5 59
                                      I Rodriguez 0 0 0 1 1 1 0 2 5 2 36
                                      C Schilling 0 0 0 0 0 0 2 2 0 0 14
                                      J Nathan 0 0 0 0 1 1 0 0 0 1 12
                                      D Jeter 0 0 0 0 1 0 0 1 1 0 11
                                      M Kotsay 0 0 0 0 0 1 0 0 1 1 8
                                      A Rodriguez 0 0 0 0 0 0 0 2 1 0 8
                                      J Damon 0 0 0 0 0 0 0 1 2 0 7
                                      P Konerko 0 0 0 0 0 0 0 0 1 5 7
                                      H Blalock 0 0 0 0 0 1 0 0 0 0 5
                                      M Mora 0 0 0 0 0 0 0 1 1 0 5
                                      M Teixeira 0 0 0 0 0 0 0 1 0 2 5
                                      T Hunter 0 0 0 0 0 0 1 0 0 0 4
                                      V Martinez 0 0 0 0 0 0 1 0 0 0 4
                                      E Durazo 0 0 0 0 0 0 0 1 0 0 3
                                      F Cordero 0 0 0 0 0 0 0 0 1 0 2
                                      L Ford 0 0 0 0 0 0 0 0 1 0 2
                                      C Guillen 0 0 0 0 0 0 0 0 1 0 2
                                      T Hafner 0 0 0 0 0 0 0 0 1 0 2
                                      H Matsui 0 0 0 0 0 0 0 0 1 0 2
                                      C Figgins 0 0 0 0 0 0 0 0 0 2 2
                                      E Chavez 0 0 0 0 0 0 0 0 0 1 1
                                      J Varitek 0 0 0 0 0 0 0 0 0 1 1
                                      -------------------------------------------------
                                    • Sean Forman
                                      ... I ve got all of these taken care of. There *are* however about 258 final games that I m missing. I m also cleaning up some previously erroneous debuts
                                      Message 18 of 28 , Nov 19, 2004
                                      • 0 Attachment
                                        Sean Lahman wrote:
                                        > I noticed today that there are about 800 players in the master table who
                                        > don't have a debut date. On closer inspection, most of these players made
                                        > their debut between 1998 and 2001. Right now, there are less than two dozen
                                        > players in the table with a value in the debut field for that five year
                                        > period.
                                        >
                                        > I'll take a look at resolving this issue, but just wanted to alert everybody
                                        > to the problem.
                                        >
                                        > Regards,
                                        > Sean Lahman


                                        I've got all of these taken care of. There *are* however about 258
                                        final games that I'm missing. I'm also cleaning up some previously
                                        erroneous debuts and final games as well.
                                        --
                                        Sincerely,
                                        Sean Forman

                                        Baseball Stats! http://www.Baseball-Reference.com/
                                      • Paul Wendt
                                        19 Nov 2004, Sean Forman wrote: . . . ... games as well. Once this Spring, the Biographical Database was missing 96 Last Games, all from
                                        Message 19 of 28 , Dec 2, 2004
                                        • 0 Attachment
                                          19 Nov 2004, Sean Forman <sean-forman@b...> wrote:
                                          . . .
                                          > There *are* however about 258 final games that I'm missing.
                                          > I'm also cleaning up some previously erroneous debuts and final
                                          games as well.

                                          Once this Spring, the Biographical Database was missing 96 Last
                                          Games, all from 1876-1880 (not counting 2003). I believe that more
                                          than 50 are still missing.

                                          This Summer and Fall, the Biog. newsletter should be reporting
                                          numerous small corrections in debut dates and perhaps lastgame dates
                                          from the 1870s-80s. I don't know the update lag and I am a few months
                                          behind on
                                          the other SABR research cmte newsletters.

                                          Paul Wendt
                                        Your message has been successfully submitted and would be delivered to recipients shortly.