Loading ...
Sorry, an error occurred while loading the content.

Re: [baseball-databank] Re: Newbie Question on IDs

Expand Messages
  • Tangotiger
    ... Yes, this Mapping table is what we have designed in the BDBDesign group. Any time someone wants to give us new data, we simply tack it on to this
    Message 1 of 28 , Nov 11, 2004
    View Source
    • 0 Attachment
      --- tjruane <truane@...> wrote:
      > I will defer to your judgement. I think a more
      > direct approach
      > would have been to have a single ID used in all the
      > tables and in
      > the master table simply have a series of fields
      > indicating which
      > people were players, managers, executives, coaches,
      > umpires, scouts,
      > and so on.
      >

      Yes, this "Mapping" table is what we have designed in
      the BDBDesign group. Any time someone wants to give
      us new data, we simply tack it on to this mapping
      table. The main ID would be numeric, and would link
      all the associated tables.

      So, if someone is compiling minor league records, and
      they have Tim Raines as raiti001, that's fine, and we
      just tack that on to the mapping table to ID 67934.
      HAve multiple sources for MLB data (Lahman, Retro,
      STATS, MLB, etc)? No problem. Everything gets
      plugged in there.

      We had a long discussion on this topic at the above
      mentioned group, which you might want to reference.

      Tom


      __________________________________________________
      Do You Yahoo!?
      Tired of spam? Yahoo! Mail has the best spam protection around
      http://mail.yahoo.com
    • Derek Adair
      ... As I understand the e-mails above, I don t agree that the mapping table strategy you re describing is the same as the strategy Tom R. would use for a
      Message 2 of 28 , Nov 11, 2004
      View Source
      • 0 Attachment
        On Thu, 11 Nov 2004, Tangotiger wrote:

        >
        >
        > --- tjruane <truane@...> wrote:
        >> I will defer to your judgement. I think a more
        >> direct approach
        >> would have been to have a single ID used in all the
        >> tables and in
        >> the master table simply have a series of fields
        >> indicating which
        >> people were players, managers, executives, coaches,
        >> umpires, scouts,
        >> and so on.
        >>
        >
        > Yes, this "Mapping" table is what we have designed in
        > the BDBDesign group. Any time someone wants to give
        > us new data, we simply tack it on to this mapping
        > table. The main ID would be numeric, and would link
        > all the associated tables.
        >
        > So, if someone is compiling minor league records, and
        > they have Tim Raines as raiti001, that's fine, and we
        > just tack that on to the mapping table to ID 67934.
        > HAve multiple sources for MLB data (Lahman, Retro,
        > STATS, MLB, etc)? No problem. Everything gets
        > plugged in there.
        >
        > We had a long discussion on this topic at the above
        > mentioned group, which you might want to reference.

        As I understand the e-mails above, I don't agree that the mapping table
        strategy you're describing is the same as the strategy Tom R. would use
        for a master table.

        Here's what I think Tom R. is referring to (stripped down):

        ID,MLB?,HOF?,JAPNLG?,...
        cobbty01,Y,Y,N,...

        Here's what I think Tom (Tangotiger) is describing:
        ID,LahmanID,StatsID,RetroID
        14567,cobbty01,5678,cobbt001

        As I see it, the current strategy is very similar to the second; it just
        pretends HOF, Manager, executive, etc. info is from a separate source.

        Regards,
        Derek
      • Derek Adair
        ... I understand the desire to use longs. Currently, all of our tables with the exception of Master (and maybe one or two small ones that escape me) have
        Message 3 of 28 , Nov 11, 2004
        View Source
        • 0 Attachment
          On Thu, 11 Nov 2004, F.X.Flinn wrote:

          > I'm firmly of the opinion that table keys should be long integers and
          > all tables should have fields that collect relevant corresponding table
          > IDs.
          >
          > For example, if I have a table of scouts, I have a table key named
          > ScoutID and I have fields for PlayerID, ManagerID, etc., or I have one
          > field called MasterID (if I maintain a table of all persons, a rosetta
          > stone table, as it were).
          >
          > On the other hand, the existing LahmanID scheme works well for the
          > audience that is comfortable using flat-file approaches. I don't see a
          > wholesale switchover happening in the context of bb-db.

          I understand the desire to use longs. Currently, all of our tables with
          the exception of Master (and maybe one or two small ones that escape me)
          have compound primary keys, which can lead to some ugliness, both in terms
          of development and performance in dependent applications.

          That said, I do think it would make visual inspection of the data much
          more difficult. I think, over time, that would have a negative impact on
          data integrity. Our workflow is much different than the typical business
          database, where performance is more crucial and hand-massaging of the data
          is less common.

          I think a perl or similar script to take the DB files and prepend unique
          long ID's and switch out the dependent ID's would not be that difficult to
          write (although somewhat time-consuming to run).

          Regards,
          Derek
        • Tangotiger
          Actually, we have to pretend it s from a separate source as long as the source feeding BDB is not entrenched in BDB. I mean, if someone was going to provide
          Message 4 of 28 , Nov 11, 2004
          View Source
          • 0 Attachment
            Actually, we have to pretend it's from a separate
            source as long as the source feeding BDB is not
            entrenched in BDB. I mean, if someone was going to
            provide even more detailed Gold Glove data or whatnot,
            we'd have to be able to plug them in. The source
            provider is not going to necessarily want to stick to
            the BDB IDs.

            However, I agree that there's no reason to have the
            "u" and whatnot, if it's a derivative of the playerid.


            Given the extremely loose nature of this group, having
            a plug-and-play kind of model is the best thing
            that'll work. I think.

            Tom



            --- Derek Adair <dadair@...> wrote:

            > On Thu, 11 Nov 2004, Tangotiger wrote:
            >
            > >
            > >
            > > --- tjruane <truane@...> wrote:
            > >> I will defer to your judgement. I think a more
            > >> direct approach
            > >> would have been to have a single ID used in all
            > the
            > >> tables and in
            > >> the master table simply have a series of fields
            > >> indicating which
            > >> people were players, managers, executives,
            > coaches,
            > >> umpires, scouts,
            > >> and so on.
            > >>
            > >
            > > Yes, this "Mapping" table is what we have designed
            > in
            > > the BDBDesign group. Any time someone wants to
            > give
            > > us new data, we simply tack it on to this mapping
            > > table. The main ID would be numeric, and would
            > link
            > > all the associated tables.
            > >
            > > So, if someone is compiling minor league records,
            > and
            > > they have Tim Raines as raiti001, that's fine, and
            > we
            > > just tack that on to the mapping table to ID
            > 67934.
            > > HAve multiple sources for MLB data (Lahman, Retro,
            > > STATS, MLB, etc)? No problem. Everything gets
            > > plugged in there.
            > >
            > > We had a long discussion on this topic at the
            > above
            > > mentioned group, which you might want to
            > reference.
            >
            > As I understand the e-mails above, I don't agree
            > that the mapping table
            > strategy you're describing is the same as the
            > strategy Tom R. would use
            > for a master table.
            >
            > Here's what I think Tom R. is referring to (stripped
            > down):
            >
            > ID,MLB?,HOF?,JAPNLG?,...
            > cobbty01,Y,Y,N,...
            >
            > Here's what I think Tom (Tangotiger) is describing:
            > ID,LahmanID,StatsID,RetroID
            > 14567,cobbty01,5678,cobbt001
            >
            > As I see it, the current strategy is very similar to
            > the second; it just
            > pretends HOF, Manager, executive, etc. info is from
            > a separate source.
            >
            > Regards,
            > Derek
            >




            __________________________________
            Do you Yahoo!?
            Check out the new Yahoo! Front Page.
            www.yahoo.com
          • Sean Lahman
            Tom Ruane raised a question about playerIDs last week, the problem that the master table has three different sets of IDs to identify players in various tables.
            Message 5 of 28 , Nov 17, 2004
            View Source
            • 0 Attachment
              Tom Ruane raised a question about playerIDs last week, the problem that the
              master table has three different sets of IDs to identify players in various
              tables. While I initially explained the rationale for the current design, I
              have to confess that I wasn't involved in the decision to make that design
              change, and the more I reflect on it, the less sense it makes to me. I'm
              beginning to see why this current model is unwieldy. Perhaps someone from
              the design committee can help me to understand why it makes any sense.

              I understand the idea of mapping tables to integrate data sets from other
              sources. The current version of the master table has five fields for that
              purpose right now. That's necessary and should continue. However, what I
              guess I don't understand is the benefit of having the key value for the
              master table split into a playerID, managerID, and hofID. As Ruane
              suggested, this causes problems in some of the other tables. In the HOF
              table for example, we record every player who received a vote for induction,
              and these records all have an ID tagged with the "h" suffix. What that
              means, then, is that a player who received votes but wasn't elected "Lenny
              Dykstra" has both a playerID and a hofID. I don't know what this
              accomplishes except creating confusion. It requires new IDs to be created
              every year for existing players (Juan Samuel, for example, who received HOF
              votes this year.) As both Ruane and F.X. Flinn argued, this approach
              violates the basic tenets of a relational database. Why not tag every
              player with one ID when they make their debut and use that to identify them
              in every table?

              Whether the key identifier for someone in the master table is an integer or
              a string, it's gotta be one value that never changes. I propose that the
              master table be simplified to remove the managerID and hofID fields and
              consolidate them into one playerID field. If anyone disagrees, please tell
              me what I'm missing.

              Regards,
              Sean Lahman
            • Tangotiger
              Sean, You aren t missing anything. The current proposal in the BDB Design group is to have a single numeric, never-changing, field. This would be a PERSONid,
              Message 6 of 28 , Nov 17, 2004
              View Source
              • 0 Attachment
                Sean,

                You aren't missing anything. The current proposal in
                the BDB Design group is to have a single numeric,
                never-changing, field. This would be a PERSONid, and
                we'd have a PERSONS table.

                We'd also have a PERSONMAP table that would have all
                the various IDs (including backward compatibility to
                Lahman4, 4.5, 5, 5.1) to map all the various sources
                of data to the PERSONid.

                There are dozens of changes required to the current
                version of the BDB so that we can finally have
                something stable enough that we won't have to revisit
                these issues.

                Michael Westbay had also other fascinating ideas
                regarding LOCATION and NAMES, which will become very
                important once we incorporate Japanese data.

                KJOK also has his PARKS db, which he and I normalized
                2 years ago, and has not yet been incorporated into
                the BDB.

                I suggest that any change you make to the DB be done
                all in one shot, so that you don't have several
                releases over several months.

                Tom




                --- Sean Lahman <slahman@...> wrote:

                > Tom Ruane raised a question about playerIDs last
                > week, the problem that the
                > master table has three different sets of IDs to
                > identify players in various
                > tables. While I initially explained the rationale
                > for the current design, I
                > have to confess that I wasn't involved in the
                > decision to make that design
                > change, and the more I reflect on it, the less sense
                > it makes to me. I'm
                > beginning to see why this current model is unwieldy.
                > Perhaps someone from
                > the design committee can help me to understand why
                > it makes any sense.
                >
                > I understand the idea of mapping tables to integrate
                > data sets from other
                > sources. The current version of the master table
                > has five fields for that
                > purpose right now. That's necessary and should
                > continue. However, what I
                > guess I don't understand is the benefit of having
                > the key value for the
                > master table split into a playerID, managerID, and
                > hofID. As Ruane
                > suggested, this causes problems in some of the other
                > tables. In the HOF
                > table for example, we record every player who
                > received a vote for induction,
                > and these records all have an ID tagged with the "h"
                > suffix. What that
                > means, then, is that a player who received votes but
                > wasn't elected "Lenny
                > Dykstra" has both a playerID and a hofID. I don't
                > know what this
                > accomplishes except creating confusion. It requires
                > new IDs to be created
                > every year for existing players (Juan Samuel, for
                > example, who received HOF
                > votes this year.) As both Ruane and F.X. Flinn
                > argued, this approach
                > violates the basic tenets of a relational database.
                > Why not tag every
                > player with one ID when they make their debut and
                > use that to identify them
                > in every table?
                >
                > Whether the key identifier for someone in the master
                > table is an integer or
                > a string, it's gotta be one value that never
                > changes. I propose that the
                > master table be simplified to remove the managerID
                > and hofID fields and
                > consolidate them into one playerID field. If anyone
                > disagrees, please tell
                > me what I'm missing.
                >
                > Regards,
                > Sean Lahman
                >
                >
                >




                __________________________________
                Do you Yahoo!?
                Meet the all-new My Yahoo! - Try it today!
                http://my.yahoo.com
              • Sean Forman
                ... I think it can be summarized as not everyone in the HOF was a player and many of these people (Yawkey, Barrow, etc.) were never players. And it is
                Message 7 of 28 , Nov 17, 2004
                View Source
                • 0 Attachment
                  > Whether the key identifier for someone in the master table is an integer or
                  > a string, it's gotta be one value that never changes. I propose that the
                  > master table be simplified to remove the managerID and hofID fields and
                  > consolidate them into one playerID field. If anyone disagrees, please tell
                  > me what I'm missing.
                  >
                  > Regards,
                  > Sean Lahman


                  I think it can be summarized as not everyone in the HOF was a player and
                  many of these people (Yawkey, Barrow, etc.) were never players. And it
                  is uncertain how the HOF will run things in the future. The HOFid was
                  done to be consistent with ideas for adding additional data such as
                  manager data, minor league data, japanese data, korean leagues data,
                  cuban league data, independent league data, umpire data, coaches data,
                  executives data, broadcaster data, negro league data, winter league
                  data, spring training data, aagpbl data, college data, hs data and any
                  other possible data sets. HOFers do not fit neatly within any other group.

                  Also, why does the hofID violate relational db? If the writers ever
                  vote for non-players the system as designed will be fine. The playerID
                  is for players who appeared in an mlb game (people in the fielding,
                  batting, pitching tables). the HOFID is for a person who received an
                  HOF vote, there is no guarantee that one set will always be included in
                  the other. The managerID is for the people who have managed a game.
                  I'd be mildly surprised if a non-player has never appeared on a writer's
                  ballot (even if only as an uncounted protest vote).

                  We have an imperfect system for imperfect data. And any system we
                  choose will have its flaws. The point is not to have a perfectly
                  correct system, but to distribute data to the widest possible audience.
                  I have asked for feedback on the 2004 data and have received very
                  little feedback (thank you emancip8ed for your comments (any nulls are
                  data I'm hoping to have filled in)).

                  One other thing. I don't really understand the fanaticism for strictly
                  numerical keys.

                  I know tangotiger (Tom M??) wants to get the parks data into the db
                  asap. What other data would you like to see done in the next two
                  months? I'll prioritize, and if I don't get anything done in the next
                  two months, I'll set up a web account on my server, step aside, and let
                  someone else take the reins. I'm under the impression that that
                  postseason data is being produced. Is this correct?

                  Sorry for the rant.

                  Sincerely,
                  Sean Forman

                  Baseball Stats! http://www.Baseball-Reference.com/
                • Sean Lahman
                  ... Yes. I ve also done updates for the four awards tables, HOF, AllStars, and the SeriesPost table. I fixed the problems with place of birth info for people
                  Message 8 of 28 , Nov 17, 2004
                  View Source
                  • 0 Attachment
                    Sean Forman wrote:
                    > I'm under the impression that that
                    > postseason data is being produced. Is this correct?

                    Yes. I've also done updates for the four awards tables, HOF, AllStars, and
                    the SeriesPost table. I fixed the problems with place of birth info for
                    people who debuted in 2004. I'll be done and post copies to the site
                    sometime Thursday. I believe that will complete all of the updates needed
                    for the end of 2004 season.

                    I think with that, it's time to prepare the release of the full 2004
                    database. Other projects, such as additional data tables and any proposed
                    re-designs can be tackled over the winter.

                    Regards,
                    Sean Lahman
                  • Tangotiger
                    ... It doesn t violate relational DB, but it probably violates 3NF. You shouldn t have raineti01, and raineti01h in the same record. Every field must stand on
                    Message 9 of 28 , Nov 17, 2004
                    View Source
                    • 0 Attachment
                      --- Sean Forman <sean-forman@...>
                      wrote:
                      > Also, why does the hofID violate relational db? If

                      It doesn't violate relational DB, but it probably
                      violates 3NF. You shouldn't have
                      raineti01, and raineti01h
                      in the same record.
                      Every field must stand on its own, and not be a
                      concatenation of some other set of fields in the same
                      record. Certainly not the key field. You also want a
                      numeric key for any table that has a child table,
                      which just makes it convenient to have a numeric key
                      in all tables. I wouldn't have BOS as a park key
                      either. My rule of thumb is to avoid any possibility
                      of a key change. This is all from an admin DB
                      viewpoint, which brings me to...


                      > We have an imperfect system for imperfect data.
                      > And any system we
                      > choose will have its flaws. The point is not to
                      > have a perfectly
                      > correct system, but to distribute data to the widest
                      > possible audience.

                      ...The current proposal:
                      http://www.baseball-databank.org/purpose.txt
                      specifically says:

                      "This databank, once it is fully normalized and
                      proofed, will be the standard source for those
                      professionals creating new data products... But first
                      and foremost, the Baseball Databank is a library of
                      authoritative baseball statistics and information
                      maintained in a simple-to-access format for
                      information providers and baseball researchers."

                      Fully normalized is presumably 3NF. Reading the
                      above, and my impression is that the DB is for
                      developers (information providers and researchers) and
                      not the widest audience possible.

                      It is the information providers (presumably Sean,
                      Sean, Westbay, and others) that will make a form of
                      this database available to the widest audience
                      possible, with likely redundant tables, alphanumeric
                      keys, and other things that improve usability, but
                      degrade normalization.


                      > I have asked for feedback on the 2004 data and
                      > have received very
                      > little feedback (thank you emancip8ed for your
                      > comments (any nulls are
                      > data I'm hoping to have filled in)).

                      I loaded the data, and have only come across the NULL
                      situation. I have not tested the data, but so far so
                      good.

                      >
                      > One other thing. I don't really understand the
                      > fanaticism for strictly
                      > numerical keys.

                      I hope you do now!!

                      >
                      > I know tangotiger (Tom M??) wants to get the parks
                      > data into the db
                      > asap. What other data would you like to see done in
                      > the next two
                      > months? I'll prioritize, and if I don't get

                      Probably Ruane's upload, though I haven't verified if
                      that was done yet, since I only downloaded the 04
                      data.


                      > anything done in the next
                      > two months, I'll set up a web account on my server,
                      > step aside, and let
                      > someone else take the reins. I'm under the
                      > impression that that
                      > postseason data is being produced. Is this correct?

                      I supplied KJOK with the fielding data, and presumably
                      he will attach IDs to the names.

                      >
                      > Sorry for the rant.
                      >
                      > Sincerely,
                      > Sean Forman

                      Rants are encouraged.

                      Tom
                      --------------------------------
                      http://www.tangotiger.net








                      __________________________________
                      Do you Yahoo!?
                      The all-new My Yahoo! - Get yours free!
                      http://my.yahoo.com
                    • Derek Adair
                      Tom/Sean/anyone else who s interested, I wasn t sure what I was missing either, so I looked through the archives, since we obviously made the changes for a
                      Message 10 of 28 , Nov 17, 2004
                      View Source
                      • 0 Attachment
                        Tom/Sean/anyone else who's interested,

                        I wasn't sure what I was missing either, so I looked through the archives,
                        since we obviously made the changes for a reason.

                        There are at least two distinct issues here. I refer to:

                        http://sports.groups.yahoo.com/group/baseball-databank/message/568 (and
                        the following few messages in the thread)

                        First is the issue of numeric vs. string ID's. I honestly haven't seen a
                        compelling reason to switch. The two reasons I've seen are that numeric
                        ID's won't need to change, and that that's the way DB's are generally
                        designed. As previously addressed, I think we have enough differences from
                        a typical business DB to need to handle things a bit differently, and a
                        retrosheet approach where string ID's are fixed could work for us as well.
                        On the flip side, I see visual inspection of the data will be much more
                        difficult with numeric keys. I'm completely open to being convinced here,
                        so if anyone has concrete benefits I'm all ears.

                        Second (and most important) is the issue of what ID's we'll use in the
                        "core" BDB tables. Sean L.'s proposal was to collapse the ID's in Master.
                        If we have only a single primary key across the code DB, then we end up
                        with the problem that post 568 describes.

                        I think we're getting enough secondary source input that having the person
                        table makes sense, but I'd also propose that we handle our own tables the
                        same way - each logical grouping has its own primary key set which is
                        linked to through the person table. So instead of collapsing master, I'd
                        shift everything over and keep it as is (except for possible changes under
                        the first issue).

                        Regards,
                        Derek
                      • Tangotiger
                        Can I suggest that design issues be brought up with in the BDB Design group? Michael has done some tremendous work, and we ve gone through many of these
                        Message 11 of 28 , Nov 18, 2004
                        View Source
                        • 0 Attachment
                          Can I suggest that design issues be brought up with in
                          the BDB Design group? Michael has done some
                          tremendous work, and we've gone through many of these
                          issues. The group itself has been dormant for a
                          while, but it might be time for it to erupt again.

                          After we've done a good design, we can propose it
                          here, and get feedback as to how to proceed.

                          This will also stay in line with Lahman's request that
                          we hold off on design issues for the next few months.

                          Tom





                          --- Derek Adair <dadair@...> wrote:

                          > Tom/Sean/anyone else who's interested,
                          >
                          > I wasn't sure what I was missing either, so I looked
                          > through the archives,
                          > since we obviously made the changes for a reason.
                          >
                          > There are at least two distinct issues here. I refer
                          > to:
                          >
                          >
                          http://sports.groups.yahoo.com/group/baseball-databank/message/568
                          > (and
                          > the following few messages in the thread)
                          >
                          > First is the issue of numeric vs. string ID's. I
                          > honestly haven't seen a
                          > compelling reason to switch. The two reasons I've
                          > seen are that numeric
                          > ID's won't need to change, and that that's the way
                          > DB's are generally
                          > designed. As previously addressed, I think we have
                          > enough differences from
                          > a typical business DB to need to handle things a bit
                          > differently, and a
                          > retrosheet approach where string ID's are fixed
                          > could work for us as well.
                          > On the flip side, I see visual inspection of the
                          > data will be much more
                          > difficult with numeric keys. I'm completely open to
                          > being convinced here,
                          > so if anyone has concrete benefits I'm all ears.
                          >
                          > Second (and most important) is the issue of what
                          > ID's we'll use in the
                          > "core" BDB tables. Sean L.'s proposal was to
                          > collapse the ID's in Master.
                          > If we have only a single primary key across the code
                          > DB, then we end up
                          > with the problem that post 568 describes.
                          >
                          > I think we're getting enough secondary source input
                          > that having the person
                          > table makes sense, but I'd also propose that we
                          > handle our own tables the
                          > same way - each logical grouping has its own primary
                          > key set which is
                          > linked to through the person table. So instead of
                          > collapsing master, I'd
                          > shift everything over and keep it as is (except for
                          > possible changes under
                          > the first issue).
                          >
                          > Regards,
                          > Derek
                          >




                          __________________________________
                          Do you Yahoo!?
                          The all-new My Yahoo! - Get yours free!
                          http://my.yahoo.com
                        • Sean Forman
                          ... I really, really think this is an invitation for transposition errors being added into the db. For instance, when the SABR biographical committee sends out
                          Message 12 of 28 , Nov 19, 2004
                          View Source
                          • 0 Attachment
                            > >
                            > > One other thing. I don't really understand the
                            > > fanaticism for strictly
                            > > numerical keys.
                            >
                            > I hope you do now!!


                            I really, really think this is an invitation for transposition errors
                            being added into the db.

                            For instance, when the SABR biographical committee sends out their
                            bi-monthly files. They list about 80 or so players who have died or for
                            whom new data has been located. Here are the steps that would need to
                            be taken for these changes to be incorporated.

                            select ID, nameLast, nameFirst, debut from Master where nameLast='smith'
                            and nameFirst='john';

                            Then figure out which numeric key is the right one.

                            update deathState='NY', deathCountry='USA', deathCity='brooklyn' where
                            ID='015324';

                            I know with the places table being proposed, I might even have to look
                            up numeric values for the state, country and city. Would I have to look
                            up new values for the year, month and day as well? ;-)

                            I just think the likelihood of errors probably goes up by a factor of
                            ten with this scheme.

                            For a change to the batting table which often happens when the records
                            committee sends out their newsletter.

                            select ID, nameLast, nameFirst, debut from Master where nameLast='aaron'
                            and nameFirst like 'h%';

                            select teamID from Teams where teamCity='Milwaukee' and yearID=1961;

                            update Batting set HR=12 where yearID=1961 and teamID=23 and ID='000121'
                            and stint=1;

                            Maybe I'm just being a whiner, but I see this as being less than ideal.

                            --
                            Sincerely,
                            Sean Forman

                            Baseball Stats! http://www.Baseball-Reference.com/
                          • Tangotiger
                            ... Au contraire! UPDATE PERSONBIO p INNER JOIN LOCATION l SET p.stateid = l.stateid where l.statename = FL ; You could make it even better by having a
                            Message 13 of 28 , Nov 19, 2004
                            View Source
                            • 0 Attachment
                              --- Sean Forman <sean-forman@...>
                              wrote:
                              > I really, really think this is an invitation for
                              > transposition errors
                              > being added into the db.

                              Au contraire!

                              UPDATE PERSONBIO p
                              INNER JOIN LOCATION l
                              SET p.stateid = l.stateid
                              where l.statename = 'FL';

                              You could make it even better by having a TRANSACTIONS
                              table, and add a third join, instead of the hardcoded
                              'FL'.

                              Tom


                              __________________________________________________
                              Do You Yahoo!?
                              Tired of spam? Yahoo! Mail has the best spam protection around
                              http://mail.yahoo.com
                            • Sean Lahman
                              Just noticed today that the stint IDs for Juan Encarnacion s 2004 season are reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                              Message 14 of 28 , Nov 19, 2004
                              View Source
                              • 0 Attachment
                                Just noticed today that the stint IDs for Juan Encarnacion's 2004 season are
                                reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                                Currently, it is reveresed.

                                Regards,
                                Sean Lahman
                              • Richard Shoults
                                Respectfully, no. In 2004 he was traded from LA to FLA where he ended the season. Rick Sean Lahman wrote: Just noticed today that the
                                Message 15 of 28 , Nov 19, 2004
                                View Source
                                • 0 Attachment
                                  Respectfully, no.  In 2004 he was traded from LA to FLA where he ended the season.
                                   
                                  Rick

                                  Sean Lahman <slahman@...> wrote:

                                  Just noticed today that the stint IDs for Juan Encarnacion's 2004 season are
                                  reversed. He was traded from Florida to LA, so Fla should be #1 and LA #2.
                                  Currently, it is reveresed.

                                  Regards,
                                  Sean Lahman




                                  ------------------------ Yahoo! Groups Sponsor --------------------~-->
                                  $9.95 domain names from Yahoo!. Register anything.
                                  http://us.click.yahoo.com/J8kdrA/y20IAA/yQLSAA/VCUolB/TM
                                  --------------------------------------------------------------------~->

                                  http://www.baseball-databank.org/
                                  Yahoo! Groups Links

                                  <*> To visit your group on the web, go to:
                                  http://groups.yahoo.com/group/baseball-databank/

                                  <*> To unsubscribe from this group, send an email to:
                                  baseball-databank-unsubscribe@yahoogroups.com

                                  <*> Your use of Yahoo! Groups is subject to:
                                  http://docs.yahoo.com/info/terms/




                                • Sean Lahman
                                  I noticed today that there are about 800 players in the master table who don t have a debut date. On closer inspection, most of these players made their debut
                                  Message 16 of 28 , Nov 19, 2004
                                  View Source
                                  • 0 Attachment
                                    I noticed today that there are about 800 players in the master table who
                                    don't have a debut date. On closer inspection, most of these players made
                                    their debut between 1998 and 2001. Right now, there are less than two dozen
                                    players in the table with a value in the debut field for that five year
                                    period.

                                    I'll take a look at resolving this issue, but just wanted to alert everybody
                                    to the problem.

                                    Regards,
                                    Sean Lahman
                                  • Derek Adair
                                    Were these in the fixes from Michael Mavriogannis I asked about a couple of days ago? http://sports.groups.yahoo.com/group/baseball-databank/message/2089
                                    Message 17 of 28 , Nov 19, 2004
                                    View Source
                                    • 0 Attachment
                                      Were these in the fixes from Michael Mavriogannis I asked about a couple
                                      of days ago?

                                      http://sports.groups.yahoo.com/group/baseball-databank/message/2089

                                      Regards,
                                      Derek

                                      P.S> This is where CVS/SVN would come in *very* handy.

                                      On Fri, 19 Nov 2004, Sean Lahman wrote:

                                      >
                                      > I noticed today that there are about 800 players in the master table who
                                      > don't have a debut date. On closer inspection, most of these players made
                                      > their debut between 1998 and 2001. Right now, there are less than two dozen
                                      > players in the table with a value in the debut field for that five year
                                      > period.
                                      >
                                      > I'll take a look at resolving this issue, but just wanted to alert everybody
                                      > to the problem.
                                      >
                                      > Regards,
                                      > Sean Lahman
                                      >
                                      >
                                      >
                                      >
                                      >
                                      >
                                      > http://www.baseball-databank.org/
                                      > Yahoo! Groups Links
                                      >
                                      >
                                      >
                                      >
                                      >
                                      >
                                      >
                                      >
                                    • piratefan1@bellsouth.net
                                      ... ended the season. ... season are ... LA #2. ... Rick is correct. Encarnacion was traded by Florida to the Dodgers for a PTBNL before the season started,
                                      Message 18 of 28 , Nov 19, 2004
                                      View Source
                                      • 0 Attachment
                                        Richard Shoults wrote:
                                        > Respectfully, no. In 2004 he was traded from LA to FLA where he
                                        ended the season.
                                        >
                                        > Rick
                                        >
                                        > Sean Lahman <slahman@b...> wrote:
                                        >
                                        > Just noticed today that the stint IDs for Juan Encarnacion's 2004
                                        season are
                                        > reversed. He was traded from Florida to LA, so Fla should be #1 and
                                        LA #2.
                                        > Currently, it is reveresed.

                                        Rick is correct. Encarnacion was traded by Florida to the Dodgers for
                                        a PTBNL before the season started, then traded back to the Marlins in
                                        the LoDuca trade.

                                        Mike Emeigh
                                        piratefan1@...
                                      • Sean Lahman
                                        Thanks, Mike & Rick, for correcting my mistake. --SL
                                        Message 19 of 28 , Nov 19, 2004
                                        View Source
                                        • 0 Attachment
                                          Thanks, Mike & Rick, for correcting my mistake.
                                          --SL

                                          > -----Original Message-----
                                          > From: piratefan1@... [mailto:piratefan1@...]
                                          > Sent: Friday, November 19, 2004 3:52 PM
                                          > To: baseball-databank@yahoogroups.com
                                          > Subject: [baseball-databank] Re: ERROR stint ID for Juan Encarnacion
                                          >
                                          >
                                          >
                                          >
                                          > Richard Shoults wrote:
                                          > > Respectfully, no. In 2004 he was traded from LA to FLA where he
                                          > ended the season.
                                          > >
                                          > > Rick
                                          > >
                                          > > Sean Lahman <slahman@b...> wrote:
                                          > >
                                          > > Just noticed today that the stint IDs for Juan Encarnacion's 2004
                                          > season are
                                          > > reversed. He was traded from Florida to LA, so Fla should be #1 and
                                          > LA #2.
                                          > > Currently, it is reveresed.
                                          >
                                          > Rick is correct. Encarnacion was traded by Florida to the Dodgers for
                                          > a PTBNL before the season started, then traded back to the Marlins in
                                          > the LoDuca trade.
                                          >
                                          > Mike Emeigh
                                          > piratefan1@...
                                          >
                                        • Tangotiger
                                          Here s one more thought on alphanumerics: what do you do with Japanese, Chinese, and french players? We ve always been thinking of this as english and
                                          Message 20 of 28 , Nov 19, 2004
                                          View Source
                                          • 0 Attachment
                                            Here's one more thought on alphanumerics:

                                            what do you do with Japanese, Chinese, and french
                                            players? We've always been thinking of this as
                                            english and unilingual.

                                            This is why having numeric keys is great. Every
                                            language accepts the numeral system. Every table will
                                            then have translation fields. Say you have a LOCATION
                                            table, you'd have something like:
                                            locID,locType,engName,japName,chiname,freName
                                            673,1,USA,%$#,*&^,Etats Unis
                                            or some such.

                                            Same thing with people's names. You could potentially
                                            have numerics for the names, which maps to a table of
                                            names that are translatable to various languages.

                                            What we have the potential for here is a fully
                                            normalized DB that is expandable across the world.
                                            Imagine World Cup 2006 using our database, because
                                            it's the only DB that supports all the languages of
                                            the countries in the World Cup.

                                            Imagine being a french visitor going to b-r.com or
                                            Retrosheet.org, and they click the french link, and in
                                            one fell swoop, have the page redisplayed in the
                                            language of the visitor. HR becomes CC, SP becomes
                                            LP, etc.

                                            Tom

                                            __________________________________________________
                                            Do You Yahoo!?
                                            Tired of spam? Yahoo! Mail has the best spam protection around
                                            http://mail.yahoo.com
                                          • Sean Lahman
                                            Derek Adair wrote... ... I m beginning to see that. I have Subversion software installed on my server, and I ll take a look over the weekend at setting things
                                            Message 21 of 28 , Nov 19, 2004
                                            View Source
                                            • 0 Attachment
                                              Derek Adair wrote...
                                              >
                                              > This is where CVS/SVN would come in *very* handy.
                                              >

                                              I'm beginning to see that. I have Subversion software installed on my
                                              server, and I'll take a look over the weekend at setting things up for the
                                              baseball database.

                                              Regards,
                                              Sean Lahman
                                            • Sean Lahman
                                              Those of you on the SABR-L list may have seen recent discussion about inaccuracies in the vote totals that were released by MLB for the American League MVP
                                              Message 22 of 28 , Nov 19, 2004
                                              View Source
                                              • 0 Attachment
                                                Those of you on the SABR-L list may have seen recent discussion about
                                                inaccuracies in the vote totals that were released by MLB for the American
                                                League MVP award on Monday. Simply put, the number of votes and number of
                                                points didn't add up. Several of us have been working on getting an
                                                accurate reporting of the data, and Bill Deane was finally able to get a
                                                defintive response from the source. Bill writes:

                                                > MLB.com had omitted both Curt Schilling (14 points) and Travis Hafner (2)
                                                > from their chart, and my guess about Melvin Mora (one 8th and one 9th,
                                                > 5 points) was correct. That left only the problem with Mariano Rivera's
                                                > and Ivan Rodriguez's point totals not adding up based on their vote
                                                > breakdowns. This error was included in every published list I saw.
                                                >
                                                > A couple of you referred me to BBWAA Secretary-Treasurer Jack McConnell,
                                                > who solved the mystery. The point totals for Rivera (59) and
                                                > Rodriguez (36) were correct, but Mo had 3 ninth-place votes (not 4),
                                                > while I-Rod had 5 (not 4).

                                                For the record, what appears below is an accurate reporting of the vote
                                                totals. I don't believe any of the other published sources (ESPN, USA Today,
                                                CNN/SI) have it right. Totals for Rivera and Rodriguez need to be updated
                                                in the AwardSharePlayers table that I postyed yesterday.

                                                Regards,
                                                Sean Lahman


                                                2004 AL MVP Voting
                                                Player 1 2 3 4 5 6 7 8 9 10 Points
                                                ----------------------------------------------------
                                                V Guerrero 21 5 1 1 0 0 0 0 0 0 354
                                                G Sheffield 5 8 9 4 2 0 0 0 0 0 254
                                                M Ramirez 1 14 9 2 2 0 0 0 0 0 238
                                                D Ortiz 1 0 5 9 2 5 5 0 0 0 174
                                                M Tejada 0 1 0 6 2 7 2 4 2 1 123
                                                J Santana 0 0 2 1 7 5 4 2 2 1 117
                                                I Suzuki 0 0 1 0 6 2 6 4 1 6 98
                                                M Young 0 0 1 4 2 2 4 4 3 0 92
                                                M Rivera 0 0 0 0 2 3 3 3 3 5 59
                                                I Rodriguez 0 0 0 1 1 1 0 2 5 2 36
                                                C Schilling 0 0 0 0 0 0 2 2 0 0 14
                                                J Nathan 0 0 0 0 1 1 0 0 0 1 12
                                                D Jeter 0 0 0 0 1 0 0 1 1 0 11
                                                M Kotsay 0 0 0 0 0 1 0 0 1 1 8
                                                A Rodriguez 0 0 0 0 0 0 0 2 1 0 8
                                                J Damon 0 0 0 0 0 0 0 1 2 0 7
                                                P Konerko 0 0 0 0 0 0 0 0 1 5 7
                                                H Blalock 0 0 0 0 0 1 0 0 0 0 5
                                                M Mora 0 0 0 0 0 0 0 1 1 0 5
                                                M Teixeira 0 0 0 0 0 0 0 1 0 2 5
                                                T Hunter 0 0 0 0 0 0 1 0 0 0 4
                                                V Martinez 0 0 0 0 0 0 1 0 0 0 4
                                                E Durazo 0 0 0 0 0 0 0 1 0 0 3
                                                F Cordero 0 0 0 0 0 0 0 0 1 0 2
                                                L Ford 0 0 0 0 0 0 0 0 1 0 2
                                                C Guillen 0 0 0 0 0 0 0 0 1 0 2
                                                T Hafner 0 0 0 0 0 0 0 0 1 0 2
                                                H Matsui 0 0 0 0 0 0 0 0 1 0 2
                                                C Figgins 0 0 0 0 0 0 0 0 0 2 2
                                                E Chavez 0 0 0 0 0 0 0 0 0 1 1
                                                J Varitek 0 0 0 0 0 0 0 0 0 1 1
                                                -------------------------------------------------
                                              • Sean Forman
                                                ... I ve got all of these taken care of. There *are* however about 258 final games that I m missing. I m also cleaning up some previously erroneous debuts
                                                Message 23 of 28 , Nov 19, 2004
                                                View Source
                                                • 0 Attachment
                                                  Sean Lahman wrote:
                                                  > I noticed today that there are about 800 players in the master table who
                                                  > don't have a debut date. On closer inspection, most of these players made
                                                  > their debut between 1998 and 2001. Right now, there are less than two dozen
                                                  > players in the table with a value in the debut field for that five year
                                                  > period.
                                                  >
                                                  > I'll take a look at resolving this issue, but just wanted to alert everybody
                                                  > to the problem.
                                                  >
                                                  > Regards,
                                                  > Sean Lahman


                                                  I've got all of these taken care of. There *are* however about 258
                                                  final games that I'm missing. I'm also cleaning up some previously
                                                  erroneous debuts and final games as well.
                                                  --
                                                  Sincerely,
                                                  Sean Forman

                                                  Baseball Stats! http://www.Baseball-Reference.com/
                                                • Paul Wendt
                                                  19 Nov 2004, Sean Forman wrote: . . . ... games as well. Once this Spring, the Biographical Database was missing 96 Last Games, all from
                                                  Message 24 of 28 , Dec 2, 2004
                                                  View Source
                                                  • 0 Attachment
                                                    19 Nov 2004, Sean Forman <sean-forman@b...> wrote:
                                                    . . .
                                                    > There *are* however about 258 final games that I'm missing.
                                                    > I'm also cleaning up some previously erroneous debuts and final
                                                    games as well.

                                                    Once this Spring, the Biographical Database was missing 96 Last
                                                    Games, all from 1876-1880 (not counting 2003). I believe that more
                                                    than 50 are still missing.

                                                    This Summer and Fall, the Biog. newsletter should be reporting
                                                    numerous small corrections in debut dates and perhaps lastgame dates
                                                    from the 1870s-80s. I don't know the update lag and I am a few months
                                                    behind on
                                                    the other SABR research cmte newsletters.

                                                    Paul Wendt
                                                  Your message has been successfully submitted and would be delivered to recipients shortly.