
Player ID's

  • Sean Forman
    Message 1 of 14 , Mar 31, 2002
      I think this was bounced the first time
      -----------------

      On Sat, 30 Mar 2002, ken_matinale wrote:

      > I would support the consensus but a fixed set of keys must be
      > established and it should consider what the big boys use: Stats,
      > Total Baseball, Elias, whatever. That way data can be shared even if
      > it's only a future possibility.
      >
      > Ken


      I would steer clear of copying an existing ID system, be it Retrosheet,
      STATS or Elias. Using their system would make us beholden to them to
      release IDs for us to be consistent with their system. I think the
      world of Retrosheet, but they are very, very cautious about publicly
      releasing data, information, etc. I also feel that the core of people
      here are as much leaders in this field as any other group. Sean L.,
      myself and others here have probably had our stats viewed by more
      people than any other organization, perhaps other than Topps. I think
      we can feel safe setting our own keys and then issuing cross-reference
      tables.

      Tom is working on finalizing a Retrosheet/Lahman ID table. I have a
      table of STATS IDs versus my IDs and Lahman IDs. The same could be done
      for team IDs, etc.

      I think we just need to decide on a convention and stick with it.

      We may be gnashing our teeth on this too much. The main issues with a
      player ID are:

      Letters, numbers, or both.

      How many letters of the first and last names to use.

      How to order the numeric suffixes when letter prefixes are duplicated.

      How to handle initialed names (J.D.) and last names with apostrophes
      (O'Leary).

      What to do about short last names (Tam).

      How we decide on IDs for new names.

      What to do about IDs for non-players (managers) and Negro Leaguers.

      Really, I think those are the main issues.

      I would say use the first letter of the first name and the first 3-5
      letters of the last name, and then order players by date of debut with
      a 2-digit suffix (01, 02, ... depending on how many J. Smiths there
      are).

      Apostrophes and any other non-letter characters in names will be
      ignored.

      All team IDs should be three letters in length.

      I tend to think that we should leave managers and Negro Leaguers within
      the normal scope of indices. Or we could use 99 as their suffix.
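A minimal Python sketch of the convention proposed above, under stated assumptions: the last-name part is capped at 5 letters (the proposal says "3-5"; shorter surnames such as Tam simply use all their letters), names are lowercased, anything outside a-z is dropped, and players sharing a prefix are numbered 01, 02, ... by debut date. The function names are hypothetical, and the optional 99 suffix for managers and Negro Leaguers is not handled.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Assumption: "first 3-5 letters" is read as "up to 5 letters".
LAST_NAME_LETTERS = 5

Player = Tuple[str, str, date]  # (first name, last name, debut date)


def id_prefix(first: str, last: str) -> str:
    """First letter of the first name plus up to 5 letters of the last name.

    Names are lowercased and every character outside a-z (apostrophes,
    periods, spaces, accented letters) is dropped.
    """
    first_clean = re.sub(r"[^a-z]", "", first.lower())
    last_clean = re.sub(r"[^a-z]", "", last.lower())
    return first_clean[:1] + last_clean[:LAST_NAME_LETTERS]


def assign_ids(players: Iterable[Player]) -> Dict[Player, str]:
    """Give each player prefix + 2-digit suffix, numbered by debut date."""
    groups: Dict[str, List[Tuple[date, str, str]]] = defaultdict(list)
    for first, last, debut in players:
        groups[id_prefix(first, last)].append((debut, first, last))

    ids: Dict[Player, str] = {}
    for prefix, members in groups.items():
        if len(members) > 99:
            raise ValueError(f"more than 99 players share prefix {prefix!r}")
        # Earliest debut gets 01; ties are broken by name so results repeat.
        for n, (debut, first, last) in enumerate(sorted(members), start=1):
            ids[(first, last, debut)] = f"{prefix}{n:02d}"
    return ids
```

With hypothetical players John Smith (debut 1890) and J.D. Smith (debut 1901), this gives jsmith01 and jsmith02; Jack O'Leary becomes jolear01 because the apostrophe is dropped. Numbering by debut means a newly found player with an earlier debut would shift existing suffixes, so IDs would need to be frozen once issued.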

      Sincerely,
      Sean Forman

      Baseball Stats! http://www.Baseball-Reference.com/
      Baseball Analysis! http://www.BaseballPrimer.com/
    • Sean Lahman
      Message 2 of 14 , Apr 1, 2002
        Sean Forman wrote:
        > I think we just need to decide on a convention
        > stick with it.

        I agree, and I have no objections to the guidelines Sean Forman has proposed.
        Unless there are significant objections, let's go with those.

        > I would steer clear of copying an existing id
        > system, be it retrosheet, STATS or Elias. Using
        > their system would make us beholden to them to
        > release ID's for us to be consistent with their
        > system.

        If we include playerIDs from Elias or TotalBaseball or STATS, they may sue for
        copyright infringement. Whether such a suit has legal merit (and I think not)
        is moot, because if one of these corporate elephants files a cease and desist
        order against Forman or Lahman, it'll be the last time you ever hear from us.

        Regards,
        Sean Lahman
      • ken_matinale
        Message 3 of 14 , Apr 1, 2002
          Hi,

          Player ID: If using non-numeric IDs, why not concatenate name and
          birthday? That should produce both an easily recognizable ID and, in
          most cases, a unique ID. If it is not unique, then add something to
          make it so. The only downside is that it would be long, maybe 30
          characters. Even if a player's birthday changed, the ID would remain.
          Players without birthdays could still be OK as long as the ID is
          unique.

          Hank Aaron would have: hankaaron02/05/1934. Kind of catchy.
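A rough Python sketch of the name-plus-birthday idea, assuming the birthday is written MM/DD/YYYY as in the Hank Aaron example. The "-2", "-3", ... disambiguator and the function name are hypothetical choices for the "add something to make it so" step; players with no known birthday just get their name.

```python
from datetime import date
from typing import Optional, Set


def name_birthday_id(first: str, last: str, birth: Optional[date],
                     taken: Set[str]) -> str:
    """Build an ID such as 'hankaaron02/05/1934' and record it in `taken`.

    If the result is already taken, a hypothetical '-2', '-3', ... suffix
    is appended until the ID is unique.
    """
    base = (first + last).lower().replace(" ", "")
    if birth is not None:
        base += birth.strftime("%m/%d/%Y")  # zero-padded month and day
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)
    return candidate
```

The slashes in the date are non-letter characters inside a key; if that is a concern, a format such as `%Y%m%d` would avoid them.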


          Team ID: Why not use the most common name for that team and get away
          from the 2-3 character abbreviation for the place plus one character
          for the league? The Brewers switching leagues should be enough to
          show the potential problem. Just use Brewers. For the Yankees, use
          Yankees. Have a separate table that relates Yankees to Highlanders in
          certain years. I would donate such a table.
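One way such a cross-reference table might look, as a Python sketch: each row maps a franchise key (here the most common name, as suggested above) to the name used over a span of seasons. The rows are illustrative only; the years follow common usage, and which name was "official" in a given year is often disputed.

```python
from typing import List, NamedTuple, Optional


class TeamName(NamedTuple):
    team_key: str              # franchise key, e.g. its most common name
    name: str                  # name used during this span of seasons
    first_year: int
    last_year: Optional[int]   # None means the name is still in use


# Illustrative rows only, not an authoritative history.
TEAM_NAMES: List[TeamName] = [
    TeamName("Yankees", "Highlanders", 1903, 1912),
    TeamName("Yankees", "Yankees", 1913, None),
]


def name_in_year(team_key: str, year: int,
                 table: List[TeamName] = TEAM_NAMES) -> Optional[str]:
    """Return the name a franchise used in a given season, if known."""
    for row in table:
        ended = row.last_year is not None and year > row.last_year
        if row.team_key == team_key and row.first_year <= year and not ended:
            return row.name
    return None
```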

          Ken

          • Holmes, Dan
            Message 5 of 14 , Apr 1, 2002
              The problem with using Yankees or Brewers is that many teams have
              been known under different names, and it's difficult to pinpoint
              when a team name became "official." For example, the Washington
              Senators were actually the Nationals but were called the Senators
              by the newspapers (TSN used both). The Yankees were the
              Highlanders, and the Red Sox were the Pilgrims, Somersets, and
              Americans. Every long-standing franchise has this history.

              Dan

            • Darren Munk
              Message 6 of 14 , Apr 1, 2002
                As someone who hasn't posted to the group before, my opinion may not carry
                too much weight, but here's my input on the ID subject. Strictly from a
                programming and database performance perspective, I would recommend going
                with a numeric ID. Unless I'm mistaken about the purpose of this field, the
                IDs don't really need to be meaningful, since we have the other columns of
                data to tell us about the record. If that's the case, you could just set
                the field up as a sequential or random Autonumber in Access and be done with
                it, without having to run the queries it would take to create a concatenated
                string ID. However, maybe what you guys are really intending is that the ID
                field also be kind of a "quick reference" to the table, by which people who
                understand the convention can quickly find a record. Even if that's the
                case, I would caution against getting too crazy with the types of characters
                used in the ID field. Someone mentioned the possibilities of things like
                apostrophes showing up in this field, which could be catastrophic (not to
                mention frustrating and nearly impossible to pin down) from a programming
                standpoint. Every programming language has different special characters
                that it handles in different ways, which is another reason to go with
                numeric keys unless there's a compelling reason otherwise.

                To sum up though, for doing JOIN queries, database INSERTs and UPDATEs, and
                for creating what Access calls "Relationships" (FOREIGN KEY constraints), it
                is better performance-wise to use a numeric key.
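For comparison, a minimal sketch of the numeric-key setup described above, using Python's built-in sqlite3 as a stand-in for Access (an assumption; in SQLite an INTEGER PRIMARY KEY column is filled in automatically, much like an Autonumber). Table and column names and the single batting row are sample values for illustration only. The `?` placeholders also sidestep the apostrophe problem, because values are bound rather than pasted into the SQL text.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a numeric surrogate key and a foreign key."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when on
    con.execute(
        "CREATE TABLE master ("
        " player_num INTEGER PRIMARY KEY,"  # assigned automatically on insert
        " name_first TEXT, name_last TEXT)"
    )
    con.execute(
        "CREATE TABLE batting ("
        " player_num INTEGER NOT NULL REFERENCES master(player_num),"
        " year_id INTEGER, hr INTEGER)"
    )
    cur = con.execute(
        "INSERT INTO master (name_first, name_last) VALUES (?, ?)",
        ("Hank", "Aaron"),
    )
    # lastrowid is the integer key SQLite just assigned to the new player.
    con.execute("INSERT INTO batting VALUES (?, ?, ?)", (cur.lastrowid, 1957, 44))
    return con
```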

                Sorry, I'll shut up now. =)

                --Darren

              • Derek Adair
                Message 7 of 14 , Apr 1, 2002
                  Personally, I think all the points you make are good ones,
                  especially the one about apostrophes. However, one advantage of
                  predictable alphanumeric keys is that you can know, with fairly
                  good certainty, what a rookie's ID will be. I was able to put
                  together 2001 stats files based on other data sources before the
                  4.5 release came out, and now, with a few exceptions here and
                  there, I have had no real updates to do. If the IDs had been
                  numeric only, I would have had to do a lot of work after the
                  release to convert my IDs to the new ones. I'd imagine a lot of
                  people on this list were in the same situation.

                  Another advantage I've found is proofing. Probably the most
                  common cause of the errors I've found has to do with IDs:
                  brownro02 instead of brownro01, for example. Doing a grep (or
                  search) for brownro0 across the database makes it easy to spot
                  the patterns of the mistakes and where things are lined up
                  incorrectly. I'd imagine this would get more difficult if we
                  were strictly numeric.
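A rough Python equivalent of that grep, as a sketch with hypothetical function names. The second helper is a related check: it lists IDs used in a stats table that have no row in the master table, which is where a mistyped suffix usually shows up.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ids_with_prefix(ids: Iterable[str], prefix: str) -> Dict[str, int]:
    """Like grepping for 'brownro0': count each ID that starts with prefix."""
    return dict(Counter(i for i in ids if i.startswith(prefix)))


def unknown_ids(stat_ids: Iterable[str], master_ids: Iterable[str]) -> List[str]:
    """IDs used in a stats table that have no matching master row, sorted."""
    known = set(master_ids)
    return sorted({i for i in stat_ids if i not in known})
```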

                  Finally, I think it's easier to write a script that replaces all
                  the alphanumeric IDs with incrementing numbers than it would be
                  to go the other way. If anyone wants such a script, I can put
                  one together in Perl for you.
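The offer above is for Perl; here is a hedged Python sketch of the same idea (function names hypothetical): build a lookup from each distinct alphanumeric ID to an integer, then rewrite rows through it. Keeping the lookup around doubles as the cross-reference table for going back.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def numeric_id_map(alpha_ids: Iterable[str]) -> Dict[str, int]:
    """Map each distinct alphanumeric ID to 1, 2, 3, ... in sorted order."""
    return {pid: n for n, pid in enumerate(sorted(set(alpha_ids)), start=1)}


def renumber(rows: Iterable[Sequence], id_map: Dict[str, int],
             column: int = 0) -> List[Tuple]:
    """Replace the ID in `column` of every row with its numeric key."""
    return [
        tuple(id_map[v] if i == column else v for i, v in enumerate(row))
        for row in rows
    ]
```

Because the numbers come from sorted order, re-running this after new IDs are added would renumber existing players; for a stable key you would save the map and only append new IDs to it.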

                  Regards,
                  Derek

                • tmasc
                  Message 8 of 14 , Apr 1, 2002
                    --- In baseball-databank@y..., Sean Forman <sean-forman@b...> wrote:
                    > Tom is working on finalizing a retrosheet/lahman ID table. I have a
                    > table
                    > of STATS ID's versus my and lahman ID's. The same could be done for
                    > team
                    > ID's etc.
                    >
                    > I think we just need to decide on a convention stick with it.
                    >
                    > We may be gnashing our teeth on this too much.

                    Key: I agree that the problem is not how to make the key. The
                    important thing is to have a STABLE key. I should be able to make
                    joins and create an index on a table without having to worry about
                    the referential integrity of the database.

                    Updates: I urge people who are updating the database to keep track of
                    their changes, preferably by posting them here as SQL commands
                    (whether Access 2000 version, or standard version). You can keep
                    running my queries to find all the problems, and hopefully be able to
                    find fixes for them.

                    XREF: As for finalizing the XRef table between Retro and Lahman, I
                    will put this on hold until we have corrected the Lahman DB. The
                    XRef table only works if both sources have keys that are stable.

                    IP: we also talked about this in the past. For users who wish to
                    do calculations with IP, I would use Keith W's suggestion and
                    create an "outs" (or thirds of an inning) field. A pitcher with
                    200 IP has 600 outs; 200.1 IP has 601 outs. Because the digit
                    after the point in IP counts outs (thirds), not tenths, this will
                    make sums and other calculations more accurate.
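A small Python sketch of the outs field, assuming IP is stored as text like "200.1" where the digit after the point is 0, 1 or 2 outs (function names hypothetical). It also shows why summing IP directly goes wrong: 200.1 + 100.2 read as decimals comes to 300.3, while the outs total is 903, i.e. 301.0 IP.

```python
def ip_to_outs(ip: str) -> int:
    """Convert IP notation ('200', '200.1', '200.2') to total outs."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"not valid IP notation: {ip!r}")
    return int(whole) * 3 + thirds


def outs_to_ip(outs: int) -> str:
    """Convert total outs back to IP notation, e.g. 601 -> '200.1'."""
    return f"{outs // 3}.{outs % 3}"
```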

                    Thanks, Tom
                  • Paul Wendt
                    Message 9 of 14 , Apr 2, 2002
                      Hi,
                      Now shorn of my impressive signature ;-) let me show off my
                      ignorance on another matter. I do know (I think) that Microsoft
                      departs from all standards in hopes of replacing them.

                      1 Apr 2002, tmasc wrote "Re: [baseball-databank] Re: Player ID's"

                      > Updates: I urge people who are updating the database to keep track of
                      > their changes, preferably by posting them here as SQL commands
                      > (whether Access 2000 version, or standard version). You can keep
                      > running my queries to find all the problems, and hopefully be able to
                      > find fixes for them.

                      Should patches be distributed in two different dialects of SQL?
                      This process seems contrary to the approach adopted concerning the
                      substance of the database, which aims for standardization.

                      (patch, process & substance
                      - excuse me if these terms are discomforting here)

                      ----Paul

                      Paul Wendt, Watertown MA, USA <pgw@...>
                      near Boston
                    • tmasc
                      Message 10 of 14 , Apr 2, 2002
                        Since Sean Lahman distributed in Access 2000 format, this is what I
                        am using to do the updates.

                        If Oracle or SQL Server treat the apostrophe (') differently
                        and need an "escape" character, then perhaps someone who uses
                        those databases should alert the readers here.
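For reference, a sketch of the two usual ways to handle an apostrophe, shown with Python's sqlite3. In standard SQL an apostrophe inside a string literal is escaped by doubling it ('O''Leary'); as far as I know Oracle, SQL Server and Access accept this too, but each product's documentation is the authority. Binding values with placeholders avoids the question entirely. The function names are hypothetical.

```python
import sqlite3
from typing import Tuple


def sql_string_literal(text: str) -> str:
    """Quote text as a standard SQL literal: O'Leary -> 'O''Leary'."""
    return "'" + text.replace("'", "''") + "'"


def demo() -> Tuple[str, str]:
    """Read 'O'Leary' back once via an escaped literal, once via binding."""
    con = sqlite3.connect(":memory:")
    escaped = con.execute("SELECT " + sql_string_literal("O'Leary")).fetchone()[0]
    bound = con.execute("SELECT ?", ("O'Leary",)).fetchone()[0]
    con.close()
    return escaped, bound
```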

                        I provide them in Access format because after I run them, I know that
                        they work, and then cut/paste them here for all to use.

                        I'm keeping track of all my modules, so whenever we complete the
                        integrity check, I will supply the Module Patches to everyone here.

                        Thanks, Tom
                      • kjokbaseball
                        Message 11 of 14 , Apr 2, 2002
                          --- In baseball-databank@y..., "tmasc" <tmasc@y...> wrote:
                          >............................
                          > I'm keeping track of all my modules, so whenever we complete the
                          > integrity check, I will supply the Module Patches to everyone here.
                          >
                          > Thanks, Tom

                          THANK YOU! THANK YOU! THANK YOU!

                          KJOK
                        • Paul Wendt
                          Message 12 of 14 , Apr 6, 2002
                            Here is the gist of what I posted under this title, 2 Apr 2002.

                            > Hi,
                            > Now shorn of my impressive signature ;-) let me show off my ignorance
                            . . .
                            > Should patches be distributed in two different dialects of SQL?
                            > This process seems contrary to the approach adopted concerning the
                            > substance of the database, which aims for standardization.

                            Four days later, I would like to say that when I wrote this, I imagined
                            that people would rely on patches for "a long time" before the next
                            release. In that context, a library of patches in different dialects
                            seemed wrong to me. Now I guess that the next version 4.x will follow in
                            "a short time", so my concern is insignificant.

                            --Paul
                          • Sean Lahman
                            Message 13 of 14 , Apr 6, 2002
                              Paul Wendt wrote:
                              > Four days later, I would like to say that when I wrote this, I imagined
                              > that people would rely on patches for "a long time" before the next
                              > release. In that context, a library of patches in different dialects
                              > seemed wrong to me. Now I guess that the next version 4.x will follow in
                              > "a short time", so my concern is insignificant.

                              I'm hoping to have another release before the end of the month.

                              --SL

                            • nxnn14
                              Message 14 of 14 , May 12, 2008
                                Hi everyone,

                                I was wondering if anyone has an MLB_AM ID to Lahman_ID or Retro_ID table, as I didn't see
                                one like this on the files page. Also, does anyone have, or know where I can obtain, historical
                                information about the June Draft? I already have a full historical database of the draft, but
                                would be interested in obtaining extra information, such as hometowns, for all, or at least
                                many, of the players drafted.

                                Thanks,

                                Nick