
Player IDs

  • Tangotiger
    Message 1 of 12 , Nov 26, 2009
      http://www.insidethebook.com/ee/index.php/site/article/player_id_mappings_2009/

      Tom


      ---------------------------------------------
      The Book--Playing The Percentages In Baseball
      http://www.InsideTheBook.com
    • Rod Nelson
      Message 2 of 12 , Nov 26, 2009
        Hi Tom - this seems to be a perpetual problem..  Either that, or I'm clueless.

        burneaj01    burnea.01  http://www.baseball-reference.com/players/b/burnea.01.shtml
        dickera01    dicker01
        drewjd01    drewj.01
        furcara01    furcara02
        garcifr02    garcifr03
        harriwi01    harriwi02
        jonesja04    jonesja05
        lopezro01    lopezro02
        pierzaj01    pierza.01
        romerjc01    romerj.01
        ryanbj01    ryanb.01
        sabatcc01    sabatc.01
        santajo01    santajo02

        Rod Nelson

      • Tangotiger
        Message 3 of 12 , Nov 27, 2009
          I'm not sure I understand the problem. The player_id in the bdb is not
          the same as the player_id on b-r.com. Indeed, there is a b-r.com id in
          the Master table. I presume if you look there, it might resolve whatever
          issue you are seeing?
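A minimal sketch of that lookup, assuming the Master table is available as CSV with columns named `playerID` and `bbrefID` (the column names are an assumption here), and assuming the first column of the list quoted in this thread is the databank id and the second is the b-r.com id. The two sample rows are taken from that list; the helper names are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of the Master table. The column names playerID and
# bbrefID are assumptions; the two rows come from the list in this thread.
MASTER_CSV = """playerID,bbrefID
burneaj01,burnea.01
sabatcc01,sabatc.01
"""


def load_bbref_ids(csv_text):
    """Return a dict mapping databank playerID -> b-r.com id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["playerID"]: row["bbrefID"] for row in reader}


def bbref_url(bbref_id):
    """Build a player page URL following the pattern quoted in the thread."""
    return "http://www.baseball-reference.com/players/%s/%s.shtml" % (
        bbref_id[0],
        bbref_id,
    )


ids = load_bbref_ids(MASTER_CSV)
print(bbref_url(ids["burneaj01"]))
```

With the real table you would read the file instead of the inline string; the point is only that the b-r.com id is looked up rather than derived from the databank id.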

          Tom


        • Robert Gebeloff
          Message 4 of 12 , Dec 30, 2015
            First of all, thanks much for all the work that goes into this.

            I asked a question a while back and don't believe it was ever answered, so let me try again -- Is the LahmanID gone for good?

            I have used the master player table in my Strat-O-Matic league's historical database for years, and have used the LahmanID as my primary key.

            I noticed last year that it was no longer included in the master table, so I created "temporary" numeric IDs for all of the new players, but if this is going to be a permanent change, I will need to rethink my architecture.

            In my years of working with databases, I've always been taught that having a numeric primary key is a best practice, and it also is the default standard in Ruby on Rails, which is my league's Web platform.

            I bring this up not as a complaint - I greatly appreciate having this resource in any form -- but I'm just seeking information about your plans.

            Thanks,
            Rob

          • Theodore Turocy
            Message 5 of 12 , Dec 30, 2015
              Rob and all,

              On Dec 30, 2015, at 8:09 PM, Robert Gebeloff rob@... [baseball-databank] <baseball-databank@yahoogroups.com> wrote:

              > I asked a question a while back and don't believe it was ever answered, so let me try again -- Is the LahmanID gone for good?

              I won’t be providing it, as it is redundant. Whether Sean will provide it in the downstream Lahman database is a question for him to answer.

              > In my years of working with databases, I've always been taught that having a numeric primary key is a best practice

              OK, I’ll bite. *Why*?

              Ted
              --
              Dr Theodore L Turocy -- Chadwick Baseball Bureau -- ted.turocy@...
              Web: http://www.chadwick-bureau.com -- Twitter: @chadwickbureau
            • aidanshealy
              Message 6 of 12 , Dec 30, 2015
                > OK, I’ll bite. *Why*?

                I would bet it is around the classic INT vs CHAR index search battle. I am not sure what the industry says these days.
              • Mike Emeigh
                Message 7 of 12 , Dec 30, 2015
                  Most benchmarking studies show little difference between INT and CHAR. The main reason for using INT keys is for joins where the key is used as a foreign key.

                  In my own work I use natural keys wherever I can; I try not to invent a key that has no meaning within the table.

                  Mike Emeigh

                • Theodore Turocy
                  Message 8 of 12 , Dec 30, 2015

                    Well, right - a machine int is always going to be faster for FK purposes than a longer string, where a comparison is going to require multiple instructions.

                    However:
                    * On a dataset of this size, the performance difference is going to be so tiny on a modern processor that it would only make a difference if you were, e.g., serving a website and hitting the database live all the time;

                    * Again, there’s a tacit assumption here that database == RDBMS, which is fallacious. It’s completely fine to use an RDBMS, and it’s completely fine to optimise your tables so your joins are as fast as they can be, if your application calls for it. For those purposes, a machine int key is a completely reasonable design choice, and could be considered a best practice for that setting. The fallacy lies in concluding that it is therefore a best practice for *any* representation of data; after all, once you’ve serialised the data to text, those integer keys are in fact text strings.
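The serialisation point can be seen in a few lines. This is only an illustration using Python's standard `csv` module; the integer key 12345 is a made-up value, not a real identifier.

```python
import csv
import io

# Write one row whose first field is an integer key (12345 is hypothetical),
# then read the same row back from the serialised text.
buf = io.StringIO()
csv.writer(buf).writerow([12345, "sabatcc01"])
buf.seek(0)
row = next(csv.reader(buf))

# Both fields come back as text; the integer type did not survive the file.
print(row)
print(type(row[0]))
```

Any consumer of the text file has to parse the key back into an integer itself, so the integer is a property of the loading application rather than of the published data.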


                    I am an advocate of distinguishing between “public” and “internal” identifiers. Human-friendly identifiers are ideal for “public” purposes, because they make it easiest for us to look at a file, understand what’s in it, and suss out errors. Having an internal system of identifiers within your own application is useful because you can’t always assume your external data source(s) will have issued an identifier for, say, a person, by the time you need it in your system. If that’s your use case, it’s far more flexible to have an internal record identifier (which can very well be an integer), and then a cross-reference against the other identifier system(s) you might want to import data from. Then, you only need to deal with the text-string identifiers once, at load time, when you do the mapping; once you’ve done the mapping, your RDBMS can work entirely with your internal integers.
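A minimal sketch of that public/internal split, using SQLite through Python's standard `sqlite3` module. The table names (`person`, `person_xref`), the source labels (`'bdb'`, `'bbref'`) and the helper functions are all hypothetical choices for illustration, and the two identifiers are taken from the list earlier in this thread, assuming they refer to the same player.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY          -- internal key, never published
    );
    CREATE TABLE person_xref (
        person_id INTEGER NOT NULL REFERENCES person(id),
        source    TEXT NOT NULL,        -- which external system issued ext_id
        ext_id    TEXT NOT NULL,        -- the public, text-string identifier
        PRIMARY KEY (source, ext_id)
    );
""")


def resolve(conn, source, ext_id):
    """Return the internal id for an external id, creating one if needed."""
    row = conn.execute(
        "SELECT person_id FROM person_xref WHERE source = ? AND ext_id = ?",
        (source, ext_id),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO person DEFAULT VALUES")
    person_id = cur.lastrowid
    conn.execute(
        "INSERT INTO person_xref (person_id, source, ext_id) VALUES (?, ?, ?)",
        (person_id, source, ext_id),
    )
    return person_id


def link(conn, person_id, source, ext_id):
    """Attach another system's identifier to an existing internal record."""
    conn.execute(
        "INSERT INTO person_xref (person_id, source, ext_id) VALUES (?, ?, ?)",
        (person_id, source, ext_id),
    )


# Load time: map text identifiers to the internal integer once.
pid = resolve(conn, "bdb", "sabatcc01")
link(conn, pid, "bbref", "sabatc.01")

# Afterwards either public identifier leads to the same internal integer.
print(resolve(conn, "bbref", "sabatc.01") == pid)
```

Other tables in the application would carry `person_id` as their foreign key, so the text identifiers are only touched when data is imported or exported.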

                    Ted
                    --
                    Dr Theodore L Turocy -- Chadwick Baseball Bureau -- ted.turocy@...
                    Web: http://www.chadwick-bureau.com -- Twitter: @chadwickbureau
                  • Geb Net
                    Message 9 of 12 , Dec 30, 2015
                      Thanks for the response -- I am using the LahmanID only because of conventions (Rails in particular), not because of performance or ideology. I will adapt, thanks...






                    • anson2995
                      Message 10 of 12 , Dec 31, 2015

                        ---In baseball-databank@yahoogroups.com, <drarbiter@...> wrote:
                        >> I asked a question a while back and don't believe it was ever answered, so let me try again -- Is the LahmanID gone for good?
                        >
                        > I won’t be providing it, as it is redundant. Whether Sean will provide it in the downstream Lahman database is a question for him to answer.

                        Not planning to.  I never thought it made sense.  At some point maybe ten years ago, we decided to change the name of the key field from "LahmanID" to "playerID."  Someone -- perhaps Sean Forman -- then suggested creating a new field called LahmanID which was just an INT.  It was never used in any of the other tables and didn't really serve any purpose except to create confusion. 

                        --Sean