New Database

  • ucraimx
    Message 1 of 9, Aug 27, 2002
      I have a few questions:

      1. Why is there now a HOF/Manager ID tied to those players? Maybe I'm
      missing it but it seems redundant.

      2. Why was the awards table split up? As a system admin, it seems
      that the fewer tables to manage, the better.

      3. Why was debutyear dropped from master table?

      4. There are some new fields (Post season GIDP for example) that are
      blank. Is anyone working on that data?

      5. Opponent batting average is 2 places (ie .25). Where is the third
      digit?
    • tom tom
      Message 2 of 9, Aug 27, 2002
        I just want to answer generally some of your points...
        --- ucraimx <ucraimx@...> wrote:
        > I have a few questions:
        >
        > 2. Why was the awards table split up? As a system
        > admin, it seems
        > that the fewer tables to manage, the better.
        >
        > 3. Why was debutyear dropped from master table?
        >

        I don't look at that table, but the assertion that the
        fewer the tables the better is one that does not wash.
        As an administrator, you want normalized data. You
        want to remove redundancy and ensure integrity.

        Debut year is a perfect example. This is a calculated
        field. A field that can be derived from one or more
        fields has no place in a normalized database (like
        batting average, slg avg, ERA). The reason is that if
        you insert a new record, say Tim Raines actually
        played in 1978, then you have to worry about updating
        some other field in some other table to change 1979 to
        1978.
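The point about derived fields can be sketched with a small query. This is only an illustration, assuming a simplified `batting` table with `playerID` and `yearID` columns; the player IDs and game counts are made-up placeholders, not real records.

```python
import sqlite3

# Assumption: a simplified batting table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (playerID TEXT, yearID INTEGER, G INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("player_a", 1979, 6), ("player_a", 1980, 15), ("player_b", 1981, 40)],
)


def debut_years(conn):
    """Return {playerID: earliest season that has a batting record}."""
    rows = conn.execute(
        "SELECT playerID, MIN(yearID) FROM batting GROUP BY playerID"
    )
    return dict(rows)


before = debut_years(conn)

# A newly discovered 1978 season is just one more row. No stored
# debut-year field exists, so nothing else has to be kept in sync.
conn.execute("INSERT INTO batting VALUES ('player_a', 1978, 1)")
after = debut_years(conn)

print(before["player_a"], after["player_a"])
```

Because the debut year is computed on demand, correcting the playing record corrects the debut year automatically.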

        However, for "user-friendly" databases, it makes sense
        to show ERA, BA, SLG because it is more efficient to
        have those popular calculated fields, rather than
        running a query/view each time.

        This is why there should be an admin DB that is the
        base. And from that, you can generate an "efficient
        user-friendly" database.
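One way to picture the admin/user-friendly split is to generate the friendly table from the normalized base in a single step. The sketch below assumes a simplified `batting` table; the column names `D` and `T` (doubles and triples), the table name `batting_friendly`, and all counts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the base table stores raw counts only.
conn.execute(
    "CREATE TABLE batting (playerID TEXT, yearID INTEGER, "
    "AB INTEGER, H INTEGER, D INTEGER, T INTEGER, HR INTEGER)"
)
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("player_a", 1980, 400, 120, 20, 5, 10),  # made-up counts
        ("player_b", 1980, 0, 0, 0, 0, 0),        # no at-bats: rates stay NULL
    ],
)

# Materialize the popular calculated fields once, so end users do not
# have to run the calculation on every query.
# Total bases = H + D + 2*T + 3*HR (each hit already counts one base).
conn.execute(
    """
    CREATE TABLE batting_friendly AS
    SELECT playerID, yearID, AB, H,
           CASE WHEN AB > 0 THEN ROUND(1.0 * H / AB, 3) END AS BA,
           CASE WHEN AB > 0
                THEN ROUND(1.0 * (H + D + 2 * T + 3 * HR) / AB, 3) END AS SLG
    FROM batting
    """
)

friendly = {
    player: (ba, slg)
    for player, ba, slg in conn.execute(
        "SELECT playerID, BA, SLG FROM batting_friendly"
    )
}
print(friendly)
```

The friendly table is disposable: whenever the base data changes, it can be dropped and regenerated.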

        I believe, though I don't remember, that we are
        targeting the admin crowd, and it's up to them to
        generate the user-friendly version for the masses.

        Check out past posts on this topic in the yahoo group.

        Thanks, Tom

      • ucraimx
        Message 3 of 9, Aug 27, 2002
          Good points. Guess I was looking more at the User-friendly piece than
          the admin side.

          Mike Crain

          --- In baseball-databank@y..., tom tom <tmasc@y...> wrote:
          > I just want to answer generally some of your points...
          > --- ucraimx <ucraimx@y...> wrote:
          > > I have a few questions:
          > >
          > > 2. Why was the awards table split up? As a system
          > > admin, it seems
          > > that the fewer tables to manage, the better.
          > >
          > > 3. Why was debutyear dropped from master table?
          > >
          >
          > I don't look at that table, but the assertion that the
          > fewer the tables the better is one that does not wash.
          > As an administrator, you want normalized data. You
          > want to remove redundancy and ensure integrity.
          >
          > Debut year is a perfect example. This is a calculated
          > field. A field that can be derived from one or more
          > fields has no place in a normalized database (like
          > batting average, slg avg, ERA). The reason is that if
          > you insert a new record, say Tim Raines actually
          > played in 1978, then you have to worry about updating
          > some other field in some other table to change 1979 to
          > 1978.
          >
          > However, for "user-friendly" databases, it makes sense
          > to show ERA, BA, SLG because it is more efficient to
          > have those popular calculated fields, rather than
          > running a query/view each time.
          >
          > This is why there should be an admin DB that is the
          > base. And from that, you can generate an "efficient
          > user-friendly" database.
          >
          > I believe, though I don't remember, that we are
          > targeting the admin crowd, and it's up to them to
          > generate the user-friendly version for the masses.
          >
          > Check out past posts on this topic in the yahoo group.
          >
          > Thanks, Tom
        • SABRscouts@aol.com
          Message 4 of 9, Aug 27, 2002
            All,
            As much as I appreciate the enormous contributions that Tango Tom continues to make here, I have to respectfully take exception to the statement that Debut Year is a redundant piece of data. Technically, yes: the year can be derived from the existing statistical record. But I would contend that the Debut Date (MM/DD/YY) should be considered a very significant piece of the biographical record. Just this week Sean Lahman posed a question to SABR-L regarding the oft-repeated fallacy that Joe Nuxhall was the youngest player to make an MLB debut. Simply put, dates matter.

            Specificity in that field is by no means always available, and if mapping the year (and debut ballclub, for that matter) from the playing record is how administrators choose to populate the field, it should certainly not be left as a null.

            Retrosheeters are currently in the process of compiling Last Game data as well as a comprehensive log of roster transactions, the value of which to future analysis is yet to be understood or appreciated by our best and brightest. I imagine we all share the hope that one day minor league statistics will be able to be linked to the well-documented major league record. In time, I intend to introduce draft year and round and signing dates as part of the Scouts Committee's Who-Signed-Who project.

            Everyone on this forum has database expertise superior to my own, but I haven't lost sight of the forest for the trees. Debut Date is very significant. Ask any ballplayer: it's the day that dreams come true...

            Rod Nelson

            In a message dated 8/27/02, tom tom <tmasc@...> writes:

            As an administrator, you want normalized data.  You want to remove redundancy and ensure integrity.

            Debut year is a perfect example.  This is a calculated field.  A field that can be derived from one or more fields has no place in a normalized database (like
            batting average, slg avg, ERA).  The reason is that if you insert a new record, say Tim Raines actually played in 1978, then you have to worry about updating
            some other field in some other table to change 1979 to 1978.

            However, for "user-friendly" databases, it makes sense to show ERA, BA, SLG because it is more efficient to have those popular calculated fields, rather than
            running a query/view each time.

            This is why there should be an admin DB that is the base.  And from that, you can generate an "efficient user-friendly" database.


          • tom tom
            Message 5 of 9, Aug 28, 2002
              Debut year is redundant, but debut date (yy.mm.dd) is
              not.

              Tom Ruane's enormous transaction database is a good
              example of not having to worry about having a field
              called "debut date". He simply shows all the records
              of when someone entered a ballclub. He doesn't need
              to have an extra field called debut date, because it
              can be derived. (We couldn't use this for us because
              his transaction database considers minor leaguers to
              be part of the ballclub.)

              Certainly, the debut date is important. But since it
              can be derived from the data source in question (Tom
              R's db), then there is no need to have a field in the
              database for it.

              For our database (bdb), we only show the year portion
              of when someone entered. The Lahman database in the
              past only showed the debut year, and never the whole
              date (as far as I know). If the whole date is being
              asked for, then it can be included, and would not be
              redundant.

              Again, just as I put out a query to determine a
              player's primary position by year (an important piece
              of information to be sure), we can also derive a query
              to determine a player's debut year.
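As an illustration of the kind of query meant here (not the query Tom actually posted), primary position by year could be taken as the position with the most games in that season. The `fielding` table layout and every row below are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: one row per player, season and position, with games played.
conn.execute(
    "CREATE TABLE fielding (playerID TEXT, yearID INTEGER, POS TEXT, G INTEGER)"
)
conn.executemany(
    "INSERT INTO fielding VALUES (?, ?, ?, ?)",
    [
        ("player_a", 1980, "OF", 100),  # made-up rows
        ("player_a", 1980, "2B", 20),
        ("player_a", 1981, "2B", 90),
        ("player_a", 1981, "OF", 30),
    ],
)

# Keep the position whose game count equals the season maximum.
PRIMARY_POSITION_SQL = """
SELECT f.playerID, f.yearID, f.POS
FROM fielding AS f
WHERE f.G = (
    SELECT MAX(other.G)
    FROM fielding AS other
    WHERE other.playerID = f.playerID AND other.yearID = f.yearID
)
ORDER BY f.playerID, f.yearID, f.POS
"""

# If two positions tie, the query returns both rows and this dict keeps
# the alphabetically last one; a real report would need a tie-break rule.
primary = {
    (player, year): pos
    for player, year, pos in conn.execute(PRIMARY_POSITION_SQL)
}
print(primary)
```

Like debut year, the result is derived from the stored records each time, so no extra field has to be maintained.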

              As Sean F I believe mentioned, he sees the
              baseball-databank.org site growing as a placeholder
              where we can create all these various queries and
              reports to produce this information.

              Thanks, Tom



            • ucraimx
              Message 6 of 9, Sep 5, 2002
                I had a couple of questions that went unanswered a week or so ago and
                was wondering why the database went that way. I'm just asking, NOT
                complaining! I still think this is the best product around.

                1. Why is there now a HOF/Manager ID tied to those players? Maybe I'm
                missing it but it seems redundant.

                2. Why was the awards table split up? As a system admin, it seems
                that the fewer tables to manage, the better.

                3. There are some new fields (Post season GIDP for example) that are
                blank. Is anyone working on that data?

                4. Opponent batting average is 2 places (ie .25). Where is the third
                digit?


                Mike Crain
              • Sean Forman
                Message 7 of 9, Sep 5, 2002
                  ucraimx wrote:
                  > I had a couple of questions that went unanswered a week or so ago and
                  > was wondering why the database went that way. I'm just asking, NOT
                  > complaining! I still think this is the best product around.
                  >
                  > 1. Why is there now a HOF/Manager ID tied to those players? Maybe I'm
                  > missing it but it seems redundant.


                  This is with an eye towards further expansion. When data like minor
                  leagues and foreign leagues is added, there will likely be a number of
                  new ID's issued that will end up being tied to past, current or future
                  major leaguers. For instance, imagine we had all the minor league data
                  from 2001. We had new ID's for all of those players. Then some of
                  those players appear in the majors. Should we change all of their minor
                  league ID's to conform to a new major league ID? Or should their new
                  major league ID be the same as the minor league ID, even though this
                  could lead to gaps, etc.? So we figured the best way to handle it was a
                  new ID for each class of data: managers, HOF, announcers, scouts, GM's. Then
                  these will all be tied together in the Master table. Then say we find
                  out ten years later that this scout and player are the same guy, we just
                  have to fiddle with the Master table instead of any of the data tables.
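A minimal sketch of this design, assuming a Master table with one ID column per class of data. All table names, column names and IDs here (`personID`, `scoutID`, `exampl01`, and so on) are hypothetical, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumption: Master ties the per-class IDs together, one row per person.
    CREATE TABLE master (
        personID INTEGER PRIMARY KEY,
        name TEXT, playerID TEXT, managerID TEXT, scoutID TEXT
    );
    CREATE TABLE batting (playerID TEXT, yearID INTEGER);
    CREATE TABLE scouting (scoutID TEXT, yearID INTEGER);

    -- A player and a scout who are not yet known to be the same person.
    INSERT INTO master VALUES (1, 'Example Person', 'exampl01', NULL, NULL);
    INSERT INTO master VALUES (2, 'Example Person', NULL, NULL, 'exampl01s');
    INSERT INTO batting VALUES ('exampl01', 1990);
    INSERT INTO scouting VALUES ('exampl01s', 2000);
    """
)

CLASS_COLUMNS = ("playerID", "managerID", "scoutID")


def merge_people(conn, keep_id, drop_id):
    """Fold drop_id's class IDs into keep_id; only Master is modified."""
    dropped = conn.execute(
        f"SELECT {', '.join(CLASS_COLUMNS)} FROM master WHERE personID = ?",
        (drop_id,),
    ).fetchone()
    with conn:  # commit both steps together
        for column, value in zip(CLASS_COLUMNS, dropped):
            conn.execute(
                f"UPDATE master SET {column} = COALESCE({column}, ?) "
                "WHERE personID = ?",
                (value, keep_id),
            )
        conn.execute("DELETE FROM master WHERE personID = ?", (drop_id,))


# Ten years later we learn the scout and the player are the same guy.
merge_people(conn, keep_id=1, drop_id=2)

master_rows = conn.execute(
    "SELECT personID, playerID, scoutID FROM master"
).fetchall()
scouting_rows = conn.execute("SELECT scoutID, yearID FROM scouting").fetchall()
print(master_rows, scouting_rows)
```

The data tables keep their original class IDs throughout; only the Master row changes when an identity is discovered.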


                  > 2. Why was the awards table split up? As a system admin, it seems
                  > that the fewer tables to manage, the better.


                  Same reason as above. The separate manager and player ID's mean the
                  awards tables should be separate, so we have player awards and manager
                  awards.


                  > 3. There are some new fields (Post season GIDP for example) that are
                  > blank. Is anyone working on that data?


                  Not that I am aware of. I figured it would be best to list all of the
                  stats in the Batting table in the Postseason table as well. Same with
                  pitching.


                  > 4. Opponent batting average is 2 places (ie .25). Where is the third
                  > digit?


                  This is the most accurate data I have; it will be fixed at some point.



                  > Mike Crain




                  --
                  Sincerely,
                  Sean Forman

                  Baseball Stats! http://www.Baseball-Reference.com/
                  Baseball Analysis! http://www.BaseballPrimer.com/
                • ucraimx
                  Message 8 of 9, Sep 5, 2002
                    Ah, I understand now.


                    Here's another question. My executive table, umpire table and
                    coaches table will be done by the end of September. If I upload the
                    tables, will someone else assign the ID's to the user?
                  • Paul Wendt
                    Message 9 of 9, Oct 1, 2002
                      5 Sep 2002, Mike Crain wrote, "Re: New Database":

                      > Here's another question. My executive table, umpire table and coaches
                      > table will be done by the end of September. If I upload the tables,
                      > will someone else assign the ID's to the user?

                      At the moment, I cannot find Sean Forman's contemporary note, maybe a
                      reply to Mike Crain. I do know my question. Do we plan to assign
                      permanent IDs in special classes, such as

                      odayha01u (using suffix 'u') = Hank O'Day as an umpire
                      pickeol01m (using suffix 'm') = Ollie Pickering in the minor leagues, or
                      even one particular minor league

                      and plan to maintain current knowledge of identities explicitly?
                      For example, that 'pickeol01m' is the same Ollie Pickering who played in
                      the major leagues would be one record in a table of known identities.
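The "table of known identities" idea could look something like the sketch below: each record asserts that two class-specific IDs belong to the same person, and groups are computed from whatever records currently exist. The suffixed IDs come from the examples above; the unsuffixed major league IDs `pickeol01` and `odayha01` are assumed here purely for illustration.

```python
from collections import defaultdict


def same_person_groups(identities):
    """Group IDs linked, directly or indirectly, by identity records."""
    neighbours = defaultdict(set)
    for id_a, id_b in identities:
        neighbours[id_a].add(id_b)
        neighbours[id_b].add(id_a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Walk every ID reachable from this one.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# One record per known identity (the unsuffixed IDs are assumptions).
known = [("pickeol01m", "pickeol01")]
groups_before = same_person_groups(known)

# Knowledge changes: a new identity is just one more record.
known.append(("odayha01u", "odayha01"))
groups_after = same_person_groups(known)

print(groups_before)
print(groups_after)
```

Because identities live in their own records, adding or retracting one never requires renaming IDs in the data tables.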

                      --
                      There is no reason to read this sentence if you already know that current
                      knowledge of identities does change.

                      -- P/\/ \/\/t

                      Paul Wendt, Watertown MA, USA <pgw@...>
                      Chair, 19th Century Committee, SABR