Loading ...
Sorry, an error occurred while loading the content.

SCHEMA - Oversized MySQL fields?

Expand Messages
  • Randy Fiato
    I realize that this is a relatively minor detail in relation to some of the other things being planned here, but I ve noticed that the current MySQL schema has
    Message 1 of 5 , Dec 31, 2002
    • 0 Attachment
      I realize that this is a relatively minor detail in relation to some of the
      other things being planned here, but I've noticed that the current MySQL
      schema has several fields that are oversized. That is to say, the fields are
      represented as an integer that is far too large for the possible range of
      values.

      For example, in the Master table, birthYear and deathYear are both int(4),
      which in MySQL is a 4-byte integer. smallint(4) unsigned (the same as yearID
      in the other tables) would be better, as this uses only 2 bytes. Similarly,
      the bith/death days and months could be tinyint(2) unsigned instead of
      int(2).

      Another example that occurs in several tables is that games in a season (or
      wins, losses, etc.) can be described as tinyint(3) unsigned (range of 0 -
      255) instead of smallint(3). I don't think MLB is going to start playing
      250-game seasons any time soon. :-)

      I've made these changes, among several others, to my copy and have been able
      to shrink the size of the database by at least a couple of megabytes (not
      sure of the exact amount).

      --
      Randy Fiato
      System Administrator, Big League Forums <http://www.bigleagueforums.net/>
    • wyerscj
      ... some of the ... MySQL ... fields are ... range of ... int(4), ... as yearID ... Similarly, ... of ... season (or ... of 0 - ... playing ... been able ...
      Message 2 of 5 , Oct 18, 2008
      • 0 Attachment
        --- In baseball-databank@yahoogroups.com, "Randy Fiato"
        <sysadmin@...> wrote:
        >
        > I realize that this is a relatively minor detail in relation to
        some of the
        > other things being planned here, but I've noticed that the current
        MySQL
        > schema has several fields that are oversized. That is to say, the
        fields are
        > represented as an integer that is far too large for the possible
        range of
        > values.
        >
        > For example, in the Master table, birthYear and deathYear are both
        int(4),
        > which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
        as yearID
        > in the other tables) would be better, as this uses only 2 bytes.
        Similarly,
        > the bith/death days and months could be tinyint(2) unsigned instead
        of
        > int(2).
        >
        > Another example that occurs in several tables is that games in a
        season (or
        > wins, losses, etc.) can be described as tinyint(3) unsigned (range
        of 0 -
        > 255) instead of smallint(3). I don't think MLB is going to start
        playing
        > 250-game seasons any time soon. :-)
        >
        > I've made these changes, among several others, to my copy and have
        been able
        > to shrink the size of the database by at least a couple of
        megabytes (not
        > sure of the exact amount).
        >
        > --
        > Randy Fiato
        > System Administrator, Big League Forums
        <http://www.bigleagueforums.net/>

        And to think this is the source of most of my problems. (Well, no,
        but I'm trying to add dramatic flair.)

        The problem I'm running into is that, to use these values for a lot
        of purposes I have to recast them as signed variables. I really
        couldn't find any other discussion on these issues - would it be
        really inconvenient to change the schema to not use unsigned
        variables?

        --CW
      • Paul DuBois
        ... If the nature of the data is that it is unsigned, it should be stored in an unsigned column. If the nature of your application is that you use the values
        Message 3 of 5 , Oct 19, 2008
        • 0 Attachment
          On Oct 18, 2008, at 10:56 PM, wyerscj wrote:

          > --- In baseball-databank@yahoogroups.com, "Randy Fiato"
          > <sysadmin@...> wrote:
          >>
          >> I realize that this is a relatively minor detail in relation to
          > some of the
          >> other things being planned here, but I've noticed that the current
          > MySQL
          >> schema has several fields that are oversized. That is to say, the
          > fields are
          >> represented as an integer that is far too large for the possible
          > range of
          >> values.
          >>
          >> For example, in the Master table, birthYear and deathYear are both
          > int(4),
          >> which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
          > as yearID
          >> in the other tables) would be better, as this uses only 2 bytes.
          > Similarly,
          >> the bith/death days and months could be tinyint(2) unsigned instead
          > of
          >> int(2).
          >>
          >> Another example that occurs in several tables is that games in a
          > season (or
          >> wins, losses, etc.) can be described as tinyint(3) unsigned (range
          > of 0 -
          >> 255) instead of smallint(3). I don't think MLB is going to start
          > playing
          >> 250-game seasons any time soon. :-)
          >>
          >> I've made these changes, among several others, to my copy and have
          > been able
          >> to shrink the size of the database by at least a couple of
          > megabytes (not
          >> sure of the exact amount).
          >>
          >> --
          >> Randy Fiato
          >> System Administrator, Big League Forums
          > <http://www.bigleagueforums.net/>
          >
          > And to think this is the source of most of my problems. (Well, no,
          > but I'm trying to add dramatic flair.)
          >
          > The problem I'm running into is that, to use these values for a lot
          > of purposes I have to recast them as signed variables. I really
          > couldn't find any other discussion on these issues - would it be
          > really inconvenient to change the schema to not use unsigned
          > variables?

          If the nature of the data is that it is unsigned, it should be stored
          in an unsigned column. If the nature of your application is that you
          use the values in a signed way, *your application* should make the
          necessary adjustments.

          Otherwise, why should the next person not come along and say, my
          purposes require the values to be floats, can the schema be changed to
          make them floats?
        • James
          ... Anything in baseball that could possibly be a negative number?
          Message 4 of 5 , Oct 19, 2008
          • 0 Attachment
            --- In baseball-databank@yahoogroups.com, Paul DuBois <paul@...> wrote:
            >
            >
            > On Oct 18, 2008, at 10:56 PM, wyerscj wrote:
            >
            > > --- In baseball-databank@yahoogroups.com, "Randy Fiato"
            > > <sysadmin@> wrote:
            > >>
            > >> I realize that this is a relatively minor detail in relation to
            > > some of the
            > >> other things being planned here, but I've noticed that the current
            > > MySQL
            > >> schema has several fields that are oversized. That is to say, the
            > > fields are
            > >> represented as an integer that is far too large for the possible
            > > range of
            > >> values.
            > >>
            > >> For example, in the Master table, birthYear and deathYear are both
            > > int(4),
            > >> which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
            > > as yearID
            > >> in the other tables) would be better, as this uses only 2 bytes.
            > > Similarly,
            > >> the bith/death days and months could be tinyint(2) unsigned instead
            > > of
            > >> int(2).
            > >>
            > >> Another example that occurs in several tables is that games in a
            > > season (or
            > >> wins, losses, etc.) can be described as tinyint(3) unsigned (range
            > > of 0 -
            > >> 255) instead of smallint(3). I don't think MLB is going to start
            > > playing
            > >> 250-game seasons any time soon. :-)
            > >>
            > >> I've made these changes, among several others, to my copy and have
            > > been able
            > >> to shrink the size of the database by at least a couple of
            > > megabytes (not
            > >> sure of the exact amount).
            > >>
            > >> --
            > >> Randy Fiato
            > >> System Administrator, Big League Forums
            > > <http://www.bigleagueforums.net/>
            > >
            > > And to think this is the source of most of my problems. (Well, no,
            > > but I'm trying to add dramatic flair.)
            > >
            > > The problem I'm running into is that, to use these values for a lot
            > > of purposes I have to recast them as signed variables. I really
            > > couldn't find any other discussion on these issues - would it be
            > > really inconvenient to change the schema to not use unsigned
            > > variables?
            >
            > If the nature of the data is that it is unsigned, it should be stored
            > in an unsigned column. If the nature of your application is that you
            > use the values in a signed way, *your application* should make the
            > necessary adjustments.
            >
            > Otherwise, why should the next person not come along and say, my
            > purposes require the values to be floats, can the schema be changed to
            > make them floats?
            >


            Anything in baseball that could possibly be a negative number?
          • Tangotiger
            Regarding the tinyint and int and the like issue: I disagree. I ll give you two good reasons: 1. An 8 GB flash drive costs $29. A 500 GB backup drive costs
            Message 5 of 5 , Oct 21, 2008
            • 0 Attachment
              Regarding the tinyint and int and the like issue: I disagree. I'll give
              you two good reasons:

              1. An 8 GB flash drive costs $29. A 500 GB backup drive costs 75$.
              Saving space should be the least of our concerns. Indeed, I'd prefer
              creating a data warehouse, which would mean heaving denormalized data.

              2. If you do sum(wins) on a tinyint, some DBMS will keep the data type as
              tinyint. You will get into overflow situations.

              Whatever performance issues you can possibly gain pales in comparison to
              potentially corrupt output.

              Tom
            Your message has been successfully submitted and would be delivered to recipients shortly.