Loading ...
Sorry, an error occurred while loading the content.

Re: SCHEMA - Oversized MySQL fields?

Expand Messages
  • wyerscj
    ... some of the ... MySQL ... fields are ... range of ... int(4), ... as yearID ... Similarly, ... of ... season (or ... of 0 - ... playing ... been able ...
    Message 1 of 5 , Oct 18, 2008
    • 0 Attachment
      --- In baseball-databank@yahoogroups.com, "Randy Fiato"
      <sysadmin@...> wrote:
      >
      > I realize that this is a relatively minor detail in relation to
      some of the
      > other things being planned here, but I've noticed that the current
      MySQL
      > schema has several fields that are oversized. That is to say, the
      fields are
      > represented as an integer that is far too large for the possible
      range of
      > values.
      >
      > For example, in the Master table, birthYear and deathYear are both
      int(4),
      > which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
      as yearID
      > in the other tables) would be better, as this uses only 2 bytes.
      Similarly,
      > the bith/death days and months could be tinyint(2) unsigned instead
      of
      > int(2).
      >
      > Another example that occurs in several tables is that games in a
      season (or
      > wins, losses, etc.) can be described as tinyint(3) unsigned (range
      of 0 -
      > 255) instead of smallint(3). I don't think MLB is going to start
      playing
      > 250-game seasons any time soon. :-)
      >
      > I've made these changes, among several others, to my copy and have
      been able
      > to shrink the size of the database by at least a couple of
      megabytes (not
      > sure of the exact amount).
      >
      > --
      > Randy Fiato
      > System Administrator, Big League Forums
      <http://www.bigleagueforums.net/>

      And to think this is the source of most of my problems. (Well, no,
      but I'm trying to add dramatic flair.)

      The problem I'm running into is that, to use these values for a lot
      of purposes I have to recast them as signed variables. I really
      couldn't find any other discussion on these issues - would it be
      really inconvenient to change the schema to not use unsigned
      variables?

      --CW
    • Paul DuBois
      ... If the nature of the data is that it is unsigned, it should be stored in an unsigned column. If the nature of your application is that you use the values
      Message 2 of 5 , Oct 19, 2008
      • 0 Attachment
        On Oct 18, 2008, at 10:56 PM, wyerscj wrote:

        > --- In baseball-databank@yahoogroups.com, "Randy Fiato"
        > <sysadmin@...> wrote:
        >>
        >> I realize that this is a relatively minor detail in relation to
        > some of the
        >> other things being planned here, but I've noticed that the current
        > MySQL
        >> schema has several fields that are oversized. That is to say, the
        > fields are
        >> represented as an integer that is far too large for the possible
        > range of
        >> values.
        >>
        >> For example, in the Master table, birthYear and deathYear are both
        > int(4),
        >> which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
        > as yearID
        >> in the other tables) would be better, as this uses only 2 bytes.
        > Similarly,
        >> the bith/death days and months could be tinyint(2) unsigned instead
        > of
        >> int(2).
        >>
        >> Another example that occurs in several tables is that games in a
        > season (or
        >> wins, losses, etc.) can be described as tinyint(3) unsigned (range
        > of 0 -
        >> 255) instead of smallint(3). I don't think MLB is going to start
        > playing
        >> 250-game seasons any time soon. :-)
        >>
        >> I've made these changes, among several others, to my copy and have
        > been able
        >> to shrink the size of the database by at least a couple of
        > megabytes (not
        >> sure of the exact amount).
        >>
        >> --
        >> Randy Fiato
        >> System Administrator, Big League Forums
        > <http://www.bigleagueforums.net/>
        >
        > And to think this is the source of most of my problems. (Well, no,
        > but I'm trying to add dramatic flair.)
        >
        > The problem I'm running into is that, to use these values for a lot
        > of purposes I have to recast them as signed variables. I really
        > couldn't find any other discussion on these issues - would it be
        > really inconvenient to change the schema to not use unsigned
        > variables?

        If the nature of the data is that it is unsigned, it should be stored
        in an unsigned column. If the nature of your application is that you
        use the values in a signed way, *your application* should make the
        necessary adjustments.

        Otherwise, why should the next person not come along and say, my
        purposes require the values to be floats, can the schema be changed to
        make them floats?
      • James
        ... Anything in baseball that could possibly be a negative number?
        Message 3 of 5 , Oct 19, 2008
        • 0 Attachment
          --- In baseball-databank@yahoogroups.com, Paul DuBois <paul@...> wrote:
          >
          >
          > On Oct 18, 2008, at 10:56 PM, wyerscj wrote:
          >
          > > --- In baseball-databank@yahoogroups.com, "Randy Fiato"
          > > <sysadmin@> wrote:
          > >>
          > >> I realize that this is a relatively minor detail in relation to
          > > some of the
          > >> other things being planned here, but I've noticed that the current
          > > MySQL
          > >> schema has several fields that are oversized. That is to say, the
          > > fields are
          > >> represented as an integer that is far too large for the possible
          > > range of
          > >> values.
          > >>
          > >> For example, in the Master table, birthYear and deathYear are both
          > > int(4),
          > >> which in MySQL is a 4-byte integer. smallint(4) unsigned (the same
          > > as yearID
          > >> in the other tables) would be better, as this uses only 2 bytes.
          > > Similarly,
          > >> the bith/death days and months could be tinyint(2) unsigned instead
          > > of
          > >> int(2).
          > >>
          > >> Another example that occurs in several tables is that games in a
          > > season (or
          > >> wins, losses, etc.) can be described as tinyint(3) unsigned (range
          > > of 0 -
          > >> 255) instead of smallint(3). I don't think MLB is going to start
          > > playing
          > >> 250-game seasons any time soon. :-)
          > >>
          > >> I've made these changes, among several others, to my copy and have
          > > been able
          > >> to shrink the size of the database by at least a couple of
          > > megabytes (not
          > >> sure of the exact amount).
          > >>
          > >> --
          > >> Randy Fiato
          > >> System Administrator, Big League Forums
          > > <http://www.bigleagueforums.net/>
          > >
          > > And to think this is the source of most of my problems. (Well, no,
          > > but I'm trying to add dramatic flair.)
          > >
          > > The problem I'm running into is that, to use these values for a lot
          > > of purposes I have to recast them as signed variables. I really
          > > couldn't find any other discussion on these issues - would it be
          > > really inconvenient to change the schema to not use unsigned
          > > variables?
          >
          > If the nature of the data is that it is unsigned, it should be stored
          > in an unsigned column. If the nature of your application is that you
          > use the values in a signed way, *your application* should make the
          > necessary adjustments.
          >
          > Otherwise, why should the next person not come along and say, my
          > purposes require the values to be floats, can the schema be changed to
          > make them floats?
          >


          Anything in baseball that could possibly be a negative number?
        • Tangotiger
          Regarding the tinyint and int and the like issue: I disagree. I ll give you two good reasons: 1. An 8 GB flash drive costs $29. A 500 GB backup drive costs
          Message 4 of 5 , Oct 21, 2008
          • 0 Attachment
            Regarding the tinyint and int and the like issue: I disagree. I'll give
            you two good reasons:

            1. An 8 GB flash drive costs $29. A 500 GB backup drive costs 75$.
            Saving space should be the least of our concerns. Indeed, I'd prefer
            creating a data warehouse, which would mean heaving denormalized data.

            2. If you do sum(wins) on a tinyint, some DBMS will keep the data type as
            tinyint. You will get into overflow situations.

            Whatever performance issues you can possibly gain pales in comparison to
            potentially corrupt output.

            Tom
          Your message has been successfully submitted and would be delivered to recipients shortly.