Loading ...
Sorry, an error occurred while loading the content.
 

[XP] Database Refactor: Replace Named Columns with a Table of Generic Columns and Metadata

Expand Messages
  • John Carter
    I _love_ sqlite and even better, firefox and sqlitemanager addon.... Your original.... DROP TABLE IF EXISTS user ; CREATE TABLE user ( id INTEGER PRIMARY
    Message 1 of 4 , Jan 10, 2013
      I _love_ sqlite and even better, firefox and sqlitemanager addon....

      Your original....
      DROP TABLE IF EXISTS "user";
      CREATE TABLE "user" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
      , "username" TEXT NOT NULL , "quest" TEXT, "favourite_color" TEXT,
      "sparrow_velocity" REAL);
      INSERT INTO "user" VALUES(1,'phlip','dbrefactor','pink',NULL);
      INSERT INTO "user" VALUES(2,'john','codesweeping',NULL,4.3);


      Normalised "null" free version...
      DROP TABLE IF EXISTS "favourite_color";
      CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
      "color" TEXT NOT NULL );
      INSERT INTO "favourite_color" VALUES(1,'pink');

      DROP TABLE IF EXISTS "quest";
      CREATE TABLE "quest" ("id" INTEGER PRIMARY KEY NOT NULL , "quest"
      TEXT NOT NULL );
      INSERT INTO "quest" VALUES(1,'dbrefactor');
      INSERT INTO "quest" VALUES(2,'streetsweeping');

      DROP TABLE IF EXISTS "user_normalised";
      CREATE TABLE "user_normalised" ("id" INTEGER PRIMARY KEY
      AUTOINCREMENT NOT NULL , "username" TEXT NOT NULL );
      INSERT INTO "user_normalised" VALUES(1,'phlip');
      INSERT INTO "user_normalised" VALUES(2,'john');

      DROP TABLE IF EXISTS "users_sparrow";
      CREATE TABLE "users_sparrow" ("id" INTEGER PRIMARY KEY NOT NULL ,
      "velocity" REAL NOT NULL );
      INSERT INTO "users_sparrow" VALUES(2,4.3);


      A view to serve up _exactly_ what you had initially.
      CREATE VIEW "user_view" AS select
      id,username,quest,favourite_color,sparrow_velocity from (select * from
      (select id,username,color as favourite_color from user_normalised
      left outer join favourite_color using (id)) natural join (select
      id,velocity as sparrow_velocity from user_normalised left outer join
      users_sparrow using (id)) left outer join quest using (id));


      On Mon, Jan 7, 2013 at 5:16 AM, Phlip <phlip2005@...> wrote:
      >
      > I'm looking for a write-up for this refactor. If there ain't none,
      > then this is the write-up.
      >
      > Here's the input table:
      >
      > create table user
      > id <<pk>>
      > username string
      > quest string
      > favorite_color string
      > sparrow_velocity string
      >
      > The lower three fields are optional. A very rough explanation of
      > "database normalization" says that optional fields should not be NULL
      > (or, in the case of strings, blank). They should instead reside in a
      > leaf table - a table whose foreign key points back to a main table.
      > That table stores a NULL (or blank) by leaving its row out of the
      > table.
      >
      > This is the only concept of "database normalization" that I
      > understand, but it has not failed me yet. We start by declaring a
      > table in our programming language, outside our database, to name these
      > strings:
      >
      > USER_PROFILE_METADATA = [
      > [ 1, 'quest' ]
      > [ 2, 'favorite color' ]
      > [ 3, 'sparrow velocity' ]
      > ]
      >
      > (Note that a truly pernicious refactor, that kept _everything_ in the
      > database, would make that into a table, too. That would only improve
      > our design if we intend to allow users to add profile types on the
      > fly. Without that requirement, the design would just be the same as
      > keeping the metadata in our program.)
      >
      > [Skipping over the "transition period" version], the resulting tables are:
      >
      > create table user
      > id <<pk>>
      > username string
      >
      > create table user_profile
      > id <<pk>>
      > user_id <<fk>>
      > metadata_id
      > value string
      >
      > That pattern allows us to replace this View code...
      >
      > <li>What is your quest?</strong></li>
      > <li><strong><%= @... %></strong></li>
      >
      > <li>What is your favorite color?</strong></li>
      > <li><strong><%= @..._color %></strong></li>
      >
      > <li>What is your sparrow velocity?</li>
      > <li><strong><%= @..._velocity %></strong></li>
      >
      > ...with this:
      >
      > <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
      > <li>What is your <%= concept %>?</li>
      > <li></strong>
      > <%=
      > profile = @..._profile.find_by_metadata_id(metadata_id)
      > profile ? profile.value : '' # TODO move all this inside
      > the User class
      > %>
      > </strong></li>
      > <% end %>
      >
      > The View code is now DRY.
      >
      > Note that we could also have left the table alone, and used
      > USER_PROFILE_METADATA to pull fields out of a single table by their
      > column names. This refactor's value is, among other things, we can add
      > new profile types without further database surgery.
      >
      > --
      > Phlip
      > http://c2.com/cgi/wiki?ZeekLand
      >
      >
      > ------------------------------------
      >
      > To Post a message, send it to: extremeprogramming@...
      >
      > To Unsubscribe, send a blank message to: extremeprogramming-unsubscribe@...
      >
      > ad-free courtesy of objectmentor.comYahoo! Groups Links
      >
      >
      >



      --
      John Carter Phone : (64)(3) 358 6639
      Tait Electronics Fax : (64)(3) 359 4632
      PO Box 1645 Christchurch Email : john.carter@...
      New Zealand



      --
      John Carter Phone : (64)(3) 358 6639
      Tait Electronics Fax : (64)(3) 359 4632
      PO Box 1645 Christchurch Email : john.carter@...
      New Zealand

      --

      ------------------------------
      This email, including any attachments, is only for the intended recipient.
      It is subject to copyright, is confidential and may be the subject of legal
      or other privilege, none of which is waived or lost by reason of this
      transmission.
      If you are not an intended recipient, you may not use, disseminate,
      distribute or reproduce such email, any attachments, or any part thereof.
      If you have received a message in error, please notify the sender
      immediately and erase all copies of the message and any attachments.
      Unfortunately, we cannot warrant that the email has not been altered or
      corrupted during transmission nor can we guarantee that any email or any
      attachments are free from computer viruses or other conditions which may
      damage or interfere with recipient data, hardware or software. The
      recipient relies upon its own procedures and assumes all risk of use and of
      opening any attachments.
      ------------------------------
    • Francis Fish
      You guys are confusing me. Normalisation, as I understand it, is reducing things to third normal form. i.e. http://en.wikipedia.org/wiki/Third_normal_form You
      Message 2 of 4 , Jan 11, 2013
        You guys are confusing me.

        Normalisation, as I understand it, is reducing things to third normal form.
        i.e. http://en.wikipedia.org/wiki/Third_normal_form

        You just seem to be eschewing nulls, which is laudable, and adding a *lot*
        of complexity, which is not. Chris Date talks a lot about avoiding nulls
        wherever you can because you end up with 3 value logic and that way lies
        madness.

        I have seen the user/user_profile split done for pragmatic reasons of not
        wanting to store a lot of null data in a table that might end up largely
        empty, depending on the usage of the data. But it's a pragmatic phyisical
        implementation from the days when we used to have logical and physical
        models of the database.

        In the Ruby example you are violating the "fat model, thin controller"
        principle by putting the find_by logic in the view where it does not
        belong, pushing this into the view just means you end up with stuff that's
        brittle and very hard to test. Moving this into named methods on the model
        class is what you should be doing, because you can do things like catch
        exceptions (say a relationship is empty but you want to display a
        meaningful default, it would be a nightmare to ram this in the view), and,
        holy of holies, test it really easily. To test the view you'd have go into
        a stubbing frenzy and it would be incredibly brittle.

        ActiveRecord also has a nice built in delegation system you can use if you
        must go for this data model and keep it out of the view.

        There is also another separate discussion about how the now pervasive habit
        of using synthetic id's as keys everywhere is a *bad* idea instead of the
        true keys and other data that goes with them, but most ORM's aren't built
        to do this so it's hard to do it right. But not doing it means it's really
        easy to end up with orphaned or duplicate data unless you put in unique
        indexes on the true keys and express the true relationships by defining
        them in the underlying RDBMS.

        I use AR because it's less work and everybody uses it, but it makes the
        database nut in me cringe.

        Thanks and regards,

        Francis

        Follow me on twitter https://twitter.com/fjfish
        Blog at http://www.francisfish.com
        Books at https://leanpub.com/u/fjfish
        CV http://www.pharmarketeer.com/francis.html

        I have no intention of apologizing for believing in people, for insisting
        that we all use this moment and these assets to create some art and improve
        the world around us.
        To do anything less than that is a crime. - Seth Godin


        On Fri, Jan 11, 2013 at 5:04 AM, John Carter <john.carter@...> wrote:

        > **
        >
        >
        > I _love_ sqlite and even better, firefox and sqlitemanager addon....
        >
        > Your original....
        > DROP TABLE IF EXISTS "user";
        > CREATE TABLE "user" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
        > , "username" TEXT NOT NULL , "quest" TEXT, "favourite_color" TEXT,
        > "sparrow_velocity" REAL);
        > INSERT INTO "user" VALUES(1,'phlip','dbrefactor','pink',NULL);
        > INSERT INTO "user" VALUES(2,'john','codesweeping',NULL,4.3);
        >
        > Normalised "null" free version...
        > DROP TABLE IF EXISTS "favourite_color";
        > CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
        > "color" TEXT NOT NULL );
        > INSERT INTO "favourite_color" VALUES(1,'pink');
        >
        > DROP TABLE IF EXISTS "quest";
        > CREATE TABLE "quest" ("id" INTEGER PRIMARY KEY NOT NULL , "quest"
        > TEXT NOT NULL );
        > INSERT INTO "quest" VALUES(1,'dbrefactor');
        > INSERT INTO "quest" VALUES(2,'streetsweeping');
        >
        > DROP TABLE IF EXISTS "user_normalised";
        > CREATE TABLE "user_normalised" ("id" INTEGER PRIMARY KEY
        > AUTOINCREMENT NOT NULL , "username" TEXT NOT NULL );
        > INSERT INTO "user_normalised" VALUES(1,'phlip');
        > INSERT INTO "user_normalised" VALUES(2,'john');
        >
        > DROP TABLE IF EXISTS "users_sparrow";
        > CREATE TABLE "users_sparrow" ("id" INTEGER PRIMARY KEY NOT NULL ,
        > "velocity" REAL NOT NULL );
        > INSERT INTO "users_sparrow" VALUES(2,4.3);
        >
        > A view to serve up _exactly_ what you had initially.
        > CREATE VIEW "user_view" AS select
        > id,username,quest,favourite_color,sparrow_velocity from (select * from
        > (select id,username,color as favourite_color from user_normalised
        > left outer join favourite_color using (id)) natural join (select
        > id,velocity as sparrow_velocity from user_normalised left outer join
        > users_sparrow using (id)) left outer join quest using (id));
        >
        >
        > On Mon, Jan 7, 2013 at 5:16 AM, Phlip phlip2005@...> wrote:
        > >
        > > I'm looking for a write-up for this refactor. If there ain't none,
        > > then this is the write-up.
        > >
        > > Here's the input table:
        > >
        > > create table user
        > > id < >
        > > username string
        > > quest string
        > > favorite_color string
        > > sparrow_velocity string
        > >
        > > The lower three fields are optional. A very rough explanation of
        > > "database normalization" says that optional fields should not be NULL
        > > (or, in the case of strings, blank). They should instead reside in a
        > > leaf table - a table whose foreign key points back to a main table.
        > > That table stores a NULL (or blank) by leaving its row out of the
        > > table.
        > >
        > > This is the only concept of "database normalization" that I
        > > understand, but it has not failed me yet. We start by declaring a
        > > table in our programming language, outside our database, to name these
        > > strings:
        > >
        > > USER_PROFILE_METADATA = [
        > > [ 1, 'quest' ]
        > > [ 2, 'favorite color' ]
        > > [ 3, 'sparrow velocity' ]
        > > ]
        > >
        > > (Note that a truly pernicious refactor, that kept _everything_ in the
        > > database, would make that into a table, too. That would only improve
        > > our design if we intend to allow users to add profile types on the
        > > fly. Without that requirement, the design would just be the same as
        > > keeping the metadata in our program.)
        > >
        > > [Skipping over the "transition period" version], the resulting tables
        > are:
        > >
        > > create table user
        > > id < >
        > > username string
        > >
        > > create table user_profile
        > > id < >
        > > user_id < >
        > > metadata_id
        > > value string
        > >
        > > That pattern allows us to replace this View code...
        > >
        > > What is your quest?
        > > *<%= @... %>*
        > >
        > > What is your favorite color?
        > > *<%= @..._color %>*
        > >
        > > What is your sparrow velocity?
        > > *<%= @..._velocity %>*
        > >
        > > ...with this:
        > >
        > > <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
        > > What is your <%= concept %>?
        > >
        > > <%=
        > > profile = @..._profile.find_by_metadata_id(metadata_id)
        > > profile ? profile.value : '' # TODO move all this inside
        > > the User class
        > > %>
        > >
        > > <% end %>
        > >
        > > The View code is now DRY.
        > >
        > > Note that we could also have left the table alone, and used
        > > USER_PROFILE_METADATA to pull fields out of a single table by their
        > > column names. This refactor's value is, among other things, we can add
        > > new profile types without further database surgery.
        > >
        > > --
        > > Phlip
        > > http://c2.com/cgi/wiki?ZeekLand
        > >
        > >
        > > ------------------------------------
        >
        > >
        > > To Post a message, send it to: extremeprogramming@...
        > >
        > > To Unsubscribe, send a blank message to:
        > extremeprogramming-unsubscribe@...
        > >
        > > ad-free courtesy of objectmentor.comYahoo! Groups Links
        > >
        > >
        > >
        >
        > --
        > John Carter Phone : (64)(3) 358 6639
        > Tait Electronics Fax : (64)(3) 359 4632
        > PO Box 1645 Christchurch Email : john.carter@...
        > New Zealand
        >
        > --
        > John Carter Phone : (64)(3) 358 6639
        > Tait Electronics Fax : (64)(3) 359 4632
        > PO Box 1645 Christchurch Email : john.carter@...
        > New Zealand
        >
        > --
        >
        > ------------------------------
        > This email, including any attachments, is only for the intended recipient.
        > It is subject to copyright, is confidential and may be the subject of
        > legal
        > or other privilege, none of which is waived or lost by reason of this
        > transmission.
        > If you are not an intended recipient, you may not use, disseminate,
        > distribute or reproduce such email, any attachments, or any part thereof.
        > If you have received a message in error, please notify the sender
        > immediately and erase all copies of the message and any attachments.
        > Unfortunately, we cannot warrant that the email has not been altered or
        > corrupted during transmission nor can we guarantee that any email or any
        > attachments are free from computer viruses or other conditions which may
        > damage or interfere with recipient data, hardware or software. The
        > recipient relies upon its own procedures and assumes all risk of use and
        > of
        > opening any attachments.
        > ------------------------------
        >
        >
        >


        [Non-text portions of this message have been removed]
      • Phlip
        ... Then call what I did a potentially good thing to do to databases . I should not have used the term normalization . ... If, for example, the values and
        Message 3 of 4 , Jan 11, 2013
          On Fri, Jan 11, 2013 at 4:59 AM, Francis Fish <francis@...> wrote:

          > Normalisation, as I understand it, is reducing things to third normal form.
          > i.e. http://en.wikipedia.org/wiki/Third_normal_form

          Then call what I did a "potentially good thing to do to databases". I
          should not have used the term "normalization".

          > You just seem to be eschewing nulls, which is laudable, and adding a *lot*
          > of complexity, which is not. Chris Date talks a lot about avoiding nulls
          > wherever you can because you end up with 3 value logic and that way lies
          > madness.

          If, for example, the values and labels, such as 'swallow_velocity',
          had each gone into their own separate table, then the result would be
          _more_ complex. Even if one then hid the complexity in a view.

          The root of my refactor is "roll up an unrolled loop." Like I said, we
          could also have rolled it up by making a list of fields, instead of
          records, to loop through.

          > In the Ruby example you are violating the "fat model, thin controller"
          > principle by putting the find_by logic in the view where it does not
          > belong, pushing this into the view just means you end up with stuff that's
          > brittle and very hard to test. Moving this into named methods on the model
          > class is what you should be doing, because you can do things like catch
          > exceptions (say a relationship is empty but you want to display a
          > meaningful default, it would be a nightmare to ram this in the view), and,
          > holy of holies, test it really easily.

          I said in a TODO comment to move the excess code to the model.

          > To test the view you'd have go into
          > a stubbing frenzy and it would be incredibly brittle.

          I wouldn't have a problem with testing that through the view, but of
          course the model comes first.

          > ActiveRecord also has a nice built in delegation system you can use if you
          > must go for this data model and keep it out of the view.

          And I was illustrating the actual refactor, not ActiveRecord. Yes it
          could wrap the leaf table.

          > There is also another separate discussion about how the now pervasive habit
          > of using synthetic id's as keys everywhere is a *bad* idea instead of the
          > true keys and other data that goes with them, but most ORM's aren't built
          > to do this so it's hard to do it right. But not doing it means it's really
          > easy to end up with orphaned or duplicate data unless you put in unique
          > indexes on the true keys and express the true relationships by defining
          > them in the underlying RDBMS.

          And I pointed out somewhere the metadata table could have been a real table.

          > I use AR because it's less work and everybody uses it, but it makes the
          > database nut in me cringe.

          Ah, that explains it.

          John Carter wrote:

          >> Normalised "null" free version...
          >> DROP TABLE IF EXISTS "favourite_color";
          >> CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
          >> "color" TEXT NOT NULL );
          >> INSERT INTO "favourite_color" VALUES(1,'pink');

          Wat?

          oookay, if we are still on the "null rule", maybe a better rule is
          "sometimes you should replace several fields that are often all null,
          together, with a leaf table, whose entire record might not be there."

          >> A view to serve up _exactly_ what you had initially.
          >> CREATE VIEW "user_view" AS select

          Thanks but the idea was simplifying the code that non-database-nuts
          care about; the model & view.

          --
          Phlip
          http://c2.com/cgi/wiki?ZeekLand
        Your message has been successfully submitted and would be delivered to recipients shortly.