Loading ...
Sorry, an error occurred while loading the content.

Database Refactor: Replace Named Columns with a Table of Generic Columns and Metadata

Expand Messages
  • Phlip
    I m looking for a write-up for this refactor. If there ain t none, then this is the write-up. Here s the input table: create table user id username
    Message 1 of 4 , Jan 6, 2013
    • 0 Attachment
      I'm looking for a write-up for this refactor. If there ain't none,
      then this is the write-up.

      Here's the input table:

      create table user
      id <<pk>>
      username string
      quest string
      favorite_color string
      sparrow_velocity string

      The lower three fields are optional. A very rough explanation of
      "database normalization" says that optional fields should not be NULL
      (or, in the case of strings, blank). They should instead reside in a
      leaf table - a table whose foreign key points back to a main table.
      That table stores a NULL (or blank) by leaving its row out of the
      table.

      This is the only concept of "database normalization" that I
      understand, but it has not failed me yet. We start by declaring a
      table in our programming language, outside our database, to name these
      strings:

      USER_PROFILE_METADATA = [
      [ 1, 'quest' ]
      [ 2, 'favorite color' ]
      [ 3, 'sparrow velocity' ]
      ]

      (Note that a truly pernicious refactor, that kept _everything_ in the
      database, would make that into a table, too. That would only improve
      our design if we intend to allow users to add profile types on the
      fly. Without that requirement, the design would just be the same as
      keeping the metadata in our program.)

      [Skipping over the "transition period" version], the resulting tables are:

      create table user
      id <<pk>>
      username string

      create table user_profile
      id <<pk>>
      user_id <<fk>>
      metadata_id
      value string

      That pattern allows us to replace this View code...

      <li>What is your quest?</strong></li>
      <li><strong><%= @... %></strong></li>

      <li>What is your favorite color?</strong></li>
      <li><strong><%= @..._color %></strong></li>

      <li>What is your sparrow velocity?</li>
      <li><strong><%= @..._velocity %></strong></li>

      ...with this:

      <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
      <li>What is your <%= concept %>?</li>
      <li></strong>
      <%=
      profile = @..._profile.find_by_metadata_id(metadata_id)
      profile ? profile.value : '' # TODO move all this inside
      the User class
      %>
      </strong></li>
      <% end %>

      The View code is now DRY.

      Note that we could also have left the table alone, and used
      USER_PROFILE_METADATA to pull fields out of a single table by their
      column names. This refactor's value is, among other things, we can add
      new profile types without further database surgery.

      --
      Phlip
      http://c2.com/cgi/wiki?ZeekLand
    • John Carter
      I _love_ sqlite and even better, firefox and sqlitemanager addon.... Your original.... DROP TABLE IF EXISTS user ; CREATE TABLE user ( id INTEGER PRIMARY
      Message 2 of 4 , Jan 10, 2013
      • 0 Attachment
        I _love_ sqlite and even better, firefox and sqlitemanager addon....

        Your original....
        DROP TABLE IF EXISTS "user";
        CREATE TABLE "user" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
        , "username" TEXT NOT NULL , "quest" TEXT, "favourite_color" TEXT,
        "sparrow_velocity" REAL);
        INSERT INTO "user" VALUES(1,'phlip','dbrefactor','pink',NULL);
        INSERT INTO "user" VALUES(2,'john','codesweeping',NULL,4.3);


        Normalised "null" free version...
        DROP TABLE IF EXISTS "favourite_color";
        CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
        "color" TEXT NOT NULL );
        INSERT INTO "favourite_color" VALUES(1,'pink');

        DROP TABLE IF EXISTS "quest";
        CREATE TABLE "quest" ("id" INTEGER PRIMARY KEY NOT NULL , "quest"
        TEXT NOT NULL );
        INSERT INTO "quest" VALUES(1,'dbrefactor');
        INSERT INTO "quest" VALUES(2,'streetsweeping');

        DROP TABLE IF EXISTS "user_normalised";
        CREATE TABLE "user_normalised" ("id" INTEGER PRIMARY KEY
        AUTOINCREMENT NOT NULL , "username" TEXT NOT NULL );
        INSERT INTO "user_normalised" VALUES(1,'phlip');
        INSERT INTO "user_normalised" VALUES(2,'john');

        DROP TABLE IF EXISTS "users_sparrow";
        CREATE TABLE "users_sparrow" ("id" INTEGER PRIMARY KEY NOT NULL ,
        "velocity" REAL NOT NULL );
        INSERT INTO "users_sparrow" VALUES(2,4.3);


        A view to serve up _exactly_ what you had initially.
        CREATE VIEW "user_view" AS select
        id,username,quest,favourite_color,sparrow_velocity from (select * from
        (select id,username,color as favourite_color from user_normalised
        left outer join favourite_color using (id)) natural join (select
        id,velocity as sparrow_velocity from user_normalised left outer join
        users_sparrow using (id)) left outer join quest using (id));


        On Mon, Jan 7, 2013 at 5:16 AM, Phlip <phlip2005@...> wrote:
        >
        > I'm looking for a write-up for this refactor. If there ain't none,
        > then this is the write-up.
        >
        > Here's the input table:
        >
        > create table user
        > id <<pk>>
        > username string
        > quest string
        > favorite_color string
        > sparrow_velocity string
        >
        > The lower three fields are optional. A very rough explanation of
        > "database normalization" says that optional fields should not be NULL
        > (or, in the case of strings, blank). They should instead reside in a
        > leaf table - a table whose foreign key points back to a main table.
        > That table stores a NULL (or blank) by leaving its row out of the
        > table.
        >
        > This is the only concept of "database normalization" that I
        > understand, but it has not failed me yet. We start by declaring a
        > table in our programming language, outside our database, to name these
        > strings:
        >
        > USER_PROFILE_METADATA = [
        > [ 1, 'quest' ]
        > [ 2, 'favorite color' ]
        > [ 3, 'sparrow velocity' ]
        > ]
        >
        > (Note that a truly pernicious refactor, that kept _everything_ in the
        > database, would make that into a table, too. That would only improve
        > our design if we intend to allow users to add profile types on the
        > fly. Without that requirement, the design would just be the same as
        > keeping the metadata in our program.)
        >
        > [Skipping over the "transition period" version], the resulting tables are:
        >
        > create table user
        > id <<pk>>
        > username string
        >
        > create table user_profile
        > id <<pk>>
        > user_id <<fk>>
        > metadata_id
        > value string
        >
        > That pattern allows us to replace this View code...
        >
        > <li>What is your quest?</strong></li>
        > <li><strong><%= @... %></strong></li>
        >
        > <li>What is your favorite color?</strong></li>
        > <li><strong><%= @..._color %></strong></li>
        >
        > <li>What is your sparrow velocity?</li>
        > <li><strong><%= @..._velocity %></strong></li>
        >
        > ...with this:
        >
        > <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
        > <li>What is your <%= concept %>?</li>
        > <li></strong>
        > <%=
        > profile = @..._profile.find_by_metadata_id(metadata_id)
        > profile ? profile.value : '' # TODO move all this inside
        > the User class
        > %>
        > </strong></li>
        > <% end %>
        >
        > The View code is now DRY.
        >
        > Note that we could also have left the table alone, and used
        > USER_PROFILE_METADATA to pull fields out of a single table by their
        > column names. This refactor's value is, among other things, we can add
        > new profile types without further database surgery.
        >
        > --
        > Phlip
        > http://c2.com/cgi/wiki?ZeekLand
        >
        >
        > ------------------------------------
        >
        > To Post a message, send it to: extremeprogramming@...
        >
        > To Unsubscribe, send a blank message to: extremeprogramming-unsubscribe@...
        >
        > ad-free courtesy of objectmentor.comYahoo! Groups Links
        >
        >
        >



        --
        John Carter Phone : (64)(3) 358 6639
        Tait Electronics Fax : (64)(3) 359 4632
        PO Box 1645 Christchurch Email : john.carter@...
        New Zealand



        --
        John Carter Phone : (64)(3) 358 6639
        Tait Electronics Fax : (64)(3) 359 4632
        PO Box 1645 Christchurch Email : john.carter@...
        New Zealand

        --

        ------------------------------
        This email, including any attachments, is only for the intended recipient.
        It is subject to copyright, is confidential and may be the subject of legal
        or other privilege, none of which is waived or lost by reason of this
        transmission.
        If you are not an intended recipient, you may not use, disseminate,
        distribute or reproduce such email, any attachments, or any part thereof.
        If you have received a message in error, please notify the sender
        immediately and erase all copies of the message and any attachments.
        Unfortunately, we cannot warrant that the email has not been altered or
        corrupted during transmission nor can we guarantee that any email or any
        attachments are free from computer viruses or other conditions which may
        damage or interfere with recipient data, hardware or software. The
        recipient relies upon its own procedures and assumes all risk of use and of
        opening any attachments.
        ------------------------------
      • Francis Fish
        You guys are confusing me. Normalisation, as I understand it, is reducing things to third normal form. i.e. http://en.wikipedia.org/wiki/Third_normal_form You
        Message 3 of 4 , Jan 11, 2013
        • 0 Attachment
          You guys are confusing me.

          Normalisation, as I understand it, is reducing things to third normal form.
          i.e. http://en.wikipedia.org/wiki/Third_normal_form

          You just seem to be eschewing nulls, which is laudable, and adding a *lot*
          of complexity, which is not. Chris Date talks a lot about avoiding nulls
          wherever you can because you end up with 3 value logic and that way lies
          madness.

          I have seen the user/user_profile split done for pragmatic reasons of not
          wanting to store a lot of null data in a table that might end up largely
          empty, depending on the usage of the data. But it's a pragmatic phyisical
          implementation from the days when we used to have logical and physical
          models of the database.

          In the Ruby example you are violating the "fat model, thin controller"
          principle by putting the find_by logic in the view where it does not
          belong, pushing this into the view just means you end up with stuff that's
          brittle and very hard to test. Moving this into named methods on the model
          class is what you should be doing, because you can do things like catch
          exceptions (say a relationship is empty but you want to display a
          meaningful default, it would be a nightmare to ram this in the view), and,
          holy of holies, test it really easily. To test the view you'd have go into
          a stubbing frenzy and it would be incredibly brittle.

          ActiveRecord also has a nice built in delegation system you can use if you
          must go for this data model and keep it out of the view.

          There is also another separate discussion about how the now pervasive habit
          of using synthetic id's as keys everywhere is a *bad* idea instead of the
          true keys and other data that goes with them, but most ORM's aren't built
          to do this so it's hard to do it right. But not doing it means it's really
          easy to end up with orphaned or duplicate data unless you put in unique
          indexes on the true keys and express the true relationships by defining
          them in the underlying RDBMS.

          I use AR because it's less work and everybody uses it, but it makes the
          database nut in me cringe.

          Thanks and regards,

          Francis

          Follow me on twitter https://twitter.com/fjfish
          Blog at http://www.francisfish.com
          Books at https://leanpub.com/u/fjfish
          CV http://www.pharmarketeer.com/francis.html

          I have no intention of apologizing for believing in people, for insisting
          that we all use this moment and these assets to create some art and improve
          the world around us.
          To do anything less than that is a crime. - Seth Godin


          On Fri, Jan 11, 2013 at 5:04 AM, John Carter <john.carter@...> wrote:

          > **
          >
          >
          > I _love_ sqlite and even better, firefox and sqlitemanager addon....
          >
          > Your original....
          > DROP TABLE IF EXISTS "user";
          > CREATE TABLE "user" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
          > , "username" TEXT NOT NULL , "quest" TEXT, "favourite_color" TEXT,
          > "sparrow_velocity" REAL);
          > INSERT INTO "user" VALUES(1,'phlip','dbrefactor','pink',NULL);
          > INSERT INTO "user" VALUES(2,'john','codesweeping',NULL,4.3);
          >
          > Normalised "null" free version...
          > DROP TABLE IF EXISTS "favourite_color";
          > CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
          > "color" TEXT NOT NULL );
          > INSERT INTO "favourite_color" VALUES(1,'pink');
          >
          > DROP TABLE IF EXISTS "quest";
          > CREATE TABLE "quest" ("id" INTEGER PRIMARY KEY NOT NULL , "quest"
          > TEXT NOT NULL );
          > INSERT INTO "quest" VALUES(1,'dbrefactor');
          > INSERT INTO "quest" VALUES(2,'streetsweeping');
          >
          > DROP TABLE IF EXISTS "user_normalised";
          > CREATE TABLE "user_normalised" ("id" INTEGER PRIMARY KEY
          > AUTOINCREMENT NOT NULL , "username" TEXT NOT NULL );
          > INSERT INTO "user_normalised" VALUES(1,'phlip');
          > INSERT INTO "user_normalised" VALUES(2,'john');
          >
          > DROP TABLE IF EXISTS "users_sparrow";
          > CREATE TABLE "users_sparrow" ("id" INTEGER PRIMARY KEY NOT NULL ,
          > "velocity" REAL NOT NULL );
          > INSERT INTO "users_sparrow" VALUES(2,4.3);
          >
          > A view to serve up _exactly_ what you had initially.
          > CREATE VIEW "user_view" AS select
          > id,username,quest,favourite_color,sparrow_velocity from (select * from
          > (select id,username,color as favourite_color from user_normalised
          > left outer join favourite_color using (id)) natural join (select
          > id,velocity as sparrow_velocity from user_normalised left outer join
          > users_sparrow using (id)) left outer join quest using (id));
          >
          >
          > On Mon, Jan 7, 2013 at 5:16 AM, Phlip phlip2005@...> wrote:
          > >
          > > I'm looking for a write-up for this refactor. If there ain't none,
          > > then this is the write-up.
          > >
          > > Here's the input table:
          > >
          > > create table user
          > > id < >
          > > username string
          > > quest string
          > > favorite_color string
          > > sparrow_velocity string
          > >
          > > The lower three fields are optional. A very rough explanation of
          > > "database normalization" says that optional fields should not be NULL
          > > (or, in the case of strings, blank). They should instead reside in a
          > > leaf table - a table whose foreign key points back to a main table.
          > > That table stores a NULL (or blank) by leaving its row out of the
          > > table.
          > >
          > > This is the only concept of "database normalization" that I
          > > understand, but it has not failed me yet. We start by declaring a
          > > table in our programming language, outside our database, to name these
          > > strings:
          > >
          > > USER_PROFILE_METADATA = [
          > > [ 1, 'quest' ]
          > > [ 2, 'favorite color' ]
          > > [ 3, 'sparrow velocity' ]
          > > ]
          > >
          > > (Note that a truly pernicious refactor, that kept _everything_ in the
          > > database, would make that into a table, too. That would only improve
          > > our design if we intend to allow users to add profile types on the
          > > fly. Without that requirement, the design would just be the same as
          > > keeping the metadata in our program.)
          > >
          > > [Skipping over the "transition period" version], the resulting tables
          > are:
          > >
          > > create table user
          > > id < >
          > > username string
          > >
          > > create table user_profile
          > > id < >
          > > user_id < >
          > > metadata_id
          > > value string
          > >
          > > That pattern allows us to replace this View code...
          > >
          > > What is your quest?
          > > *<%= @... %>*
          > >
          > > What is your favorite color?
          > > *<%= @..._color %>*
          > >
          > > What is your sparrow velocity?
          > > *<%= @..._velocity %>*
          > >
          > > ...with this:
          > >
          > > <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
          > > What is your <%= concept %>?
          > >
          > > <%=
          > > profile = @..._profile.find_by_metadata_id(metadata_id)
          > > profile ? profile.value : '' # TODO move all this inside
          > > the User class
          > > %>
          > >
          > > <% end %>
          > >
          > > The View code is now DRY.
          > >
          > > Note that we could also have left the table alone, and used
          > > USER_PROFILE_METADATA to pull fields out of a single table by their
          > > column names. This refactor's value is, among other things, we can add
          > > new profile types without further database surgery.
          > >
          > > --
          > > Phlip
          > > http://c2.com/cgi/wiki?ZeekLand
          > >
          > >
          > > ------------------------------------
          >
          > >
          > > To Post a message, send it to: extremeprogramming@...
          > >
          > > To Unsubscribe, send a blank message to:
          > extremeprogramming-unsubscribe@...
          > >
          > > ad-free courtesy of objectmentor.comYahoo! Groups Links
          > >
          > >
          > >
          >
          > --
          > John Carter Phone : (64)(3) 358 6639
          > Tait Electronics Fax : (64)(3) 359 4632
          > PO Box 1645 Christchurch Email : john.carter@...
          > New Zealand
          >
          > --
          > John Carter Phone : (64)(3) 358 6639
          > Tait Electronics Fax : (64)(3) 359 4632
          > PO Box 1645 Christchurch Email : john.carter@...
          > New Zealand
          >
          > --
          >
          > ------------------------------
          > This email, including any attachments, is only for the intended recipient.
          > It is subject to copyright, is confidential and may be the subject of
          > legal
          > or other privilege, none of which is waived or lost by reason of this
          > transmission.
          > If you are not an intended recipient, you may not use, disseminate,
          > distribute or reproduce such email, any attachments, or any part thereof.
          > If you have received a message in error, please notify the sender
          > immediately and erase all copies of the message and any attachments.
          > Unfortunately, we cannot warrant that the email has not been altered or
          > corrupted during transmission nor can we guarantee that any email or any
          > attachments are free from computer viruses or other conditions which may
          > damage or interfere with recipient data, hardware or software. The
          > recipient relies upon its own procedures and assumes all risk of use and
          > of
          > opening any attachments.
          > ------------------------------
          >
          >
          >


          [Non-text portions of this message have been removed]
        • Phlip
          ... Then call what I did a potentially good thing to do to databases . I should not have used the term normalization . ... If, for example, the values and
          Message 4 of 4 , Jan 11, 2013
          • 0 Attachment
            On Fri, Jan 11, 2013 at 4:59 AM, Francis Fish <francis@...> wrote:

            > Normalisation, as I understand it, is reducing things to third normal form.
            > i.e. http://en.wikipedia.org/wiki/Third_normal_form

            Then call what I did a "potentially good thing to do to databases". I
            should not have used the term "normalization".

            > You just seem to be eschewing nulls, which is laudable, and adding a *lot*
            > of complexity, which is not. Chris Date talks a lot about avoiding nulls
            > wherever you can because you end up with 3 value logic and that way lies
            > madness.

            If, for example, the values and labels, such as 'swallow_velocity',
            had each gone into their own separate table, then the result would be
            _more_ complex. Even if one then hid the complexity in a view.

            The root of my refactor is "roll up an unrolled loop." Like I said, we
            could also have rolled it up by making a list of fields, instead of
            records, to loop through.

            > In the Ruby example you are violating the "fat model, thin controller"
            > principle by putting the find_by logic in the view where it does not
            > belong, pushing this into the view just means you end up with stuff that's
            > brittle and very hard to test. Moving this into named methods on the model
            > class is what you should be doing, because you can do things like catch
            > exceptions (say a relationship is empty but you want to display a
            > meaningful default, it would be a nightmare to ram this in the view), and,
            > holy of holies, test it really easily.

            I said in a TODO comment to move the excess code to the model.

            > To test the view you'd have go into
            > a stubbing frenzy and it would be incredibly brittle.

            I wouldn't have a problem with testing that through the view, but of
            course the model comes first.

            > ActiveRecord also has a nice built in delegation system you can use if you
            > must go for this data model and keep it out of the view.

            And I was illustrating the actual refactor, not ActiveRecord. Yes it
            could wrap the leaf table.

            > There is also another separate discussion about how the now pervasive habit
            > of using synthetic id's as keys everywhere is a *bad* idea instead of the
            > true keys and other data that goes with them, but most ORM's aren't built
            > to do this so it's hard to do it right. But not doing it means it's really
            > easy to end up with orphaned or duplicate data unless you put in unique
            > indexes on the true keys and express the true relationships by defining
            > them in the underlying RDBMS.

            And I pointed out somewhere the metadata table could have been a real table.

            > I use AR because it's less work and everybody uses it, but it makes the
            > database nut in me cringe.

            Ah, that explains it.

            John Carter wrote:

            >> Normalised "null" free version...
            >> DROP TABLE IF EXISTS "favourite_color";
            >> CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
            >> "color" TEXT NOT NULL );
            >> INSERT INTO "favourite_color" VALUES(1,'pink');

            Wat?

            oookay, if we are still on the "null rule", maybe a better rule is
            "sometimes you should replace several fields that are often all null,
            together, with a leaf table, whose entire record might not be there."

            >> A view to serve up _exactly_ what you had initially.
            >> CREATE VIEW "user_view" AS select

            Thanks but the idea was simplifying the code that non-database-nuts
            care about; the model & view.

            --
            Phlip
            http://c2.com/cgi/wiki?ZeekLand
          Your message has been successfully submitted and would be delivered to recipients shortly.