Loading ...
Sorry, an error occurred while loading the content.

Re: [XP] Database Refactor: Replace Named Columns with a Table of Generic Columns and Metadata

Expand Messages
  • Francis Fish
    You guys are confusing me. Normalisation, as I understand it, is reducing things to third normal form. i.e. http://en.wikipedia.org/wiki/Third_normal_form You
    Message 1 of 4 , Jan 11, 2013
    • 0 Attachment
      You guys are confusing me.

      Normalisation, as I understand it, is reducing things to third normal form.
      i.e. http://en.wikipedia.org/wiki/Third_normal_form

      You just seem to be eschewing nulls, which is laudable, and adding a *lot*
      of complexity, which is not. Chris Date talks a lot about avoiding nulls
      wherever you can because you end up with 3 value logic and that way lies
      madness.

      I have seen the user/user_profile split done for pragmatic reasons of not
      wanting to store a lot of null data in a table that might end up largely
      empty, depending on the usage of the data. But it's a pragmatic phyisical
      implementation from the days when we used to have logical and physical
      models of the database.

      In the Ruby example you are violating the "fat model, thin controller"
      principle by putting the find_by logic in the view where it does not
      belong, pushing this into the view just means you end up with stuff that's
      brittle and very hard to test. Moving this into named methods on the model
      class is what you should be doing, because you can do things like catch
      exceptions (say a relationship is empty but you want to display a
      meaningful default, it would be a nightmare to ram this in the view), and,
      holy of holies, test it really easily. To test the view you'd have go into
      a stubbing frenzy and it would be incredibly brittle.

      ActiveRecord also has a nice built in delegation system you can use if you
      must go for this data model and keep it out of the view.

      There is also another separate discussion about how the now pervasive habit
      of using synthetic id's as keys everywhere is a *bad* idea instead of the
      true keys and other data that goes with them, but most ORM's aren't built
      to do this so it's hard to do it right. But not doing it means it's really
      easy to end up with orphaned or duplicate data unless you put in unique
      indexes on the true keys and express the true relationships by defining
      them in the underlying RDBMS.

      I use AR because it's less work and everybody uses it, but it makes the
      database nut in me cringe.

      Thanks and regards,

      Francis

      Follow me on twitter https://twitter.com/fjfish
      Blog at http://www.francisfish.com
      Books at https://leanpub.com/u/fjfish
      CV http://www.pharmarketeer.com/francis.html

      I have no intention of apologizing for believing in people, for insisting
      that we all use this moment and these assets to create some art and improve
      the world around us.
      To do anything less than that is a crime. - Seth Godin


      On Fri, Jan 11, 2013 at 5:04 AM, John Carter <john.carter@...> wrote:

      > **
      >
      >
      > I _love_ sqlite and even better, firefox and sqlitemanager addon....
      >
      > Your original....
      > DROP TABLE IF EXISTS "user";
      > CREATE TABLE "user" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
      > , "username" TEXT NOT NULL , "quest" TEXT, "favourite_color" TEXT,
      > "sparrow_velocity" REAL);
      > INSERT INTO "user" VALUES(1,'phlip','dbrefactor','pink',NULL);
      > INSERT INTO "user" VALUES(2,'john','codesweeping',NULL,4.3);
      >
      > Normalised "null" free version...
      > DROP TABLE IF EXISTS "favourite_color";
      > CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
      > "color" TEXT NOT NULL );
      > INSERT INTO "favourite_color" VALUES(1,'pink');
      >
      > DROP TABLE IF EXISTS "quest";
      > CREATE TABLE "quest" ("id" INTEGER PRIMARY KEY NOT NULL , "quest"
      > TEXT NOT NULL );
      > INSERT INTO "quest" VALUES(1,'dbrefactor');
      > INSERT INTO "quest" VALUES(2,'streetsweeping');
      >
      > DROP TABLE IF EXISTS "user_normalised";
      > CREATE TABLE "user_normalised" ("id" INTEGER PRIMARY KEY
      > AUTOINCREMENT NOT NULL , "username" TEXT NOT NULL );
      > INSERT INTO "user_normalised" VALUES(1,'phlip');
      > INSERT INTO "user_normalised" VALUES(2,'john');
      >
      > DROP TABLE IF EXISTS "users_sparrow";
      > CREATE TABLE "users_sparrow" ("id" INTEGER PRIMARY KEY NOT NULL ,
      > "velocity" REAL NOT NULL );
      > INSERT INTO "users_sparrow" VALUES(2,4.3);
      >
      > A view to serve up _exactly_ what you had initially.
      > CREATE VIEW "user_view" AS select
      > id,username,quest,favourite_color,sparrow_velocity from (select * from
      > (select id,username,color as favourite_color from user_normalised
      > left outer join favourite_color using (id)) natural join (select
      > id,velocity as sparrow_velocity from user_normalised left outer join
      > users_sparrow using (id)) left outer join quest using (id));
      >
      >
      > On Mon, Jan 7, 2013 at 5:16 AM, Phlip phlip2005@...> wrote:
      > >
      > > I'm looking for a write-up for this refactor. If there ain't none,
      > > then this is the write-up.
      > >
      > > Here's the input table:
      > >
      > > create table user
      > > id < >
      > > username string
      > > quest string
      > > favorite_color string
      > > sparrow_velocity string
      > >
      > > The lower three fields are optional. A very rough explanation of
      > > "database normalization" says that optional fields should not be NULL
      > > (or, in the case of strings, blank). They should instead reside in a
      > > leaf table - a table whose foreign key points back to a main table.
      > > That table stores a NULL (or blank) by leaving its row out of the
      > > table.
      > >
      > > This is the only concept of "database normalization" that I
      > > understand, but it has not failed me yet. We start by declaring a
      > > table in our programming language, outside our database, to name these
      > > strings:
      > >
      > > USER_PROFILE_METADATA = [
      > > [ 1, 'quest' ]
      > > [ 2, 'favorite color' ]
      > > [ 3, 'sparrow velocity' ]
      > > ]
      > >
      > > (Note that a truly pernicious refactor, that kept _everything_ in the
      > > database, would make that into a table, too. That would only improve
      > > our design if we intend to allow users to add profile types on the
      > > fly. Without that requirement, the design would just be the same as
      > > keeping the metadata in our program.)
      > >
      > > [Skipping over the "transition period" version], the resulting tables
      > are:
      > >
      > > create table user
      > > id < >
      > > username string
      > >
      > > create table user_profile
      > > id < >
      > > user_id < >
      > > metadata_id
      > > value string
      > >
      > > That pattern allows us to replace this View code...
      > >
      > > What is your quest?
      > > *<%= @... %>*
      > >
      > > What is your favorite color?
      > > *<%= @..._color %>*
      > >
      > > What is your sparrow velocity?
      > > *<%= @..._velocity %>*
      > >
      > > ...with this:
      > >
      > > <% USER_PROFILE_METADATA.each do |metadata_id, concept| %>
      > > What is your <%= concept %>?
      > >
      > > <%=
      > > profile = @..._profile.find_by_metadata_id(metadata_id)
      > > profile ? profile.value : '' # TODO move all this inside
      > > the User class
      > > %>
      > >
      > > <% end %>
      > >
      > > The View code is now DRY.
      > >
      > > Note that we could also have left the table alone, and used
      > > USER_PROFILE_METADATA to pull fields out of a single table by their
      > > column names. This refactor's value is, among other things, we can add
      > > new profile types without further database surgery.
      > >
      > > --
      > > Phlip
      > > http://c2.com/cgi/wiki?ZeekLand
      > >
      > >
      > > ------------------------------------
      >
      > >
      > > To Post a message, send it to: extremeprogramming@...
      > >
      > > To Unsubscribe, send a blank message to:
      > extremeprogramming-unsubscribe@...
      > >
      > > ad-free courtesy of objectmentor.comYahoo! Groups Links
      > >
      > >
      > >
      >
      > --
      > John Carter Phone : (64)(3) 358 6639
      > Tait Electronics Fax : (64)(3) 359 4632
      > PO Box 1645 Christchurch Email : john.carter@...
      > New Zealand
      >
      > --
      > John Carter Phone : (64)(3) 358 6639
      > Tait Electronics Fax : (64)(3) 359 4632
      > PO Box 1645 Christchurch Email : john.carter@...
      > New Zealand
      >
      > --
      >
      > ------------------------------
      > This email, including any attachments, is only for the intended recipient.
      > It is subject to copyright, is confidential and may be the subject of
      > legal
      > or other privilege, none of which is waived or lost by reason of this
      > transmission.
      > If you are not an intended recipient, you may not use, disseminate,
      > distribute or reproduce such email, any attachments, or any part thereof.
      > If you have received a message in error, please notify the sender
      > immediately and erase all copies of the message and any attachments.
      > Unfortunately, we cannot warrant that the email has not been altered or
      > corrupted during transmission nor can we guarantee that any email or any
      > attachments are free from computer viruses or other conditions which may
      > damage or interfere with recipient data, hardware or software. The
      > recipient relies upon its own procedures and assumes all risk of use and
      > of
      > opening any attachments.
      > ------------------------------
      >
      >
      >


      [Non-text portions of this message have been removed]
    • Phlip
      ... Then call what I did a potentially good thing to do to databases . I should not have used the term normalization . ... If, for example, the values and
      Message 2 of 4 , Jan 11, 2013
      • 0 Attachment
        On Fri, Jan 11, 2013 at 4:59 AM, Francis Fish <francis@...> wrote:

        > Normalisation, as I understand it, is reducing things to third normal form.
        > i.e. http://en.wikipedia.org/wiki/Third_normal_form

        Then call what I did a "potentially good thing to do to databases". I
        should not have used the term "normalization".

        > You just seem to be eschewing nulls, which is laudable, and adding a *lot*
        > of complexity, which is not. Chris Date talks a lot about avoiding nulls
        > wherever you can because you end up with 3 value logic and that way lies
        > madness.

        If, for example, the values and labels, such as 'swallow_velocity',
        had each gone into their own separate table, then the result would be
        _more_ complex. Even if one then hid the complexity in a view.

        The root of my refactor is "roll up an unrolled loop." Like I said, we
        could also have rolled it up by making a list of fields, instead of
        records, to loop through.

        > In the Ruby example you are violating the "fat model, thin controller"
        > principle by putting the find_by logic in the view where it does not
        > belong, pushing this into the view just means you end up with stuff that's
        > brittle and very hard to test. Moving this into named methods on the model
        > class is what you should be doing, because you can do things like catch
        > exceptions (say a relationship is empty but you want to display a
        > meaningful default, it would be a nightmare to ram this in the view), and,
        > holy of holies, test it really easily.

        I said in a TODO comment to move the excess code to the model.

        > To test the view you'd have go into
        > a stubbing frenzy and it would be incredibly brittle.

        I wouldn't have a problem with testing that through the view, but of
        course the model comes first.

        > ActiveRecord also has a nice built in delegation system you can use if you
        > must go for this data model and keep it out of the view.

        And I was illustrating the actual refactor, not ActiveRecord. Yes it
        could wrap the leaf table.

        > There is also another separate discussion about how the now pervasive habit
        > of using synthetic id's as keys everywhere is a *bad* idea instead of the
        > true keys and other data that goes with them, but most ORM's aren't built
        > to do this so it's hard to do it right. But not doing it means it's really
        > easy to end up with orphaned or duplicate data unless you put in unique
        > indexes on the true keys and express the true relationships by defining
        > them in the underlying RDBMS.

        And I pointed out somewhere the metadata table could have been a real table.

        > I use AR because it's less work and everybody uses it, but it makes the
        > database nut in me cringe.

        Ah, that explains it.

        John Carter wrote:

        >> Normalised "null" free version...
        >> DROP TABLE IF EXISTS "favourite_color";
        >> CREATE TABLE "favourite_color" ("id" INTEGER PRIMARY KEY NOT NULL ,
        >> "color" TEXT NOT NULL );
        >> INSERT INTO "favourite_color" VALUES(1,'pink');

        Wat?

        oookay, if we are still on the "null rule", maybe a better rule is
        "sometimes you should replace several fields that are often all null,
        together, with a leaf table, whose entire record might not be there."

        >> A view to serve up _exactly_ what you had initially.
        >> CREATE VIEW "user_view" AS select

        Thanks but the idea was simplifying the code that non-database-nuts
        care about; the model & view.

        --
        Phlip
        http://c2.com/cgi/wiki?ZeekLand
      Your message has been successfully submitted and would be delivered to recipients shortly.