Loading ...
Sorry, an error occurred while loading the content.

Re: OT: OO data design and search performance

Expand Messages
  • Nick Tonkin
    ... As another poster has remarked, why not have your attributes in a separate table? So long as the ID linking them to the main object is not the primary key
    Message 1 of 12 , Apr 11, 2003
    • 0 Attachment
      On Fri, 11 Apr 2003, Clinton Gormley wrote:

      > Hi all
      >
      > Sorry - this isn't a mod_perl question, but a lot of you will have
      > experience with this kind of issue.
      >
      > I am building a travel web site which allows you to navigate through
      > different objects (eg cities, places to stay etc). Each object has the
      > same base, and then has a number of different attributes depending on
      > the type (eg "Places to stay" needs an attribute about the rooms, while
      > "City" has a list of highlights in that city.) These attributes can be
      > added and removed by changing a single list for each object type.
      >
      > These objects are built from data in a number of different tables in my
      > database. These objects are cached and it's easy to navigate between
      > them.
      >
      > The problem comes when you want to search for something specific.
      >
      > So with an advanced search, you may want to specify a place to stay
      > which is budget, has air-conditioning and a vegetarian restaurant.
      >
      > It is clearly possible to build a search which ploughs through all of
      > the tables, but it would be much faster if I had one flat table for (eg)
      > places to stay with all of the attributes in different columns. My
      > original plan was to do just this - use a cron job to rebuild the flat
      > tables when any objects are changed. But this negates the flexibility of
      > the design ie instead of being able to add and remove attributes by
      > altering a single list, I now also need to change the structure of the
      > flat tables and the SQL which accesses them
      >
      > The search results are cached, but still, all those joins would give
      > quite a performance hit.
      >
      > Is this flattening of the tables a necessary evil, or is there a better
      > way to do it? If flat tables are the way to go, why duplicate
      > information by having the normalised tables as well?
      >
      > thanks
      >
      > Clint

      As another poster has remarked, why not have your attributes in a separate
      table? So long as the ID linking them to the main object is not the primary
      key you should be fine.

      Objects:
      ID Name
      1 foo
      2 bar

      Attributes:
      Object_ID Attribute
      1 blue
      1 squishy
      1 expensive
      2 red
      2 cheap


      etc. Then to add a new attribute you just add a value to the table rather than
      a new column named after the attribute and yes/no values.

      Then you can do something like:

      while (my $data = $sth->fetchrow_hashref) {
      ${ $object->{'attributes'} }{ $data->{'attribute' }++;
      }

      I hope this helps.

      - nick

      --

      ~~~~~~~~~~~~~~~~~~~~
      Nick Tonkin {|8^)>
    • Clinton Gormley
      ... Thanks Nick That s pretty much what I ve got - a table of objects, a table specifying which attributes each object type has, and a series of tables for
      Message 2 of 12 , Apr 11, 2003
      • 0 Attachment
        On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
        As another poster has remarked, why not have your attributes in a separate
        table? So long as the ID linking them to the main object is not the primary
        key you should be fine.
        
        Objects:
        ID			Name
        1			foo
        2			bar
        
        Attributes:
        Object_ID	Attribute
        1			blue
        1			squishy
        1			expensive
        2			red
        2			cheap
        
        
        etc. Then to add a new attribute you just add a value to the table rather than
        a new column named after the attribute and yes/no values.
        
        Thanks Nick

        That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

        The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

        Clint
      • Martin Moss
        Here s my forpence:- Why Have database Tables Defining which Attributes belong to a Class. Why not Have the attribuites existing in a config section of your
        Message 3 of 12 , Apr 11, 2003
        • 0 Attachment
          
          Here's my forpence:-
           
          Why Have database Tables Defining which Attributes belong to a Class.
          Why not Have the attribuites existing in a config section of your class.
           
           
          e.g here's part of a baseclass.
           
          package My::BaseClass;
           
          sub _table_name
          {
            # this accessor is calling-package-relative
            my $obclass = shift;
            my $class   = ref($obclass) || $obclass;
            my $varname = $class . "::_table_name";
            no strict "refs";   # to access package data symbolically
            return $$varname;
          }
           
          sub _table_id
          {
            # this accessor is calling-package-relative
            my $obclass = shift;
            my $class   = ref($obclass) || $obclass;
            my $varname = $class . "::_table_id";
            no strict "refs";   # to access package data symbolically
            return $$varname;
          }
           
          sub _delete
          {
            my $self = shift;
           
            my $table    = $self->_table_name;
            my $table_id = $self->_table_id;
           
            my $stm = qq{DELETE FROM $table WHERE $table_id = ?};
           
            if ($self->{dbh}->do($stm, undef, $self->id))
            {
              return 1;
            }
            else
            {
              return undef;
            }
          }
           
           
          Then create instance classes which use My::BaseClass as a base but define their own configs and
          e.g.:-
          package My::Restaurant
           
          use base (My::BaseClass);
           
          @fields =qw(
                          RestaurantID
                          Name
                          Description
                          vegetarian
                        );
           
          %Required = (
                   Name                 => 'Name',
                   Description          => 'Description',
                   Vegetarian             => 'Is Vegetarian',
                  );
           
          $_table_name    = 'Restaurants';
          $_table_id = 'RestaurantID';
           
           
           
          ___END___
           
          This would mean that you have seperate backend tables for each instance class. This would save on database Lookups, and give you the ability to override in the instance classes certain functions in the Base class. Indeed it would allow you to create specific SQL commands in an instance class which could do complex things like JOINS.  Ultimately JOINING tables in SQL is going to be more efficient than Joining data in perl.
           
           
           
          ----- Original Message -----
          Sent: Friday, April 11, 2003 12:55 PM
          Subject: Re: OT: OO data design and search performance

          On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
          As another poster has remarked, why not have your attributes in a separate
          table? So long as the ID linking them to the main object is not the primary
          key you should be fine.
          
          Objects:
          ID			Name
          1			foo
          2			bar
          
          Attributes:
          Object_ID	Attribute
          1			blue
          1			squishy
          1			expensive
          2			red
          2			cheap
          
          
          etc. Then to add a new attribute you just add a value to the table rather than
          a new column named after the attribute and yes/no values.
          
          Thanks Nick

          That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

          The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

          Clint
        • Perrin Harkins
          ... Flattening essentially caches the work of finding all descendants, so naturally it will be faster. Given that you are already building a search index, I
          Message 4 of 12 , Apr 11, 2003
          • 0 Attachment
            On Fri, 2003-04-11 at 06:20, Clinton Gormley wrote:
            > What about this issue : Locations form the first 6 levels of my object
            > tree, linked with parent-child relationships, so
            > World->Continent->Country->Region->City->District. Then (eg) places to
            > stay are linked to Districts.
            >
            > You may want to find all Places to Stay in Syria - would it make sense
            > here to have a flattened hierarchy (purely for searching rather than as
            > the primary storage), or to still dynamically find all descendants of
            > Syria? (Especially given that this filter would be applied with every
            > search)

            Flattening essentially caches the work of finding all descendants, so
            naturally it will be faster. Given that you are already building a
            search index, I would go ahead and add this as well.

            - Perrin
          • Perrin Harkins
            ... The other suggestions offered here sound fine to me, but for a radically different take on indexing while keeping flexibility, see XML::Comma
            Message 5 of 12 , Apr 11, 2003
            • 0 Attachment
              On Fri, 2003-04-11 at 05:42, Clinton Gormley wrote:
              > It is clearly possible to build a search which ploughs through all of
              > the tables, but it would be much faster if I had one flat table for (eg)
              > places to stay with all of the attributes in different columns. My
              > original plan was to do just this - use a cron job to rebuild the flat
              > tables when any objects are changed. But this negates the flexibility of
              > the design ie instead of being able to add and remove attributes by
              > altering a single list, I now also need to change the structure of the
              > flat tables and the SQL which accesses them

              The other suggestions offered here sound fine to me, but for a radically
              different take on indexing while keeping flexibility, see XML::Comma
              (http://xml-comma.org/).

              - Perrin
            • Clinton Gormley
              ... This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them). I wanted to do
              Message 6 of 12 , Apr 11, 2003
              • 0 Attachment
                On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                
                Here's my forpence:-
                 
                Why Have database Tables Defining which Attributes belong to a Class.
                Why not Have the attribuites existing in a config section of your class.

                 
                 

                This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                I wanted to do :
                http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                Having said that, there has been the odd "if type==6" lines...

                <OO tail firmly between legs>

                Clint

              • Martin Moss
                Don t sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it s allways better to do it
                Message 7 of 12 , Apr 11, 2003
                • 0 Attachment
                  
                  Don't sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it's allways better to do it right first time, coz in 4 months time something usually crops up which results in a 'oh $&$%&&&**'.  Like if somebody says right thats it, we're no longer going to support mysql, we're gonna use oracle instead:-) (always a favourite!)
                   
                  One point of note, if all objects are so simillar, then surely, you can put your perl hat on to write a script which 'creates' all the code modules you need, you just feed it a Dummy Module template and a conf file containing the class specific stuff.
                   
                  Anyway The OO approach would slow down the speed of things, and I guess it all depends on a balance between speed and efficiency against futureproofing and manageability.
                   
                  Break a leg mate!
                   
                  Marty
                   
                  ----- Original Message -----
                  Sent: Friday, April 11, 2003 3:11 PM
                  Subject: Re: OT: OO data design and search performance

                  On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                  
                  Here's my forpence:-
                   
                  Why Have database Tables Defining which Attributes belong to a Class.
                  Why not Have the attribuites existing in a config section of your class.

                   
                   

                  This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                  I wanted to do :
                  http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                  Having said that, there has been the odd "if type==6" lines...

                  <OO tail firmly between legs>

                  Clint

                • Carl Holm
                  I would second this approach. Nothing that you have described about your data modeling task would preclude the use of a fully normalized relational DB. Once
                  Message 8 of 12 , Apr 11, 2003
                  • 0 Attachment
                    I would second this approach. Nothing that you have described about
                    your data modeling task would preclude the
                    use of a fully normalized relational DB. Once you have this built,
                    performance can be improved through the creation of
                    indexes (where applicable) and by optimizing the SQL.

                    Regards,

                    Carl Holm

                    Aaron Trevena wrote:

                    > Clinton Gormley wrote:
                    >
                    >> [ snip]
                    >> So with an advanced search, you may want to specify a place to stay
                    >> which is budget, has air-conditioning and a vegetarian restaurant.
                    >>
                    >> It is clearly possible to build a search which ploughs through all of
                    >> the tables, but it would be much faster if I had one flat table for
                    >> (eg) places to stay with all of the attributes in different columns.
                    >> My original plan was to do just this - use a cron job to rebuild the
                    >> flat tables when any objects are changed. But this negates the
                    >> flexibility of the design ie instead of being able to add and remove
                    >> attributes by altering a single list, I now also need to change the
                    >> structure of the flat tables and the SQL which accesses them
                    >
                    >
                    >
                    > You do not need to denormalise - Your objects should have a many to
                    > many relationship with attributes, like being vegetarian. This many to
                    > many relationship maps to a lookup table containing unique records of
                    > attribute id and object id.
                    >
                    > This has many advantages - extra attributes are easy to add, you have
                    > a normalised data structure and you have a reverse index that requires
                    > only a very very simple join (join attribute table to lookup table and
                    > object table by attribute and lookup id respectively). This join will
                    > make good use of indexes for these tables and be pretty responsive.
                    > the data (or rather the relationship) is only stored in one place and
                    > therefore is always up to date and you don't have to worry about data
                    > being incorrect or out of date.
                    >
                    > You very rarely need to flatten tables - good normalisation means you
                    > can have a decent design and still have queries that perform at a
                    > decent speed.
                    >
                    > regards,
                    >
                    > A.
                    >
                    >
                    >
                  Your message has been successfully submitted and would be delivered to recipients shortly.