Loading ...
Sorry, an error occurred while loading the content.
 

Re: OT: OO data design and search performance

Expand Messages
  • Clinton Gormley
    ... Admittedly, the minority of searches would specify more than a few attributes, so what you say makes sense - my main concern was having to join loads of
    Message 1 of 12 , Apr 11, 2003
      On Fri, 2003-04-11 at 12:03, Aaron Trevena wrote:
      > Clinton Gormley wrote:
      >
      > You do not need to denormalise - Your objects should have a many to many
      > relationship with attributes, like being vegetarian. This many to many
      > relationship maps to a lookup table containing unique records of
      > attribute id and object id.
      >
      > This has many advantages - extra attributes are easy to add, you have a
      > normalised data structure and you have a reverse index that requires
      > only a very very simple join (join attribute table to lookup table and
      > object table by attribute and lookup id respectively). This join will
      > make good use of indexes for these tables and be pretty responsive. the
      > data (or rather the relationship) is only stored in one place and
      > therefore is always up to date and you don't have to worry about data
      > being incorrect or out of date.
      >
      > You very rarely need to flatten tables - good normalisation means you
      > can have a decent design and still have queries that perform at a decent
      > speed.

      Admittedly, the minority of searches would specify more than a few
      attributes, so what you say makes sense - my main concern was having to
      join loads of tables together.

      What about this issue : Locations form the first 6 levels of my object
      tree, linked with parent-child relationships, so
      World->Continent->Country->Region->City->District. Then (eg) places to
      stay are linked to Districts.

      You may want to find all Places to Stay in Syria - would it make sense
      here to have a flattened hierarchy (purely for searching rather than as
      the primary storage), or to still dynamically find all descendants of
      Syria? (Especially given that this filter would be applied with every
      search)

      thanks

      Clint
    • Nick Tonkin
      ... As another poster has remarked, why not have your attributes in a separate table? So long as the ID linking them to the main object is not the primary key
      Message 2 of 12 , Apr 11, 2003
        On Fri, 11 Apr 2003, Clinton Gormley wrote:

        > Hi all
        >
        > Sorry - this isn't a mod_perl question, but a lot of you will have
        > experience with this kind of issue.
        >
        > I am building a travel web site which allows you to navigate through
        > different objects (eg cities, places to stay etc). Each object has the
        > same base, and then has a number of different attributes depending on
        > the type (eg "Places to stay" needs an attribute about the rooms, while
        > "City" has a list of highlights in that city.) These attributes can be
        > added and removed by changing a single list for each object type.
        >
        > These objects are built from data in a number of different tables in my
        > database. These objects are cached and it's easy to navigate between
        > them.
        >
        > The problem comes when you want to search for something specific.
        >
        > So with an advanced search, you may want to specify a place to stay
        > which is budget, has air-conditioning and a vegetarian restaurant.
        >
        > It is clearly possible to build a search which ploughs through all of
        > the tables, but it would be much faster if I had one flat table for (eg)
        > places to stay with all of the attributes in different columns. My
        > original plan was to do just this - use a cron job to rebuild the flat
        > tables when any objects are changed. But this negates the flexibility of
        > the design ie instead of being able to add and remove attributes by
        > altering a single list, I now also need to change the structure of the
        > flat tables and the SQL which accesses them
        >
        > The search results are cached, but still, all those joins would give
        > quite a performance hit.
        >
        > Is this flattening of the tables a necessary evil, or is there a better
        > way to do it? If flat tables are the way to go, why duplicate
        > information by having the normalised tables as well?
        >
        > thanks
        >
        > Clint

        As another poster has remarked, why not have your attributes in a separate
        table? So long as the ID linking them to the main object is not the primary
        key you should be fine.

        Objects:
        ID Name
        1 foo
        2 bar

        Attributes:
        Object_ID Attribute
        1 blue
        1 squishy
        1 expensive
        2 red
        2 cheap


        etc. Then to add a new attribute you just add a value to the table rather than
        a new column named after the attribute and yes/no values.

        Then you can do something like:

        while (my $data = $sth->fetchrow_hashref) {
        ${ $object->{'attributes'} }{ $data->{'attribute' }++;
        }

        I hope this helps.

        - nick

        --

        ~~~~~~~~~~~~~~~~~~~~
        Nick Tonkin {|8^)>
      • Clinton Gormley
        ... Thanks Nick That s pretty much what I ve got - a table of objects, a table specifying which attributes each object type has, and a series of tables for
        Message 3 of 12 , Apr 11, 2003
          On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
          As another poster has remarked, why not have your attributes in a separate
          table? So long as the ID linking them to the main object is not the primary
          key you should be fine.
          
          Objects:
          ID			Name
          1			foo
          2			bar
          
          Attributes:
          Object_ID	Attribute
          1			blue
          1			squishy
          1			expensive
          2			red
          2			cheap
          
          
          etc. Then to add a new attribute you just add a value to the table rather than
          a new column named after the attribute and yes/no values.
          
          Thanks Nick

          That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

          The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

          Clint
        • Martin Moss
          Here s my forpence:- Why Have database Tables Defining which Attributes belong to a Class. Why not Have the attribuites existing in a config section of your
          Message 4 of 12 , Apr 11, 2003
            
            Here's my forpence:-
             
            Why Have database Tables Defining which Attributes belong to a Class.
            Why not Have the attribuites existing in a config section of your class.
             
             
            e.g here's part of a baseclass.
             
            package My::BaseClass;
             
            sub _table_name
            {
              # this accessor is calling-package-relative
              my $obclass = shift;
              my $class   = ref($obclass) || $obclass;
              my $varname = $class . "::_table_name";
              no strict "refs";   # to access package data symbolically
              return $$varname;
            }
             
            sub _table_id
            {
              # this accessor is calling-package-relative
              my $obclass = shift;
              my $class   = ref($obclass) || $obclass;
              my $varname = $class . "::_table_id";
              no strict "refs";   # to access package data symbolically
              return $$varname;
            }
             
            sub _delete
            {
              my $self = shift;
             
              my $table    = $self->_table_name;
              my $table_id = $self->_table_id;
             
              my $stm = qq{DELETE FROM $table WHERE $table_id = ?};
             
              if ($self->{dbh}->do($stm, undef, $self->id))
              {
                return 1;
              }
              else
              {
                return undef;
              }
            }
             
             
            Then create instance classes which use My::BaseClass as a base but define their own configs and
            e.g.:-
            package My::Restaurant
             
            use base (My::BaseClass);
             
            @fields =qw(
                            RestaurantID
                            Name
                            Description
                            vegetarian
                          );
             
            %Required = (
                     Name                 => 'Name',
                     Description          => 'Description',
                     Vegetarian             => 'Is Vegetarian',
                    );
             
            $_table_name    = 'Restaurants';
            $_table_id = 'RestaurantID';
             
             
             
            ___END___
             
            This would mean that you have seperate backend tables for each instance class. This would save on database Lookups, and give you the ability to override in the instance classes certain functions in the Base class. Indeed it would allow you to create specific SQL commands in an instance class which could do complex things like JOINS.  Ultimately JOINING tables in SQL is going to be more efficient than Joining data in perl.
             
             
             
            ----- Original Message -----
            Sent: Friday, April 11, 2003 12:55 PM
            Subject: Re: OT: OO data design and search performance

            On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
            As another poster has remarked, why not have your attributes in a separate
            table? So long as the ID linking them to the main object is not the primary
            key you should be fine.
            
            Objects:
            ID			Name
            1			foo
            2			bar
            
            Attributes:
            Object_ID	Attribute
            1			blue
            1			squishy
            1			expensive
            2			red
            2			cheap
            
            
            etc. Then to add a new attribute you just add a value to the table rather than
            a new column named after the attribute and yes/no values.
            
            Thanks Nick

            That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

            The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

            Clint
          • Perrin Harkins
            ... Flattening essentially caches the work of finding all descendants, so naturally it will be faster. Given that you are already building a search index, I
            Message 5 of 12 , Apr 11, 2003
              On Fri, 2003-04-11 at 06:20, Clinton Gormley wrote:
              > What about this issue : Locations form the first 6 levels of my object
              > tree, linked with parent-child relationships, so
              > World->Continent->Country->Region->City->District. Then (eg) places to
              > stay are linked to Districts.
              >
              > You may want to find all Places to Stay in Syria - would it make sense
              > here to have a flattened hierarchy (purely for searching rather than as
              > the primary storage), or to still dynamically find all descendants of
              > Syria? (Especially given that this filter would be applied with every
              > search)

              Flattening essentially caches the work of finding all descendants, so
              naturally it will be faster. Given that you are already building a
              search index, I would go ahead and add this as well.

              - Perrin
            • Perrin Harkins
              ... The other suggestions offered here sound fine to me, but for a radically different take on indexing while keeping flexibility, see XML::Comma
              Message 6 of 12 , Apr 11, 2003
                On Fri, 2003-04-11 at 05:42, Clinton Gormley wrote:
                > It is clearly possible to build a search which ploughs through all of
                > the tables, but it would be much faster if I had one flat table for (eg)
                > places to stay with all of the attributes in different columns. My
                > original plan was to do just this - use a cron job to rebuild the flat
                > tables when any objects are changed. But this negates the flexibility of
                > the design ie instead of being able to add and remove attributes by
                > altering a single list, I now also need to change the structure of the
                > flat tables and the SQL which accesses them

                The other suggestions offered here sound fine to me, but for a radically
                different take on indexing while keeping flexibility, see XML::Comma
                (http://xml-comma.org/).

                - Perrin
              • Clinton Gormley
                ... This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them). I wanted to do
                Message 7 of 12 , Apr 11, 2003
                  On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                  
                  Here's my forpence:-
                   
                  Why Have database Tables Defining which Attributes belong to a Class.
                  Why not Have the attribuites existing in a config section of your class.

                   
                   

                  This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                  I wanted to do :
                  http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                  Having said that, there has been the odd "if type==6" lines...

                  <OO tail firmly between legs>

                  Clint

                • Martin Moss
                  Don t sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it s allways better to do it
                  Message 8 of 12 , Apr 11, 2003
                    
                    Don't sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it's allways better to do it right first time, coz in 4 months time something usually crops up which results in a 'oh $&$%&&&**'.  Like if somebody says right thats it, we're no longer going to support mysql, we're gonna use oracle instead:-) (always a favourite!)
                     
                    One point of note, if all objects are so simillar, then surely, you can put your perl hat on to write a script which 'creates' all the code modules you need, you just feed it a Dummy Module template and a conf file containing the class specific stuff.
                     
                    Anyway The OO approach would slow down the speed of things, and I guess it all depends on a balance between speed and efficiency against futureproofing and manageability.
                     
                    Break a leg mate!
                     
                    Marty
                     
                    ----- Original Message -----
                    Sent: Friday, April 11, 2003 3:11 PM
                    Subject: Re: OT: OO data design and search performance

                    On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                    
                    Here's my forpence:-
                     
                    Why Have database Tables Defining which Attributes belong to a Class.
                    Why not Have the attribuites existing in a config section of your class.

                     
                     

                    This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                    I wanted to do :
                    http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                    Having said that, there has been the odd "if type==6" lines...

                    <OO tail firmly between legs>

                    Clint

                  • Carl Holm
                    I would second this approach. Nothing that you have described about your data modeling task would preclude the use of a fully normalized relational DB. Once
                    Message 9 of 12 , Apr 11, 2003
                      I would second this approach. Nothing that you have described about
                      your data modeling task would preclude the
                      use of a fully normalized relational DB. Once you have this built,
                      performance can be improved through the creation of
                      indexes (where applicable) and by optimizing the SQL.

                      Regards,

                      Carl Holm

                      Aaron Trevena wrote:

                      > Clinton Gormley wrote:
                      >
                      >> [ snip]
                      >> So with an advanced search, you may want to specify a place to stay
                      >> which is budget, has air-conditioning and a vegetarian restaurant.
                      >>
                      >> It is clearly possible to build a search which ploughs through all of
                      >> the tables, but it would be much faster if I had one flat table for
                      >> (eg) places to stay with all of the attributes in different columns.
                      >> My original plan was to do just this - use a cron job to rebuild the
                      >> flat tables when any objects are changed. But this negates the
                      >> flexibility of the design ie instead of being able to add and remove
                      >> attributes by altering a single list, I now also need to change the
                      >> structure of the flat tables and the SQL which accesses them
                      >
                      >
                      >
                      > You do not need to denormalise - Your objects should have a many to
                      > many relationship with attributes, like being vegetarian. This many to
                      > many relationship maps to a lookup table containing unique records of
                      > attribute id and object id.
                      >
                      > This has many advantages - extra attributes are easy to add, you have
                      > a normalised data structure and you have a reverse index that requires
                      > only a very very simple join (join attribute table to lookup table and
                      > object table by attribute and lookup id respectively). This join will
                      > make good use of indexes for these tables and be pretty responsive.
                      > the data (or rather the relationship) is only stored in one place and
                      > therefore is always up to date and you don't have to worry about data
                      > being incorrect or out of date.
                      >
                      > You very rarely need to flatten tables - good normalisation means you
                      > can have a decent design and still have queries that perform at a
                      > decent speed.
                      >
                      > regards,
                      >
                      > A.
                      >
                      >
                      >
                    Your message has been successfully submitted and would be delivered to recipients shortly.