Loading ...
Sorry, an error occurred while loading the content.
Skip to search.
 

RE: OO data design and search performance

Expand Messages
  • Gareth Kirwan
    The first thought I had here was VIEW. Gareth ... From: Clinton Gormley [mailto:clint@drtech.co.uk] Sent: 11 April 2003 10:42 To: modperl Subject: OT: OO data
    Message 1 of 12 , Apr 11, 2003
      The first thought I had here was VIEW.

      Gareth
      -----Original Message-----
      From: Clinton Gormley [mailto:clint@...]
      Sent: 11 April 2003 10:42
      To: modperl
      Subject: OT: OO data design and search performance

      Hi all

      Sorry - this isn't a mod_perl question, but a lot of you will have experience with this kind of issue.

      I am building a travel web site which allows you to navigate through different objects (eg cities, places to stay etc).  Each object has the same base, and then has a number of different attributes depending on the type (eg "Places to stay" needs an attribute about the rooms, while "City" has a list of highlights in that city.) These attributes can be added and removed by changing a single list for each object type.

      These objects are built from data in a number of different tables in my database. These objects are cached and it's easy to navigate between them.

      The problem comes when you want to search for something specific. 

      So with an advanced search, you may want  to specify a place to stay which is budget, has air-conditioning and a vegetarian restaurant.

      It is clearly possible to build a search which ploughs through all of the tables, but it would be much faster if I had one flat table for (eg) places to stay with all of the attributes in different columns. My original plan was to do just this - use a cron job to rebuild the flat tables when any objects are changed. But this negates the flexibility of the design ie instead of being able to add and remove attributes by altering a single list, I now also need to change the structure of the flat tables and the SQL which accesses them

      The search results are cached, but still, all those joins would give quite a performance hit.

      Is this flattening of the tables a necessary evil, or is there a better way to do it? If flat tables are the way to go, why duplicate information by having the normalised tables as well?

      thanks

      Clint


    • Clinton Gormley
      ... Admittedly, the minority of searches would specify more than a few attributes, so what you say makes sense - my main concern was having to join loads of
      Message 2 of 12 , Apr 11, 2003
        On Fri, 2003-04-11 at 12:03, Aaron Trevena wrote:
        > Clinton Gormley wrote:
        >
        > You do not need to denormalise - Your objects should have a many to many
        > relationship with attributes, like being vegetarian. This many to many
        > relationship maps to a lookup table containing unique records of
        > attribute id and object id.
        >
        > This has many advantages - extra attributes are easy to add, you have a
        > normalised data structure and you have a reverse index that requires
        > only a very very simple join (join attribute table to lookup table and
        > object table by attribute and lookup id respectively). This join will
        > make good use of indexes for these tables and be pretty responsive. the
        > data (or rather the relationship) is only stored in one place and
        > therefore is always up to date and you don't have to worry about data
        > being incorrect or out of date.
        >
        > You very rarely need to flatten tables - good normalisation means you
        > can have a decent design and still have queries that perform at a decent
        > speed.

        Admittedly, the minority of searches would specify more than a few
        attributes, so what you say makes sense - my main concern was having to
        join loads of tables together.

        What about this issue : Locations form the first 6 levels of my object
        tree, linked with parent-child relationships, so
        World->Continent->Country->Region->City->District. Then (eg) places to
        stay are linked to Districts.

        You may want to find all Places to Stay in Syria - would it make sense
        here to have a flattened hierarchy (purely for searching rather than as
        the primary storage), or to still dynamically find all descendants of
        Syria? (Especially given that this filter would be applied with every
        search)

        thanks

        Clint
      • Nick Tonkin
        ... As another poster has remarked, why not have your attributes in a separate table? So long as the ID linking them to the main object is not the primary key
        Message 3 of 12 , Apr 11, 2003
          On Fri, 11 Apr 2003, Clinton Gormley wrote:

          > Hi all
          >
          > Sorry - this isn't a mod_perl question, but a lot of you will have
          > experience with this kind of issue.
          >
          > I am building a travel web site which allows you to navigate through
          > different objects (eg cities, places to stay etc). Each object has the
          > same base, and then has a number of different attributes depending on
          > the type (eg "Places to stay" needs an attribute about the rooms, while
          > "City" has a list of highlights in that city.) These attributes can be
          > added and removed by changing a single list for each object type.
          >
          > These objects are built from data in a number of different tables in my
          > database. These objects are cached and it's easy to navigate between
          > them.
          >
          > The problem comes when you want to search for something specific.
          >
          > So with an advanced search, you may want to specify a place to stay
          > which is budget, has air-conditioning and a vegetarian restaurant.
          >
          > It is clearly possible to build a search which ploughs through all of
          > the tables, but it would be much faster if I had one flat table for (eg)
          > places to stay with all of the attributes in different columns. My
          > original plan was to do just this - use a cron job to rebuild the flat
          > tables when any objects are changed. But this negates the flexibility of
          > the design ie instead of being able to add and remove attributes by
          > altering a single list, I now also need to change the structure of the
          > flat tables and the SQL which accesses them
          >
          > The search results are cached, but still, all those joins would give
          > quite a performance hit.
          >
          > Is this flattening of the tables a necessary evil, or is there a better
          > way to do it? If flat tables are the way to go, why duplicate
          > information by having the normalised tables as well?
          >
          > thanks
          >
          > Clint

          As another poster has remarked, why not have your attributes in a separate
          table? So long as the ID linking them to the main object is not the primary
          key you should be fine.

          Objects:
          ID Name
          1 foo
          2 bar

          Attributes:
          Object_ID Attribute
          1 blue
          1 squishy
          1 expensive
          2 red
          2 cheap


          etc. Then to add a new attribute you just add a value to the table rather than
          a new column named after the attribute and yes/no values.

          Then you can do something like:

          while (my $data = $sth->fetchrow_hashref) {
          ${ $object->{'attributes'} }{ $data->{'attribute' }++;
          }

          I hope this helps.

          - nick

          --

          ~~~~~~~~~~~~~~~~~~~~
          Nick Tonkin {|8^)>
        • Clinton Gormley
          ... Thanks Nick That s pretty much what I ve got - a table of objects, a table specifying which attributes each object type has, and a series of tables for
          Message 4 of 12 , Apr 11, 2003
            On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
            As another poster has remarked, why not have your attributes in a separate
            table? So long as the ID linking them to the main object is not the primary
            key you should be fine.
            
            Objects:
            ID			Name
            1			foo
            2			bar
            
            Attributes:
            Object_ID	Attribute
            1			blue
            1			squishy
            1			expensive
            2			red
            2			cheap
            
            
            etc. Then to add a new attribute you just add a value to the table rather than
            a new column named after the attribute and yes/no values.
            
            Thanks Nick

            That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

            The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

            Clint
          • Martin Moss
            Here s my forpence:- Why Have database Tables Defining which Attributes belong to a Class. Why not Have the attribuites existing in a config section of your
            Message 5 of 12 , Apr 11, 2003
              
              Here's my forpence:-
               
              Why Have database Tables Defining which Attributes belong to a Class.
              Why not Have the attribuites existing in a config section of your class.
               
               
              e.g here's part of a baseclass.
               
              package My::BaseClass;
               
              sub _table_name
              {
                # this accessor is calling-package-relative
                my $obclass = shift;
                my $class   = ref($obclass) || $obclass;
                my $varname = $class . "::_table_name";
                no strict "refs";   # to access package data symbolically
                return $$varname;
              }
               
              sub _table_id
              {
                # this accessor is calling-package-relative
                my $obclass = shift;
                my $class   = ref($obclass) || $obclass;
                my $varname = $class . "::_table_id";
                no strict "refs";   # to access package data symbolically
                return $$varname;
              }
               
              sub _delete
              {
                my $self = shift;
               
                my $table    = $self->_table_name;
                my $table_id = $self->_table_id;
               
                my $stm = qq{DELETE FROM $table WHERE $table_id = ?};
               
                if ($self->{dbh}->do($stm, undef, $self->id))
                {
                  return 1;
                }
                else
                {
                  return undef;
                }
              }
               
               
              Then create instance classes which use My::BaseClass as a base but define their own configs and
              e.g.:-
              package My::Restaurant
               
              use base (My::BaseClass);
               
              @fields =qw(
                              RestaurantID
                              Name
                              Description
                              vegetarian
                            );
               
              %Required = (
                       Name                 => 'Name',
                       Description          => 'Description',
                       Vegetarian             => 'Is Vegetarian',
                      );
               
              $_table_name    = 'Restaurants';
              $_table_id = 'RestaurantID';
               
               
               
              ___END___
               
              This would mean that you have seperate backend tables for each instance class. This would save on database Lookups, and give you the ability to override in the instance classes certain functions in the Base class. Indeed it would allow you to create specific SQL commands in an instance class which could do complex things like JOINS.  Ultimately JOINING tables in SQL is going to be more efficient than Joining data in perl.
               
               
               
              ----- Original Message -----
              Sent: Friday, April 11, 2003 12:55 PM
              Subject: Re: OT: OO data design and search performance

              On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
              As another poster has remarked, why not have your attributes in a separate
              table? So long as the ID linking them to the main object is not the primary
              key you should be fine.
              
              Objects:
              ID			Name
              1			foo
              2			bar
              
              Attributes:
              Object_ID	Attribute
              1			blue
              1			squishy
              1			expensive
              2			red
              2			cheap
              
              
              etc. Then to add a new attribute you just add a value to the table rather than
              a new column named after the attribute and yes/no values.
              
              Thanks Nick

              That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

              The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

              Clint
            • Perrin Harkins
              ... Flattening essentially caches the work of finding all descendants, so naturally it will be faster. Given that you are already building a search index, I
              Message 6 of 12 , Apr 11, 2003
                On Fri, 2003-04-11 at 06:20, Clinton Gormley wrote:
                > What about this issue : Locations form the first 6 levels of my object
                > tree, linked with parent-child relationships, so
                > World->Continent->Country->Region->City->District. Then (eg) places to
                > stay are linked to Districts.
                >
                > You may want to find all Places to Stay in Syria - would it make sense
                > here to have a flattened hierarchy (purely for searching rather than as
                > the primary storage), or to still dynamically find all descendants of
                > Syria? (Especially given that this filter would be applied with every
                > search)

                Flattening essentially caches the work of finding all descendants, so
                naturally it will be faster. Given that you are already building a
                search index, I would go ahead and add this as well.

                - Perrin
              • Perrin Harkins
                ... The other suggestions offered here sound fine to me, but for a radically different take on indexing while keeping flexibility, see XML::Comma
                Message 7 of 12 , Apr 11, 2003
                  On Fri, 2003-04-11 at 05:42, Clinton Gormley wrote:
                  > It is clearly possible to build a search which ploughs through all of
                  > the tables, but it would be much faster if I had one flat table for (eg)
                  > places to stay with all of the attributes in different columns. My
                  > original plan was to do just this - use a cron job to rebuild the flat
                  > tables when any objects are changed. But this negates the flexibility of
                  > the design ie instead of being able to add and remove attributes by
                  > altering a single list, I now also need to change the structure of the
                  > flat tables and the SQL which accesses them

                  The other suggestions offered here sound fine to me, but for a radically
                  different take on indexing while keeping flexibility, see XML::Comma
                  (http://xml-comma.org/).

                  - Perrin
                • Clinton Gormley
                  ... This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them). I wanted to do
                  Message 8 of 12 , Apr 11, 2003
                    On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                    
                    Here's my forpence:-
                     
                    Why Have database Tables Defining which Attributes belong to a Class.
                    Why not Have the attribuites existing in a config section of your class.

                     
                     

                    This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                    I wanted to do :
                    http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                    Having said that, there has been the odd "if type==6" lines...

                    <OO tail firmly between legs>

                    Clint

                  • Martin Moss
                    Don t sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it s allways better to do it
                    Message 9 of 12 , Apr 11, 2003
                      
                      Don't sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it's allways better to do it right first time, coz in 4 months time something usually crops up which results in a 'oh $&$%&&&**'.  Like if somebody says right thats it, we're no longer going to support mysql, we're gonna use oracle instead:-) (always a favourite!)
                       
                      One point of note, if all objects are so simillar, then surely, you can put your perl hat on to write a script which 'creates' all the code modules you need, you just feed it a Dummy Module template and a conf file containing the class specific stuff.
                       
                      Anyway The OO approach would slow down the speed of things, and I guess it all depends on a balance between speed and efficiency against futureproofing and manageability.
                       
                      Break a leg mate!
                       
                      Marty
                       
                      ----- Original Message -----
                      Sent: Friday, April 11, 2003 3:11 PM
                      Subject: Re: OT: OO data design and search performance

                      On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                      
                      Here's my forpence:-
                       
                      Why Have database Tables Defining which Attributes belong to a Class.
                      Why not Have the attribuites existing in a config section of your class.

                       
                       

                      This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                      I wanted to do :
                      http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                      Having said that, there has been the odd "if type==6" lines...

                      <OO tail firmly between legs>

                      Clint

                    • Carl Holm
                      I would second this approach. Nothing that you have described about your data modeling task would preclude the use of a fully normalized relational DB. Once
                      Message 10 of 12 , Apr 11, 2003
                        I would second this approach. Nothing that you have described about
                        your data modeling task would preclude the
                        use of a fully normalized relational DB. Once you have this built,
                        performance can be improved through the creation of
                        indexes (where applicable) and by optimizing the SQL.

                        Regards,

                        Carl Holm

                        Aaron Trevena wrote:

                        > Clinton Gormley wrote:
                        >
                        >> [ snip]
                        >> So with an advanced search, you may want to specify a place to stay
                        >> which is budget, has air-conditioning and a vegetarian restaurant.
                        >>
                        >> It is clearly possible to build a search which ploughs through all of
                        >> the tables, but it would be much faster if I had one flat table for
                        >> (eg) places to stay with all of the attributes in different columns.
                        >> My original plan was to do just this - use a cron job to rebuild the
                        >> flat tables when any objects are changed. But this negates the
                        >> flexibility of the design ie instead of being able to add and remove
                        >> attributes by altering a single list, I now also need to change the
                        >> structure of the flat tables and the SQL which accesses them
                        >
                        >
                        >
                        > You do not need to denormalise - Your objects should have a many to
                        > many relationship with attributes, like being vegetarian. This many to
                        > many relationship maps to a lookup table containing unique records of
                        > attribute id and object id.
                        >
                        > This has many advantages - extra attributes are easy to add, you have
                        > a normalised data structure and you have a reverse index that requires
                        > only a very very simple join (join attribute table to lookup table and
                        > object table by attribute and lookup id respectively). This join will
                        > make good use of indexes for these tables and be pretty responsive.
                        > the data (or rather the relationship) is only stored in one place and
                        > therefore is always up to date and you don't have to worry about data
                        > being incorrect or out of date.
                        >
                        > You very rarely need to flatten tables - good normalisation means you
                        > can have a decent design and still have queries that perform at a
                        > decent speed.
                        >
                        > regards,
                        >
                        > A.
                        >
                        >
                        >
                      Your message has been successfully submitted and would be delivered to recipients shortly.