Loading ...
Sorry, an error occurred while loading the content.
 

Re: OT: OO data design and search performance

Expand Messages
  • Aaron Trevena
    ... You do not need to denormalise - Your objects should have a many to many relationship with attributes, like being vegetarian. This many to many
    Message 1 of 12 , Apr 11, 2003
      Clinton Gormley wrote:

      > [ snip]
      > So with an advanced search, you may want to specify a place to stay
      > which is budget, has air-conditioning and a vegetarian restaurant.
      >
      > It is clearly possible to build a search which ploughs through all of
      > the tables, but it would be much faster if I had one flat table for
      > (eg) places to stay with all of the attributes in different columns.
      > My original plan was to do just this - use a cron job to rebuild the
      > flat tables when any objects are changed. But this negates the
      > flexibility of the design ie instead of being able to add and remove
      > attributes by altering a single list, I now also need to change the
      > structure of the flat tables and the SQL which accesses them


      You do not need to denormalise - Your objects should have a many to many
      relationship with attributes, like being vegetarian. This many to many
      relationship maps to a lookup table containing unique records of
      attribute id and object id.

      This has many advantages - extra attributes are easy to add, you have a
      normalised data structure and you have a reverse index that requires
      only a very very simple join (join attribute table to lookup table and
      object table by attribute and lookup id respectively). This join will
      make good use of indexes for these tables and be pretty responsive. the
      data (or rather the relationship) is only stored in one place and
      therefore is always up to date and you don't have to worry about data
      being incorrect or out of date.

      You very rarely need to flatten tables - good normalisation means you
      can have a decent design and still have queries that perform at a decent
      speed.

      regards,

      A.
    • Gareth Kirwan
      The first thought I had here was VIEW. Gareth ... From: Clinton Gormley [mailto:clint@drtech.co.uk] Sent: 11 April 2003 10:42 To: modperl Subject: OT: OO data
      Message 2 of 12 , Apr 11, 2003
        The first thought I had here was VIEW.

        Gareth
        -----Original Message-----
        From: Clinton Gormley [mailto:clint@...]
        Sent: 11 April 2003 10:42
        To: modperl
        Subject: OT: OO data design and search performance

        Hi all

        Sorry - this isn't a mod_perl question, but a lot of you will have experience with this kind of issue.

        I am building a travel web site which allows you to navigate through different objects (eg cities, places to stay etc).  Each object has the same base, and then has a number of different attributes depending on the type (eg "Places to stay" needs an attribute about the rooms, while "City" has a list of highlights in that city.) These attributes can be added and removed by changing a single list for each object type.

        These objects are built from data in a number of different tables in my database. These objects are cached and it's easy to navigate between them.

        The problem comes when you want to search for something specific. 

        So with an advanced search, you may want  to specify a place to stay which is budget, has air-conditioning and a vegetarian restaurant.

        It is clearly possible to build a search which ploughs through all of the tables, but it would be much faster if I had one flat table for (eg) places to stay with all of the attributes in different columns. My original plan was to do just this - use a cron job to rebuild the flat tables when any objects are changed. But this negates the flexibility of the design ie instead of being able to add and remove attributes by altering a single list, I now also need to change the structure of the flat tables and the SQL which accesses them

        The search results are cached, but still, all those joins would give quite a performance hit.

        Is this flattening of the tables a necessary evil, or is there a better way to do it? If flat tables are the way to go, why duplicate information by having the normalised tables as well?

        thanks

        Clint


      • Clinton Gormley
        ... Admittedly, the minority of searches would specify more than a few attributes, so what you say makes sense - my main concern was having to join loads of
        Message 3 of 12 , Apr 11, 2003
          On Fri, 2003-04-11 at 12:03, Aaron Trevena wrote:
          > Clinton Gormley wrote:
          >
          > You do not need to denormalise - Your objects should have a many to many
          > relationship with attributes, like being vegetarian. This many to many
          > relationship maps to a lookup table containing unique records of
          > attribute id and object id.
          >
          > This has many advantages - extra attributes are easy to add, you have a
          > normalised data structure and you have a reverse index that requires
          > only a very very simple join (join attribute table to lookup table and
          > object table by attribute and lookup id respectively). This join will
          > make good use of indexes for these tables and be pretty responsive. the
          > data (or rather the relationship) is only stored in one place and
          > therefore is always up to date and you don't have to worry about data
          > being incorrect or out of date.
          >
          > You very rarely need to flatten tables - good normalisation means you
          > can have a decent design and still have queries that perform at a decent
          > speed.

          Admittedly, the minority of searches would specify more than a few
          attributes, so what you say makes sense - my main concern was having to
          join loads of tables together.

          What about this issue : Locations form the first 6 levels of my object
          tree, linked with parent-child relationships, so
          World->Continent->Country->Region->City->District. Then (eg) places to
          stay are linked to Districts.

          You may want to find all Places to Stay in Syria - would it make sense
          here to have a flattened hierarchy (purely for searching rather than as
          the primary storage), or to still dynamically find all descendants of
          Syria? (Especially given that this filter would be applied with every
          search)

          thanks

          Clint
        • Nick Tonkin
          ... As another poster has remarked, why not have your attributes in a separate table? So long as the ID linking them to the main object is not the primary key
          Message 4 of 12 , Apr 11, 2003
            On Fri, 11 Apr 2003, Clinton Gormley wrote:

            > Hi all
            >
            > Sorry - this isn't a mod_perl question, but a lot of you will have
            > experience with this kind of issue.
            >
            > I am building a travel web site which allows you to navigate through
            > different objects (eg cities, places to stay etc). Each object has the
            > same base, and then has a number of different attributes depending on
            > the type (eg "Places to stay" needs an attribute about the rooms, while
            > "City" has a list of highlights in that city.) These attributes can be
            > added and removed by changing a single list for each object type.
            >
            > These objects are built from data in a number of different tables in my
            > database. These objects are cached and it's easy to navigate between
            > them.
            >
            > The problem comes when you want to search for something specific.
            >
            > So with an advanced search, you may want to specify a place to stay
            > which is budget, has air-conditioning and a vegetarian restaurant.
            >
            > It is clearly possible to build a search which ploughs through all of
            > the tables, but it would be much faster if I had one flat table for (eg)
            > places to stay with all of the attributes in different columns. My
            > original plan was to do just this - use a cron job to rebuild the flat
            > tables when any objects are changed. But this negates the flexibility of
            > the design ie instead of being able to add and remove attributes by
            > altering a single list, I now also need to change the structure of the
            > flat tables and the SQL which accesses them
            >
            > The search results are cached, but still, all those joins would give
            > quite a performance hit.
            >
            > Is this flattening of the tables a necessary evil, or is there a better
            > way to do it? If flat tables are the way to go, why duplicate
            > information by having the normalised tables as well?
            >
            > thanks
            >
            > Clint

            As another poster has remarked, why not have your attributes in a separate
            table? So long as the ID linking them to the main object is not the primary
            key you should be fine.

            Objects:
            ID Name
            1 foo
            2 bar

            Attributes:
            Object_ID Attribute
            1 blue
            1 squishy
            1 expensive
            2 red
            2 cheap


            etc. Then to add a new attribute you just add a value to the table rather than
            a new column named after the attribute and yes/no values.

            Then you can do something like:

            while (my $data = $sth->fetchrow_hashref) {
            ${ $object->{'attributes'} }{ $data->{'attribute' }++;
            }

            I hope this helps.

            - nick

            --

            ~~~~~~~~~~~~~~~~~~~~
            Nick Tonkin {|8^)>
          • Clinton Gormley
            ... Thanks Nick That s pretty much what I ve got - a table of objects, a table specifying which attributes each object type has, and a series of tables for
            Message 5 of 12 , Apr 11, 2003
              On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
              As another poster has remarked, why not have your attributes in a separate
              table? So long as the ID linking them to the main object is not the primary
              key you should be fine.
              
              Objects:
              ID			Name
              1			foo
              2			bar
              
              Attributes:
              Object_ID	Attribute
              1			blue
              1			squishy
              1			expensive
              2			red
              2			cheap
              
              
              etc. Then to add a new attribute you just add a value to the table rather than
              a new column named after the attribute and yes/no values.
              
              Thanks Nick

              That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

              The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

              Clint
            • Martin Moss
              Here s my forpence:- Why Have database Tables Defining which Attributes belong to a Class. Why not Have the attribuites existing in a config section of your
              Message 6 of 12 , Apr 11, 2003
                
                Here's my forpence:-
                 
                Why Have database Tables Defining which Attributes belong to a Class.
                Why not Have the attribuites existing in a config section of your class.
                 
                 
                e.g here's part of a baseclass.
                 
                package My::BaseClass;
                 
                sub _table_name
                {
                  # this accessor is calling-package-relative
                  my $obclass = shift;
                  my $class   = ref($obclass) || $obclass;
                  my $varname = $class . "::_table_name";
                  no strict "refs";   # to access package data symbolically
                  return $$varname;
                }
                 
                sub _table_id
                {
                  # this accessor is calling-package-relative
                  my $obclass = shift;
                  my $class   = ref($obclass) || $obclass;
                  my $varname = $class . "::_table_id";
                  no strict "refs";   # to access package data symbolically
                  return $$varname;
                }
                 
                sub _delete
                {
                  my $self = shift;
                 
                  my $table    = $self->_table_name;
                  my $table_id = $self->_table_id;
                 
                  my $stm = qq{DELETE FROM $table WHERE $table_id = ?};
                 
                  if ($self->{dbh}->do($stm, undef, $self->id))
                  {
                    return 1;
                  }
                  else
                  {
                    return undef;
                  }
                }
                 
                 
                Then create instance classes which use My::BaseClass as a base but define their own configs and
                e.g.:-
                package My::Restaurant
                 
                use base (My::BaseClass);
                 
                @fields =qw(
                                RestaurantID
                                Name
                                Description
                                vegetarian
                              );
                 
                %Required = (
                         Name                 => 'Name',
                         Description          => 'Description',
                         Vegetarian             => 'Is Vegetarian',
                        );
                 
                $_table_name    = 'Restaurants';
                $_table_id = 'RestaurantID';
                 
                 
                 
                ___END___
                 
                This would mean that you have seperate backend tables for each instance class. This would save on database Lookups, and give you the ability to override in the instance classes certain functions in the Base class. Indeed it would allow you to create specific SQL commands in an instance class which could do complex things like JOINS.  Ultimately JOINING tables in SQL is going to be more efficient than Joining data in perl.
                 
                 
                 
                ----- Original Message -----
                Sent: Friday, April 11, 2003 12:55 PM
                Subject: Re: OT: OO data design and search performance

                On Fri, 2003-04-11 at 13:22, Nick Tonkin wrote:
                As another poster has remarked, why not have your attributes in a separate
                table? So long as the ID linking them to the main object is not the primary
                key you should be fine.
                
                Objects:
                ID			Name
                1			foo
                2			bar
                
                Attributes:
                Object_ID	Attribute
                1			blue
                1			squishy
                1			expensive
                2			red
                2			cheap
                
                
                etc. Then to add a new attribute you just add a value to the table rather than
                a new column named after the attribute and yes/no values.
                
                Thanks Nick

                That's pretty much what I've got - a table of objects, a table specifying which attributes each object type has, and a series of tables for attributes of different types (eg lists, text, more complicated structures like address, images etc)

                The only reason I was thinking about denormalising was for performance reasons, but Aaron's reassurances have placated me.

                Clint
              • Perrin Harkins
                ... Flattening essentially caches the work of finding all descendants, so naturally it will be faster. Given that you are already building a search index, I
                Message 7 of 12 , Apr 11, 2003
                  On Fri, 2003-04-11 at 06:20, Clinton Gormley wrote:
                  > What about this issue : Locations form the first 6 levels of my object
                  > tree, linked with parent-child relationships, so
                  > World->Continent->Country->Region->City->District. Then (eg) places to
                  > stay are linked to Districts.
                  >
                  > You may want to find all Places to Stay in Syria - would it make sense
                  > here to have a flattened hierarchy (purely for searching rather than as
                  > the primary storage), or to still dynamically find all descendants of
                  > Syria? (Especially given that this filter would be applied with every
                  > search)

                  Flattening essentially caches the work of finding all descendants, so
                  naturally it will be faster. Given that you are already building a
                  search index, I would go ahead and add this as well.

                  - Perrin
                • Perrin Harkins
                  ... The other suggestions offered here sound fine to me, but for a radically different take on indexing while keeping flexibility, see XML::Comma
                  Message 8 of 12 , Apr 11, 2003
                    On Fri, 2003-04-11 at 05:42, Clinton Gormley wrote:
                    > It is clearly possible to build a search which ploughs through all of
                    > the tables, but it would be much faster if I had one flat table for (eg)
                    > places to stay with all of the attributes in different columns. My
                    > original plan was to do just this - use a cron job to rebuild the flat
                    > tables when any objects are changed. But this negates the flexibility of
                    > the design ie instead of being able to add and remove attributes by
                    > altering a single list, I now also need to change the structure of the
                    > flat tables and the SQL which accesses them

                    The other suggestions offered here sound fine to me, but for a radically
                    different take on indexing while keeping flexibility, see XML::Comma
                    (http://xml-comma.org/).

                    - Perrin
                  • Clinton Gormley
                    ... This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them). I wanted to do
                    Message 9 of 12 , Apr 11, 2003
                      On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                      
                      Here's my forpence:-
                       
                      Why Have database Tables Defining which Attributes belong to a Class.
                      Why not Have the attribuites existing in a config section of your class.

                       
                       

                      This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                      I wanted to do :
                      http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                      Having said that, there has been the odd "if type==6" lines...

                      <OO tail firmly between legs>

                      Clint

                    • Martin Moss
                      Don t sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it s allways better to do it
                      Message 10 of 12 , Apr 11, 2003
                        
                        Don't sweat it, in my humble opinion the point of perl is to make life easy for oneself. I guess though that in my experience it's allways better to do it right first time, coz in 4 months time something usually crops up which results in a 'oh $&$%&&&**'.  Like if somebody says right thats it, we're no longer going to support mysql, we're gonna use oracle instead:-) (always a favourite!)
                         
                        One point of note, if all objects are so simillar, then surely, you can put your perl hat on to write a script which 'creates' all the code modules you need, you just feed it a Dummy Module template and a conf file containing the class specific stuff.
                         
                        Anyway The OO approach would slow down the speed of things, and I guess it all depends on a balance between speed and efficiency against futureproofing and manageability.
                         
                        Break a leg mate!
                         
                        Marty
                         
                        ----- Original Message -----
                        Sent: Friday, April 11, 2003 3:11 PM
                        Subject: Re: OT: OO data design and search performance

                        On Fri, 2003-04-11 at 14:28, Martin Moss wrote:
                        
                        Here's my forpence:-
                         
                        Why Have database Tables Defining which Attributes belong to a Class.
                        Why not Have the attribuites existing in a config section of your class.

                         
                         

                        This was my original plan, but with very few exceptions, the objects are exactly the same (bar the list of attributes associated with them).

                        I wanted to do :
                        http://website.com/browse?object_id=123. I suppose I could have loaded the object type from the database, and then blessed that object into the relevant sub-class, but it's been easier to maintain everything through a single table.

                        Having said that, there has been the odd "if type==6" lines...

                        <OO tail firmly between legs>

                        Clint

                      • Carl Holm
                        I would second this approach. Nothing that you have described about your data modeling task would preclude the use of a fully normalized relational DB. Once
                        Message 11 of 12 , Apr 11, 2003
                          I would second this approach. Nothing that you have described about
                          your data modeling task would preclude the
                          use of a fully normalized relational DB. Once you have this built,
                          performance can be improved through the creation of
                          indexes (where applicable) and by optimizing the SQL.

                          Regards,

                          Carl Holm

                          Aaron Trevena wrote:

                          > Clinton Gormley wrote:
                          >
                          >> [ snip]
                          >> So with an advanced search, you may want to specify a place to stay
                          >> which is budget, has air-conditioning and a vegetarian restaurant.
                          >>
                          >> It is clearly possible to build a search which ploughs through all of
                          >> the tables, but it would be much faster if I had one flat table for
                          >> (eg) places to stay with all of the attributes in different columns.
                          >> My original plan was to do just this - use a cron job to rebuild the
                          >> flat tables when any objects are changed. But this negates the
                          >> flexibility of the design ie instead of being able to add and remove
                          >> attributes by altering a single list, I now also need to change the
                          >> structure of the flat tables and the SQL which accesses them
                          >
                          >
                          >
                          > You do not need to denormalise - Your objects should have a many to
                          > many relationship with attributes, like being vegetarian. This many to
                          > many relationship maps to a lookup table containing unique records of
                          > attribute id and object id.
                          >
                          > This has many advantages - extra attributes are easy to add, you have
                          > a normalised data structure and you have a reverse index that requires
                          > only a very very simple join (join attribute table to lookup table and
                          > object table by attribute and lookup id respectively). This join will
                          > make good use of indexes for these tables and be pretty responsive.
                          > the data (or rather the relationship) is only stored in one place and
                          > therefore is always up to date and you don't have to worry about data
                          > being incorrect or out of date.
                          >
                          > You very rarely need to flatten tables - good normalisation means you
                          > can have a decent design and still have queries that perform at a
                          > decent speed.
                          >
                          > regards,
                          >
                          > A.
                          >
                          >
                          >
                        Your message has been successfully submitted and would be delivered to recipients shortly.