Re: [firebird-support] MS SQL Vs. Firebird Again

  • Alexandre Benson Smith
    Message 1 of 14, Feb 1, 2004
      At 05:15 01/10/2004 -0700, you wrote:
      ... big snip ...
      > Count, at worst, _should_ do an index space scan on the primary key
      > rather than a table space scan. Min and max should be able to operate
      > equally well against ascending and descending indices. There typically
      > is only one or two more I/Os (10 to 20 milliseconds) involved in locating
      > the last entry in a balanced index versus the first entry. These are
      > obvious optimizations; when I have time I will dig into the optimizer
      > and identify the places to correct this, and I will post the suggested
      > change back here. Note that "improve the optimizer" is on the todo list
      > on the Firebird sourceforge page.
      > /dj

      I thought that because FB indexes are compressed (RLE encoded, AFAIR) one
      could not traverse them in the reverse direction.

      Do we still need compressed indexes?
      What do we get:
      1.) More entries per index page (very good)
      2.) The index uses less space (but should disk space be treated as a high
      priority nowadays?)
      3.) ????

      Could an option on the CREATE INDEX statement to disable compression be a
      choice?

      Could the index be stored compressed on the disk data page, while the
      cached page stores the uncompressed version of the same page once it is
      read? If so, once those pages are in memory, could we use the index in
      both directions?



      > In this case, we may also be identifying a minor weakness in the
      > optimizer. Without changing the basic design premise, Count(*) should
      > run against the index with the smallest footprint that is guaranteed to
      > include pointers to all rows, rather than the table space itself.

      I once asked why FB does not use just the index when a select asks only
      for indexed columns, like:

      select name from person where name like 'Ale%'

      if you have an index on column "name" the engine could just scan the index
      to find the data you want, and would not need to access each data page.

      But because of MGA, FB HAS to look at the data page to get the
      transaction # for each record, to determine whether the record is or
      isn't visible to the current transaction. So I think count(*) has to read
      each record for the same reason, and therefore a table scan is cheaper...
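
      To make that concrete, here is a rough sketch of the visibility test
      being described (the names and structures are invented for illustration,
      not Firebird's actual internals): the index yields candidates, but each
      one still costs a data-page read to check its transaction id.

      // Illustrative sketch only -- invented names, not Firebird internals.
      #include <set>
      #include <vector>
      #include <cstdint>

      struct RecordVersion {
          uint32_t created_by;   // id of the transaction that wrote this version
      };

      struct Snapshot {
          uint32_t my_id;
          std::set<uint32_t> committed_before_me;
          bool sees(const RecordVersion& rv) const {
              return rv.created_by == my_id ||
                     committed_before_me.count(rv.created_by) != 0;
          }
      };

      // count(*) under MGA: the index entry carries no transaction id, so
      // every candidate requires visiting its record on a data page.
      long count_visible(const Snapshot& snap,
                         const std::vector<RecordVersion>& records)
      {
          long n = 0;
          for (const RecordVersion& rv : records)   // one data-page visit each
              if (snap.sees(rv))
                  n++;
          return n;
      }

      int main()
      {
          Snapshot me{8, {3, 5, 7}};                    // transaction 8's view
          std::vector<RecordVersion> recs{{3}, {5}, {9}, {10}};
          return (int) count_visible(me, recs);         // sees 3 and 5 -> 2
      }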



      I think that with the new C++ code we will get interesting things brought
      by inheritance.

      I thought we could have different kinds of tables (memory tables, for
      example) and indexes (compressed, uncompressed, B*-tree, hash, clustered
      (a la MSSQL), etc.).

      See you...


      Alexandre Benson Smith
      Development
      THOR Software e Comercial Ltda.
      Santo Andre - Sao Paulo - Brazil
      www.thorsoftware.com.br

    • Aage Johansen
      Message 2 of 14, Feb 1, 2004
        On Sun, 1 Feb 2004 21:07:59 +0000 (UTC), David Johnson wrote:

        >> ...
        >
        > Hmmm ... so Firebird does not support index-only scans. That is
        > interesting and important to know. Index-only scans can result in
        > orders-of-magnitude improvements in performance in some queries. It
        > does not impact my immediate project, but I have seen the addition of
        > an index for the sole purpose of acquiring index-only scans on other
        > DBMSs improve performance of a complex operation from 30 minutes
        > (after prepare) to 10 seconds (after prepare). The actual prepare took
        > 10 minutes both with and without the additional index, but using
        > precompiled and static SQL it never impacted my users. It looks like I
        > need to write a couple more tests. :o)


        One reason for not being able to use a 'pure index scan' is that Fb has
        to look at the records themselves to determine whether your transaction
        really can 'see' each record.
        AFAIU, this is also the reason why a count(*) has to look at all the
        records - i.e., do a table scan.


        --
        Aage J.
      • Alan McDonald
        Message 3 of 14, Feb 1, 2004
          > Using an 8k page size and keys+pointers entries of more or less 50
          > characters, you get roughly 160 index entries per page. The first
          > two layers of the B+ tree can address 160^2 records if the
          > indices are full. A third layer addresses 4,096,000 records. A
          > max operation against an ascending index on a 4,000,000 row table
          > should only need to read three pages maximum. This is consistent
          > with the results I achieved in the random read tests. Note that
          > my primary key is quite large - using an integer key would result
          > in even greater efficiency in indexing.

          So this is a small but valid argument against using GUID/UUIDs for the
          primary key?
          Alan
        • David Johnson
          Message 4 of 14, Feb 1, 2004
            It is a valid point, but you need to do the math for the specific problem being resolved to decide how valid. With small index column sizes you need much more detail than I have at my fingertips to compute a real value, but let's say that you go to a 4-byte integer key. You lose 34 bytes from the index entry footprint, giving 50-34=16 bytes per entry plus some overheads. On an 8k page this gives you roughly 500 entries. At two levels in the B+ tree you address 250,000 rows. At three levels you can address 125,000,000 rows.

            If your table space is expected to grow to 4,000,000 rows then you still have to expand the index to three levels, but you can go a lot further on three levels than you can with a larger key.
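
            To make the arithmetic above reusable, here is a small sketch (the entry sizes are the assumed figures from this thread, not measured Firebird numbers) that computes how many B+ tree levels a given row count needs:

            // Sketch of the fan-out arithmetic above; entry sizes are assumptions.
            #include <cstdio>

            // Levels a B+ tree needs to address row_count rows, given
            // entries_per_page entries on each page.
            int levels_needed(long long row_count, long long entries_per_page)
            {
                int levels = 1;
                long long addressable = entries_per_page;
                while (addressable < row_count) {
                    addressable *= entries_per_page; // each level multiplies reach
                    levels++;
                }
                return levels;
            }

            int main()
            {
                const long long page = 8192;
                printf("50-byte key: %d levels for 4,000,000 rows\n",
                       levels_needed(4000000, page / 50));   // ~160/page -> 3
                printf("16-byte key: %d levels for 4,000,000 rows\n",
                       levels_needed(4000000, page / 16));   // ~512/page -> 3
                printf("16-byte key: %d levels for 125,000,000 rows\n",
                       levels_needed(125000000, page / 16)); // still 3
                return 0;
            }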

            GUIDs are ideal for distributed systems, but they are not the most efficient way of handling centralized systems.

            It's a matter of the favorite hammer. Sometimes you need a paintbrush, and a hammer just won't do. You can't let your (or my) favorite solution be the only solution you know because it is not appropriate for every situation.
            ----- Original Message -----
            From: Alan McDonald
            To: firebird-support@yahoogroups.com
            Sent: Sunday, February 01, 2004 2:32 PM
            Subject: RE: [firebird-support] MS SQL Vs. Firebird Again


            > Using an 8k page size and keys+pointers entries of more or less 50
            > characters, you get roughly 160 index entries per page. The first
            > two layers of the B+ tree can address 160^2 records if the
            > indices are full. A third layer addresses 4,096,000 records. A
            > max operation against an ascending index on a 4,000,000 row table
            > should only need to read three pages maximum. This is consistent
            > with the results I achieved in the random read tests. Note that
            > my primary key is quite large - using an integer key would result
            > in even greater efficiency in indexing.

            So this is a small but valid argument against using GUID/UUIDs for the
            primary key?
            Alan



          • Ann W. Harrison
            Message 5 of 14, Feb 2, 2004
              >David Johnson wrote:
              > > Count, at worst, _should_ do an index space scan on the primary key
              > > rather than a table space scan.

              Multi-generational concurrency, as implemented in Firebird,
              prevents counts from using an index rather than the table
              space. Two simultaneous transactions can have different
              views of the same database and different counts of various
              sections of tables. The consistency of a transaction is
              maintained with transaction ids, which are not available
              in the index. The only way to verify that an index entry
              indicates a record that is committed and appropriate to a
              particular transaction is to read the associated record.

              "Well," you say, "Why not put the transaction id in the
              index?" Two reasons - it increases the size of index entries
              considerably, makes all accesses slower, and triples the
              number of page writes to update a record.


              Suppose you've got some record - call it Sam - which
              has a key value that has been set four times, by
              transactions 3, 5, 7, and 9. 3 stored the record with a
              key value of 'a'. 5 changed the value to 'b'. 7 changed
              it back to 'a'. 9 changed it to 'f'. All transactions
              committed sequentially.

              Under the current index management scheme, there would be
              three index entries pointing to Sam - 'a', 'b', and 'f'.
              Note that the two instances of 'a' are represented by a
              single index entry. Adding transaction ids to the record
              versions would require a second 'a' entry.

              That may sound trivial, but most key fields are stable
              so most records have only one entry in each index. Having
              to update each index each time a record is modified would
              significantly increase the I/O load.

              Suppose you're transaction 8 and you're looking for 'a's.
              You'd find two, both pointing to Sam. Ok, fine, so you
              add a filter to your count that eliminates duplicate references
              to the same record.

              Suppose you're transaction 6 and you're looking for 'a's.
              You find the one created by transaction 3. There are no
              duplicates, so you keep it. The fact that the actual value
              that your transaction would see is 'b' never enters the
              picture.

              OK, so keep both the id of the transaction that created the
              version and the id of the transaction that obsoleted it.
              That's bigger, sure, and will require even more I/O, as two
              entries are modified for every index at every modification
              of a record, and entries must be made when a record is
              deleted.

              If counts are a semantic part of your application, then keep
              them as logical parts of the schema rather than relying on
              physical artifacts. Counts can be maintained deadlock free -
              there are explanations in the mers archive on ibphoenix.com.

              > > Min and max should be able to operate
              > > equally well against ascending and descending indices. There
              > > typically is only one or two more I/Os (10 to 20 milliseconds)
              > > involved in locating the last entry in a balanced index versus
              > > the first entry. These are obvious optimizations; when I have
              > > time I will dig into the optimizer and identify the places to
              > > correct this, and I will post the suggested change back here.

              Be slightly careful of your assumptions here. The index structure
              is dynamic and can not be locked. Walking down the left hand side
              of the tree is reasonably easy because there's an "absolute 0" node
              at the beginning of each level. Walking down the right hand side
              is more difficult because between the time you pick up the right
              hand pointer to the next level down and the time that you get that
              page, the page may have split. In a split the old page is to the
              left of the new page (part of the reason walking the left is easier).

              Suppose you get down to the leaf level, find a target max value,
              read the data and discover that the record is not appropriate for
              your transaction - too old, too new, or rolled back. In getting
              the record, you've lost your lock on the leaf page and have to
              start again at the top.

              The ODS was designed to be deadlock free. Deadlock-free structures
              cannot loop in the pointer graph. Although indexes now contain
              left as well as right pointers, the left pointers are not guaranteed
              to work. The second rule of being deadlock free is to hold no more
              than one page lock at any time. Perhaps you see why this is hard?

              At 02:29 PM 2/1/2004, Alexandre Benson Smith wrote:

              >I thought that because FB indexes are compressed (RLE encoded,
              >AFAIR) one could not traverse them in the reverse direction.

              It's not RLE, it's prefix and suffix compression. Trailing spaces
              in text keys and trailing zeros are compressed in numeric (double
              precision) keys. The prefix compression eliminates all leading
              bytes that duplicate the preceding key value.

              Prefix compression is not performed on the first node on each
              page. You can't walk a single page backward, but you can - or
              could absent concurrency issues - walk the leaf level backward.
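
              Here is a rough sketch of the prefix compression being described
              (a simplified encoding of my own for illustration, not the actual
              ODS node format). Note how expanding a node depends on the node
              before it, which is exactly why a page can only be walked forward:

              // Simplified prefix-compression sketch (not the real ODS layout).
              // Each node after the first stores (prefix_len, suffix): how many
              // leading bytes it shares with the previous key, plus the rest.
              #include <string>
              #include <vector>
              #include <cstdio>

              struct CompressedNode {
                  size_t      prefix_len;  // bytes reused from the previous key
                  std::string suffix;      // this key's remaining bytes
              };

              std::vector<CompressedNode>
              compress_page(const std::vector<std::string>& sorted_keys)
              {
                  std::vector<CompressedNode> page;
                  std::string prev;        // empty: first node is stored whole
                  for (const std::string& key : sorted_keys) {
                      size_t p = 0;
                      while (p < prev.size() && p < key.size() && prev[p] == key[p])
                          p++;
                      page.push_back({p, key.substr(p)});
                      prev = key;
                  }
                  return page;
              }

              // Expanding node i needs the expanded key of node i-1, so the
              // page must be walked front to back, never backward.
              std::string expand(const std::vector<CompressedNode>& page, size_t i)
              {
                  std::string key;
                  for (size_t j = 0; j <= i; j++)
                      key = key.substr(0, page[j].prefix_len) + page[j].suffix;
                  return key;
              }

              int main()
              {
                  auto page = compress_page({"Alexander", "Alexandre", "Alexis"});
                  printf("%s\n", expand(page, 2).c_str());  // prints "Alexis"
                  return 0;
              }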

              >Do we still need compressed indexes?
              >What do we get:
              >1.) More entries per index page (very good)
              >2.) The index uses less space (but should disk space be treated as
              >a high priority nowadays?)

              It's not space, it's the I/O load. More entries per page -> fewer
              reads to get an index range.

              >Could an option on the CREATE INDEX statement to disable
              >compression be a choice?

              No - compression is not the reason why you can't walk the right side
              of an index. Concurrent activity is the reason.

              >Could the index be stored compressed on the disk data page, while
              >the cached page stores the uncompressed version of the same page
              >once it is read? If so, once those pages are in memory, could we
              >use the index in both directions?

              As above. That's not the problem.

              > David Johnson again.
              > > In this case, we may also be identifying a minor weakness in the
              > > optimizer. Without changing the basic design premise, Count(*) should
              > > run against the index with the smallest footprint that is guaranteed to
              > > include pointers to all rows, rather than the table space itself.

              As above, not true.

              Alexandre Benson Smith wrote:

              >I once asked why FB does not use just the index when a select asks
              >only for indexed columns, like:
              >
              >select name from person where name like 'Ale%'

              In that case, it will use an index to locate all candidate records,
              but must then read each of the candidates to ensure that it does
              actually fall into the set of records available to the transaction.

              >I thought we could have different kinds of tables (memory tables,
              >for example) and indexes (compressed, uncompressed, B*-tree, hash,
              >clustered (a la MSSQL), etc.).

              One of the major design goals for Firebird was simplicity and part
              of simplicity is finding the best way to work with a limited set of
              tuning parameters. In your list above, the only thing I agree with
              is memory tables, and even then, I'm not at all sure they're necessary.
              I see no benefit to uncompressed indexes. Bitmap index access greatly
              reduces the need for clustered indexes without introducing the need
              for the database designer to see into the future when defining his
              table structures. And bitmap index access works for all indexes, not
              just the primary key. Hash indexes are fine in memory, but an untuned
              hash index is worse than no index at all.

              Regards,


              Ann
            • Alexandre Benson Smith
              Message 6 of 14, Feb 2, 2004
                At 12:39 02/02/2004 -0500, you wrote:

                ... snip ...

                > >I thought that because FB indexes are compressed (RLE encoded,
                > >AFAIR) one could not traverse them in the reverse direction.
                >
                >It's not RLE, it's prefix and suffix compression. Trailing spaces
                >in text keys and trailing zeros are compressed in numeric (double
                >precision) keys. The prefix compression eliminates all leading
                >bytes that duplicate the preceding key value.
                >
                >Prefix compression is not performed on the first node on each
                >page. You can't walk a single page backward, but you can - or
                >could absent concurrency issues - walk the leaf level backward.


                Ann, one more time, thank you for your explanations... I really
                like reading your comments...

                > >Do we still need compressed indexes?
                > >What do we get:
                > >1.) More entries per index page (very good)
                > >2.) The index uses less space (but should disk space be treated
                > >as a high priority nowadays?)
                >
                >It's not space, it's the I/O load. More entries per page -> fewer
                >reads to get an index range.

                Ok... So item 1 (More entries per index page (very good))... is the real
                reason... Crystal clear....

                > >Could an option on the CREATE INDEX statement to disable
                > >compression be a choice?
                >
                >No - compression is not the reason why you can't walk the right side
                >of an index. Concurrent activity is the reason.

                OK, well explained above...

                > >Could the index be stored compressed on the disk data page,
                > >while the cached page stores the uncompressed version of the
                > >same page once it is read? If so, once those pages are in
                > >memory, could we use the index in both directions?
                >
                >As above. That's not the problem
                >
                > > David Johnson again.
                > > > In this case, we may also be identifying a minor weakness in the
                > > > optimizer. Without changing the basic design premise, Count(*) should
                > > > run against the index with the smallest footprint that is guaranteed to
                > > > include pointers to all rows, rather than the table space itself.
                >
                >As above, not true.
                >
                >Alexandre Benson Smith wrote:
                >
                > >I once asked why FB does not use just the index when a select
                > >asks only for indexed columns, like:
                > >
                > >select name from person where name like 'Ale%'
                >
                >In that case, it will use an index to locate all candidate records,
                >but must then read each of the candidates to ensure that it does
                >actually fall into the set of records available to the transaction.

                Yeah... you taught me this some months ago... In my original post
                I told David that I had asked the same thing... and why it cannot
                be accomplished... Understood...


                > >I thought we could have different kinds of tables (memory
                > >tables, for example) and indexes (compressed, uncompressed,
                > >B*-tree, hash, clustered (a la MSSQL), etc.)
                >
                >One of the major design goals for Firebird was simplicity and part
                >of simplicity is finding the best way to work with a limited set of
                >tuning parameters. In your list above, the only thing I agree with
                >is memory tables, and even then, I'm not at all sure they're necessary.
                >I see no benefit to uncompressed indexes. Bitmap index access greatly
                >reduces the need for clustered indexes without introducing the need
                >for the database designer to see into the future when defining his
                >table structures. And bitmap index access works for all indexes, not
                >just the primary key. Hash indexes are fine in memory, but an untuned
                >hash index is worse than no index at all.



                I love Firebird's simplicity... and HATE Oracle's monstrosity (a
                manual of more than 1000 pages named "Getting Started" makes me
                laugh).

                What I intended to say was:

                With FB 2.0 we "could" (not will) get different kinds of indexes
                and tables. I think there could be cases where a clustered index
                would be better, and it would be easier for fellow FB developers
                to implement different approaches to searching, storage, etc...
                there could be superclasses that abstract the implementation
                details. I don't know if implementing another kind of index would
                be better for FB performance (besides, other RDBMSs do offer
                different kinds of indexes, but FB uses MGA and the others
                don't... so the best for Oracle's architecture could be the worst
                for FB)... Only her mother knows... :-).

                I think that if different kinds of indexes could bring better
                performance for some kinds of tables, it would be good if FB
                implemented them, keeping B*-tree as the default if it's the best
                for overall use; but having a choice is better than not having
                one...

                The best part of this discussion (or better, of your discourse)...
                is that I can learn a lot... :)


                >Regards,
                >
                >
                >Ann

                See you, and thank you again for telling us how "your son" works...


                Alexandre Benson Smith
                Development
                THOR Software e Comercial Ltda.
                Santo Andre - Sao Paulo - Brazil
                www.thorsoftware.com.br

              • David Johnson
                Message 7 of 14, Feb 2, 2004
                  Thank You Ann!

                  The only way to get better info than this is to go through the code, and your explanation is much faster to read and more to the point.

                  For my purposes, it is definitely not worth the amount of effort that it sounds like it would take to work out the optimizations I was considering. I will accept this as a design tradeoff in the DBMS. These do not impact my particular application - only my assumptions/knowledge of how Firebird internals work have changed so far.

                  I was presuming on a one-way index system, such as I am most familiar with in DB2 - i.e., there would be no "right hand pointers" to work with.

                  In the light of your clarification of how active transactions are handled, the presence or absence of a buffer pool would impact the hypothesized optimization of the max/min operators using my approach, making it either straightforward or impossible.

                  Does Firebird support a concept analogous to the buffer pool in DB2? In DB2, as long as a page is in the buffer pool there is no physical I/O against transactions involving that page, except possibly the initial read to the buffer pool. The commit to physical storage (as opposed to the transaction commit) generally occurs in a separate thread from the application transaction processing, freeing the application from the I/O performance constraints most of the time.
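
                  (For readers unfamiliar with the term, here is a bare-bones sketch of the buffer-pool idea being described -- a generic illustration, not Firebird's or DB2's actual cache. Reads are served from memory when the page is resident, and dirty pages are flushed by a separate writer, off the transaction's critical path.)

                  // Generic buffer-pool sketch; not Firebird's or DB2's code.
                  #include <map>
                  #include <vector>
                  #include <cstdint>

                  struct Page {
                      std::vector<uint8_t> bytes;
                      bool dirty = false;
                  };

                  class BufferPool {
                      std::map<uint32_t, Page> cache_;  // page number -> page
                  public:
                      Page& fetch(uint32_t page_no) {
                          auto it = cache_.find(page_no);
                          if (it == cache_.end()) {
                              // miss: one physical read; later access is in memory
                              Page p;
                              read_from_disk(page_no, p.bytes);
                              it = cache_.emplace(page_no, std::move(p)).first;
                          }
                          return it->second;
                      }
                      void modify(uint32_t page_no) { fetch(page_no).dirty = true; }

                      // Run by a background writer, not the user's transaction.
                      void flush_dirty() {
                          for (auto& [no, page] : cache_)
                              if (page.dirty) {
                                  write_to_disk(no, page.bytes);
                                  page.dirty = false;
                              }
                      }
                  private:
                      // stand-ins for real physical I/O
                      static void read_from_disk(uint32_t, std::vector<uint8_t>&) {}
                      static void write_to_disk(uint32_t, const std::vector<uint8_t>&) {}
                  };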


                • David Johnson
                  Message 8 of 14, Oct 1, 2004
                    ----- Original Message -----
                    From: Helen Borrie
                    To: firebird-support@yahoogroups.com
                    Sent: Saturday, January 31, 2004 6:44 PM
                    Subject: Re: [firebird-support] MS SQL Vs. Firebird Again


                    At 07:01 PM 30/09/2004 -0700, you wrote:
                    >I am starting to be of the opinion that the best option is to just use the
                    >GDS32.dll directly. The components are cute, and work well for
                    >prototyping, but they appear to be lacking adequate stability for serious
                    >enterprise-scale use, IMHO.

                    Sounds a fairly uninformed statement...

                    dj
                    Spoken by someone who has used and sworn by (and occasionally at) Borland Pascal Products for almost 20 years, and has been up and down both the published and portions of the unpublished source code in the VCL library. Delphi is still my favorite hammer, but it is not my only one. When you spend more time coding around the bugs than using the tool, it's time to change tools.

                    What I am trying to accomplish right now seems to be exposing all of the bugs and design weaknesses in Borland's database connectivity approach. Much of my own current frustration comes from trying to use capabilities that I know Interbase/Firebird and other RDBMS have, but that the VCL design principles preclude.

                    I am still investigating options, but I am becoming disenchanted with the VCL architecture for some purposes, including Interbase/Firebird connectivity.
                    /dj

                    >Also bear in mind that the timestamp conversion function in gds32.dll
                    >drops the milliseconds, so you may want to write your own replacement for
                    >that.

                    That is not necessary. An external function GetExactTimestamp() is
                    available in the fbudf library which is deployed with Firebird.
                    dj
                    I will check this out - but I think I may have been unclear in my statement. Check below.
                    /dj


                    >It's not a big deal, but it'll save you a lot of hunting when you know in
                    >advance that the fractional seconds are truncated at the GDS32 layer from
                    >all times that are stored.

                    Actually, they are truncated on the server - a kind of "lowest common
                    denominator" to genericise the return value to support OS's that can't
                    return subseconds. You can store subseconds down to ten-thousandths of a
                    second and, if you do, the server has no problem processing them and the
                    client has no problem returning them to the application. It seems to me
                    you are deriving a lot of your "wisdom" from using bad client tools...
                    dj
                    I would hesitate to use "wisdom" to describe my opinions, but I would agree that my findings to date are that many client-side tools are not very good at all. I am quite new to Firebird use, but I am not a novice with RDBMSs in general.

                    The database data structure clearly supports fractional seconds. But you must not use the isc_encode_xxx and isc_decode_xxx functions in the gds for times (the method used by the Borland connectivity options) if you want to preserve that precision in your own code. Notice that the fractional seconds are explicitly not translated. I was not certain whether it was by design or oversight. You are suggesting it was by design.

                    Here are the code snippets from gds.cpp (Firebird 1.5 RC8 source):

                    void API_ROUTINE isc_decode_timestamp(GDS_TIMESTAMP * date, void *times_arg)
                    {
                    /**************************************
                     *
                     * i s c _ d e c o d e _ t i m e s t a m p
                     *
                     **************************************
                     *
                     * Functional description
                     * Convert from internal timestamp format to UNIX time structure.
                     *
                     * Note: the date argument is really an ISC_TIMESTAMP -- however the
                     * definition of ISC_TIMESTAMP is not available from all the source
                     * modules that need to use isc_decode_timestamp
                     *
                     **************************************/
                        SLONG minutes;
                        struct tm *times;

                        times = (struct tm *) times_arg;
                        memset(times, 0, sizeof(*times));

                        ndate(date->timestamp_date, times);
                        times->tm_yday = yday(times);
                        if ((times->tm_wday = (date->timestamp_date + 3) % 7) < 0)
                            times->tm_wday += 7;

                        minutes = date->timestamp_time / (ISC_TIME_SECONDS_PRECISION * 60);
                        times->tm_hour = minutes / 60;
                        times->tm_min = minutes % 60;
                        /* Integer division discards the sub-second fraction here;
                           struct tm has no field that could receive it. */
                        times->tm_sec = (date->timestamp_time / ISC_TIME_SECONDS_PRECISION) % 60;
                    }

                    void API_ROUTINE isc_encode_timestamp(void *times_arg, GDS_TIMESTAMP * date)
                    {
                    /**************************************
                     *
                     * i s c _ e n c o d e _ t i m e s t a m p
                     *
                     **************************************
                     *
                     * Functional description
                     * Convert from UNIX time structure to internal timestamp format.
                     *
                     * Note: the date argument is really an ISC_TIMESTAMP -- however the
                     * definition of ISC_TIMESTAMP is not available from all the source
                     * modules that need to use isc_encode_timestamp
                     *
                     **************************************/
                        struct tm *times;

                        times = (struct tm *) times_arg;

                        date->timestamp_date = nday(times);
                        /* Built from whole hours, minutes, and seconds only, so the
                           encoded time can never carry a sub-second fraction. */
                        date->timestamp_time =
                            ((times->tm_hour * 60 + times->tm_min) * 60 +
                             times->tm_sec) * ISC_TIME_SECONDS_PRECISION;
                    }
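
                    To see the truncation concretely, here is a standalone sketch (my own demonstration, not Firebird source) that mimics the arithmetic above. ISC_TIME_SECONDS_PRECISION is 10000, i.e. the internal time unit is 1/10000 of a second; the decode step divides the fraction away and the encode step has nothing to restore:

                    // Demonstration only: round-tripping a time through the
                    // decode/encode arithmetic above loses the fraction.
                    #include <cstdio>

                    const long ISC_TIME_SECONDS_PRECISION = 10000;

                    int main()
                    {
                        // 12:34:56.7890 in internal units of 1/10000 second
                        long in = ((12L * 60 + 34) * 60 + 56)
                                      * ISC_TIME_SECONDS_PRECISION + 7890;

                        // decode, as isc_decode_timestamp does
                        long minutes = in / (ISC_TIME_SECONDS_PRECISION * 60);
                        int hour = minutes / 60;
                        int min  = minutes % 60;
                        int sec  = (in / ISC_TIME_SECONDS_PRECISION) % 60; // fraction gone

                        // re-encode, as isc_encode_timestamp does
                        long out = ((hour * 60L + min) * 60 + sec)
                                       * ISC_TIME_SECONDS_PRECISION;

                        printf("lost %ld units = 0.%04ld seconds\n", in - out, in - out);
                        return 0;
                    }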

                    /dj

                    > worst case). Firebird appears to keep its indexes
                    > well balanced because there was no observable
                    > performance difference between that experienced
                    > following an insert of 4,000,000 rows and that
                    > experienced after dropping and rebuilding the indexes.

                    It's not always the case. It depends on the index.

                    dj
                    I'll have to do more tests to identify where it does not balance indices.
                    /dj


                    > Caution: Beware of the DBXpress and IBXpress
                    > components used by some Delphi and BCB apps. My
                    > testing has exposed memory leaks and other serious
                    > issues with the middle layers,

                    Agreed, both of these layers have been very poorly architected.

                    >and the Delphi VCL
                    > architecture imposes minimum 3% maximum 30% overheads
                    > on top of the database performance.

                    That's misleading. The VCL architecture + the Borland implementations for
                    generic (vendor-independent) data access cause such problems. The
                    Firebird/IB world is richly provided with vendor-specific connectivity
                    layers between Delphi/Kylix/BCPP that -- to varying degrees -- avoid the
                    overhead imposed by the generic VCL data access components. Anyone
                    considering using the Borland client development environments owes it to
                    him/herself to evaluate the best of breed from this comprehensive collection.

                    dj
                    The source for the 30% was the time spent moving data first into variants in TParam objects, and then from TParam objects into the connectivity tool's internal parameters analog, before moving the internal params analog into the GDS buffers during an insert operation. The time moving data from the internal params analog into the gds buffers was considered part of the normal cost of any system (equivalent to moving the original data into the buffers) rather than a VCL overhead, and is not part of that measure.

                    The overheads are essentially linear with the number of parameters in the query.

                    Anything using the VCL approach will be limited by the design constraints of the VCL. It provides a sound basis for desktop apps, but the database connectivity architecture does not support reentrancy the way a solution architected for a server must, and it is not designed for speed.

                    Since the parameters and the result set are both part of the query object, only one query instance may be used at a time. A server environment requires that the query code be reentrant so multiple users can address the same (prepared) query concurrently. This is the "stateful versus stateless" argument of OO programming rehashed, and is really independent of the question of Firebird as a platform of choice.

                    Interbase, on the other hand, is designed for both speed and reentrancy.
                    /dj

                    > The MAX, MIN, and Count operators are not as well
                    > optimized as they could be either. MAX and MIN can at
                    > least be optimized somewhat by ascending and
                    > descending indices where they are required. Count on
                    > an entire table will always do a table space scan.

                    That is true; and it behoves the developer to learn the language and
                    discard a lot of the itchy old tricks that s/he used to have to resort to
                    in MSSQL. RDBMSs are not meant to be dependent on storage row order, so
                    systems that don't support important relational language features like
                    existential predicates have to store row-counts and compromise multi-userness.

                    dj
                    Like DB2?? :o)

                    Sorry ... the multi-userness of DB2 on 390 architecture is pretty well documented. My daily work has me dealing with a real-time system that services 8,000 concurrent users on multiple platforms, including Delphi client server, CICS COBOL, J2EE, AS/400, AIX, and others. As a design trade off I can agree that the Interbase and DB2 teams made different decisions, and neither decision is clearly superior to the other in all ways, but "multiuserness" is an excuse rather than a reason. The row counts in DB2, rather than compromising multiuserness, are a key part of their multi-user optimizer strategy. I think that the "compromise multi-userness" part of your statement may need to be struck.

                    I do agree that the exists predicate against a table or index is an oversight that IBM engineers should rectify.

                    I also agree, avoid platform specific tricks whenever possible.

                    Count, at worst, _should_ do an index space scan on the primary key rather than a table space scan. Min and max should be able to operate equally well against ascending and descending indices. There typically is only one or two more I/Os (10 to 20 milliseconds) involved in locating the last entry in a balanced index versus the first entry. These are obvious optimizations; when I have time I will dig into the optimizer and identify the places to correct this, and I will post the suggested change back here. Note that "improve the optimizer" is on the todo list on the Firebird sourceforge page.
                    /dj

                    It's a source of comedy to look at so-called "benchmark tests" involving
                    MSSQL and other DBMSs that poorly implement standards, against Firebird
                    and others that implement standards better. There is always a "SELECT
                    COUNT(*)" test in there that the lemmings take to be an indication of
                    comparative performance. It says nothing about performance, of course, but
                    it is a self-documenting commentary on the competence of the test designers.

                    dj
                    What test designers are looking at in the select count(*) is an indication of the speed of internal lookups of something expected to be in memory already. Essentially, all they expect to see, and what they are trying to isolate, is overheads in the communication subsystem. The actual select is presumed to be so trivial that it would not be measurable.

                    The reviewer of those benchmarks needs to be aware of what is actually being measured and ask questions to become more informed about what the results actually mean. In the case of Firebird/Interbase, that figure is not kept in memory, so it must be computed on demand. Instead of the expected turnaround on the order of nanoseconds, there is a significant delay (10 ms per page) while the DBMS performs a table space scan to count records. In these platforms, the test is measuring the speed of a forced table space scan.

                    Instead of measuring only what it was intended to, the test also serves to identify where the implementers of the different DBMSs made very different design decisions. Identifying those differences in technology is often more important than the raw performance figures.

                    In this case, we may also be identifying a minor weakness in the optimizer. Without changing the basic design premise, Count(*) should run against the index with the smallest footprint that is guaranteed to include pointers to all rows, rather than the table space itself.

                    Another misunderstood benchmark is the random read. Given a sufficiently large table, and in conjunction with other tests, it isolates the indexing subsystem. It is not an indicator of "real life" application, because in real life your selections tend to be clustered together physically. But it is one fairly reliable way of testing the quality of the indexing on a table after various operations since it renders most of the lookahead technology useless. The test has other purposes too, when combined with other tests.

                    Benchmark tests rarely make sense in broad terms because they are designed to isolate subsystems within a larger system. In real applications you are rarely concerned with a single sub-system. However, when identifying or anticipating bottlenecks and problems, they are useful because they allow you to pinpoint issues based on comparative performance, and to make informed decisions based on your current needs. The key is to become familiar with what the tests are _actually_ measuring rather than to naively presume upon what they appear to be measuring on the surface.
                    /dj



                  • David Johnson
                    Message 9 of 14, Oct 1, 2004
                      Good explanation, Alexandre ... like I said, I still need to get into the code. But this technical detail explains why it may be a bit more difficult than I expected to resolve the minor (almost non-)issue of count performance.

                      > I thought that because FB indexes are compressed (RLE encoded, AFAIR)
                      > one could not traverse them in the reverse direction.

                      You never traverse indices in reverse, even on systems where the indices are not compressed. In a B+ tree, to find the last entry you load the root page and traverse to its last entry; if it is a leaf node you return the record, and if it is a branch node you repeat the process on the indicated page.
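
                      As a sketch, the descent just described looks something like this (the page and node structures here are invented for illustration, not Firebird's actual ODS; as Ann notes elsewhere in the thread, a real engine must also cope with concurrent page splits on the way down):

                      // Hypothetical find-max descent; structures are illustrative.
                      #include <vector>
                      #include <cstdint>

                      struct Node {
                          int64_t key;
                          void*   down;   // branch: child page; leaf: record pointer
                      };

                      struct Page {
                          bool              is_leaf;
                          std::vector<Node> nodes;   // sorted ascending, non-empty
                      };

                      // Always take the rightmost entry; each level is one page read,
                      // so the depth of the tree bounds the I/O count.
                      void* find_max(Page* root)
                      {
                          Page* page = root;
                          for (;;) {
                              Node& last = page->nodes.back();
                              if (page->is_leaf)
                                  return last.down;                 // the max record
                              page = static_cast<Page*>(last.down); // next level down
                          }
                      }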

                      Using an 8k page size and keys+pointers entries of more or less 50 characters, you get roughly 160 index entries per page. The first two layers of the B+ tree can address 160^2 records if the indices are full. A third layer addresses 4,096,000 records. A max operation against an ascending index on a 4,000,000 row table should only need to read three pages maximum. This is consistent with the results I achieved in the random read tests. Note that my primary key is quite large - using an integer key would result in even greater efficiency in indexing.

                      > if you have an index on column "name" the engine could just scan the index
                      > to find the data you want, and would not need to access each data page.

                      Hmmm ... so Firebird does not support index-only scans. That is interesting and important to know. Index-only scans can result in orders-of-magnitude improvements in performance in some queries. It does not impact my immediate project, but I have seen the addition of an index for the sole purpose of acquiring index-only scans on other DBMSs improve performance of a complex operation from 30 minutes (after prepare) to 10 seconds (after prepare). The actual prepare took 10 minutes both with and without the additional index, but using precompiled and static SQL it never impacted my users. It looks like I need to write a couple more tests. :o)

                      I also wonder how much compression is achieved on indices. Larger and more complex indices would benefit most from RLE encoding, but simpler indices may not benefit at all. The benefit of more compact indices translates to less I/O, at the expense of CPU cycles. I/O is still much more expensive (time-wise) than CPU, so there may yet be a performance benefit to compression.

                      I am looking forward to Firebird 2 also - from the chatter I have seen, it looks like it will implement many good features.


