Loading ...
Sorry, an error occurred while loading the content.

[Fwd: Re: [agileDatabases] ANN: The Skillset of an Agile DBA]

Expand Messages
  • Willem Bogaerts
    (re-send; original somehow got sent to the poster instead of the group) ... Just curious: what does that pseudo code look like? ... No. Not if you bear in mind
    Message 1 of 1 , Feb 25, 2006
    • 0 Attachment
      (re-send; original somehow got sent to the poster instead of the group)

      Curt Sampson wrote:
      >... And it's not uncommon for me to do queries
      > just like this in SQL, though when something starts to get as complex
      > as this, I tend to write it first in a relational pseudo-code and then
      > translate it to the (generally rather more complex) SQL form.

      Just curious: what does that pseudo code look like?

      > You could certainly write a program to implement this query, and that is
      > in fact just what the DBMS does when it prepares a query plan.

      No. Not if you bear in mind that I use lazy collections in an object
      model that holds all the data. Some heavy optimizing could be done on
      the client also. My optimizing may not be that heavy, but half of the
      data of that query might already be in memory and I certainly don't want
      to read it twice, in a structure that is virtually impossible to work
      with. If the data is not already in memory, it must be (in the right
      places of my object model, off course!) so the results of the query can
      be processed. I don't think you want just a list, you also want those
      people to be reminded by e-mail, for example.

      > However, I think that that is a task better to automate than to do by hand.

      Which is exactly what I do. My collections are responsible for handing
      out (including reading from whatever source holds the data, but this may
      be delegated to a special builder class) the objects for THEIR elements.
      I have built lots of applications that used data from different
      databases from different data sources (files, databases from different
      vendors, both file system databases and server databases) and all of
      this remains in the data access layer of my application. Now why on
      earth should I ask ALL my collections to work together to conjure up
      some SQL statement that is beyond understanding, so definitely beyond
      maintenance? Only to get them back together later to have them "eat" the
      results all in one go at the same time like some vultures? This is
      exactly my point: agile applications generally do not work this way.

      >> Almost ever database server program available supports stored
      >> procedures. These procedures could follow the basic commands of "my"
      >> API instead of having a *huge* overhead of parsing every command and
      >> then finding the resources available, and thinking of a strategy to
      >> conquer this dragon in a bit optimal way.
      >
      > Yes. Instead you move that overhead to your system. It really makes
      > little difference.

      On the contrary. There is nothing to parse with the API. Logically,
      because there was nothing to parse to begin with. The API just states
      what I want. This table. That Index. Number 5, please. The SQL has to be
      generated only for the database to parse (un-generate) it again to do
      exactly what I wanted: Open this table, with that index and get me
      number 5. These queries make up 99% of my applications. Queries like you
      have given as an example are sometimes used when I do some digging in
      logs, but not as part of an application. I realized soon enough that the
      more object-oriented I program, the simpler and more efficient my
      queries got.
      One of the drawbacks of SQL is that it supports all kind of difficult
      queries, but causes far too much household work and network traffic to
      do the basic lookups. Lots of times, the queries to ask for the data are
      larger than the results! This is all caused by the language to require
      me to formulate things in some human readable format, which must be
      undone at the server. Stop that. Let me just put the command with a
      token, the table and index names with strings, and the 4 byte integers
      in the same 4 bytes that exists both on the client and the server. I
      want the application to read data. Not translate SQL hence and forth.

      <snip>
      > Now let's see what the problem with your approach is:
      >
      >> I would ask the Payments collection for the "unpaid for 60 days"
      >> subset. The Payments collection then asks the database to get all
      >> records from the Payments table that apply to this rule using the
      >> PaymentStatus/PaymentDate index.
      >
      > Very nice. Except that your system didn't know that out of a million
      > customers, you have one hundred thousand who are unpaid for 60 days,
      > but only 175 who are outside of the United States. Any competently
      > written DBMS would know that, because it would have statistics for that
      > sort of thing, and would do that query first, saving a table scan of
      > the customer table, and thus an enormous amount of disk I/O. It's also
      > saving a massive amount of network I/O if you were proposing to transfer
      > every customer across the network.

      Please bear in mind that the application that wants to use the data is
      an object-oriented one. That means that the data classes have
      responsibility to get their data, with sufficient optimization. IF and
      HOW these optimizations are performed is their responsibility. You might
      even leave that configurable, for special cases like specific imports.
      This becomes extremely hard if you delegate your business logic to the
      database. In my opinion, the database is there for storage and retrieval
      of data. NOT for executing the business logic.

      But let's get back to the query. What do you do with the customers
      inside the US then? are they part of another query? This would not give
      me the option of putting this decision in some subclasses. It would be
      nicer if the unpaid bills were read in in one batch, and leaving the
      appropriate action to the object that must see it performed. So some of
      the data objects may be given to a NonUsFirstReminder class, others to a
      UsSecondReminder class or a NullReminderForTestCases class. Or would you
      have basically the same operations undergo all the heavy parsing for all
      cases? I hope not.

      >> With "my" API, the parsing is done by me, stored in the compiled code,
      >> and _not_ done at runtime for every query.
      >
      > Which means that you are stuck with whatever access methods the
      > programmer set, rather than changing access methods to best suit the
      > current data.

      The more basic the statements are, the more flexible you can use them.
      In fact, my lazy collections ALL have a notion of access method. So some
      of them are lazy, while others are greedy. For link tables, collections
      may be lazy, but generate greedy subcollections for one foreign key
      value. As said before, those collections are responsible for getting
      their data. That is not a void statement.

      >
      >> On the contrary. My API is going to elementary, simple, and will not
      >> have the vast and non-standard options found in SQL implementations of
      >> modern database servers, that force you to eat through the documentation
      >> for a week, resulting in a query that you cannot understand yourself
      >> anymore, but for some reason seems to do what you originally wanted.
      >> The API program can be broken into little steps or subprocedures with
      >> descriptive names, So it can be well documented in code and remain
      >> clear. Even if the business rules change and another search must be
      >> done. I cannot say that of SQL.
      >
      > I agree with most of the criticisms above, although I think you
      > overstate the case a bit. However, I think that you'd be better off
      > writing a library to generate SQL from a better relational language than
      > writing a library to emulate hierarchial access methods over top of a
      > relational database.
      >
      > cjs

      Well, the fact is that I have used the API for years already. It exists.
      It is called DAO, is bloodcurdlingly fast and permits the programmer all
      the flexibility in the world. It works on file system databases, off
      course, but even server databases eventually are nothing more than a
      file system database with a server in between.
      You want to add a field to a table? no problem. Within DAO, a Database
      object has TableDef objects that can be queried and modified (if you
      have the rights, off course). You can query the field definitions, or
      the foreign key relations, if you want. Or the indexes. The current
      transaction level. Or the locking. All at the logical place to look for.
      You can even see if it is a foxpro, access, or CSV table you are working
      with. But that usually does not interest me. Because I knew the type
      already in my application and so I know that CSV files have no indexes,
      for example.

      Best regards,
      Willem Bogaerts
    Your message has been successfully submitted and would be delivered to recipients shortly.