[Fwd: Re: [agileDatabases] ANN: The Skillset of an Agile DBA]
- (re-send; original somehow got sent to the poster instead of the group)
Curt Sampson wrote:
>... And it's not uncommon for me to do queries
> just like this in SQL, though when something starts to get as complex
> as this, I tend to write it first in a relational pseudo-code and then
> translate it to the (generally rather more complex) SQL form.
Just curious: what does that pseudo-code look like?
> You could certainly write a program to implement this query, and that is
> in fact just what the DBMS does when it prepares a query plan.
No. Not if you bear in mind that I use lazy collections in an object
model that holds all the data. Some heavy optimizing could be done on
the client also. My optimizing may not be that heavy, but half of the
data of that query might already be in memory and I certainly don't want
to read it twice, in a structure that is virtually impossible to work
with. If the data is not already in memory, it must be loaded (into the
right places of my object model, of course!) so the results of the query can
be processed. I don't think you want just a list, you also want those
people to be reminded by e-mail, for example.
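To make that concrete: such a lazy collection could look roughly like this in Python. This is a minimal sketch; the class and field names are illustrative, not taken from my actual framework:

```python
class LazyPayments:
    """Hands out payment objects, reading each one from the store only once."""

    def __init__(self, store):
        self._store = store   # any mapping-like data source (file, DB cursor, ...)
        self._cache = {}      # objects already materialized in memory

    def __getitem__(self, key):
        if key not in self._cache:                 # read from the source once...
            self._cache[key] = dict(self._store[key])
        return self._cache[key]                    # ...then serve from memory

store = {1: {"amount": 100, "paid": False}}
payments = LazyPayments(store)
first = payments[1]
assert payments[1] is first   # the second access never touches the store
```

The point is the cache check: data that is already in memory is never read twice, and whatever is read lands in the object model where the rest of the application expects it.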
> However, I think that that is a task better to automate than to do by hand.
Which is exactly what I do. My collections are responsible for handing
out (including reading from whatever source holds the data, but this may
be delegated to a special builder class) the objects for THEIR elements.
I have built lots of applications that used data from different
databases from different data sources (files, databases from different
vendors, both file system databases and server databases) and all of
this remains in the data access layer of my application. Now why on
earth should I ask ALL my collections to work together to conjure up
some SQL statement that is beyond understanding, and thus definitely beyond
maintenance? Only to get them back together later to have them "eat" the
results all in one go at the same time like some vultures? This is
exactly my point: agile applications generally do not work this way.
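The delegation to a builder class per data source can be sketched like this (hypothetical names, Python for illustration; my real builders are richer, of course):

```python
class CsvBuilder:
    """Builds payment objects from text rows in a file-system database."""
    def build(self, row):
        ident, amount = row.split(",")
        return {"id": int(ident), "amount": float(amount)}

class SqlBuilder:
    """Builds payment objects from tuples delivered by a server database."""
    def build(self, row):
        ident, amount = row
        return {"id": ident, "amount": amount}

class Payments:
    """The collection owns its elements; HOW they are read is delegated."""
    def __init__(self, rows, builder):
        self._rows, self._builder = rows, builder

    def __iter__(self):
        return (self._builder.build(r) for r in self._rows)

csv_payments = Payments(["1,9.50"], CsvBuilder())
assert next(iter(csv_payments)) == {"id": 1, "amount": 9.5}
```

Swap the builder and the same collection reads from a different vendor's database; nothing above the data access layer changes.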
>> Almost every database server program available supports stored
>> procedures. These procedures could follow the basic commands of "my"
>> API instead of having a *huge* overhead of parsing every command and
>> then finding the resources available, and thinking of a strategy to
>> conquer this dragon in a somewhat optimal way.
> Yes. Instead you move that overhead to your system. It really makes
> little difference.
On the contrary. There is nothing to parse with the API; logically so,
because there was nothing to parse to begin with. The API just states
what I want. This table. That index. Number 5, please. The SQL has to be
generated only for the database to parse (un-generate) it again to do
exactly what I wanted: open this table, with that index, and get me
number 5. These queries make up 99% of my applications. Queries like the
one you have given as an example are sometimes used when I do some digging
in logs, but not as part of an application. I realized soon enough that the
more object-oriented my programming, the simpler and more efficient my
queries became.
One of the drawbacks of SQL is that it supports all kinds of difficult
queries, but causes far too much housekeeping work and network traffic for
the basic lookups. Often, the queries asking for the data are
larger than the results! This is all caused by the language requiring
me to formulate things in some human-readable format, which must be
undone at the server. Stop that. Let me just put the command in a
token, the table and index names in strings, and the 4-byte integers
in the same 4 bytes that exist on both the client and the server. I
want the application to read data, not translate SQL back and forth.
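Roughly, the kind of wire format I have in mind (a hypothetical layout, sketched in Python; a real protocol would need length prefixes wider than one byte, error codes, and so on):

```python
import struct

# Hypothetical wire format: 1-byte opcode, two length-prefixed strings
# (table and index name, each at most 255 bytes), then a raw little-endian
# 4-byte integer key -- the same 4 bytes on client and server.
OP_SEEK = 0x01   # "open this table, with that index, and get me number 5"

def encode_seek(table, index, key):
    t, i = table.encode(), index.encode()
    return struct.pack(f"<BB{len(t)}sB{len(i)}si",
                       OP_SEEK, len(t), t, len(i), i, key)

def decode_seek(msg):
    op, tlen = struct.unpack_from("<BB", msg)
    table = msg[2:2 + tlen].decode()
    ilen = msg[2 + tlen]
    index = msg[3 + tlen:3 + tlen + ilen].decode()
    (key,) = struct.unpack_from("<i", msg, 3 + tlen + ilen)
    return table, index, key

msg = encode_seek("Payments", "PaymentStatus", 5)
assert decode_seek(msg) == ("Payments", "PaymentStatus", 5)
# The whole command is 28 bytes -- shorter than even a trivial SQL text:
assert len(msg) < len("SELECT * FROM Payments WHERE id = 5")
```

Nothing here needs a parser in the usual sense: the server reads an opcode and unpacks fixed fields, instead of tokenizing and planning a statement.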
> Now let's see what the problem with your approach is:
>> I would ask the Payments collection for the "unpaid for 60 days"
>> subset. The Payments collection then asks the database to get all
>> records from the Payments table that apply to this rule using the
>> PaymentStatus/PaymentDate index.
> Very nice. Except that your system didn't know that out of a million
> customers, you have one hundred thousand who are unpaid for 60 days,
> but only 175 who are outside of the United States. Any competently
> written DBMS would know that, because it would have statistics for that
> sort of thing, and would do that query first, saving a table scan of
> the customer table, and thus an enormous amount of disk I/O. It's also
> saving a massive amount of network I/O if you were proposing to transfer
> every customer across the network.
Please bear in mind that the application that wants to use the data is
an object-oriented one. That means that the data classes have
responsibility to get their data, with sufficient optimization. IF and
HOW these optimizations are performed is their responsibility. You might
even leave that configurable, for special cases like specific imports.
This becomes extremely hard if you delegate your business logic to the
database. In my opinion, the database is there for storage and retrieval
of data. NOT for executing the business logic.
But let's get back to the query. What do you do with the customers
inside the US then? Are they part of another query? This would not give
me the option of putting this decision in some subclasses. It would be
nicer if the unpaid bills were read in in one batch, leaving the
appropriate action to the object that must see it performed. So some of
the data objects may be given to a NonUsFirstReminder class, others to a
UsSecondReminder class or a NullReminderForTestCases class. Or would you
have basically the same operations undergo all the heavy parsing for all
cases? I hope not.
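In Python-flavoured pseudo-code, the dispatch I have in mind looks something like this (the selection rule is invented for the example; in a real system it would live in configuration or in the subclasses themselves):

```python
class Reminder:
    def remind(self, customer):
        raise NotImplementedError

class NonUsFirstReminder(Reminder):
    def remind(self, customer):
        return f"first reminder (international) for {customer['name']}"

class UsSecondReminder(Reminder):
    def remind(self, customer):
        return f"second reminder (US) for {customer['name']}"

class NullReminderForTestCases(Reminder):
    def remind(self, customer):
        return None   # deliberately does nothing when run under test

def reminder_for(customer):
    # One batch of unpaid bills was read; the decision about what happens
    # to each data object lives in the subclasses, not in the query.
    if customer["test"]:
        return NullReminderForTestCases()
    return UsSecondReminder() if customer["us"] else NonUsFirstReminder()

acme = {"name": "Acme", "us": False, "test": False}
assert isinstance(reminder_for(acme), NonUsFirstReminder)
```

The query stays dumb and cheap; the business variation is plain polymorphism.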
>> With "my" API, the parsing is done by me, stored in the compiled code,
>> and _not_ done at runtime for every query.
> Which means that you are stuck with whatever access methods the
> programmer set, rather than changing access methods to best suit the
> current data.
The more basic the statements are, the more flexibly you can use them.
In fact, my lazy collections ALL have a notion of access method. So some
of them are lazy, while others are greedy. For link tables, collections
may be lazy, but generate greedy subcollections for one foreign key
value. As said before, those collections are responsible for getting
their data. That is not a void statement.
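A sketch of what I mean by lazy link tables with greedy subcollections (illustrative Python, not my production code):

```python
class GreedySubcollection:
    """All rows for one foreign-key value, fetched in a single batch."""
    def __init__(self, rows):
        self._rows = list(rows)        # greedy: read everything up front
    def __iter__(self):
        return iter(self._rows)

class LazyLinkTable:
    """The link table itself stays lazy; it only reads per foreign key."""
    def __init__(self, fetch_by_fk):
        self._fetch = fetch_by_fk      # HOW to get the data is the collection's business
        self._subs = {}
    def for_key(self, fk):
        if fk not in self._subs:       # one greedy batch per key, read once
            self._subs[fk] = GreedySubcollection(self._fetch(fk))
        return self._subs[fk]

rows = {1: [("order", 1, "widget"), ("order", 1, "gadget")]}
links = LazyLinkTable(lambda fk: rows.get(fk, []))
assert len(list(links.for_key(1))) == 2
```

The access method is a property of the collection, which is exactly why it can be changed (or made configurable) without touching the callers.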
>
>> On the contrary. My API is going to be elementary, simple, and will not
>> have the vast and non-standard options found in SQL implementations of
>> modern database servers, that force you to eat through the documentation
>> for a week, resulting in a query that you cannot understand yourself
>> anymore, but for some reason seems to do what you originally wanted.
>> The API program can be broken into little steps or subprocedures with
>> descriptive names, so it can be well documented in code and remain
>> clear. Even if the business rules change and another search must be
>> done. I cannot say that of SQL.
> I agree with most of the criticisms above, although I think you
> overstate the case a bit. However, I think that you'd be better off
> writing a library to generate SQL from a better relational language than
> writing a library to emulate hierarchical access methods over top of a
> relational database.
Well, the fact is that I have used the API for years already. It exists.
It is called DAO, is bloodcurdlingly fast, and permits the programmer all
the flexibility in the world. It works on file system databases, of
course, but even server databases are eventually nothing more than a
file system database with a server in between.
You want to add a field to a table? No problem. Within DAO, a Database
object has TableDef objects that can be queried and modified (if you
have the rights, of course). You can query the field definitions, or
the foreign key relations, if you want. Or the indexes. The current
transaction level. Or the locking. All in the logical place to look.
You can even see whether it is a FoxPro, Access, or CSV table you are
working with. But that usually does not interest me, because I know the
type already in my application and so I know that CSV files have no indexes,