Loading ...
Sorry, an error occurred while loading the content.
 

Re: [agileDatabases] ANN: The Skillset of an Agile DBA

Expand Messages
  • Dawn Wolthuis
    ... Hi Scott. I read and enjoyed your Agile Database Techniques book a while back and the same thing that came to my mind at that time popped into my head
    Message 1 of 18 , Feb 15, 2006
      On 2/15/06, Scott W. Ambler <swa@...> wrote:
      > I just posted http://www.agiledata.org/essays/dbaSkills.html and was
      > hoping to get some feedback.

      Hi Scott. I read and enjoyed your Agile Database Techniques book a
      while back and the same thing that came to my mind at that time popped
      into my head when I started reading this page--what is the difference
      between an agile DBA and a software developer?

      I work with shops that do not have many, or sometimes any DBAs, where
      every software developer "is actively involved with the creation and
      evolution of the data aspects of one or more applications." Does this
      mean that each of these software developers is an agile DBA by your
      definition?

      If there is a DBA, that person is involved in change management and
      possibly security, but there is nothing technically or from a
      permissions standpoint in these shops that a software developer cannot
      do that a DBA can, with the possible exception that the DBA has access
      to all production data.

      This is possible and even the norm in many non-RDBMS shops. In
      addition to working with a more agile data model than the RM, these
      shops have the advantage of not having any wall between software
      developers and DBA's or between the s/w developers and the DBMS.

      I'm not sure if you want to be broad enough to be talking about DBAs
      in non-RDBMS shops, but there are a lot of them out there, so I
      thought I would note it. If you look at a DBA in a shop using IBM
      Universe, Cache' or possibly Berkeley DB, for example, you are likely
      to find something quite different and also typically more agile. In
      these environments, there is rarely a DBA team as the roles are
      dispersed into software development, change management, architecture,
      and systems administration roles/teams.

      I don't know if you find this relevant to your area of interest, but
      thought I would mention it.
      Cheers! --dawn
      --
      Dawn M. Wolthuis
      Tincat Group, Inc.
      www.tincat-group.com

      Take and give some delight today!
    • Scott W. Ambler
      ... In an ideal world, not much. However, in many organizations there is a strict difference between DBAs and developers, and never the two shall meet. One
      Message 2 of 18 , Feb 15, 2006
        At 03:42 PM 2/15/2006, you wrote:
        >On 2/15/06, Scott W. Ambler <swa@...> wrote:
        > > I just posted http://www.agiledata.org/essays/dbaSkills.html and was
        > > hoping to get some feedback.
        >
        >Hi Scott. I read and enjoyed your Agile Database Techniques book a
        >while back and the same thing that came to my mind at that time popped
        >into my head when I started reading this page--what is the difference
        >between an agile DBA and a software developer?

        In an ideal world, not much. However, in many organizations there is
        a strict difference between DBAs and developers, and never the two
        shall meet. One of the goals of the Agile Data site is to argue that
        DBAs need developer skills, and developers need DBA skills.


        >I work with shops that do not have many, or sometimes any DBAs, where
        >every software developer "is actively involved with the creation and
        >evolution of the data aspects of one or more applications." Does this
        >mean that each of these software developers is an agile DBA by your
        >definition?

        Are they working in an evolutionary and collaborative manner with
        others? If so, then sounds like they're well on their way.
        ><snip>

        - Scott


        ====================================================
        Scott W. Ambler
        Senior Consultant, Ambysoft Inc.
        www.ambysoft.com/scottAmbler.html
      • Dawn Wolthuis
        ... Agreed and that is what I tend to argue. I prefer no DBA team in a shop, even if there are some roles that are held by various developers. ... definitely
        Message 3 of 18 , Feb 15, 2006
          On 2/15/06, Scott W. Ambler <swa@...> wrote:
          > At 03:42 PM 2/15/2006, you wrote:
          > >On 2/15/06, Scott W. Ambler <swa@...> wrote:
          > > > I just posted http://www.agiledata.org/essays/dbaSkills.html and was
          > > > hoping to get some feedback.
          > >
          > >Hi Scott. I read and enjoyed your Agile Database Techniques book a
          > >while back and the same thing that came to my mind at that time popped
          > >into my head when I started reading this page--what is the difference
          > >between an agile DBA and a software developer?
          >
          > In an ideal world, not much.

          Agreed and that is what I tend to argue. I prefer no DBA team in a
          shop, even if there are some roles that are held by various
          developers.

          > However, in many organizations there is
          > a strict difference between DBAs and developers,

          definitely and the org culture that goes along with this is often
          quite nasty or at least unproductive

          > and never the two
          > shall meet. One of the goals of the Agile Data site is to argue that
          > DBAs need developer skills, and developers need DBA skills.

          And I simply argue against separating out a role called DBA, so you
          just have software developers, even if there are a variety of roles
          and skillsets that these developers must have.

          > >I work with shops that do not have many, or sometimes any DBAs, where
          > >every software developer "is actively involved with the creation and
          > >evolution of the data aspects of one or more applications." Does this
          > >mean that each of these software developers is an agile DBA by your
          > >definition?
          >
          > Are they working in an evolutionary and collaborative manner with
          > others?

          Yes. CMM in shops using older, but more agile, data models than the
          RM (which is what I work with).

          > If so, then sounds like they're well on their way.

          Yes and I contend that the relational model did some damage to agility
          in s/w development which is why you see more agile teams and projects
          often in shops where SQL is not the language of choice for data
          access, for example, although there are certainly agile teams who do
          work with the RM too.

          Cheers! --dawn
          --
          Dawn M. Wolthuis
          Tincat Group, Inc.
          www.tincat-group.com

          Take and give some delight today!
        • Scott W. Ambler
          ... In large organizations you typically need some sort of operational data admin group that support the corporate guidelines, understands the legacy data
          Message 4 of 18 , Feb 15, 2006
            At 04:18 PM 2/15/2006, you wrote:
            >On 2/15/06, Scott W. Ambler <swa@...> wrote:
            > > At 03:42 PM 2/15/2006, you wrote:
            > > >On 2/15/06, Scott W. Ambler <swa@...> wrote:
            > > > > I just posted http://www.agiledata.org/essays/dbaSkills.html and was
            > > > > hoping to get some feedback.
            > > >
            > > >Hi Scott. I read and enjoyed your Agile Database Techniques book a
            > > >while back and the same thing that came to my mind at that time popped
            > > >into my head when I started reading this page--what is the difference
            > > >between an agile DBA and a software developer?
            > >
            > > In an ideal world, not much.
            >
            >Agreed and that is what I tend to argue. I prefer no DBA team in a
            >shop, even if there are some roles that are held by various
            >developers.

            In large organizations you typically need some sort of operational
            data admin group that support the corporate guidelines, understands
            the legacy data sources, and so on. See
            http://www.agiledata.org/essays/enterpriseAdministration.html (I hope
            to do an update of this page in the next day or two) for a vision of
            how to do this agilely. Unfortunately, many data admin groups are a
            complete disaster (more on this in the next month or so, time to
            raise the bar on those folks) in practice, being way too large,
            bureaucratic, and political.


            > > However, in many organizations there is
            > > a strict difference between DBAs and developers,
            >
            >definitely and the org culture that goes along with this is often
            >quite nasty or at least unproductive

            Yes. Many orgs suffer from a Tayloristic, command-and-control mindset.


            > > and never the two
            > > shall meet. One of the goals of the Agile Data site is to argue that
            > > DBAs need developer skills, and developers need DBA skills.
            >
            >And I simply argue against separating out a role called DBA, so you
            >just have software developers, even if there are a variety of roles
            >and skillsets that these developers must have.

            Yes, but realistically many people are going to separate the
            concepts. My goal is that if we can raise people's expectations of
            this role, that eventually we'll find ourselves in the situation that
            you recommend.

            ><snip>
            > > If so, then sounds like they're well on their way.
            >
            >Yes and I contend that the relational model did some damage to agility
            >in s/w development which is why you see more agile teams and projects
            >often in shops where SQL is not the language of choice for data
            >access, for example, although there are certainly agile teams who do
            >work with the RM too.

            Yes, RDBs have clearly made it a bit more difficult for us. However,
            as Pramod and I show in Refactoring Databases
            (www.ambysoft.com/books/refactoringDatabases.html) it is possible to
            improve the situation. The reality is that RDBs are here to stay.

            - Scott
          • Dawn Wolthuis
            On 2/15/06, Scott W. Ambler wrote: ... Like Fortran and COBOL are here to stay. Oracle buying Sleepycat is a good sign that they
            Message 5 of 18 , Feb 15, 2006
              On 2/15/06, Scott W. Ambler <swa@...> wrote:
              <snip>
              > Yes, RDBs have clearly made it a bit more difficult for us. However,
              > as Pramod and I show in Refactoring Databases
              > (www.ambysoft.com/books/refactoringDatabases.html) it is possible to
              > improve the situation. The reality is that RDBs are here to stay.

              Like Fortran and COBOL are here to stay. Oracle buying Sleepycat is a
              good sign that they know that some more agile approaches are needed
              for the future. If interested, you might also check my blog entry at
              http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html which
              starts what I expect to be a year-long effort to address
              flexible/agile data modeling from a variety of angles.

              Cheers! --dawn
              --
              Dawn M. Wolthuis
              Tincat Group, Inc.
              www.tincat-group.com

              Take and give some delight today!
            • Curt Sampson
              ... I entirely disagree. I think that, if anything the relational model, with its emphasis on limiting queries to what data you want, rather than how to access
              Message 6 of 18 , Feb 15, 2006
                On Wed, 15 Feb 2006, Scott W. Ambler wrote:

                > >Yes and I contend that the relational model did some damage to agility
                > >in s/w development which is why you see more agile teams and projects
                > >often in shops where SQL is not the language of choice for data
                > >access, for example, although there are certainly agile teams who do
                > >work with the RM too.
                >
                > Yes, RDBs have clearly made it a bit more difficult for us.

                I entirely disagree. I think that, if anything the relational model,
                with its emphasis on limiting queries to what data you want, rather
                than how to access it, has been a huge boon to agility. I'll admit
                that the current products a have their issues*, but given a half dozen
                applications accessing a database, would you rather have them using
                IASM, where you'd have to change every application even to change
                the indexing, much less any change to the "schema" itself. With the
                relational model, even as currently implemented, you at least have some
                independence between the data model and the methods of data access,
                and as well as a fair amount of ability to change the schema and still
                run the same queries against it, things such as derived relations
                (views) that let you make major schema changes without having to change
                applications at the same time.

                (*Well, let's admit the truth; the current products are a huge crock
                in countless ways. Much of this is due to the standardization SQL,
                a language seemingly designed to make using the relational model
                difficult.)

                > The reality is that RDBs are here to stay.

                And I'm grateful for that. However, we should be campaigning for
                features that would let us become more agile. In fact, perhaps you
                should start a list on your website. Here's a seed or two for it:

                * Transactional support for DDL and schema changes. Excepting
                PostgreSQL, I don't think that there's a single DBMS out there that
                lets me roll back things like a series table changes and drops.
                Having this makes changing the schema easier (especially for testing
                change scripts against the production database) and less risky.

                * A better query langauge. Let's face it, SQL sucks. It's incredibly
                hard to refactor, in large part due to the lack of support for
                decomposing and recomposing operations. I ought to be able easily to
                assign a query to an identifier and re-use that elsewhere (sort of
                like a CREATE TEMPORARY VIEW, but with better syntax), for a start.

                * Better separation of logical and physical model. It's ludicrous
                (and a complete violation of the relational model) that I should
                have to denormalize the logical presentation of table in order to
                change the physical storage layout and methods of access.

                cjs
                --
                Curt Sampson <cjs@...> +81 90 7737 2974
                The power of accurate observation is commonly called cynicism
                by those who have not got it. --George Bernard Shaw
              • Scott W. Ambler
                ... Can you expand on this a bit? Perhaps an example? Thanks. - Scott
                Message 7 of 18 , Feb 16, 2006
                  At 08:34 PM 2/15/2006, Curt wrote:
                  ><snip>
                  > * Better separation of logical and physical model. It's ludicrous
                  > (and a complete violation of the relational model) that I should
                  > have to denormalize the logical presentation of table in order to
                  > change the physical storage layout and methods of access.

                  Can you expand on this a bit? Perhaps an example?

                  Thanks.

                  - Scott
                • Willem Bogaerts
                  ... ... The thing that I like the LEAST of SQL is the fact that it _is_ a language. I would like an object-oriented structure that exists on the
                  Message 8 of 18 , Feb 16, 2006
                    <snip>
                    > And I'm grateful for that. However, we should be campaigning for
                    > features that would let us become more agile. In fact, perhaps you
                    > should start a list on your website. Here's a seed or two for it:
                    >
                    <snip>
                    >
                    > * A better query langauge. Let's face it, SQL sucks. It's incredibly
                    > hard to refactor, in large part due to the lack of support for
                    > decomposing and recomposing operations. I ought to be able easily to
                    > assign a query to an identifier and re-use that elsewhere (sort of
                    > like a CREATE TEMPORARY VIEW, but with better syntax), for a start.

                    The thing that I like the LEAST of SQL is the fact that it _is_ a
                    language. I would like an object-oriented structure that exists on the
                    database server as well. I can build any object oriented strucutre I
                    like, but it still has the overhead of building SQL commands, with the
                    extra overhead of the server to parse it again.
                    I would just want a command that would pass the table name, the index
                    name and the index value, resulting in the corresponding record. MySQL's
                    HANDLER command does this, but it is still a command that has to be
                    parsed. Why would I want a database server to parse a structure that I
                    have designed myself or that has at least a known design?

                    Also, I don't want anything unnecessary over the network. Most SQL
                    commands I use today are bigger then their results. A good API would
                    only need a token to communicate what method was called, and the
                    parameters. Nothing more. No parsing necessary. And no statement
                    building necessary.

                    I don't want a language, I want an API.

                    Best regards,
                    Willem Bogaerts
                  • Curt Sampson
                    ... You ve got to consider, for this API, how you re going to do the more complex queries that keep network traffic and database I/O load down. For queries
                    Message 9 of 18 , Feb 17, 2006
                      On Fri, 17 Feb 2006, Willem Bogaerts wrote:

                      > Also, I don't want anything unnecessary over the network. Most SQL
                      > commands I use today are bigger then their results.
                      > ...
                      > I don't want a language, I want an API.

                      You've got to consider, for this API, how you're going to do the more
                      complex queries that keep network traffic and database I/O load down.
                      For queries along the lines of

                      Give me a list of all the customers and the total unpaid amounts
                      each has due on invoices due more than sixty days ago for customers
                      outside of the United States whose terms are 30 days or less.

                      you're going to have a pretty interesting API there. You could do
                      simpler queries and do the processing locally, but then if you've got
                      thousands of customers, tens of thousands of invoices, and hundreds of
                      thousands of line items on those invoices, you could face dragging a lot
                      of extra data across the network, and greater load on the DBMS server
                      due to having to pull a lot of extra data off of the disk.

                      cjs
                      --
                      Curt Sampson <cjs@...> +81 90 7737 2974
                      The power of accurate observation is commonly called cynicism
                      by those who have not got it. --George Bernard Shaw
                    • Curt Sampson
                      ... A simple example would be a table with one column that tends to be large compared to the rest of the data. Say I ve got a row overhead of 42 bytes,
                      Message 10 of 18 , Feb 17, 2006
                        On Thu, 16 Feb 2006, Scott W. Ambler wrote:

                        > At 08:34 PM 2/15/2006, Curt wrote:
                        > ><snip>
                        > > * Better separation of logical and physical model. It's ludicrous
                        > > (and a complete violation of the relational model) that I should
                        > > have to denormalize the logical presentation of table in order to
                        > > change the physical storage layout and methods of access.
                        >
                        > Can you expand on this a bit? Perhaps an example?

                        A simple example would be a table with one column that tends to be large
                        compared to the rest of the data. Say I've got a row overhead of 42
                        bytes, name/address/etc. data averaging 200 bytes per person, and a
                        picture averaging 4 KB. That 4 KB picture is too small for PostgreSQL to
                        TOAST (i.e., move it out of the table pages into a different area), so a
                        table of 100,000 people is going to take up 413 MB or so. With current
                        practice, I might split the table into two, one with the id and picture,
                        and another with the id and everything else. This will make retrievals
                        with the picture slightly slower, but table scans without the picture
                        much faster (and much less likely to blow out your cache). But now I
                        have to have two logical tables as well, and if I want to avoid people
                        "vanishing" when I join the two tables, I need to implement some rather
                        painful cross-table constraints.

                        I'd like to be able to specify that storage split while maintaining the
                        same logical view of a person as being id, name, address, etc., and
                        image, all together.

                        A useful but more complex feature the DBMS could give me would be
                        memoizing queries. Say, for example, I've got a set of accounts, each
                        of which may have thousands or tens of thousands of entries over the
                        years. To generate a "real" current balance for an account, I need to
                        add up all the entries. This can involve a nasty amount of disk I/O.
                        Here, the typical solution is to have "archive" tables and, every year
                        or month or whatever, move the current entries to the archive table, and
                        update the current table to have an initial entry for a starting balance
                        as calculated from the archive entries. Now we're splitting the table
                        horizontally instead of vertically as in the above example, but we're
                        still ending up with two tables where logically we'd like only one. It
                        would be nice if we could tell the database to calculate the monthly
                        results for every month's worth of data in a table, cache that value,
                        and use that, when possible rather than recalculating. Of course, should
                        data affecting a cached value change, the cached value would need to be
                        invalidated and recalculated the next time it was requested. Then we
                        need only one table and the whole archiving problem goes away.

                        I'm sure we've all had times when we found ourselves changing our
                        logical model for peformance reasons. In an ideal world, we ought never
                        need do that.

                        cjs
                        --
                        Curt Sampson <cjs@...> +81 90 7737 2974
                        The power of accurate observation is commonly called cynicism
                        by those who have not got it. --George Bernard Shaw
                      • Nolen, Terry
                        I ve been watching the correspondence and I m intrigued by the debate. I ve been administrating databases for quite a long time, even before relational was
                        Message 11 of 18 , Feb 17, 2006

                          I’ve been watching the correspondence and I’m intrigued by the debate.  I’ve been administrating databases for quite a long time, even before relational was viable, and there is a single reason as to why relational became prominent in the early days.  Reporting is the reason.  Back in the late 70’s and early 80’s, it was recognized that you had to have developers to write reports for the executives, and it took time and money for something very simple.  Relational databases provided a way for non-programmers to get information out, which was traditionally locked away by developers.  This in essence freed an analytic community to use the data more effectively.  The relational databases were slow back then, but they’ve gained ground.  By comparison, they’re still slower than some of the older technology, but more robust.

                           

                          As for me today, I work in an eXtreme Programming lab, and I’m developing code to reduce some of the drudgery of database and data changes for DBAs as well as developers to accelerate the iterative cycles.  I’ve watched the database lifecycle of “we don’t need DBAs”, to “the database is broken, where’s the DBA”; happen at least 4 times over the past 25 years.  I think technology has changed enough now, that it’s finally time for change in database development methodology.  We have to get faster and better without compromising data performance and quality. 

                           

                          The Development DBAs (I like to call them DBEs for Database Engineers) need to adapt to a faster pace, and be very involved directly in the development and programming efforts.  I don’t think the over-the-wall will last much longer when agile methods really take hold.  Change management of both will begin to blend seamlessly.

                           

                          I’m here now to embrace the change, and insure the transactional models and the reporting models adapt to it.  I’m looking forward to joining in the challenge.

                           

                          Terry Nolen

                          Principal Software Architect

                          Sabre Holdings

                           

                           


                          From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com] On Behalf Of Curt Sampson
                          Sent: Wednesday, February 15, 2006 7:34 PM
                          To: agileDatabases@yahoogroups.com
                          Subject: Re: [agileDatabases] ANN: The Skillset of an Agile DBA

                           

                          On Wed, 15 Feb 2006, Scott W. Ambler wrote:

                          > >Yes and I contend that the relational model did some damage to agility
                          > >in s/w development which is why you see more agile teams and projects
                          > >often in shops where SQL is not the language of choice for data
                          > >access, for example, although there are certainly agile teams who do
                          > >work with the RM too.
                          >
                          > Yes, RDBs have clearly made it a bit more difficult for us.

                          I entirely disagree. I think that, if anything the relational model,
                          with its emphasis on limiting queries to what data you want, rather
                          than how to access it, has been a huge boon to agility. I'll admit
                          that the current products a have their issues*, but given a half dozen
                          applications accessing a database, would you rather have them using
                          IASM, where you'd have to change every application even to change
                          the indexing, much less any change to the "schema" itself. With the
                          relational model, even as currently implemented, you at least have some
                          independence between the data model and the methods of data access,
                          and as well as a fair amount of ability to change the schema and still
                          run the same queries against it, things such as derived relations
                          (views) that let you make major schema changes without having to change
                          applications at the same time.

                          (*Well, let's admit the truth; the current products are a huge crock
                          in countless ways. Much of this is due to the standardization SQL,
                          a language seemingly designed to make using the relational model
                          difficult.)

                          > The reality is that RDBs are here to stay.

                          And I'm grateful for that. However, we should be campaigning for
                          features that would let us become more agile. In fact, perhaps you
                          should start a list on your website. Here's a seed or two for it:

                               * Transactional support for DDL and schema changes. Excepting
                               PostgreSQL, I don't think that there's a single DBMS out there that
                               lets me roll back things like a series table changes and drops.
                               Having this makes changing the schema easier (especially for testing
                               change scripts against the production database) and less risky.

                               * A better query langauge. Let's face it, SQL sucks. It's incredibly
                               hard to refactor, in large part due to the lack of support for
                               decomposing and recomposing operations. I ought to be able easily to
                               assign a query to an identifier and re-use that elsewhere (sort of
                               like a CREATE TEMPORARY VIEW, but with better syntax), for a start.

                               * Better separation of logical and physical model. It's ludicrous
                               (and a complete violation of the relational model) that I should
                               have to denormalize the logical presentation of table in order to
                               change the physical storage layout and methods of access.

                          cjs
                          --
                          Curt Sampson            <cjs@...>             +81 90 7737 2974
                             The power of accurate observation is commonly called cynicism
                             by those who have not got it.    --George Bernard Shaw



                        • Alex Weatherall
                          Hi, * Transactional support for DDL and schema changes. Excepting PostgreSQL, I don t think that there s a single DBMS out there that lets me roll back
                          Message 12 of 18 , Feb 17, 2006
                            Hi,

                            <snip>
                            * Transactional support for DDL and schema changes. Excepting
                            PostgreSQL, I don't think that there's a single DBMS out there that
                            lets me roll back things like a series table changes and drops.
                            Having this makes changing the schema easier (especially for testing
                            change scripts against the production database) and less risky.
                            </snip>

                            Have you used Red-Gate's SQL Compare product. That works against SQL
                            Server and performs "transactional" DDL and schema changes. It isn't a
                            proprietary feature of SQL Server 2000 admittedly but it works well
                            and I use their technique in most/all schema changes that I do.

                            See the attached file for an example script (this script will error
                            and therefore rollback changes - change the DDL to see it succeed :-)
                            ).

                            The mechanism uses a temporary table to maintain the error state of
                            the script (i.e. not errored or errored).
                            It begins a transaction.
                            Then performs each schema change batch in turn.
                            Then after each batch it checks if it errored and updates the temporary table.
                            Then it checks the temporary table and if there has been an error
                            rollsback the transaction (this has the effect of rolling back all
                            previous changes). It then starts a new transaction for the next
                            batch.

                            At the end we can see if our update failed or not and if it failed the
                            database is returned to its previous state.

                            Note: This technique does lock the database from access to the objects
                            affected for the entirety of the update.

                            Hope this helps.

                            Thanks,

                            Alex




                            On 16/02/06, Curt Sampson <yahoo@...> wrote:
                            > On Wed, 15 Feb 2006, Scott W. Ambler wrote:
                            >
                            > > >Yes and I contend that the relational model did some damage to agility
                            > > >in s/w development which is why you see more agile teams and projects
                            > > >often in shops where SQL is not the language of choice for data
                            > > >access, for example, although there are certainly agile teams who do
                            > > >work with the RM too.
                            > >
                            > > Yes, RDBs have clearly made it a bit more difficult for us.
                            >
                            > I entirely disagree. I think that, if anything the relational model,
                            > with its emphasis on limiting queries to what data you want, rather
                            > than how to access it, has been a huge boon to agility. I'll admit
                            > that the current products a have their issues*, but given a half dozen
                            > applications accessing a database, would you rather have them using
                            > IASM, where you'd have to change every application even to change
                            > the indexing, much less any change to the "schema" itself. With the
                            > relational model, even as currently implemented, you at least have some
                            > independence between the data model and the methods of data access,
                            > and as well as a fair amount of ability to change the schema and still
                            > run the same queries against it, things such as derived relations
                            > (views) that let you make major schema changes without having to change
                            > applications at the same time.
                            >
                            > (*Well, let's admit the truth; the current products are a huge crock
                            > in countless ways. Much of this is due to the standardization SQL,
                            > a language seemingly designed to make using the relational model
                            > difficult.)
                            >
                            > > The reality is that RDBs are here to stay.
                            >
                            > And I'm grateful for that. However, we should be campaigning for
                            > features that would let us become more agile. In fact, perhaps you
                            > should start a list on your website. Here's a seed or two for it:
                            >
                            > * Transactional support for DDL and schema changes. Excepting
                            > PostgreSQL, I don't think that there's a single DBMS out there that
                            > lets me roll back things like a series table changes and drops.
                            > Having this makes changing the schema easier (especially for testing
                            > change scripts against the production database) and less risky.
                            >
                            > * A better query langauge. Let's face it, SQL sucks. It's incredibly
                            > hard to refactor, in large part due to the lack of support for
                            > decomposing and recomposing operations. I ought to be able easily to
                            > assign a query to an identifier and re-use that elsewhere (sort of
                            > like a CREATE TEMPORARY VIEW, but with better syntax), for a start.
                            >
                            > * Better separation of logical and physical model. It's ludicrous
                            > (and a complete violation of the relational model) that I should
                            > have to denormalize the logical presentation of table in order to
                            > change the physical storage layout and methods of access.
                            >
                            > cjs
                            > --
                            > Curt Sampson <cjs@...> +81 90 7737 2974
                            > The power of accurate observation is commonly called cynicism
                            > by those who have not got it. --George Bernard Shaw
                            >
                            >
                            >
                            >
                            >
                            > ________________________________
                            YAHOO! GROUPS LINKS
                            >
                            >
                            > Visit your group "agileDatabases" on the web.
                            >
                            > To unsubscribe from this group, send an email to:
                            > agileDatabases-unsubscribe@yahoogroups.com
                            >
                            > Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
                            >
                            > ________________________________
                          • Willem Bogaerts
                            ... That s exactly what I mean. How on earth do you want to incorporate this in an object oriented program? What does it mean actually? What you describe is
                            Message 13 of 18 , Feb 19, 2006
                              >>Also, I don't want anything unnecessary over the network. Most SQL
                              >>commands I use today are bigger then their results.
                              >>...
                              >>I don't want a language, I want an API.
                              >
                              >
                              > You've got to consider, for this API, how you're going to do the more
                              > complex queries that keep network traffic and database I/O load down.
                              > For queries along the lines of
                              >
                              > Give me a list of all the customers and the total unpaid amounts
                              > each has due on invoices due more than sixty days ago for customers
                              > outside of the United States whose terms are 30 days or less.

                              That's exactly what I mean. How on earth do you want to incorporate this
                              in an object oriented program? What does it mean actually?
                              What you describe is not a query, it is a program. Almost ever database
                              server program available supports stored procedures. These procedures
                              could follow the basic commands of "my" API instead of having a *huge*
                              overhead of parsing every command and then finding the resources
                              available, and thinking of a strategy to conquer this dragon in a bit
                              optimal way.
                              Get serious. I know you can abuse SQL to generate a full webpage for
                              you, with templates and languages and all, in one statement. But that is
                              not what a database server is for.

                              But, for the sake of completeness, let's do this with "my" API, OK?
                              My code will feature a Payments collection with Payment objects, a
                              Customers collection with Customer objects and a Countries collection
                              with Country objects. All collections are implemented as lazy collections.
                              I would ask the Payments collection for the "unpaid for 60 days" subset.
                              The Payments collection then asks the database to get all records from
                              the Payments table that apply to this rule using the
                              PaymentStatus/PaymentDate index.
                              From these results, the Customers collection is consulted. As this is a
                              lazy collection, Each applicable customer is loaded only once. As it should.
                              With these results, the Countries collection would be consulted. Again,
                              with an index (the primary key) and lazily. If there are a thousand
                              customers that would be listed in only three countries, only the three
                              countries are requested from the database server. That is, only if they
                              are not already in memory.

                              So let's look at the network traffic. I may have to filter more on the
                              client (but I may use a stored procedure to prevent that), but instead
                              of pumping the same country names again and again over the network, I
                              get only the results I am interested in.
                              As you see, there is nothing to parse here. I _know_ what indexes I can
                              use on what tables, so why should the database server try to find what I
                              already knew before the query even existed? With "my" API, the parsing
                              is done by me, stored in the compiled code, and _not_ done at runtime
                              for every query.

                              > you're going to have a pretty interesting API there. You could do
                              > simpler queries and do the processing locally, but then if you've got
                              > thousands of customers, tens of thousands of invoices, and hundreds of
                              > thousands of line items on those invoices, you could face dragging a lot
                              > of extra data across the network, and greater load on the DBMS server
                              > due to having to pull a lot of extra data off of the disk.
                              >
                              > cjs

                              On the contrary. My API is going to elementary, simple, and will not
                              have the vast and non-standard options found in SQL implementations of
                              modern database servers, that force you to eat through the documentation
                              for a week, resulting in a query that you cannot understand yourself
                              anymore, but for some reason seems to do what you originally wanted.
                              The API program can be broken into little steps or subprocedures with
                              descriptive names, So it can be well documented in code and remain
                              clear. Even if the business rules change and another search must be
                              done. I cannot say that of SQL.

                              Best regards,
                              Willem Bogaerts
                            • Curt Sampson
                              ... It is certainly a query. It declaratively specifies a particular set of data you d like to see. And it s not uncommon for me to do queries just like this
                              Message 14 of 18 , Feb 21, 2006
                                On Sun, 19 Feb 2006, Willem Bogaerts wrote:

                                > > Give me a list of all the customers and the total unpaid amounts
                                > > each has due on invoices due more than sixty days ago for
                                > customers
                                > > outside of the United States whose terms are 30 days or less.
                                >
                                > That's exactly what I mean. How on earth do you want to incorporate this
                                > in an object oriented program? What does it mean actually?
                                > What you describe is not a query, it is a program.

                                It is certainly a query. It declaratively specifies a particular set
                                of data you'd like to see. And it's not uncommon for me to do queries
                                just like this in SQL, though when something starts to get as complex
                                as this, I tend to write it first in a relational pseudo-code and then
                                translate it to the (generally rather more complex) SQL form.

                                You could certainly write a program to implement this query, and that is
                                in fact just what the DBMS does when it prepares a query plan. However,
                                I think that that is a task better to automate than to do by hand. While
                                I'm willing to provide assistance to the DBMS by indicating what sort of
                                queries I expect (done in a very limited way in SQL by creating indexes
                                and suchlike), I don't care to translate every query into directions
                                about how to access things on disk any more than I care to translate
                                every program I write into the target system's machine language by hand.
                                I have compilers to do that.

                                > Almost ever database server program available supports stored
                                > procedures. These procedures could follow the basic commands of "my"
                                > API instead of having a *huge* overhead of parsing every command and
                                > then finding the resources available, and thinking of a strategy to
                                > conquer this dragon in a bit optimal way.

                                Yes. Instead you move that overhead to your system. It really makes
                                little difference.

                                > Get serious. I know you can abuse SQL to generate a full webpage for
                                > you, with templates and languages and all, in one statement. But that is
                                > not what a database server is for.

                                And that is certainly not at all what I did above, is it?

                                Now let's see what the problem with your approach is:

                                > I would ask the Payments collection for the "unpaid for 60 days"
                                > subset. The Payments collection then asks the database to get all
                                > records from the Payments table that apply to this rule using the
                                > PaymentStatus/PaymentDate index.

                                Very nice. Except that your system didn't know that out of a million
                                customers, you have one hundred thousand who are unpaid for 60 days,
                                but only 175 who are outside of the United States. Any competently
                                written DBMS would know that, because it would have statistics for that
                                sort of thing, and would do that query first, saving a table scan of
                                the customer table, and thus an enormous amount of disk I/O. It's also
                                saving a massive amount of network I/O if you were proposing to transfer
                                every customer across the network.

                                > With "my" API, the parsing is done by me, stored in the compiled code,
                                > and _not_ done at runtime for every query.

                                Which means that you are stuck with whatever access methods the
                                programer set, rather than changing access methods to best suit the
                                current data.

                                > On the contrary. My API is going to elementary, simple, and will not
                                > have the vast and non-standard options found in SQL implementations of
                                > modern database servers, that force you to eat through the documentation
                                > for a week, resulting in a query that you cannot understand yourself
                                > anymore, but for some reason seems to do what you originally wanted.
                                > The API program can be broken into little steps or subprocedures with
                                > descriptive names, So it can be well documented in code and remain
                                > clear. Even if the business rules change and another search must be
                                > done. I cannot say that of SQL.

                                I agree with most of the criticisms above, although I think you
                                overstate the case a bit. However, I think that you'd be better off
                                writing a library to generate SQL from a better relational language than
                                writing a library to emulate hierarchial access methods over top of a
                                relational database.

                                cjs
                                --
                                Curt Sampson <cjs@...> +81 90 7737 2974
                                The power of accurate observation is commonly called cynicism
                                by those who have not got it. --George Bernard Shaw
                              • Dawn Wolthuis
                                ... ... Or skip the relational language in the solution altogether. My experience has been that end-to-end efforts that leave out the RM provide more
                                Message 15 of 18 , Feb 21, 2006
                                  On 2/21/06, Curt Sampson <yahoo@...> wrote:
                                  > On Sun, 19 Feb 2006, Willem Bogaerts wrote:
                                  <snip>
                                  > However, I think that you'd be better off
                                  > writing a library to generate SQL from a better relational language than
                                  > writing a library to emulate hierarchial access methods over top of a
                                  > relational database.

                                  Or skip the relational language in the solution altogether. My
                                  experience has been that end-to-end efforts that leave out the RM
                                  provide more flexible solutions. However, theory suggests that the RM
                                  would be better. This discrepancy has bothered me enough that I
                                  started to write on that subject (with a first entry being "Is Codd
                                  Dead?") in a blog at http://www.tincat-group.com/mewsings . I think
                                  that it is time to introduce lists as attribute values back into the
                                  mix, for example, for much more agile modeling and implementation.
                                  Cheers! --dawn
                                  --
                                  Dawn M. Wolthuis
                                  Tincat Group, Inc.
                                  www.tincat-group.com

                                  Take and give some delight today!
                                • Curt Sampson
                                  ... This seems to me typical of where people have gone wrong in the agile vs. database battle. I would suspect that most of us would agree with the following
                                  Message 16 of 18 , Feb 22, 2006
                                    On Tue, 21 Feb 2006, Dawn Wolthuis wrote:

                                    > I think that it is time to introduce lists as attribute values
                                    > back into the mix, for example, for much more agile modeling and
                                    > implementation.

                                    This seems to me typical of where people have gone wrong in the agile
                                    vs. database battle.

                                    I would suspect that most of us would agree with the following statements:

                                    a) The ability to rapidly and drastically change code (I like to
                                    call this "rototilling") and deploy those changes, yet do so safely,
                                    is wonderful and leads to faster and better software development.

                                    b) The relational model, by qualitatively simplifying the structure
                                    of databases, freeing one from the bounds of having to chose data
                                    access paths, and separating physical and logical concerns, is good
                                    thing.

                                    c) Current implementations of the relational model, while falling
                                    far short of the ideal, do indeed give us some of the advantages of
                                    the model, though far from all of them.

                                    Our current DBMSes certainly make us suffer when we try to be agile.
                                    It's a problem we need to attack.

                                    But I argue that this is far from a new problem, and previous solutions
                                    did not follow the currently proposed method of attack.

                                    Procedural languages such as C make it very hard to be agile. One can
                                    barely do unit testing, much less write a DSL, and as soon as one starts
                                    to do something complex, the syntax barriers are formidable. However,
                                    we didn't solve this by continuing to use C; we created other languages
                                    that put fewer barriers in our way.

                                    In the same way, it's not at all the relational model itself that is
                                    interfering with our ability to be agile in the DBMS world; it's the
                                    current DBMS products. Why is it so hard to concisely specify several
                                    different interfaces to a set of base data? Why is it so hard to do even
                                    simple renaming refactorings? Why can't we write our queries in a much,
                                    much more modular fashion? None of this has anything to do with the
                                    relational model; it has to do with vendors giving us products that are,
                                    from the agile point of view, crap.

                                    So rather than dumping the good stuff because we're getting bad stuff
                                    along with it, we should be trying to get rid of the bad stuff.

                                    Ideally, I thike we should be moving the relational model into our OO
                                    programs, as well.

                                    OO introduced one level of abstraction in data access by allowing us to
                                    substitute similar storage and search algorithms for each other. For
                                    example, in Java, when I want to read the second element stored in a
                                    List object, I have no need to care about whether it's an ArrayList or a
                                    LinkedList. I can delegate the responsiblity for deciding the specific
                                    storage layout and access methods to the most appropriate place in my
                                    program, and I can change it at will.

                                    OO should move to the next level of abstration, the relational one, by
                                    extending this abstraction to more complex searches. Given a collection
                                    of items, I ought to be able to say, "give me all of the ones where the
                                    e-mail address ends in @...", without having to worry about
                                    how the data are stored or how the search will be done. Being able to
                                    delegate that kind of thing to the most appropriate part of the program
                                    gives me further agility.

                                    Think about how much time you spend constructing hash tables and pointer
                                    chains, and trying to remember just how things are stored and indexed.
                                    That could all go away, if the language and/or libraries were more
                                    powerful.

                                    cjs
                                    --
                                    Curt Sampson <cjs@...> +81 90 7737 2974
                                    The power of accurate observation is commonly called cynicism
                                    by those who have not got it. --George Bernard Shaw
                                  • Dawn Wolthuis
                                    ... I suspect you are correct that most would agree with that statement. I don t. I recognized that in my experience (strictly anecdotal), I spent far fewer
                                    Message 17 of 18 , Feb 23, 2006
                                      On 2/22/06, Curt Sampson <yahoo@...> wrote:
                                      > On Tue, 21 Feb 2006, Dawn Wolthuis wrote:
                                      >
                                      > > I think that it is time to introduce lists as attribute values
                                      > > back into the mix, for example, for much more agile modeling and
                                      > > implementation.
                                      >
                                      > This seems to me typical of where people have gone wrong in the agile
                                      > vs. database battle.
                                      >
                                      > I would suspect that most of us would agree with the following statements:
                                      >
                                      > a) The ability to rapidly and drastically change code (I like to
                                      > call this "rototilling") and deploy those changes, yet do so safely,
                                      > is wonderful and leads to faster and better software development.
                                      >
                                      > b) The relational model, by qualitatively simplifying the structure
                                      > of databases, freeing one from the bounds of having to chose data
                                      > access paths, and separating physical and logical concerns, is good
                                      > thing.

                                      I suspect you are correct that most would agree with that statement.
                                      I don't. I recognized that in my experience (strictly anecdotal), I
                                      spent far fewer dollars getting teams working with a non-RDBMS product
                                      to accomplish more than with SQL-DBMS (or IMS) tools. So a couple of
                                      years ago I started studying why this might be - was it just my
                                      experience or was there some meat to this?

                                      Obviously there is nothing obviously wrong with relational theory and
                                      it is mathematically fine, but I think there is something about
                                      applying that model for typical software development that makes it
                                      less agile than if you employ other models. I don't know if you
                                      looked at my blog, but instead of reproducing the same information
                                      here, I'll just suggest you might be interested in
                                      www.tincat-group.com/mewsings . There is no slam dunk case, but I
                                      think there is an excellent case for the industry to move away from
                                      use of the relational model for interfacing with databases.

                                      > c) Current implementations of the relational model, while falling
                                      > far short of the ideal, do indeed give us some of the advantages of
                                      > the model, though far from all of them.

                                      There are pros and cons to any solution, but there have been few
                                      agility improvements with databases such as Oracle over products that
                                      came before. Can you think of any way in which an Oracle is more
                                      agile for developers than Cache' (Intersystems) or U2 (IBM), for
                                      example? I can think of some other advantages, but agility doesn't
                                      make the list.

                                      > Our current DBMSes certainly make us suffer when we try to be agile.
                                      > It's a problem we need to attack.

                                      Yes, that is a focus for my work right now.

                                      > But I argue that this is far from a new problem, and previous solutions
                                      > did not follow the currently proposed method of attack.

                                      I often work with users of a very old solution that has many failings
                                      itself, but permits very agile software development and maintenance
                                      (PICK, aka, MultiValue). While I would suggest there are issues with
                                      any solution, I would like to see how we can take this more agile data
                                      model and improve it with what works well in relational databases
                                      without losing the agility or, possibly (although I'm less hopeful
                                      about this) add in those aspects that make it agile into existing
                                      RDBMS's. Most old data models, such as Berkeley DB (name/value
                                      pairs), U2 (PICK), and Cache' (MUMPS), are hosted by products that
                                      look their age. But in spite of that, they have some significant
                                      advantages over SQL-DBMS tools.

                                      > Procedural languages such as C make it very hard to be agile. One can
                                      > barely do unit testing, much less write a DSL, and as soon as one starts
                                      > to do something complex, the syntax barriers are formidable. However,
                                      > we didn't solve this by continuing to use C; we created other languages
                                      > that put fewer barriers in our way.

                                      I'm sure there are many who will continue to use SQL til the cows come
                                      home, but I agree that other languages are in order. My opinion is
                                      that these other languages should implement data models other than the
                                      relational model (XQuery would be an example), while others think that
                                      Tutorial-D (better relational model language than SQL) is in order.

                                      > In the same way, it's not at all the relational model itself that is
                                      > interfering with our ability to be agile in the DBMS world;

                                      Yes, it is. As I mention in my writing this week, the RM is not
                                      sufficient for software development. That means we necessarily must
                                      cross a chasm, even if sometimes small, to use it. Remove that gap
                                      and you gain agility.

                                      > it's the
                                      > current DBMS products.

                                      those too

                                      > Why is it so hard to concisely specify several
                                      > different interfaces to a set of base data? Why is it so hard to do even
                                      > simple renaming refactorings? Why can't we write our queries in a much,
                                      > much more modular fashion? None of this has anything to do with the
                                      > relational model; it has to do with vendors giving us products that are,
                                      > from the agile point of view, crap.

                                      Yes, it is related to the relational model in that the model has to do
                                      with the representation (that is Codd's claim with which I agree) of
                                      data structures with relations as the only first class citizen. If
                                      you were to put another interface to the data and permit lists in it,
                                      for example, it would cease to be the relational model. The RM is not
                                      about how data are stored (in theory) but how they are represented in
                                      the interface between the database and the world.

                                      > So rather than dumping the good stuff because we're getting bad stuff
                                      > along with it, we should be trying to get rid of the bad stuff.

                                      That was my original thinking until I did further research. I now
                                      believe we need to leave the RM behind, possibly by enhancing it so it
                                      is no longer the RM, but it might be preferable to start with other
                                      successful data models and improve the products employing those. They
                                      really do seem to lead to significant gains in agility.

                                      > Ideally, I thike we should be moving the relational model into our OO
                                      > programs, as well.

                                      There are many people who think that. I think that is the opposite of
                                      what needs to happen. Because it is impossible to create a view that
                                      meets the requirements of the RM (the Information Principle) and have
                                      that as a single view of the data for a UI (which I recognize was not
                                      the purpose of the RM) you simply cannot employ the RM head to toe in
                                      software development. You must switch models at some point to get
                                      such constructs as ordered lists. I think I say this better in my
                                      blog writing, but hopefully this gives an idea.

                                      > OO introduced one level of abstraction in data access by allowing us to
                                      > substitute similar storage and search algorithms for each other. For
                                      > example, in Java, when I want to read the second element stored in a
                                      > List object, I have no need to care about whether it's an ArrayList or a
                                      > LinkedList. I can delegate the responsiblity for deciding the specific
                                      > storage layout and access methods to the most appropriate place in my
                                      > program, and I can change it at will.

                                      Yes. And you cannot employ such abstractions and retain the
                                      relational model. The RM was seeming slimy to me in that whenever I
                                      tried to capture what it was, someone said it wasn't that. That is
                                      one of the reasons I decided to do more research. Nailing down
                                      precisely what the RM is might lead you, as it did me, to see that we
                                      need to move beyond it.

                                      > OO should move to the next level of abstration, the relational one, by
                                      > extending this abstraction to more complex searches. Given a collection
                                      > of items, I ought to be able to say, "give me all of the ones where the
                                      > e-mail address ends in @...", without having to worry about
                                      > how the data are stored or how the search will be done.

                                      Yes! But that is not the RM that gives you that. You can get that in
                                      other languages too (such as XQuery, although I'm not as much an
                                      XML/XQuery proponent as I might sound) without having to reduce all of
                                      your data structures to relations. You can work with a "web"
                                      (di-graph with tress on the nodes) of data.

                                      > Being able to
                                      > delegate that kind of thing to the most appropriate part of the program
                                      > gives me further agility.
                                      >
                                      > Think about how much time you spend constructing hash tables and pointer
                                      > chains, and trying to remember just how things are stored and indexed.
                                      > That could all go away, if the language and/or libraries were more
                                      > powerful.

                                      I agree with this last statement and my goals are similar to yours. I
                                      simply think that we, as a profession, need to ditch the RM in order
                                      to move further faster. Cheers! --dawn

                                      --
                                      Dawn M. Wolthuis
                                      Tincat Group, Inc.
                                      www.tincat-group.com

                                      Take and give some delight today!
                                    Your message has been successfully submitted and would be delivered to recipients shortly.