Re: Database Version Management

  • Eli Golovinsky
    Message 1 of 9, Nov 11, 2005
      You've raised some serious issues here.

      > The upgrade scripts may be generated, but the changes must have been
      > applied to the database in the first place. Add the fact that the
      > data migrations still need to be written, and you've still manually
      > produced everything that will be generated, haven't you?

      It doesn't seem to me that everybody uses SQL*Plus to manage their
      databases. And even if it is common with Oracle, it is not so with
      other databases. So it seems quite possible to me that when people
      change the structure of their database, they usually don't write the
      DDL statements themselves. The data updating is something that needs
      to be done by hand, but Wilfred's ideas about automating the
      refactoring process (which I will respond to in a separate message)
      are a way to reduce the amount of manual work.

      Another reason I find storing a state file in source control a good
      thing is the ability to examine the history of changes made to a
      database. It could be as useful as it is with code source control. It
      might be possible to implement a 'blame' command which shows for each
      database object who changed it last and why.

      As someone has pointed out to me, it's possible to implement a simple
      source control system through the following steps:

      <quote>
      Here's how a tool that does not use XML might work:
      ** Inputs **:
      1) Existing "Upgrade Script" (base case is a blank file)
      2) New "SQL block" that the user wishes to add
      ** Outputs **:
      1) Updated "Upgrade Script" (concatenation of Inputs 1 and 2 with
      input 2 surrounded by some SQL IF statement to ensure it is never
      executed again if upgrade script is re-executed. This may require
      creating a table to keep track of versions - only done in the base case).
      2) "New Database Script" (generated by tool after it executes the
      "Upgrade Script" on a blank database).
      </quote>
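      The quoted scheme can be sketched in a few lines of Python against
      SQLite; the schema_version table name and layout here are my own
      illustration, not something the quote prescribes:

```python
import sqlite3

def apply_block(conn, block_id, sql_block):
    """Run sql_block once; silently skip it if block_id was already applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version "
                 "(block_id TEXT PRIMARY KEY)")
    done = conn.execute("SELECT 1 FROM schema_version WHERE block_id = ?",
                        (block_id,)).fetchone()
    if done:
        return False            # the guard: never execute the block twice
    conn.executescript(sql_block)
    conn.execute("INSERT INTO schema_version VALUES (?)", (block_id,))
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
block = "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);"
apply_block(conn, "block-1", block)   # applies the change
apply_block(conn, "block-1", block)   # re-running the upgrade is harmless
```

      The version table is created lazily, which corresponds to the quoted
      "only done in the base case" remark.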

      This is certainly a possible implementation, but I think a limiting one.
      Without the ability to parse the information stored in source control
      (in this case creation and upgrading scripts), there are very few
      intelligent things a program can do. And I really want to try and
      avoid the need to write and maintain an SQL parser for each flavor of
      SQL in existence.

      > The second problem I have with the approach is the lack of testing.

      I've taken a look at your blog and read some of the things you said
      about pre- and post-condition testing. I really like the idea and I
      have a few ideas about how to combine your theories with mine.

      As far as I understand, there are two categories of conditions (both
      pre- and post-conditions) that you might want to test - structural
      and data-related. The structural conditions demand that the database
      include certain tables, columns, primary and foreign keys and so on.
      The data-related conditions might include some constraints on the
      data size, type or relation between data in different parts of the
      database. When changing the datatype of a column, for instance, you'd
      want to make sure you could actually fit the old data into the new type.

      I believe that the structural pre-conditions and post-conditions could
      be checked by comparing a database with a pre-saved state file,
      perhaps with some constraints relaxed (we would not normally want to
      compare indexes, for instance, as their existence or absence may be
      due to a local optimization).

      Let's say that STATE1 is what the database is supposed to be like when
      we start the conversion process and STATE2 is what the database will
      be like when we are done. There might be intermediate states we'll
      have to go through, but I'll ignore them for now. Before converting we
      compare the database with STATE1 and go over the differences to see
      whether they are permitted or not. If there are impermissible
      differences, the upgrade will not run. On the other hand, if all the
      differences are tolerable, we move on to perform the actual
      conversion - running both
      the automatic and the manual scripts. Now, after the conversion is
      complete we can compare the resulting database with STATE2 and check
      for differences. If there are any serious ones, the conversion will
      have failed, probably because of a bug in our automatic tool. It might
      also be useful to check the database is in the state we expect it to
      be several times during the upgrade, between each pair of states if
      we're moving from STATE1 to STATE5 through 2, 3 and 4.
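      A structural check like this can be sketched against SQLite's catalog
      tables; the state representation below is just an illustrative dict of
      table names to column lists, not Daversy's actual state-file format:

```python
import sqlite3

def table_columns(conn):
    """Snapshot the live structure as {table: [column, ...]}, ignoring indexes."""
    state = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (name,) in tables:
        cols = conn.execute(f"PRAGMA table_info({name})").fetchall()
        state[name] = [c[1] for c in cols]   # c[1] is the column name
    return state

def check_state(conn, expected):
    """Return the differences between the live database and a saved state."""
    actual = table_columns(conn)
    return {t: (expected.get(t), actual.get(t))
            for t in set(expected) | set(actual)
            if expected.get(t) != actual.get(t)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, lastname TEXT)")
STATE1 = {"employee": ["employee_id", "lastname"]}
assert check_state(conn, STATE1) == {}   # no differences: safe to upgrade
```

      The same check_state call, run after the conversion against STATE2,
      would serve as the structural post-condition.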

      The data pre- and post-conditions are something that will have to be
      written by hand, but could be incorporated into the automatic scripts
      in a similar manner to the update scripts. It's also possible to do
      some research, define types for the various data conditions, and
      create a UI that will help build them with ease.

      Like I said, I really like your ideas about database design by
      contract and I think that they should play along nicely with what I
      had in mind. It will greatly enhance Daversy's usefulness, so I will
      look into implementing them.

      > I also hope that one day I will take my own ideas on how database
      > upgrades should be produced and get an open source project together
      > to implement it for a wider audience.

      By all means. I'm sure that will do all of us a lot of good.
    • Eli Golovinsky
      Message 2 of 9, Nov 11, 2005
        Hi Curt,

        > But for a refactoring like this that I do "by hand," I'd be really
        > impressed if your tool could generate this change set.

        I've been thinking about making Daversy an automatic refactoring tool
        sometime in the future, but I wasn't quite sure how to approach this
        issue.

        I don't want Daversy to become a tool you use to manage your database,
        because those tasks are very database specific and I'd like to try and
        keep it neutral. On the other hand, database refactoring is something
        that could really be useful. Your idea on just creating a tool that
        could generate a refactoring script is a good one.

        The tool will have to read the current database structure, present you
        with several refactoring options and generate a script that will
        perform them for you. Those scripts can then be edited by hand to
        fine-tune them.

        The problem arises when I think about incorporating such features with
        source control for the database. If each and every change to the
        database is made through one or more refactorings, there is no
        problem. I could keep the refactorings in source control and
        generate upgrade scripts as necessary, but I don't think it will
        even come to that - some changes are bound to be applied directly to
        the database. And here's the problem - the upgrade scripts could be
        generated either from the comparison of two states or from the
        refactorings, but not from both, because both might contain the same
        DDL statements.

        On the other hand, the refactorings could be generated in such a way
        that they handle only the data migrations and not the structure changes.

        Hmm.. I think I have an idea.

        Say we have a database in STATE1 and we want to upgrade it to STATE2.
        We've made some of the changes through refactoring, and we want others
        to be inferred from the differences between the two states.

        We dry-run (in the tool, without the use of a database) the
        refactorings on STATE1 to get STATE1A. We then compare STATE1A to
        STATE2 to get the changes that were made without using refactorings.

        The actual script that we will run will consist of the refactorings,
        the DDL changes between STATE1A and STATE2 and possibly some custom
        scripts.
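        The dry-run idea can be sketched entirely in memory; the state and
        refactoring representations below are my own illustration, not
        Daversy's:

```python
# Hypothetical sketch: states are {table: [columns]}, and a refactoring is
# a small description that can be replayed without touching a database.
def dry_run(state, refactorings):
    state = {t: list(cols) for t, cols in state.items()}   # copy
    for op, table, old, new in refactorings:
        if op == "rename_column":
            cols = state[table]
            cols[cols.index(old)] = new
    return state

def diff(a, b):
    """Changes present in b that are not explained by a."""
    return {t: (a.get(t), b.get(t))
            for t in set(a) | set(b) if a.get(t) != b.get(t)}

STATE1 = {"employee": ["employee_id", "lastname"]}
refactorings = [("rename_column", "employee", "lastname", "family_name")]
STATE1A = dry_run(STATE1, refactorings)   # refactorings applied in the tool
STATE2 = {"employee": ["employee_id", "family_name", "shop_id"]}
leftover = diff(STATE1A, STATE2)          # changes made directly to the DB
```

        Here leftover captures the added shop_id column, i.e. the change
        that was made without a refactoring and must be inferred by state
        comparison.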

        I've now become even more convinced that there is no way to separate
        a source control tool from an automatic migration tool and a
        refactoring tool. All three of them must be thought of and
        implemented together.

        What do you think?

        >
        > I'd think it might be easier just to generate a single script that does
        > whatever Daversy can figure out, and then point the developer at that to
        > go and tweak it to do the right thing. Then the developer can deal with
        > issues such as changing
        >
        > ALTER TABLE coupon ADD COLUMN shop_id NOT NULL REFERENCES shop
        >
        > to
        >
        > ALTER TABLE coupon ADD COLUMN shop_id NULL REFERENCES shop
        > -- Insert valid values into the column.
        > ALTER TABLE coupon ALTER COLUMN shop_id SET NOT NULL
        >
        > And this, of course, gives the developer the ultimate in easy
        > flexibility, since he doesn't have to worry about separate patch files
        > or anything like that; just generate and go. And it doesn't preclude
        > adding refactoring tools later that could accept a command like "rename
        > table foo to bar" and generate all of the right stuff for that.
        >
        > If the SQL was getting too nasty, I'd be really tempted to just build
        > a DSL over top of it that still lets you get down and dirty for those
        > moments when you have a really tricky upgrade.
        >
        > cjs
        > --
        > Curt Sampson <cjs@c...> +81 90 7737 2974
        > Make up enjoying your city life...produced by BIC CAMERA
        >
      • Curt Sampson
        Message 3 of 9, Nov 13, 2005
          On Fri, 11 Nov 2005, Eli Golovinsky wrote:

          > This is certainly a possible implementation, but I think a limiting one.
          > Without the ability to parse the information stored in source control
          > (in this case creation and upgrading scripts), there are very few
          > intelligent things a program can do. And I really want to try and
          > avoid the need to write and maintain an SQL parser for each flavor of
          > SQL in existence.

          I think you're just getting yourself into a whole other set of
          limitations. Even on incredibly simple refactorings, such as dropping
          a column, Daversy already produces upgrade scripts containing things
          such as, "-- Can't remove column DESCRIPTION from table SALGRADE : not
          supported in SQLite." So what's a poor DBA to do? Write out by hand the
          code to deal with this, of course. What happens when you get to more
          sophisticated refactorings, such as the example I gave before, that
          require multiple steps? What happens when functions and triggers need
          to change the way they operate on tables you've changed? I think you'll
          have a lot of cases that can only be dealt with by a DBA.

          And this XML intermediate representation has its own problems as well,
          because every DBMS has its own set of special features that now has to
          be converted back and forth between its flavour of SQL and your XML. How
          are you going to deal with things such as PostgreSQL table inheritance?
          You get into a lot of special cases anyway, and you're going to end up
          doing a lot more coding.

          I've been looking at this issue for a few years now, and at this point
          I'm entirely convinced that the only way to do this at reasonable
          expense is to have a tool that will let you, regardless of what else you
          do, continue to use direct-coded SQL statements particular to your DBMS.
          Otherwise you're going to be overwhelmed with the programming effort for
          your database connectors, or you're going to be limited to just what the
          generic program supports, and you'll never be able to use some of the
          special features of your particular DBMS.

          > I've been thinking about making Daversy an automatic refactoring
          > tool...creating a tool that could generate a refactoring script is a
          > good [idea].
          >
          > The tool will have to read the current database structure, present you
          > with several refactoring options and generate a script that will
          > perform them for you. Those scripts can then be edited by hand to
          > fine-tune them.
          >
          > The problem arises when I think about incorporating such features with
          > source control for the database.

          If you're using my text-file methods, I don't see a problem. You start
          out with a file containing a CREATE TABLE statement. When you're done,
          you have a changed version of that file, and an upgrade script. E.g.,
          starting with:

          employee.sql:

          CREATE TABLE employee (
          employee_id serial PRIMARY KEY,
          lastname text NOT NULL,
          firstname text NOT NULL
          );

          You apply two "rename column" refactorings, using the automated tool. Now
          you have one changed file and one new file:

          employee.sql:

          CREATE TABLE employee (
          employee_id serial PRIMARY KEY,
          family_name text NOT NULL,
          given_name text NOT NULL
          );

          update-2005-11-14.sql:

          ALTER TABLE employee RENAME COLUMN lastname TO family_name;
          ALTER TABLE employee RENAME COLUMN firstname TO given_name;

          Add the new file to revision control, commit the changes, and you're
          done. Unless the update script is not correct, of course, in which case
          you can tweak it as necessary. Perhaps you want to add some tests or
          queries before and after the actual upgrade itself, so that you can confirm
          that the upgrade script is working properly.

          If you're using Subversion, finding out what changed and when is dead
          easy; just diff the changeset and you'll get both the changes to the
          CREATE TABLE statement and the change script for moving from the
          previous revision to the next. If you need to look at more changes, just
          diff two revisions several changesets apart, and you'll get what you
          need.

          You're right that writing parsers for every version of SQL out there
          is going to be a bit of work, but I'm not entirely sure it's as bad
          as you make it out to be. After all, for anything outside of what the
          refactoring tool actually refactors, it doesn't need to understand the
          semantics of the code; it just needs to be able to parse it and spit
          it back out again.

          But I think that it might possibly be more productive to approach from
          the other direction: write a domain specific language that can be
          combined with raw SQL that will encapsulate the things common to all
          DBMSes, and then just use the DSL and SQL together in your text files.
          The refactoring tool uses the DSL and ignores the SQL. And the DSL alone
          could save a lot of tedious typing effort, given how verbose SQLs tend
          to be to express relatively simple concepts.
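          A hypothetical sketch of such a DSL mixed with raw SQL - the verbs
          and their syntax here are invented purely for illustration:

```python
# Lines the tool understands are expanded into DDL; anything else is
# passed through verbatim as DBMS-specific raw SQL the tool ignores.
def expand(script):
    out = []
    for line in script.splitlines():
        words = line.split()
        if words[:1] == ["rename-column"]:
            table, old, new = words[1:]
            out.append(f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};")
        elif words[:1] == ["drop-column"]:
            table, col = words[1:]
            out.append(f"ALTER TABLE {table} DROP COLUMN {col};")
        elif line.strip():
            out.append(line)   # raw SQL: kept as-is, invisible to the tool
    return "\n".join(out)

print(expand("rename-column employee lastname family_name\n"
             "UPDATE employee SET family_name = upper(family_name);"))
```

          A refactoring tool could operate on the rename-column and
          drop-column lines alone, leaving the raw SQL for the DBA.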

          > Say we have a database in STATE1 and we want to upgrade it to STATE2.
          > We've made some of the changes through refactoring, and we want others
          > to be inferred from the differences between the two states.
          >
          > We dry-run (in the tool, without the use of a database) the
          > refactorings on STATE1 to get STATE1A. We then compare STATE1A to
          > STATE2 to get the changes that were made without using refactorings.
          >
          > The actual script that we will run will consist of the refactorings,
          > the DDL changes between STATE1A and STATE2 and possibly some custom
          > scripts.

          Again, seems like a lot of programming work for not so much gain,
          compared to just having the refactoring tool write the part of the
          script it knows, and leaving me to write the rest.

          On one of my projects, in the last nine months I've written 23 database
          update scripts (out of some sixty-odd) that contain the word ALTER.
          Looking through them, four of these changes are simple column renames
          and suchlike that could be handled by a relatively unsophisticated tool,
          four more might be able to be handled by a relatively sophisticated
          tool (due to multi-stage ALTER COLUMN stuff being required to deal with
          integrity issues and suchlike), and the other fifteen all would have
          required human intervention, because the SQL itself doesn't contain
          enough information about the meanings of the columns and changes.

          For such small returns, I don't even know that I'd use such a tool even
          if it did exist. I'd far rather have something that helps me deal nicely
          with, e.g., creating functions, where the syntax in PostgreSQL is overly
          complex, verbose, and repetitive, and where things I want to make sure
          I do (e.g., explicitly setting it volatile/stable/immutable) are not
          flagged when absent.

          cjs
          --
          Curt Sampson <cjs@...> +81 90 7737 2974
          Make up enjoying your city life...produced by BIC CAMERA