Loading ...
Sorry, an error occurred while loading the content.
 

Re: Database Version Management

Expand Messages
  • Eli Golovinsky
    Hi Curt, ... I ve been thinking about making Daversy an automatic refactoring tool sometime in the future, but I wasn t quite sure how to approach this issue.
    Message 1 of 9 , Nov 11, 2005
      Hi Curt,

      > But for a refactoring like this that I do "by hand," I'd be really
      > impressed if your tool could generate this change set.

      I've been thinking about making Daversy an automatic refactoring tool
      sometime in the future, but I wasn't quite sure how to approach this
      issue.

      I don't want Daversy to become a tool you use to manage your database,
      because those tasks are very database specific and I'd like to try and
      keep it neutral. On the other hand, database refatoring is something
      that could really be useful. Your idea on just creating a tool that
      could generate a refactoring script is a good one.

      The tool will have to read the current database structure, present you
      with several refactoring options and generate a script that will
      perform them for you. Those scripts can then be edited by hand to
      fine-tune them.

      The problem arises when I think about incorporating such features with
      source control for the database. If each and every change to the
      database is made through one or more refactorings, there is no
      problem. I could keeps the refactorings in the source control and
      generate upgrade scritps as neccesary, but I don't think it will even
      come to that - some changes are bound to be applied directly to the
      database. And here's the problem - the upgrade scripts could either be
      generated from the comparison of two states or from the refactorings,
      because both might contain the same DDL statements.

      On the other hand, the refactorings could be generated in such a way
      that they handle only the data migrations and not the structure changes.

      Hmm.. I think I have an idea.

      Say we have a database in STATE1 and we want to upgrade it to STATE2.
      We've made some of the changes through refactoring, and we want others
      to be inferred from the differences between the two states.

      We dry-run (in the tool, without the use of a database) the
      refactorings on STATE1 to get STATE1A. We then compare STATE1A to
      STATE2 to get the changes that were made without using refactorings.

      The actual script that we will run will constist of the refactorings,
      the DDL changes between STATE1A and STATE2 and possibly some custom
      scripts.

      I've now become even more convinced that there is no way to separate a
      source control tool for an automatic migration tool and a refactoring
      tool. All three of them must be thought of and implemented together.

      What do you think?

      >
      > I'd think it might be easier just to generate a single script that does
      > whatever Davsey can figure out, and then point the developer at that to
      > go and tweak it to do the right thing. Then the developer can deal with
      > issues such as changing
      >
      > ALTER TABLE coupon ADD COLUMN shop_id NOT NULL REFERENCES shop
      >
      > to
      >
      > ALTER TABLE coupon ADD COLUMN shop_id NULL REFERENCES shop
      > -- Insert valid values into the column.
      > ALTER TABLE coupon ALTER COLUMN shop_id SET NOT NULL
      >
      > And this, of course, gives the developer the ultimate in easy
      > flexability, since he doesn't have to worry about separate patch files
      > or anything like that; just generate and go. And it doesn't preclude
      > adding refactoring tools later that could accept a command like "rename
      > table foo to bar" and generate all of the right stuff for that.
      >
      > If the SQL was getting too nasty, I'd be really tempted to just build
      > a DSL over top of it that still lets you get down and dirty for those
      > moments when you have a really tricky upgrade.
      >
      > cjs
      > --
      > Curt Sampson <cjs@c...> +81 90 7737 2974
      > Make up enjoying your city life...produced by BIC CAMERA
      >
    • Curt Sampson
      ... I think you re just getting yourself into a whole other set of limitations. Even on incredibly simple refactorings, such as dropping a column, Daversy
      Message 2 of 9 , Nov 13, 2005
        On Fri, 11 Nov 2005, Eli Golovinsky wrote:

        > This is a quite possible implementation, but I think a limiting one.
        > Without the ability to parse the information stored in source control
        > (in this case creation and upgrading scripts), there are very few
        > intelligent things a program can do. And I really want to try and
        > avoid the need to write and maintain an SQL parser for each flavor of
        > SQL in existence.

        I think you're just getting yourself into a whole other set of
        limitations. Even on incredibly simple refactorings, such as dropping
        a column, Daversy already produces upgrade scripts containing things
        such as, "-- Can't remove column DESCRIPTION from table SALGRADE : not
        supported in SQLite." So what's a poor DBA to do? Write out by hand the
        code to deal with this, of course. What happens when you get to more
        sophisticated refactorings, such as the example I gave before, that
        require multiple steps? What happens when functions and triggers need
        to change the way they operate on tables you've changed? I think you'll
        have a lot of cases that can only be dealt with by a DBA.

        And this XML intermediate representation has its own problems as well,
        because every DBMS has its own set of special features that now has to
        be converted back and forth between its flavour of SQL and your XML. How
        are you going to deal with things such as PostgreSQL table inheritance?
        You get into a lot of special cases anyway, and you're going to end up
        doing a lot more coding.

        I've been looking at this issue for a few years now, and at this point
        I'm entirely convinced that the only way to do this at reasonable
        expense is to have a tool that will let you, regardless of what else you
        do, continue to use direct-coded SQL statements particular to your DBMS.
        Otherwise you're going to be overwhelmed with the programming effort for
        your database connectors, or you're going to be limited to just what the
        generic program supports, and you'll never be able to use some of the
        special features of your particular DBMS.

        > I've been thinking about making Daversy an automatic refactoring
        > tool...creating a tool that could generate a refactoring script is a
        > good [idea].
        >
        > The tool will have to read the current database structure, present you
        > with several refactoring options and generate a script that will
        > perform them for you. Those scripts can then be edited by hand to
        > fine-tune them.
        >
        > The problem arises when I think about incorporating such features with
        > source control for the database.

        If you're using my text-file methods, I don't see a problem. You start
        out with a file containing a CREATE TABLE statement. When you're done,
        you have a changed version of that file, and an upgrade script. E.g.,
        starting with:

        employee.sql:

        CREATE TABLE employee (
        employee_id serial PRIMARY KEY,
        lastname text NOT NULL,
        firstname text NOT NULL
        );

        You apply two "rename column" refactorings, using the automated too. Now
        you have one changed file and one new file:

        employee.sql

        CREATE TABLE employee (
        employee_id serial PRIMARY KEY,
        family_name text NOT NULL,
        given_name text NOT NULL
        );

        update-2005-11-14.sql:

        ALTER TABLE employee RENAME COLUMN lastname TO family_name;
        ALTER TABLE employee RENAME COLUMN firstname TO given_name;

        Add the new file to revision control, commit the changes, and you're
        done. Unless the update script is not correct, of course, in which case
        you can tweak it as necessary. Perhaps you want to add some tests or
        queries before and after the actual upgrade itself, so that you can confirm
        that the upgrade script is working properly.

        If you're using Subversion, finding out what changed and when is dead
        easy; just diff the changeset and you'll get both the changes to the
        CREATE TABLE statement and the change script for moving from the
        previous revision to the next. If you need to look at more changes, just
        diff two revisions several changesets apart, and you'll get what you
        need.

        You're right that writing parsers for every version of SQL out there
        is going to be a bit of work, but I'm not entirely sure it's as bad
        as you make it out to be. After all, for anything outside of what the
        refactoring tool actually refactors, it doesn't need to understand the
        semantics of the code; it just needs to be able to parse it and spit
        it back out again.

        But I think that it might possibly be more productive to approach from
        the other direction: write a domain specific language that can be
        combined with raw SQL that will encapsulate the things common to all
        DBMSes, and then just use the DSL and SQL together in your text files.
        The refactoring tool uses the DSL and ignores the SQL. And the DSL alone
        could save a lot of tedious typing effort, given how verbose SQLs tend
        to be to express relatively simple concepts.

        > Say we have a database in STATE1 and we want to upgrade it to STATE2.
        > We've made some of the changes through refactoring, and we want others
        > to be inferred from the differences between the two states.
        >
        > We dry-run (in the tool, without the use of a database) the
        > refactorings on STATE1 to get STATE1A. We then compare STATE1A to
        > STATE2 to get the changes that were made without using refactorings.
        >
        > The actual script that we will run will constist of the refactorings,
        > the DDL changes between STATE1A and STATE2 and possibly some custom
        > scripts.

        Again, seems like a lot of programming work for not so much gain,
        compared to just having the refactoring tool write the part of the
        script it knows, and leaving me to write the rest.

        On one of my projects, in the last nine months I've written 23 database
        update scripts (out of some sixty-odd) that contain the word ALTER.
        Looking through them, four of these changes are simple column renames
        and suchlike that could be handled by a relatively unsophisticated tool,
        four more might be able to be handled by a relatively sophisitcated
        tool (due to multi-stage ALTER COLUMN stuff being required to deal with
        integrity issues and suchlike), and the other fifteen all would have
        required human intervention, because the SQL itself doesn't contain
        enough information about the meanings of the columns and changes.

        For such small returns, I don't even know that I'd use such a tool even
        if it did exist. I'd far rather have something that helps me deal nicely
        with, e.g., creating functions, where the syntax in PostgreSQL is overly
        complex, verbose, and repetitive, and where things I want to make sure
        I do (e.g., explicitly setting it volatile/stable/immutable) are not
        flagged when absent.

        cjs
        --
        Curt Sampson <cjs@...> +81 90 7737 2974
        Make up enjoying your city life...produced by BIC CAMERA
      Your message has been successfully submitted and would be delivered to recipients shortly.