Re: Database Version Management
Hi Curt,
> But for a refactoring like this that I do "by hand," I'd be really
> impressed if your tool could generate this change set.

I've been thinking about making Daversy an automatic refactoring tool
sometime in the future, but I wasn't quite sure how to approach this.
I don't want Daversy to become a tool you use to manage your database,
because those tasks are very database specific and I'd like to try and
keep it neutral. On the other hand, database refactoring is something
that could really be useful. Your idea of just creating a tool that
could generate a refactoring script is a good one.
The tool will have to read the current database structure, present you
with several refactoring options and generate a script that will
perform them for you. Those scripts can then be edited by hand to
fine-tune them.
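As a rough sketch of that flow (all names here are invented for illustration, not Daversy's actual API), the tool might keep an in-memory model of the structure, apply the chosen refactoring to it, and emit the corresponding DDL:

```python
# Hypothetical sketch of a refactoring-script generator: apply a
# chosen refactoring to an in-memory model of the database structure
# and emit the DDL that performs the same change.  None of these
# names are Daversy's actual API.

def rename_column(schema, table, old, new):
    """Update the structure model and return the matching DDL."""
    cols = schema[table]
    cols[cols.index(old)] = new          # update the model in place
    return f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"

schema = {"employee": ["employee_id", "lastname", "firstname"]}
script = [
    rename_column(schema, "employee", "lastname", "family_name"),
    rename_column(schema, "employee", "firstname", "given_name"),
]
# The generated script can still be edited by hand before it is run.
print("\n".join(script))
```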
The problem arises when I think about incorporating such features with
source control for the database. If each and every change to the
database is made through one or more refactorings, there is no
problem. I could keep the refactorings in source control and
generate upgrade scripts as necessary, but I don't think it will even
come to that - some changes are bound to be applied directly to the
database. And here's the problem - the upgrade scripts could either be
generated from the comparison of two states or from the refactorings,
because both might contain the same DDL statements.
On the other hand, the refactorings could be generated in such a way
that they handle only the data migrations and not the structure changes.
Hmm.. I think I have an idea.
Say we have a database in STATE1 and we want to upgrade it to STATE2.
We've made some of the changes through refactoring, and we want others
to be inferred from the differences between the two states.
We dry-run (in the tool, without the use of a database) the
refactorings on STATE1 to get STATE1A. We then compare STATE1A to
STATE2 to get the changes that were made without using refactorings.
The actual script that we will run will consist of the refactorings,
the DDL changes between STATE1A and STATE2, and possibly some custom SQL.
I've now become even more convinced that there is no way to separate a
source control tool from an automatic migration tool and a refactoring
tool. All three of them must be thought of and implemented together.
What do you think?
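The dry-run-and-diff idea above could be sketched with plain set arithmetic (the state model here, a set of (table, column) pairs, is a deliberate simplification for illustration):

```python
# Sketch of the dry-run: apply the recorded refactorings to STATE1
# in memory to get STATE1A, then diff STATE1A against STATE2 to find
# the changes that were made outside any refactoring.

def apply_refactorings(state, refactorings):
    state = set(state)
    for op, *args in refactorings:
        if op == "rename_column":
            table, old, new = args
            state.remove((table, old))
            state.add((table, new))
    return state

state1 = {("coupon", "code"), ("coupon", "amount")}
refactorings = [("rename_column", "coupon", "amount", "value")]
state1a = apply_refactorings(state1, refactorings)

state2 = {("coupon", "code"), ("coupon", "value"), ("coupon", "shop_id")}

added = state2 - state1a    # changes made directly on the database
dropped = state1a - state2
```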
> I'd think it might be easier just to generate a single script that does
> whatever Davsey can figure out, and then point the developer at that to
> go and tweak it to do the right thing. Then the developer can deal with
> issues such as changing
> ALTER TABLE coupon ADD COLUMN shop_id integer NOT NULL REFERENCES shop
> into
> ALTER TABLE coupon ADD COLUMN shop_id integer NULL REFERENCES shop
> -- Insert valid values into the column.
> ALTER TABLE coupon ALTER COLUMN shop_id SET NOT NULL
> And this, of course, gives the developer the ultimate in easy
> flexibility, since he doesn't have to worry about separate patch files
> or anything like that; just generate and go. And it doesn't preclude
> adding refactoring tools later that could accept a command like "rename
> table foo to bar" and generate all of the right stuff for that.
> If the SQL was getting too nasty, I'd be really tempted to just build
> a DSL over top of it that still lets you get down and dirty for those
> moments when you have a really tricky upgrade.
> Curt Sampson <cjs@c...> +81 90 7737 2974
> Make up enjoying your city life...produced by BIC CAMERA
On Fri, 11 Nov 2005, Eli Golovinsky wrote:
> This is a quite possible implementation, but I think a limiting one.
> Without the ability to parse the information stored in source control
> (in this case creation and upgrading scripts), there are very few
> intelligent things a program can do. And I really want to try and
> avoid the need to write and maintain an SQL parser for each flavor of
> SQL in existence.

I think you're just getting yourself into a whole other set of
limitations. Even on incredibly simple refactorings, such as dropping
a column, Daversy already produces upgrade scripts containing things
such as, "-- Can't remove column DESCRIPTION from table SALGRADE : not
supported in SQLite." So what's a poor DBA to do? Write out by hand the
code to deal with this, of course. What happens when you get to more
sophisticated refactorings, such as the example I gave before, that
require multiple steps? What happens when functions and triggers need
to change the way they operate on tables you've changed? I think you'll
have a lot of cases that can only be dealt with by a DBA.
And this XML intermediate representation has its own problems as well,
because every DBMS has its own set of special features that now has to
be converted back and forth between its flavour of SQL and your XML. How
are you going to deal with things such as PostgreSQL table inheritance?
You get into a lot of special cases anyway, and you're going to end up
doing a lot more coding.
I've been looking at this issue for a few years now, and at this point
I'm entirely convinced that the only way to do this at reasonable
expense is to have a tool that will let you, regardless of what else you
do, continue to use direct-coded SQL statements particular to your DBMS.
Otherwise you're going to be overwhelmed with the programming effort for
your database connectors, or you're going to be limited to just what the
generic program supports, and you'll never be able to use some of the
special features of your particular DBMS.
> I've been thinking about making Daversy an automatic refactoring
> tool...creating a tool that could generate a refactoring script is a
> good [idea].
> The tool will have to read the current database structure, present you
> with several refactoring options and generate a script that will
> perform them for you. Those scripts can then be edited by hand to
> fine-tune them.
> The problem arises when I think about incorporating such features with
> source control for the database.

If you're using my text-file methods, I don't see a problem. You start
out with a file containing a CREATE TABLE statement. When you're done,
you have a changed version of that file, and an upgrade script. E.g.,
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,
    lastname text NOT NULL,
    firstname text NOT NULL
);
You apply two "rename column" refactorings, using the automated tool.
Now you have one changed file and one new file:
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,
    family_name text NOT NULL,
    given_name text NOT NULL
);
ALTER TABLE employee RENAME COLUMN lastname TO family_name;
ALTER TABLE employee RENAME COLUMN firstname TO given_name;
Add the new file to revision control, commit the changes, and you're
done. Unless, of course, the upgrade script isn't correct, in which case
you can tweak it as necessary. Perhaps you want to add some tests or
queries before and after the actual upgrade itself, so that you can confirm
that the upgrade script is working properly.
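The rename step in that workflow could be sketched like this (a hypothetical helper; the whole-word text substitution is a deliberate simplification, where a real tool would parse the DDL):

```python
# Sketch: apply a rename-column refactoring to the stored
# CREATE TABLE file and emit the matching upgrade statement.
import re

def rename_column_in_file(create_sql, table, old, new):
    """Rewrite the CREATE TABLE text and return it together with
    the ALTER statement that performs the same rename."""
    updated = re.sub(rf"\b{old}\b", new, create_sql)
    alter = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
    return updated, alter

create_sql = """CREATE TABLE employee (
    employee_id serial PRIMARY KEY,
    lastname text NOT NULL,
    firstname text NOT NULL
);"""

create_sql, alter1 = rename_column_in_file(
    create_sql, "employee", "lastname", "family_name")
create_sql, alter2 = rename_column_in_file(
    create_sql, "employee", "firstname", "given_name")
```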
If you're using Subversion, finding out what changed and when is dead
easy; just diff the changeset and you'll get both the changes to the
CREATE TABLE statement and the change script for moving from the
previous revision to the next. If you need to look at more changes, just
diff two revisions several changesets apart, and you'll get what you need.
You're right that writing parsers for every version of SQL out there
is going to be a bit of work, but I'm not entirely sure it's as bad
as you make it out to be. After all, for anything outside of what the
refactoring tool actually refactors, it doesn't need to understand the
semantics of the code; it just needs to be able to parse it and spit
it back out again.
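That pass-through approach can be sketched as follows (naive statement splitting on ';', which ignores string literals and comments; a real tool would not):

```python
# Sketch of "parse it and spit it back out": split a script into
# statements, let the tool rewrite only those it understands, and
# pass everything else through verbatim.

def passthrough_rewrite(script, rewrite):
    statements = [s.strip() for s in script.split(";") if s.strip()]
    out = []
    for stmt in statements:
        # Only ALTER TABLE statements are "understood" in this demo;
        # everything else is emitted untouched.
        out.append(rewrite(stmt) if stmt.upper().startswith("ALTER TABLE")
                   else stmt)
    return ";\n".join(out) + ";"

script = "CREATE INDEX i ON t (c);\nALTER TABLE t ADD COLUMN d int;"
# Rename column d to d2, but only inside statements the tool handles.
result = passthrough_rewrite(script, lambda s: s.replace(" d ", " d2 "))
```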
But I think that it might possibly be more productive to approach from
the other direction: write a domain specific language that can be
combined with raw SQL that will encapsulate the things common to all
DBMSes, and then just use the DSL and SQL together in your text files.
The refactoring tool uses the DSL and ignores the SQL. And the DSL alone
could save a lot of tedious typing, given how verbose SQL tends to be
when expressing relatively simple concepts.
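Such a DSL might look something like this (the syntax is entirely invented for illustration): compact declarations expand into the verbose DDL, while raw SQL can still be embedded untouched.

```python
# Sketch of a small DSL over SQL: terse column declarations that
# expand into full DDL, with an escape hatch for raw SQL.

def col(name, type_, pk=False, null=True):
    parts = [name, type_]
    if pk:
        parts.append("PRIMARY KEY")
    elif not null:
        parts.append("NOT NULL")
    return " ".join(parts)

def table(name, *columns, raw_sql=""):
    body = ",\n    ".join(columns)
    ddl = f"CREATE TABLE {name} (\n    {body}\n);"
    # Raw SQL is appended verbatim; the DSL never inspects it.
    return ddl + ("\n" + raw_sql if raw_sql else "")

ddl = table("employee",
            col("employee_id", "serial", pk=True),
            col("family_name", "text", null=False),
            raw_sql="CREATE INDEX employee_name ON employee (family_name);")
```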
> Say we have a database in STATE1 and we want to upgrade it to STATE2.
> We've made some of the changes through refactoring, and we want others
> to be inferred from the differences between the two states.
> We dry-run (in the tool, without the use of a database) the
> refactorings on STATE1 to get STATE1A. We then compare STATE1A to
> STATE2 to get the changes that were made without using refactorings.
> The actual script that we will run will consist of the refactorings,
> the DDL changes between STATE1A and STATE2, and possibly some custom SQL.

Again, this seems like a lot of programming work for not so much gain,
compared to just having the refactoring tool write the part of the
script it knows, and leaving me to write the rest.
On one of my projects, in the last nine months I've written 23 database
update scripts (out of some sixty-odd) that contain the word ALTER.
Looking through them, four of these changes are simple column renames
and suchlike that could be handled by a relatively unsophisticated tool,
four more might be able to be handled by a relatively sophisticated
tool (due to multi-stage ALTER COLUMN stuff being required to deal with
integrity issues and suchlike), and the other fifteen all would have
required human intervention, because the SQL itself doesn't contain
enough information about the meanings of the columns and changes.
For such small returns, I don't know that I'd use such a tool even
if it did exist. I'd far rather have something that helps me deal nicely
with, e.g., creating functions, where the syntax in PostgreSQL is overly
complex, verbose, and repetitive, and where things I want to make sure
I do (e.g., explicitly setting it volatile/stable/immutable) are not
flagged when absent.
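The kind of check described could be sketched like this (regex matching is a simplification; a real tool would parse the statement properly):

```python
# Sketch: warn when a PostgreSQL CREATE FUNCTION statement does not
# state its volatility (VOLATILE / STABLE / IMMUTABLE) explicitly.
import re

VOLATILITY = ("VOLATILE", "STABLE", "IMMUTABLE")

def missing_volatility(statement):
    """True if a CREATE FUNCTION statement omits an explicit
    volatility classification; False for non-function statements."""
    if not re.match(r"\s*CREATE\s+(OR\s+REPLACE\s+)?FUNCTION",
                    statement, re.I):
        return False
    return not any(v in statement.upper() for v in VOLATILITY)

good = """CREATE FUNCTION add(a int, b int) RETURNS int
    AS 'SELECT a + b' LANGUAGE SQL IMMUTABLE;"""
bad = """CREATE FUNCTION add(a int, b int) RETURNS int
    AS 'SELECT a + b' LANGUAGE SQL;"""
```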
Curt Sampson <cjs@...> +81 90 7737 2974
Make up enjoying your city life...produced by BIC CAMERA