Re: Database Version Management
- You've raised some serious issues here.
> The upgrade scripts may be generated, but the changes must have been
> applied to the database in the first place. Add the fact that the
> data migrations still need to be written, and you've still manually
> produced everything that will be generated, haven't you?

It doesn't seem to me that everybody uses SQL*Plus to manage their
databases. And even if it is common with Oracle, it is not so with
other databases. So it seems quite possible to me that when people
change the structure of their database, they usually don't write the
DDL statements themselves. The data updating is something that needs
to be done by hand, but Wilfred's ideas about automating the
refactoring process (which I will respond to in a separate message)
are one way to reduce the amount of manual work.
Another reason I find storing a state file in source control a good
thing is the ability to examine the history of changes made to a
database. It could be as useful as it is with code source control. It
might be possible to implement a 'blame' command which shows for each
database object who changed it last and why.
As someone has pointed out to me, it's possible to implement a simple
source control system without any XML. Here's how such a tool might work:
** Inputs **:
1) Existing "Upgrade Script" (base case is a blank file)
2) New "SQL block" that the user wishes to add
** Outputs **:
1) Updated "Upgrade Script" (concatenation of Inputs 1 and 2 with
input 2 surrounded by a some SQL IF statement to ensure it is never
executed again if upgrade script is re-executed. This may require
creating a table to keep track of versions - only done in the base case).
2) "New Database Script" (generated by tool after it executes the
"Upgrade Script" on a blank database).
This is certainly a possible implementation, but I think a limiting one.
Without the ability to parse the information stored in source control
(in this case creation and upgrading scripts), there are very few
intelligent things a program can do. And I really want to try and
avoid the need to write and maintain an SQL parser for each flavor of
SQL in existence.
> The second problem I have with the approach is the lack of testing.

I've taken a look at your blog and read some of the things you said
about pre- and post-condition testing. I really like the idea and I
have a few thoughts about how to combine your theories with mine.
As far as I understand, there are two categories of conditions (both
pre- and post-conditions) that you might want to test - structural and
data-related. The structural conditions demand that the database
include certain tables, columns, primary and foreign keys and so on.
The data-related conditions might include constraints on the data
size, type or relation between data in different parts of the
database. When changing the datatype of a column, for instance, you'd
want to make sure you could actually fit the old data into the new type.
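To make that concrete, such a data-related pre-condition might boil
down to a query like the following; the table name and the varchar(100)
target type are invented purely for illustration:

    -- Pre-condition for narrowing DESCRIPTION to varchar(100):
    -- count the rows that would no longer fit. A non-zero result
    -- means the upgrade must not proceed.
    SELECT COUNT(*)
    FROM salgrade
    WHERE LENGTH(description) > 100;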
I believe that the structural pre-conditions and post-conditions could
be checked by comparing a database with a pre-saved state file,
perhaps with some constraints relaxed (we would not normally want to
compare indexes, for instance, as their presence or absence may be due
to a local optimization).
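For a concrete picture, such a structural check might reduce to
queries against the standard information_schema, which the tool could
generate from the state file. The table and column names here are
purely illustrative:

    -- Structural condition: the EMPLOYEE table must have a
    -- FAMILY_NAME column of type text.
    SELECT 1
    FROM information_schema.columns
    WHERE table_name = 'employee'
      AND column_name = 'family_name'
      AND data_type = 'text';
    -- An empty result would mean a failed pre- or post-condition.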
Let's say that STATE1 is what the database is supposed to be like when
we start the conversion process and STATE2 is what the database will
be like when we are done. There might be intermediate states we'll
have to go through, but I'll ignore them for now. Before converting we
compare the database with STATE1 and go over the differences to see
whether they are permitted or not. If there are differences, the
upgrade will not run. On the other hand, if all the differences are
tolerable, we move on to perform the actual conversion - running both
the automatic and the manual scripts. Now, after the conversion is
complete we can compare the resulting database with STATE2 and check
for differences. If there are any serious ones, the conversion will
have failed, probably because of a bug in our automatic tool. It might
also be useful to check that the database is in the state we expect
several times during the upgrade, between each pair of states if
we're moving from STATE1 to STATE5 through 2, 3 and 4.
The data pre- and post-conditions are something that will have to be
written by hand, but they could be incorporated into the automatic
scripts in a similar manner to the update scripts. It should also be
possible to catalogue the common types of data conditions and create
a UI that helps build them with ease.
Like I said, I really like your ideas about database design by
contract and I think that they should play along nicely with what I
had in mind. It will greatly enhance Daversy's usefulness, so I will
look into implementing them.
> I also hope that one day I will take my own ideas on how database
> upgrades should be produced and get an open source project together
> to implement it for a wider audience.

By all means. I'm sure that will do all of us a lot of good.
- Hi Curt,
> But for a refactoring like this that I do "by hand," I'd be really
> impressed if your tool could generate this change set.

I've been thinking about making Daversy an automatic refactoring tool
sometime in the future, but I wasn't quite sure how to approach this.
I don't want Daversy to become a tool you use to manage your database,
because those tasks are very database specific and I'd like to try and
keep it neutral. On the other hand, database refactoring is something
that could really be useful. Your idea of just creating a tool that
could generate a refactoring script is a good one.
The tool will have to read the current database structure, present you
with several refactoring options and generate a script that will
perform them for you. Those scripts can then be edited by hand to
fine-tune them.
The problem arises when I think about incorporating such features with
source control for the database. If each and every change to the
database is made through one or more refactorings, there is no
problem. I could keep the refactorings in source control and
generate upgrade scripts as necessary. But I don't think it will even
come to that - some changes are bound to be applied directly to the
database. And here's the problem - the upgrade scripts could be
generated either from the comparison of two states or from the
refactorings, but not from both, because both might contain the same
DDL statements.
On the other hand, the refactorings could be generated in such a way
that they handle only the data migrations and not the structure changes.
Hmm.. I think I have an idea.
Say we have a database in STATE1 and we want to upgrade it to STATE2.
We've made some of the changes through refactoring, and we want others
to be inferred from the differences between the two states.
We dry-run (in the tool, without the use of a database) the
refactorings on STATE1 to get STATE1A. We then compare STATE1A to
STATE2 to get the changes that were made without using refactorings.
The actual script that we will run will consist of the refactorings,
the DDL changes between STATE1A and STATE2 and possibly some custom
scripts written by hand.
I've now become even more convinced that there is no way to separate a
source control tool from an automatic migration tool and a refactoring
tool. All three of them must be thought of and implemented together.
What do you think?
> I'd think it might be easier just to generate a single script that does
> whatever Daversy can figure out, and then point the developer at that to
> go and tweak it to do the right thing. Then the developer can deal with
> issues such as changing
>     ALTER TABLE coupon ADD COLUMN shop_id integer NOT NULL REFERENCES shop
> into
>     ALTER TABLE coupon ADD COLUMN shop_id integer NULL REFERENCES shop
>     -- Insert valid values into the column.
>     ALTER TABLE coupon ALTER COLUMN shop_id SET NOT NULL
> And this, of course, gives the developer the ultimate in easy
> flexibility, since he doesn't have to worry about separate patch files
> or anything like that; just generate and go. And it doesn't preclude
> adding refactoring tools later that could accept a command like "rename
> table foo to bar" and generate all of the right stuff for that.
> If the SQL was getting too nasty, I'd be really tempted to just build
> a DSL over top of it that still lets you get down and dirty for those
> moments when you have a really tricky upgrade.
> Curt Sampson <cjs@c...> +81 90 7737 2974
> Make up enjoying your city life...produced by BIC CAMERA
- On Fri, 11 Nov 2005, Eli Golovinsky wrote:
> This is certainly a possible implementation, but I think a limiting one.
> Without the ability to parse the information stored in source control
> (in this case creation and upgrading scripts), there are very few
> intelligent things a program can do. And I really want to try and
> avoid the need to write and maintain an SQL parser for each flavor of
> SQL in existence.

I think you're just getting yourself into a whole other set of
limitations. Even on incredibly simple refactorings, such as dropping
a column, Daversy already produces upgrade scripts containing things
such as, "-- Can't remove column DESCRIPTION from table SALGRADE : not
supported in SQLite." So what's a poor DBA to do? Write out by hand the
code to deal with this, of course. What happens when you get to more
sophisticated refactorings, such as the example I gave before, that
require multiple steps? What happens when functions and triggers need
to change the way they operate on tables you've changed? I think you'll
have a lot of cases that can only be dealt with by a DBA.
And this XML intermediate representation has its own problems as well,
because every DBMS has its own set of special features that now has to
be converted back and forth between its flavour of SQL and your XML. How
are you going to deal with things such as PostgreSQL table inheritance?
You get into a lot of special cases anyway, and you're going to end up
doing a lot more coding.
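For anyone who hasn't run into it, table inheritance is a
PostgreSQL-specific feature that a generic schema model has no obvious
slot for. The names here are just for illustration:

    -- cities inherits every column of places, and rows inserted
    -- into cities also show up in queries against places.
    CREATE TABLE places (
        name text NOT NULL,
        population integer
    );
    CREATE TABLE cities (
        country text NOT NULL
    ) INHERITS (places);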
I've been looking at this issue for a few years now, and at this point
I'm entirely convinced that the only way to do this at reasonable
expense is to have a tool that will let you, regardless of what else you
do, continue to use direct-coded SQL statements particular to your DBMS.
Otherwise you're going to be overwhelmed with the programming effort for
your database connectors, or you're going to be limited to just what the
generic program supports, and you'll never be able to use some of the
special features of your particular DBMS.
> I've been thinking about making Daversy an automatic refactoring
> tool...creating a tool that could generate a refactoring script is a
> good [idea].
>
> The tool will have to read the current database structure, present you
> with several refactoring options and generate a script that will
> perform them for you. Those scripts can then be edited by hand to
> fine-tune them.
>
> The problem arises when I think about incorporating such features with
> source control for the database.

If you're using my text-file methods, I don't see a problem. You start
out with a file containing a CREATE TABLE statement. When you're done,
you have a changed version of that file, and an upgrade script. E.g.,
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,
    lastname text NOT NULL,
    firstname text NOT NULL
);
You apply two "rename column" refactorings, using the automated too. Now
you have one changed file and one new file:
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,
    family_name text NOT NULL,
    given_name text NOT NULL
);
ALTER TABLE employee RENAME COLUMN lastname TO family_name;
ALTER TABLE employee RENAME COLUMN firstname TO given_name;
Add the new file to revision control, commit the changes, and you're
done. Unless the update script is not correct, of course, in which case
you can tweak it as necessary. Perhaps you want to add some tests or
queries before and after the actual upgrade itself, so that you can confirm
that the upgrade script is working properly.
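As a sketch of what those before-and-after checks might look like for
the rename above (assuming PostgreSQL, where DDL is transactional, so
a failed check aborts the whole thing):

    BEGIN;
    -- Pre-condition: the old columns must still exist, or this fails.
    SELECT lastname, firstname FROM employee WHERE false;

    ALTER TABLE employee RENAME COLUMN lastname TO family_name;
    ALTER TABLE employee RENAME COLUMN firstname TO given_name;

    -- Post-condition: the new columns must exist now.
    SELECT family_name, given_name FROM employee WHERE false;
    COMMIT;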
If you're using Subversion, finding out what changed and when is dead
easy; just diff the changeset and you'll get both the changes to the
CREATE TABLE statement and the change script for moving from the
previous revision to the next. If you need to look at more changes, just
diff two revisions several changesets apart, and you'll get what you need.
You're right that writing parsers for every version of SQL out there
is going to be a bit of work, but I'm not entirely sure it's as bad
as you make it out to be. After all, for anything outside of what the
refactoring tool actually refactors, it doesn't need to understand the
semantics of the code; it just needs to be able to parse it and spit
it back out again.
But I think that it might be more productive to approach this from
the other direction: write a domain specific language that can be
combined with raw SQL that will encapsulate the things common to all
DBMSes, and then just use the DSL and SQL together in your text files.
The refactoring tool uses the DSL and ignores the SQL. And the DSL alone
could save a lot of tedious typing, given how verbose SQL dialects tend
to be when expressing relatively simple concepts.
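Nobody has designed such a DSL here, but the idea might look something
like this: a compact, DBMS-neutral directive (entirely hypothetical)
that the tool expands into plain SQL for each target, leaving any raw
SQL in the file untouched:

    -- Hypothetical DSL directive in the text file:
    --   rename-column employee.lastname -> family_name
    -- which the tool might expand, for PostgreSQL, into:
    ALTER TABLE employee RENAME COLUMN lastname TO family_name;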
> Say we have a database in STATE1 and we want to upgrade it to STATE2.
> We've made some of the changes through refactoring, and we want others
> to be inferred from the differences between the two states.
>
> We dry-run (in the tool, without the use of a database) the
> refactorings on STATE1 to get STATE1A. We then compare STATE1A to
> STATE2 to get the changes that were made without using refactorings.
> The actual script that we will run will consist of the refactorings,
> the DDL changes between STATE1A and STATE2 and possibly some custom
> scripts written by hand.

Again, this seems like a lot of programming work for not so much gain,
compared to just having the refactoring tool write the part of the
script it knows, and leaving me to write the rest.
On one of my projects, in the last nine months I've written 23 database
update scripts (out of some sixty-odd) that contain the word ALTER.
Looking through them, four of these changes are simple column renames
and suchlike that could be handled by a relatively unsophisticated tool,
four more might be handled by a relatively sophisticated
tool (due to multi-stage ALTER COLUMN stuff being required to deal with
integrity issues and suchlike), and the other fifteen all would have
required human intervention, because the SQL itself doesn't contain
enough information about the meanings of the columns and changes.
For such small returns, I don't know that I'd use such a tool even
if it did exist. I'd far rather have something that helps me deal nicely
with, e.g., creating functions, where the syntax in PostgreSQL is overly
complex, verbose, and repetitive, and where things I want to make sure
I do (e.g., explicitly setting it volatile/stable/immutable) are not
flagged when absent.
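To illustrate the kind of boilerplate meant here, a trivial read-only
function in modern PostgreSQL (the function, table and column names
are invented for the example):

    -- Verbose for what it does, and nothing warns you if you forget
    -- the volatility category; the default, VOLATILE, disables useful
    -- optimizations for a read-only function like this one.
    CREATE OR REPLACE FUNCTION employee_full_name(emp_id integer)
    RETURNS text AS $$
        SELECT family_name || ', ' || given_name
        FROM employee
        WHERE employee_id = emp_id;
    $$ LANGUAGE sql STABLE;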
Curt Sampson <cjs@...> +81 90 7737 2974
Make up enjoying your city life...produced by BIC CAMERA