
refactoring databases

  • Perrin Harkins
Message 1 of 11, Feb 17, 2005
      Hi Rob,

      I was talking to some of my co-workers the other day about managing
      changes to databases on a system that has a version in production
      already. There was a thread I remember reading, possibly on the
      mod_perl list, and possibly by you, about a technique for managing
      schema changes by treating everything after the initial design as an
      update script, i.e. always using ALTER TABLE and the like. I recall
      someone complaining about how that would become hard to manage and there
      was discussion of rolling up all the changes every now and then into a
      new starting point.

      Does this sound like something you posted? I can't seem to find it, and
      I wanted to re-read it. Even if it wasn't you, any advice about
      managing database changes would be welcome, since I remember you saying
      you made lots of ongoing schema changes on your projects even after
      launching them.

      - Perrin
    • Matthew Albright
Message 2 of 11, Feb 17, 2005
Hmmm, that sounds very similar to something I've done for a while now (at 2
        different jobs). I don't recall posting about it, but I (or one of my
        cow-orkers) might have...

        It's easier to show some code that would work (as opposed to describing it in
        words):

sub init_schema_change { # Call this once
    $db->create_table(schema_change => [
        'id serial primary key',
        'name varchar not null',
        'ts timestamp not null default now()',
    ]);
}

sub schema_change {
    my ($name, $change_sub) = @_;

    # Skip changes that have already been applied.
    return if $db->scalar(
        "select count(*) from schema_change where name = ?",
        $name,
    );

    # Apply the change and record it in the same transaction.
    $db->run_in_transaction(sub {
        $change_sub->($db);
        $db->execute("insert into schema_change (name) values (?)", $name);
    });
}

        So, this would be called like this, from a file we call update_db.pl:

schema_change(add_foo_bar_and_baz_foo_id => sub {
    my ($db) = @_;
    $db->execute('alter table foo add column bar integer not null');
    $db->execute('alter table baz add column foo_id integer');
});

        What this does is give you a table that grows over time that indicates exactly
        which schema changes have been done to the database (and when). You have a
        growing executable update_db.pl file that contains a bunch of schema changes
        that will only be executed once, and can be run at upgrade time to make sure
        the database is up to date. And if any part of the schema change fails, the
        transaction rolls back and the row is not inserted into the schema_change
        table, so you can try again after you fix your code.
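
For reference, the $db wrapper above is pseudocode; a minimal
self-contained sketch of the same idea in plain DBI against PostgreSQL
might look like this (the connection details are assumptions, and the
schema_change table is assumed to exist already):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=myapp', 'user', 'password',
    { RaiseError => 1, AutoCommit => 1 });

sub schema_change {
    my ($name, $change_sub) = @_;

    # Skip changes that have already been applied.
    my ($done) = $dbh->selectrow_array(
        'select count(*) from schema_change where name = ?', undef, $name);
    return if $done;

    # PostgreSQL supports transactional DDL, so the change and its
    # log row commit (or roll back) together.
    $dbh->begin_work;
    eval {
        $change_sub->($dbh);
        $dbh->do('insert into schema_change (name) values (?)', undef, $name);
        $dbh->commit;
    };
    if (my $err = $@) {
        $dbh->rollback;
        die "schema change '$name' failed: $err";
    }
}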

        So, you can ship out this update_db.pl file to your customers and be confident
        that no matter how old their database schema is, it will be up to date after
        this file is run.

        And like you said, every so often you do a check of all your customers, and
        figure out the oldest version of code that is out in the wild that has a chance
        of upgrading in the future. Get a test system, and upgrade it to that version,
        dump the schema of the database, and use that as a starting point, deleting
        everything older than that point in your update_db.pl file.
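
To make that roll-up step concrete, here is a minimal sketch assuming
PostgreSQL; the database and file names are made up:

# Dump the current schema to use as the new baseline.
system('pg_dump', '--schema-only', '--no-owner',
    '-f', 'new_baseline.sql', 'myapp') == 0
    or die "pg_dump failed: $?";
# new_baseline.sql becomes the new starting point; every schema_change()
# call older than this baseline can then be deleted from update_db.pl.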

        If anyone is interested, I can share some additional thoughts on data migration
        landmines to watch out for...

        matt

        --- Perrin Harkins <perrin@...> wrote:

        > Hi Rob,
        >
        > I was talking to some of my co-workers the other day about managing
        > changes to databases on a system that has a version in production
        > already. There was a thread I remember reading, possibly on the
        > mod_perl list, and possibly by you, about a technique for managing
        > schema changes by treating everything after the initial design as an
        > update script, i.e. always using ALTER TABLE and the like. I recall
        > someone complaining about how that would become hard to manage and there
        > was discussion of rolling up all the changes every now and then into a
        > new starting point.
        >
        > Does this sound like something you posted? I can't seem to find it, and
        > I wanted to re-read it. Even if it wasn't you, any advice about
        > managing database changes would be welcome, since I remember you saying
        > you made lots of ongoing schema changes on your projects even after
        > launching them.
        >
        > - Perrin
        >
        >
      • Johan Lindstrom
Message 3 of 11, Feb 18, 2005
          At 08:05 2005-02-18, Matthew Albright wrote:
          >If anyone is interested, I can share some additional thoughts on data
          >migration
          >landmines to watch out for...

          That would be very interesting!

          This is my take:

          In my experience, doing this automatically is kind of dangerous.

We use a strikingly similar approach with a "log table" called
misc_version_feature. We record all changes to structure or data in "delta
files". This is done manually, because we haven't found any solid way to do
this automatically. Tools like DB Artisan don't quite cut it, and some
things are really difficult to do even manually.

          The application of changes in the database is done manually also, mainly
          because we can't at _all_ times be certain that someone didn't fiddle with
          something during an emergency fixup. And in many databases (Sybase,
          Oracle), DDL can't be done in a transaction and most of the time a schema
          change involves both a structure change and some kind of data
          transformation to support this. Sometimes it requires two alters with a
          data transformation in between.

          On a general note, having developers change schemas with no sense of the
          volumes of data in the tables can be very bad. Doing an alter on a table
          with a thousand rows is different from doing it with tens of millions of
          rows and a brief maintenance window. In that case the upgrade may have to
          be performed in smaller steps where not all data is migrated at once.
          Sometimes one can prepare the migration and just do the most critical part
          during the downtime.

          Of course this depends on the size of the shop and how well the sysadmins
          communicate with the developers and how well versed the developers are with
db-related concepts. But the developer mindset is often slightly different
from the DBA mindset, given the different sets of experience they have.


          /J

          -------- ------ ---- --- -- -- -- - - - - -
          Johan Lindström Sourcerer @ Boss Casinos johanl AT DarSerMan.com

          Latest bookmark: "TCP Connection Passing"
          http://tcpcp.sourceforge.net/
          dmoz: /Computers/Programming/Languages/JavaScript/ 12
        • Adrian Howard
Message 4 of 11, Feb 18, 2005
            On 18 Feb 2005, at 07:05, Matthew Albright wrote:

            > Hmmm, that sounds very similar to something I've done for awhile now
            > (at 2
            > different jobs). I don't recall posting about it, but I (or one of my
            > cow-orkers) might have...
            >
            > It's easier to show some code that would work (as opposed to
            > describing it in
            > words):
            [snip]

            A "me too" from over here. Done similar on a couple of projects. We
            kept the schema change data outside the database - that way we could
            keep "build the database from scratch" and "update the database"
            sharing more code. Although I can see wrapping everything inside the
            transaction is a nice feature.

            SQLFairy can also be a useful tool for extracting schema changes from a
            DB that's been tweaked manually.
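
For example, a hedged sketch of recovering ALTER statements with
SQLFairy's sqlt-diff tool (the file names and the PostgreSQL parser
choice are assumptions):

my $alters = `sqlt-diff shipped_schema.sql=PostgreSQL live_schema.sql=PostgreSQL`;
die "sqlt-diff failed: $?" if $? != 0;
print $alters;    # paste the relevant statements into the next delta file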

            [snip]
            > If anyone is interested, I can share some additional thoughts on data
            > migration
            > landmines to watch out for...
            [snip]

            I'm interested.

            To start the ball rolling... One of the problems I had was on-site DBAs
            tweaking the schema and then howling when we broke their changes (and,
            indeed, everything else until we rolled back to the old schema :-)

            We added an md5 hash of an SQLFairy schema dump of the schema before
            each change so we could check that we were fiddling with the correct
            DB.
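
A minimal sketch of that check, assuming SQLFairy (SQL::Translator) can
round-trip the schema dump; the parser choice and the recorded hash are
hypothetical:

use SQL::Translator;
use Digest::MD5 qw(md5_hex);

my $expected_md5 = 'abc123...';    # recorded when the previous upgrade shipped
my $t = SQL::Translator->new(parser => 'PostgreSQL', producer => 'PostgreSQL');
my $dump = $t->translate('live_schema.sql') or die $t->error;
die "schema is not the one we shipped; refusing to upgrade\n"
    unless md5_hex($dump) eq $expected_md5;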

            Adrian
          • Rob Nagler
Message 5 of 11, Feb 18, 2005
              Hi Perrin,

              It's good to hear from you.

              > Does this sound like something you posted?

              Probably. It's certainly something we do. I've included some
              thoughts below.

              Matthew Albright writes:
              > It's easier to show some code that would work (as opposed to describing it in
              > words):

              This is like the code we have, but more complex. We don't actually
              store the schema change scripts in the database. We do have a
              db_upgrade_t (free with bOP), with a uniqueness constraint on
the version field, which is populated from the CVS $Revision$ keyword.

              I've appended one of our upgrade classes. We've done a bad job of
              sharing upgrade code across our applications. It's something on my
              todo list, and I finally turned it into a class a little while ago.
              The application I ripped this from doesn't do an export before the
              upgrade, because it is a huge database. All our other db-upgrade
              scripts do a full export before doing anything.

              > What this does is give you a table that grows over time that
              > indicates exactly which schema changes have been done to the
              > database (and when).

              We use CVS for this. The main problem is that many schema upgrades
              aren't schema upgrades at all. For example, we recently introduced an
              options database in bivio.com. Option ticker symbols are, well,
              stupid. Most brokers don't track historical option prices. However,
              we're an accounting service, and we have to -- Quicken doesn't, but
              there ya go. ;-) Anyways, we had to upgrade the names so they were
              temporally and spatially unique. We consider this a schema change,
              and programmed upgrade_1_options_tickers (see below for why this is
              named that way), even though there was no DDL associated with the
              change.

              > And if any part of the schema change fails, the transaction rolls
              > back and the row is not inserted into the schema_change table, so
              > you can try again after you fix your code.

              This doesn't work with Oracle as mentioned by Johan below.

              > So, you can ship out this update_db.pl file to your customers and be
              > confident that no matter how old their database schema is, it will
              > be up to date after this file is run.

              This is important, and we do this. You test this script to death, and
              then you still get it wrong sometimes. :-(

              > And like you said, every so often you do a check of all your
              > customers, and figure out the oldest version of code that is out in
              > the wild that has a chance of upgrading in the future.

              This is simply not an option we offer our customers. We could do
              this, I guess, but I don't believe in multiple versions in the field
              running wild. At most two: test and prod.

              > If anyone is interested, I can share some additional thoughts on
              > data migration landmines to watch out for...

              Please do.

              Johan Lindstrom writes:
              > In my experience, doing this automatically is kind of dangerous.

Could you elaborate on why this type of automation is dangerous? We've
had more than 200 schema upgrades to bivio.com, 22 upgrades to colosla.org,
              and 26 for myBallot at ieee.org, which is still in beta. I can't
              release stats for other sites, but the numbers are similar. I can get
              these stats in a few seconds from CVS. It's useful data.

              > The application of changes in the database is done manually also, mainly
              > because we can't at _all_ times be certain that someone didn't fiddle with
              > something during an emergency fixup.

If someone fiddles, and they don't check in their changes, they burn
              for fiddling. Everything is checked in to CVS. No exceptions.

              > And in many databases (Sybase, Oracle), DDL can't be done in a
              > transaction and most of the time a schema change involves both a
              > structure change and some kind of data transformation to support
              > this. Sometimes it requires two alters with a data transformation in
              > between.

              Yup, all the more reason to do this programmatically, and test the
              heck out of it. Here's our normal procedure:

              1) Develop the procedure, update the code, test on your dev database
              2) Bring over a copy of the production database (if possible), and
              test on your dev database.
              3) Release, and run on test system.
              4) Release, and run on production.

              If we are being particularly paranoid, we ask everybody to upgrade
              their dev databases before going to test.

              Also, we do parallel development. There's upgrade_1_*, upgrade_2_*,
              etc. You have to have a system for this so people can apply updates
              incrementally to their dev and the test databases. Once released to
production, we apply all updates with one command. If one of the
commands blows up, you can invoke individual methods by hand.
              (Bivio::ShellUtil makes this easy for us, but that's another story.)

              > On a general note, having developers change schemas with no sense of the
              > volumes of data in the tables can be very bad.

              Having developers with no sense of volumes of data is very bad. We
              all learn this by making that mistake a few times. It's also bad to
              have the developers test on faster development machines without
knowing there's going to be a slowdown when it runs on slower
              production machines.

              > Sometimes one can prepare the migration and just do the most critical part
              > during the downtime.

              We try to do this, too.

              > Of course this depends on the size of the shop and how well the sysadmins
              > communicate with the developers and how well versed the developers are with
              > db-related concepts.

This is why it is extremely important for developers to be DBAs and
sysadmins, too. They have to know that problems can occur at all
levels in the system. Indeed, most hard problems occur between large
subsystems, such as DBMS and application, or application and MTA.

              At bivio, programmers are responsible for the end-to-end service we
              offer our customers: from conception to support with all the
customer-irrelevant bits in between, like programming and sysadmin. We
              often are responsible for marketing and sales, too. It's humbling to
              try to sell your beautifully crafted DB-backed application to someone
              who can't even spell DB, and doesn't want to. They just want to have
              their books balance at the end of the year, and they want their tax
              return to print out (even when Apple's Preview for PDF doesn't handle
              FDF, and Acrobat 7.0 is not downloadable to Windows 98, and ... -- ok,
it's tax season, and we're feeling the pain right now. :-)

              Some other things we've learned, not all of which have been put in
              place. Some of these suggestions will no doubt generate shrieks from
              DBAs. :-) Most of these will help greatly if you like the XP
              development model.

              * Universally unique sequence ids. This is very useful for debugging,
              and scalability.

              * Generate the entire schema from your application models. We still
don't do this, and it means we have to maintain three files: *.sql,
the Perl models, and DbUpgrade.pm.

* Release early, release often. This is the best way to ensure schema
changes remain small and cause minimal system disruption when
something goes wrong.

              * With big databases, buy a new machine when you upgrade the DBMS
              software. Use Oracle's standby database feature to minimize
              downtime. Don't do ANY changes (schema or app) when you do this.

              * Automate the things you do most often. If you release often (did
              I mention this?), make sure it takes about 1 or 2 minutes to cut
              a release, and 1 or 2 minutes to install it.

* Use your app to validate all data before it hits the database, except
uniqueness constraints. Use the db constraints as a backup for your
software. Db consistency is the most important issue you face. If
you see a

              * Disk is cheap. Don't worry about how the database stores things
              until it becomes a problem. LOBs are still a problem with DBMSes,
              and we have learned that lesson the hard way.

              * Add indexes wherever you can, but don't preoptimize. Solve
              performance problems as you encounter them. Don't stress about
              the physical structure. It's probably not an issue at all.

              * The DBMS is your bottleneck when it comes to scalability and
              maintainability. Don't use stored procedures. Don't use special
              features. I am still kicking myself over the use of Oracle's CONNECT
              BY. It was a stupid preoptimization that was not necessary. As it turns
              out, Oracle 8i's implementation of self-related tables is horribly
buggy. We are slowly removing this preoptimization with no
performance detriment; if anything, it's a general speed-up.

              * Automate all tests and run them every day, automatically. Tests
              which aren't automated don't get run.
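
As a sketch of that last point, a nightly cron job can be as small as
this (the prove invocation and log path are hypothetical, not any
particular shop's setup):

# Run the whole test suite recursively and keep the log for the morning.
my $log = '/var/log/nightly-tests.log';
system("prove -r t/ > $log 2>&1");
warn "nightly tests failed; see $log\n" if $? != 0;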

              This is our experience. Here's the performance:

              http://www.bivio.com/club_index/accounting/investments

The database server is a 450MHz quad processor with 2GB of memory and
              10K disks running Oracle 8i. The schema that produces this table is
              fully normalized. You are seeing the sum of all transactions in
              tables that hold millions of transactions for all of our customers.
              The prices table has hundreds of millions of records. Nothing is
              cached except what Oracle caches.

              Rob
              ----------------------------------------------------------------
              # Copyright (c) 2005 bivio Software, Inc. All Rights Reserved.
              package Bivio::SQL::DbUpgrade;
              use strict;
              $Bivio::SQL::DbUpgrade::VERSION = sprintf('%d.%02d', q$Revision: 2.22 $ =~ /\d+/g);
              $_ = $Bivio::SQL::DbUpgrade::VERSION;

              use Bivio::ShellUtil;
              @Bivio::SQL::DbUpgrade::ISA = ('Bivio::ShellUtil');

sub USAGE {
    return <<'EOF';
usage: s-db-upgrade [options] command [args...]
commands:
    do_all -- run upgrade method
    upgrade_1_<name> -- first upgrade method
EOF
}

use Bivio::IO::File;
use Bivio::IO::Ref;
use Bivio::SQL::Connection;    # used by _do_sql below
use Bivio::Type::Date;
use Bivio::Type::DateInterval;
use Bivio::Type::DateTime;

sub do_all {
    my($self) = @_;
    local($|) = 1;
    $self->initialize_ui;
    my($req) = $self->get_request;
    my($upgrade) = Bivio::Biz::Model->new($req, 'DbUpgrade');
    my($v) = $Bivio::SQL::DbUpgrade::VERSION;
    $self->usage("$v already ran")
        if $upgrade->unauth_load(version => $v);
    foreach my $method (_methods()) {
        $self->print("Calling: $method\n");
        $self->$method();
    }
    # creates a completion entry in db to prevent re-running
    $upgrade->create({
        version => $v,
        run_date_time => Bivio::Type::DateTime->now
    });
    return "*** done ***\n";
}

sub upgrade_1_some_change {
    my($self) = @_;
    # do something here
    return;
}


sub _add_column {
    my($table, $column, $type, $value) = @_;
    print "Adding $table.$column\n";
    _do_sql("
        ALTER TABLE $table
            ADD $column $type
        /
    ");
    return unless defined($value);

    # Insert default values
    print "Inserting default into $table.$column\n";
    _do_sql("
        update $table set $column = $value
        /");

    return;
}

sub _backup_model {
    my($model, $order_by, $req) = @_;
    my($rows) = [];
    my($m) = Bivio::Biz::Model->new($req, $model);
    for ($m->unauth_iterate_start($order_by); $m->iterate_next_and_load;) {
        push(@$rows, $m->get_shallow_copy);
    }
    $m->iterate_end;
    my($f) = "$model-" . Bivio::Type::DateTime->local_now_as_file_name . '.PL';
    Bivio::IO::File->write($f, Bivio::IO::Ref->to_string($rows));
    print("Backed up $f\n");
    return $rows;
}

sub _do_sql {
    my($sql) = @_;
    my($statement);
    my($s) = '';
    foreach my $line (split(/\n/, $sql)) {
        # Skip comments and blanks
        next if $line =~ /^\s*--|^\s*$/;

        # Execute statement if '/' found
        if ($line =~ /^\s*\/\s*$/) {
            $statement = Bivio::SQL::Connection->execute($s);
            $s = '';
            next;
        }

        # Build up statement
        $s .= $line . "\n";
    }
    die("$s: left over statement") if $s;
    return $statement;
}

sub _drop_column {
    my($table, $column) = @_;
    print "Dropping $table.$column\n";
    _do_sql("
        ALTER TABLE $table
            DROP COLUMN $column
        /
    ");
    return;
}

sub _drop_constraints {
    my($table, $column) = map {uc($_)} @_;
    # Find all constraints on the table
    my($statement) = _do_sql(<<"EOF");
SELECT user_cons_columns.constraint_name
FROM user_cons_columns, user_constraints
WHERE user_cons_columns.column_name = '$column'
AND user_cons_columns.table_name = '$table'
AND user_constraints.constraint_name
    = user_cons_columns.constraint_name
/
EOF
    my(@constraints);
    while (my($name) = $statement->fetchrow_array) {
        push(@constraints, $name);
    }
    $statement->finish;

    # Drop the constraints
    foreach my $c (@constraints) {
        _do_sql("ALTER TABLE $table drop constraint $c
        /");
    }
    return int(@constraints);
}

sub _init_realm_role {
    my($rru, $perms) = @_;
    $perms =~ s/\\//g;
    $perms =~ s/^.*b-realm-role.* edit /!/mg;
    foreach my $edit (split(/!/, $perms)) {
        next unless $edit;
        $rru->edit(split(' ', $edit));
    }
    return;
}

sub _methods {
    {
        no strict;
        local($^W) = 0;
        local(*stab) = eval('*' . __PACKAGE__ . '::');
        return grep(
            $_ =~ /^upgrade_\d+_/ && __PACKAGE__->can($_),
            sort(keys(%stab)),
        );
        use strict;
    }
}

sub _modify_column {
    my($table, $column, $type) = @_;
    _do_sql("
        ALTER TABLE $table MODIFY $column $type
        /");
    return;
}

sub _sqlplus {
    my($db_login, $actions) = @_;
    my($res) = Bivio::ShellUtil::piped_exec(
        undef, 'su - oracle -c sqlplus', <<"EOF");
$db_login
set linesize 10000
set pagesize 10000
whenever sqlerror exit sql.sqlcode rollback
whenever oserror exit sql.sqlcode rollback
$actions;
disconnect
exit
EOF
    return $res;
}

              1;
            • Johan Lindstrom
Message 6 of 11, Feb 18, 2005
                At 18:37 2005-02-18, Rob Nagler wrote:
                > > In my experience, doing this automatically is kind of dangerous.
                >
                >Please elaborate why this type of automation is dangerous? We've had

                This is just my experience at $work. But you should know that we're not
                very mature on process at all, and currently there is no way to guarantee
that there has been no fiddling. That's the current goal: to ensure we're
                at a known state before each upgrade.

                It's not quite best practice, but I guess a lot of shops are in a somewhat
                similar situation, especially if they have a tradition of strong operations
departments with strong DBAs.


                > > Sometimes one can prepare the migration and just do the most critical part
                > > during the downtime.
                >
                >We try to do this, too.

                Do you automate these steps also?


                > > Of course this depends on the size of the shop and how well the sysadmins
                > > communicate with the developers and how well versed the developers are
                > with
                > > db-related concepts.
                >
                >This is why it is extremely important for developers to be dbas and
                >sysadmins, too. They have to know that problems can occur at all

Agree 100%. It's a simple fact that all developers need to know their
tools. And a db is a tool, used much like a compiler. It's not a
black box where there's... uh, data or something.


                >* Universally unique sequence ids. This is very useful for debugging,
                > and scalability.

                Does that mean you don't share identities between tables at all? Just one
                global sequence (or whatever)?


                /J

                -------- ------ ---- --- -- -- -- - - - - -
                Johan Lindström Sourcerer @ Boss Casinos johanl AT DarSerMan.com

                Latest bookmark: "TCP Connection Passing"
                http://tcpcp.sourceforge.net/
                dmoz: /Computers/Programming/Languages/JavaScript/ 12
              • Matthew Albright
Message 7 of 11, Feb 18, 2005
                  --- Rob Nagler <nagler@...> wrote:
                  > This doesn't work with Oracle as mentioned by Johan below.

                  Yeah, we're on postgresql.

                  > This is simply not an option we offer our customers. We could do
                  > this, I guess, but I don't believe in multiple versions in the field
                  > running wild. At most two: test and prod.

                  We are actually shipping a software product, and we make the new releases
                  available to our customers, but they don't have to upgrade every time we
                  release. We're an XP shop, so we release often, which means we have a wide
                  variety of versions in the field.

                  > This is why it is extremely important for developers to be dbas and
                  > sysadmins, too.

                  I couldn't agree more with this. When whoever it was started talking about
DBAs changing the database structure, my eyes glazed over, because I've never
                  been in an organization that had a DBA. We have people who are stronger and
                  weaker in certain areas, but really, anyone can modify anything... that's what
                  pair programming is for.

                  > > If anyone is interested, I can share some additional thoughts on
                  > > data migration landmines to watch out for...
                  >
                  > Please do.

Like most relational database-backed applications, we've coded up an
object-relational mapping layer that follows the "Active Record" pattern
                  (http://www.martinfowler.com/eaaCatalog/activeRecord.html). The problems come
                  in when you call too much of that mapping code from your update_db.pl script.
                  At the time the update_db.pl code is run, it's running against the newest
                  production code. If your migration code calls methods in the "real" code, that
                  code may change, and your migration code might cease to work in the future.

                  To avoid this rather subtle problem (which I suspect I've not explained very
                  lucidly), we do 2 things:

                  - Avoid the use of "real" code in the migration code, even if it means
                  duplication
                  - Test to make sure it is possible to upgrade from the oldest customer
                  version to the new version

                  Sometimes, the use of real code is unavoidable, and we've had to reorder things
                  in the update_db.pl file... that's something to be avoided.
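
A hypothetical illustration of the first rule: the migration inlines its
own SQL instead of calling the application's Active Record layer (the
table, column, and class names here are made up):

schema_change(backfill_order_totals => sub {
    my ($db) = @_;
    # Deliberately NOT calling something like MyApp::Order->recalculate,
    # which may change or disappear in a future release.
    $db->execute(q{
        update orders
           set total = (select sum(price * quantity)
                          from order_lines
                         where order_lines.order_id = orders.id)
         where total is null
    });
});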

                  matt
                • Adrian Howard
Message 8 of 11, Feb 19, 2005
                    On 19 Feb 2005, at 00:50, Matthew Albright wrote:
                    [snip]
                    > --- Rob Nagler <nagler@...> wrote:
                    [snip]
                    >> This is why it is extremely important for developers to be dbas and
                    >> sysadmins, too.
                    >
                    > I couldn't agree more with this.

                    Amen.

                    > When whoever it was started talking about
> DBAs changing the database structure, my eyes glazed over, because
                    > I've never
                    > been in an organization that had a DBA.

                    Just to clarify - I was talking about a /client/ DBA who tweaked the
                    schema of software that we shipped to them - not the DBA of the
                    development team.

                    Not that we had anybody with that title.

                    (although we did have an Oracle goddess who got paired with a heck of a
                    lot when it came to the DB code :-)

Yes, this was (from one point of view) the client's "fault" - but if
you're going to ship them automated scripts to update their database,
having an automated double-check that you're really updating the right
schema is a nice thing.

                    Adrian
                  • Rob Nagler
Message 9 of 11, Feb 19, 2005
                      Johan Lindstrom writes:
                      > > > Sometimes one can prepare the migration and just do the most critical part
                      > > > during the downtime.
                      > >
                      > >We try to do this, too.
                      >
                      > Do you automate these steps also?

                      Yes, we try our best to automate everything. If we think there is
                      going to be multiple upgrades, one of which is critical, we'll
                      organize these in separate releases. If this isn't possible, we'll
                      make it a two-part upgrade. That script I included in a previous
                      message doesn't update the db_upgrade_t if you just call one of the
                      methods.

                      > Does that mean you don't share identities between tables at all? Just one
                      > global sequence (or whatever)?

                      Not exactly. The following describes it in detail:

                      http://petshop.bivio.biz/src?s=Bivio::Type::PrimaryId

This is the base class of all serial keys. A PrimaryId is a NUMERIC(18)
(but could be larger), which looks like:

18                 5             3          2       0
+------------------+-------------+----------+-------+
| Monotonically    | Site Id or  | Mistake  | Type  |
| Increasing Seq   | Other info  | Bit      |       |
+------------------+-------------+----------+-------+

The numbers above are digit indexes. Type is a number that uniquely
identifies the key type. This divides the name space. The Site Id
allows you to divide the name space across independent databases. The
Mistake Bit allows you to make mistakes. :-) We used it in one case
to transform one key into another without changing the type. It saved
a lookup, and was related to finding the start of a tree for Oracle's
CONNECT BY. Now that we are dumping CONNECT BY, we have no need for
it. But that's another discussion.

                      Here's how we initialize the sequences:

CREATE SEQUENCE user_s
    MINVALUE 100001
    CACHE 1 INCREMENT BY 100000
/

CREATE SEQUENCE club_s
    MINVALUE 100002
    CACHE 1 INCREMENT BY 100000
/

CREATE SEQUENCE ec_payment_s
    MINVALUE 100015
    CACHE 1 INCREMENT BY 100000
/

CREATE SEQUENCE bulletin_s
    MINVALUE 100016
    CACHE 1 INCREMENT BY 100000
/

As you can see, the spaces are unique but overlapping. Some people
might complain that it junks up the indexes, but we haven't seen that
be a problem. Once the tables get big enough, it really doesn't matter.
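
A hypothetical decoder for that layout, using the digit indexes from the
diagram (illustrative only, not the bOP implementation):

sub decode_primary_id {
    my ($id) = @_;
    return {
        type    => $id % 100,                # digits 0-2
        mistake => int($id / 100) % 10,      # digit 2
        site    => int($id / 1_000) % 100,   # digits 3-5
        seq     => int($id / 100_000),       # digits 5-18
    };
}

# E.g. the first user_s value, 100001, decodes to
# { type => 1, mistake => 0, site => 0, seq => 1 }.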

                      Rob
                    • Rob Nagler
Message 10 of 11, Feb 19, 2005
                        Matthew Albright writes:
                        > We are actually shipping a software product, and we make the new releases
                        > available to our customers, but they don't have to upgrade every time we
                        > release. We're an XP shop, so we release often, which means we have a wide
                        > variety of versions in the field.

                        Perhaps you should pay them to upgrade. :-) Yes, you've got a
                        logistical problem, and it sounds like you have a good solution for
                        it.

                        > been in an organization that had a DBA. We have people who are stronger and
                        > weaker in certain areas, but really, anyone can modify anything... that's what
                        > pair programming is for.

Interesting you should mention that. I've been a bit turned off by
"general pair programming" lately. We pair for the same reason. One
of the things about pair programming that I don't like is that when
people are alone, they learn from their mistakes at their own speed.
If they are with another programmer, they not only get the problem
solved more quickly, but they also don't necessarily assimilate the
thought process and tools the other programmer brings to the table.
                        Experience caused by thousands of wrong paths is a great guide.
                        Finding the path right away isn't always the best way. It's certainly
                        my experience with kids that they learn much better when they burn
                        their hands than when you tell them the iron is hot.

                        > Like most relational database-backed applications, we've coded up a
                        > object-relational mapping layer that follows the "Active Record" pattern
                        > (http://www.martinfowler.com/eaaCatalog/activeRecord.html). The problems come
                        > in when you call too much of that mapping code from your update_db.pl script.
                        > At the time the update_db.pl code is run, it's running against the newest
                        > production code. If your migration code calls methods in the "real" code, that
                        > code may change, and your migration code might cease to work in the future.

                        Excellent point. We see this sometimes when trying to do a current
                        upgrade. I would hate to solve this problem for N versions in the
                        past.

                        > To avoid this rather subtle problem (which I suspect I've not explained very
                        > lucidly), we do 2 things:
                        >
                        > - Avoid the use of "real" code in the migration code, even if it means
                        > duplication
                        > - Test to make sure it is possible to upgrade from the oldest customer
                        > version to the new version
                        >
                        > Sometimes, the use of real code is unavoidable, and we've had to reorder things
                        > in the update_db.pl file... that's something to be avoided.

                        Great solution. Seems like you have to do a lot of feature-based
                        tests. For example, you might add a feature, fix it, then remove it
                        (because it wasn't the right solution in the first place). Those
                        three upgrades are a no-op after the "remove it" upgrade.

                        One issue I don't see how you get around is the collection of new
types of data. You can never assume that a new type of data
                        exists in the database. This means you probably can't retire
                        "deprecated" code that works around the non-existence of the new type
                        of data. This may create lots and lots of paths to test. Ugh...

                        Rob
                      • Rob Nagler
Message 11 of 11, Feb 19, 2005
                          Adrian Howard writes:
                          > Just to clarify - I was talking about a /client/ DBA who tweaked the
                          > schema of software that we shipped to them - not the DBA of the
                          > development team.

                          Yikes. Remember there's a clear separation of concerns between the
                          customer and the programmer. Be sure to yell at your customer. ;-)

> Yes, this was (from one point of view) the client's "fault" - but if
> you're going to ship them automated scripts to update their database,
> having an automated double-check that you're really updating the right
> schema is a nice thing.

Yup. I wonder, though, if you don't also have to protect the data.
Usually apps trust the data that comes from the database. In your
case, that assumption wouldn't hold. Do you have extra validation to
make sure the data hasn't been tampered with?

                          Rob