Loading ...
Sorry, an error occurred while loading the content.
 

Re: ANN: Database Refactoring Catalog Update

Expand Messages
  • psadalage
    ... If there is any kind of code generation based on Data in a table, for example if you have a AddressType table and generate classes based on that, then
    Message 1 of 26 , Jun 18, 2003
      > 7. I'm not convinced that Insert Data is a db refactoring.
      >Doesn't seem
      > interesting enough to me. Same issue with Update data? Also, if
      >these are
      > refactorings why don't you have Delete Data as well?

      If there is any kind of code generation based on Data in a table,
      for example if you have a AddressType table and generate classes
      based on that, then changing the data in the tables affects the way
      the application performs. Hence the Insert Data/Update Data as a
      refactoring.


      As for Delete Data not being there, it's in the works. There are lot
      more of these refactorings that I have not finished yet.

      Pramod
    • Scott W. Ambler
      I d change the names then to reflect the fact that you re focused on a data driven application. - Scott ...
      Message 2 of 26 , Jun 18, 2003
        I'd change the names then to reflect the fact that you're focused on a data
        driven application.
        - Scott

        At 05:06 PM 6/18/2003 +0000, you wrote:
        > > 7. I'm not convinced that Insert Data is a db refactoring.
        > >Doesn't seem
        > > interesting enough to me. Same issue with Update data? Also, if
        > >these are
        > > refactorings why don't you have Delete Data as well?
        >
        >If there is any kind of code generation based on Data in a table,
        >for example if you have a AddressType table and generate classes
        >based on that, then changing the data in the tables affects the way
        >the application performs. Hence the Insert Data/Update Data as a
        >refactoring.
        >
        >
        >As for Delete Data not being there, it's in the works. There are lot
        >more of these refactorings that I have not finished yet.
        >
        >Pramod
        >
        >
        >
        >To unsubscribe from this group, send an email to:
        >agileDatabases-unsubscribe@yahoogroups.com
        >
        >
        >
        >Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

        ====================================================
        Scott W. Ambler
        Senior Consultant, Ronin International, Inc.
        www.ronin-intl.com/company/scottAmbler.html

        www.agiledata.org
        www.agilemodeling.com
        www.ambysoft.com
        www.enterpriseunifiedprocess.info
        www.modelingstyle.info
        www.ronin-intl.com
      • Willem Bogaerts
        ... I noticed in the catalog that there were quite a few refactorings toward stored procedures, but not away from them. I can imagine that replace stored
        Message 3 of 26 , Jun 19, 2003
          >>> scott.ambler@... 16/06/2003 17:03:04 >>>
          >I've just updated www.agiledata.org/essays/databaseRefactoringCatalog.html
          >with links to some of Pramod's pages as well as some of my own new database
          >refactorings.

          I noticed in the catalog that there were quite a few refactorings toward stored procedures, but not away from them. I can imagine that "replace stored procedure with application action" would be a nice refactoring for either clarity or database independence.

          I like the catalog. There are more database refactorings than I assumed possible, and the list can even get longer.

          Best regards


          ------------------------------------------------------------------------
          the above information is intended only for the person or entity to whom
          it is addressed and may contain confidential and/or privileged
          information. Any review, retransmission, dissemination of, or taking
          action in reliance upon this information by others than the intended
          recipient is prohibited. If you are not the intended recipient, please
          return this e-mail to the sender and delete it from any computer system.
          ------------------------------------------------------------------------
        • Scott W. Ambler
          Thanks. Migrate Database Method to Application is basically take stored procedure out and shoot it . ;-) I d argue that there are more refactorings that what
          Message 4 of 26 , Jun 22, 2003
            Thanks.

            Migrate Database Method to Application is basically "take stored procedure
            out and shoot it". ;-)

            I'd argue that there are more refactorings that what I have there, in fact
            I know so because I've got notes kicking around that I need to write up
            still. Some code refactorings are clearly applicable to stored procs for
            example, and I haven't listed any of them.

            Any suggestions are always welcome.

            Also, I just posted www.agiledata.org/essays/rup.html so if anyone out
            there is on a RUP project you might want to check it out (also, feel free
            to share the URL with colleagues that are on RUP projects).

            - Scott

            At 09:41 AM 6/19/2003 +0200, you wrote:
            > >>> scott.ambler@... 16/06/2003 17:03:04 >>>
            > >I've just updated www.agiledata.org/essays/databaseRefactoringCatalog.html
            > >with links to some of Pramod's pages as well as some of my own new database
            > >refactorings.
            >
            >I noticed in the catalog that there were quite a few refactorings toward
            >stored procedures, but not away from them. I can imagine that "replace
            >stored procedure with application action" would be a nice refactoring for
            >either clarity or database independence.
            >
            >I like the catalog. There are more database refactorings than I assumed
            >possible, and the list can even get longer.
            >
            >Best regards
            >
            >
            >------------------------------------------------------------------------
            >the above information is intended only for the person or entity to whom
            >it is addressed and may contain confidential and/or privileged
            >information. Any review, retransmission, dissemination of, or taking
            >action in reliance upon this information by others than the intended
            >recipient is prohibited. If you are not the intended recipient, please
            >return this e-mail to the sender and delete it from any computer system.
            >------------------------------------------------------------------------
            >
            >
            >
            >To unsubscribe from this group, send an email to:
            >agileDatabases-unsubscribe@yahoogroups.com
            >
            >
            >
            >Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

            ====================================================
            Scott W. Ambler
            Senior Consultant, Ronin International, Inc.
            www.ronin-intl.com/company/scottAmbler.html

            www.agiledata.org
            www.agilemodeling.com
            www.ambysoft.com
            www.enterpriseunifiedprocess.info
            www.modelingstyle.info
            www.ronin-intl.com
          • richard quinn
            Hello, looking at your list of database refactorings, a few questions spring to mind. I hope posting to the list is the right place to discuss them: I think
            Message 5 of 26 , Jun 23, 2003
              Nachricht
              Hello,
               
              looking at your list of database refactorings, a few questions spring to mind.
              I hope posting to the list is the right place to discuss them:
               
              I think that the Refactorings List is a very good start, but generally there is not enough detail, and no ERDs to go with each factoring, which could a show a before / after snapshot of the DB Schema. Over at martinfowler.com, I notice that he places the details of each Refactoring on a separate page, uncluttering his list and allowing great detail to be expressed. A more general wish would be to see a discussion of each refactoring, why, when, when not, caveats and such.
               
              I miss (or didn't read the list properly) a refactoring which we use quite regularly
                  Horizontally partition tables: take a large table: tblBaseData, and create n tables with the same Schema: tblBaseData1,tblBaseData2, tblBaseDataN,. Choose a value (timestamp maybe) which can be used to partition the data into more or less equally large sets, and move a subset of the base records into each of your new tables. Creata a view or a method to provide a unified representation of the data.
               
              The advantage is, you can (on some DBMSs) then move the data to different nodes in your server farm, reaping huge scalability and some performance rewards. Its also possible to allow additions only to an incoming table, improving concurrency on the reads on the other tabes (but this depends of course on how your apps access the data). Although I do concede it is a kind of negative refactoring, since it sort of ruins your hard-won normalization :)
               
              BTW, we have begun taking a new approach to usings Stored Procedures (Methods, as you call them). The classic CRUD approach of having several SPs for each entity becomes somewhat cumbersome when you have hunreds of tables, and thousands of SPs. Problems arise when refactorings are introduced, creating a potentially huge ToDo list of many subsequent changes necessary to keep your DB in sync. Classic resolutions have been:
               
                  a) derive all DDL Code from a (data | object) Model, and update the DB semi-automatically
                  b) remove all CRUD SPs and put the data logic in your application's data objects (shudder!)
                  c) bite the bullet and create bloated monstrosities
               
              The main problem with C is the huge increase in coupling you get. Assume you have an application object based on cAddress, which contains 20 attributes and associated access methods. Your Object is mapped to 4 SPs, Create, Update, Read, Delete (although I know that in practice you can get away with 2 SPs: Create/Update and Read). Each of your 4 SPs takes 20 parameters and then a few more for transferring state information. If you change your tables you have a severe problem updating all that code. If other SPs are using your 4 original SPs you have serious problems discovering and modifying all dependent code through all delegation chains. I've been there, its no fun telling your client that adding a new VARCHAR (20) field will take 4-7 days of work.
               
              So what is our solution? Basically, we keep all of our CRUD SPs, allowing us to leverage the performance gains. We can also reuse our code generation tool, easing the pain somewhat. Delegating operational repsonsibilities to different SPs also helps when isolating defects.
               
              We create an entry point SP, which recieves paramaters telling it what kind of entity is being operated on, what the operation is (and how to do it), and an XML document which must be filled with the entity data after the operation is complete, and possibly already populated with data which should be written to the entity when we do updates and inserts.
               
              On the application side, each data object can produce an XML document of its current state (a serialization), and pass this document to the data access layer. It receives the XML document back and (using a factory) instantiates a new object representation of the record.
               
              This allows quite a useful level of decoupling, We are now able to modify any or each of the application, the data access ayer or the data storage layer without breaking the interfaces. The main advantage is, that we keep the high isolation of each layer, without feeling the pain of many interface contracts. A potential disadvantage is performance decrease, since we're not hard-wiring input/output values any more. We don't have any data yet on the performance difference, but we're willing to risk that it is either negligible or irrelevant compared to increased development and maintenance efficiency.
               
               
              Best Regards,

              Richard Quinn
              -----------
              Richard Quinn
              richard[at]richard[minus]quinn[d o t]co[d o t]uk :)
               
               
               
               
               
               
               
               
               
               
               
               
              -----Urspr√ľngliche Nachricht-----
              Von: sentto-7758546-235-1056333752-rquinn=web.de@... [mailto:sentto-7758546-235-1056333752-rquinn=web.de@...] Im Auftrag von Scott W. Ambler
              Gesendet: Montag, 23. Juni 2003 04:01
              An: agileDatabases@yahoogroups.com; agiledata@...
              Betreff: Re: [agileDatabases] ANN: Database Refactoring Catalog Update


              Thanks.

              Migrate Database Method to Application is basically "take stored procedure
              out and shoot it".  ;-)

              I'd argue that there are more refactorings that what I have there, in fact
              I know so because I've got notes kicking around that I need to write up
              still.  Some code refactorings are clearly applicable to stored procs for
              example, and I haven't listed any of them.

              Any suggestions are always welcome.

              Also, I just posted www.agiledata.org/essays/rup.html so if anyone out
              there is on a RUP project you might want to check it out (also, feel free
              to share the URL with colleagues that are on RUP projects).

              - Scott

              At 09:41 AM 6/19/2003 +0200, you wrote:
              > >>> scott.ambler@... 16/06/2003 17:03:04 >>>
              > >I've just updated www.agiledata.org/essays/databaseRefactoringCatalog.html
              > >with links to some of Pramod's pages as well as some of my own new database
              > >refactorings.
              >
              >I noticed in the catalog that there were quite a few refactorings toward
              >stored procedures, but not away from them. I can imagine that "replace
              >stored procedure with application action" would be a nice refactoring for
              >either clarity or database independence.
              >
              >I like the catalog. There are more database refactorings than I assumed
              >possible, and the list can even get longer.
              >
              >Best regards
              >
              >
              >------------------------------------------------------------------------
              >the above information is intended only for the person or entity to whom
              >it is addressed and may contain confidential and/or privileged
              >information. Any review, retransmission, dissemination of, or taking
              >action in reliance upon this information by others than the intended
              >recipient is prohibited. If you are not the intended recipient, please
              >return this e-mail to the sender and delete it from any computer system.
              >------------------------------------------------------------------------
              >
              >
              >
              >To unsubscribe from this group, send an email to:
              >agileDatabases-unsubscribe@yahoogroups.com
              >
              >
              >
              >Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

              ====================================================
              Scott W. Ambler
              Senior Consultant, Ronin International, Inc.
              www.ronin-intl.com/company/scottAmbler.html

              www.agiledata.org
              www.agilemodeling.com
              www.ambysoft.com
              www.enterpriseunifiedprocess.info
              www.modelingstyle.info
              www.ronin-intl.com


              Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
            • Marc Hamann
              ... Hi Richard, I ve noticed that most DBAs have this same instinctive repulsion allowing an application control the data logic, and I can see why this is
              Message 6 of 26 , Jun 23, 2003
                At 03:31 AM 6/23/03, Richard Quinn wrote:
                > b) remove all CRUD SPs and put the data logic in your application's
                > data objects (shudder!)

                Hi Richard,

                I've noticed that most DBAs have this same instinctive repulsion allowing
                an "application" control the data logic, and I can see why this is so if
                a) you are solely responsible for the database and you don't want someone
                to "mess it up" or b) the same database is used extensively by multiple
                applications with no common access layer.

                I wonder: would you feel the same way if the database was dedicated to a
                particular application and the "DBA" was a full member of the application
                team, actively contributing to the "application" data logic?

                What does anyone else think?

                Marc
              • Joe Celko
                ... dedicated to a particular application and the DBA was a full member of the application team, actively contributing to the application data logic?
                Message 7 of 26 , Jun 23, 2003
                  >> I wonder: would you feel the same way if the database was
                  dedicated to a particular application and the "DBA" was a full
                  member of the application team, actively contributing to
                  the "application" data logic? <<

                  If the data is used by one and only one application, then you should
                  have had a file system that is native to the host language of the
                  application. Files are a lot faster and smaller than an RDBMS.

                  But the whole reason for a database was to share the data with a lot
                  of applications -- both present and future. How exactly is the DBA
                  supposed to guarantee for all those applications, present and
                  future, that they use the same definitions, domains, checks and
                  constraints? The answer is that you put it in the database itself.

                  Professionals write schema code that is correct, maintainable,
                  robust, and all those other good words from Software engineering.
                  Amateurs leave holes in their schemas and then wonder why the
                  database got corrupted and desparately try to patch it on the fly
                  after the fact in application code.

                  This snippet is taken from an email entitled "Don't Bother Me with
                  All that Theoretical Stuff, I've Got Practical Things to Do" at
                  www.debunk.com and it is a classic. For the records, I am not
                  the "Joe" in this dialogue.

                  From: BC
                  To: editor@...

                  I just have to tell someone who understands. I just had the
                  following conversation with the data architect of the OLTP system
                  from which I get data for our data warehouse:

                  Self: Joe, I'm looking at the schema and I'm trying to determine
                  what column we use for the number of individual sellable units that
                  are in a container. There is no "container" table. How is this
                  known?

                  Joe: Oh, you can use any. Where are you accessing the data from?

                  Self: If you mean what am I joining into the query, it shouldn't
                  matter. What defines this fact?

                  Joe: What are you trying to do?

                  Self: Stop myself from killing you... (okay, I only thought this
                  line...) Inevitably, I found out that if push comes to shove we
                  would indeed trust a column in one table over the value in any of
                  the myriad of other tables. I go back and select the distinct values
                  from the column.

                  Self: Joe. What does a zero mean in the Dsf3345 column? (Nice names,
                  eh? But relax, I've been assured that: "That's okay. We have all the
                  definitions in our application tool")

                  Joe: It means there is only one sellable item in the container.

                  Self: So, what does the value one mean?

                  Joe: It means there is one sellable item in the container.

                  Self: Do you realize people write reports against this data? Some of
                  them actually try to do math with it.

                  Joe: Why are you making such a big deal about this?

                  Self: We could of course not allow zero values in the column. That
                  way we don't have to transform zeros into ones in each report. One
                  constraint would do the trick.

                  Joe: We put all our integrity constraints in the application.

                  Self: Could you check all your code for all the places you update or
                  insert into this column and ensure we never allow zeroes?

                  Joe: We're too busy for that, think of all the tables we would have
                  to change. Just change your reports. Besides, we have to allow
                  zeroes.

                  Self: Why's that?

                  Joe: Well, what else are they going to use for "not applicable"?

                  Our data warehouse is far more normalized than the OLTP system. (And
                  for some strange reason it performs better.) BTW, Joe (not his real
                  name) is an author of a textbook on OO Design Measurement.
                • Rich
                  Hello, Actually, I feel that if your data has relevancy only for one application, why bother using an [expensive] RDBMS, an Object store placed directly in RAM
                  Message 8 of 26 , Jun 23, 2003
                    Nachricht
                    Hello,
                     
                    Actually, I feel that if your data has relevancy only for one application, why bother using an [expensive] RDBMS, an Object store placed directly in RAM - with appropriate serialization deserialization from a file - will do your job much better. Here you could have huge performance, little concurrency and you would be sure that only your specific application can utilize the data.
                     
                    For me the central issue is not "how many apps are using the data" but "is the data valuable to the business per se". Most data, in my experience, is hugely valuable to the business which owns / produces / gathers it. In these - vast majority of - cases, the business recognises the intrinsic value of the data, and is willing to spend appropriate sums of money to pay professionals who can ensure that the data:
                     
                        remains consistent
                        can be analyzed
                        can be understood
                        is verifyable
                        and therefore retains its value
                     
                    Now, IMHO, giving the responsibility for the data to your J2EE / .NET / .LatestFashion Architects is like putting the fox in charge of the henhouse. Few of these people are willing to spend the effort needed to respect the data's business value, and will sacrifice any NOT NULL or IDENTITY or CONSTRAINT or relationship, as long as their app mostly fits budgeting and scheduling limitations. Ultimately, the business suffers from the steady and irreversible decomposition of their data. I mean, these companies are busy planting orchards and harvesting apples, later they discover that the trees are rotting at the foot of some technology guru's project reference list.
                     
                    I have the good fortune to work with a team where each member is not just a qualified DBA but also a qualified enterprise programmer, each developer has the App. server and the DB server running on their workstation, so I also don't see a correlation between a team member's role and the underlying personal belief-system which guides the development and the choice of where to put the data logic.
                     
                    Basically what I'm saying is, that "enterprise applications" produce and manage data, and it is the data which ultimately creates business value, rarely the application. So we don't entrust the enforcement of data logic to our applications. (This is just one angle on the subject, I could probably also rant on about other distasteful aspects of this approach, such as the impossibility of maintaining code | database which have shared data responsibility for data integrity ).
                     
                    Just some of my thoughts, not intended to offend,
                     
                    Best Regards,
                    Richard Quinn
                     
                     
                     
                     
                    -----Urspr√ľngliche Nachricht-----
                    Von: sentto-7758546-237-1056375945-rquinn=web.de@... [mailto:sentto-7758546-237-1056375945-rquinn=web.de@...] Im Auftrag von Marc Hamann
                    Gesendet: Montag, 23. Juni 2003 15:46
                    An: agileDatabases@yahoogroups.com
                    Betreff: Re: [agileDatabases] Location of Data Logic

                    At 03:31 AM 6/23/03, Richard Quinn wrote:
                    >     b) remove all CRUD SPs and put the data logic in your application's
                    > data objects (shudder!)

                    Hi Richard,

                    I've noticed that most DBAs have this same instinctive repulsion allowing
                    an "application" control the data logic,  and I can see why this is so if
                    a) you are solely responsible for the database and you don't want someone
                    to "mess it up" or b) the same database is used extensively by multiple
                    applications with no common access layer.

                    I wonder:  would you feel the same way if the database was dedicated to a
                    particular application and the "DBA" was a full member of the application
                    team, actively contributing to the "application" data logic?

                    What does anyone else think?

                    Marc
                  • yahoogroups@jhrothjr.com
                    ... From: Joe Celko To: agileDatabases@yahoogroups.com
                    Message 9 of 26 , Jun 23, 2003
                      ----- Original Message -----
                      From: "Joe Celko"
                      <celko.at.northfacelearning.com@...>
                      To: "agileDatabases@yahoogroups.com"
                      <agileDatabases.at.yahoogroups.com@...>
                      Sent: Monday, June 23, 2003 5:59 PM
                      Subject: [agileDatabases] Re: Location of Data Logic


                      > >> I wonder: would you feel the same way if the database was
                      > dedicated to a particular application and the "DBA" was a full
                      > member of the application team, actively contributing to
                      > the "application" data logic? <<
                      >
                      > If the data is used by one and only one application, then you should
                      > have had a file system that is native to the host language of the
                      > application. Files are a lot faster and smaller than an RDBMS.
                      >
                      > But the whole reason for a database was to share the data with a lot
                      > of applications -- both present and future. How exactly is the DBA
                      > supposed to guarantee for all those applications, present and
                      > future, that they use the same definitions, domains, checks and
                      > constraints? The answer is that you put it in the database itself.
                      >
                      > Professionals write schema code that is correct, maintainable,
                      > robust, and all those other good words from Software engineering.
                      > Amateurs leave holes in their schemas and then wonder why the
                      > database got corrupted and desparately try to patch it on the fly
                      > after the fact in application code.
                      >
                      > This snippet is taken from an email entitled "Don't Bother Me with
                      > All that Theoretical Stuff, I've Got Practical Things to Do" at
                      > www.debunk.com and it is a classic. For the records, I am not
                      > the "Joe" in this dialogue.

                      You've actually got two different issues here:

                      1. Does the data schema make sense (that is, all the usual
                      checks and so forth are applied, etc.)

                      2. Does the data base itself have to be visible to all and sundry.

                      I don't think you're going to get a real arguement about whether
                      the data model needs to make sense. That's simply good design,
                      whether you're talking about a relational data base schema
                      or an object schema.

                      The real killer is the second one. From a software engineering
                      viewpoint, the standard "corporate data base" view is a bad
                      case of out of control coupling, which leads to huge projects
                      to change things that ought to be a snap, and also leads to lots
                      of planning and management overhead.

                      Publishing your data base violates one of the most fundamental
                      principles of software engineering, that of data hiding or
                      encapsulation. If you want to be able to move fast, you cannot
                      have 900 different applications and uncounted hoards of carefully
                      saved queries in spreadsheats and custom reports all dependent
                      on the same table.

                      I realize that, from the DBA's perspective, this is rank heresy.
                      Reflecting on a bit of history is probably a good idea: relational
                      data base technology came out of very different roots than
                      software engineering. One of the early selling points of relational
                      technology was the "corporate data base" notion; IBM's (and
                      others) marketing machine just rolled over everyone who
                      questioned whether this was a good idea from the continuous
                      development point of view.

                      If application A wants some data that's managed by application
                      B, then it should ask nicely, using an agreed upon protocol, not
                      go around the application directly to the data.

                      John Roth


                      >
                      > From: BC
                      > To: editor@...
                      >
                      > I just have to tell someone who understands. I just had the
                      > following conversation with the data architect of the OLTP system
                      > from which I get data for our data warehouse:
                      >
                      > Self: Joe, I'm looking at the schema and I'm trying to determine
                      > what column we use for the number of individual sellable units that
                      > are in a container. There is no "container" table. How is this
                      > known?
                      >
                      > Joe: Oh, you can use any. Where are you accessing the data from?
                      >
                      > Self: If you mean what am I joining into the query, it shouldn't
                      > matter. What defines this fact?
                      >
                      > Joe: What are you trying to do?
                      >
                      > Self: Stop myself from killing you... (okay, I only thought this
                      > line...) Inevitably, I found out that if push comes to shove we
                      > would indeed trust a column in one table over the value in any of
                      > the myriad of other tables. I go back and select the distinct values
                      > from the column.
                      >
                      > Self: Joe. What does a zero mean in the Dsf3345 column? (Nice names,
                      > eh? But relax, I've been assured that: "That's okay. We have all the
                      > definitions in our application tool")
                      >
                      > Joe: It means there is only one sellable item in the container.
                      >
                      > Self: So, what does the value one mean?
                      >
                      > Joe: It means there is one sellable item in the container.
                      >
                      > Self: Do you realize people write reports against this data? Some of
                      > them actually try to do math with it.
                      >
                      > Joe: Why are you making such a big deal about this?
                      >
                      > Self: We could of course not allow zero values in the column. That
                      > way we don't have to transform zeros into ones in each report. One
                      > constraint would do the trick.
                      >
                      > Joe: We put all our integrity constraints in the application.
                      >
                      > Self: Could you check all your code for all the places you update or
                      > insert into this column and ensure we never allow zeroes?
                      >
                      > Joe: We're too busy for that, think of all the tables we would have
                      > to change. Just change your reports. Besides, we have to allow
                      > zeroes.
                      >
                      > Self: Why's that?
                      >
                      > Joe: Well, what else are they going to use for "not applicable"?
                      >
                      > Our data warehouse is far more normalized than the OLTP system. (And
                      > for some strange reason it performs better.) BTW, Joe (not his real
                      > name) is an author of a textbook on OO Design Measurement.
                      >
                      >
                      >
                      >
                      >
                      >
                      >
                      > To unsubscribe from this group, send an email to:
                      > agileDatabases-unsubscribe@yahoogroups.com
                      >
                      >
                      >
                      > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                      >
                      >
                    • Marc Hamann
                      ... OK, perhaps we have a terminology confusion. I m not talking about the basics of constraints (required columns, referential constraints, uniqueness
                      Message 10 of 26 , Jun 23, 2003
                        At 05:59 PM 6/23/03, Joe Celko wrote:
                        >But the whole reason for a database was to share the data with a lot
                        >of applications -- both present and future. How exactly is the DBA
                        >supposed to guarantee for all those applications, present and
                        >future, that they use the same definitions, domains, checks and
                        >constraints? The answer is that you put it in the database itself.

                        OK, perhaps we have a terminology confusion. I'm not talking about the
                        basics of constraints (required columns, referential constraints,
                        uniqueness constraints). I think we would all agree that for non-trivial
                        situations you would want to apply these.

                        I'm talking about that fuzzy world of "business logic" that covers a
                        multitude of concepts, including some that tend to be viewed as part of
                        the data logic.From my point of view this is the problem. There is a whole
                        range of "logic" that goes into making data usable. Some people seem to
                        feel that most of that should go into the database. Others feel that most
                        of that should be in the "application". If you don't make an ideological
                        distinction between "data" and "application" (for example, where the same
                        people are responsible for both) what are you to?


                        >Professionals write schema code that is correct, maintainable,
                        >robust, and all those other good words from Software engineering.
                        >Amateurs leave holes in their schemas and then wonder why the
                        >database got corrupted and desparately try to patch it on the fly
                        >after the fact in application code.

                        My experience is that the biggest data quality issues are not enforceable
                        by ANY technical means, since they arise from semantic misunderstandings,
                        laziness, ambiguity, etc. that are features of the business world. It
                        is very easy to end up with perfectly valid but utterly meaningless
                        data. Though I'm not arguing for sloppy schema development, I would argue
                        that sometimes the best way to help reduce these semantic problems is by
                        putting more thought into the "application" end of things (closer to the UI).

                        Agile emphatically does not mean lacking in discipline. On the contrary, I
                        think discipline is the single most valuable force for effectiveness in any
                        domain. However, one aspect of discipline is knowing where to apply the
                        most effort to get the most effect. Maybe for some projects spending a
                        long time putting all the logic into stored procedures is counter productive.

                        Marc
                      • Marc Hamann
                        ... You may have a case for many applications. What if the application is rather large, for example, all of the inter-related data for several departments
                        Message 11 of 26 , Jun 23, 2003
                          At 06:43 PM 6/23/03, Richard Quinn wrote:

                          >Actually, I feel that if your data has relevancy only for one application,
                          >why bother using an [expensive] RDBMS, an Object store placed directly in
                          >RAM - with appropriate serialization deserialization from a file - will do
                          >your job much better. Here you could have huge performance, little
                          >concurrency and you would be sure that only your specific application can
                          >utilize the data.

                          You may have a case for many applications. What if the application is
                          rather large, for example, all of the inter-related data for several
                          departments with a common business domain? You may want to use a
                          third-party reporting tool, and feed the data warehouse from it, etc. For
                          these reasons, you probably will still choose an RDBMS as the "smart
                          repository" for the data.


                          > For me the central issue is not "how many apps are using the data" but
                          > "is the data valuable to the business per se". Most data, in my
                          > experience, is hugely valuable to the business which owns / produces /
                          > gathers it. In these - vast majority of - cases, the business recognises
                          > the intrinsic value of the data, and is willing to spend appropriate sums
                          > of money to pay professionals who can ensure that the data:
                          >
                          > remains consistent
                          > can be analyzed
                          > can be understood
                          > is verifyable
                          > and therefore retains its value


                          As I mention in my response to Joe, data quality is not just a matter of
                          syntactic well-formedness, but also semantic meaningfulness. One good way
                          to ensure the latter is to make sure that the application "makes sense" to
                          the end user who inputs and uses the data. Sometimes this is hampered by
                          an arbitrary wall between "data" and "application".


                          > Now, IMHO, giving the responsibility for the data to your J2EE / .NET /
                          > .LatestFashion Architects is like putting the fox in charge of the henhouse.

                          Though there may be unprofessional and undisciplined proponents of any
                          technical expertise, it might be an over-generalization to say that all
                          are. The premise of my original post was that there existed a mutual
                          respect and spirit of collegial cooperation between the "data people" and
                          the "application people", such that they saw themselves as part of the same
                          team. In fact, the only division was one of relative expertise rather than
                          sharply delimited "turfs".
                          So no foxes, only hens. ;-)


                          >I have the good fortune to work with a team where each member is not just
                          >a qualified DBA but also a qualified enterprise programmer, each developer
                          >has the App. server and the DB server running on their workstation, so I
                          >also don't see a correlation between a team member's role and the
                          >underlying personal belief-system which guides the development and the
                          >choice of where to put the data logic.

                          Perfect! The foxless henhouse! ;-) This has tended to be my experience as
                          well, since I have worked in "multi-tasking" environments where I had
                          responsibility for BOTH ends of the database. But some people I have met
                          from more specialized organizations where the divide is more pronounced
                          have tended to react badly to the concept that one might have a choice of
                          where to put logic. ;-)

                          > Basically what I'm saying is, that "enterprise applications" produce and
                          > manage data, and it is the data which ultimately creates business value,
                          > rarely the application. So we don't entrust the enforcement of data logic
                          > to our applications. (This is just one angle on the subject, I could
                          > probably also rant on about other distasteful aspects of this approach,
                          > such as the impossibility of maintaining code | database which have
                          > shared data responsibility for data integrity ).

                          Perhaps this in only a semantic distinction, but some things that are
                          usually considered "application" are in fact critical aspects of making the
                          data have business value. I suppose it depends on where you draw the line
                          between the two, and that is what interests me.

                          >Just some of my thoughts, not intended to offend,

                          Of course. What is a mailing list without discussion? ;-)

                          Thanks for the thoughtful response!

                          Marc
                        • Scott W. Ambler
                          ... that s something I intend to do in the near future. Higher priorities right now. ... There are both vertical and horizontal partitioning refactorings in
                          Message 12 of 26 , Jun 23, 2003
                            At 09:31 AM 6/23/2003 +0200, you wrote:
                            >Hello,
                            >
                            >looking at your list of database refactorings, a few questions spring to mind.
                            >I hope posting to the list is the right place to discuss them:
                            >
                            >I think that the Refactorings List is a very good start, but generally
                            >there is not enough detail, and no ERDs to go with each factoring, which
                            >could a show a before / after snapshot of the DB Schema. Over at
                            >martinfowler.com, I notice that he places the details of each Refactoring
                            >on a separate page, uncluttering his list and allowing great detail to be
                            >expressed. A more general wish would be to see a discussion of each
                            >refactoring, why, when, when not, caveats and such.

                            that's something I intend to do in the near future. Higher priorities
                            right now.


                            >
                            >I miss (or didn't read the list properly) a refactoring which we use quite
                            >regularly
                            > Horizontally partition tables: take a large table: tblBaseData, and
                            > create n tables with the same Schema: tblBaseData1,tblBaseData2,
                            > tblBaseDataN,. Choose a value (timestamp maybe) which can be used to
                            > partition the data into more or less equally large sets, and move a
                            > subset of the base records into each of your new tables. Creata a view or
                            > a method to provide a unified representation of the data.

                            There are both vertical and horizontal partitioning refactorings in the list.

                            >
                            >The advantage is, you can (on some DBMSs) then move the data to different
                            >nodes in your server farm, reaping huge scalability and some performance
                            >rewards. Its also possible to allow additions only to an incoming table,
                            >improving concurrency on the reads on the other tabes (but this depends of
                            >course on how your apps access the data). Although I do concede it is a
                            >kind of negative refactoring, since it sort of ruins your hard-won
                            >normalization :)
                            >
                            >BTW, we have begun taking a new approach to usings Stored Procedures
                            >(Methods, as you call them). The classic CRUD approach of having several
                            >SPs for each entity becomes somewhat cumbersome when you have hunreds of
                            >tables, and thousands of SPs. Problems arise when refactorings are
                            >introduced, creating a potentially huge ToDo list of many subsequent
                            >changes necessary to keep your DB in sync. Classic resolutions have been:
                            >
                            > a) derive all DDL Code from a (data | object) Model, and update the
                            > DB semi-automatically
                            > b) remove all CRUD SPs and put the data logic in your application's
                            > data objects (shudder!)
                            > c) bite the bullet and create bloated monstrosities

                            See www.agiledata.org/essays/implementationStrategies.html. Seems to me
                            that you're talking about basic encapsulation strategies. Could be wrong
                            about that though?

                            >
                            >The main problem with C is the huge increase in coupling you get. Assume
                            >you have an application object based on cAddress, which contains 20
                            >attributes and associated access methods. Your Object is mapped to 4 SPs,
                            >Create, Update, Read, Delete (although I know that in practice you can get
                            >away with 2 SPs: Create/Update and Read). Each of your 4 SPs takes 20
                            >parameters and then a few more for transferring state information. If you
                            >change your tables you have a severe problem updating all that code. If
                            >other SPs are using your 4 original SPs you have serious problems
                            >discovering and modifying all dependent code through all delegation
                            >chains. I've been there, its no fun telling your client that adding a new
                            >VARCHAR (20) field will take 4-7 days of work.
                            >
                            >So what is our solution? Basically, we keep all of our CRUD SPs, allowing
                            >us to leverage the performance gains. We can also reuse our code
                            >generation tool, easing the pain somewhat. Delegating operational
                            >repsonsibilities to different SPs also helps when isolating defects.
                            >
                            >We create an entry point SP, which recieves paramaters telling it what
                            >kind of entity is being operated on, what the operation is (and how to do
                            >it), and an XML document which must be filled with the entity data after
                            >the operation is complete, and possibly already populated with data which
                            >should be written to the entity when we do updates and inserts.
                            >
                            >On the application side, each data object can produce an XML document of
                            >its current state (a serialization), and pass this document to the data
                            >access layer. It receives the XML document back and (using a factory)
                            >instantiates a new object representation of the record.
                            >
                            >This allows quite a useful level of decoupling, We are now able to modify
                            >any or each of the application, the data access ayer or the data storage
                            >layer without breaking the interfaces. The main advantage is, that we keep
                            >the high isolation of each layer, without feeling the pain of many
                            >interface contracts. A potential disadvantage is performance decrease,
                            >since we're not hard-wiring input/output values any more. We don't have
                            >any data yet on the performance difference, but we're willing to risk that
                            >it is either negligible or irrelevant compared to increased development
                            >and maintenance efficiency.

                            Interesting. If it's working for you stick with it until it doesn't.

                            - Scott
                          • rquinn@web.de
                            Hello, True, many app s create the data, but we agree that its the data, not the apps, which hold the accumulated business value? Some companies in the
                            Message 13 of 26 , Jun 24, 2003
                              Message
                              Hello,
                               
                              True, many app's create the data, but we agree that its the data, not the apps, which hold the accumulated business value? Some companies in the consulting business produce tons of powerpoint and and word files, its not the applications but the data which contains the business value. OK, without the applications you'd have a hard time getting at your data, but monopolising other people's data in this way is something I try to avoid.
                               
                              Where I tend to draw the line depends very much on what development resources are available to the project, and what the projects requirements are and its nature.
                               
                              In all projects we remove the data access code (insert, creates, reads, deletes) from the data objects and put them in an abstratction layer. Sometimes the layer is in application code (usually J2EE), sometimes its in the database as a stored procedure layer. In any case, I enforce this rigourously, making sure that the development team sticks with its chosen paradigm.
                               
                              In many systems, we need things like EAI at the database layer (no tinkering with SOAP :)) , we need to enforce business rules such as keyword generation, session deletion, generation of reporting data and such. We also code this is as nearly as possible in the data base. This frees up development to concentrate on the business logic, the when and why of accessing/modifying the data, the how is enforced at the DB or at the data abstaction layer. It is a semantic distinction, but that is good, semantic misunderstandings are never really productive :)
                               
                              Sometimes business say things like "Customers will only ever buy a power station from us once". The business types, (all MBA holders :) believe this with such force that they'll pour scorn on anybody more junior who challenges their assumptions. Then they discover, suddenly, that their customers go through a demerger and they suddenly have a new customer owning some equipment bought by another customer. So the business rules must changes and this is justified by saying "the market has changed, we need to keep customer-focused" instead of admitting that they failed to anticipate some eventualities.
                               
                              So what we encounter are "business rules" dreamed up during the ride to work by excited managers. The "busines rules" are thus very volatile. Now, how does it help me manage a large enterprise application, if I know that the developers haven't adequately separated business from data rules? It doesn't. I'd have to check to see wether the business rule was enforced in app code or in data access code. It could be in either, if a distinction had never been drawn between the two. (Since it is a business rule, and application independent, and data integrity specific, I would hope to find it in the DB or DAL, but you never know). In any case, there is no advantage in changing app code over changing DAL / DB code. none at all. This is the point where I argue for complete decoupling of presentation, application logic / data flow, data abstraction and access and data storage. If you plan it right and reduce the interface coupling between your layers its possible to get the best of everything.
                               
                              We have coded apps in PHP, Java, ASP, .NET and C++ against Oracle, MySQL, DB2 and MS-SQL. We haven't yet come across a situation where we couldn't abstract data access to a separate layer, even using PHP against MySQL.
                              I have noticed that many developers / architects tend to believe that the coding overhead to doing it this way is huge, and misguidedly believe that they can gain some efficiency by embedding SQL in their data objects. I have also seen some poor fools go to extreme ends to code an "SQL Generator" in Java, so that they could retain some self esteem and platform independence whilst nevertheless codifyng SQL Operations into their classes. This is quite mistaken, just as mistaken as the belief that you can leave testing to after the implementation phase, directly before roll-out :)
                               
                              What I'm currently missing are some good arguments for putting data access / integrity logic into your application code. I can't think of any, but then I am biased ;)#
                               
                              Cheers, Rich
                              -----Original Message-----
                              From: sentto-7758546-243-1056418570-rquinn=web.de@... [mailto:sentto-7758546-243-1056418570-rquinn=web.de@...] On Behalf Of Marc Hamann
                              Sent: Dienstag, 24. Juni 2003 03:37
                              To: agileDatabases@yahoogroups.com
                              Subject: Re: AW: [agileDatabases] Location of Data Logic

                              >  Basically what I'm saying is, that "enterprise applications" produce and
                              > manage data, and it is the data which ultimately creates business value,
                              > rarely the application. So we don't entrust the enforcement of data logic
                              > to our applications. (This is just one angle on the subject, I could
                              > probably also rant on about other distasteful aspects of this approach,
                              > such as the impossibility of maintaining code | database which have
                              > shared data responsibility for data integrity ).

                              Perhaps this in only a semantic distinction, but some things that are
                              usually considered "application" are in fact critical aspects of making the
                              data have business value.  I suppose it depends on where you draw the line
                              between the two, and that is what interests me.
                            • Scott W. Ambler
                              ... You need both together. ... Data and non-data folks need to find ways of working together effectively. We really shouldn t be talking about two types of
                              Message 14 of 26 , Jun 24, 2003
                                At 09:56 AM 6/24/2003 +0200, you wrote:
                                >Hello,
                                >
                                >True, many app's create the data, but we agree that its the data, not the
                                >apps, which hold the accumulated business value? Some companies in the
                                >consulting business produce tons of powerpoint and and word files, its not
                                >the applications but the data which contains the business value. OK,
                                >without the applications you'd have a hard time getting at your data, but
                                >monopolising other people's data in this way is something I try to avoid.

                                You need both together.


                                >
                                >Where I tend to draw the line depends very much on what development
                                >resources are available to the project, and what the projects requirements
                                >are and its nature.

                                Data and non-data folks need to find ways of working together
                                effectively. We really shouldn't be talking about two types of folks, just
                                IT professionals IMHO. That's one of the ideas that I try to get across at
                                www.agiledata.org/essays/vision.html.


                                >
                                >In all projects we remove the data access code (insert, creates, reads,
                                >deletes) from the data objects and put them in an abstratction layer.
                                >Sometimes the layer is in application code (usually J2EE), sometimes its
                                >in the database as a stored procedure layer. In any case, I enforce this
                                >rigourously, making sure that the development team sticks with its chosen
                                >paradigm.
                                >
                                >In many systems, we need things like EAI at the database layer (no
                                >tinkering with SOAP :)) , we need to enforce business rules such as
                                >keyword generation, session deletion, generation of reporting data and
                                >such. We also code this is as nearly as possible in the data base. This
                                >frees up development to concentrate on the business logic, the when and
                                >why of accessing/modifying the data, the how is enforced at the DB or at
                                >the data abstaction layer. It is a semantic distinction, but that is good,
                                >semantic misunderstandings are never really productive :)

                                The interesting implication is that it isn't cut and dried as to how you
                                deploy business logic, RI logic, ... You have choices, trade-offs to each
                                choice, and will chose differently depending on your situation. Yes,
                                sometimes your DB is the lowest common denominator within your org and
                                you're motivated into putting more logic there than you would
                                prefer. Sometimes this isn't the case (perhaps you're in a multi-db
                                environment, sometimes everything is going through a common service
                                layer). I go into detail on the trade-offs at
                                www.agiledata.org/essays/referentialIntegrity.html (I address both
                                business logic and RI in this essay).


                                >
                                >Sometimes business say things like "Customers will only ever buy a power
                                >station from us once". The business types, (all MBA holders :) believe
                                >this with such force that they'll pour scorn on anybody more junior who
                                >challenges their assumptions. Then they discover, suddenly, that their
                                >customers go through a demerger and they suddenly have a new customer
                                >owning some equipment bought by another customer. So the business rules
                                >must changes and this is justified by saying "the market has changed, we
                                >need to keep customer-focused" instead of admitting that they failed to
                                >anticipate some eventualities.

                                This is why you must embrace change. ;-)


                                >
                                >So what we encounter are "business rules" dreamed up during the ride to
                                >work by excited managers. The "busines rules" are thus very volatile. Now,
                                >how does it help me manage a large enterprise application, if I know that
                                >the developers haven't adequately separated business from data rules? It
                                >doesn't. I'd have to check to see wether the business rule was enforced in
                                >app code or in data access code. It could be in either, if a distinction
                                >had never been drawn between the two. (Since it is a business rule, and
                                >application independent, and data integrity specific, I would hope to find
                                >it in the DB or DAL, but you never know). In any case, there is no
                                >advantage in changing app code over changing DAL / DB code. none at all.
                                >This is the point where I argue for complete decoupling of presentation,
                                >application logic / data flow, data abstraction and access and data
                                >storage. If you plan it right and reduce the interface coupling between
                                >your layers its possible to get the best of everything.


                                Yes, layering is a good thing. Usually.

                                >
                                >We have coded apps in PHP, Java, ASP, .NET and C++ against Oracle, MySQL,
                                >DB2 and MS-SQL. We haven't yet come across a situation where we couldn't
                                >abstract data access to a separate layer, even using PHP against MySQL.
                                >I have noticed that many developers / architects tend to believe that the
                                >coding overhead to doing it this way is huge, and misguidedly believe that
                                >they can gain some efficiency by embedding SQL in their data objects.

                                Then you should invite them to actually performance profile their work. I
                                run into this all the time and in most cases they've never done their homework.


                                > I have also seen some poor fools go to extreme ends to code an "SQL
                                > Generator" in Java, so that they could retain some self esteem and
                                > platform independence whilst nevertheless codifyng SQL Operations into
                                > their classes.

                                You have encapsulation choices. See
                                www.agiledata.org/essays/implementationStrategies.html for basic options.


                                > This is quite mistaken, just as mistaken as the belief that you can
                                > leave testing to after the implementation phase, directly before roll-out :

                                >
                                >What I'm currently missing are some good arguments for putting data access
                                >/ integrity logic into your application code. I can't think of any, but
                                >then I am biased ;)#

                                See that first essay. I'd love to hear feedback.

                                >
                                >Cheers, Rich


                                -Scott


                                ====================================================
                                Scott W. Ambler
                                Senior Consultant, Ronin International, Inc.
                                www.ronin-intl.com/company/scottAmbler.html

                                www.agiledata.org
                                www.agilemodeling.com
                                www.ambysoft.com
                                www.enterpriseunifiedprocess.info
                                www.modelingstyle.info
                                www.ronin-intl.com
                              • Marc Hamann
                                ... While I see what your saying, I do think it is important to emphasize that business value is created by the _combination_ of a) having the meaningful,
                                Message 15 of 26 , Jun 24, 2003
                                  At 03:56 AM 6/24/03, Richard Quinn wrote:
                                  >True, many app's create the data, but we agree that its the data, not the
                                  >apps, which hold the accumulated business value?

                                  While I see what your saying, I do think it is important to emphasize that
                                  business value is created by the _combination_ of a) having the meaningful,
                                  quality data and b) having effective tools and business procedures to use
                                  and maintain that data. You really can't have useful data without both.

                                  >We have coded apps in PHP, Java, ASP, .NET and C++ against Oracle, MySQL,
                                  >DB2 and MS-SQL. We haven't yet come across a situation where we couldn't
                                  >abstract data access to a separate layer, even using PHP against MySQL.

                                  I would agree that a well-defined and centralized data access layer is
                                  almost a requirement, I'm not sure it has to be _physically_ separated; it
                                  could be deployed in a shared app space for example. Again, this might be
                                  a semantic quibble. ;-)

                                  >What I'm currently missing are some good arguments for putting data access
                                  >/ integrity logic into your application code. I can't think of any, but
                                  >then I am biased ;)#

                                  Hopefully this is a clear example of the kind of thing I have in mind.

                                  Imagine a fairly typical example where at the conceptual level you have
                                  Products and Product Types, where each Product can have multiple Types, and
                                  you've implemented this in the typical way with three tables and the schema
                                  enforces the basic foreign key, uniqueness, etc. constraints.

                                  Now it so happens that for most products, any type is possible, but for a
                                  very small number of products certain types don't make sense. The end
                                  users (or their bosses) are worried about mistakes, so they ask that the
                                  interface be set up so that those types are automatically hidden when they
                                  aren't appropriate.

                                  So you have a dilemma: product X with type Y is semantically incorrect,
                                  and technically shouldn't appear in the database, so you could alter all of
                                  your data access code and create complex check triggers to enforce this, or
                                  you can simply add a couple lines of logic to your interface to filter out
                                  inappropriate values.
                                  You could opt for both, but then you have logic for the same thing in two
                                  places.

                                  My own answer to this problem (barring other factors) would be to start by
                                  putting the logic near the interface. If this started to become
                                  unwieldily, say because there were more complicated or simply more
                                  exceptions, I would move this logic to an "access layer for the access
                                  layer". If this became a truly pervasive feature of the data, I would
                                  reorganize my schema, add the trigger logic and move the feature to the
                                  database.

                                  If there are never more than a couple simple exceptions, I have saved
                                  myself "invasive surgery" on the database while still protecting the
                                  business value of the data.

                                  This approach only works if I see the space between the UI and the database
                                  as a continuum rather than a discrete space with fixed boundaries.

                                  Cheers,

                                  Marc
                                • Josh Collier
                                  Why not just use a view, to filter out the invalid product-product type combinations? ... From: Marc Hamann [mailto:marc@hamann.ca] Sent: Tuesday, June 24,
                                  Message 16 of 26 , Jun 24, 2003
                                    Why not just use a view, to filter out the invalid product-product type combinations?
                                     
                                     
                                    -----Original Message-----
                                    From: Marc Hamann [mailto:marc@...]
                                    Sent: Tuesday, June 24, 2003 8:13 AM
                                    To: agileDatabases@yahoogroups.com
                                    Subject: RE: AW: [agileDatabases] Location of Data Logic

                                    At 03:56 AM 6/24/03, Richard Quinn wrote:
                                    >True, many app's create the data, but we
                                    agree that its the data, not the
                                    >apps, which hold the accumulated
                                    business value?

                                    While I see what your saying, I do think it is important to emphasize that
                                    business value is created by the _combination_ of a) having the meaningful,
                                    quality data and b) having effective tools and business procedures  to use
                                    and maintain that data.  You really can't have useful data without both.

                                    >We have coded apps in PHP, Java,
                                    ASP, .NET and C++ against Oracle, MySQL,
                                    >DB2 and MS-SQL. We haven't yet
                                    come across a situation where we couldn't
                                    >abstract data access to a
                                    separate layer, even using PHP against MySQL.

                                    I would agree that a well-defined and centralized data access layer is
                                    almost a requirement, I'm not sure it has to be _physically_ separated; it
                                    could be deployed in a shared app space for example.  Again, this might be
                                    a semantic quibble. ;-)

                                    >What I'm currently missing are some good arguments for putting
                                    data access
                                    >/ integrity logic into your application code. I can't think
                                    of any, but
                                    >then I am biased ;)#

                                    Hopefully this is a clear example of the kind of thing I have in mind.

                                    Imagine a fairly typical example where at the conceptual level you have
                                    Products and Product Types, where each Product can have multiple Types, and
                                    you've implemented this in the typical way with three tables and the schema
                                    enforces the basic foreign key, uniqueness, etc. constraints.

                                    Now it so happens that for most products, any type is possible, but for a
                                    very small number of products certain types don't make sense.  The end
                                    users (or their bosses) are worried about mistakes, so they ask that the
                                    interface be set up so that those types are automatically hidden when they
                                    aren't appropriate.

                                    So you have a dilemma:  product X with type Y is semantically incorrect,
                                    and technically shouldn't appear in the database, so you could alter all of
                                    your data access code and create complex check triggers to enforce this, or
                                    you can simply add a couple lines of logic to your interface to filter out
                                    inappropriate values.
                                    You could opt for both, but then you have logic for the same thing in two
                                    places.

                                    My own answer to this problem (barring other factors) would be to start by
                                    putting the logic near the interface.  If this started to become
                                    unwieldily, say because there were more complicated or simply more
                                    exceptions, I would move this logic to an "access layer for the access
                                    layer".  If this became a truly pervasive feature of the data, I would
                                    reorganize my schema, add the trigger logic and move the feature to the
                                    database.

                                    If there are never more than a couple simple exceptions, I have saved
                                    myself "invasive surgery" on the database while still protecting the
                                    business value of the data.

                                    This approach only works if I see the space between the UI and the database
                                    as a continuum rather than a discrete space with fixed boundaries.

                                    Cheers,

                                    Marc





                                    To unsubscribe from this group, send an email to:
                                    agileDatabases-unsubscribe@yahoogroups.com



                                    Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
                                  • Marc Hamann
                                    ... You would still either have to change the schema to encode the exceptions or have application logic that said: If this product can have all types, use the
                                    Message 17 of 26 , Jun 24, 2003
                                      At 03:10 PM 6/24/03, Josh Collier wrote:
                                      >Why not just use a view, to filter out the invalid product-product type
                                      >combinations?

                                      You would still either have to change the schema to encode the exceptions
                                      or have application logic that said: If this product can have all types,
                                      use the all type view, else use one of the exception views.

                                      It is the choice between those two options (application logic or database
                                      logic) that is under discussion.

                                      Marc
                                    • Richard Quinn
                                      Well, IMHO, a schema error has been made. Your initial assumption that a product can be any, or several, of any product type is in certain cases invalid. The
                                      Message 18 of 26 , Jun 24, 2003
                                        Well, IMHO, a schema error has been made.

                                        Your initial assumption that a product can be any, or several, of any
                                        product type is in certain cases invalid. The usage of 2 primary data
                                        tables and an intersection table for m:n relationship degrees can't cope
                                        with the scenario of restricted product type combinations.

                                        You would need to have [some kind of] a category table, saying that
                                        productID:nnn is of category "standard product", which can be associated
                                        with one or several of n concrete product types. When the business
                                        decides that "special products" can now be related to different product
                                        types, you can easily enforce the new rule entirely with relationships
                                        and approriate records marking them. It is perfectly possible to capture
                                        the semantics of relationship possibilities in an RDBMS without
                                        resorting to triggers or constraints or anything else.

                                        This is a good example of a case where the data could be modelled more
                                        accurately, and in so doing:

                                        it is easily understandable

                                        remains consistent

                                        does not decompose over time

                                        Of course, the increased data complexity will have to be mirrored in
                                        your apps data access logic, but nobody is saying that complex business
                                        rules don'r require complex solutions.

                                        When the business rules change radically, and you then have to face
                                        decisions like this one, then the call about what to do is much harder
                                        to make. Refactoring a 3 table m:n relationship into a multidimensional
                                        conditional relationship schema, on the fly, is a bit tricky, and it is
                                        likely that someone will get seriously upset at the suggestion. I
                                        suppose it would be up to us to warn of the consequences and then go
                                        with whatever economically acceptable approach is approved. (aagh -
                                        corporate sell-out :) )

                                        The approach of first tinkering with the app (and I reckon that most
                                        people would restrict what the GUI allows the users to do) and then when
                                        things dont clear go up, but maybe even go from bad to worse, then start
                                        modifying the schema, can only lead to data decomposition and loss of
                                        value. I'm sure we've all seen normalized DB's where some despairing
                                        coder has started to misuse fields for other purposes, or where the
                                        semantics of the data relationships have been so filtered in the app
                                        that the data structures don't make sense any more. Is this agility, I
                                        wonder?

                                        Best Regards

                                        - Richard

                                        -----

                                        -----Original Message-----
                                        From:
                                        sentto-7758546-247-1056467516-rquinn=web.de@...
                                        [mailto:sentto-7758546-247-1056467516-rquinn=web.de@...
                                        .com] On Behalf Of Marc Hamann
                                        Sent: Tuesday, June 24, 2003 5:13 PM
                                        To: agileDatabases@yahoogroups.com
                                        Subject: RE: AW: [agileDatabases] Location of Data Logic


                                        At 03:56 AM 6/24/03, Richard Quinn wrote:
                                        >True, many app's create the data, but we agree that its the data, not
                                        the
                                        >apps, which hold the accumulated business value?

                                        While I see what your saying, I do think it is important to emphasize
                                        that
                                        business value is created by the _combination_ of a) having the
                                        meaningful,
                                        quality data and b) having effective tools and business procedures to
                                        use
                                        and maintain that data. You really can't have useful data without both.

                                        >We have coded apps in PHP, Java, ASP, .NET and C++ against Oracle,
                                        MySQL,
                                        >DB2 and MS-SQL. We haven't yet come across a situation where we
                                        couldn't
                                        >abstract data access to a separate layer, even using PHP against MySQL.

                                        I would agree that a well-defined and centralized data access layer is
                                        almost a requirement, I'm not sure it has to be _physically_ separated;
                                        it
                                        could be deployed in a shared app space for example. Again, this might
                                        be
                                        a semantic quibble. ;-)

                                        >What I'm currently missing are some good arguments for putting data
                                        access
                                        >/ integrity logic into your application code. I can't think of any, but

                                        >then I am biased ;)#

                                        Hopefully this is a clear example of the kind of thing I have in mind.

                                        Imagine a fairly typical example where at the conceptual level you have
                                        Products and Product Types, where each Product can have multiple Types,
                                        and
                                        you've implemented this in the typical way with three tables and the
                                        schema
                                        enforces the basic foreign key, uniqueness, etc. constraints.

                                        Now it so happens that for most products, any type is possible, but for
                                        a
                                        very small number of products certain types don't make sense. The end
                                        users (or their bosses) are worried about mistakes, so they ask that the

                                        interface be set up so that those types are automatically hidden when
                                        they
                                        aren't appropriate.

                                        So you have a dilemma: product X with type Y is semantically incorrect,

                                        and technically shouldn't appear in the database, so you could alter all
                                        of
                                        your data access code and create complex check triggers to enforce this,
                                        or
                                        you can simply add a couple lines of logic to your interface to filter
                                        out
                                        inappropriate values.
                                        You could opt for both, but then you have logic for the same thing in
                                        two
                                        places.

                                        My own answer to this problem (barring other factors) would be to start
                                        by
                                        putting the logic near the interface. If this started to become
                                        unwieldily, say because there were more complicated or simply more
                                        exceptions, I would move this logic to an "access layer for the access
                                        layer". If this became a truly pervasive feature of the data, I would
                                        reorganize my schema, add the trigger logic and move the feature to the
                                        database.

                                        If there are never more than a couple simple exceptions, I have saved
                                        myself "invasive surgery" on the database while still protecting the
                                        business value of the data.

                                        This approach only works if I see the space between the UI and the
                                        database
                                        as a continuum rather than a discrete space with fixed boundaries.

                                        Cheers,

                                        Marc
                                      • Alexander Tabakov
                                        Hi All, Discussion about data logic reminds me of something I ve read in a Microsoft book abount COM+ They define 3 types of applications regarding concurrency
                                        Message 19 of 26 , Jun 25, 2003
                                          Hi All,

                                          Discussion about data logic reminds me of something I've read in a
                                          Microsoft book abount COM+

                                          They define 3 types of applications regarding concurrency and
                                          synchronization.

                                          Level 1 App - Multiple processes
                                          Level 2 App - Multi threaded application, in which shared data is
                                          protected with some syncronization primitives like
                                          crytical section, etc.
                                          Level 3 App - Applications which uses the database as the ultimate
                                          syncronization mechanism. The idea is that databases
                                          are quite a mature technology and uses much more
                                          sofisticated thechnologies to ensure that data is
                                          consistent.

                                          So the question now is: Should we use Level 2 or Level 3 solution.
                                          Should we implement any possible constraint on the data or just let
                                          the database to do everything for us.

                                          I have heard opinions (especially from MySQL guys) saying that all
                                          constraints must be implemented in the code. Why should we put constraints in the
                                          database and make it slower ??? But what happens if somebody else starts
                                          using your database. Using data access layer is something mandatory but it does not prevents
                                          from somebody else opening an ODBC connection to the database ....

                                          What happens there?

                                          --
                                          Best regards,
                                          Alexander mailto:saho@...
                                        • yahoogroups@jhrothjr.com
                                          ... From: Alexander Tabakov To: Richard Quinn
                                          Message 20 of 26 , Jun 25, 2003
                                            ----- Original Message -----
                                            From: "Alexander Tabakov"
                                            <saho.at.eklektica.com@...>
                                            To: "Richard Quinn"
                                            <agileDatabases.at.yahoogroups.com@...>
                                            Sent: Wednesday, June 25, 2003 4:14 AM
                                            Subject: Re[2]: AW: [agileDatabases] Location of Data Logic


                                            > Hi All,
                                            >
                                            > Discussion about data logic reminds me of something I've read in a
                                            > Microsoft book abount COM+
                                            >
                                            > They define 3 types of applications regarding concurrency and
                                            > synchronization.
                                            >
                                            > Level 1 App - Multiple processes
                                            > Level 2 App - Multi threaded application, in which shared data is
                                            > protected with some syncronization primitives like
                                            > crytical section, etc.
                                            > Level 3 App - Applications which uses the database as the ultimate
                                            > syncronization mechanism. The idea is that databases
                                            > are quite a mature technology and uses much more
                                            > sofisticated thechnologies to ensure that data is
                                            > consistent.
                                            >
                                            > So the question now is: Should we use Level 2 or Level 3 solution.
                                            > Should we implement any possible constraint on the data or just let
                                            > the database to do everything for us.
                                            >
                                            > I have heard opinions (especially from MySQL guys) saying that all
                                            > constraints must be implemented in the code. Why should we put constraints
                                            in the
                                            > database and make it slower ??? But what happens if somebody else starts
                                            > using your database. Using data access layer is something mandatory but it
                                            does not prevents
                                            > from somebody else opening an ODBC connection to the database ....
                                            >
                                            > What happens there?

                                            What happens if you let someone else use your data base?
                                            (Clue: you've got someone else to negotiate with when you need to
                                            change the schema.)

                                            John Roth

                                            >
                                            > --
                                            > Best regards,
                                            > Alexander mailto:saho@...
                                            >
                                            >
                                            >
                                            > To unsubscribe from this group, send an email to:
                                            > agileDatabases-unsubscribe@yahoogroups.com
                                            >
                                            >
                                            >
                                            > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                                            >
                                            >
                                          • Marc Hamann
                                            ... ... OK. So let s add another subtlety to the process. Suppose that, upon further investigation, that the rule about certain products not having
                                            Message 21 of 26 , Jun 25, 2003
                                              At 05:24 PM 6/24/03, you wrote:
                                              >Well, IMHO, a schema error has been made.
                                              <snip>
                                              >When the business rules change radically, and you then have to face
                                              >decisions like this one, then the call about what to do is much harder
                                              >to make.

                                              OK. So let's add another subtlety to the process. Suppose that, upon
                                              further investigation, that the rule about certain products not having
                                              certain types is not enterprise wide, but only applies to a new policy in
                                              one department. Now the job of encoding it into the schema and
                                              coordinating that with the application is even harder, and next week the
                                              department could change it's mind and go back to allow all types for all
                                              products.

                                              Perhaps making the change in the interface for that one department isn't
                                              such a bad idea after all?

                                              > I
                                              >suppose it would be up to us to warn of the consequences and then go
                                              >with whatever economically acceptable approach is approved. (aagh -
                                              >corporate sell-out :) )

                                              This humorous comment triggered something for me about our discussion of
                                              data as a repository of business value. First, I think it would be better
                                              to say that data is _potential_ business value; it doesn't have a defined
                                              value until it is used.

                                              To clarify, think about money. Money represents potential wealth. If it
                                              stuck in the bank and I can't get it out, I'm no better off than if I
                                              didn't have it.

                                              So my bank decides that since money is wealth, their most important job is
                                              to keep it secure. To this end, they decide that banking machines can't be
                                              trusted (because they were built by developers who can't be trusted), and
                                              dismantle them. Now, I have to go talk to a teller to get the money But
                                              they don't trust to hire people off the street for tellers, so there is
                                              only one experienced teller in the bank that can access the money. Now the
                                              line to see the one teller is so long that I can't get through on a lunch
                                              hour, so I have to take special time off to go.

                                              So now I actually _losing_ real wealth, because I'm giving up my vacation
                                              time to get my money, and I can't get at it when I need it, all because the
                                              bank decided their job was to protect money, not to facilitate its _use_.

                                              I'm sure I don't need to spell out how this applies to the discussion. ;-)

                                              >I'm sure we've all seen normalized DB's where some despairing
                                              >coder has started to misuse fields for other purposes, or where the
                                              >semantics of the data relationships have been so filtered in the app
                                              >that the data structures don't make sense any more. Is this agility, I
                                              >wonder?

                                              Why was the coder despairing? Was he unable to talk to the DBA to get
                                              changes made to the schema? Or did the DBA refuse because he didn't want
                                              to mess up "his" schema for the sake of the application?

                                              I'm agreeing with you that the inconsistent use of fields is a bad thing,
                                              but if we accept that both the application and the data are needed to
                                              produce business value, then those responsible for data management and
                                              those responsible for application development need to make decisions
                                              together, trading off between all of the technical values to produce the
                                              most effective solution to the business needs.

                                              That is the agile part: instead of making the data principles drive the
                                              process e.g. "The database must contain all logic that constrains the
                                              data", the driver is the business value e.g. "How can we produce the most
                                              business value with the data by ensuring that it is both accessible and of
                                              high quality." Because it turns out that all those crazy changes the
                                              business people keep asking for is them trying to increase the business
                                              value of their data.

                                              Once the shift of goal is made, suddenly it doesn't seem so crazy to solve
                                              a small problem like my example inside the app. It can be done much faster
                                              than the corresponding schema changes, and that gives the users the
                                              opportunity to see if that solves their problem before too much effort is
                                              invested in the process.

                                              All the best,

                                              Marc
                                            • psadalage
                                              ... drive the ... the ... the most ... accessible and of ... the ... business ... to solve ... much faster ... the ... effort is ... This brings
                                              Message 22 of 26 , Jun 25, 2003
                                                <Marc>
                                                --- In agileDatabases@yahoogroups.com, Marc Hamann <marc@h...> wrote:
                                                > That is the agile part: instead of making the data principles
                                                drive the
                                                > process e.g. "The database must contain all logic that constrains
                                                the
                                                > data", the driver is the business value e.g. "How can we produce
                                                the most
                                                > business value with the data by ensuring that it is both
                                                accessible and of
                                                > high quality." Because it turns out that all those crazy changes
                                                the
                                                > business people keep asking for is them trying to increase the
                                                business
                                                > value of their data.
                                                >
                                                > Once the shift of goal is made, suddenly it doesn't seem so crazy
                                                to solve
                                                > a small problem like my example inside the app. It can be done
                                                much faster
                                                > than the corresponding schema changes, and that gives the users
                                                the
                                                > opportunity to see if that solves their problem before too much
                                                effort is
                                                > invested in the process.
                                                >
                                                > All the best,
                                                >
                                                > Marc
                                                </Marc>

                                                This brings up a excellent point, that the DBA's (or data people)
                                                and the developers (the application people) are not really separate
                                                teams. They have to be part of the same team actively working with
                                                each other and design the data access layer.

                                                This also means that the DBA has to be available for the developers,
                                                I would say that s/he should be sitting along with the developers
                                                and not in some room far off from the developers. The developers
                                                also have to consider the DBA as part of their team. This article
                                                brings up the point again
                                                http://www.martinfowler.com/articles/evodb.html

                                                Having automated tests helps everyone.

                                                Pramod
                                              • Richard Quinn
                                                Hello, ... OK, the data has a _potential_ value. I agree it s no use if you can t|don t access / analysze /use it. [The argument is a bit specious, since you
                                                Message 23 of 26 , Jun 25, 2003
                                                  Hello,

                                                  > This humorous comment triggered something for me about our
                                                  > discussion of
                                                  > data as a repository of business value. First, I think it
                                                  > would be better
                                                  > to say that data is _potential_ business value; it doesn't
                                                  > have a defined
                                                  > value until it is used.

                                                  OK, the data has a _potential_ value. I agree it's no use if you
                                                  can't|don't access / analysze /use it. [The argument is a bit specious,
                                                  since you could safely argue that all artifacts have only a potential
                                                  value - which can be meausured only when used. Its a little
                                                  over-encompasing]. It is especially frustrating when you want to use the
                                                  data, but find that you can't because it doesn't make any sense on its
                                                  own, and can only be interpreted when you use the App it was distributed
                                                  with. If we are talking about one-shot dedicated applications then there
                                                  is no causative need, as I pointed out earlier, to use a database. From
                                                  this follows the argument that the database could eventually feed a
                                                  multitude of apps or subsystems. From here my position is, that we
                                                  should be doing everything possible to retain the usefulness of the
                                                  data, in a long-sighted manner, since it can be used in a multitude of
                                                  ways to create value. I am not saying that all of your business logic
                                                  should be in the database, not at all. I don't work for Oracle, but I
                                                  have heard that this is their position :)

                                                  > That is the agile part: instead of making the data
                                                  > principles drive the
                                                  > process e.g. "The database must contain all logic that constrains the
                                                  > data", the driver is the business value e.g. "How can we
                                                  > produce the most
                                                  > business value with the data by ensuring that it is both
                                                  > accessible and of
                                                  > high quality."

                                                  Yes indeed, how?

                                                  I do believe that responsibility for data integrity, in a far-sighted
                                                  way, should reside in or near the data container. Which makes deciding
                                                  when to overhaul the schema as opposed to when to change the app
                                                  individual decisions to be made when the need arises. Its not really
                                                  helpful being dogmatic, its more important to understand the issues. My
                                                  main point was really about where to put your CRUD data access /
                                                  modification code. Since I believe that many apps could be sharing the
                                                  one data source, and accepting the future possibility that severe schema
                                                  improvements may be necessary to ensure that your data remains valuable,
                                                  my position is to abstract the data access to a layer common to the
                                                  applications and common to the data store. And hey presto you have
                                                  code/component reuse and can shoot a detractor with one of the silver
                                                  bullets you've been collecting :)

                                                  There are some new issues to deal with then, like how do you change the
                                                  schema without immediately breaking the apps, or vice versa, but because
                                                  you have abstracted some common functionality every one should, over the
                                                  long run, have less headaches.

                                                  Best Regards,
                                                  Richard Quinn
                                                Your message has been successfully submitted and would be delivered to recipients shortly.