Introducing database refactoring to an agile team

  • bill.mccrosky
    Message 1 of 19 , Nov 13, 2007
      I am the new data architect on an agile development project. Having
      been a data architect for a long time (gray hair and all), I am also
      relatively new to agile techniques in general. I have read a lot of
      the database refactoring literature and can definitely see the benefits.

      I have begun discussions with the agile team about database
      refactoring and am running into resistance (or possibly just confusion
      resulting from different mindsets). After describing database
      refactoring to the team lead, she argued that the data refactoring was
      being done at the ORM level. She didn't see the benefit of
      refactoring the actual Oracle DDL.

      Having been around databases for a long time, I am concerned about the
      multiplicities of attribute names and attribute properties that must
      be in this agile application (code and DBMS). Am I right to be
      concerned? Or do I need to go back to database refactoring school?
    • Scott Ambler
      Message 2 of 19 , Nov 13, 2007
        --- "bill.mccrosky" <mccrosky@...> wrote:


        >
        > I have begun discussions with the agile team about
        > database
        > refactoring and am running into resistance (or
        > possibly just confusion
        > resulting from different mindsets). After
        > describing database
        > refactoring to the team lead, she argued that the
        > data refactoring was
        > being done at the ORM level. She didn't see the
        > benefit of
        > refactoring the actual Oracle DDL.

        It's important to discuss ideas/techniques in theory
        to think things through to try to determine whether
        they're applicable for you. But, very quickly at some
        point you've got to decide to try things out for
        yourself and see how they actually work in practice.
        If you've got your own sandboxes to work in then
        experimenting with refactoring is a fairly painless
        thing to do.
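
        (For illustration, a first sandbox experiment could be one of the
        simple refactorings from that literature - say, renaming a column
        with a transition period. A rough Oracle-flavoured sketch, with
        invented table and column names:)

        -- add the new column and backfill it from the old one
        ALTER TABLE Customer ADD (surname VARCHAR2(48));
        UPDATE Customer SET surname = last_name;

        -- keep old and new columns in sync during the transition period
        -- (simplified; a real refactoring also handles updates to last_name)
        CREATE OR REPLACE TRIGGER customer_sync_surname
        BEFORE INSERT OR UPDATE ON Customer
        FOR EACH ROW
        BEGIN
            IF :NEW.surname IS NULL THEN
                :NEW.surname := :NEW.last_name;
            END IF;
        END;
        /

        -- once nothing references the old name:
        -- ALTER TABLE Customer DROP COLUMN last_name;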


        > Having been around databases for a long time, I am concerned about
        > the multiplicities of attribute names and attribute properties that
        > must be in this agile application (code and DBMS). Am I right to be
        > concerned?

        If it's a problem, what's stopping you from
        refactoring the code and schema to address the issue?

        - Scott

        Scott W. Ambler
        Practice Leader Agile Development, IBM Methods Group
        http://www-306.ibm.com/software/rational/bios/ambler.html
        Agility at Scale: http://www.ibm.com/rational/agile/


      • Markus Samuel Gallagher
        Message 3 of 19 , Nov 14, 2007
          --- In agileDatabases@yahoogroups.com, "bill.mccrosky" <mccrosky@...> wrote:

          > resulting from different mindsets). After describing database
          > refactoring to the team lead, she argued that the data refactoring
          > was being done at the ORM level. She didn't see the benefit of
          > refactoring the actual Oracle DDL.

          I don't see how database refactoring can be done at the ORM level.
          ORM is a modeling language which, in certain products, allows
          generation of DDL scripts. However, there is no feature that I know
          of that can generate schema evolution scripts, let
          alone manage various versions of databases along with incremental,
          automated deployment.

          I would argue that it's important to keep the ORM Model up-to-date
          and that it should be used to do exploratory work on new features
          (spikes) to see what the DB could/should evolve to... however, the
          physical refactoring of the database will still have to be managed
          using the agile database development practices and needs to be
          closely integrated into the overall development process (version
          control, automated builds/tests, deployment, etc).
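
          (For illustration, the kind of hand-written, versioned change script
          meant here might look roughly like this - the schema_version table
          and all names are invented:)

          -- one-off setup: record which incremental scripts have been applied
          CREATE TABLE schema_version (
              version      NUMBER PRIMARY KEY,
              description  VARCHAR2(200),
              applied_on   DATE DEFAULT SYSDATE
          );

          -- change script 042, kept in version control, run by the automated build
          ALTER TABLE Customer ADD (country_code VARCHAR2(4));
          UPDATE Customer SET country_code = 'US' WHERE country_code IS NULL;

          INSERT INTO schema_version (version, description)
          VALUES (42, 'add Customer.country_code');
          COMMIT;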

          In my experience, ORM is a great way of modeling from scratch and
          generating an initial DB in 3NF... but once you are under way with
          data access, object relational mappings, etc... the ORM model
          becomes more of a hindrance than a help (again, with the possible
          exception of modeling new features in a sandbox - spikes).

          HTH

          Markus
        • Clifford Heath
          Message 4 of 19 , Nov 14, 2007
            Markus,

            You have the wrong acronym.

            Although Object Role Modeling was the first to use it, to most of
            the industry it now means Object/Relational Mapping, as embodied
            in Hibernate for example.... and IMO rightly an area described as
            "the Vietnam of computer science". I refer to the latter as O/RM to
            distinguish it.

            Funnily enough the original ORM and related fact-based modeling
            techniques offer the way out of this Vietnam, and I'm working on
            a project towards that end.

            You're right in pointing out that existing Object Role Modeling
            tools are not conducive to agile information management, and
            that's why I believe a new query language is required. IOW,
            it's a feature of the tools, not the approach.

            Clifford Heath, Data Constellation, cjh@...

          • timander37
            Message 5 of 19 , Nov 14, 2007
              Bill, I believe your motives are in the right direction. If you
              have to fight for it, it's probably not worth doing -- if you can
              collaborate towards a simpler solution, you may end up with a better
              relationship with the agile team and a better data model.

              "she argued that the data refactoring was being done at the ORM level."
              This ORM translation seems like an additional layer of unnecessary
              complexity, if you have the option to refactor the DDL.

              "She didn't see the benefit of refactoring the actual Oracle DDL"
              It's a simpler solution -- maybe not easier, but definitely simpler.

              Tim


            • Pramod Sadalage
              Message 6 of 19 , Nov 14, 2007
                Hi,

                I think refactoring at the ORM level is okay up to a point, e.g. when
                you have to map multiple tables to one domain object via Hibernate or
                TopLink. But after a certain time your domain objects will no longer
                reflect the data model, which leads to lots of mapping problems and
                certainly lots of understanding/communication problems; you will lose
                clarity of the domain. Maybe you want to read about technical debt:
                http://c2.com/cgi/wiki?TechnicalDebt

                I tend to refactor the schema, the domain object, and the rest that
                goes along with it at the same time. I find that keeps the metaphor
                of the domain in sync with the database and the domain objects (the
                application).
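
                (For illustration only, a sketch of such a paired change with
                invented names; the Hibernate mapping and domain classes would be
                updated in the same commit:)

                -- move shipping address out of Customer into its own table
                CREATE TABLE CustomerAddress (
                    customer_id  NUMBER NOT NULL REFERENCES Customer,
                    addr_type    VARCHAR2(10) NOT NULL,
                    street       VARCHAR2(100),
                    city         VARCHAR2(60),
                    postal_code  VARCHAR2(12),
                    PRIMARY KEY (customer_id, addr_type)
                );

                INSERT INTO CustomerAddress
                    (customer_id, addr_type, street, city, postal_code)
                SELECT customer_id, 'SHIPPING', ship_street, ship_city, ship_postal_code
                FROM Customer;

                -- after the mapping and domain objects have been migrated:
                -- ALTER TABLE Customer DROP (ship_street, ship_city, ship_postal_code);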

                Cheers
                Pramod

              • David Portas
                Message 7 of 19 , Nov 14, 2007
                  On 14/11/2007, Clifford Heath <clifford.heath@...> wrote:
                  >
                  >
                  > You're right in pointing out that existing Object Role Modeling
                  > tools are not conducive to agile information management, and
                  > that's why I believe a new query language is required. IOW,
                  > it's a feature of the tools, not the approach.
                  >

                  A new language along the lines of Date and Darwen's "D" perhaps?
                  Microsoft came up with LINQ, which was a missed opportunity. They
                  opted for familiarity and too many compromises.

                  David
                • Clifford Heath
                  Message 8 of 19 , Nov 19, 2007
                    On 15/11/2007, at 7:37 AM, David Portas wrote:
                    > On 14/11/2007, Clifford Heath <clifford.heath@...> wrote:
                    >> You're right in pointing out that existing Object Role Modeling
                    >> tools are not conducive to agile information management, and
                    >> that's why I believe a new query language is required. IOW,
                    >> it's a feature of the tools, not the approach.
                    > A new language along the lines of Date and Darwen's "D" perhaps?
                    > Microsoft came up with LINQ, which was a missed opportunity. They
                    > opted for familiarity and too many compromises.

                    A lot of what MS are doing with Linq isn't far off the mark actually,
                    but I have some issues with embedding query plans into executable
                    code... even if the code can be executed remotely inside the database
                    engine, and even if it's subject to query optimisation, which seems to
                    be the direction MS are going.

                    The relational model, even if all D's goals were met by a new language,
                    would still engender many of the same problems that SQL is blamed for.
                    I know it sounds like heresy, but the relational model is perfect for
                    static systems, but not great for evolving ones. The problem is that the kinds
                    of change during an application's evolution are often small at the
                    conceptual level, but have much bigger impacts at the relational level.
                    I have written a more detailed description of why this is in an upcoming
                    paper on my new conceptual query language, CQL.

                    Conceptual information models have the property of elementarity.
                    This means that every fact type is as simple as it can be, and every
                    fact says exactly one thing, never more. This is patently not true of
                    the relational model, where the normalisation process exists to find
                    facts that should be grouped together because of their shared
                    functional dependence on the same key. The grouping is important
                    when designing physical storage, but it obfuscates the conceptual model.

                    The problem is that the attributes of a relation are accessed using
                    different syntax from that used to access fact types that don't
                    represent a functional dependency - the join is used instead. This
                    difference is not just an SQL thing, it's a feature of the relational
                    model. The result is that when the functional dependencies change
                    during evolution, or when attribute migration occurs, nearly every
                    query needs to be revisited. CQL does not have this difference in
                    syntax for different fact types, and so does not suffer from the
                    same problem. It translates to SQL internally, but can do so
                    dynamically.

                    --
                    Clifford Heath, Data Constellation.
                    Agile Information Management and Design.
                  • Curt Sampson
                    Message 9 of 19 , Nov 20, 2007
                      On Tue, Nov 20, 2007 at 05:11:54PM +1100, Clifford Heath wrote:
                      >
                      > Conceptual information models have the property of elementarity.
                      > This means that every fact type is as simple as it can be, and every
                      > fact says exactly one thing, never more. This is patently not true of
                      > the relational model, where the normalisation process exists to find
                      > facts that should be grouped together because of their shared
                      > functional dependence on the same key. The grouping is important
                      > when designing physical storage, but it obfuscates the conceptual model.

                      I'm afraid I disagree with this; in fact it seems to me a major
                      misunderstanding of the relational model. Here's my understanding of the
                      matter:

                      In the relational model, data are addressed by content. That means, for
                      each datum, we need a candidate key that identifies it, which leads us
                      to the whole concept of functional dependencies: it's how we find out
                      which candidate key identifies that datum.

                      Beyond that, your definition of "fact" seems rather limited; there is
                      a near-infinite number of different "facts" or even "fact types" in
                      any reasonably sophisticated database, because the RDBMS infers facts
                      for us from other facts. It will, for example, in a correctly designed
                      database, infer from these facts (each of which corresponds to a tuple
                      in a relation):

                      * there exists a company, C
                      * this company has a purchasing agent, A1
                      * this company has a second purchasing agent, A2
                      * there are no other purchasing agents than these two
                      * agent A1 made an order on date D1
                      * agent A2 made an order on date D2
                      * agent A2 made an order on date D3
                      * there are no other orders from A1 and A2

                      this fact, when given the appropriate query:

                      * company C has made exactly 3 orders in all time covered by the database.
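
                      (For illustration, one SQL rendering of that inference, with
                      invented table and column names:)

                      SELECT c.company_id, COUNT(o.order_id) AS total_orders
                      FROM company c
                      JOIN purchasing_agent a ON a.company_id = c.company_id
                      JOIN purchase_order o  ON o.agent_id    = a.agent_id
                      WHERE c.company_id = 'C'
                      GROUP BY c.company_id;   -- yields 3 for company C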

                      Thus, this "grouping" of data into relations has nothing to do with
                      physical storage (which the relational model does not even address), and
                      has everything to do with schema and data integrity: making sure that
                      the data are addressable and logically coherent.

                      cjs
                      --
                      Curt Sampson <cjs@...> +81 90 7737 2974
                      Starling Software <office-admin@...>
                      Mobile sites and software consulting: http://www.starling-software.com
                    • Andrew Gregovich
                      Message 10 of 19 , Nov 20, 2007
                        Hi Clifford

                        I think you missed the point. The relational model is the simplest
                        logical model there is - hierarchical and network models have failed
                        miserably in the past due to their inability to evolve easily with
                        changes. The problem with most RDBMSs is that the separation of
                        logical model and physical storage is not always present, and most
                        importantly SQL itself is badly designed. If you look up Fabian
                        Pascal's articles, you'll see much discussion on this topic. His
                        comments can be quite harsh, but I tend to find myself in agreement
                        with them.

                        Regards

                        Andrew

                      • David Portas
                        Message 11 of 19 , Nov 20, 2007
                          On 20/11/2007, Clifford Heath <clifford.heath@...> wrote:
                          >
                          > Conceptual information models have the property of elementarity.
                          > This means that every fact type is as simple as it can be, and every
                          > fact says exactly one thing, never more. This is patently not true of
                          > the relational model, where the normalisation process exists to find
                          > facts that should be grouped together because of their shared
                          > functional dependence on the same key. The grouping is important
                          > when designing physical storage, but it obfuscates the conceptual model.
                          >

                          Further to Curt's remarks (which I agree with). The "problem" you are
                          describing seems to be the one addressed by 6th Normal Form - ie the
                          reduction of all relations to a single dependency involving no more
                          than one non-key attribute.

                          Normalisation has nothing to do with physical storage.
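
                          (For illustration, with invented names - one relation carrying
                          several dependencies on the same key, and its reduction to 6NF:)

                          -- several non-key attributes, all dependent on emp_id
                          CREATE TABLE Employee (
                              emp_id     INTEGER PRIMARY KEY,
                              dept_code  CHAR(4),
                              salary     NUMERIC(10,2),
                              hire_date  DATE
                          );

                          -- 6NF: at most one non-key attribute per relation
                          CREATE TABLE EmployeeDept (
                              emp_id INTEGER PRIMARY KEY, dept_code CHAR(4) NOT NULL);
                          CREATE TABLE EmployeeSalary (
                              emp_id INTEGER PRIMARY KEY, salary NUMERIC(10,2) NOT NULL);
                          CREATE TABLE EmployeeHireDate (
                              emp_id INTEGER PRIMARY KEY, hire_date DATE NOT NULL);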

                          --
                          David
                        • Clifford Heath
                          Message 12 of 19 , Nov 20, 2007
                            > Further to Curt's remarks (which I agree with). The "problem" you are
                            > describing seems to be the one addressed by 6th Normal Form

                            That is *exactly* what I was talking about when I used the term
                            "elementary form". I made the assumption that folk here would
                            be more familiar with fact-based modeling. It is relational, but
                            further normalised than is practicable with current DBMS products.
                            We need an efficient DBMS that allows us to define a database
                            *and query it* in this form, but that internally makes efficiency
                            optimisations back to a lower normal form... there isn't one. And
                            that's what has to change.

                            > Normalisation has nothing to do with physical storage.

                            No, but the reason we don't use 6th normal form does. It'd make
                            operations on the database take forever due to excessive joining.
                            But conceptually, the joins are still there, even when we store them
                            using lower normal form, but because we aggregate attributes
                            that share the same key, we then access such facts *differently*.
                            And that means that when the criteria for aggregation change -
                            which happens much more often than a change in the *shape* of
                            the elementary fact - the joins we wrote are wrong.

                            Curt:
                            Everything you've written is correct, except that you misunderstood
                            fact-based modeling. I suggest you do some reading on the topic.
                            Fact-based models are just as adamant about identification as
                            the relational model is, and more capable of preserving coherence.
                            Relations lose much semantic information that's representable in a
                            fact-based model, which is amply presented in Terry Halpin's book,
                            "Information Modeling and Relational Databases". I hope you aren't
                            suggesting that *he* doesn't "get it"? The additional semantic content
                            allows us to declare, for example, hierarchies and symmetric
                            associations, which are awkward with relations.

                            > when given the appropriate query

                            The appropriate query can be quite hard to write when it involves joins,
                            whether you're writing it in SQL or D, but with elementary form it's
                            directly accessible to business users. Just look at the amazing
                            semantic modelling features available in the new Microsoft Reporting
                            Services and you'll see what I mean. It's a shame that this power
                            isn't built in to the DBMS as it should be, but it's a start, more is to
                            come.

                            SQL is a problem, but in regard to agility, the use of <6NF is also a
                            problem.

                            Clifford Heath.
                          • David Portas
                            Message 13 of 19 , Nov 20, 2007
                              On 20/11/2007, Clifford Heath <clifford.heath@...> wrote:
                              >
                              >
                              > > Normalisation has nothing to do with physical storage.
                              >
                              > No, but the reason we don't use 6th normal form does. It'd make
                              > operations on the database take forever due to excessive joining.
                              > But conceptually, the joins are still there, even when we store them
                              > using lower normal form, but because we aggregate attributes
                              > that share the same key, we then access such facts *differently*.
                              > And that means that when the criteria for aggregation change -
                              > which happens much more often than a change in the *shape* of
                              > the elementary fact - the joins we wrote are wrong.
                              >

                              If the problem is about performance then it can only be solved by
                              making better implementations of the model. It makes no sense to
                              reject the Relational Model on grounds of performance because
                              performance isn't a feature of the model at all. I fail to see why you
                              think this line of reasoning justifies a new query language.

                              --
                              David
                            • Clifford Heath
                              Message 14 of 19 , Nov 20, 2007
                                On 20/11/2007, at 9:45 PM, Curt Sampson wrote:
                                > Beyond that, your definition of "fact" seems rather limited

                                To follow up and address your misunderstanding of fact-based
                                modeling, I should say that because it has a parallel history with
                                relational modeling, it has its own jargon. I made the mistake of
                                assuming more understanding of that jargon than I should have.
                                It also has roots that are just as deep in mathematical logic as
                                relational modeling, and so there are equivalences for almost
                                everything. I'll give a brief summary of some of those here.

                                Firstly, domains in ER are called data types in FBM. These are
                                the basic lexical types; they need no further identification (no key).
                                Value Types (like GivenName) are object types associated with
                                exactly one data type.

                                Entity Types are also object types, but are non-lexical things like
                                Customer. All ETs *must* have at least one identifier (key), with
                                one being "preferred". The identifier is formed from its roles in
                                one or more fact types. So if we have the VTs GivenNames and
                                FamilyName and an entity type Person, and the fact types "Person
                                has FamilyName" and "Person has GivenNames", we might choose
                                the identifier (FamilyName, GivenNames) for Person.

                                Fact types link object types (both entity types and value types) in
                                a relationship that carries a reading. In the examples above, both
                                the readings use simply "has", but any linking words are allowed.
                                A fact type may be unary, binary, or higher-order (though some
                                styles of fact-based modeling only support binary, for example RDF).

                                Fact types may also act as objects (they may be "objectified").
                                So if we have the fact type "Person directs Company", the fact
                                type might be named "Directorship", which would be expressed in
                                CQL as "Directorship = Person directs Company". I can now use
                                Directorship in further fact types: "Directorship began on Date".

                                An example of a unary fact type is "Person is an adult". It maps to
                                a boolean attribute. An example of a higher order fact type is
                                "Quantity of Product is required in Region for Month", a quaternary.

                                Any fact type may be mapped to a 6NF relation by replacing each
                                entity type by its identifying role values. So the "PersonIsAdult"
                                table consists just of the foreign key to the Person table.
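
                                (For illustration, that mapping in DDL, with invented data types:)

                                CREATE TABLE Person (
                                    FamilyName  VARCHAR(48) NOT NULL,
                                    GivenNames  VARCHAR(48) NOT NULL,
                                    PRIMARY KEY (FamilyName, GivenNames)
                                );

                                -- the unary fact type "Person is an adult" as a 6NF relation:
                                -- nothing but the identifying key of Person
                                CREATE TABLE PersonIsAdult (
                                    FamilyName  VARCHAR(48) NOT NULL,
                                    GivenNames  VARCHAR(48) NOT NULL,
                                    PRIMARY KEY (FamilyName, GivenNames),
                                    FOREIGN KEY (FamilyName, GivenNames)
                                        REFERENCES Person (FamilyName, GivenNames)
                                );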

                                The mandatory requirement that every fact type is elementary is
                                identical to the 6NF requirement. However, the expression "Person
                                is an adult" is much easier to understand than the 6NF relation:
                                PersonIsAdult(FamilyName, GivenNames). The relation contains
                                semantic information only in its name, and even that is obfuscated
                                by needing to enumerate the key attributes. They're functionally
                                equivalent, but in "Person is an adult", I *know* that Person has a
                                key, I don't need to explain it. The entity name is a shorthand for
                                "The entity identified by the key (....)". Furthermore it's possible
                                to have other relations that have the same key, and the relation
                                name is the only thing to distinguish them.

                                Now, if I want to, for example, average the incomes of all adults
                                in my Person table, I must join it to the PersonIsAdult table:

                                select AVG(Income) from Person
                                where exists (select * from PersonIsAdult
                                              where Person.FamilyName = PersonIsAdult.FamilyName
                                                and Person.GivenNames = PersonIsAdult.GivenNames)

                                In relational algebra, the join between the PK and FK fields must
                                also be explicitly enumerated. In CQL, it's just the derived fact type:

                                average adult Income:
                                Person earns Income.avg (as average adult Income) and
                                Person is an adult;

                                No keys anywhere, yet the meaning is the same. If I have the
                                person's ZipCode, and want average adult income grouped by
                                zip code, I can simply say:

                                average Income by ZipCode:
                                Person earns Income.avg (as average Income) and
                                Person resides within ZipCode area and
                                Person is an adult;

                                Again, no keys, no joins (or rather, *every* fact invocation is a join),
                                no "distinct" or "group by" operators, and it's entirely clear and
                                readable, at least compared to the SQL. But not just SQL; any
                                relational language has the same problem because its joins need
                                explicit PK/FK declarations; the join path is in terms of attributes,
                                not the facts those attributes represent.
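
                                (For comparison, the SQL being alluded to - assuming, as in the
                                earlier query, that Income and ZipCode are columns of Person:)

                                SELECT p.ZipCode, AVG(p.Income) AS average_income
                                FROM Person p
                                WHERE EXISTS (SELECT *
                                              FROM PersonIsAdult a
                                              WHERE a.FamilyName = p.FamilyName
                                                AND a.GivenNames = p.GivenNames)
                                GROUP BY p.ZipCode;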

                                Now you'll argue that "is adult" would have been just a column, and
                                the SQL would be much simpler, and you're right - but then you don't
                                have fully normalised data any more, and you lose the clarity that
                                comes from the elementary form. You have a non-elementary
                                (compound) fact type given by the relation:

                                Person(FamilyName, GivenNames, Income, IsAdult, ZipCode)

                                or by the CQL compound fact declaration (though this isn't legal):

                                Person is known by FamilyName and GivenNames and has Income
                                and is adult and resides within ZipCode area;

                                and it's not clear that any person can have an income regardless
                                of being an adult and vice versa. The rules aren't clear any more.
                                We might have to introduce check constraints within the relation to
                                enforce such a rule, when in fact, the constraint is *external* to the
                                elementary fact types.

                                In addition, a change to the storage arrangement caused by
                                absorbing "IsAdult" into the Person relation forced a re-write of the
                                relational query. The CQL query however is *unchanged*. IDENTICAL.
                                It just maps to different SQL underneath, but the syntax, usage and
                                meaning is identical. That's why CQL queries are more stable in
                                evolution.
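
                                (For illustration, the rewrite in question, using the same
                                invented tables as above:)

                                -- before: "is adult" is a separate 6NF table, so a join is needed
                                SELECT AVG(p.Income) FROM Person p
                                WHERE EXISTS (SELECT * FROM PersonIsAdult a
                                              WHERE a.FamilyName = p.FamilyName
                                                AND a.GivenNames = p.GivenNames);

                                -- after absorbing the fact into Person as a column, the SQL changes
                                SELECT AVG(Income) FROM Person WHERE IsAdult = 'Y';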

                                This kind of unnecessary refactoring is needed in almost every
                                upgrade of every commercial application. Resistance to the extra work
                                required is one of the main causes of data quality issues in deployed
                                products - it's why their schemas turn to crap. These costs (either
                                refactoring or data quality costs) can be avoided.

                                For efficient storage, fact types must be compounded, but for definition
                                and querying, they should be elementary. The production of and
                                mapping to a compound model can be transparently automated. The
                                stability that comes from working with the elementary (6NF, if you will)
                                is the core value that fact-based modeling can add to implementations
                                of the relational model. It's not a step back, but a step forward.
                                I hope this message makes this clearer for you all.

                                Clifford Heath, Data Constellation.
                              • Clifford Heath
                                Message 15 of 19 , Nov 20, 2007
                                  On 21/11/2007, at 8:20 AM, David Portas wrote:
                                  > If the problem is about performance then it can only be solved by
                                  > making better implementations of the model. It makes no sense to
                                  > reject the Relational Model

                                  No-one's rejecting the relational model. Fact-based models are a
                                  *refinement* of the relational model that allows capture of more
                                  semantic information, and that extra information supports a better
                                  query language without losing anything from the relational model.

                                  Now read what I wrote again, because at the moment, you just
                                  seem to be defending your religion.

                                  Clifford Heath
                                • David Portas
                                  Message 16 of 19 , Nov 20, 2007
                                    I'm familiar with fact-based modeling. I'm just attempting to
                                    understand what problem you are trying to solve.

                                    On 20/11/2007, Clifford Heath <clifford.heath@...> wrote:
                                    >
                                    > any
                                    > relational language has the same problem because its joins need
                                    > explicit PK/FK declarations;

                                    That's incorrect. Joins in relational languages need no such thing.

                                    > ... the join path is in terms of attributes,
                                    > not the facts those attributes represent.

                                    That's correct.

                                    > average adult Income:
                                    > Person earns Income.avg (as average adult Income) and
                                    > Person is an adult;

                                    In relational terms this seems equivalent to something like:

                                    AVERAGE( (Person SEMIJOIN PersonIsAdult), income );

                                    OK, I invented an "AVERAGE" operator, which seems quite reasonable to
                                    me. No keys or SQL-style ON clause because natural join is implicit.

                                    Just to be clear about the question I meant to imply in my previous
                                    post: If 6NF solves the problem then why not use 6NF together with a
                                    relational query language (not SQL)? You indicated that 6NF implied
                                    some performance penalty, which is not so because normalisation is
                                    obviously a purely logical matter.

                                    The consequence of your model seems to be exactly the kind of
                                    navigational structure that RM avoids.

                                    --
                                    David
                                  • Clifford Heath
                                    Message 17 of 19 , Nov 21, 2007
                                      On 21/11/2007, at 6:00 PM, David Portas wrote:
                                      >> any
                                      >> relational language has the same problem because its joins need
                                      >> explicit PK/FK declarations;
                                      > That's incorrect. Joins in relational languages need no such thing.

                                      What I mean is that I need to explicitly declare the join as being
                                      across the attribute pairs that are being *treated as keys*. With
                                      multi-part keys that becomes too onerous and is difficult for the
                                      business domain user to understand. To some extent, matching
                                      based on column headings can help, but where two tables share
                                      two separate relationships, it's impossible (like where person is
                                      employee and person is customer, for example).
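
                                      (For illustration, the two-relationship case in SQL, with
                                      invented names; a join matched purely on column headings
                                      could not say which role is meant:)

                                      CREATE TABLE Person (
                                          person_id  INTEGER PRIMARY KEY,
                                          name       VARCHAR(80)
                                      );

                                      CREATE TABLE OrderHeader (
                                          order_id     INTEGER PRIMARY KEY,
                                          customer_id  INTEGER NOT NULL REFERENCES Person,
                                          employee_id  INTEGER NOT NULL REFERENCES Person
                                      );

                                      SELECT o.order_id, c.name AS customer, e.name AS employee
                                      FROM OrderHeader o
                                      JOIN Person c ON c.person_id = o.customer_id   -- as customer
                                      JOIN Person e ON e.person_id = o.employee_id;  -- as employee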

                                      >> average adult Income:
                                      >> Person earns Income.avg (as average adult Income) and
                                      >> Person is an adult;
                                      > In relational terms this seems equivalent to something like:
                                      > AVERAGE( (Person SEMIJOIN PersonIsAdult), income );

                                      Yes. One is an algebraic representation, one is based on
                                      relational calculus and logic programming. Neither has a
                                      commercially deployed implementation. I plan to change that.

                                      > Just to be clear about the question I meant to imply in my previous
                                      > post: If 6NF solves the problem then why not use 6NF together with a
                                      > relational query language (not SQL)? You indicated that 6NF implied
                                      > some performance penalty, which is not so because normalisation is
                                      > obviously a purely logical matter.

                                      The use of 6NF in any existing commercial DBMS implies a
                                      performance penalty. I fail to see how you think that doesn't
                                      mean what I said, for any practical purpose. 6NF is not currently
                                      commercially practicable, despite its definite advantages.

                                      > The consequence of your model seems to be exactly the kind of
                                      > navigational structure that RM avoids.

                                      On the contrary. The fact types document the *reasons* why the
                                      relational structure was created, and serve as an explanation to
                                      someone who wouldn't be able to use the algebraic form of the
                                      queries. In fact, it becomes almost trivial to construct an end-user
                                      query tool using drag and drop, which can be used by a domain
                                      expert non-programmer to construct the most complex queries.
                                      Such tools have been built - I have one here that Microsoft built
                                      but never released - it's amazing. I've put computer nuff-nuffs in
                                      front of it and pretended to be a businessman wanting answers,
                                      and without *any* training or explanation they can construct any
                                      query I can describe. I don't know why MS didn't sell it, except
                                      perhaps that it would have been too disruptive to too many DBAs'
                                      careers. Hey, that wouldn't be why you're opposing me would it? :-)

                                      The structure underlying a fact-based layer is still relational,
                                      probably in 3NF, maybe more or less depending on performance
                                      requirements (and maybe even dynamically changing based on
                                      observed data access patterns) yet the application and ad-hoc
                                      queries are always built in terms of 6NF, which makes them more
                                      stable against changes in the physical model.

                                      Isn't that what you want? How else would an efficient 6NF DBMS
                                      work?
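
                                      (For illustration, a low-tech approximation of that layering in
                                      plain SQL - elementary-shaped views over a 3NF base table,
                                      invented names:)

                                      -- 3NF storage, tuned as needed
                                      CREATE TABLE Person (
                                          FamilyName  VARCHAR(48) NOT NULL,
                                          GivenNames  VARCHAR(48) NOT NULL,
                                          Income      NUMERIC(10,2),
                                          IsAdult     CHAR(1),
                                          PRIMARY KEY (FamilyName, GivenNames)
                                      );

                                      -- elementary (6NF-shaped) views that applications query
                                      CREATE VIEW PersonEarnsIncome AS
                                          SELECT FamilyName, GivenNames, Income
                                          FROM Person WHERE Income IS NOT NULL;

                                      CREATE VIEW PersonIsAdult AS
                                          SELECT FamilyName, GivenNames
                                          FROM Person WHERE IsAdult = 'Y';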

                                      The fact that the elementary (6NF) form can be comprehensibly
                                      verbalised, and almost plain-language queries can have a formal
                                      interpretation that maps to relational expressions is a bonus, in my
                                      view. It brings the whole problem within the comprehension of the
                                      domain expert, which offers a way to minimise specification problems
                                      and build the right thing. Automation over the semantic model can
                                      also help build it right. It's the way of the future...

                                      I really wonder what you mean by claiming familiarity with fact-based
                                      modeling if you can't see (some of) this...

                                      Clifford Heath.
                                    • Gabriel Tanase
                                      Message 18 of 19 , Nov 22, 2007
                                        Hello Mr. Heath,

                                        I seem to remember that, for ORM fact-based models, Prof. Halpin devised an
                                        algorithm to transform a fact-based schema into a logically-minimal 3NF
                                        relational schema. I hope I'm not confused...
                                        Are you aware of anybody having attempted to find an algorithm that
                                        transforms a 6NF schema, perhaps considered together with a finite set of
                                        queries against that schema, into a 3NF or BCNF schema from which results
                                        for all equivalent queries could be obtained efficiently by today's SQL
                                        engines?

                                        Thanks,
                                        Gabriel Tanase


                                        On 11/22/07, Clifford Heath <clifford.heath@...> wrote:
                                        >
                                        >
                                        > The use of 6NF in any existing commercial DBMS implies a
                                        > performance penalty. [...] 6NF is not currently
                                        > commercially practicable, despite its definite advantages.
                                        >
                                        >
                                        > Clifford Heath.
                                        >
                                        >


                                      • Clifford Heath
                                        Message 19 of 19 , Nov 22, 2007
                                          On 22/11/2007, at 8:42 PM, Gabriel Tanase wrote:
                                          > I seem to remember that, for ORM fact-based models, Prof. Halpin
                                          > devised an
                                          > algorithm to transform a fact-based schema into a logically-minimal
                                          > 3NF
                                          > relational schema. I hope I'm not confused...

                                          No, that's absolutely right. It's pretty simple actually; I have
                                          most of my version of Rmap implemented. Finishing it isn't
                                          first priority though... I have several such projects on the
                                          way towards a larger goal.

                                          > Are you aware of anybody having attempted to find an algorithm that
                                          > transforms a 6NF schema,

                                          It's essentially the same problem. In fact 6NF is simpler,
                                          because you don't have to reduce subtypes or deal with
                                          ring constraints (like hierarchies, etc) - things that the
                                          relational model can *represent* but not *model*.

                                          I'm doing the reverse engineering from relational also,
                                          which I have working for Oracle, SQL Server and DB2
                                          already. MySQL and PostgreSQL are to follow.

                                          > perhaps considered together with a finite set of
                                          > queries against that schema, into a 3NF or BCNF schema from which
                                          > results
                                          > for all equivalent queries could be obtained efficiently by today's
                                          > SQL
                                          > engines ?

                                          Exactly what I'm doing, but from the elementary fact form,
                                          which as I said reduces to 6NF by subtype flattening, etc.

                                          Queries against the fact model are much easier to read and
                                          write - SQL against a 6NF database is a total disaster, just
                                          the road to hell. But as far as I know, there's never been a
                                          text-only form of ORM, let alone a query language for it. Terry
                                          keeps saying he's going to do it, but I'm too impatient and have
                                          recently sent him examples of my grammar, which is already
                                          parsing most of the DDL and queries - I'm starting to add the
                                          semantic layer under the language now.

                                          Clifford Heath, Data Constellation.
                                          Agile Information Management and Design.