Loading ...
Sorry, an error occurred while loading the content.

Re: [agileDatabases] Maximum length constraints for agility

Expand Messages
  • Dawn Wolthuis
    On 7/31/06, Jay wrote: ... That is the case for the normal forms related to functional dependencies, with which I agree data should be
    Message 1 of 36 , Aug 1, 2006
    • 0 Attachment
      On 7/31/06, Jay <arboj@...> wrote:
      Dawn wrote:
      > >We should consider whether we would want the restrictions of an input device to dictate database-level constraints. We might also consider that today's typical screen entry or data exchange has no such issues. While screen size is still a factor for designers, this technology requirement need not >whittle down our original business requirements. I feel comfortable relegating this reason for a maxlen to the past.
      >
      > Relational databases attempt to implement a very fundamental and worth-while (in my opinion) objective. Their objective is to protect the integrity of what is often the life-blood of an organization: Its data. The goal of normalization, whether it be 1NF,2NF,3NF or 4NF, is the same: Minimize the chances of "bad" or incorrect data from making it's way into your database.

      That is the case for the normal forms related to functional
      dependencies, with which I agree data should be modeled. The
      traditional 1NF (it has been redefined by many, including Date) is
      different from the others. The purpose of 1NF was to simply the
      underlying mathematics rather than including nested relations or
      lists, for example. Most relational theorists today do not think that
      is essential and SQL-99 and following permit relation-valued
      attributes if I recall correctly. I'm not aware of many people using
      such features, however, making me think they have not been implemented
      in a way that is at all enticing.

      >
      > An institutions data can and often does come from varied and disparate sources. These sources consist of internal data entry systems, external electronic interfaces, EDI or XML exchanges, just to name a few. As a result, many (if not most) databases contain data that originated from more than one application. In addition, these databases often find themselves integrated with and exchanging data with older "less constrained" data repositories. Ensuring this data meets some minimum standards of quality before "admitting" it into your RDBMS database is what database constraints are all about.

      I agree.

      > A field length constraint is just one type of such a constraint.

      One point I am making is that in many cases it is not a constraint
      that helps with data quality. If a company name is: The Full Company
      Name with Associates and you truncate it because the database has a
      max length constraint designed into it, you are tampering with
      accurate data, forcing data NOT to be accurate. We should not be
      giving the company name a maximum length unless there is a world
      standards body that ensure that company names across the world,
      perhaps even when translated into a particular language, never exceed
      n characters.

      > If you're willing or even eager, to eliminate the field length constraint, are you comfortable relying on each of these external data sources to determine what a makes valid postal code?

      No. I definitely want a layer through which all Create, Update, and
      Deletes happen that verifies that either the source is trusted or
      (re)validates the data. Some have indicated that unless this layer is
      tightly coupled with the DBMS that cannot happen, but I disagree. I'm
      not a builder, so I don't know if this analogy will make sense to
      anyone but me, but if you do not pour walls and foundation together
      with the same materials, is it possible that you will build a new
      house where the basement could flood? Yes, but there are building
      industry standard best practices to keep from doing that. The cost of
      tightly coupling the foundation and the walls is too great to build
      that way. The risk in building the house with a leak if best
      practices are not followed is not high enough to pour the walls and
      the basement together. Perhaps there are some exceptions to this.

      > For example, a US zip code is usually modeled and constrained to digits. It should be at least 5 digits long and may be as long as 9 depending on whether you're using two fields or one.

      As with many other "codes", the postal code is definitely an attribute
      where there really is a maxlen (again I would not want to do so in the
      DBMS from a theory perspective, even though it would only be practical
      to do so when using a SQL-DBMS today). So, yes, I would want any
      software product to contrain the 5 and the 4 digit parts to those
      numbers.

      > What would be the point in removing a length constraint of this nature? I suppose if zip codes ever grew to 20 digits you wouldn't need to change your data model.

      In the case of codes, there typically are and should be max length
      constraints. In the case of the US zip code, there should even be
      exact length constraints. As I initially talked about in the blog
      entry prior to this one and also alluded to here, I would want those
      length constraints to be handled outside of what is typically called
      the DBMS, or perhaps simply handled differently by the DBMS (this is a
      matter of definition of the DBMS). I think the persistence layer and
      the constraint layer should not be so tightly coupled as they are in
      SQL-DBMS products today. I have seen much more agile solutions when
      these are not as tightly coupled.

      > My experience has been that such changes are far more rare than feature driven database changes. Usually it's changes like "We want to support multiple mailing addresses in the next release of Appwidget".

      Yes, and the non-1NF discussion would be relevant in that case. Each
      of the topics I have selected from multivalues, to 2-valued-logic, to
      variable length data, to providing less coupling between constraint
      handling and persistence, is an area that I think contributes to many
      DBMS's being relatively inflexible compared to DBMS's that are
      sometimes called "embedded" such as Cache' or IBM U2.

      >Where should the logical maxlen be placed in a logical data model?

      If there is a logical maxlen, then it should be in the logical data
      model. In that case, I only contest the implementation, about who
      should care about it. If there is not a logical maxlen, then it
      should not be included in the logical data model.

      > > Given that all UI and web service validation routines need access to this >constraint information, it can be made available by being captured as other business rules might be, in the database itself.
      > >
      > > I suggest that the database proper (rather than the schema) is a good place for specifying all business constraints. There must then be enforcement of >the use of standard CRUD services that employ such constraint logic for all software maintaining the database.
      >
      > If I understand you correctly, you're suggesting that max length constraints be persisted in a user-defined business table rather than the schema itself. By doing this, you're re-implementing something that has already been done!

      Yes.

      > The databases underlying "catalog" or schema definition consists of tables that define the rules by which data is either rejected or accepted into the database. These rules are easily configured and entered into the database through DDL statements.

      Which does not make them particularly easy to roll out for maintenance
      by end-users, right? I'm not suggesting that all business rules be
      made maintainable by end-users, but am suggesting that all metadata be
      managed like other data.

      >
      > It's about separation of responsibility and I'm of the opinion that a database ought to protect itself from "bad data".

      And perhaps if instead of having construction companies that managed
      the construction of a house, we had people whose sole job was to
      ensure that basements never leaked, then we would have the foundations
      people pouring the concrete walls of the basement and the first floor
      at the same time. But when we look at the projects as a whole, there
      might just be a better way to do it, even if there are risks, as
      always, if the professionals do not follow best practices.

      My take is that software construction and maintenance teams definitely
      do need to protect against bad data, but that we should pull it out of
      the DBMS (or put it into the DBMS differently, again depending on
      definitions).

      > > I would venture that the application of the maxlen in places where the proposition should have no such constraint, in addition to the tight coupling between the logical and physical maxlen, has cost the industry and individual organizations a bundle. It is but one area related to constraint handling where we have been bleeding, but one about which I hear little mention—and no noise, no fix, I'm guessing.
      >
      > I think there's no noise because there's no problem.

      This is just one of many areas that make many of today's DBMS's less
      agile than they could be. While I am all for optimizing the
      flexibility of software "products" given the state of the DBMS today
      using the techniques found in Refactoring Databases, I would like to
      push further in making our software products as a whole more agile and
      I think that SQL-DBMS's are one area that is holding us back as an
      industry.

      > Length constraints are one way to ensure valid data.

      As I indicated, they are one way to force the entry of invalid data.

      > Just because a field might to turn out to require 20 bytes instead of 10 doesn't invalidate the entire concept of a length constraint. It simply means the constraint was implemented while lacking sufficient information. Increasing it from 10 to 20 bytes is a 5 second operation (up to 4000 bytes), at least in Oracle.

      If there is no conceptual reason to limit the length, then what do we
      gain by adding an artificial length constraint in the implementation?
      (I'll grant that you gain something if using SQL-DBMS tools).

      >
      > >Relational theory is predicate logic and set theory. Relational theory does not speak to maxlen, but every implementation of the relational model >leads developers to make extensive use of such.
      >
      > Because RDBM's are concrete implementations of theory. While it's true that perfect data quality can be achieved in theory, concrete implementations require constraints.

      That didn't follow for me. Even in "theory" you cannot keep me from
      typing my name as Dwan when it is entered for the first time into a
      DBMS. Additionally, I am not at all opposed to constraints. I use a
      DBMS where you specify constraints the way you write the rest of your
      code, not with SQL statements in the DBMS. I would not be researching
      and writing in this area except that it hit me at some point that from
      my experience it was a considerably bigger bang for the buck to use
      MultiValue databases than other DBMS tools on the whole. While I
      don't think that everything about MV is good and everything about
      SQL-DBMS's is bad, I really want to understand why it is so much
      cheaper to use the one over the other for the long haul. It seems to
      come down to the flexibility of the tool. So, I'm looking for more
      agile databases in the future, perhaps pulling some tips from the MV
      data model, for example. Thanks for your comments. --dawn

      > -Jra
      >
      > ________________________________

      >
      > From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com] On Behalf Of Dawn Wolthuis
      > Sent: Thursday, July 27, 2006 5:54 PM
      > To: agileDatabases@yahoogroups.com
      > Subject: [agileDatabases] Maximum length constraints for agility
      >
      >
      >
      >
      >
      >
      >
      > I just read through much of Database Refactoring and gave a nod to it
      > (amazon link and all) in my latest blog entry (blatant ad coming up)
      > entitled "To Whom Should Size Matter?" at
      > http://www.tincat-group.com/mewsings
      >
      > The book is great! It does an excellent job in assisting sites with
      > implementations using existing SQL-DBMS products. This is an
      > important contribution to an aspect of the industry that seems to have
      > become less agile over the past couple of decades.
      >
      > My goals include nudging the industry toward more agile DBMS tools,
      > which likely means moving beyond SQL, in my opinion. This is perhaps
      > a Don Quixote effort on my part, but there have definitely been steps
      > made in this direction by companies such as Microsoft, Oracle, and IBM
      > since the introduction of XML, for example.
      >
      > I'm particularly interested in whether those working on agility with
      > current DBMS's share or disagree with my perspective on the size
      > matter addressed in this blog entry. Thanks in advance to anyone who
      > can give it a read. Comments on the blog itself or in this forum are
      > welcome. Cheers! --dawn
      >
      > --
      > Dawn M. Wolthuis
      > Tincat Group, Inc.
      >
      > Take and give some delight today
      >
      >
      >
      >
      >



      --
      Dawn M. Wolthuis
      Tincat Group, Inc. tincat-group.com

      Take and give some delight today
    • Keith Ray
      ... The developers of recent versions of Java JVM and .Net CLR have worked very hard so that the double-checked-locking implementations of Singleton actually
      Message 36 of 36 , Aug 19, 2006
      • 0 Attachment
        > By this argument, since Java doesn't come with
        > out-of-the-box implementation of the Singleton
        > pattern, and everyone who wants a Singleton has to
        > code it themselves, and these implementations can (and
        > usually are) different and incompatible, then an
        > implementation of the Singleton design pattern should
        > be added to Java.
        >
        > This is absurd. Just because a feature is desired by
        > a large subset of the users does not automatically
        > mean that that feature should be added to the common framework.

        The developers of recent versions of Java JVM and .Net CLR have worked
        very hard so that the double-checked-locking implementations of
        Singleton actually work properly.

        So they did work to fulfill the desires of a large subset of users.
        Not "automatically", but as a reasonable approach to help people
        develop software.

        --

        C. Keith Ray
        <http://homepage.mac.com/keithray/blog/index.html>
        <http://homepage.mac.com/keithray/xpminifaq.html>
        <http://homepage.mac.com/keithray/resume2.html>
      Your message has been successfully submitted and would be delivered to recipients shortly.