Re: [agileDatabases] Maximum length constraints for agility
On 7/31/06, Jay <arboj@...> wrote:
> > We should consider whether we would want the restrictions of an input device to dictate database-level constraints. We might also consider that today's typical screen entry or data exchange has no such issues. While screen size is still a factor for designers, this technology requirement need not whittle down our original business requirements. I feel comfortable relegating this reason for a maxlen to the past.
> Relational databases attempt to implement a very fundamental and worthwhile (in my opinion) objective. Their objective is to protect the integrity of what is often the life-blood of an organization: its data. The goal of normalization, whether it be 1NF, 2NF, 3NF, or 4NF, is the same: minimize the chances of "bad" or incorrect data making its way into your database.
That is the case for the normal forms related to functional
dependencies, with which I agree data should be modeled. The
traditional 1NF (it has been redefined by many, including Date) is
different from the others. The purpose of 1NF was to simplify the
underlying mathematics rather than to include nested relations or
lists, for example. Most relational theorists today do not think that
is essential, and SQL:1999 and later permit relation-valued
attributes, if I recall correctly. I'm not aware of many people using
such features, however, which makes me think they have not been
implemented in a way that is at all enticing.
> An institution's data can and often does come from varied and disparate sources. These sources consist of internal data entry systems, external electronic interfaces, EDI or XML exchanges, just to name a few. As a result, many (if not most) databases contain data that originated from more than one application. In addition, these databases often find themselves integrated with and exchanging data with older "less constrained" data repositories. Ensuring this data meets some minimum standards of quality before "admitting" it into your RDBMS database is what database constraints are all about.
> A field length constraint is just one type of such a constraint.
One point I am making is that in many cases it is not a constraint
that helps with data quality. If a company name is "The Full Company
Name with Associates" and you truncate it because the database has a
max length constraint designed into it, you are tampering with
accurate data, forcing data NOT to be accurate. We should not be
giving the company name a maximum length unless there is a world
standards body that ensures that company names across the world,
perhaps even when translated into a particular language, never exceed
some specified length.
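The truncation problem can be illustrated with a toy sketch (the limit, constant, and function names here are hypothetical, invented for illustration):

```python
MAX_COMPANY_NAME = 25  # hypothetical schema-imposed limit

def store_company_name(name: str) -> str:
    """Simulates what a fixed-width column does to oversized input:
    silent truncation, turning accurate data into inaccurate data."""
    return name[:MAX_COMPANY_NAME]

original = "The Full Company Name with Associates"
stored = store_company_name(original)
# The stored value no longer matches the company's actual name.
```

The constraint did not keep bad data out; it manufactured bad data from good input.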
> If you're willing, or even eager, to eliminate the field length constraint, are you comfortable relying on each of these external data sources to determine what makes a valid postal code?
No. I definitely want a layer through which all Creates, Updates, and
Deletes happen that verifies either that the source is trusted or
(re)validates the data. Some have indicated that unless this layer is
tightly coupled with the DBMS that cannot happen, but I disagree. I'm
not a builder, so I don't know if this analogy will make sense to
anyone but me: if you do not pour the walls and the foundation
together with the same materials, is it possible that you will build a
new house where the basement could flood? Yes, but there are
building-industry best practices to keep that from happening. The
cost of tightly coupling the foundation and the walls is too great to
build that way, and the risk of a leaky house when best practices are
not followed is not high enough to justify pouring the walls and the
basement together. Perhaps there are some exceptions to this.
> For example, a US zip code is usually modeled and constrained to digits. It should be at least 5 digits long and may be as long as 9 depending on whether you're using two fields or one.
As with many other "codes", the postal code is definitely an attribute
where there really is a maxlen (again, I would not want to enforce it
in the DBMS from a theory perspective, even though that would be the
only practical place to do so when using a SQL-DBMS today). So, yes,
I would want any software product to constrain the 5- and 4-digit
parts to those lengths.
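A sketch of how such a check might live in application code rather than in the schema (the regex and function name are illustrative, not from any particular product; this version assumes the one-field, hyphenated ZIP+4 form):

```python
import re

# Exactly 5 digits, optionally followed by the 4-digit ZIP+4 add-on.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")

def is_valid_us_zip(code: str) -> bool:
    """Validate a US zip code outside the DBMS, in ordinary code."""
    return ZIP_RE.fullmatch(code) is not None
```

Because the rule is ordinary code, it can be exercised by unit tests and reused by every entry point (screens, EDI, XML feeds) before data reaches persistence.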
> What would be the point in removing a length constraint of this nature? I suppose if zip codes ever grew to 20 digits you wouldn't need to change your data model.
In the case of codes, there typically are and should be max length
constraints. In the case of the US zip code, there should even be
exact length constraints. As I initially talked about in the blog
entry prior to this one and also alluded to here, I would want those
length constraints to be handled outside of what is typically called
the DBMS, or perhaps simply handled differently by the DBMS (this is a
matter of definition of the DBMS). I think the persistence layer and
the constraint layer should not be so tightly coupled as they are in
SQL-DBMS products today. I have seen much more agile solutions when
these are not as tightly coupled.
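One way to picture that decoupling, as a minimal sketch (the names ConstraintLayer, Store, and zip_rule are invented for illustration, not taken from any product):

```python
import re

def zip_rule(record):
    """A constraint expressed as ordinary code rather than DDL."""
    return bool(re.fullmatch(r"\d{5}(-\d{4})?", record.get("zip", "")))

class ConstraintLayer:
    """Holds validation rules, decoupled from the storage layer."""
    def __init__(self):
        self.rules = []

    def add_rule(self, rule):
        self.rules.append(rule)

    def validate(self, record):
        return all(rule(record) for rule in self.rules)

class Store:
    """Persistence layer with no knowledge of business constraints."""
    def __init__(self, constraints):
        self.constraints = constraints
        self.rows = []

    def create(self, record):
        # All Creates flow through the constraint layer first.
        if not self.constraints.validate(record):
            raise ValueError("constraint violation")
        self.rows.append(record)
```

The point of the sketch is only that the two responsibilities can be separated: rules change without touching storage, and storage changes without touching rules.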
> My experience has been that such changes are far more rare than feature driven database changes. Usually it's changes like "We want to support multiple mailing addresses in the next release of Appwidget".
Yes, and the non-1NF discussion would be relevant in that case. Each
of the topics I have selected from multivalues, to 2-valued-logic, to
variable length data, to providing less coupling between constraint
handling and persistence, is an area that I think contributes to many
DBMS's being relatively inflexible compared to DBMS's that are
sometimes called "embedded" such as Caché or IBM U2.
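The "multiple mailing addresses" change is exactly where a multivalue model differs: the attribute becomes a list instead of forcing a new table. A toy sketch in Python (not Caché or U2 syntax, just an illustration):

```python
# Single-valued (1NF) model: supporting a second address means a
# schema change (a new address table plus a foreign key).
customer_1nf = {"name": "Acme", "address": "1 Main St"}

# Multivalue model: the attribute simply holds several values.
customer_mv = {"name": "Acme", "addresses": ["1 Main St", "PO Box 7"]}

# Adding another mailing address is a data change, not a schema change.
customer_mv["addresses"].append("2 Side Ave")
```

This is the flexibility claim in miniature: the feature request that drives a refactoring in one model is plain data entry in the other.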
> Where should the logical maxlen be placed in a logical data model?
If there is a logical maxlen, then it should be in the logical data
model. In that case, I only contest the implementation, about who
should care about it. If there is not a logical maxlen, then it
should not be included in the logical data model.
> > Given that all UI and web service validation routines need access to this constraint information, it can be made available by being captured as other business rules might be, in the database itself.
> > I suggest that the database proper (rather than the schema) is a good place for specifying all business constraints. There must then be enforcement of the use of standard CRUD services that employ such constraint logic for all software maintaining the database.
> If I understand you correctly, you're suggesting that max length constraints be persisted in a user-defined business table rather than the schema itself. By doing this, you're re-implementing something that has already been done!
> The database's underlying "catalog" or schema definition consists of tables that define the rules by which data is either rejected or accepted into the database. These rules are easily configured and entered into the database through DDL statements.
Which does not make them particularly easy to roll out for maintenance
by end-users, right? I'm not suggesting that all business rules be
made maintainable by end-users, but am suggesting that all metadata be
managed like other data.
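Treating constraint metadata as ordinary data might look like this sketch (the table layout and rule names are invented for illustration):

```python
# Constraint metadata stored as rows in an ordinary table, so it can
# be queried, versioned, and (where appropriate) maintained like data.
constraint_table = [
    {"field": "zip",  "rule": "max_len",  "value": 10},
    {"field": "name", "rule": "required", "value": True},
]

def check(record, constraints=constraint_table):
    """Apply the metadata rows as validation rules to one record."""
    for c in constraints:
        v = record.get(c["field"])
        if c["rule"] == "required" and c["value"] and not v:
            return False
        if c["rule"] == "max_len" and v is not None and len(v) > c["value"]:
            return False
    return True
```

Changing a rule is then an ordinary data update rather than a DDL rollout, which is the agility being argued for here.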
> It's about separation of responsibility and I'm of the opinion that a database ought to protect itself from "bad data".
And perhaps if instead of having construction companies that managed
the construction of a house, we had people whose sole job was to
ensure that basements never leaked, then we would have the foundations
people pouring the concrete walls of the basement and the first floor
at the same time. But when we look at the projects as a whole, there
might just be a better way to do it, even if there are risks, as
always, if the professionals do not follow best practices.
My take is that software construction and maintenance teams definitely
do need to protect against bad data, but that we should pull that
protection out of the DBMS (or put it into the DBMS differently, again
depending on how we define the DBMS).
> > I would venture that the application of the maxlen in places where the proposition should have no such constraint, in addition to the tight coupling between the logical and physical maxlen, has cost the industry and individual organizations a bundle. It is but one area related to constraint handling where we have been bleeding, but one about which I hear little mention—and no noise, no fix, I'm guessing.
> I think there's no noise because there's no problem.
This is just one of many areas that make many of today's DBMS's less
agile than they could be. While I am all for optimizing the
flexibility of software "products" given the state of the DBMS today
using the techniques found in Refactoring Databases, I would like to
push further in making our software products as a whole more agile,
and I think that SQL-DBMS's are one area that is holding us back as an
industry.
> Length constraints are one way to ensure valid data.
As I indicated, they are one way to force the entry of invalid data.
> Just because a field might turn out to require 20 bytes instead of 10 doesn't invalidate the entire concept of a length constraint. It simply means the constraint was implemented while lacking sufficient information. Increasing it from 10 to 20 bytes is a 5-second operation (up to 4000 bytes), at least in Oracle.
If there is no conceptual reason to limit the length, then what do we
gain by adding an artificial length constraint in the implementation?
(I'll grant that you gain something if using SQL-DBMS tools).
> > Relational theory is predicate logic and set theory. Relational theory does not speak to maxlen, but every implementation of the relational model leads developers to make extensive use of such.
> Because RDBMS's are concrete implementations of theory. While it's true that perfect data quality can be achieved in theory, concrete implementations require constraints.
That didn't follow for me. Even in "theory" you cannot keep me from
typing my name as Dwan when it is entered for the first time into a
DBMS. Additionally, I am not at all opposed to constraints. I use a
DBMS where you specify constraints the way you write the rest of your
code, not with SQL statements in the DBMS. I would not be researching
and writing in this area except that it hit me at some point that from
my experience it was a considerably bigger bang for the buck to use
MultiValue databases than other DBMS tools on the whole. While I
don't think that everything about MV is good and everything about
SQL-DBMS's is bad, I really want to understand why it is so much
cheaper to use the one over the other for the long haul. It seems to
come down to the flexibility of the tool. So, I'm looking for more
agile databases in the future, perhaps pulling some tips from the MV
data model, for example. Thanks for your comments. --dawn
> From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com] On Behalf Of Dawn Wolthuis
> Sent: Thursday, July 27, 2006 5:54 PM
> To: agileDatabases@yahoogroups.com
> Subject: [agileDatabases] Maximum length constraints for agility
> I just read through much of Refactoring Databases and gave a nod to it
> (amazon link and all) in my latest blog entry (blatant ad coming up)
> entitled "To Whom Should Size Matter?" at
> The book is great! It does an excellent job in assisting sites with
> implementations using existing SQL-DBMS products. This is an
> important contribution to an aspect of the industry that seems to have
> become less agile over the past couple of decades.
> My goals include nudging the industry toward more agile DBMS tools,
> which likely means moving beyond SQL, in my opinion. This is perhaps
> a Don Quixote effort on my part, but there have definitely been steps
> made in this direction by companies such as Microsoft, Oracle, and IBM
> since the introduction of XML, for example.
> I'm particularly interested in whether those working on agility with
> current DBMS's share or disagree with my perspective on the size
> matter addressed in this blog entry. Thanks in advance to anyone who
> can give it a read. Comments on the blog itself or in this forum are
> welcome. Cheers! --dawn
> Dawn M. Wolthuis
> Tincat Group, Inc.
> Take and give some delight today
Dawn M. Wolthuis
Tincat Group, Inc. tincat-group.com
Take and give some delight today
> By this argument, since Java doesn't come with an
> out-of-the-box implementation of the Singleton
> pattern, and everyone who wants a Singleton has to
> code it themselves, and these implementations can be (and
> usually are) different and incompatible, then an
> implementation of the Singleton design pattern should
> be added to Java.
> This is absurd. Just because a feature is desired by
> a large subset of the users does not automatically
> mean that that feature should be added to the common framework.
The developers of recent versions of the Java JVM and .NET CLR have
worked very hard so that the double-checked-locking implementations of
Singleton actually work properly.
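For reference, the double-checked locking pattern under discussion looks like the following sketch (here in Python with an explicit lock; the memory-model subtleties the JVM work addressed are Java-specific and do not translate directly):

```python
import threading

class Singleton:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls):
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock: another thread may have
                # created the instance while we were waiting.
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```

In Java the `_instance` field must be declared `volatile` for this to be safe; that visibility guarantee is the part the JVM memory-model work made reliable.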
So they did work to fulfill the desires of a large subset of users.
Not "automatically", but as a reasonable approach to helping people.
C. Keith Ray