On 7/27/07, Curt Sampson <yahoo@...
> On Mon, 23 Jul 2007, Dawn Wolthuis wrote:
> > On 7/23/07, Curt Sampson <yahoo@...> wrote:
> >> Just two more data types. Nested relations in fact can be first-normal
> >> form as well, if you accept that a relation is a data type like any
> >> other.
> > Yes, this is a redefining of 1NF that renders "1NF" meaningless (or
> > almost meaningless).
> Not at all. Here is Date's definition of 1NF (from Introduction to Database
> Systems, 7th ed., p. 357)
> First normal form: A relvar is in 1NF if and only if, in every legal
> value of that relvar, every tuple contains exactly one value for
> each attribute.
> So long as the attribute contains only one, never more than one, relation,
> it's in 1NF.
Can you give an example of a (database) relation that is not in 1NF?
> Don't be confused by it being a composite data type. In some sense,
> text strings could be considered composite data types as well,
Yes, agreed. I have no problem with the domain of an attribute
including relations or even lists. My issue is that First Normal Form
(1NF) had a meaning such that "Non-first Normal Form" databases were
NOT in 1NF (rather by definition, one would think). Now those
databases that would be classified as NF2, including those that
encourage lists or relations as attribute values (by including
operations and queries on such, for example) are NOT in Non-First
Normal Form by the newer defs of 1NF. When "Non-First Normal Form"
does not mean that a relation is not in first normal form, you can see
how communication might be a bit confusing.
This has also made it difficult to get the word out that 1NF as we
knew is, is dead. Relational theorists simply redefined it so they did
not have to say that it no longer applies. Unfortunately, of all the
normal forms, the form formerly known as 1NF is the only one that DBMS
tools, such as SQL-92 and products that employ such, enforce within
the DBMS -- it is the only one you cannot violate in many environments
and would be well-advised not to violate in those that use SQL as THE
sole langauge of the DBMS.
> it's essentially a list of zero more characters, yet when we update an
> attribute from "a" to "abc", we wouldn't consider that attribute to have
> gone from one to three values.
> > Most RDBMS tools are not set up to work well with nested relations.
> I am not entirely sure what you mean by the term "nested relations," but
> it makes me quite uncomfortable.
A relation with an attribute whose values are relations.
> After all, we don't refer to a string
> or an integer as being "nested" inside a relation; why should we refer
> to any other data type as being "nested"?
It not only seems reasonable terminology to say that a relation within
a relation is "nested" it is also the terminology used by some NF2
products. In IBM UniData, for example, one might issue an SQL
statement with an UNNEST in it if you wish to have each of a number of
multivalues in a separeate row in the result set. It is OK if you
prefer GROUP and UNGROUP to NEST and UNNEST (although that could also
> > Lists are an even bigger issue for most RDBMS tools.
> Sure, but they're mostly misused, in my experience, anyway.
Sometimes they are, but most often the issue is that NF2 developers
might decide not to refactor when they should, just as happens with
> Have a look
> at the PostgreSQL catalogue for some classic examples. There are plenty
> of queries I'd like to be able to do on those that I can't do because
> they put stuff in arrays instead of normalizing the data.
That could be the query tool, however. SQL was born as a query tool
that only had to handle a single "simple" value for an attribute.
> >>> The industry would be well-served if we would switch from
> >>> three-valued logic back to two-valued logic too.
> >> Actually, something along the lines of Haskell's "Maybe" is really what
> >> we need.
> > There are pros and cons to each approach, and I'll confess that I hae
> > not done much with Haskell to see the advantages.
> Haskell's type system is well worth study.
I'll put it on the list, but I'm out of "heads-down student mode" for
a while and into practitioner mode.
> > I will add that those DBMS's that serve up all data as strings, where
> > you can handily cast to numbers, dates, names, or any other type (or
> > even multiple types depending on the situation -- this Person is-an
> > Employee so cast this field to a number, else leave as a string),
> > have provided the biggest bang for the buck in productivity in my
> > experience.
> Ouch. That seems destined to lead to run-time type errors.
There are trade-offs with any such design decisions. You might notice
the internet, but you can often still get your job done. Add a bit
more QA for some sites to elimiate more such errors.
> > So, I tend to think there is something to having the type checking
> > somewhere prior to the persistence logic in our applications, letting
> > persistence logic "simply" persist data or objects.
> If all you need is persistence, why bother with an RDBMS?
Agreed, but there are some reasons that a big honkin' RDBMS that might
be both over- (constraint-handling) and under- (lack of support for
lists) featured might be incorporated into a software project during
even the very first sprint/iteration of software construction.
Let's say that a software developer or development team is starting
from scratch and has the full range of options available to them.
When they design something that they think must stick around even when
their software is not running, they either look at file system options
or at DBMS solutions. They often choose an RDBMS so that they and
others can execute queries against it using SQL and/or query tools
that employ SQL.
Persistence as a requirement implies the ability to retrieve persisted
information, while the use of standard tools for such retrieval likely
needs to be stated as a separate requirement. But I would contend
that a requirement of "CRUD services" (aka "persistence") plus the
requirement of being able to access data using industry-standard
(today SQL-based) tools are the two primary requirements under
consideration when a team decides to employ a DBMS, even though a
(R)DBMS has considerably more features than these two (security,
constraint-handling, RI). Deciding to use these additional features up
front might be a good example of how NOT to do agile software
Also, rather than looking at the conceptual design and determining
whether an NF2 or 1NF DBMS might be the best option, developers tend
to go with a standard, often even portraying a conceptual or at least
logical data model in the form formerly known as 1NF, and then design
to SQL, for example.
> Keep in mind,
> the primary advantage of a good RDBMS is that it serves as an inference
> engine; you present it with a set of propositions and queries, and it
> infers the answers for you.
Yes, and which software development teams have a need for an inference
engine in the first few iterations of their software development
efforts? An RDBMS might just be one of the biggest detractors from
doing agile software development as it injects a big solution even
when the requirements are much smaller (such as the two I listed that
typically prompt use of an RDBMS).
> > select partyId, Person->name, ssn from Employee;
> That seems to me syntatic sugar for a join.
Fine, let's call it "chocolate" and then I'm in--sweet is good.
> In fact, I find something like
> this much more clear:
> Employee JOIN Person PROJECT partyId, name, ssn
That is fine for those who think exclusively in sets. There is not a
good reason, in my opinion, to reduce the developers choices in this
regard. I understand that the coupling of constraint-handling with
persistence is what drives this, so that first order logic can be
employed. I am not willing to give up what we once had and can still
have in software development agility when using NF2 tools. Again,
there are trade-offs. Not every project is the same, of course, but
for a typical complex database application, requiring good
performance, ease of maintenance, data accuracy, etc, I have seen
better bang for the buck and also happier end-users (again, for the
buck, as one can accomplish pretty much the same with each toolset)
with tools like IBM UniData than with the average SQL-DBMS.
> > ...and then add in there that you might be looking for the current
> > status, that might have been implemented with history as a list (or
> > stack) where the top entry is the current status, so you might want
> > Person->status(0) or Person->status.
> Why not just model status as a relation. Then it works like anything
> else in the database, refactoring is easier (e.g., you can trivially add
> other dependent attributes quite easily)
I disagree that refactoring is easier when developing in tools that
traditionally have required 1NF than with NF2 databases. This is
where there can be truly significant developer productivity gains from
a tool that permits lists (e.g.). I have detailed some of the gains
in the articles that I wrote up as "mewsings" last year, but will give
one example here.
Let's say that you start with 'status' as a single-valued attribute of
Product and later the requirements are to keep a history (without
dates to start with). Let's say that the precise requirements prompt
you to think of 'status' now as a logical stack with the most recent
status on top.
Some NF2 DBMS tools permit everything that has worked, screens and
reports, to keep working after you change the definition of 'status'
to be multi-valued. The reports might even run successfuly, now
showing multiple statuses instead of just one. The user interface is
then enhanced by changing the UI widget to a window, or widget that
collects and shows the multivalues in a prettier way, but the product
was able to be enhanced to permit the new functionality with a change
of a description from S (singlevalued) to M (multivalued).
There are many such examples where a change to requirements when using
an NF2 database yields much higher developer productivity than with a
traditional SQL-DBMS, which is why I think of NF2 tools as "agile
databases". I came to this hypothesis as a manager based on what
seemed to be significant differences in the bottom line budgets for
development in an NF2 DBMS compared to an SQL-DBMS. I later started
investigating why this might be. NF2 and 2-valued-logic are two of
the reasons, even though not the entirety of the reasons, I now think.
> and you can do queries such as,
> "give me the information based on status as it was last July 15th."
> > Compare that to the corresponding SQL statement....
> No, don't. My whole point is that, given better syntax, many of these
> difficulties that send you to non-relational solutions go away.
It will be interesting to see if the industry continues heading in the
NF2 direction, modeling more data with lists and relations within
relations. XML is only one reason to head toward NF2 and 2VL. Agile
database application software development is another. Given the
enhancements that are being made to tools from IBM (DB2 Viper),
Microsoft (such as linq), Oracle (with purchase of Berkeley DB), and
elsewhere, I suspect the s/w development industry will rediscover this
soon, even if only by adding new tricks rather than with a revolution
as we saw with the introduction of relational theory and subsequent
(unfortunate IMO) introduction of 1NF. --dawn