--- Bill Lewis <datamodel@...
> SA: The primary reason for struggling to effectively
> address data
> quality problems is because data management groups
> have based their
> processes on assumptions which prove to be
> questionable at best and
> downright false at worst.
> BL: "Data management groups" do not have
> responsibility for
> originating or managing data content; business
> users do.
I didn't realize that data management groups weren't
responsible for managing data. ;-)
> professionals, in fact, usually encounter
> significant resistance when
> recommending the implementation of data integrity in
> the database, "for performance reasons", especially.
As I've written about extensively in Agile Database
Techniques and online, there are many options for
implementing data integrity. Very often the database
proves to be the best option for doing it, but not
> SA: When data professionals first hear about
> refactoring they often
> profess that it's a great idea for small databases,
> which it is, but
> that it isn't realistic for "large databases" due to
> the sheer volume.
> BL: My experience has been that data professionals,
> to a fault, bend
> over backwards to accommodate data model change
> requests from developers,
I hear that claim a lot from data professionals, but
in practice I find that it's not the case. In an agile
data approach the agile DBA is "embedded" in the team
as an active member. So when someone needs to change
the database schema they work with the agile DBA, or
someone else with those skills, and make the change
right then and there. The change takes on the order of
minutes. Is that the level of service that you're
talking about? If so, great; if not, then perhaps
there's more that you can
In the data surveys that we did through DDJ last year
(see www.ambysoft.com/surveys/) we found that 2/3 of
respondents indicated that development teams will
choose to go around the data group within their org.
Of those that do, 75% indicated that the reason was
that the data group was either too slow, too
difficult to work with, or provided too little value
to the development team. Apparently development teams
are concerned with the level of actual service that
they're getting from data groups.
> yet encounter resistance to their own
> recommendations due to the "unacceptable" impact
> these changes would
> have on software that was developed based on the
> current model.
Yes, many non-agile teams struggle to rework their
code, and frankly that's typical in my experience too.
I don't know of any agile teams with this problem.
> no, you can't change that table/column
> name/datatype/length, we'd
> have to change <enter a number> of <enter a type of
> program code>,
> and then test it all again".
Agilists test their code constantly. Retesting is an
absolutely trivial thing for us to do.
Non-agile teams, on the other hand,....
> BL: I agreed about problems created by doing
> "detailed" modeling
> immediately (Big Data Up Front). One thing that data
> supports quite
> well is abstraction, and models should start as
> (or "conceptual") early on. In my experience, the
> "people" who
> become most committed to a detailed design are
> usually the developers
> who write code that's immutably dependent on a
> "frozen" design.
Once again, definitely a problem on traditional teams.
Not a problem within the agile community. The
original article was written from the point of view of
agile development, not traditional development. I
guess that wasn't clear in the article.
> SA: We see extra columns, tables, and views which
> actually detract
> from the quality of the design, and existing columns
> and tables being
> used for purposes other than originally intended.
> BL: Who makes the decisions to use columns and
> tables differently
> from how they were designed?
If the data management folks are actually managing
the databases, then they're responsible for the state
of those databases. If someone else is doing this,
then the data management folks have lost control and
are clearly ineffectual.
Either way the DM folks don't seem to be up to speed.
> Who determines how they
> are "used"?
> Software and business people, of course; they're the
> users of the database.
Someone needs to have a viable and coherent strategy
for evolving the database over time. This is why you
need to be good at refactoring, testing, redeployment,
... If the databases are messed up then clearly such
a strategy isn't in place.
> BL: Absolutely, wholeheartedly agreed. This is
> indeed an area that
> deserves significant focus. Entities and
> relationships (e.g., primary
> and foreign keys) need to be identified and agreed
> to early on; in
> fact, before a line of code is written/generated.
> This foundation of
> functional dependency and order of precedence is
> what the team needs
> to get right early on, because that's what's really
> difficult to
> change later.
Actually, it's fairly easy to change later. See
Process of DB Refactoring
It's an assumption that it's difficult to evolve
> Details of individual attributes can
> be added, moved
> and/or changed as the application evolves.
Tables can be split, renamed, ... very easily too.
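
To make the "easy to change later" claim concrete, here is a minimal sketch of one way a column rename can be staged so that existing code keeps working during a transition period. It is an illustration under stated assumptions, not the procedure from the refactoring write-up referenced above: the table `customer`, the columns `fname` and `first_name`, and the helper `rename_column_with_transition` are all hypothetical, SQLite is used only because it ships with Python, and the triggers sync in one direction only (old column to new).

```python
import sqlite3


def rename_column_with_transition(conn: sqlite3.Connection) -> None:
    """Introduce customer.first_name alongside the old customer.fname.

    Hypothetical schema. Old code keeps writing fname; triggers copy
    each write into first_name so new code can read the new column.
    The old column and the triggers would be dropped once every caller
    has been migrated.
    """
    conn.executescript(
        """
        -- Step 1: add the new column next to the old one.
        ALTER TABLE customer ADD COLUMN first_name TEXT;

        -- Step 2: backfill existing rows.
        UPDATE customer SET first_name = fname;

        -- Step 3: keep the new column in sync with writes to the old one.
        CREATE TRIGGER customer_fname_ins AFTER INSERT ON customer
        WHEN NEW.first_name IS NULL AND NEW.fname IS NOT NULL
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END;

        CREATE TRIGGER customer_fname_upd AFTER UPDATE OF fname ON customer
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END;
        """
    )
```

The point of the transition period is that no application has to change on the same day as the schema; each one moves to the new column on its own schedule, backed by its regression tests.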
> BL: Once again, it's completely unfair to attribute
> the overall state
> of data quality to data professionals,
Then what is it that Data Management people should be
held accountable for? Seems to me from the title that
if they want to claim to be data managers that they
should be, well... , managing the data.
I think that the fundamental difference that we see
between agilists and traditionalists is that the
agilists have stepped up and accepted responsibility
for quality. We've adopted quality techniques such as
refactoring, TDD, pairing, ... because we've
discovered that they work incredibly well for us in
practice. Now we're inviting the data community to
step up and do the same thing. Perhaps data quality in
production databases is in the state that it's in
because our expectations of the data community have
been so low for too long. It's time to raise the bar.
> or to a
> approach". A data-driven approach IS preferable
> because data indeed
> does pervade any significant business system.
Security issues also pervade any significant business
system. Shouldn't we take a security-driven approach
by that logic?
Usability issues also pervade any significant business
system. Should we take a usability-driven approach by
And so on?
> Software exists to
> maintain and expose data. Everything else (security,
> functionality) is dependent on the data. No data?
> Then no need for
> security, nothing to be used (screens with blank
> fields?), no need
> for any functions that use the data.
No security? Pretty soon the data isn't trustable.
No UI? Can't get to the data.
No usability? Doesn't really matter if the data is
there because it's not consumable.
No network? Good luck connecting to the data sources.
Data is only one of many issues.
> BL: Surely you've heard of data profiling. Lots has
> been written
> about it; several companies specializing in it have
> been started,
> merged, sold, etc.; quite a market. But again, the
> data content is
> not originated by data professionals...although they
> should be
> actively involved in, if not responsible for,
> generating and managing
> test data.
Yes, but that's not really testing. That's more along
the lines of reviewing/inspecting the database
content. As I indicated in a previous post, my next
newsletter will be on database testing because it's
clearly a foreign concept to many data professionals.
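
As a rough illustration of the difference, profiling inspects the content that is already there, whereas a database test asserts how the database behaves and can be rerun after every change. The sketch below is only one possible shape for such a test: the `customer` and `orders` tables, the `make_db` helper, and the specific rules being checked are assumptions made up for this example, and SQLite with Python's `unittest` stands in for whatever database and test framework a team actually uses.

```python
import sqlite3
import unittest


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
        );
        """
    )
    return conn


class OrderIntegrityTest(unittest.TestCase):
    def setUp(self):
        self.conn = make_db()
        self.conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

    def tearDown(self):
        self.conn.close()

    def test_valid_order_is_accepted(self):
        self.conn.execute(
            "INSERT INTO orders (customer_id, total_cents) VALUES (1, 500)"
        )
        count = self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        self.assertEqual(count, 1)

    def test_order_for_unknown_customer_is_rejected(self):
        # Referential integrity: there is no customer with id 99.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO orders (customer_id, total_cents) VALUES (99, 500)"
            )

    def test_negative_total_is_rejected(self):
        # Domain rule enforced by the CHECK constraint.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO orders (customer_id, total_cents) VALUES (1, -1)"
            )
```

Saved as a module, tests like these run with `python -m unittest` and can sit in the same regression suite as the application tests, which is what makes schema changes cheap to verify.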
> BL: Unfortunately I would agree. It's also
> unfortunate that most of
> the evangelizing about data governance has had to
> originate within
> the data management community rather than the
> business community,
> where it really belongs. Case in point: DBAs don't
> govern the general
IT Governance in general needs to come from the
business community. Sadly, that's going to be a
> SA: Lack-lustre performance of the traditional data
> community during
> the past three decades...As the agile community has
> clearly shown
> over the past few years these assumptions don't seem
> to hold water in
> BL: Broad generalizations such as these are futile to
> try to disprove, or
> to prove, for that matter.
Actually, we've done a few surveys via DDJ and it's
reasonably clear that the data management community is
struggling in practice. If you believe TDWI's
assertion that data quality problems are a $600 billion
a year issue for US organizations, that might be
another sign that the data community has some room for
> I'd strongly suggest that
> and "Agile" approaches to application engineering
> are by no means
> mutually exclusive; in fact, taking data seriously
> can significantly
> increase development agility. Watch for my article
> in the July issue
> of The Data Administration Newsletter (www.tdan.com)
> for more details.
At www.agiledata.org and on this list we've been pretty
clear about that for years.