December 1, 2001: Damning Metadata
I can't remember which IA blog pointed me to Doug Kaye's blog, but I found
his frustration with metadata to be... well, frustrating.
I'll get into why in just a moment, but first, it's interesting to note a
bit of an anti-metadata backlash of late. The pendulum swings away: back
in the mid-90s (boy, does it feel strange to say that!), Argus would try to
sell clients on the value of developing controlled vocabularies and
thesauri. We often heard this response: "Nope, we have this great new
search engine, and it will solve all of our users' information problems. No
need to ever manually 'touch' our content." Just like that. End of
During the past year or two, a wave of painful realization swept these same
folks. The search engine snake oil had dissolved, leaving a residue of poor
performance and general dyspepsia. Now, finally believing that "Taxonomies
are Chic," they were interested in hiring Argus to create vocabularies to
describe their content. *All* of their content. Which, of course, was
entirely unrealistic. And so we went about trying to convince these people
*not* to classify everything, only the most important content.
Now there must be some sort of counter-counter-movement afoot: people
who've experimented with classification schemes, and were disappointed to
find that, yet again, there was no silver bullet to be found, just as with
search engines. I don't know if Doug Kaye is one of those poor souls
afflicted with silver bulletitis, but he is down on metadata for two reasons:
"First, every required step acts as a deterrent to the use of the system.
I've found that to be true in every software product or web-based system
with which I've been involved. In some cases (such as an on-line dating
service for which I was CTO) I've actually tested it. The more you ask, the
less likely people are to participate."
Of course, Doug is raising an important point: metadata is about *process*
as much as syntax and semantics. But intelligent metadata design doesn't
ignore procedural issues, such as how the work is going to get done and
who's going to do it. Sometimes it makes sense to have authors suggest
metadata for their own content, sometimes separate subject matter experts,
sometimes indexers, and sometimes you use software. In certain cases, you
use some combination of the above. There are countless factors that
influence these decisions, not the least of which are how dynamic and
ephemeral your content is, how much of it there is, and how much you can
spend on it.
More from Doug:
"Second, contrived taxonomies typically associated with metadata are a
disaster. I've tested this, too. No one person--or committee--can design a
taxonomy for the ideas of others. Library science is inadequate for the
range of knowledge and thought that are encountered with weblogs."
Weblogs are certainly diverse, and classification, as noted above, is no
panacea. But library science has done a passable job at classifying
something even broader than weblogs: the entirety of human knowledge that
is found in the Library of Congress. Sure, you'll find many problems with
LoC classification, but considering its age and non-digital inception, you
could do a lot worse. Certainly author-supplied keywords can be... a lot worse.
Personally, I'm sure glad that that committee at the National Library of
Medicine came up with MESH headings to represent the ideas that all those
medical researchers have been coming up with for years. Accessible medical
research might have been what saved my dad's life last summer.
Instead of throwing out babies with bathwater, we need to create value by
selecting and combining the subset of architectural approaches--search
engines and classification schemes included--that are most appropriate for
each unique situation.
I wish this damned pendulum would stop swinging soon.
Doug Kaye's blog posting ::
"Taxonomies are Chic" :: http://www.slabf.org/oxbrw114.PDF
Library of Congress Classification ::
Medical Subject Headings (MESH) :: http://www.nlm.nih.gov/mesh/meshhome.html