
3373 Re: [TaxoCoP] data modeling and taxonomy

  • John O'Gorman
    Jan 6, 2010
      Brilliant, Cherie - absolutely brilliant.
      -----Original Message-----
      From: cheriewagner@... [mailto:cheriewagner@...]
      Sent: Wednesday, January 6, 2010 12:04 PM
      To: TaxoCoP@yahoogroups.com
      Subject: RE: [TaxoCoP] data modeling and taxonomy


      In reading this I wanted to express my appreciation for the time and knowledge that all of you on this list share. I'm a behind-the-scenes lurker, so by way of brief introduction: I worked in the content management and taxonomy space for many years and am now working in different areas. I know that I am quickly falling behind in what is a rapidly developing and ever-changing information modeling arena, so the following comment may seem obvious or archaic or just plain off! But reading this exchange makes me think of fractals, or fractal geometry, and how it helps to predict the systematic chaos of nature. Perhaps one could apply the concepts of fractal geometry to information or information modeling? Or maybe it would just result in some very cool geometric shapes... :)


      http://en.wikipedia.org/wiki/Fractal



      From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com] On Behalf Of John O'Gorman
      Sent: Wednesday, January 06, 2010 12:14 PM
      To: TaxoCoP@yahoogroups.com
      Subject: Re: [TaxoCoP] data modeling and taxonomy


      I'd like to introduce one more abstract into the mix, followed by a concrete example as per Patrick's excellent suggestion. As Lisa mentioned, the mathematical subtleties of taxonomies and data models and such are of little interest outside groups like ours, but the truth is that this line of inquiry is predicated on a flat geometry. The digital universe - owing primarily to its binary origins - is comprised of only two dimensions. Manifestation of this singular truth is everywhere and in spite of some very clever attempts to mitigate the flatness of things, we still have folder structures, naming conventions, hierarchies and super- and sub-types. This is not to suggest that these inventions are not and have not been useful, but we need something more elegant to save ourselves from drowning in a sea of digits and bytes.


      Take search... enter 'cricket' and get back two point seven million hits on the sport, the insect, the ethical construct (as in "not cricket") and Buddy Holly. Because humans live in a multi-faceted universe and computers in a flat one, reconciling the semantics (i.e. the gap between n dimensions and two) is up to us. What is needed is a new 'geometry' of information that simultaneously incorporates more precision and recognizes the existing symmetry of information.

      Concrete example: In programming the stupid computer must be 'told' what a string is and how it is going to be used. So a given string may be a variable, a global variable, an object or a method depending on the context. To avoid 'collision', the same string may not be used in any way other than the one for which it has been declared. In the context of the 'cricket' search a similar approach may be taken, albeit with a twist. For every unique concept behind the string 'cricket' a unique identifier is declared. Now we have something like: 1234 - cricket - sport; 3456 - cricket - status;  4567 - cricket - insect; 6789 - cricket - member of Buddy Holly's band.
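      The identifier scheme above can be sketched in a few lines of Python. This is only an illustration: the identifiers and senses are the ones listed in the paragraph, while the `Concept` type, the `REGISTRY` list and the `lookup` function are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: int  # unique identifier declared for one meaning
    label: str       # the shared surface string
    sense: str       # what the string denotes under this identifier


# Identifiers and senses as listed in the post; the structure is an assumption.
REGISTRY = [
    Concept(1234, "cricket", "sport"),
    Concept(3456, "cricket", "status"),
    Concept(4567, "cricket", "insect"),
    Concept(6789, "cricket", "member of Buddy Holly's band"),
]


def lookup(label, sense=None):
    """Return every concept declared for a label, optionally narrowed to one sense."""
    matches = [c for c in REGISTRY if c.label == label.lower()]
    if sense is not None:
        matches = [c for c in matches if c.sense == sense]
    return matches


if __name__ == "__main__":
    # An unqualified search returns all four senses of the string.
    for concept in lookup("cricket"):
        print(concept.concept_id, concept.label, concept.sense)
```

      Searching on the bare string still returns every sense, but once a sense is supplied (or inferred from context) the match collapses to a single identifier, which is the 'no collision' property borrowed from programming.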

      As Bob correctly points out, individual data models, taxonomies and ontologies (DM-T-O) are by necessity fairly narrow in scope. That's typically why taxonomies tend to break and data models fail with the introduction of information classes from a wider scope. Wouldn't it be interesting, though, if in spite of these focused artifacts their individual members already had a declarative that uniquely identified not only what they represent but also what class they are in and how they can be connected to other patterns of use? In other words, have a new geometry built into the vocabulary values to encourage reuse at a very granular level.

      I can expand on the 'patterns' concept in a separate post (like Lisa says, I risk being the only one interested) but for now, think of any formally constructed language and think of the universal patterns used to exchange information. There must be an agreement about the what and the how, and there must also be an understanding about the context and construction, and there is always semantics. A taxonomy (as would a data model) becomes a new pattern in a given language using existing elements.

      -----Original Message-----
      From: lisa colvin [mailto:lisacolvin@gmail.com]
      Sent: Wednesday, January 6, 2010 09:19 AM
      To: TaxoCoP@yahoogroups.com
      Subject: Re: [TaxoCoP] data modeling and taxonomy


      Thanks for the lively discussion. It's exciting to see these ideas coming together.

      While there are some accepted standards for ontology modeling practice (RDFS/OWL), there are multiple knowledge representation languages which can be used to express any 'ontology'. Typically the more expressive the language, the more expensive it is computationally. So, you need to pick the representation language which best fits your needs. If you're not building a model to drive some sort of expert system or related capabilities,  a simpler knowledge representation scheme is probably better.

      However, one reason people use ontology languages in general is when there is a need for strong semantics which define the relationships/context. Even if you don't want to build an expert/recommendation/QA/NL-based system, you can still use a more formal ontology language as just a pure specification language.

      So, is a faceted classification scheme an ontology? Some would say 'yes, if it uses an ontology language to express it'. Others might say it's not if you're not expressing/defining any inheritance relations. Overall, it probably doesn't matter what you call it as long as the semantics are rich enough to solve whatever problem you needed solving.

      There are fundamental differences in how the various disciplines approach information modeling. What I've found most helpful in working with people in another discipline is to be very explicit on how basic terms (like "term" :), "class", "instance", "inference") are used in expressing the model that you're sharing. The idea of "inference", for example, can vary widely between an expert system developer and an OO developer. If these terms aren't described explicitly and used consistently, people get confused.

      I also found that defining the capabilities and mathematical relationship distinctions between "controlled vocabulary list", "synonym rings/synsets", "taxonomy", "thesaurus", "ontology", "description logics", etc. is really only interesting to taxonomists/ontologists and other curious people like us. :)

      :) Lisa

      On Tue, Jan 5, 2010 at 7:36 PM, Patrick Lambe <plambe@straitsknowledge.com> wrote:


      Well, I was just sitting back and enjoying the conversation, Bob. But since you ask, I'll start with a comment that Matt made early on, that there might be usability issues with reusing structures from data models in taxonomies, even though in principle such reuse makes sense.

      I think there's a tendency for us to get very entity focused in these discussions and definitions and stop there. There's a good reason for this. The common ground for data models, ontologies, taxonomies is their need to establish relatively stable entities at the very least; they each do slightly different things around the language referring to those entities, and they diverge in the type and extent of work around establishing and defining relationships and maybe inference-generating capabilities (which some taxonomy forms can support as well as ontologies). But the entities are the core point of reference.

      But Matt's comment reminds us that it's important to remember that data models, taxonomies and ontologies are at the end of the day just instruments, and to understand the instrument is not just about understanding the entities it manipulates, but how the instrument is used, and for what purpose. 

      The design of a tool is driven by its functionality, not its components. DM-T-Os serve related purposes via different means and in different contexts. There are important differences in the amount of human vs machine processing expected or served. As Matt suggests master data management is one way of getting a handle on how they can inter-operate. But fixing an entity and definition in one space (eg a data model) does not unquestionably qualify it for use in another space (eg a taxonomy).

      I think we also assume that usability is only really relevant at the taxonomy level. In my book I suggested that taxonomies are for humans and ontologies are for machines, which risks feeding that assumption. But at the end of the day, the rationale for using any of these instruments whether data models, taxonomies or ontologies, is that they must emerge into human use in some way. It's just that for DMs and Os machine processes provide different opportunities and constraints from human ones. If we can't see the pathway to human use (which is where some of the visionary talk on ontologies falls down, I feel) then they risk floating away into philosophical (or organisational) abstractions. I think this is where a lot of the hard wrestling work needs to be done, to resolve relationships between the instruments, preserve a common core where possible, and reflect the context-driven needs at organisational and user levels.

      This is all very abstract still... I think what would be useful would be some good clear cases where we can see the relationships in specific contexts.


      Patrick Lambe

      weblog: www.greenchameleon.com

      website: www.straitsknowledge.com

      book: www.organisingknowledge.com

      Have you seen our KM Method Cards or

      Organisation Culture Cards?  

      http://www.straitsknowledge.com/store/

      On Jan 6, 2010, at 7:30 AM, Bob Bater wrote:

      Heather, Gabriel, John, Keith & anyone else who's following this thread:

      I'm still feeling my way around these kinds of issues (have been for years), and have no hard-and-fast solutions. However, I do have some 'working hypotheses' which I find to be helpful. I'll refer to them as I respond to a few points made by John, Keith and Gabriel.

      Firstly, John is quite right in pointing out that both data models and taxonomies are necessarily bounded. Who'd want to undertake a data model or a taxonomy of *everything*? Well, I suppose Melvil Dewey, UDC, LCC have all attempted it, with varying degrees of success. But that's a topic for another day. In an organizational context, both data models and taxonomies need to be restricted to a specific domain, if only for practical reasons.

      John also says:

      > For example, if all of the 'entities' that a data modeller wanted to use were already classified by a taxonomist and resided in a master data management inventory, then a sort of symbiotic relationship could exist between the necessarily narrow application of the data and the universal 'connectivity' of a fully faceted business vocabulary. <

      I see this as the role of the 'over-arching ontology which expresses the context of both data model and taxonomy', to quote my own post. The ontology, developed first, ensures that both data modeller and taxonomist are singing from the same hymn sheet. That will also prove of great benefit to data warehouse developers, document managers, records managers and information architects, further down the line.

      Keith says that he finds taxonomies are regarded as:

      > "THE solution" rather than being viewed as "A solution" or part of a larger system of models and decision-making depending on the nature of the enterprise <

      Taxonomies have been over-egged. Many in the field think 'taxonomy' first and context later. IMHO bad! Build the ontology first, then do your data modelling. Then you'll have done a PoC (Proof of Concept) for the domain - identifying the entities which are important, their important attributes (for the data modellers) and a first lead-in to the language people use to refer to them (for the taxonomists). Using both the ontology and the data model, define the key attributes which different communities regard as important to them when they want to access and process information. That gives you a metadata application profile for each community which can be aggregated into a corporate metadata profile. Only then do you look at each attribute in each profile and decide how it is to be populated. Sometimes, it will be an /ad hoc/ value; sometimes the value will be drawn from a fixed, flat list; sometimes the value will be drawn from an organized, maintained hierarchy of values - a taxonomy. For me, the metadata profile comes first. A taxonomy only becomes relevant if a metadata element requires it.
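      As a rough sketch of the last step, a metadata profile can record, per element, which of the three population methods applies. Everything named here is an assumption made for illustration (the element names, the sample values, and the `Element`, `PROFILE`, `taxonomy_terms` and `validate` names); only the three-way distinction between ad hoc values, a flat list and a taxonomy comes from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    name: str
    source: str                    # "ad_hoc", "flat_list" or "taxonomy"
    allowed: Optional[set] = None  # controlled values; None means free text


def taxonomy_terms(tree):
    """Flatten a nested-dict hierarchy into the set of all terms it contains."""
    terms = set()
    for term, children in tree.items():
        terms.add(term)
        terms |= taxonomy_terms(children)
    return terms


# Hypothetical hierarchy of document types (a small taxonomy).
DOC_TYPES = {"Report": {"Annual report": {}, "Audit report": {}}, "Policy": {}}

# Hypothetical profile: one element per population method.
PROFILE = [
    Element("title", "ad_hoc"),
    Element("status", "flat_list", {"draft", "final"}),
    Element("doc_type", "taxonomy", taxonomy_terms(DOC_TYPES)),
]


def validate(record):
    """Return a list of problems found when checking a record against PROFILE."""
    problems = []
    for element in PROFILE:
        value = record.get(element.name)
        if value is None:
            problems.append(f"{element.name}: missing")
        elif element.allowed is not None and value not in element.allowed:
            problems.append(f"{element.name}: {value!r} not in controlled values")
    return problems
```

      In this arrangement the taxonomy appears only as the value source for one element, which mirrors the point that a taxonomy becomes relevant when a metadata element requires it.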

      Gabriel said:

      > (I said  "ontology / taxonomy" in the above because I'm not clear myself whether our CM does satisfy a full definition of "ontology"; for example as yet we have no mechanisms for making inferences). <

      My 'working hypothesis' in this respect does not include the need for ontologies to enable the making of inferences. That is a requirement of strict 'ontologies' in the Semantic Web sense. For me, ontologies provide the context for ensuring that information and knowledge management structures and systems are coherent and interoperable.

      Keith said:

      > Getting at just where taxonomy, data modeling, and ontology specification begin, end, and overlap is really welcome.  <

      Again, my 'working hypothesis' is that ontologies come first, specifying the entities involved in an activity system, and their relationships. Data modellers will want to define the attributes of each entity and to characterize their relationships more rigorously, to enable their capture in the highly structured world of the DBMS, focused on logical consistency.

      Information managers, on the other hand, are less data-focused and more user-focused, concerned with linking entities and their key attributes to the concepts - and the terms which represent those concepts - employed by workers. So - where appropriate - they build a taxonomy proposing terms to be used for those concepts, reflecting the taxonomic relationships inherent in any domain - generic, partitive, instantial. While the taxonomy can establish the entities (concepts) involved, and their relationships, it cannot dictate the terms which people use to refer to those concepts. Provision is made therefore for variance in terminology by developing a thesaurus, which allows people to search using their native term, and for back-end software to translate this into the 'preferred term' established by the taxonomy.
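      The thesaurus step described above, translating a searcher's native term into the taxonomy's preferred term, might look something like this in Python. The terms and the `THESAURUS` and `to_preferred` names are hypothetical examples, not drawn from any real vocabulary.

```python
# Hypothetical thesaurus: preferred term -> variant terms people actually use.
THESAURUS = {
    "Personnel": ["staff", "employees", "workforce"],
    "Remuneration": ["pay", "salary", "wages"],
}

# Invert once so that a lookup by any variant (or by the preferred term
# itself) is a single dictionary access.
_VARIANT_TO_PREFERRED = {
    variant.lower(): preferred
    for preferred, variants in THESAURUS.items()
    for variant in [preferred, *variants]
}


def to_preferred(term):
    """Map a searcher's native term to the preferred term, or None if unknown."""
    return _VARIANT_TO_PREFERRED.get(term.strip().lower())


if __name__ == "__main__":
    for query in ["Salary", "workforce", "cricket"]:
        print(query, "->", to_preferred(query))
```

      Back-end search software would call something like `to_preferred` before querying the index, so that people can keep their own vocabulary while the index stays consistent with the taxonomy.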

      Hope that stimulates some thoughts. Meanwhile, where's Patrick Lambe in this thread? Patrick, I'm sure you have some informative views on these issues. Please join us.



