
4641 Re: [TaxoCoP] Tagging to levels

  • James Morris
    Oct 25, 2013
      Hi Leonard -
      Oh, I agree completely.  At the time and in that situation it was fairly clever.  It was a well controlled database with a dedicated thesaurus that could each be kept in sync pretty easily.  What was less flexible was the search engine. But what you describe makes much more sense if possible. 

      Also the "taxonomy" I used was just adapted from the Cutter quote. He seemed to be implying a hierarchy of terms like that. I wasn't proposing it as a great taxonomy.  Yours is much better! 
      Jim

      On Oct 25, 2013, at 2:50 PM, Leonard Will <L.Will@...> wrote:



      On 2013-10-19 15:45, James Morris wrote:

      Hi all
      Some systems (well, at least one with which I have direct experience) added to the content not only the specific subject term identified by the indexer, but also, in a separate field, all the parents of that term up to the top facet. That way a user could search for the specific term (or a synonym) for a precise result, or enter any term higher in the tree and still retrieve content tagged with the lower-level term. It seemed like a clever way to put all the intelligence of the taxonomy into the content itself: the search engine can remain fairly stupid, with no complex taxonomy integration needed. I don't know how widespread the practice is (for that I'll defer to the knowledge of this group), but maybe it's something your database designer could consider.

      So, given the taxonomy:
      ZOOLOGY
      => ANIMALS
      => DOMESTIC ANIMALS
      => CAT

      The content metadata would include:
      Title: The Cat
      Subject: "CAT"
      Broader subjects: DOMESTIC ANIMALS, ANIMALS, ZOOLOGY

      For a precise search, a pro-searcher could specifically look for subject="cat".  Also, the subject would be weighted more highly in the search results to assist the general users.  But if a user was looking for any information about "animals", they would also retrieve books tagged with "CAT". 

      Jim Morris
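
      A minimal Python sketch of the index-time approach described above, assuming a simple, acyclic child-to-parent map. PARENT, make_record, search and the record layout are hypothetical names for illustration, not any particular product's schema. The broader terms are copied into a separate field when the record is created, so the search itself needs no knowledge of the taxonomy:

```python
# Hypothetical child -> parent map mirroring the example taxonomy.
# Assumes the hierarchy is acyclic and single-parent.
PARENT = {
    "CAT": "DOMESTIC ANIMALS",
    "DOMESTIC ANIMALS": "ANIMALS",
    "ANIMALS": "ZOOLOGY",
}


def broader_terms(term: str) -> list[str]:
    """Walk up the hierarchy, nearest parent first, to the top term."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def make_record(title: str, subject: str) -> dict:
    """Copy all ancestors of the subject into a separate field at indexing time."""
    return {"title": title, "subject": subject, "broader": broader_terms(subject)}


def search(records: list[dict], term: str, precise: bool = False) -> list[dict]:
    """precise=True matches only the subject field; otherwise either field.

    Subject-field hits are listed first, as a crude stand-in for weighting
    the subject more highly in the results.
    """
    term = term.upper()
    hits = []
    for r in records:
        if r["subject"] == term:
            hits.append((0, r))
        elif not precise and term in r["broader"]:
            hits.append((1, r))
    return [r for _, r in sorted(hits, key=lambda h: h[0])]
```

      For example, make_record("The Cat", "CAT") stores ["DOMESTIC ANIMALS", "ANIMALS", "ZOOLOGY"] as its broader subjects, so a search for "animals" finds it while a precise search for "animals" does not. The copied list is frozen at indexing time: if PARENT later changes, existing records keep the old chain, which is the weakness raised in the reply that follows.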

      This is a rather cumbersome and redundant way of doing it, by embedding a hierarchy in the index terms of each document. If the hierarchy changes, the snippets of hierarchy in previously-indexed documents will no longer match the authoritative current version.

      Better is a search system that can, optionally, deal with "exploded" searches, which search for a term and all its narrower terms. This should not be too difficult to implement; there are SQL functions to handle it, if that is the sort of database on which the system is built. If the hierarchy is properly constructed, so that narrower concepts are true species of their parents (i.e. they satisfy the "is_a" relationship), then any items retrieved from such a search will all be instances of the concept searched for.
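
      One way to sketch an exploded search, assuming an SQL database that supports recursive common table expressions (shown here with Python's built-in sqlite3 module). The table names broader and indexing, the function names and the sample data are hypothetical. The hierarchy is stored once, as narrower/broader pairs, and the query expands the search term to all its narrower terms at search time:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory demo: hierarchy stored once, items indexed with specific terms."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE broader (narrower TEXT, broader TEXT);
        CREATE TABLE indexing (item TEXT, term TEXT);
        INSERT INTO broader VALUES
            ('biology', 'disciplines'),
            ('botany', 'biology'),
            ('zoology', 'biology'),
            ('domestic animals', 'animals'),
            ('wild animals', 'animals'),
            ('cats', 'animals'),
            ('dogs', 'animals');
        INSERT INTO indexing VALUES
            ('The cat', 'zoology'),
            ('The cat', 'domestic animals'),
            ('The cat', 'cats'),
            ('Guard dogs', 'dogs'),
            ('Botany for beginners', 'botany');
    """)
    return con


# Recursive CTE: start from the searched term, then repeatedly add the
# narrower terms of anything already collected. UNION (not UNION ALL)
# discards duplicates, so terms already collected are not revisited.
EXPLODE_SQL = """
WITH RECURSIVE narrower_terms(term) AS (
    SELECT ?
    UNION
    SELECT b.narrower FROM broader AS b
    JOIN narrower_terms AS n ON b.broader = n.term
)
SELECT DISTINCT i.item FROM indexing AS i
JOIN narrower_terms AS n ON i.term = n.term
ORDER BY i.item
"""


def exploded_search(con: sqlite3.Connection, term: str) -> list[str]:
    """Items indexed with the term itself or any of its narrower terms."""
    return [row[0] for row in con.execute(EXPLODE_SQL, (term,))]
```

      With this data, exploded_search(con, "animals") returns ["Guard dogs", "The cat"], and a search on "disciplines" reaches "Botany for beginners" two levels down. Because the hierarchy lives only in the broader table, editing it changes the results of every later search without re-indexing any items.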

      The example hierarchy given above is not really satisfactory in this respect. ANIMALS are not a kind of ZOOLOGY; these concepts belong to distinct facets which may be labelled "disciplines" and "organisms" respectively, so cannot have a genus/species relationship. 

      CATS are not all DOMESTIC ANIMALS either - we have wild cats in Scotland! If the concept is restricted to domestic cats, then it should be labelled by a more specific term such as DOMESTIC CATS, though this might be better expressed as a combination, e.g.
      domestic cats USE cats + domestic animals
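
      The compound USE reference can be sketched as a lookup from an entry term to the combination of preferred terms it stands for, so that a search on the entry term requires all of them. The USE table, the function names and the second sample title ("Scottish wildcats") are hypothetical, for illustration only:

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred terms to combine.
USE = {
    "domestic cats": ("cats", "domestic animals"),
}

# Hypothetical index: item -> set of preferred terms assigned to it.
INDEX = {
    "The cat": {"zoology", "domestic animals", "cats"},
    "Scottish wildcats": {"wild animals", "cats"},
}


def to_preferred(term: str) -> tuple[str, ...]:
    """Replace a non-preferred entry term by its preferred term(s)."""
    return USE.get(term, (term,))


def search_and(index: dict[str, set[str]], term: str) -> list[str]:
    """Items indexed with *all* the preferred terms the entry term maps to."""
    required = set(to_preferred(term))
    return sorted(item for item, terms in index.items() if required <= terms)
```

      Here search_and(INDEX, "domestic cats") returns only "The cat", while a search on "cats" alone also finds the wild cats.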

      So you really have two hierarchies, with simplified extracts such as this:

      disciplines
      -- biology
      -- -- botany
      -- -- zoology

      animals
      -- <animals by domesticity>
      -- -- domestic animals
      -- -- wild animals
      -- <animals by species>
      -- -- cats
      -- -- dogs
      -- -- zebras

      The index terms that would be allocated to the book on "The cat" referred to in the above example would then be those which are not hierarchically related, viz.:

      zoology
      domestic animals
      cats
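
      A small Python sketch of the rule applied here, assuming a single-parent map for the two facets with the node labels omitted (BROADER, ancestors and most_specific are hypothetical names). Any candidate term that is a broader term of another candidate is dropped, since an exploded search on the broader term would retrieve the item anyway:

```python
# Hypothetical child -> parent map for the two facets (node labels omitted).
BROADER = {
    "biology": "disciplines",
    "botany": "biology",
    "zoology": "biology",
    "domestic animals": "animals",
    "wild animals": "animals",
    "cats": "animals",
    "dogs": "animals",
    "zebras": "animals",
}


def ancestors(term: str) -> set[str]:
    """All broader terms of a term, up to the top of its facet."""
    out = set()
    while term in BROADER:
        term = BROADER[term]
        out.add(term)
    return out


def most_specific(terms: list[str]) -> list[str]:
    """Keep only candidates that are not broader terms of other candidates."""
    terms = set(terms)
    redundant = set().union(*(ancestors(t) for t in terms))
    return sorted(terms - redundant)
```

      For example, most_specific(["zoology", "biology", "animals", "domestic animals", "cats"]) returns ["cats", "domestic animals", "zoology"], the same three terms as above.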


      Leonard Will



      -- 
      Willpower Information     (Partners: Dr Leonard D Will, Sheena E Will)
      Information Management Consultants            Tel: +44 (0)20 8372 0092
      27 Calshot Way                              L.Will@...
      ENFIELD                                Sheena.Will@...
      EN2 7BQ, UK                            http://www.willpowerinfo.co.uk/


