4640 Re: [TaxoCoP] Tagging to levels
Oct 25, 2013

On 2013-10-19 15:45, James Morris wrote:

> Hi all
>
> Some systems (well, at least one with which I have direct experience) added to the content not only the specific subject term identified by the indexer, but also, into a different field, all the parents of that term up to the top facet. That way a user could search for the specific term (or synonym) for a precise result. Or if they entered any term in the tree above the specific term they would still retrieve any content tagged with the lower-level term. It seemed like a clever way to put all the intelligence of the taxonomy into the content itself. The search engine itself can remain fairly stupid - no complex taxonomy integration needed. I don't know how widespread that practice is - for that I'll defer to the knowledge of this group. But maybe it's something your database designer could consider.
>
> So, given the taxonomy:
>
> ZOOLOGY
> => ANIMALS
> => DOMESTIC ANIMALS
> => CAT
>
> The content metadata would include:
>
> Title: The Cat
> Subject: "CAT"
> Broader subjects: DOMESTIC ANIMALS, ANIMALS, ZOOLOGY
>
> For a precise search, a pro-searcher could specifically look for subject="cat". Also, the subject would be weighted more highly in the search results to assist the general users. But if a user was looking for any information about "animals", they would also retrieve books tagged with "CAT".
>
> Jim Morris

This is a rather cumbersome and redundant way of doing it, by embedding a hierarchy in the index terms of each document. If the hierarchy changes, the snippets of hierarchy in previously-indexed documents will no longer match the authoritative current version.
Better is a search system that can, optionally, deal with "exploded" searches, which search for a term and all its narrower terms. This should not be too difficult to implement - there are SQL functions to handle it, if that is the sort of database on which the system is built. If the hierarchy is properly constructed, so that narrower concepts are true species of their parents, (i.e. they satisfy the "is_a" relationship) then any items retrieved from such a search will all be instances of the concept searched for.
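As a rough illustration of an "exploded" search, here is a minimal sketch in Python using SQLite's recursive queries. The table names (term, doc_term), the tiny hierarchy and the document titles are all hypothetical, invented only for this example; a real system would have its own schema and might use a different mechanism altogether.

```python
import sqlite3

# Hypothetical schema: each term points to its broader term (NULL at the top).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (
        id INTEGER PRIMARY KEY,
        label TEXT,
        broader_id INTEGER REFERENCES term(id)
    );
    CREATE TABLE doc_term (doc TEXT, term_id INTEGER REFERENCES term(id));

    -- Illustrative data only: every narrower term "is_a" kind of its parent.
    INSERT INTO term VALUES
        (1, 'animals', NULL),
        (2, 'cats', 1),
        (3, 'wild cats', 2),
        (4, 'dogs', 1);
    INSERT INTO doc_term VALUES
        ('The Cat', 2),
        ('Scottish wildcats', 3),
        ('Dog training', 4);
""")

# Collect the searched term plus all of its narrower terms, then find
# every document indexed with any term in that subtree.
EXPLODED = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM term WHERE label = ?
    UNION
    SELECT t.id FROM term AS t JOIN subtree AS s ON t.broader_id = s.id
)
SELECT DISTINCT doc
FROM doc_term
WHERE term_id IN (SELECT id FROM subtree)
ORDER BY doc
"""

def exploded_search(conn, label):
    """Return titles indexed with `label` or any of its narrower terms."""
    return [row[0] for row in conn.execute(EXPLODED, (label,))]

print(exploded_search(conn, "cats"))
print(exploded_search(conn, "animals"))
```

Because the hierarchy lives in one table rather than in each document's index terms, a change to the hierarchy takes effect for all documents at the next search, with no re-indexing.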
The example hierarchy given above is not really satisfactory in this respect. ANIMALS are not a kind of ZOOLOGY; these concepts belong to distinct facets which may be labelled "disciplines" and "organisms" respectively, so cannot have a genus/species relationship.
CATS are not all DOMESTIC ANIMALS either - we have wild cats in Scotland! If the concept is restricted to domestic cats, then it should be labelled by a more specific term such as DOMESTIC CATS, though this might be better expressed as a combination, e.g.
domestic cats USE cats + domestic animals
So you really have two hierarchies, with simplified extracts such as this:
-- -- botany
-- -- zoology
-- <animals by domesticity>
-- domestic animals
-- wild animals
-- <animals by species>
The index terms that would be allocated to the book on "The cat" referred to in the above example would then be those which are not hierarchically related, viz.:
--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants
27 Calshot Way, ENFIELD, EN2 7BQ, UK
Tel: +44 (0)20 8372 0092
L.Will@... Sheena.Will@...
http://www.willpowerinfo.co.uk/