Orphan terms in thesauruses

  • Webindexing
    Message 1 of 9 , Jun 15, 2009
      There is (I think) a general assumption that all terms in a thesaurus
      should be neatly grouped under, say, 20 top terms. In practice, it is
      often difficult to allocate a broader term to every term in a thesaurus
      without some distortion.

      Is there any reason why every term has to have a broader term? Is there
      anything wrong with many top terms? I am looking for general principles
      that would apply across a range of projects.

      I notice that the World Bank thesaurus
      (http://www.multites.net/mtsql/wb/site/) lets you search for orphan
      terms, of which they have many. I also noticed that the Australian
      Government TAGS thesaurus has a number of orphan terms. It is possible
      that the assumption that we need neat hierarchies is not as widespread
      as thought.

      Thanks,

      Glenda.

      --
      Glenda Browne
      Indexer, Writer, Teacher
      www.webindexing.biz
    • Heather Hedden
      Message 2 of 9 , Jun 15, 2009
        Hi Glenda,

        Well, that's one of the main differences between a taxonomy and a
        thesaurus. A taxonomy is structured in hierarchies, not permitting
        orphan terms. A thesaurus, on the other hand, focuses on the term, not
        the structure, and while each term may have a number of hierarchical
        relationships, orphan terms may be permitted. Now you could also have a
        thesaurus that prohibits orphan terms and rather contains any number of
        small hierarchies (in a sense orphan hierarchies of just two levels
        perhaps). This is a design policy decision that needs to be made at the
        outset.

        If users enter primarily by browsing or drilling down through a
        hierarchical display, then you want a hierarchical taxonomy with no
        orphan terms, and the number of hierarchies should be limited.
        If users enter primarily by searching on terms or by browsing an
        alphabetical list of terms, then you want a thesaurus (or at least a
        controlled vocabulary), and you don't need to worry about structuring
        complete hierarchies (merely ensuring that a term's immediate
        relationships are correct), and orphan terms can be tolerated.

        -- Heather

        Heather Hedden
        Hedden Information Management
        Heather@...
        www.Hedden-Information.com



      • Avi Rappoport
        Message 3 of 9 , Jun 15, 2009
          At 6:41 PM -0400 6/15/09, Heather Hedden wrote:
          >Hi Glenda,
          >
          >Well, that's one of the main differences between a taxonomy and a
          >thesaurus. A taxonomy is structured in hierarchies, not permitting
          >orphan terms. A thesaurus, on the other hand, focuses on the term, not
          >the structure, and while each term may have a number of hierarchical
          >relationships, orphan terms may be permitted.

          That reminds me of an old version of the Autonomy auto-categorizer,
          which was so obsessed with the right number of nodes and branches
          that they'd move things around to keep them balanced. Pretty hard on
          the humans, because their learned navigation paths disappeared.
          Sometimes the best is the enemy of the good.

          Avi


          --
          Search Analysis and Help -- Search Tools Consulting
          (510) 845-2551 / analyst@...
          Complete Guide to Search Engines for Web Sites and Intranets:
          <http://searchtools.com>
        • ahrenlehnert
          Message 4 of 9 , Jun 16, 2009
            Glenda,

            I’m a taxonomy consultant who started off in the field by working on the Modern Language Association’s International Bibliography Thesaurus.

            I agree with Heather’s assessment, and I would add the following considerations and expansions based on my experience as a thesaurus editor:

            1. How is the thesaurus displayed? We used Lotus Notes and the 56K+ terms were displayed in an alphabetized list. It was possible to discover orphan terms, and, believe me, there were many. From the users’ perspective, each term was essentially “equal” in the list. Getting to the right term was easy if you knew it, but the structure itself may lead users to the exact term they wish to use if they are browsing rather than searching.

            2. Can users discover an orphaned term? Terms used for indexing were as obvious or obscure as the article being indexed, the indexer’s subject area knowledge, and the indexer’s familiarity with the thesaurus. If, for example, the article dealt with race in William Faulkner, the terms “race” and “Faulkner, William” were easy enough to locate in a list. However, if the article were about race and marriage, are the terms “race” and “marriage,” “interracial marriage,” or is it really “miscegenation”? I can’t speak for everyone, but miscegenation was not in my vocabulary prior to the MLA, and I would never have found it had it not been for the practice of ensuring that terms had at least a broader or related term for discoverability.

            3. How is the thesaurus maintained? I know if I had unlimited time, resources, and authority, I would have audited the MLA Thesaurus and begun hacking and retiring terms that were of little or no value. These include terms that were outdated, infrequently used, or were really covered by a more useful entry. While I know from experience that adding and deleting terms was carefully controlled, the age of the thesaurus and the volume of articles indexed made it inevitable that there would be hundreds, if not thousands, of terms which were no longer useful. My bet is that many of these unused terms are actually orphan terms which had no path of discoverability.

            Overall, I don’t see any problem with orphan terms, unless, as Heather pointed out, it is a hierarchical display used for browsing. I’m sure you can imagine what the top level will eventually look like if any term is allowed to be orphaned. The nice thing about a thesaurus is that you can have valuable terms with no parent, child, or related term without forcing structures that don’t make sense.

            The caution with orphan terms is, of course, to not let them get out of control. If orphan terms are undiscoverable because they are not linked, you will find yourself with synonyms, variants, and unused terms cluttering your thesaurus. The policy at the MLA was to link a new term to something whenever possible. Research what’s already in the thesaurus and decide how this new term could be made more discoverable.
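
A rough sketch of how an audit for unlinked terms might look, in Python. Everything here is an assumption for illustration: the `Term` structure, the `find_orphans` function, and the sample relationships are hypothetical and do not reflect how the MLA thesaurus or any particular tool stores its data. Since "orphan" is used in this thread both for terms lacking hierarchical links and for terms lacking any link at all, the `strict` flag switches between the two readings.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A thesaurus term with its broader (BT), narrower (NT) and related (RT) links."""
    label: str
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)


def find_orphans(terms, strict=True):
    """Return the labels of orphan terms, sorted alphabetically.

    strict=True  -> a term is an orphan only if it has no BT, NT or RT links.
    strict=False -> a term is an orphan if it has no BT or NT links,
                    even when it has related-term links.
    """
    orphans = []
    for term in terms:
        if term.broader or term.narrower:
            continue  # part of a hierarchy, so not an orphan
        if strict and term.related:
            continue  # still discoverable through a related term
        orphans.append(term.label)
    return sorted(orphans)


# Hypothetical sample data, invented for this example.
thesaurus = [
    Term("marriage", narrower={"interracial marriage"}),
    Term("interracial marriage", broader={"marriage"}, related={"miscegenation"}),
    Term("miscegenation", related={"interracial marriage"}),
    Term("race"),
]

print(find_orphans(thesaurus))                # ['race']
print(find_orphans(thesaurus, strict=False))  # ['miscegenation', 'race']
```

Run periodically, a report like this could flag new terms that were added without any link, before they accumulate.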

            Ahren
            ahren.lehnert@...

          • aredmondneal
            Message 5 of 9 , Jun 18, 2009
              Hi, Glenda,

              I agree that it's often difficult to smash all terms into an arbitrary recommended max number of top terms, be it 20 or 30 or whatever. However, it's important to go back to the definition of taxonomy and thesaurus in the Z39.19 standard. For both, organization is integral.

              A taxonomy is "a hierarchically organized vocabulary based on a classification scheme". "A classification scheme is a method of organization, usually a hierarchical structure of relationships among entities." A thesaurus is "a controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified... Its purposes are...to facilitate browsing and searching."

              As Heather pointed out, if end user/searchers don't ever see the structure of the vocabulary, it doesn't matter much beyond the fact that they don't get what could be helpful information about broader or narrower concepts, which orphans don't have. And as Ahren pointed out, orphan terms can easily multiply and get out of control. A vocabulary with a few is one thing, but one where they start multiplying loses sight of the original goal of organization to support navigation, whether it be for an end user or the indexer.

              I don't feel so strongly about a set number of top terms. However, the presence of orphan terms suggests to me that it's time to reconsider the basic structure to find a happy place for them. That might mean reconsidering the scope of some concepts or tinkering with the wording of terms to make them more welcoming to the strays.

              Alice

            • Leonard Will
              Message 6 of 9 , Jun 18, 2009
                On Tue, 16 Jun 2009 at 08:24:02, Webindexing
                <webindexing@...> wrote
                >There is (I think) a general assumption that all terms in a thesaurus
                >should be neatly grouped under, say, 20 top terms. In practice, it is
                >often difficult to allocate a broader term to every term in a thesaurus
                >without some distortion.
                >
                >Is there any reason why every term has to have a broader term? Is there
                >anything wrong with many top terms? I am looking for general principles
                >that would apply across a range of projects.
                >
                >Thanks,
                >
                >Glenda.

                I think that it is often helpful to apply facet analysis to the concepts
                being organised, so that we group concepts depending on the fundamental
                categories to which they belong. Thus we can group them into facets such
                as "objects", "materials", "living things", "people", "organizations",
                "abstract concepts", "places" and so on. These facet names can become
                top terms of hierarchies.

                Each orphan term will belong to a facet such as these, and this is the
                initial step in constructing hierarchies, because hierarchical
                relationships can only apply to concepts in the same facet. (Part/whole
                relationships may sometimes break this rule, but these should be used
                only in certain specific and limited circumstances.)

                This top-down approach has to be used in conjunction with the bottom-up
                approach of examining concepts and considering what relationships they
                should have, but it does avoid having a lot of orphans.
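
As a minimal sketch of the first, top-down step described above, the following Python groups terms under facet names that then act as top terms. The facet list comes from the examples in this message; the `group_by_facet` function and the term-to-facet assignments are hypothetical, and deciding which facet a concept belongs to remains an editorial judgment that code cannot make.

```python
from collections import defaultdict

# Facet names taken from the examples in the message; each can serve as a top term.
FACETS = [
    "objects", "materials", "living things", "people",
    "organizations", "abstract concepts", "places",
]


def group_by_facet(assignments):
    """Group terms under their facet so that every term gains a top term.

    `assignments` maps a term label to the facet an editor chose for it.
    Unknown facet names are rejected rather than silently creating new top terms.
    """
    grouped = defaultdict(list)
    for term, facet in assignments.items():
        if facet not in FACETS:
            raise ValueError(f"unknown facet {facet!r} for term {term!r}")
        grouped[facet].append(term)
    return {facet: sorted(terms) for facet, terms in grouped.items()}


# Hypothetical editorial decisions for four previously orphaned terms.
assignments = {
    "timber": "materials",
    "steel": "materials",
    "librarians": "people",
    "trade unions": "organizations",
}

top_terms = group_by_facet(assignments)
for facet in FACETS:
    if facet in top_terms:
        print(f"{facet}: {', '.join(top_terms[facet])}")
```

This only places each term in a facet; the bottom-up work of deciding the finer broader/narrower relationships within each facet still has to follow.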

                Leonard
                --
                Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
                Information Management Consultants Tel: +44 (0)20 8372 0092
                27 Calshot Way L.Will@...
                ENFIELD Sheena.Will@...
                EN2 7BQ, UK http://www.willpowerinfo.co.uk/
              • Gabriel Tanase
                Message 7 of 9 , Jun 19, 2009
                  Hello,

                  Could you recommend a book or two, or perhaps a series of articles available online, from which a beginner might learn about, and how to do, facet analysis? Doesn't need to be a whole book dedicated to this; one good chapter would do.

                  Thank you very much,
                  Gabriel
                  http://www.linkedin.com/in/gabrieltanase



                • marijane white
                  Message 8 of 9 , Jun 19, 2009
                    William Denton's "How to Make a Faceted Classification and Put it on the Web" is a good place to start, in my experience.

                    http://www.miskatonic.org/library/facet-web-howto.html






                  • Patrick Lambe
                    Message 9 of 9 , Jun 28, 2009
                      Gabriel

                      My book 'Organising Knowledge' has an extensive discussion of facets and facet analysis with examples and further references. Facet analysis is a process of abstraction and is not always intuitive to the "general user" (which can compromise their ability to exploit facets in categorisation and search/browse), and it can fall victim to logical "high science" where facets are developed because they are possible or logical, rather than because they reflect important user perspectives on content. 

                      P

                      Patrick Lambe

                      website: www.straitsknowledge.com

                      Have you seen our KM Method Cards?   http://www.straitsknowledge.com/store/





