Re: [TaxoCoP] Tools to merge vocabularies

  • marijane white
    Message 1 of 11 , Sep 22, 2009
    • 0 Attachment
      In theory you could do 1) and 2) with Topic Maps: export each vocabulary to XTM format, then merge them using the XTM mergeMap element and an XTM processor. The processor would treat topics with the same name (exact matches of preferred terms) as identical and would automatically merge all of their relationships to other topics (the non-preferred terms).  In theory.  =)  Topic Maps might have too steep a learning curve to explore this approach for a boot camp presentation.
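The name-based merging described above can be illustrated without any Topic Maps tooling. The following is a minimal sketch only: it is not XTM and not a real Topic Maps processor, and the `Topic` class, the `merge_by_name` function, and the case-folded comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a topic: a preferred name plus variant names."""
    name: str
    variants: set[str] = field(default_factory=set)


def merge_by_name(*vocabularies: list[Topic]) -> dict[str, Topic]:
    """Merge topics whose names match (compared case-insensitively, an assumption)."""
    merged: dict[str, Topic] = {}
    for vocabulary in vocabularies:
        for topic in vocabulary:
            key = topic.name.casefold()
            if key in merged:
                # Same name: treat as one topic and pool the variant names.
                merged[key].variants |= topic.variants
            else:
                merged[key] = Topic(topic.name, set(topic.variants))
    return merged


vocab_a = [Topic("Automobiles", {"Cars"})]
vocab_b = [Topic("Automobiles", {"Motor cars"}), Topic("Bicycles", {"Bikes"})]
merged = merge_by_name(vocab_a, vocab_b)
print(sorted(merged["automobiles"].variants))
```

Merging on name alone is what makes homographs a risk, so the result is best treated as input to a human review rather than a finished vocabulary.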


      Marijane White



      On Sat, Sep 19, 2009 at 4:08 PM, Heather Hedden <heather@...> wrote:
      If you want to merge one controlled vocabulary into another, what tools
      are there to do a first pass (before a human review) of automatic
      matching? This matching would include 1) exact matches of preferred
      terms between the two vocabularies, 2) matches between preferred terms
      of one and nonpreferred terms of the other, and 3) possibly even
      additional fuzzy matches based on natural language processing.
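The three kinds of matching listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any existing tool: each vocabulary is assumed to be a dict mapping a preferred term (PT) to its nonpreferred terms (NPTs), the function names and the 0.85 threshold are hypothetical, and character-level similarity from Python's standard `difflib` stands in for real natural language processing.

```python
from difflib import SequenceMatcher


def normalize(term: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not block a match."""
    return " ".join(term.casefold().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a crude stand-in for NLP-based matching."""
    return SequenceMatcher(None, a, b).ratio()


def first_pass(source, target, fuzzy_threshold=0.85):
    """Propose a match for each source PT; returns (match_type, source_pt, target_pt)."""
    target_pts = {normalize(pt): pt for pt in target}
    target_npts = {normalize(npt): pt for pt, npts in target.items() for npt in npts}
    proposals = []
    for pt, npts in source.items():
        key = normalize(pt)
        # A source NPT that equals a target PT (type 2, reverse direction).
        via_npt = next(
            (target_pts[normalize(n)] for n in npts if normalize(n) in target_pts), None
        )
        best = max(target_pts, key=lambda t: similarity(key, t), default=None)
        if key in target_pts:  # type 1: exact PT match
            proposals.append(("exact PT", pt, target_pts[key]))
        elif key in target_npts:  # type 2: source PT equals a target NPT
            proposals.append(("PT to NPT", pt, target_npts[key]))
        elif via_npt is not None:
            proposals.append(("NPT to PT", pt, via_npt))
        elif best is not None and similarity(key, best) >= fuzzy_threshold:  # type 3
            proposals.append(("fuzzy", pt, target_pts[best]))
        else:
            proposals.append(("unmatched", pt, None))
    return proposals


source = {"Automobiles": ["Cars"], "Physicians": ["Doctors"],
          "Colour television": [], "Lawyers": ["Attorneys"], "Geology": []}
target = {"Automobiles": ["Motor vehicles"], "Doctors": ["Physicians"],
          "Color television": [], "Attorneys": [], "Botany": []}
proposals = first_pass(source, target)
for proposal in proposals:
    print(proposal)
```

Every proposal, including the exact matches, is a candidate for the human review mentioned above; the sketch only orders the checks from most to least certain.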

      Information sought in preparation for a Taxonomy Boot Camp presentation.
      Thanks.

      -- Heather

      --
      Heather Hedden
      Hedden Information Management
      Heather@...
      www.Hedden-Information.com





    • Alice
      Message 2 of 11 , Sep 29, 2009
      • 0 Attachment
        I would be interested to hear of software that meets Heather's specifications. We use Data Harmony's Thesaurus Master for merging taxonomies (full disclosure: I'm employed by its producer Access Innovations). The software permits successive imports from external files, providing various views of the combined file. Each source taxo becomes a main branch of the target taxo and the aggregate can be worked with, analyzing for similar or overlapping terms and concepts. Viewing the aggregate in permuted format reveals common words in terms and sometimes common underlying concepts that can be obscured by how the term is expressed. The software alerts you to incoming terms that are identical to target file NPTs. Identical preferred terms from the separate files are merged; they should be analyzed to be sure they represent the same concept--if not, one or the other term must be reworded to be more accurate and clear.
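The idea of a permuted view, where each term is listed under every word it contains, can be sketched in a few lines. This is a generic illustration and makes no claim about how Thesaurus Master builds its view; the function names and sample terms are hypothetical, and there is no stop-word handling.

```python
from collections import defaultdict


def permuted_index(terms_by_source):
    """Index every term under each of its words, remembering the source file."""
    index = defaultdict(set)
    for source, terms in terms_by_source.items():
        for term in terms:
            for word in term.casefold().split():
                index[word].add((source, term))
    return index


def shared_words(index):
    """Keep only words that occur in terms from more than one source."""
    return {
        word: sorted(entries)
        for word, entries in index.items()
        if len({source for source, _ in entries}) > 1
    }


index = permuted_index({
    "A": ["Heart diseases", "Water pollution"],
    "B": ["Diseases of the heart", "Air pollution"],
})
shared = shared_words(index)
for word in sorted(shared):
    print(word, shared[word])
```

Here "Heart diseases" and "Diseases of the heart" surface side by side under both "heart" and "diseases", even though an exact-match pass would miss them.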

        Working in the hierarchy view, identical terms in separate taxo files may acquire multiple broader terms if the additional new parents work out well; if they don't, the expression must change to reflect the concept more accurately. Through the merge, some terms may adopt children under the identical BT from the other source files. If the collected children play well together, great; otherwise the incompatible family groups get separated and the name of one of the parents must change to disambiguate.

        We've used the software to blend up to five source files into one and also for mapping multilingual files.

        A first pass run to spot duplicates is good. Before long, humans must, of course, analyze and compare overall structure and specific terms.

        Alice

      • Seth Earley
        Message 3 of 11 , Sep 29, 2009
        • 1 Attachment
        • 18 KB
        This might be a good topic for one of our monthly conference calls.

      • laptopjockey
        Message 4 of 11 , Sep 29, 2009
        • 0 Attachment
          --- In TaxoCoP@yahoogroups.com, "Alice" <aredmondneal@...> wrote:
          >
          > I would be interested to hear of software that meets Heather's specifications.

          I'm curious: if merging multiple taxonomies / thesauri is focused on (among other things) getting rid of 'duplicates', how do homographs live in such an environment?

          For example, how does 'Appendix' in a medical thesaurus live with 'Appendix' from a publishing thesaurus?

          John O'
        • Heather Hedden
          Message 5 of 11 , Sep 29, 2009
          • 0 Attachment
            First of all, merging should only be done on thesauri in the same field
            (subject area). But if the field is broad, then undifferentiated
            homographs with different meanings may occur and be matched. A
            taxonomist must review the automated matches. Parenthetical qualifiers
            can be added if both homographs with different meanings are kept.

            -- Heather

            Heather Hedden
            Hedden Information Management
            www.Hedden-Information.com



          • Janice M Herd
            Message 6 of 11 , Sep 29, 2009
            • 0 Attachment
              Hi John,
              It would be necessary to disambiguate the two terms if one merges two vocabularies of distinct subject areas such as medicine and publishing.
              Example: Appendix (Anatomy) uses a parenthetical qualifier.
              Appendices (to printed material) might be used in the plural, since Z39.19 (the NISO monolingual thesaurus construction standard) requires most terms to be plural, and it could also receive a parenthetical qualifier.
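Adding parenthetical qualifiers during a merge could be sketched as below. This assumes a reviewer has already confirmed that identical labels from different vocabularies mean different things; the function name is hypothetical, and using the subject area as the qualifier is a simplification of the wording in the example above.

```python
from collections import Counter


def qualify_homographs(vocabularies):
    """Append a parenthetical qualifier to labels that occur in more than one vocabulary.

    vocabularies maps a qualifier (here, the subject area) to its list of terms.
    """
    counts = Counter(
        label
        for terms in vocabularies.values()
        for label in {term.casefold() for term in terms}
    )
    merged = []
    for qualifier, terms in vocabularies.items():
        for term in terms:
            if counts[term.casefold()] > 1:
                # Label clashes across vocabularies: disambiguate it.
                merged.append(f"{term} ({qualifier})")
            else:
                merged.append(term)
    return merged


merged_terms = qualify_homographs({
    "Anatomy": ["Appendix", "Liver"],
    "Publishing": ["Appendix", "Indexes"],
})
print(merged_terms)
```

Terms that do not clash are left unqualified, so the qualifier only appears where it carries information.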
              Jan


            • John O'Gorman
              Message 7 of 11 , Sep 29, 2009
              • 0 Attachment
                That doesn't seem like a very scalable solution...
                 
                Subject fields intersect all the time - and are more likely to do so with greater regularity in the future. Think of all the areas where 'Law' touches the ground, or 'Medicine', or heck, 'Technology'. Just ten cents' worth, but if taxonomists as a group are going to make an even more significant contribution in the future, we're going to have to find a more elegant way to manage ambiguity in mixed subject areas.
                 
                John O'
                 
                 

              • Heather Hedden
                Message 8 of 11 , Sep 29, 2009
                • 0 Attachment
                  My mistake, I was thinking of mapping projects. Merging, yes, can be
                  done in overlapping fields.
                  But Jan is correct that parenthetical qualifiers are used to
                  disambiguate homonyms in the same taxonomy, or at least the same facet.
                  If that's not elegant, then if the terms are in separate facets, the
                  parenthetical qualifiers may not be necessary in the display, but the
                  terms have to be distinguished under the hood and in the indexing.
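One way to read "distinguished under the hood" is that each concept carries its own identifier and facet, and the qualifier is shown only when two concepts in the same facet share a label. The sketch below assumes that reading; the `Concept` fields, the identifiers, and the facet names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str  # unique key used for indexing, independent of the label
    label: str
    facet: str
    qualifier: str


def display_label(concept, concepts):
    """Show the qualifier only if another concept in the same facet shares the label."""
    clash = any(
        other.concept_id != concept.concept_id
        and other.facet == concept.facet
        and other.label.casefold() == concept.label.casefold()
        for other in concepts
    )
    return f"{concept.label} ({concept.qualifier})" if clash else concept.label


# Homographs in separate facets: same display label, different identifiers.
separate = [
    Concept("c1", "Appendix", "Body parts", "Anatomy"),
    Concept("c2", "Appendix", "Document parts", "Publishing"),
]
# Homographs in one facet: the qualifier is needed in the display.
together = [
    Concept("c1", "Appendix", "Topics", "Anatomy"),
    Concept("c2", "Appendix", "Topics", "Publishing"),
]
separate_labels = [display_label(c, separate) for c in separate]
together_labels = [display_label(c, together) for c in together]
print(separate_labels)
print(together_labels)
```

Indexing against `concept_id` rather than the label keeps the two meanings apart even when their display labels are identical.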

                  -- Heather



                • John O'Gorman
                  Message 9 of 11 , Sep 29, 2009
                  • 0 Attachment
                    This is interesting. Your response uses the word 'mapping' in a way that I can't disambiguate: does 'mapping' refer to Geography or to a broader qualifier in the Taxonomy field?  :~)

                    Definitely agree that homographs have to be disambiguated somewhere; the point I was trying to make (and, as usual, not very elegantly) is that we as taxonomists have an opportunity in this rapidly shrinking environment to push the process to the fore.
                     
                    John O'
                     
                     
