
Re: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content

  • Donna M. Fritzsche
    Message 1 of 15 , Jul 20 9:37 AM
      Hi Seth and Clint,

      Additional points to consider:
      1) How are homonyms handled? GUIDs might help in this case
      2) Will there ever be a need to create crosswalks - in this case the use of GUIDs might make it easier for a game of "telephone" to be started.
      3) Will there be one authoritative store/mapping that all  databases will access?  - highly recommended
      4) Debugging/troubleshooting downstream systems - this is easier if the actual terms are used and not GUIDs
      5) Efficiency of search, sorting, indexing, etc. - there may be computational reasons for using integers/floats over strings.
      6) Storing both the ID and the term may not be a bad idea in some cases, since the IDs may make for more efficient computation (see above) and memory is generally considered inexpensive.
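
Points 1, 3 and 6 can be illustrated with a small sketch. It assumes a single in-memory registry standing in for the authoritative store; the names TermRegistry, register and tag are hypothetical and do not come from any product mentioned in this thread.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TermRegistry:
    """Hypothetical single authoritative store mapping IDs to terms."""
    labels: dict = field(default_factory=dict)  # id -> (term, scope note)

    def register(self, term, scope_note):
        # Each concept gets its own ID, so homonyms never collide.
        term_id = str(uuid.uuid4())
        self.labels[term_id] = (term, scope_note)
        return term_id

    def tag(self, term_id):
        # Store the ID and the term together (point 6): the ID is what
        # systems join on, the term is there for humans who debug.
        term, _ = self.labels[term_id]
        return {"term_id": term_id, "term": term}


registry = TermRegistry()
mercury_planet = registry.register("Mercury", "planet")
mercury_element = registry.register("Mercury", "chemical element")

print(mercury_planet != mercury_element)
print(registry.tag(mercury_planet)["term"])
```

The two "Mercury" entries share a label but not an ID, which is the homonym case from point 1; the tag carries both values so a downstream log stays readable.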

      Hope this is helpful,
      Donna Fritzsche




      -----Original Message-----
      From: Clint
      Sent: Jul 20, 2010 9:20 AM
      To: TaxoCoP@yahoogroups.com
      Subject: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content

       



      For those that haven't worked with GUIDs before, the Microsoft Product key is a good example of a GUID.

      I have been involved with taxo systems in the past that used GUIDs. They are extremely useful and, more importantly, powerful. A user can change the hierarchical name of a node without any back end work or negative consequences because the GUID is static. A user could change the name of the attribute, but feeds will not be compromised because the GUID is still the same. The Parent-Child relationship is between the GUIDs, not the terms. When communicating with DBAs there is no confusion as to what you were referring to because of a 64-bit GUID. They help to minimize the errors of minimum wage data monkeys who don't know how to spell.
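
A minimal sketch of the rename scenario described above, assuming short made-up IDs in place of real GUIDs and a hypothetical one-row downstream feed:

```python
# Node names and parent-child links are both keyed by ID (short
# stand-ins for GUIDs here), so a rename touches exactly one entry.
names = {"g1": "Appliances", "g2": "Washers", "g3": "Dryers"}
parent_of = {"g2": "g1", "g3": "g1"}  # child ID -> parent ID
feed = [{"sku": "A-100", "node": "g2"}]  # hypothetical downstream feed


def path(node_id):
    """Return the display path for a node, built from current names."""
    parts = []
    while node_id is not None:
        parts.append(names[node_id])
        node_id = parent_of.get(node_id)
    return " > ".join(reversed(parts))


before = path(feed[0]["node"])
names["g2"] = "Washing Machines"  # rename; hierarchy and feed untouched
after = path(feed[0]["node"])
print(before)
print(after)
```

The feed row and the parent-child table are identical before and after the rename; only the label lookup changes.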

      In the world of ecommerce, static taxonomies are much closer to myth than reality. GUIDs help stabilize these taxonomies and are priceless.

      "the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term" Depending on how it is called by the db having the name stored with the GUID defeats nothing. In fact unless it is an unbreakable link between the two it hurts nothing at all.

      I didn't read an actual question in your post so I am unsure if I answered it. Please let me know if there is something else that I can share.

      Clint Elmore
      Data Governance Czar and Taxonomist
      Sears Holdings

      --- In TaxoCoP@yahoogroups.com, Seth Earley <seth@...> wrote:
      >
      > I am posting this at the request of a client interested in how other organizations are addressing this issue.
      >
      > We have a client dealing with deployment of a taxonomy management tool where their downstream systems are consuming not the taxonomy term itself, but a global unique identifier that represents the term.
      >
      > The advantage of this is that one can change a taxonomy term and not have to retag content. It is also easier to handle translations of the term.
      >
      > There are a number of nuances to the use cases for changing terms but one issue is how consuming systems handle term id and the actual term.
      >
      > In some systems, the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term. If only the term id is stored, then the relationship to the term needs to be cached and refreshed when the term changes but the concept, represented by the term id, is the same.
      >
      > In some unusual cases, when a document designated as a legal record is tagged, there is a problem with this scenario (since metadata is supposed to be locked down with the legal record).
      >
      > I'd be interested in hearing how others have addressed these areas.
      >
      > Seth
      >
      > Seth Earley
      > CEO
      > _____________________________
      > EARLEY & ASSOCIATES, Inc.
      > Cell:
      > Email: seth@...<mailto:seth@...>
      > Web: www.earley.com<http://www.earley.com>
      >
      > Follow me on twitter: sethearley
      > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
      >

    • James Morris
      Message 2 of 15 , Jul 22 4:35 AM
        Hi everyone
        I agree this is a great, very relevant topic. My question is: how do you use GUIDs in a large, complex organisation that requires different parts of the business and different systems to independently manage their presentation of a concept? People seem to be saying that a GUID makes this easier, but I'm having a hard time understanding how. If the concept is the same throughout the enterprise, could they all use the same GUID? Different systems, menus and interface elements, as well as general subject tagging, might need distinct versions of the terms, either equivalences or translations. Would each concept/GUID have any number of variations/translations related to it to serve different purposes? Would the consuming system then display its local version of the term to the user, but actually store the concept GUID? Or would it be simpler to have separate vocabularies, each with its own GUIDs, and then draw relationships between the vocabularies to tie them together? The first would be more elegant in many ways, but the architecture, and how different systems would have to be programmed to interact with it, seems very complex. Or is it?

        Thanks for any thoughts,

        Jim Morris
        West Chester, PA

        On Jul 21, 2010, at 12:16 PM, Jonathan Studiman wrote:



        Good topic!  We currently use our subject thesaurus to tag content in a few downstream systems.  Currently we tag using the terms themselves rather than term IDs.  This model was necessary because we were constrained by the search architecture in these systems.  Term updates in our master version are pushed out to systems through reuse of a script that matches and updates content metadata, which can be a little more labour intensive.
         
        As we discuss upgrades to systems and search, I am advocating a move to global unique identifiers for many of the same reasons you and Clint describe.
         
        Seth, in your legal record instance, could the record and metadata be locked down, but current terms (and future updates) be cached and used for search separately in the search application itself?  This is where storing both the term and term ID in record metadata would be useful.  Perhaps that might negate the legal purpose of locking the record & metadata though. 
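
One way to picture the split suggested here, as a sketch only: the record keeps the ID and term exactly as they were when it was declared, while search consults a separate, refreshable label cache. The term values, the ID "t-42" and the function name search_label are all invented for illustration, and whether this satisfies a given legal hold is a records-management question the code cannot answer.

```python
from types import MappingProxyType

# Metadata as captured when the record was declared; read-only view.
locked_metadata = MappingProxyType(
    {"term_id": "t-42", "term": "Personnel files"}
)

# Separate, refreshable cache used only by search (hypothetical).
current_labels = {"t-42": "Human resources files"}


def search_label(record_metadata):
    """Label search should use today, looked up by the stored ID."""
    return current_labels.get(
        record_metadata["term_id"], record_metadata["term"]
    )


print(locked_metadata["term"])
print(search_label(locked_metadata))
```

If the cache has no entry for an ID, the lookup falls back to the term stored with the record, which is one reason to keep both values in the locked metadata.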
         
        Donna raises some really good points on storing ID and term too.  Thanks for those!
         
        best regards
         
         
         
         
         
        Jonathan Studiman
        416-392-4271
        Sr. Records and Information Analyst
        Policies, Programs and Standards
        Records and Information Management
        City Clerk's Division
        City of Toronto



      • laptopjockey
        Message 3 of 15 , Jul 22 8:08 AM
          The model I am working with combines two concepts to manage these types of scenarios: tandem identifiers and upstream classification against a fixed number of facets. Some of you have heard me discuss the latter before. The former idea is borrowed from the content management field with respect to version tracking.

          Think about it like a controlled, faceted vocabulary and you get the gist. You don't have to use GUIDs if, as a previous poster mentioned, your coverage is relatively narrow.

          All of Jim's scenarios point to the value of such an architecture for things like information integration, master data management, translation management and the use of something like RDF to manage associations. Oh, and the content object itself should be treated in the same way.

          Hope that helps. Shameless promotional plug: the book's out in October.

          John O'



        • marijane white
          Message 4 of 15 , Jul 22 8:34 AM
            James,

            The situation you describe would be a perfect application for a Topic Map.  Through the use of scope, Topic Maps would allow you to provide the contextual presentation in each business or system context while identifying them as the same concept by using the GUID as a subject identifier. 
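
A rough sketch of the idea, not of any Topic Maps engine or syntax: one concept ID carries several names, each tagged with the context (scope) it applies to. The scope names and labels are made-up examples.

```python
CONCEPT_ID = "c-001"  # placeholder for a real GUID

# concept ID -> {scope: name}; the scopes are made-up examples
scoped_names = {
    CONCEPT_ID: {
        "default": "Brochure",
        "marketing-portal": "Pamphlet",
        "fr": "Dépliant",
    }
}


def display_name(concept_id, scope):
    """Pick the name for a scope, falling back to the default name."""
    names = scoped_names[concept_id]
    return names.get(scope, names["default"])


# Content stores only the concept ID; each system asks for its own label.
document = {"title": "2010 product overview", "subject": CONCEPT_ID}
print(display_name(document["subject"], "marketing-portal"))
print(display_name(document["subject"], "intranet"))
```

Under this reading, each business unit manages its own entry in the scope table while every system stores the same concept ID.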


            -Marijane




          • Seth Earley
            Message 5 of 15 , Jul 25 6:46 PM

              Hi Clint,

               

              Yes, I realize I did not actually ask a question.  :)

               

              (Thanks everyone for all of the contributions to this post)

               

              This was really to throw the issue out there and get people’s input on how this is put into practice and stimulate some conversation. 

               

              I had some additional discussions (amazing how much debate internally at our organization and at the client this has stimulated) and the nuance here is “Depending on how it is called by the db having the name stored with the GUID defeats nothing. In fact unless it is an unbreakable link between the two it hurts nothing at all.”

               

              The dependency is on how downstream systems handle changes.

               

              The use case is as follows: (making up the details here but the concept is valid)

               

              As of July 1, 2010, the preferred term for materials that explain the offerings of the organization is “Brochure”

              The GUID in the taxonomy management system is 111

              Brochure = 111

               

              Brochure is used in 2 downstream applications – a portal which sits on top of a content management application (the Portal) and a Digital Asset Management System (the DAM)

               

              Both systems store the GUID with the preferred term

               

              The DAM  system has the ability to retag content.  If a term changes, the system can update the metadata. A web service provides near real time tagging values and reference data for forms from the taxonomy management system. 

              The Portal does not have the capability to retag content. Instead, it receives an XML feed each night from the taxonomy management system. This feed provides a mapping of preferred terms to GUIDs, along with tagging values and reference data for forms.

               

              Search applications use a synonym ring to perform term expansion so that in either case, entering Brochure or Pamphlet will return appropriate documents within each system
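
Term expansion through a synonym ring can be sketched as follows. The document IDs and the third tag are invented, and a real search engine would do this inside its query pipeline rather than in application code.

```python
# One synonym ring: every member is treated as equivalent at query time.
rings = [{"brochure", "pamphlet"}]


def expand(query_term):
    """Return the query term plus every term that shares a ring with it."""
    term = query_term.lower()
    expanded = {term}
    for ring in rings:
        if term in ring:
            expanded |= ring
    return expanded


documents = {
    "doc-1": "Brochure",
    "doc-2": "Pamphlet",
    "doc-3": "Invoice",
}


def search(query_term):
    """Return IDs of documents tagged with any expanded term."""
    terms = expand(query_term)
    return sorted(d for d, tag in documents.items() if tag.lower() in terms)


print(search("Brochure"))
print(search("pamphlet"))
```

Both queries return the same two documents, regardless of which term each document was tagged with.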

               

              On July 15th, the marketing group decides to start calling these materials “Pamphlet”  

              Brochure is now designated as a synonym to Pamphlet
              The GUID in the taxonomy management system for Pamphlet is still 111

              Brochure is given a new GUID of 222

              Pamphlet = 111

              Brochure = 222

               

              The DAM retags content with the new preferred term of Pamphlet and 111.  No content contains the old term of Brochure.  New content is tagged with the new preferred term and old content is retagged with the new preferred term.  The user interface will display pamphlet in faceted (attribute based) search.  

               

              In the Portal, when terms change, content is not retagged.  Drop downs are updated via an XML feed.  The new term is associated with the original preferred term GUID.

               

              Since attribute-based navigation displays the new preferred term, the GUID is used to tell the system what term to display in the UI.  Therefore, if two pieces of content are tagged with different terms but the same GUID, the UI will show the new preferred term as the navigational node.

               

              Some content will contain “Brochure” and GUID 111 and others will have “Pamphlet” and GUID 111 but the UI will display “Pamphlet”. 
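
The Portal behaviour described above can be sketched like this. The shape of the nightly feed and the document names are assumptions; only the GUID 111 and the two terms come from the example.

```python
# Nightly feed after July 15 (shape is assumed): GUID -> preferred term.
feed = {"111": "Pamphlet"}

# Portal content is never retagged, so stored terms can be stale.
portal_content = [
    {"doc": "spring-catalog", "guid": "111", "term": "Brochure"},
    {"doc": "summer-catalog", "guid": "111", "term": "Pamphlet"},
]


def facet_counts(content, preferred):
    """Group by GUID and label each group with the current preferred term."""
    counts = {}
    for item in content:
        label = preferred.get(item["guid"], item["term"])
        counts[label] = counts.get(label, 0) + 1
    return counts


print(facet_counts(portal_content, feed))
```

Both items land under a single "Pamphlet" node even though one of them still carries the stored term "Brochure".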

               

              The bottom line is that systems can handle term changes in different ways based on their architecture, functionality and UI

               

              Seth

               

              Seth Earley

              CEO
              _____________________________

              EARLEY & ASSOCIATES, Inc.
              Cell: 781-820-8080

              Email: seth@...  
              Web: www.earley.com

               

              Follow me on twitter: sethearley

              Connect with me on  LinkedIn: www.linkedin.com/in/sethearley

               


            • laptopjockey
              Message 6 of 15 , Jul 26 8:11 AM
                Hi Seth;

                With all due respect, the process you outline here is not sustainable in a widely distributed, heavily used, multi-language environment. The relationship between the number and the term must be persistent regardless of whether they are paired or not. Brochure has one permanent number and pamphlet has another... flipping the term associated with one to the other would be like changing the primary key in a database.
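
One possible reading of this objection, sketched under assumptions: every term keeps its own permanent ID, and a separate concept record points at whichever term is currently preferred. The IDs, the concept record and switch_preferred are hypothetical, and this is not necessarily the tandem-identifier model mentioned earlier in the thread.

```python
# Each term keeps its own permanent ID; these assignments never change.
term_ids = {"t-111": "Brochure", "t-222": "Pamphlet"}

# A separate concept record points at whichever term is preferred.
concept = {"id": "c-900", "preferred": "t-111", "synonyms": ["t-222"]}


def switch_preferred(concept, new_term_id):
    """Swap the preferred term without touching any term's ID."""
    old = concept["preferred"]
    synonyms = [t for t in concept["synonyms"] if t != new_term_id]
    synonyms.append(old)
    return {**concept, "preferred": new_term_id, "synonyms": synonyms}


updated = switch_preferred(concept, "t-222")
print(term_ids[updated["preferred"]])
```

Content tagged with the concept ID follows the change of preferred term, while the term-to-ID pairs behave like primary keys and stay fixed.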

                I think you've mis-stepped on this one.



              • Seth Earley
                Message 7 of 15 , Jul 27 4:57 AM

                  Hi John,

                   

                  (Thanks for sending this offline last night; as I mentioned, I am OK with some debate on this.)

                   

                  I appreciate your feedback on that. I can see that, from a data integrity perspective, an architect might not like that answer.

                   

                  I was relaying an actual situation with a customer.  The challenge is that the downstream system does not retag.  So if that is a given, then what else can be done?  I outlined the current approach, which, until content can be retagged or the system re-architected, is the best scenario. 

                   

                  Here is some additional research in the area of taxonomy maintenance:

                   

                  * One of the most difficult aspects is deciding whether a new term represents a new concept, or is a synonym for something already there

                  * Deloitte & Touche defines a “perpetual taxonomy” as a taxonomy in which each concept retains its identity and meaning in perpetuity:

                    § “…the process of “copying” an element from one namespace to another creates an entirely new element, even if the element names are identical…, the direct effect of which is to introduce taxonomy/concept version-management challenges that could impair the development, maintenance, and comparability of [concepts]”

                    § “A perpetual taxonomy... would not differentiate between “core” and “deprecated” concepts because all concepts would exist in perpetuity. If the intended purpose of deprecating a concept is to remove it from the “current” schema when it is superseded, then the perpetual taxonomy approach would be preferable because it accommodates past (superseded), present (effective), and future (issued but not fully effective) concepts simultaneously and unambiguously within a discoverable taxonomy set.”

                  * It is a Best Practice to tag with a GUID rather than explicit terms where practical to avoid re-tagging and re-indexing penalties

                  * When content has been tagged with concepts, the important distinction is whether old term tags should be retained or the content updated with the new preferred term – which does the content actually represent?

                   

                  No concept in SNOMED CT ever disappears: once created, 'it's immortal'. If SNOMED CT had started in the 16th century, the concept of malaria as 'ague' would still be there, but deprecated for current use. Over time, concepts may move from one hierarchy to another, or new hierarchies may be introduced. Genomics, for example, will cause many changes.

                   

                  The concept in a document is represented by the id.  ‘Ague’ would be 5678 back in the 16th century, and when replaced by ‘malaria’ (who knows when), the concept would still be id 5678.  ‘Ague’ would stay in the taxonomy and be given a new id.  But if one of those old rickety content management applications from the 16th century was still around, and no one got around to updating it with the latest version of Vignette, there may be some documents that would contain term id 5678 and term label “ague”.  Ye olde CMS user interface would have to deal with the disparity in terminology.  That certainly may affect performance.  Would someone have decided to retag if it became a problem?  Perhaps.  I think in the current situation, that has not been an issue.
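
A minimal sketch of how a user interface could deal with that disparity, assuming the consuming system keeps a refreshed map from term id to current preferred label. The id 5678 and the two labels come from the example above; the dictionary layout and the function name are made up for illustration and do not describe any particular CMS.

```python
# Hypothetical data: id 5678 and the labels are from the example in the thread;
# everything else (names, structure) is an assumption for illustration.
CURRENT_LABELS = {5678: "malaria"}  # refreshed from the taxonomy management system


def display_label(doc_tag, current_labels=CURRENT_LABELS):
    """Return the label to show for a stored tag.

    The id decides what is displayed. The label stored with the document is
    only a fallback for ids that the current mapping does not know about.
    """
    return current_labels.get(doc_tag["term_id"], doc_tag["label"])


old_doc = {"term_id": 5678, "label": "ague"}     # tagged long ago, never retagged
new_doc = {"term_id": 5678, "label": "malaria"}  # tagged after the rename

print(display_label(old_doc))  # malaria
print(display_label(new_doc))  # malaria
```

Under this assumption the stale label costs one lookup per displayed tag and nothing has to be retagged, which is roughly the trade-off being debated here.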

                   

                  I think it is a matter of practicality.  There will be heterogeneity of systems and processes.  In some cases, it is not possible to go back and have everything retagged perfectly.  At one client, there were something on the order of 200 million documents, and the migration and retagging effort took weeks with the system running continuously.

                   

                  The content world is messy.  With all the poorly tagged or untagged content out there, I think the scenario I outlined represents a good compromise between what is possible and what is practical.

                   

                  This is what is being done at an actual client implementation, as far as I could interpret the details.  (My contact was not completely clear about some of them, so it is possible that I misinterpreted something or relayed an incomplete explanation of the system.)  My point was that this was one way to handle the issue given that there was a conflict between current id and current concept.  I would not suggest architecting a system this way, but it is not an unreasonable approach.  Your argument about performance is a reasonable one.

                   

                  The original issue was tagging with both the id and the term.  The challenge was, if content is retagged in one system and not retagged in another, what do you do?  If starting from scratch, I would not suggest that a system be architected this way.  However, this is not a perfect world when it comes to metadata and tagging.  This is at least better than no mapping of terminology and no retagging.

                   

                  Thanks all for the contributions.  I still need to get through them, so apologies if I have not responded to all the thoughtful responses.

                   

                  Seth

                   

                  Seth Earley

                  CEO
                  _____________________________

                  EARLEY & ASSOCIATES, Inc.
                  Cell: 781-820-8080

                  Email: seth@...  
                  Web: www.earley.com

                   

                  Follow me on twitter: sethearley

                  Connect with me on  LinkedIn: www.linkedin.com/in/sethearley

                   

                  From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com] On Behalf Of laptopjockey
                  Sent: Monday, July 26, 2010 10:12 AM
                  To: TaxoCoP@yahoogroups.com
                  Subject: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content

                   

                   

                  Hi Seth;

                  With all due respect, the process you outline here is not sustainable in a widely distributed, heavily used, multi-language environment. The relationship between the number and the term must be persistent regardless whether they are paired or not. Brochure has one permanent number and pamphlet has another...flipping the term associated with one to the other would be like changing the primary key in a database.

                  I think you've mis-stepped on this one.

                  --- In TaxoCoP@yahoogroups.com, Seth Earley <seth@...> wrote:

                  >
                  > Hi Clint,
                  >
                  > Yes, I realize I did not actually ask a question. :)
                  >
                  > (Thanks everyone for all of the contributions to this post)
                  >
                  > This was really to throw the issue out there and get people's input on how this is put into practice and stimulate some conversation.
                  >
                  > I had some additional discussions (amazing how much debate internally at our organization and at the client this has stimulated) and the nuance here is "Depending on how it is called by the db having the name stored with the GUID defeats nothing. In fact unless it is an unbreakable link between the two it hurts nothing at all."
                  >
                  > The dependency is on how downstream systems handle changes.
                  >
                  > The use case is as follows: (making up the details here but the concept is valid)
                  >
                  > As of July 1, 2010, the preferred term for materials that explain the offerings of the organization is "Brochure"
                  > The GUID in the taxonomy management system is 111
                  > Brochure = 111
                  >
                  > Brochure is used in 2 downstream applications - a portal which sits on top of a content management application (the Portal) and a Digital Asset Management System (the DAM)
                  >
                  > Both systems store the GUID with the preferred term
                  >
                  > The DAM system has the ability to retag content. If a term changes, the system can update the metadata. A web service provides near real time tagging values and reference data for forms from the taxonomy management system.
                  > The Portal does not have the capability to retag content. Instead, an XML feed each night from the taxonomy management system. This provides a mapping of preferred terms to GUID and tagging values and reference data for forms.
                  >
                  > Search applications use a synonym ring to perform term expansion so that in either case, entering Brochure or Pamphlet will return appropriate documents within each system
                  >
                  > On July 15th, the marketing group decides to start calling these materials "Pamphlet"
                  > Brochure is now designated as a synonym to Pamphlet
                  > The GUID in the taxonomy management system for Pamphlet is still 111
                  > Brochure is given a new GUID of 222
                  > Pamphlet = 111
                  > Brochure = 222
                  >
                  > The DAM retags content with the new preferred term of Pamphlet and 111. No content contains the old term of Brochure. New content is tagged with the new preferred term and old content is retagged with the new preferred term. The user interface will display pamphlet in faceted (attribute based) search.
                  >
                  > In the Portal, when terms change, content is not retagged. Drop downs are updated via an XML feed. The new term is associated with the original preferred term GUID.
                  >
                  > Since attribute based navigation is displays the new preferred term, the GUID is used to tell the system what term to display in the UI. Therefore if two pieces of content are tagged with different terms but the same GUID, the UI will show the new preferred term as the navigational node.
                  >
                  > Some content will contain "Brochure" and GUID 111 and others will have "Pamphlet" and GUID 111 but the UI will display "Pamphlet".
                  >
                  > The bottom line is that systems can handle term changes in different ways based on their architecture, functionality and UI
                  >
                  > Seth
                  >
                  > Seth Earley
                  > CEO
                  > _____________________________
                  > EARLEY & ASSOCIATES, Inc.
                  > Cell: 781-820-8080
                  > Email: seth@...<mailto:seth@...>
                  > Web: www.earley.com<http://www.earley.com>
                  >
                  > Follow me on twitter: sethearley
                  > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
                  >

                • laptopjockey
                  Message 8 of 15 , Jul 27 7:55 AM
                    Hi Seth;

                    There are some fairly major issues here, the bulk of which highlight the on-going disconnect between what is required to make applications work and why following those requirements makes it nearly impossible to 'humanize' information in a reasonable way across those same applications.

                    The common sticking point is that a computer cannot disambiguate two identical terms. You can design and build the most elegant taxonomy on the planet, but if you don't have unique names and/or terms you're hooped. Enter the identifier. Now you can say 111 is different than 222 even if the terms are identical: 'bat' the noun is different than 'bat' the verb; 'bat' the noun.animal is different than 'bat' the noun.sportsequipment. Stupid applications (like indexes), however, tend to see 'bat' all the same way, which brings up the second sticking point.
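
To make the 'bat' point concrete, here is a small sketch comparing an index keyed on the term string with one keyed on the identifier. The ids 111 and 222 echo the numbers used above; the id 333, the sense names as dictionary values, and the document names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical concept table: three concepts that share one label.
CONCEPTS = {
    111: {"label": "bat", "sense": "noun.animal"},
    222: {"label": "bat", "sense": "noun.sportsequipment"},
    333: {"label": "bat", "sense": "verb"},
}

# Hypothetical tagged content: (document name, concept id).
TAGGED = [
    ("cave-survey.pdf", 111),
    ("equipment-list.pdf", 222),
    ("coaching-notes.pdf", 333),
]


def index_by_label(tagged, concepts):
    """Index documents under the term string, as a naive index would."""
    index = defaultdict(list)
    for doc, concept_id in tagged:
        index[concepts[concept_id]["label"]].append(doc)
    return dict(index)


def index_by_id(tagged):
    """Index documents under the concept identifier."""
    index = defaultdict(list)
    for doc, concept_id in tagged:
        index[concept_id].append(doc)
    return dict(index)


by_label = index_by_label(TAGGED, CONCEPTS)
by_id = index_by_id(TAGGED)

print(by_label["bat"])  # all three documents, the senses are merged
print(by_id[111])       # only the document about the animal
```

The label index cannot tell the three senses apart, while the id index keeps them separate at the cost of needing the concept table whenever something has to be shown to a person.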

                    There are two challenges being talked about at the same time here: the nature and status of the term, and the nature and status of the relationship between the tag and a piece of content. I can deprecate a term like 'ague' or I can deprecate the relationship between the tag and the content to which it is associated. If I make 'malaria' an equivalent of 'ague' I shouldn't need to retag anything. A search for the term 'malaria' would pick up that term and 'ague' regardless. Managing these two challenges separately means that I can keep (if I want to) the term 'ague' current in the taxonomy but deprecated in the content.
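
A rough sketch of the equivalence idea, assuming search expands the query term through a synonym ring before matching tags. The ring contents follow the malaria example; the document names and function names are hypothetical.

```python
# Hypothetical equivalence ring: terms declared equivalent in the taxonomy.
SYNONYM_RINGS = [{"malaria", "ague"}]

# Hypothetical content, each document stored with the tags it was given at the time.
DOCS = {
    "parish-register-1590": {"ague"},
    "who-report-2009": {"malaria"},
    "travel-guide": {"cholera"},
}


def expand(term, rings=SYNONYM_RINGS):
    """Return the term together with every term declared equivalent to it."""
    for ring in rings:
        if term in ring:
            return set(ring)
    return {term}


def search(term, docs=DOCS):
    """Match documents against the expanded term set; no document is retagged."""
    wanted = expand(term)
    return sorted(name for name, tags in docs.items() if tags & wanted)


print(search("malaria"))  # ['parish-register-1590', 'who-report-2009']
print(search("ague"))     # ['parish-register-1590', 'who-report-2009']
```

The tags on the old document are never touched; only the ring changes, which keeps the status of the term separate from the status of the tag-to-content relationship.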

                    Someone brought up the legal implications of changing tags and tag ids and I think that is probably one of the more compelling arguments for reconsidering your approach. In accounting, there are things called journal entries to make adjustments to existing assignment of costs and revenues. In my opinion, assignment of tags should work the same way.
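
One way to read the journal-entry analogy is an append-only log of tag assignments, where a change is recorded as a new entry instead of overwriting the old one. The sketch below is only an illustration under that assumption; the record fields, the action names, and the document name are hypothetical, and the ids and dates are borrowed from the Brochure/Pamphlet example quoted in this thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TagEntry:
    """One immutable journal line: a tag assigned to or revoked from a document."""
    doc: str
    term_id: int
    action: str      # "assign" or "revoke"
    effective: date


def tags_as_of(journal, doc, when):
    """Replay the journal up to `when` and return the term ids in force for `doc`."""
    active = set()
    for entry in sorted(journal, key=lambda e: e.effective):
        if entry.doc != doc or entry.effective > when:
            continue
        if entry.action == "assign":
            active.add(entry.term_id)
        elif entry.action == "revoke":
            active.discard(entry.term_id)
    return active


# Nothing is edited in place: the adjustment on July 15 is two new entries.
JOURNAL = [
    TagEntry("contract-17", 111, "assign", date(2010, 7, 1)),
    TagEntry("contract-17", 111, "revoke", date(2010, 7, 15)),
    TagEntry("contract-17", 222, "assign", date(2010, 7, 15)),
]

print(tags_as_of(JOURNAL, "contract-17", date(2010, 7, 10)))  # {111}
print(tags_as_of(JOURNAL, "contract-17", date(2010, 7, 20)))  # {222}
```

Because earlier entries are never changed, the tagging of a record on any past date can be reconstructed, which is the property the legal-record concern seems to call for.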






                    --- In TaxoCoP@yahoogroups.com, Seth Earley <seth@...> wrote:
                    >
                    > Hi John,
                    >
                    > (Thanks for sending this off line last night - as I mentioned, I am OK with some debate on this. )
                    >
                    > I appreciate your feedback on that. I can see from a data integrity perspective an architect might not like that answer.
                    >
                    > I was relaying an actual situation with a customer. The challenge is that the downstream system does not retag. So if that is a given, then what else can be done? I outlined the current approach, which, until content can be retagged or the system re-architected, is the best scenario.
                    >
                    > Here is some additional research in the area of taxonomy maintenance:
                    >
                    > * One of the most difficult aspects is deciding whether a new term represents a new concept, or is a synonym for something already there
                    > * Deloitte & Touche defines a "perpetual taxonomy" as a taxonomy in which each concept retains its identity and meaning in perpetuity:
                    > § "...the process of "copying" an element from one namespace to another creates an entirely new element, even if the element names are identical..., the direct effect of which is to introduce taxonomy/concept version-management challenges that could impair the development, maintenance, and comparability of [concepts]"
                    > § "A perpetual taxonomy... would not differentiate between "core" and "deprecated" concepts because all concepts would exist in perpetuity. If the intended purpose of deprecating a concept is to remove it from the "current" schema when it is superseded, then the perpetual taxonomy approach would be preferable because it accommodates past (superseded), present (effective), and future (issued but not fully effective) concepts simultaneously and unambiguously within a discoverable taxonomy set."
                    > * It is a Best Practice to tag with a GUID rather than explicit terms where practical to avoid re-tagging and re-indexing penalties
                    > * When content has been tagged with concepts, the important distinction is whether old term tags should be retained or the content updated with the new preferred term - which does the content actually represent?
                    >
                    > No concept in SNOMED CT ever disappears: once created, 'it's immortal'. If SNOMED CT had started in the 16th century, the concept of malaria as 'ague' would still be there, but deprecated for current use. Over time, concepts may move from one hierarchy to another, or new hierarchies may be introduced. Genomics, for example, will cause many changes.
                    >
                    > The concept in a document is represented by the id. 'Agu' would be 5678 back in the 16th century and when replaced by 'malaria' in (who knows when), it would be id 5678. Agu would stay in the taxonomy and be given a new id. But if one of those old rickety content management applications from the 16th century was still around, and no one got around to updating it with the latest version of Vignette, there may be some documents that would contain term id 5678 and term label "agu". Ye olde CMS user interface would have to deal with the disparity in terminology. That certainly may affect performance. Would someone have decided to retag if it became a problem? Perhaps. I think in the current situation , that has not been an issue.
                    >
                    > I think it is a matter of practicality. There will be heterogeneity of systems and processes. In some cases, it is not possible to go back and have everything retagged perfectly. At one client, there were something on the order of 200 million documents. A migration and retagging effort went on for weeks of the system running on a continuous basis.
                    >
                    > The content world is messy. With all l the poorly tagged or untagged content out there, I think the scenario I outlined represents a good compromise between what is possible and what is practical.
                    >
                    > This is what is being done at an actual client implementation as far as I could interpret the details (there were some my contact was not completely clear about so it is possible I misinterpreted something or relayed what they knew that did not explain the system completely) My point was that this was one way to handle the issue given that there was a conflict between current id and current concept. I would not suggest architecting a system this way, but it is not an unreasonable approach. Your argument about performance is a reasonable one.
                    >
                    > The original issue was tagging with both the id and the term. The challenge was, if content is retagged in one system, and not retagged in another, what do you do? If starting from scratch, I would not suggest that a system be architected this way. However this is not a perfect world when it comes to metadata and tagging. This is at least better than no mapping of terminology and no retagging.
                    >
                    > Thanks all for the contributions. I still need to get through then so apologies if I have not responded to all the thoughtful responses.
                    >
                    > Seth
                    >
                    > Seth Earley
                    > CEO
                    > _____________________________
                    > EARLEY & ASSOCIATES, Inc.
                    > Cell: 781-820-8080
                    > Email: seth@...<mailto:seth@...>
                    > Web: www.earley.com<http://www.earley.com>
                    >
                    > Follow me on twitter: sethearley
                    > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
                    >
                    > From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com] On Behalf Of laptopjockey
                    > Sent: Monday, July 26, 2010 10:12 AM
                    > To: TaxoCoP@yahoogroups.com
                    > Subject: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content
                    >
                    >
                    >
                    > Hi Seth;
                    >
                    > With all due respect, the process you outline here is not sustainable in a widely distributed, heavily used, multi-language environment. The relationship between the number and the term must be persistent regardless whether they are paired or not. Brochure has one permanent number and pamphlet has another...flipping the term associated with one to the other would be like changing the primary key in a database.
                    >
                    > I think you've mis-stepped on this one.
                    >
                    > --- In TaxoCoP@yahoogroups.com<mailto:TaxoCoP%40yahoogroups.com>, Seth Earley <seth@> wrote:
                    > >
                    > > Hi Clint,
                    > >
                    > > Yes, I realize I did not actually ask a question. :)
                    > >
                    > > (Thanks everyone for all of the contributions to this post)
                    > >
                    > > This was really to throw the issue out there and get people's input on how this is put into practice and stimulate some conversation.
                    > >
                    > > I had some additional discussions (amazing how much debate internally at our organization and at the client this has stimulated) and the nuance here is "Depending on how it is called by the db having the name stored with the GUID defeats nothing. In fact unless it is an unbreakable link between the two it hurts nothing at all."
                    > >
                    > > The dependency is on how downstream systems handle changes.
                    > >
                    > > The use case is as follows: (making up the details here but the concept is valid)
                    > >
                    > > As of July 1, 2010, the preferred term for materials that explain the offerings of the organization is "Brochure"
                    > > The GUID in the taxonomy management system is 111
                    > > Brochure = 111
                    > >
                    > > Brochure is used in 2 downstream applications - a portal which sits on top of a content management application (the Portal) and a Digital Asset Management System (the DAM)
                    > >
                    > > Both systems store the GUID with the preferred term
                    > >
                    > > The DAM system has the ability to retag content. If a term changes, the system can update the metadata. A web service provides near real time tagging values and reference data for forms from the taxonomy management system.
                    > > The Portal does not have the capability to retag content. Instead, an XML feed each night from the taxonomy management system. This provides a mapping of preferred terms to GUID and tagging values and reference data for forms.
                    > >
                    > > Search applications use a synonym ring to perform term expansion so that in either case, entering Brochure or Pamphlet will return appropriate documents within each system
                    > >
                    > > On July 15th, the marketing group decides to start calling these materials "Pamphlet"
                    > > Brochure is now designated as a synonym to Pamphlet
                    > > The GUID in the taxonomy management system for Pamphlet is still 111
                    > > Brochure is given a new GUID of 222
                    > > Pamphlet = 111
                    > > Brochure = 222
                    > >
                    > > The DAM retags content with the new preferred term of Pamphlet and 111. No content contains the old term of Brochure. New content is tagged with the new preferred term and old content is retagged with the new preferred term. The user interface will display pamphlet in faceted (attribute based) search.
                    > >
                    > > In the Portal, when terms change, content is not retagged. Drop downs are updated via an XML feed. The new term is associated with the original preferred term GUID.
                    > >
                    > > Since attribute based navigation is displays the new preferred term, the GUID is used to tell the system what term to display in the UI. Therefore if two pieces of content are tagged with different terms but the same GUID, the UI will show the new preferred term as the navigational node.
                    > >
                    > > Some content will contain "Brochure" and GUID 111 and others will have "Pamphlet" and GUID 111 but the UI will display "Pamphlet".
                    > >
                    > > The bottom line is that systems can handle term changes in different ways based on their architecture, functionality and UI
                    > >
                    > > Seth
                    > >
                    > > Seth Earley
                    > > CEO
                    > > _____________________________
                    > > EARLEY & ASSOCIATES, Inc.
                    > > Cell: 781-820-8080
                    > > Email: seth@<mailto:seth@>
                    > > Web: www.earley.com<http://www.earley.com>
                    > >
                    > > Follow me on twitter: sethearley
                    > > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
                    > >
                    > > From: TaxoCoP@yahoogroups.com<mailto:TaxoCoP%40yahoogroups.com> [mailto:TaxoCoP@yahoogroups.com<mailto:TaxoCoP%40yahoogroups.com>] On Behalf Of Clint
                    > > Sent: Tuesday, July 20, 2010 8:21 AM
                    > > To: TaxoCoP@yahoogroups.com<mailto:TaxoCoP%40yahoogroups.com>
                    > > Subject: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content
                    > >
                    > >
                    > >
                    > >
                    > > For those that haven't worked with GUIDs before, the Microsoft Product key is a good example of a GUID.
                    > >
                    > > I have been involved with taxo systems in the past that used GUIDs. They are extremely useful and more importantly, powerful. A user can change the hierarchical name of node without any back end work or negative consequences because the GUID is static. A user could change the name of the attribute, but feeds will not be compromised because the GUID is still the same. The Parent-Child relationship is between the GUIDs not the terms. When communicating with DBAs there is no confusion as too what you were referring to because of a 64-bit GUID. They help to minimize the errors of minimum wage data monkeys who don't know how to spell.
                    > >
                    > > In the world of ecommerce, static taxonomies are much closer to myth than reality. GUIDs help stabilize these taxonomies and are priceless.
                    > >
                    > > "the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term" Depending on how it is called by the db, having the name stored with the GUID defeats nothing. In fact, unless it is an unbreakable link between the two, it hurts nothing at all.
                    > >
                    > > I didn't read an actual question in your post so I am unsure if I answered it. Please let me know if there is something else that I can share
                    > >
                    > > Clint Elmore
                    > > Data Governance Czar and Taxonomist
                    > > Sears Holdings
                    > >
                    > > --- In TaxoCoP@yahoogroups.com, Seth Earley <seth@> wrote:
                    > > >
                    > > > I am posting this at the request of a client interested in how other organizations are addressing this issue.
                    > > >
                    > > > We have a client dealing with deployment of a taxonomy management tool where their downstream systems are consuming not the taxonomy term itself, but a global unique identifier that represents the term.
                    > > >
                    > > > The advantage of this is that one can change a taxonomy term and not have to retag content. It is also easier to handle translations of the term.
                    > > >
                    > > > There are a number of nuances to the use cases for changing terms but one issue is how consuming systems handle term id and the actual term.
                    > > >
                    > > > In some systems, the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term. If only the term id is stored, then the relationship to the term needs to be cached and refreshed when the term changes but the concept, represented by the term id, is the same.
                    > > >
                    > > > In some unusual cases, when a document designated as a legal record is tagged, there is a problem with this scenario (since metadata is supposed to be locked down with the legal record).
                    > > >
                    > > > I'd be interested in hearing how others have addressed these areas.
                    > > >
                    > > > Seth
                    > > >
                    > > > Seth Earley
                    > > > CEO
                    > > > _____________________________
                    > > > EARLEY & ASSOCIATES, Inc.
                    > > > Cell:
                    > > > Email: seth@<mailto:seth@>
                    > > > Web: www.earley.com<http://www.earley.com>
                    > > >
                    > > > Follow me on twitter: sethearley
                    > > > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
                    > > >
                    > >
                    >
                  • Donna M. Fritzsche
                    Hi folks, My apologies for this email being sent out twice. (Not sure why it showed up so late - I had resent it after it didn't show up on the original send
                    Message 9 of 15 , Jul 27 8:41 AM
                    • 0 Attachment
                      Hi folks,
                      My apologies for this email being sent out twice.
                       (Not sure why it showed up so late - I had resent it after it didn't show up on the original send date).
                      - Donna


                      -----Original Message-----
                      From: "Donna M. Fritzsche"
                      Sent: Jul 20, 2010 12:37 PM
                      To: TaxoCoP@yahoogroups.com, TaxoCoP@yahoogroups.com
                      Subject: Re: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content

                       

                      Hi Seth and Clint,

                      Additional points to consider:
                      1) How are homonyms handled? GUIDs might help in this case
                      2) Will there ever be a need to create crosswalks - in this case the use of GUIDs might make it easier for a game of "telephone" to be started.
                      3) Will there be one authoritative store/mapping that all  databases will access?  - highly recommended
                      4) Debugging/troubleshooting downstream systems - this is easier if the actual terms are used and not GUIDs
                      5) Efficiency of search, sorting, indexing, etc - there may be computational reasons for using integers/floats over strings, etc.
                      6) Storing the id and the term may not be a bad idea in some cases, since the IDs may make for a more efficient computation (see above) and memory is generally considered inexpensive
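
A minimal Python sketch of points 1, 3 and 6 above, under stated assumptions: the `TermStore` class, its method names, and the "Mercury" example are hypothetical and do not come from any tool mentioned in this thread. It only illustrates how one authoritative ID-to-term mapping can keep homonyms apart while a tag still carries the readable term next to its ID.

```python
import uuid
from collections import defaultdict


class TermStore:
    """Hypothetical single authoritative store mapping term IDs to labels."""

    def __init__(self):
        self._labels = {}                    # term_id -> label
        self._by_label = defaultdict(set)    # label -> set of term_ids

    def add(self, label):
        """Register a new concept and return its generated ID."""
        term_id = str(uuid.uuid4())
        self._labels[term_id] = label
        self._by_label[label].add(term_id)
        return term_id

    def label(self, term_id):
        """Return the current label for a term ID."""
        return self._labels[term_id]

    def ids_for(self, label):
        """Return every ID that shares this label (more than one for homonyms)."""
        return set(self._by_label.get(label, set()))


store = TermStore()
planet = store.add("Mercury")     # the planet
element = store.add("Mercury")    # the chemical element, same spelling

# Point 6: a tag that stores both the ID and the term for easier debugging.
tag = {"term_id": planet, "term": store.label(planet)}
```

Looking up "Mercury" by label returns two IDs, which is the homonym case in point 1; looking up by ID is never ambiguous.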

                      Hope this is helpful,
                      Donna Fritzsche




                      -----Original Message-----
                      From: Clint
                      Sent: Jul 20, 2010 9:20 AM
                      To: TaxoCoP@yahoogroups.com
                      Subject: [TaxoCoP] Re: Use of GUID's instead of actual terms for tagging content

                       



                      For those that haven't worked with GUIDs before, the Microsoft Product key is a good example of a GUID.

                      I have been involved with taxo systems in the past that used GUIDs. They are extremely useful and, more importantly, powerful. A user can change the hierarchical name of a node without any back-end work or negative consequences because the GUID is static. A user could change the name of the attribute, but feeds will not be compromised because the GUID is still the same. The Parent-Child relationship is between the GUIDs, not the terms. When communicating with DBAs there is no confusion as to what you were referring to because of a 128-bit GUID. They help to minimize the errors of minimum wage data monkeys who don't know how to spell.
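
The renaming behavior described above can be sketched roughly as follows. This is an illustration only: the `nodes` dictionary, the helper functions, and the sample category and SKU names are assumptions, not the design of any system named in this thread. Parent-child links and content tags both point at IDs, so a rename touches one label and nothing else.

```python
import uuid

# Hypothetical in-memory taxonomy: node_id -> {"label": ..., "parent": ...}
nodes = {}


def add_node(label, parent=None):
    """Create a node with a generated ID; the parent is referenced by ID."""
    node_id = str(uuid.uuid4())
    nodes[node_id] = {"label": label, "parent": parent}
    return node_id


def rename(node_id, new_label):
    """Change only the display label; IDs and relationships stay as they are."""
    nodes[node_id]["label"] = new_label


def path(node_id):
    """Resolve the readable hierarchy at read time by walking parent IDs."""
    parts = []
    while node_id is not None:
        parts.append(nodes[node_id]["label"])
        node_id = nodes[node_id]["parent"]
    return " > ".join(reversed(parts))


appliances = add_node("Appliances")
fridges = add_node("Refrigerators", parent=appliances)

# Content is tagged with the ID, not the term.
product_tags = {"sku-123": fridges}

# Renaming the parent requires no retagging of any content.
rename(appliances, "Home Appliances")
```

After the rename, `product_tags` is unchanged, and `path(product_tags["sku-123"])` picks up the new parent name because the label is resolved only when it is displayed.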

                      In the world of ecommerce, static taxonomies are much closer to myth than reality. GUIDs help stabilize these taxonomies and are priceless.

                      "the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term" Depending on how it is called by the db, having the name stored with the GUID defeats nothing. In fact, unless it is an unbreakable link between the two, it hurts nothing at all.
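
One way to read the point above is that a stored name is harmless when it is treated as a refreshable copy rather than a permanent binding. The sketch below assumes a hypothetical `authoritative` mapping and tag records with a `locked` flag; the flag is an assumption added to cover the legal-record case raised in the original post, where metadata must stay frozen.

```python
# Hypothetical authoritative mapping: term_id -> current label
authoritative = {"t-001": "Cell Phones"}

# Each tag stores the ID plus a cached copy of the label.
tags = [
    {"doc": "doc-1", "term_id": "t-001", "label": "Cell Phones", "locked": False},
    {"doc": "doc-2", "term_id": "t-001", "label": "Cell Phones", "locked": True},
]


def refresh_labels(tags, authoritative):
    """Update stale cached labels, skipping locked records; return the count changed."""
    changed = 0
    for tag in tags:
        current = authoritative[tag["term_id"]]
        if not tag["locked"] and tag["label"] != current:
            tag["label"] = current
            changed += 1
    return changed


# The term is renamed centrally; the concept and its ID stay the same.
authoritative["t-001"] = "Mobile Phones"
updated = refresh_labels(tags, authoritative)
```

Here the unlocked tag picks up the new label, while the locked one keeps the label it had when the record was tagged. Whether that is the right handling for legal records is a policy decision this sketch does not settle.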

                      I didn't read an actual question in your post so I am unsure if I answered it. Please let me know if there is something else that I can share

                      Clint Elmore
                      Data Governance Czar and Taxonomist
                      Sears Holdings

                      --- In TaxoCoP@yahoogroups.com, Seth Earley <seth@...> wrote:
                      >
                      > I am posting this at the request of a client interested in how other organizations are addressing this issue.
                      >
                      > We have a client dealing with deployment of a taxonomy management tool where their downstream systems are consuming not the taxonomy term itself, but a global unique identifier that represents the term.
                      >
                      > The advantage of this is that one can change a taxonomy term and not have to retag content. It is also easier to handle translations of the term.
                      >
                      > There are a number of nuances to the use cases for changing terms but one issue is how consuming systems handle term id and the actual term.
                      >
                      > In some systems, the term and the term id is stored which kind of defeats the purpose of having a term id instead of the term. If only the term id is stored, then the relationship to the term needs to be cached and refreshed when the term changes but the concept, represented by the term id, is the same.
                      >
                      > In some unusual cases, when a document designated as a legal record is tagged, there is a problem with this scenario (since metadata is supposed to be locked down with the legal record).
                      >
                      > I'd be interested in hearing how others have addressed these areas.
                      >
                      > Seth
                      >
                      > Seth Earley
                      > CEO
                      > _____________________________
                      > EARLEY & ASSOCIATES, Inc.
                      > Cell:
                      > Email: seth@...<mailto:seth@...>
                      > Web: www.earley.com<http://www.earley.com>
                      >
                      > Follow me on twitter: sethearley
                      > Connect with me on LinkedIn: www.linkedin.com/in/sethearley
                      >
