
Re: [TaxoCoP] tagging done by content authors vs professional indexers

  • Christine Connors
    Message 1 of 24 , Mar 26, 2009
      Hello!

      I'm certain I've collected appropriate articles over the years, but my hard drive crashed last September and I fear they're all on backup CD-ROMs and not currently indexed by me. The keywords coming to mind are "author generated metadata."

      One interesting article that I was able to find online can be found at http://journals.tdl.org/jodi/article/view/42/45.

      Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization
      Jane Greenberg, Maria Cristina Pattuelli, Bijan Parsia and W. Davenport Robertson

      It gives results of a study regarding author generated metadata and how it compared to the work of "metadata professionals" using Dublin Core.

      Hope you find it useful!
      Christine Connors
      Global Director, Semantic Technology Solutions
      Dow Jones


      From: nicoleelger <nicoleelger@...>
      To: TaxoCoP@yahoogroups.com
      Sent: Thursday, March 26, 2009 2:32:47 PM
      Subject: [TaxoCoP] tagging done by content authors vs professional indexers

      hi, does anyone know of articles/resources that compare the pros and cons of tagging (using a controlled vocab) done by content authors vs. professional indexers? the content needs tagging to support online display in a variety of formats.

      as someone with an MLIS (who, btw, isn't offering to do tagging because i have another role in this org), i can see lots of cons to having the authors tag, but would like a source other than myself for making this argument.

      the big counter points i'm hearing are "we can't afford it" and "it will take too long" if we do anything other than tagging by the content authors.

      thanks!


    • terrance1951
      Message 2 of 24 , Mar 27, 2009
        I don't know of any studies about this, but I do have experience with tagging online articles by authors and dedicated indexers. We once had authors tag the documents they wrote with search terms from a word list, and the results were poor, to put it kindly.

        A significant problem was that not all authors were really interested in tagging articles. The result was that the quality of indexing varied widely. The quality was so poor that our clients complained about how difficult and inconsistent search results were using our tagged terms.

        We now index from a taxonomy I have built using trained indexers. The difference is dramatic. At least one client who was threatening to drop us because of poor search results is now very happy.

        Terry Smith

      • Ron Daniel
        Message 3 of 24 , Mar 27, 2009

          I asked my friend, Chris Green, about his experiences with this issue at a major magazine publisher. That publisher will remain nameless; we will refer to it as XYZ Inc. Chris sent the following reply to me, which emphasizes some of Heather’s points and makes some new ones. He does not know of any sources to cite. He gave me permission to publish this to the group, so thanks for that, Chris. [I’ve bcc’ed him on this message in order to keep his email address away from spammers.]

           

          Ron

          - - - - -


          About Tagging

          •  Is having some tagging better than having no tagging?  Yes, if it's well done, but poor tagging can be, in many cases, much worse than no tagging.  This cannot be overemphasized.

          •  Tagging should be done in such a way that it serves not only the immediate needs of online content delivery, but also the needs of long-term archiving and of multiple constituencies.

          •  If, for some reason, the situation demands that tagging must be done either locally, by writers and editors, or centrally by dedicated professionals, but not by both, I would advocate using the centralized professionals.


          Rationales for Resistance

          •  At XYZ Inc., there was consistent, strong resistance to the idea of tagging within the Editorial Departments.  These departments included writers, researchers, editors, and copy operations.  The resistance was universal among these groups, and it persisted for at least a decade across all of the titles.  

          While the Editorial groups arguably would have been the greatest beneficiaries of a well-organized, accessible, and well-tagged text archive, their arguments against having them act as taggers included:

          a) It was all they could do to get the print magazines out every week, and they had no available bandwidth for any additional work

          b) They had no knowledge, training, or experience at tagging, and they felt that tagging was an important function that should not be an afterthought left to people who were interested only in getting it over with as quickly as possible

          c) They were the frontline people who created the content that brought in the money and the readers, and whatever happened or didn't happen to the content after it was published was someone else's problem

          d) While I don't think it ever got far enough for the Writer's Guild to get involved, I suspect the union would have raised serious objections to re-defining the roles of their members without additional training and compensation


          Competing Requirements

          •  In many organizations, there are competing requirements for the tagging of new content.  On the one hand, tagging must be done as quickly as possible, even at the expense of reduced quality, in order to publish the content online promptly and remain competitive.  This argues for local tagging within the editorial process by writers and editors.

          On the other hand, it is also important to tag the content in such a way that it is suitably described for long-term archiving and for use by a much wider range of groups than visitors to the website.

          Generally speaking, the need for speed wins, and the needs of long-term archiving are frequently ignored as irrelevant.  I believe short-term and long-term tagging need not be mutually exclusive, and that ignoring or dismissing the importance of tagging for long-term archiving is misguided, and a good example of being penny wise and pound foolish.


          Support for Multiple Constituencies

          •  Not surprisingly, editors and writers, the most likely people to do tagging, typically have a very narrow point of view that is based on their own self-interest.  They would quite naturally tend to tag articles according to the way they thought they might search for them later.  This is the editorial equivalent of Steinberg's famous cartoon of a New Yorker's View of the World.

          One of the desirable traits that a trained librarian brings to the task of tagging is an awareness of both the short- and long-term needs of multiple constituencies, including not just the editorial groups but also groups such as web editors and users, the legal department, the publicity and promotion department, the ad sales department, the production groups, senior management, and business partners.  This broadly-useful tagging can dramatically increase the value of an archive and its ROI.

          The fact is that editors and writers are ill-equipped to address these needs effectively, and in most cases, they would be the first to admit it.


          Granularity and Consistency

          •  Tagging by hand, even by librarians, requires a relatively small taxonomy of subject terms if the tagging is to be accomplished in a reasonable amount of time, and, as always, Time is Money.

          Having said that, though, a taxonomy containing a maximum of 100-200 terms and suitable for manual tagging cannot adequately describe articles within corpuses such as those of [names of 2 major magazines deleted].  The granularity is simply too coarse, and the tagging cannot provide the degree of targeted searching that those who go to the trouble of doing this kind of tagging require and expect, especially when supporting multiple constituencies.

          The result sets are often too large to be useful, and the difference between a keyword (subject term) search and a plain old full-text search is frequently not significant enough to warrant the time and expense of tagging.

          Also, the time available to tag each article is relatively short, both for production scheduling reasons and for cost containment, and this typically results in only one or two subject tags per article, which is often far fewer than a more appropriate number.  This limited tagging only compounds the problem with overly coarse granularity.

          One of the biggest problems with manual tagging, even by dedicated professionals using a small taxonomy, is a lack of consistency.  Consistent tagging is absolutely crucial if one is to realize the potential value of subject term tagging.

          The degree of inconsistency created by a team of 8-10 dedicated professionals using a small taxonomy results in some problems with searching, but it will not undermine the value of the archive. 

          However, using many non-professional, non-dedicated taggers, such as editors and writers, is quite likely to add a level of inconsistency that could represent a serious liability.  This is especially true if the writers and editors are spread across multiple publications within the enterprise.
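
A minimal sketch of how the consistency discussed above might be measured, assuming several taggers have independently tagged the same article. The function names and the sample tags are hypothetical, and mean pairwise Jaccard overlap is only one simple choice of agreement measure, not something the original message prescribes.

```python
from itertools import combinations


def jaccard(tags_a, tags_b):
    """Overlap between two tag sets: 1.0 = identical, 0.0 = nothing shared."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 1.0  # two empty tag sets agree trivially
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(tag_sets):
    """Average Jaccard overlap over every pair of taggers for one article."""
    pairs = list(combinations(tag_sets, 2))
    if not pairs:
        raise ValueError("need tag sets from at least two taggers")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tags applied by three taggers to the same article.
by_tagger = [
    {"elections", "economy"},
    {"elections", "economy"},
    {"elections", "taxes"},
]
print(round(mean_pairwise_agreement(by_tagger), 2))
```

Tracking a score like this per group (a small central team versus many writers and editors) would give a rough, comparable picture of how much inconsistency each approach introduces.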


          Named Entity Tagging


          •  While the capturing of routine objective metadata such as issue date and publication name is relatively easy to automate, human judgment is typically required to add subjective tagging.

          While computer-assisted indexing (CAI) can offer significant benefits in this regard, these systems are quite resource-intensive in terms of initial cost, technology requirements, staffing, and maintenance.

          It is important to keep in mind that subjective tagging includes not only the application of subject terms, but also the identification of important named entities and article types. While a taxonomy of subject terms can be kept relatively flat and limited in size and scope, and a list of article types is usually short, tagging a variety of named entity types is a whole different, and much more complicated, problem.

          Named entity tagging is arguably the most important kind of tagging, and can give the biggest bang for the buck in terms of useful search results.  However, the effectiveness of named entity tagging relies almost exclusively on the consistency of the entity names.  This, in turn, requires the creation and maintenance of, and careful adherence to, a set of complex authority files and a detailed manual of best practices. Doing these things is not a simple proposition, to say the least.

          Even with well-designed and managed authority files, well-designed and managed maintenance procedures, and a clear, detailed set of best practices, I would argue that there is simply no way that writers and editors are going to acquire the skills, interest, and knowledge to do consistent, effective named entity tagging.  This requires a group of dedicated, knowledgeable professionals.

          I have seen first-hand what a lack of consistency in named entity tags can do to search results, and it is ugly.  Without the requisite management structures in place, I would suggest avoiding named entity tagging altogether, even though it has the promise of being extremely valuable, and not having it significantly reduces the usefulness of an archive.
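
To make the role of an authority file concrete, here is a minimal sketch, assuming the file can be reduced to a mapping from one preferred name to its known variants. The entries, the `_key` normalization rule, and the function names are all hypothetical; a real authority file would also carry entity types, disambiguation notes, and maintenance history.

```python
# Hypothetical authority file: preferred entity name -> known variant spellings.
AUTHORITY = {
    "Example Widgets Corporation": ["Example Widgets", "E.W.C.", "EWC"],
    "Jane Q. Sample": ["Jane Sample", "J. Q. Sample"],
}


def _key(name):
    """Fold case, drop periods, and collapse whitespace before lookup."""
    return " ".join(name.casefold().replace(".", "").split())


def build_variant_index(authority):
    """Map every normalized spelling (preferred or variant) to the preferred name."""
    index = {}
    for preferred, variants in authority.items():
        for name in [preferred, *variants]:
            index[_key(name)] = preferred
    return index


def normalize_entity(raw, index):
    """Return the preferred name, or None if a person must review the entity."""
    return index.get(_key(raw))


index = build_variant_index(AUTHORITY)
for raw in ["e.w.c.", "  Jane   Sample ", "Unknown Co."]:
    print(repr(raw), "->", normalize_entity(raw, index))
```

Returning `None` for unknown names, rather than accepting them as typed, is the design choice that keeps free-form variants out of the archive; those cases would go to whoever maintains the authority file.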


          The Cost of Poor Tagging

          •  It's important to keep in mind that cleaning up a badly tagged archive is likely to be more time-consuming and expensive than simply starting over from scratch. Starting over, in turn, is difficult to sell because the keepers of the purse strings have been soured on tagging by their bad experience with the first attempt.


          An Argument in Favor of Author/Editor Tagging

          •  The one reasonably compelling argument I can think of in favor of local tagging, as opposed to centralized tagging, is that the writer and the editor already know, in detail, what an article is about, and what named entities are important.  

          In contrast, given the typical time and cost constraints, centralized taggers must quickly skim each article and then tag it.  They may or may not have sufficient expertise in the subject domain to tag it appropriately, and so, not surprisingly, their tagging of any given article may be somewhat less than optimal.


          A Suggested Compromise

          There is really no reason why tagging must be an all-local or all-central effort.  It should be possible to take advantage of both groups’ strengths, minimize the effects of their weaknesses, get very good to excellent tagging quality, and still manage time and cost.  This would not be the fastest or cheapest solution, nor the slowest and most expensive.

          I suggest having the local domain experts, such as writers and editors, do an initial round of very basic subject and named entity tagging during the editorial process.  This would serve to help get the content online quickly, and would form the basis for a level of tagging suitable for long-term archiving and wider distribution.

          Subsequently, the central tagging group would be responsible for ensuring consistency, adherence to the authority files, supplemental subject term and named entity tagging, and ensuring accessibility over the long term to a wide audience.

          Neither group would bear the full burden of tagging, and the result would be better than if tagging were done by one group only.
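
The two-pass arrangement described above could be sketched roughly as follows, assuming a flat controlled vocabulary held as a set of terms. The `Article` fields, the `central_review` function, and the sample vocabulary are hypothetical and only illustrate the division of labor, not any particular publishing system.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    author_tags: list                                # first pass, by writer/editor
    final_tags: list = field(default_factory=list)   # after central review
    flagged: list = field(default_factory=list)      # not in the vocabulary


def central_review(article, vocabulary, supplemental=()):
    """Second pass: keep valid author tags, add supplemental ones, flag the rest."""
    seen = set()
    for tag in [*article.author_tags, *supplemental]:
        if tag not in vocabulary:
            article.flagged.append(tag)   # needs a human decision
        elif tag not in seen:
            article.final_tags.append(tag)
            seen.add(tag)
    return article


# Hypothetical controlled vocabulary and article.
VOCABULARY = {"elections", "economy", "taxes", "health care"}
draft = Article("Budget vote", author_tags=["economy", "budget stuff"])
reviewed = central_review(draft, VOCABULARY, supplemental=["taxes", "economy"])
print(reviewed.final_tags, reviewed.flagged)
```

The author's tags are available immediately for online publication, while the reviewed `final_tags` are what would be kept for the long-term archive.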

          Hope this helps.

          Best,

          -Chris

        • Brandon Smith
          Message 4 of 24 , Mar 28, 2009
            I think this is a very important read for anyone thinking about keywording their photographs or digital images. It's mainly addressing tagging word content, but the main point is that good quality tagging is a different skill from creating the image or content.

            An idea that jumped out at me is that perhaps the best way to index photographs would be to include a paragraph describing the photo in the metadata that could be searched through full text techniques.

          • Jami J
            Message 5 of 24 , Mar 28, 2009
              When I interned at CNN's library while finishing my MLIS, they cataloged their moving image collection in much the way you described: by using a controlled vocabulary and by including a full-text searchable content description paragraph. It worked very well for their needs.

              --- On Sat, 3/28/09, Brandon Smith <redwoodtwig@...> wrote:
              From: Brandon Smith <redwoodtwig@...>
              Subject: [TaxoCoP] Re: tagging done by content authors vs professional indexers
              To: TaxoCoP@yahoogroups.com
              Date: Saturday, March 28, 2009, 6:57 AM

              I think this is a very important read for anyone thinking about keywording their photographs or digital images. It's mainly addressing tagging word content, but the main point is that good quality tagging is different skill from creating the image or content.

              An idea that jumped out at me is that perhaps the best way to index photographs would be to include a paragraph describing the photo in the metadata that could be searched through full text techniques.

              --- In TaxoCoP@yahoogroups .com, Ron Daniel <rdaniel@... > wrote:
              >
              > I asked my friend, Chris Green, about his experiences with this issue at a major magazine publisher. That publisher will remain nameless, we will refer to it as XYZ Inc. Chris sent the following reply to me, which emphasizes some of Heather's points and makes some new ones. He does not know of any sources to cite. He gave me permission to publish this to the group, so thanks for that Chris. [I've bcc'ed him on this message in order to keep his email address away from spammers.]
              >
              >
              >
              > Ron
              >
              > - - - - -
              >
              > About Tagging
              >
              > * Is having some tagging better than having no tagging? Yes, if it's well done, but poor tagging can be, in many cases, much worse than no tagging. This cannot be overemphasized.
              >
              > * Tagging should be done in such a way that it serves not only the immediate needs of online content delivery, but also the needs of long-term archiving and of multiple constituencies.
              >
              > * If, for some reason, the situation demands that tagging must be done either locally, by writers and editors, or centrally by dedicated professionals, but not by both, I would advocate using the centralized professionals.
              >
              >
              > Rationales for Resistance
              >
              > * At XYZ Inc., there was consistent, strong resistance to the idea of tagging within the Editorial Departments. These departments included writers, researchers, editors, and copy operations. The resistance was universal among these groups, and it persisted for at least a decade across all of the titles.
              >
              > While the Editorial groups arguably would have been the greatest beneficiaries of a well-organized, accessible, and well-tagged text archive, their arguments against having them act as taggers included:
              >
              > a) It was all they could do to get the print magazines out every week, and they had no available bandwidth for any additional work
              >
              > b) They had no knowledge, training, or experience at tagging, and they felt that tagging was an important function that should not be an afterthought left to people who were interested only in getting it over with as quickly as possible
              >
              > c) They were the frontline people who created the content that brought in the money and the readers, and that whatever happened or didn't happen to the content after it was published was someone else's problem
              >
              > d) While I don't think it ever got far enough for the Writer's Guild to get involved, I suspect the union would have raised serious objections to re-defining the roles of their members without additional training and compensation
              >
              >
              > Competing Requirements
              >
              > * In many organizations, there are competing requirements for the tagging of new content. On the one hand, tagging must be done as quickly as possible, even at the expense of reduced quality, in order to publish the content online promptly and remain competitive. This argues for local tagging within the editorial process by writers and editors.
              >
              > On the other hand, it is also important to tag the content in such a way that it is suitably described for long-term archiving and for use by a much wider range of groups than visitors to the website.
              >
              > Generally speaking, the need for speed wins, and the needs of long-term archiving are frequently ignored as irrelevant. I believe short-term and long-term tagging need not be mutually exclusive, and that ignoring or dismissing the importance of tagging for long-term archiving is misguided, and a good example of being penny wise and pound foolish.
              >
              >
              > Support for Multiple Constituencies
              >
              > * Not surprisingly, editors and writers, the most likely people to do tagging, typically have a very narrow point of view that is based on their own self-interest. They would quite naturally tend to tag articles according to the way they thought they might search for them later. This is the editorial equivalent of Steinberg's famous cartoon of a New Yorker's View of the World.
              >
              > One of the desirable traits that a trained librarian brings to the task of tagging is an awareness of both the short- and long-term needs of multiple constituencies, including not just the editorial groups but also groups such as web editors and users, the legal department, the publicity and promotion department, the ad sales department, the production groups, senior management, and business partners. This broadly-useful tagging can dramatically increase the value of an archive and its ROI.
              >
              > The fact is that editors and writers are ill-equipped to address these needs effectively, and in most cases, they would be the first to admit it.
              >
              >
              > Granularity and Consistency
              >
              > * Tagging by hand, even by librarians, requires a relatively small taxonomy of subject terms if the tagging is to be accomplished in a reasonable amount of time, and, as always, Time is Money.
              >
              > Having said that, though, a taxonomy containing a maximum of 100-200 terms and suitable for manual tagging cannot adequately describe articles within corpuses such as those of [names of 2 major magazines deleted]. The granularity is simply too coarse, and the tagging cannot provide the degree of targeted searching that those who go to the trouble of doing this kind of tagging require and expect, especially when supporting multiple constituencies.
              >
              > The result sets are often too large to be useful, and the difference between a keyword (subject term) search and a plain old full-text search is frequently not significant enough to warrant the time and expense of tagging.
              >
              > Also, the time available to tag each article is relatively short, both for production scheduling reasons and for cost containment, and this typically results in only one or two subject tags per article, which is often far fewer than a more appropriate number. This limited tagging only compounds the problem with overly coarse granularity.
              >
              > One of the biggest problems with manual tagging, even by dedicated professionals using a small taxonomy, is a lack of consistency. Consistent tagging is absolutely crucial if one is to realize the potential value of subject term tagging.
              >
              > The degree of inconsistency created by a team of 8-10 dedicated professionals using a small taxonomy results in some problems with searching, but it will not undermine the value of the archive.
              >
              > However, using many non-professional, non-dedicated taggers, such as editors and writers, is quite likely to add a level of inconsistency that could represent a serious liability. This is especially true if the writers and editors are spread across multiple publications within the enterprise.
              >
              >
              > Named Entity Tagging
              >
              > * While the capturing of routine objective metadata such as issue date and publication name is relatively easy to automate, human judgment is typically required to add subjective tagging.
              >
              > While computer-assisted indexing (CAI) can offer significant benefits in this regard, these systems are quite resource-intensive in terms of initial cost, technology requirements, staffing, and maintenance.
              >
              > It is important to keep in mind that subjective tagging includes not only the application of subject terms, but also the identification of important named entities and article types. While a taxonomy of subject terms can be kept relatively flat and limited in size and scope, and a list of article types is usually short, tagging a variety of named entity types is a whole different, and much more complicated, problem.
              >
              > Named entity tagging is arguably the most important kind of tagging, and can give the biggest bang for the buck in terms of useful search results. However, the effectiveness of named entity tagging relies almost exclusively on the consistency of the entity names. This, in turn, requires the creation and maintenance of, and careful adherence to, a set of complex authority files and a detailed manual of best practices. Doing these things is not a simple proposition, to say the least.
              >
              > Even with well-designed and managed authority files, well-designed and managed maintenance procedures, and a clear, detailed set of best practices, I would argue that there is simply no way that writers and editors are going to acquire the skills, interest, and knowledge to do consistent, effective named entity tagging. This requires a group of dedicated, knowledgeable professionals.
              >
              > I have seen first-hand what a lack of consistency in named entity tags can do to search results, and it is ugly. Without the requisite management structures in place, I would suggest avoiding named entity tagging altogether, even though it has the promise of being extremely valuable, and not having it significantly reduces the usefulness of an archive.
              >
              >
              > The Cost of Poor Tagging
              >
              > * It's important to keep in mind that cleaning up a badly tagged archive is likely to be more time-consuming and expensive than simply starting over from scratch. Starting over is itself difficult to sell, because the keepers of the purse strings have been soured on tagging by their bad experience with the first attempt.
              >
              >
              > An Argument in Favor of Author/Editor Tagging
              >
              > * The one reasonably compelling argument I can think of in favor of local tagging, as opposed to centralized tagging, is that the writer and the editor already know, in detail, what an article is about, and what named entities are important.
              >
              > In contrast, given the typical time and cost constraints, centralized taggers must quickly skim each article and then tag it. They may or may not have sufficient expertise in the subject domain to tag it appropriately, and so, not surprisingly, their tagging of any given article may be somewhat less than optimal.
              >
              >
              > A Suggested Compromise
              >
              > There is really no reason why tagging must be an all-local or all-central effort. It should be possible to take advantage of both groups' strengths, minimize the effects of their weaknesses, get very good to excellent tagging quality, and still manage time and cost. This would not be the fastest or cheapest solution, nor the longest and most expensive.
              >
              > I suggest having the local domain experts, such as writers and editors, do an initial round of very basic subject and named entity tagging during the editorial process. This would help get the content online quickly, and would form the basis for a level of tagging suitable for long-term archiving and wider distribution.
              >
              > Subsequently, the central tagging group would be responsible for ensuring consistency, adherence to the authority files, supplemental subject term and named entity tagging, and ensuring accessibility over the long term to a wide audience.
              >
              > Neither group would bear the full burden of tagging, and the result would be better than if tagging were done by one group only.
              >
              > Hope this helps.
              >
              > Best,
              >
              > -Chris
              >
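Chris's point that named entity tagging depends on consistent entity names, enforced through authority files, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `AUTHORITY` entries, the `normalize_entity` function, and the matching rule (case-insensitive, punctuation-insensitive lookup) are hypothetical, not a description of any system used at XYZ Inc.

```python
import re

# Hypothetical authority file: canonical name -> known variant forms.
AUTHORITY = {
    "International Business Machines Corporation": [
        "IBM", "I.B.M.", "International Business Machines",
    ],
    "United Nations": ["UN", "U.N.", "The United Nations"],
}


def _key(name):
    """Reduce a name to a comparison key: lowercase, punctuation collapsed."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Build a lookup from every known form (canonical or variant) to the canonical name.
_LOOKUP = {}
for canonical, variants in AUTHORITY.items():
    _LOOKUP[_key(canonical)] = canonical
    for variant in variants:
        _LOOKUP[_key(variant)] = canonical


def normalize_entity(name):
    """Return the canonical form of a tagged name, or None if it is unknown.

    A None result is the signal that a human (the central tagging group,
    in Chris's compromise) should review the tag before it enters the archive.
    """
    return _LOOKUP.get(_key(name))
```

In a hybrid workflow like the one suggested above, writers could enter whatever form of a name they know, and only the names that come back as `None` would need attention from the central group.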


            • Christine Connors
              Message 6 of 24 , Mar 28, 2009
                This is a great thread indeed! Something Brandon said though struck me, and is in line with some other thoughts I've been having. We all immediately presumed that we were talking about tagging with WORDS. What if I want to tag with images or sounds (or, when the technology is available to the masses, odors/aromas)? We can do geo-coding, but that's really just another form of tagging with a string.

                It is possible to do via content negotiation; has anyone tried it?

                Cheers,
                Christine


                From: Brandon Smith <redwoodtwig@...>
                To: TaxoCoP@yahoogroups.com
                Sent: Saturday, March 28, 2009 9:57:23 AM
                Subject: [TaxoCoP] Re: tagging done by content authors vs professional indexers

                I think this is a very important read for anyone thinking about keywording their photographs or digital images. It's mainly addressing tagging word content, but the main point is that good quality tagging is a different skill from creating the image or content.

                An idea that jumped out at me is that perhaps the best way to index photographs would be to include a paragraph describing the photo in the metadata that could be searched through full text techniques.


              • Matt Moore
                Message 7 of 24 , Mar 28, 2009
                  Hello,

                  Anyone interested in image (& esp. photo) metadata should visit here: http://www.phmdc.org/

                  Most of the papers from the 2007 & 2008 conferences are available and make interesting reading (at least to someone like me they do).

                  Regards,

                  Matt

                • Matt Moore
                  Message 8 of 24 , Mar 28, 2009
                    Hello,

                    A completely different spin on this topic can be found here: http://video.google.com/videoplay?docid=-8246463980976635143 and here: http://www.cs.cmu.edu/~biglou/

                    Regards,

                    Matt

                  • Seth Maislin
                    Message 9 of 24 , Mar 28, 2009

                      So much has been said on this topic already, but clarification is needed. There aren’t only two options for who tags documents. There are three: the authors, professional indexers, and “somebody else.” There are many circumstances in which tags must be written by a third-party and, let’s be honest, the whole premise behind things like folksonomies and clouds and social tagging is that the people who provide tags can really be anyone. With social tagging, majority rules, and in isolated circumstances that can be acceptable.

                       

                      It’s not hard to find the disadvantages and advantages. The challenge is in aligning them with your business needs.

                       

                      Disadvantages of using authors:

                      - mismatch with the (imperfect) language of everyday users;

                      - lack of understanding of how tags function and co-exist;

                      - lack of availability, energy, interest, etc.;

                      - potential misalignment with business needs.

                       

                      Disadvantages of using non-indexer third parties:

                      - lack of understanding of how tags function and co-exist;

                      - inconsistent availability and dwindling interest;

                      - potential for error caused by indifference, ignorance, or malice;

                      - added concerns over security, privacy, and intellectual property;

                      - increased tools requirements;

                      - misalignment with business needs.

                       

                      Disadvantages of using professional indexers:

                      - cost.

                       

                      Don’t be surprised. As with anything in life, the only disadvantage ever to using the right people for the right job – indexers for your indexing – is cost. Depending on your needs, however, indexers can be expensive. Many here say it’s money well spent, but only you can decide that. Paying for quality indexing now might prevent problems and save money later, but steady cash flow and debt minimization are valid counterarguments.

                       

                      Indeed, there is no one best indexing approach. Maybe you need a hybrid approach, like using indexer-managed assistive tagging to aid your author taggers, or enabling social tagging but under high scrutiny. Whatever your needs, you MUST devise a tagging implementation that balances your business needs with the needs of your users and the nature of your content. And in all cases, getting an expert opinion from a professional indexer is your best, first step. After all, they are the right people for the job.

                       

                      - Seth

                      - - - - - - - - -

                      Seth Maislin, Senior Taxonomist

                      Earley & Associates

                      sethm@...

                      http://www.earley.com

                       

                       

                    • Patrick Lambe
                      Message 10 of 24 , Mar 29, 2009
                        Doesn't all this depend on the purposes the metadata is intended to
                        serve?

                        Where precision of metadata is critical, the quality needs to be
                        looked after, whether by professionals, business rules or context-
                        assigned tags. Where it's not, or where serendipity is an intended
                        benefit, more varied techniques can be combined. I liked Seth
                        (Maislin's) idea of a combination of strategies.

                        P

                        Patrick Lambe

                        weblog: www.greenchameleon.com
                        website: www.straitsknowledge.com
                        book: www.organisingknowledge.com

                        Have you seen our KM Method Cards? http://www.straitsknowledge.com/store/
                      • David Riecks
                        Message 11 of 24 , Mar 29, 2009
                          At 08:41 AM 3/28/2009, Christine Connors wrote:
                          >We all immediately presumed that we were talking about tagging with
                          >WORDS. What if I want to tag with images or sounds (or, when the
                          >technology is available to the masses, odors/aromas)? We can do
                          >geo-coding, but that's really just another form of tagging with a string.

                          Christine:

                          Actually I didn't presume text only.... but did assume there might be
                          text within a container (PDF, PowerPoint, Illustrations), in addition
                          to images, video, and other things besides text documents.

                          I work with software that understands that a WAV audio file that has
                          the same name as a proprietary RAW or JPEG file from a digital camera
                          needs to stay with that file, but haven't ever tried to create
                          situations where I then tag that audio and images with a text document.
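The "same base name stays together" rule described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `group_by_base_name` and the sample file names are hypothetical, and real image managers may apply additional rules beyond matching the name before the extension.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_base_name(file_names):
    """Group files that share a base name, ignoring extension and letter case.

    An audio note such as IMG_0042.WAV ends up in the same group as
    IMG_0042.CR2, so the two can be moved, renamed, or tagged together.
    """
    groups = defaultdict(list)
    for name in file_names:
        groups[PurePath(name).stem.lower()].append(name)
    return dict(groups)
```

For example, `group_by_base_name(["IMG_0042.CR2", "IMG_0042.WAV", "IMG_0043.JPG"])` puts the RAW file and its voice annotation in one group and the lone JPEG in another.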

                          There are specific needs (such as with model and property releases)
                          where I like to be able to create a relationship between the release
                          and the image file with those people or buildings within the frame.
                          However, that can usually be accomplished by giving the printed
                          release (or a digital facsimile of it) a number that is either the
                          same as the image's number or recorded in one of the
                          metadata fields in the image.

                          Embedding the GPS coordinates within the actual JPEG or in the RAW
                          file (or a sidecar) isn't that difficult, however, it's much more
                          useful if that information is translated (reverse geo-encoded), to
                          determine the precise location, city, state, country, etc., as those
                          are much easier to search.
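As a rough sketch of what translating coordinates into searchable place names involves, the following looks up the nearest entry in a tiny gazetteer. The `GAZETTEER` table, the function names, and the 50 km cutoff are all assumptions made for illustration; a production system would use a full gazetteer or a geocoding service rather than three hard-coded cities.

```python
import math

# Hypothetical mini-gazetteer: (latitude, longitude, city, region, country).
GAZETTEER = [
    (41.8781, -87.6298, "Chicago", "Illinois", "United States"),
    (40.7128, -74.0060, "New York", "New York", "United States"),
    (51.5074, -0.1278, "London", "England", "United Kingdom"),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def reverse_geocode(lat, lon, max_km=50.0):
    """Return place-name fields for the nearest gazetteer entry, or None.

    None means no known place lies within max_km, so the image keeps
    only its raw coordinates.
    """
    nearest = min(GAZETTEER, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    if haversine_km(lat, lon, nearest[0], nearest[1]) > max_km:
        return None
    return {"city": nearest[2], "region": nearest[3], "country": nearest[4]}
```

The returned fields are plain strings, which is the point: once written into the image's metadata they can be found with ordinary text search.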

                          Images, and other media objects that don't have lots of easily
                          available text require a significantly different approach to tagging
                          or indexing, as images aren't made of text, they are made of pixels.

                          David

                          --
                          David Riecks (that's "i" before "e", but the "e" is silent)
                          Need Keywords for your database? Get the Controlled Vocabulary Solution
                          http://controlledvocabulary.com/products/ support for a dozen of the
                          most popular imaging applications from Adobe Bridge to Photo Mechanic.
                        • Christine Connors
                          Message 12 of 24 , Mar 30, 2009
                            Thumbs up for the hybrids!!! Authors, indexers, publishers, machines - all working together!  Best of breed for tagging.

                            :)
                            Christine


                            From: Patrick Lambe <plambe@...>
                            To: TaxoCoP@yahoogroups.com
                            Sent: Sunday, March 29, 2009 9:53:50 PM
                            Subject: Re: [TaxoCoP] Re:tagging done by content authors vs professional indexers

                            Doesn't all this depend on the purposes the metadata is intended to
                            serve?

                            Where precision of metadata is critical, the quality needs to be
                            looked after, whether by professionals, business rules or context-
                            assigned tags. Where it's not, or where serendipity is an intended
                            benefit, more varied techniques can be combined. I liked Seth
                            (Maislin's) idea of a combination of strategies.

                            P

                            Patrick Lambe

                            weblog: www.greenchameleon.com
                            website: www.straitsknowledge.com
                            book: www.organisingknowledge.com

                            Have you seen our KM Method Cards? http://www.straitsknowledge.com/store/


                          • Rob Page
                            Message 13 of 24 , Mar 30, 2009
                              On Mar 28, 2009, at 10:41 AM, Christine Connors wrote:
                              > This is a great thread indeed! Something Brandon said though struck
                              > me, and is in line with some other thoughts I've been having. We all
                              > immediately presumed that we were talking about tagging with
                              > WORDS. What if I want to tag with images or sounds (or, when the
                              > technology is available to the masses, odors/aromas)? We can do
                              > geo-coding, but that's really just another form of tagging with a
                              > string.

                              It seems like tagging with "non-traditional" tags (e.g., an aroma) is
                              more of a user experience issue (for the indexer/tagger). In the end
                              it will all need to be reduced to some sort of data for storage
                              analysis.

                              > It is possible to do via content negotiation; has anyone tried it?

                              What do you mean by content negotiation?

                              We allow content producers to relate any content to any content. The
                              link itself carries basic semantics. For example an MP3 can be
                              related to a story with a link type of either "spoken_audio_version"
                              or "audio_interview." Our use cases for this data center around
                              presentation but we're excited about the semantic possibilities as
                              well (e.g., what if the spoken audio asset was related to a person
                              object).
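The idea of relating any content to any content through typed links can be sketched as a small data structure. This is not Zope Corporation's implementation; the `ContentGraph` class and the content identifiers are hypothetical, and only the two link type names come from the message above.

```python
from collections import defaultdict


class ContentGraph:
    """Stores typed, directed links between content items identified by strings."""

    def __init__(self):
        # source id -> list of (link_type, target id) pairs
        self._links = defaultdict(list)

    def relate(self, source, link_type, target):
        """Record that `source` is related to `target` with the given link type."""
        self._links[source].append((link_type, target))

    def related(self, source, link_type=None):
        """Return targets linked from `source`, optionally filtered by link type."""
        return [
            target
            for kind, target in self._links.get(source, [])
            if link_type is None or kind == link_type
        ]


# Usage with hypothetical content ids.
graph = ContentGraph()
graph.relate("story-101", "spoken_audio_version", "story-101.mp3")
graph.relate("story-101", "audio_interview", "interview-7.mp3")
```

Because the link type travels with the link, a presentation layer can ask for just the spoken version, while a later semantic layer could treat the same links as statements about the story.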

                              --
                              Rob Page V: 540 361 1710
                              Zope Corporation F: 703 995 0412
                            • Christine Connors
                              Message 14 of 24 , Mar 31, 2009
                                Hi -

                                By content negotiation I mean the mechanism in the web protocol that lets you name various forms of a resource the same thing, and then use your HTTP request to retrieve the correct form. So, basically, I can have a persistent URI with the text label that represents something, an image of it, a video, audio file etc all at the same place. See http://en.wikipedia.org/wiki/Content_negotiation and http://httpd.apache.org/docs/1.3/content-negotiation.html.

                                Why do I care? So I can tag something with a non-text resource, but still retrieve it using text-based search. Yes, it's user-experience, but isn't the idea to give the user what works best for them?
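To make the mechanism concrete, here is a minimal sketch of server-side selection driven by the HTTP `Accept` header. It is deliberately simplified and rests on assumptions: the `VARIANTS` table and file names are hypothetical, and the matching ignores parts of the real HTTP rules (for instance, precedence of more specific media ranges), so treat it as an illustration rather than a conforming implementation.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranges.append((media_range, quality))
    # sorted() is stable, so equal weights keep their order from the header.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as 'image/*' or '*/*' covers the given media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    main_type, _, subtype = media_type.partition("/")
    return range_type == main_type and range_subtype in ("*", subtype)


def negotiate(accept_header, variants):
    """Pick the (media_type, resource) the client prefers, or None if none fits."""
    for media_range, quality in parse_accept(accept_header):
        if quality <= 0:
            continue
        for media_type, resource in variants.items():
            if matches(media_range, media_type):
                return media_type, resource
    return None


# Hypothetical forms of one tag, all published under the same persistent URI.
VARIANTS = {
    "text/plain": "lavender.txt",
    "image/jpeg": "lavender.jpg",
    "audio/mpeg": "lavender.mp3",
}
```

A text-based search tool would send `Accept: text/plain` and get the label, while an image browser asking for `image/*` would get the picture from the same URI.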

                                Cheers,
                                Christine



                                From: Rob Page <rob.page@...>
                                To: TaxoCoP@yahoogroups.com
                                Sent: Monday, March 30, 2009 2:49:45 PM
                                Subject: Re: Non-text tagging [WAS Re: [TaxoCoP] Re: tagging done by content authors vs professional indexers]

                                On Mar 28, 2009, at 10:41 AM, Christine Connors wrote:
                                > This is a great thread indeed! Something Brandon said though struck
                                > me, and is in line with some other thoughts I've been having. We all
                                > immediately presumed that we were talking about tagging with
                                > WORDS. What if I want to tag with images or sounds (or, when the
                                > technology is available to the masses, odors/aromas)? We can do
                                > geo-coding, but that's really just another form of tagging with a
                                > string.

                                It seems like tagging with "non-traditional" tags (e.g., an aroma) is
                                more of a user experience issue (for the indexer/tagger). In the end
                                it will all need to be reduced to some sort of data for storage
                                analysis.

                                > It is possible to do via content negotiation; has anyone tried it?

                                What do you mean by content negotiation?

                                We allow content producers to relate any content to any content. The
                                link itself carries basic semantics. For example an MP3 can be
                                related to a story with a link type of either "spoken_audio_version"
                                or "audio_interview." Our use cases for this data center around
                                presentation but we're excited about the semantic possibilities as
                                well (e.g., what if the spoken audio asset was related to a person
                                object).

                                --
                                Rob Page V: 540 361 1710
                                Zope Corporation F: 703 995 0412


                              • Christine Connors
                                Message 15 of 24 , Mar 31, 2009
                                  Hi -

                                  I'm not quite sure we're talking apples to apples here. But I hear what you're saying, and for current retrieval mechanisms the text containers attached to non-text resources are critical. Indexing pixels, as it were, is far from everyday technology.

                                  It's always bugged me that we so often place the metadata in files separate from the resource being indexed - what happens if those links break? It's not like a book, where we could glue in or write in the metadata. Just another random thought.

                                  Regards,
                                  Christine


                                  From: David Riecks <david@...>
                                  To: TaxoCoP@yahoogroups.com
                                  Sent: Monday, March 30, 2009 2:13:16 AM
                                  Subject: Re: {Disarmed} Non-text tagging [WAS Re: [TaxoCoP] Re: tagging done by content authors vs professional indexers]

                                  At 08:41 AM 3/28/2009, Christine Connors wrote:

                                  >We all immediately presumed that we were talking about tagging with
                                  >WORDS. What if I want to tag with images or sounds (or, when the
                                  >technology is available to the masses, odors/aromas)? We can do
                                  >geo-coding, but that's really just another form of tagging with a string.

                                  Christine:

                                  Actually I didn't presume text only.... but did assume there might be
                                  text within a container (PDF, PowerPoint, Illustrations), in addition
                                  to images, video, and other things besides text documents.

                                  I work with software that understands that a WAV audio file that has
                                  the same name as a proprietary RAW or JPEG file from a digital camera
                                  needs to stay with that file, but haven't ever tried to create
                                  situations where I then tag that audio and images with a text document.

                                  There are specific needs (such as with model and property releases)
                                  where I like to be able to create a relationship between the release
                                  and the image file with those people or buildings within the frame.
                                  However, that can usually be accomplished by giving the printed
                                  release (or a digital facsimile of it) a number that is either the
                                  same as the image's number or recorded in one of the
                                  metadata fields in the image.

                                  Embedding the GPS coordinates within the actual JPEG or in the RAW
                                  file (or a sidecar) isn't that difficult, however, it's much more
                                  useful if that information is translated (reverse geo-encoded), to
                                  determine the precise location, city, state, country, etc., as those
                                  are much easier to search.

                                  Images and other media objects that don't have lots of easily
                                  available text require a significantly different approach to tagging
                                  or indexing, as images aren't made of text; they are made of pixels.

                                  David

                                  --
                                  David Riecks (that's "i" before "e", but the "e" is silent)
                                  Need Keywords for your database? Get the Controlled Vocabulary Solution
                                  http://controlledvocabulary.com/products/ support for a dozen of the
                                  most popular imaging applications from Adobe Bridge to Photo Mechanic.


                                • David Riecks
                                  Message 16 of 24 , Apr 3, 2009
                                    At 07:42 PM 3/31/2009, Christine Connors wrote:
                                    It's always bugged me that we so often place the metadata in files separate from the resource being indexed - what happens if those links break? It's not like a book, where we could glue in or write in the metadata. Just another random thought.

                                    Christine:

                                    With many proprietary file formats, the use of sidecar files is a simple way to share textual information about a non-text resource. As you point out, there can be serious consequences if this is the only place the data is stored, and the sidecar file is lost. However, with some non-text resources or specific file formats, this isn't the only way.
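The risk of losing a sidecar can be checked mechanically. The sketch below assumes the common convention of a sidecar that shares the resource's base name and carries an `.xmp` extension; the function name `audit_sidecars` is illustrative.

```python
from pathlib import PurePath

# Assumed sidecar convention: same base name as the resource, ".xmp" extension.
SIDECAR_EXT = ".xmp"


def audit_sidecars(filenames):
    """Return (resources lacking a sidecar, sidecars lacking a resource)."""
    resources = {}
    sidecars = {}
    for name in filenames:
        path = PurePath(name)
        # Key on the path without its extension, compared case-insensitively.
        key = str(path.with_suffix("")).lower()
        if path.suffix.lower() == SIDECAR_EXT:
            sidecars[key] = name
        else:
            resources[key] = name

    missing = sorted(resources[k] for k in resources if k not in sidecars)
    orphaned = sorted(sidecars[k] for k in sidecars if k not in resources)
    return missing, orphaned
```

Running an audit like this after every move or rename flags a broken link between a file and its metadata before the sidecar is forgotten.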

                                    I have worked with digital images since the early '90s and use standards promoted by the International Press Telecommunications Council (IPTC) to store a wide variety of information within the file itself. Many professional image management tools can both read and write metadata using two IPTC standards, including Photoshop, Bridge, Lightroom, Expression Media, Photo Mechanic, Breeze Browser, IDimager, Extensis Portfolio, and Canto's MediaDex and Cumulus. In addition, there are freeware tools like IrfanView, XnView, and even Google's Picasa.

                                    There are limits to the number of fields in the older standard, but the new version is based on XMP, and the latest version of the schema contains fields to represent information deemed essential by the press, stock photography, and cultural heritage communities.

                                    You can read more about these IPTC standards at: http://www.iptc.org/cms/site/index.html;jsessionid=aJ6G0by3_-Nc?channel=CH0089

                                    I am currently project leader for the Stock Artists Alliance Photo Metadata Project (http://www.stockartistsalliance.org/photometadata-project), which will be launching a website and a 10-city tour to promote the use of embedded photo metadata. I'll post information about that schedule as soon as it's available for those who may be interested.

                                    Hope that helps.

                                    David

                                    --
                                    David Riecks  (that's "i" before "e", but the "e" is silent)
                                    Need Keywords for your database? Get the Controlled Vocabulary Solution
                                    http://controlledvocabulary.com/products/ support for a dozen of the
                                    most popular imaging applications from Adobe Bridge to Photo Mechanic.

                                  • Torrie Hodgson
                                    Message 17 of 24 , Apr 6, 2009
                                      Sometimes we need to index content that is owned by someone else whom we can't force into using our indexing practices, or it resides on servers that don't have room for additional metadata (schema or disk space). There are probably other scenarios, beyond indexing non-textual content, where the metadata needs to reside outside the indexed file itself.
                                       
                                      The tricky part is staying informed about where the indexed content resides, what the data schema is, and getting informed whenever there are changes planned. (The key word is "planned" so that the solution can be put into place before something breaks.)
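A minimal sketch of one way to notice such changes, assuming the external index keeps a snapshot of each resource's location and schema field names. The snapshot layout and the function name `detect_changes` are hypothetical. A comparison like this only reports changes after they happen, so it complements rather than replaces advance notice from the content owner.

```python
def detect_changes(indexed, current):
    """Compare two {resource_id: {"location": str, "fields": iterable}} snapshots.

    Returns {resource_id: status} for resources that are missing, have moved,
    or whose schema fields differ from what the external index recorded.
    """
    report = {}
    for resource_id, old in indexed.items():
        new = current.get(resource_id)
        if new is None:
            report[resource_id] = "missing"
        elif new["location"] != old["location"]:
            report[resource_id] = "moved"
        elif set(new["fields"]) != set(old["fields"]):
            report[resource_id] = "schema changed"
    return report
```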
                                       
                                      Thanks,
                                      Torrie Thomas

                                      --
                                      Torrie Thomas
                                      torriehodgson@...