
RE: [TaxoCoP] The 'Google versus Taxonomy' issue...

  • Seth Earley
    Message 1 of 25, Jan 7, 2006
      Hi Brendan,
       
      I hate when that happens....  :-)
       
      Another colleague recently told me all of these messages were ending up in their spam folder. And I thought they just didn't like me. (It was actually Theresa Regli, the person I debated taxo issues with in the video Bob Doyle posted. That was her excuse for not participating in the CoP discussions. Let's see if she comes on board now... <smile>)
       
      I think the point here is one of disambiguation: distinguishing between terms that have a generic meaning and terms that have a more nuanced meaning in a specific context.
       
      I was just talking to someone at a large insurance company who is considering the Google appliance for full text search of "typical" types of information - terms that have a straightforward meaning and in contexts that are not specific to a process or the particular needs of an audience.  They still use controlled vocabularies for more precise metadata searches when the differences between documents are more subtle. 
       
      A great example of metadata-driven navigation and presentation is www.cabot-corp.com. They serve thousands of applications with a material called "carbon black," which is really just refined soot. A search on their site for "carbon black" is not going to be very meaningful. On the other hand, one can navigate very precisely to a specific document based on metadata describing product, market, and application.
       
      Here is a link to a couple of slides that I use in my talks to illustrate this: http://www.earley.com/Parametric_Search.ppt
       
      One is the example of www.pcconnection.com, another relates to a technology firm and the document repositories that served consultants, and the last is from Cabot Corporation.
       
      The last slide illustrates locating this document in "three dimensional space" with these three parameters. But this is really "n dimensional space" since we can have as many parameters describing a document as we want. 
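A minimal sketch of the parametric ("n dimensional") search described above, assuming each document carries controlled metadata values for facets such as product, market and application. The `Document` class, the `parametric_search` function and the three sample documents are hypothetical illustrations, not Cabot's actual data or any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Each metadata key is one "dimension" (facet); values come from a
    # controlled vocabulary rather than free text.
    metadata: dict = field(default_factory=dict)


def parametric_search(docs, **criteria):
    """Return the documents whose metadata matches every facet value given.

    Each keyword argument fixes one dimension, so adding arguments narrows
    the result set; nothing limits the search to three dimensions.
    """
    return [
        doc for doc in docs
        if all(doc.metadata.get(facet) == value for facet, value in criteria.items())
    ]


# Hypothetical documents, invented for illustration.
DOCS = [
    Document("Tire compound datasheet",
             {"product": "carbon black", "market": "automotive", "application": "tires"}),
    Document("Ink dispersion guide",
             {"product": "carbon black", "market": "printing", "application": "inks"}),
    Document("Hose reinforcement note",
             {"product": "carbon black", "market": "automotive", "application": "hoses"}),
]

# "carbon black" alone matches everything; three facets pin down one document.
broad = parametric_search(DOCS, product="carbon black")
precise = parametric_search(DOCS, product="carbon black",
                            market="automotive", application="tires")
```

Each extra keyword argument adds a dimension, so the same function works for any number of parameters.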
       
      In most of these cases, searching on "typical" terms using full text search would not return meaningful results. Part of the reason for this is that users do not always think in precise enough terms. They will use a very broad term and expect specific results, without recognizing that it is simply not possible to retrieve an exact result with an overly broad search term.
       
      So a controlled vocabulary also helps to force the user to describe what they are looking for using more exacting descriptors.  
       
      I tell the story of a consultant looking for "strategy for an insurance company to outsource their call center to India".  The term he searched on?  "deliverable"... 
       
      No matter how good it is, Google will not return a satisfactory search result if users don't think precisely enough.  Part of the value of the taxonomy is to help people understand how to characterize what they need. 
       
      Seth
       

      Seth Earley

      Earley & Associates, Inc

      781-444-0287

      781-820-8080 cell

      Next taxo conference call January 25th, 2 PM EST
      "Semantic technologies"

      Registration and agenda at www.earley.com/events.htm
      Taxonomy Community of Practice

      http://finance.groups.yahoo.com/group/TaxoCoP

      -----Original Message-----
      From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com]On Behalf Of Brendan Quinn
      Sent: Friday, January 06, 2006 8:03 AM
      To: TaxoCoP@yahoogroups.com
      Subject: RE: [TaxoCoP] The 'Google versus Taxonomy' issue...

      Hi all, I'd forgotten I was subscribed to this list; I saw about 500 messages in the folder when I remembered to look... definitely some kind of information management issue going on there :-)
       
      This is an interesting discussion. I find it's always useful to go back to first principles and remember why these techniques and technologies are useful in the first place.
       
      I agree with Patrick's points that taxonomies / controlled vocabularies give you more than just "findability", and I think I also agree that if simple keyword-search-based "findability" is all you need, then a decent site search function is all you need. But for most people there's more to it than just findability, so I thought it would be worth delving into just what those other uses might be.
       
      One obvious use is metadata-based publishing: a real example of ours is the Isle of Man website (http://www.bbc.co.uk/isleofman/), which is created mostly automatically based on the location metadata assigned to content from nearby regions. We don't actually have an Isle of Man office, but we have enough content about the area to justify a small automated site. Obviously this is useful, and while you could say an automated search for "isle of man" could perhaps produce the same content, it would be fraught with difficulties (not least because those three words are very generic, and may not even appear in a story about the town of Douglas (another search nightmare!) or about "the isle", as it's known locally).
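To illustrate why location metadata beats a text search for "isle of man", here is a minimal sketch assuming a hypothetical broader/narrower place vocabulary and hand-tagged stories. The names (`BROADER`, `region_page`, `keyword_page`) and the headlines are invented; this is not how bbc.co.uk actually builds its pages.

```python
# Hypothetical location vocabulary: narrower term -> broader term.
BROADER = {
    "Douglas": "Isle of Man",
    "Ramsey": "Isle of Man",
    "Isle of Man": "British Isles",
}


def is_within(place, region):
    """True if `place` is `region` or lies beneath it in the hierarchy."""
    while place is not None:
        if place == region:
            return True
        place = BROADER.get(place)
    return False


# Hypothetical stories, each tagged with one controlled location term.
STORIES = [
    {"headline": "Harbour upgrade approved in Douglas", "location": "Douglas"},
    {"headline": "TT races draw record crowds", "location": "Isle of Man"},
    {"headline": "Man of the match named", "location": "Manchester"},
]


def region_page(stories, region):
    """Select stories by location metadata, including narrower places."""
    return [s["headline"] for s in stories if is_within(s["location"], region)]


def keyword_page(stories, query):
    """Naive full-text alternative: a headline matches if it shares any query word."""
    words = set(query.lower().split())
    return [s["headline"] for s in stories if words & set(s["headline"].lower().split())]
```

`region_page(STORIES, "Isle of Man")` picks up the Douglas story through the hierarchy, while `keyword_page(STORIES, "isle of man")` misses both relevant stories and returns only the spurious "Man of the match named".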
       
      We also use our metadata fields to create subject-specific RSS feeds and more. We will be growing this capability quite a lot this year.
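Subject-specific feeds fall out of the same metadata. A minimal sketch, assuming each item carries a set of subject terms, that builds a bare-bones RSS 2.0 document with Python's standard `xml.etree.ElementTree`; the item fields, subjects and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def subject_feed(items, subject, feed_link):
    """Build a minimal RSS 2.0 feed containing only items tagged with `subject`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = f"Stories about {subject}"
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Items tagged '{subject}'"
    for item in items:
        # Selection is driven purely by the metadata field, not by the text.
        if subject in item["subjects"]:
            entry = ET.SubElement(channel, "item")
            ET.SubElement(entry, "title").text = item["title"]
            ET.SubElement(entry, "link").text = item["url"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical items and URLs, for illustration only.
ITEMS = [
    {"title": "Budget reaction", "url": "https://example.com/1", "subjects": {"economy", "politics"}},
    {"title": "Cup final preview", "url": "https://example.com/2", "subjects": {"football"}},
    {"title": "Interest rates held", "url": "https://example.com/3", "subjects": {"economy"}},
]

economy_xml = subject_feed(ITEMS, "economy", "https://example.com/feeds/economy")
```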
       
      Another might be a more complicated search: we produce the UK version of the sitcom "The Office" and have a website about it. If we wanted to aggregate content from around the rest of the BBC to link from http://www.bbc.co.uk/comedy/theoffice/, or simply allow people to search for it, a lot of spurious results would come up: on the bbc.co.uk-wide site search, the top result for a search on "the office" (not counting the thesaurus-based "best links") is a news story about a fire in an office block.
       
      Of course, if you don't need either of these types of use cases, then by all means, don't worry about controlled vocabularies, just get in a search engine!
       
      Does anyone have any other examples of uses for controlled vocabulary usage that search engines can't do?
       
      Brendan (who will hopefully get more involved in the conversation now :-)
      --
      Brendan Quinn | Technical Architect | bbc.co.uk
      Broadcast Centre BC5 B6, Media Village, 201 Wood Lane London W12 7TP
      Brendan.Quinn@... | +44 (0)20 800 85097 | +44 (0)7900 847 358


      From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com] On Behalf Of Patrick Lambe
      Sent: 04 January 2006 12:17
      To: TaxoCoP@yahoogroups.com
      Subject: RE: [TaxoCoP] The 'Google versus Taxonomy' issue...

      If your objective is just findability, then it's true you may want to exclude other management activities (but be sure you don't want those other management capabilities!)
       
      A taxonomy can also assist in findability - by helping people to browse using the taxonomy, or using the taxonomy to inform the navigation structure of a website. We've found that some types of people prefer to find by browsing, others prefer keyword search - this varies a lot depending on prior experiences (bad browsing experience pushes people towards a keyword preference and vice versa) but the pattern we see lies in the 60:40 range - either way.
       
      [On Verity, I've seen situations where it's been implemented without a structured taxonomy being designed for it. It generates its own best-guess taxonomy, but without good structuring by the client organisation this yields very variable results. It doesn't surprise me that search results suffer as a result. Is this the case with Dept of Transportation?]
       
      But if you only want to retrieve, and everybody wants Google and nothing else, who's to argue? Why not try to find out if that's the case? Do some user research?
       
      The main thing to note about taxonomy work in this is that it does yield benefits in a number of directions, NOT just retrieval... so if you're going to invest in it, it's a good idea to try to exploit its full value... which might make life appear overly-complicated for the "retrieve-only" projects!
       
      Patrick

      <TaxoCoP@yahoogroups.com> wrote:
      Seth said: Can you point us to the sources?
       
      Reported by a colleague, based on experiences at the US Department of Transportation. Additional comments: "The controlled vocabulary work was largely lost when the Google appliances were purchased and put in to replace the Verity engines. The Google took the search results from being 30 to 40 percent helpful to over 80 percent when surveying users."

      -- Patrick Lambe Principal Consultant Straits Knowledge www.straitsknowledge.com Tel. 97542165
      http://www.bbc.co.uk/

      This e-mail (and any attachments) is confidential and may contain
      personal views which are not the views of the BBC unless specifically
      stated.
      If you have received it in error, please delete it from your system.
      Do not use, copy or disclose the information in any way nor act in
      reliance on it and notify the sender immediately. Please note that the
      BBC monitors e-mails sent or received.
      Further communication will signify your consent to this.
    • Seth Earley
      Message 2 of 25, Jan 8, 2006
        Hi Jordan,
         
        What search engine are you using?  You can make those kinds of weighting adjustments with tools like Verity.  I would defer to others with regard to the exact mechanism.
         
        Seth
         
        Hi all,

        I have a question about term downweighting versus stop words. 

        I'm trying to fix the problem where a search on our site for "lumber company" returns unrelated categories with "company" in the title, based on our search algorithm. I could do something new and define a set of terms that would rank lower for relevancy, or, the quicker solution, add these terms to our stop word list of terms we ignore (like "the" and "and"). My concern about making them stop words is a search like "retail clothing": if we make "retail" a stop word, then the search would be just "clothing", which would then return our "wholesale clothing" category and would appear to the user to ignore their intent.

        Anyone have experience with this issue, advice?

        Thanks-

        Jordan
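The two options in this question can be compared with a toy scorer. This is a sketch under stated assumptions: the category titles, the stop-word lists and the 0.1 weight are all hypothetical, and a real engine's relevance model is far more involved.

```python
# Hypothetical category titles and weights, invented for illustration.
CATEGORIES = ["Lumber Yards", "Trucking Company", "Retail Clothing", "Wholesale Clothing"]

BASE_STOP_WORDS = {"the", "and"}
# Option A: treat the generic words as stop words (drop them from the query).
EXTRA_STOP_WORDS = {"company", "retail"}
# Option B: keep the generic words but give them a low weight.
LOW_WEIGHTS = {"company": 0.1, "retail": 0.1}


def score_stop_words(query, title):
    words = set(title.lower().split())
    stop = BASE_STOP_WORDS | EXTRA_STOP_WORDS
    return sum(1.0 for t in query.lower().split() if t not in stop and t in words)


def score_downweighted(query, title):
    words = set(title.lower().split())
    return sum(LOW_WEIGHTS.get(t, 1.0) for t in query.lower().split()
               if t not in BASE_STOP_WORDS and t in words)


def rank(query, scorer):
    """Matching categories, highest score first (ties keep list order)."""
    scored = [(scorer(query, c), c) for c in CATEGORIES]
    return [c for s, c in sorted(scored, key=lambda pair: -pair[0]) if s > 0]
```

With "company" as a stop word, the trucking category disappears entirely; with downweighting it still appears, but below the lumber match. For "retail clothing", the stop-word approach scores the retail and wholesale categories identically, which is exactly the concern raised, whereas downweighting keeps a small edge for the category that matches "retail".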

      • Marcia Morante
        Message 3 of 25, Jan 9, 2006
          Jordan, your search engine is probably using the default OR operator between words. You can avoid this with some behind-the-scenes parsing, turning the two search terms into a proximity statement, i.e., "word 1 must be next to word 2 in the order specified" OR "word 1 must be in the same sentence as word 2 in any order" OR ... there are many configurations of this. Check the search engine manual for the syntax that it accepts.
           
          Good luck.
           
          Marcia Morante
          KCurve, Inc.
          (718)881-5915 - office
          (917)821-2087 - mobile
          http://kcurve.com
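The proximity rewrite suggested above can be sketched by evaluating the two conditions directly in Python rather than in any particular engine's query syntax (which, as noted, varies by product). The function names and the naive regex-based sentence splitter are assumptions for illustration.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def adjacent_in_order(text, w1, w2):
    """'word 1 must be next to word 2 in the order specified'."""
    toks = tokens(text)
    return any(a == w1 and b == w2 for a, b in zip(toks, toks[1:]))


def same_sentence_any_order(text, w1, w2):
    """'word 1 must be in the same sentence as word 2 in any order'."""
    for sentence in re.split(r"[.!?]+", text):  # crude sentence split
        toks = set(tokens(sentence))
        if w1 in toks and w2 in toks:
            return True
    return False


def proximity_match(text, query):
    """Rewrite a two-word query as (adjacent in order) OR (same sentence)."""
    w1, w2 = tokens(query)
    return adjacent_in_order(text, w1, w2) or same_sentence_any_order(text, w1, w2)


def or_match(text, query):
    """The default behaviour being replaced: any single word is enough."""
    toks = set(tokens(text))
    return any(w in toks for w in tokens(query))
```

A page saying "The trucking company hauls freight. Lumber prices rose." passes `or_match` for "lumber company" but fails `proximity_match`, because the two words never share a sentence.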
           


          From: TaxoCoP@yahoogroups.com [mailto:TaxoCoP@yahoogroups.com] On Behalf Of Seth Earley
          Sent: Sunday, January 08, 2006 10:08 PM
          To: TaxoCoP@yahoogroups.com
          Subject: RE: [TaxoCoP] Term Downweighting


        • Richard Beatch
          Message 4 of 25, Jan 9, 2006
            Jordan,

            The appropriate response to your question will depend entirely on
            your search engine. While Marcia is probably correct that your
            search engine is defaulting to a Boolean OR rather than a Boolean
            AND between terms (most do, for good reasons), cooking the query may
            be overkill and a much more complicated solution than the problem
            requires. For example, with Verity, it is fairly easy to modify the
            ranking algorithm such that terms that are near each other result in
            higher rankings. Many other search engines offer similar
            functionality, although each of them handles it a little
            differently.

            --Richard
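This alternative leaves the OR matching alone and changes only the ranking. A minimal sketch, assuming a made-up scoring formula (one point per matched query term, plus a bonus of `boost / gap` for each consecutive pair of query terms, where `gap` is their closest distance in tokens); it is not Verity's or any other engine's actual algorithm.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def min_gap(toks, w1, w2):
    """Smallest distance (in tokens) between an occurrence of w1 and one of w2."""
    pos1 = [i for i, t in enumerate(toks) if t == w1]
    pos2 = [i for i, t in enumerate(toks) if t == w2]
    if not pos1 or not pos2:
        return None
    return min(abs(i - j) for i in pos1 for j in pos2)


def proximity_score(text, query, boost=2.0):
    """OR-style base score plus a bonus that grows as query terms get closer."""
    toks = tokens(text)
    terms = tokens(query)
    score = sum(1.0 for t in set(terms) if t in toks)
    for w1, w2 in zip(terms, terms[1:]):
        gap = min_gap(toks, w1, w2)
        if gap:  # skips None (a term is missing) and 0 (same word twice)
            score += boost / gap
    return score
```

Sorting "Trucking company annual report", "Company picnic recap: we ate lunch near the lumber yard" and "Acme lumber company sells hardwood" by `proximity_score(doc, "lumber company")` puts the adjacent-terms document first and the "company"-only document last, while still returning all three.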

          • Christine Connors
            Message 5 of 25, Jan 9, 2006
              Hi Jordan -

              It is possible to do what you're asking, but the right method should be determined by your use cases. If you're using Verity, I can give you half a dozen ways to accomplish this. Some are in the VQL (Topics) and some are indexing options. Ping me offline and we can set up a time to talk it through.

              In addition to Marcia's examples, remember that some engines let you override stop words with a special character. For example, Google lets you use the "+" sign to force inclusion of a stop word. In engines with a classification module, you could either make "company" a stop word or, better, tell the engine to rank the word "company" lower in all queries.
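Both of these suggestions act at query-parsing time. A sketch assuming a hypothetical parser: a leading `+` overrides stop-word removal (the convention described above for Google at the time), and a per-term weight table ranks "company" lower in every query. The stop-word list and the 0.2 weight are invented.

```python
# Hypothetical stop-word list and term weights, for illustration only.
STOP_WORDS = {"the", "and", "of", "a"}
TERM_WEIGHTS = {"company": 0.2}  # rank "company" lower in every query


def parse_query(query):
    """Turn a query string into (term, weight) pairs.

    Stop words are dropped unless prefixed with '+', which forces them in.
    The '+' only overrides stop-word removal; it does not change the weight.
    """
    parsed = []
    for raw in query.lower().split():
        forced = raw.startswith("+")
        term = raw.lstrip("+")
        if not term:
            continue  # a bare "+" carries no term
        if term in STOP_WORDS and not forced:
            continue
        parsed.append((term, TERM_WEIGHTS.get(term, 1.0)))
    return parsed
```

For example, `parse_query("the office")` reduces to just `office`, `parse_query("+the office")` keeps both words, and `parse_query("lumber company")` keeps "company" at a fraction of the weight of "lumber".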

              Also in Verity (my obvious experiential bias!), if you have a topic defined for "lumber company", then the engine can invoke the topic, and the business rules you assign to that topic, regardless of whether the user consciously browsed a taxonomy OR simply happened to enter those keywords in the search box. So you would simply assign a rule to that topic that, say, forces the phrase to appear in a certain order, have certain capitalization, not be stemmed, or also return "wood company." Whatever you need. I suspect that other classification engines do this as well.
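The topic idea, where a query that matches a defined topic triggers that topic's rules, might look roughly like this. The `TOPICS` structure is hypothetical and is not Verity's topic syntax; it models only two of the rules mentioned (fixed word order and the "wood company" expansion).

```python
import re

# Hypothetical topic definitions; this is not Verity's topic syntax.
TOPICS = {
    "lumber company": {
        "phrases": ["lumber company", "wood company"],  # rule: also match "wood company"
        "ordered": True,  # rule: words must appear adjacent and in this order
    },
}


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def contains_phrase(toks, phrase):
    """True if the words of `phrase` appear consecutively, in order, in `toks`."""
    p = phrase.split()
    return any(toks[i:i + len(p)] == p for i in range(len(toks) - len(p) + 1))


def matches(text, query):
    toks = tokens(text)
    topic = TOPICS.get(" ".join(tokens(query)))
    if topic is None:
        # No topic invoked: fall back to default OR matching on single words.
        return any(w in toks for w in tokens(query))
    if topic["ordered"]:
        return any(contains_phrase(toks, ph) for ph in topic["phrases"])
    # Unordered topic: every word of some phrase must appear somewhere.
    return any(all(w in toks for w in ph.split()) for ph in topic["phrases"])
```

"A family-run wood company in Oregon" matches the query "Lumber Company" through the expansion, while "Trucking company hauls lumber" does not, because the topic requires the words in order; a query with no topic, such as "trucking", falls back to ordinary OR matching.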

              Let me know if I can help!
              Christine

              CJMConnors@...



            • Jordan Cassel
              Message 6 of 25, Jan 9, 2006
                Thanks, everyone, for your comments on this issue. A few of you asked which search engine we use: it's FAST Search & Transfer. I'm pretty new to it.
