
Dealing with your company's name in content / search terms

  • Lee Romero
    Message 1 of 4 , May 14, 2012
      Hi all - on an intranet, it's very likely that one of the most common
      words in the content being indexed for your search is the name of your
      company. I work for Deloitte (technically, Deloitte Touche Tohmatsu
      Limited, but it's commonly referred to as "Deloitte" even though
      that's not completely accurate) and so the word "Deloitte" appears in
      just about every piece of content that's indexed in our search.

      The effect of this is that in our search, "Deloitte" kind of behaves
      like a stop word. Not technically, but it likely adds so little in
      terms of differentiation of what a user is looking for that it might
      as well be a stop word.

      The challenge I see is that many search terms used by users might also
      contain that word. Often with just one more word - "Deloitte
      Consulting", or "About Deloitte", etc.

      My question: Do you have any good strategies for how to improve the
      search relevance for searches that use very common terms in your
      content, such as your company's name?

      Are manually-managed best bets (or similar functionality) the only alternative?

      Thanks!
      Lee Romero
    • Seth Earley
      Message 2 of 4 , May 14, 2012

        Hi Lee,

         

        I was thinking Best Bets as I was reading. That would be my approach. Either that, or a content model with those terms tagged in specific metadata fields, with searches scoped or limited to those fields as opposed to full text (“Deloitte” would be a stop word in full-text search).
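A rough sketch of the field-scoped idea, in case it helps make it concrete. Everything here is made up for illustration: the field names (`title`, `business_unit`, `body`), the sample documents, and the 2x metadata weight are assumptions, not features of any particular engine. The company name is ignored when matching the body but still counts when it appears in a metadata field.

```python
# Assumption: "deloitte" is treated as a stop word only for the full-text body.
BODY_STOP_WORDS = {"deloitte"}


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def field_scoped_score(query, doc):
    """Count query-term hits, weighting metadata hits above body hits."""
    terms = tokenize(query)
    metadata_tokens = set(tokenize(doc["title"])) | set(tokenize(doc["business_unit"]))
    body_tokens = set(tokenize(doc["body"]))

    # Every query term, including the company name, may match metadata.
    metadata_hits = sum(1 for t in terms if t in metadata_tokens)
    # In the body, the company name is skipped as a stop word.
    body_hits = sum(
        1 for t in terms if t not in BODY_STOP_WORDS and t in body_tokens
    )
    return 2 * metadata_hits + body_hits  # hypothetical weighting


# Hypothetical documents
consulting_home = {
    "title": "Deloitte Consulting",
    "business_unit": "Consulting",
    "body": "Overview of services offered by Deloitte",
}
expense_policy = {
    "title": "Expense policy",
    "business_unit": "Finance",
    "body": "Deloitte staff submit consulting travel expenses monthly",
}

home_score = field_scoped_score("Deloitte Consulting", consulting_home)
policy_score = field_scoped_score("Deloitte Consulting", expense_policy)
```

With these toy documents the page tagged "Deloitte Consulting" in its metadata outscores the page that only mentions both words in passing in its body.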

         

        Seth

        Seth Earley
        CEO, EARLEY & ASSOCIATES, Inc.
        Cell: 781-820-8080
        Email: seth@...
        Web: www.earley.com
        Follow me on twitter: sethearley
        Connect with me on LinkedIn: www.linkedin.com/in/sethearley

         


      • Walter Underwood
        Message 3 of 4 , May 14, 2012
          First, tf.idf does this for you, automatically. IDF weights terms by how common or rare they are in your documents.

          So, you may need to do nothing.
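To illustrate the point about IDF, here is a minimal sketch using the textbook form log(N / df). The four-document corpus is invented, and real engines use smoothed variants of this formula, so treat the numbers as illustrative only.

```python
import math


def inverse_document_frequency(term, documents):
    """Textbook IDF: log(N / df), where df = number of documents containing the term."""
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        return 0.0  # term never occurs; nothing to weight
    return math.log(n_docs / doc_freq)


# Hypothetical toy corpus: every document mentions the company name.
corpus = [
    {"deloitte", "consulting", "services"},
    {"deloitte", "tax", "policy"},
    {"deloitte", "audit", "guide"},
    {"deloitte", "about", "history"},
]

idf_company = inverse_document_frequency("deloitte", corpus)       # log(4/4)
idf_consulting = inverse_document_frequency("consulting", corpus)  # log(4/1)
```

A term that appears in every document gets a weight of zero under this formula, so it behaves like a stop word without being configured as one.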

          Second, IDF for phrases is very good for this, in fact, good for relevance in general. If a phrase is rare and selective, it will have a high IDF, even if it is made of common words. I don't know of any engine that has phrase IDF by default. Ultraseek did have it, but I don't think that is sold any more.
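A small sketch of what phrase IDF could look like if adjacent word pairs are treated as terms. This is only an assumption about the general idea, not a description of how Ultraseek implemented it, and the corpus is made up.

```python
import math


def bigrams(tokens):
    """Set of adjacent word pairs in a token list."""
    return set(zip(tokens, tokens[1:]))


def idf(doc_freq, n_docs):
    """Textbook IDF: log(N / df); zero when the item never occurs."""
    return math.log(n_docs / doc_freq) if doc_freq else 0.0


# Hypothetical corpus: both words occur in every document,
# but they are adjacent in only one of them.
docs = [
    ["deloitte", "consulting", "overview"],
    ["consulting", "rates", "for", "deloitte", "tax"],
    ["deloitte", "audit", "consulting", "notes"],
    ["deloitte", "tax", "consulting"],
]
n = len(docs)

word_idf_deloitte = idf(sum(1 for d in docs if "deloitte" in d), n)
word_idf_consulting = idf(sum(1 for d in docs if "consulting" in d), n)

phrase = ("deloitte", "consulting")
phrase_idf = idf(sum(1 for d in docs if phrase in bigrams(d)), n)
```

Here each word alone has zero weight, while the phrase is found in one document out of four and so gets a high weight.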

          Solr dismax and edismax can be configured to have a higher weight for phrase matches. This may help.
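As a sketch of that configuration: edismax takes `qf` (fields to search) and `pf` (fields in which a whole-query phrase match earns a boost), with `ps` as the phrase slop. The field names `title` and `body` and the boost values below are assumptions for illustration and would need tuning against a real schema.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query):
    """Build Solr request parameters that boost phrase matches via pf."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Fields searched term by term (hypothetical field names and boosts).
        "qf": "title^2 body",
        # Extra boost when the query terms occur together as a phrase.
        "pf": "title^10 body^5",
        # Phrase slop for pf: 0 requires the terms to be adjacent.
        "ps": "0",
    }


params = build_edismax_params("Deloitte Consulting")
query_string = urlencode(params)  # append to the core's /select URL
```

The same parameters can also be set as defaults on the request handler in solrconfig.xml instead of being sent with each query.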

          If you can handle this algorithmically, do that. Best Bets are labor-intensive, because they need to be re-checked and updated by hand regularly. You do NOT want a stale best bet.

          wunder
          Walter Underwood
          former Infoseek, Inktomi, Verity, Netflix
          now Chegg Search Guy

        • Lee Romero
          Message 4 of 4 , May 14, 2012
            Thanks for the replies, Seth and Walter.

            Walter - in this situation, if I'm understanding it correctly, tf.idf will be thwarted to some extent because the word in question is going to be very common throughout the whole corpus of content being indexed. So the weight of a company name is likely very low (I think) because it is in many documents, though in any one document it might not be very common (occurring in cover pages / titles, in footers, etc.). Can you elaborate in case I'm missing something?
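One way to picture the scenario described here: if the company name is in every document, its weight is close to zero, and the other query term ends up deciding the ranking. The sketch below uses a simple tf × log(N/df) score and an invented three-document corpus; it is not how Coveo or any specific engine actually scores.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, all_docs):
    """Sum of tf * log(N / df) over the query terms (textbook form)."""
    counts = Counter(doc_tokens)
    n_docs = len(all_docs)
    score = 0.0
    for term in query_terms:
        df = sum(1 for tokens in all_docs if term in tokens)
        if df == 0:
            continue  # term not in the corpus at all
        score += counts[term] * math.log(n_docs / df)
    return score


# Hypothetical corpus: the company name is in every document.
docs = {
    "footer_heavy": ["deloitte", "deloitte", "deloitte", "travel", "policy"],
    "consulting_page": ["deloitte", "consulting", "services"],
    "tax_page": ["deloitte", "tax", "guide"],
}

query = ["deloitte", "consulting"]
scores = {
    name: tf_idf_score(query, tokens, list(docs.values()))
    for name, tokens in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the document that repeats the company name three times gains nothing from it, and the one document containing "consulting" ranks first.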

            Seth - the idea of having a stop word apply to the full content and not to the metadata is interesting. I'll have to see if we can do that with our engine; we're using Coveo. Does anyone happen to know whether stop words can be defined that only apply to content?

            Thanks again for your help. Does anyone else have any thoughts or ideas?

            Regards
            Lee Romero
