Re: [SearchCoP] Dealing with your company's name in content / search terms
May 14, 2012

Thanks for the replies, Seth and Walter.

Walter - in this situation, if I'm understanding it correctly, the tf.idf will be thwarted to some extent because the word in question is going to be very common throughout the whole corpus of content being indexed. So the weight of a company name is likely very little (I think) because it is in many documents, though in any one document it might not be very common (occurring in cover pages/titles, in footers, etc.). Can you elaborate in case I'm missing something?

Seth - the idea of having a stop word apply to the full content and not to the metadata is interesting. I'll have to see if we can do that with our engine; we're using Coveo - anyone happen to know whether stop words can be defined that only apply to content?

Thanks again for your help. Anyone else have any thoughts or ideas?

Regards
Lee Romero

On Mon, May 14, 2012 at 11:33 AM, Walter Underwood <wunder@...> wrote:
First, tf.idf does this for you, automatically. IDF weights terms by how common or rare they are in your documents.

So, you may need to do nothing.

Second, IDF for phrases is very good for this, in fact, good for relevance in general. If a phrase is rare and selective, it will have a high IDF, even if it is made of common words. I don't know of any engine that has phrase IDF by default. Ultraseek did have it, but I don't think that is sold any more.

Solr dismax and edismax can be configured to have a higher weight for phrase matches. This may help.

If you can handle this algorithmically, do that. Best Bets are labor-intensive, because they need to be re-checked and updated by hand regularly. You do NOT want a stale best bet.

wunder
Walter Underwood
former Infoseek, Inktomi, Verity, Netflix
now Chegg Search Guy

On May 14, 2012, at 5:58 AM, Seth Earley wrote:
I was thinking Best Bets as I was reading. That would be my approach. Either that, or a content model with those terms tagged in specific metadata fields, with searches scoped or limited to those fields as opposed to full text (“Deloitte” would be a stop word in full-text search).
EARLEY & ASSOCIATES, Inc.
Follow me on twitter: sethearley
Connect with me on LinkedIn: www.linkedin.com/in/sethearley
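
Seth's suggestion of treating the company name as a stop word in full text, but not in metadata fields, can be sketched as follows. This is only an illustration in plain Python, not Coveo's or any other engine's API; the field names `body` and `title` and the helper `index_terms` are hypothetical.

```python
# Hypothetical per-field stop lists: the company name is dropped from
# full-text content but kept in metadata such as the title.
STOP_WORDS_BY_FIELD = {
    "body": {"deloitte"},
    "title": set(),
}


def index_terms(field, text):
    """Return the lowercased tokens of `text` that would be indexed for `field`."""
    stops = STOP_WORDS_BY_FIELD.get(field, set())
    return [token for token in text.lower().split() if token not in stops]


body_terms = index_terms("body", "About Deloitte Consulting")
title_terms = index_terms("title", "Deloitte Consulting")
print(body_terms)   # ['about', 'consulting']
print(title_terms)  # ['deloitte', 'consulting']
```

With a setup like this, a query for "Deloitte Consulting" can still match the full phrase in titles, while the company name contributes nothing to full-text matching.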
Hi all - on an intranet, it's very likely that one of the most common
words in the content being indexed for your search is the name of your
company. I work for Deloitte (technically, Deloitte Touche Tohmatsu
Limited, but it's commonly referred to as "Deloitte" even though
that's not completely accurate) and so the word "Deloitte" appears in
pretty much just about every piece of content that's indexed in our
The effect of this is that in our search, "Deloitte" kind of behaves
like a stop word. Not technically, but it likely adds so little in
terms of differentiation of what a user is looking for that it might
as well be a stop word.
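
The stop-word-like behaviour described above can be illustrated with a small inverse document frequency (IDF) calculation. This is a minimal sketch using the plain log(N / df) form on a made-up four-document corpus; real engines use smoothed variants, and the function name `idf` and the sample documents are assumptions for illustration only.

```python
import math

# Toy corpus (hypothetical): the company name appears in every document.
DOCS = [
    "deloitte consulting overview",
    "about deloitte",
    "deloitte travel policy",
    "deloitte expense policy",
]


def idf(term, docs):
    """Unsmoothed IDF: log(N / df), where df is the number of docs containing term."""
    df = sum(1 for doc in docs if term in doc.lower().split())
    # Unseen terms get 0.0 here purely to avoid dividing by zero.
    return math.log(len(docs) / df) if df else 0.0


print(idf("deloitte", DOCS))              # 0.0, the term is in every document
print(round(idf("policy", DOCS), 3))      # 0.693
print(round(idf("consulting", DOCS), 3))  # 1.386
```

A term that occurs in every document gets an IDF of zero, so it contributes nothing to the ranking, which is the sense in which it "might as well be a stop word".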
The challenge I see is that many search terms used by users might also
contain that word. Often with just one more word - "Deloitte
Consulting", or "About Deloitte", etc.
My question: Do you have any good strategies for how to improve the
search relevance for searches that use very common terms in your
content, such as your company's name?
Are manually-managed best bets (or similar functionality) the only alternative?
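
One algorithmic alternative raised in the replies is IDF computed over phrases rather than single words. The sketch below shows the idea on a made-up corpus: the phrase "deloitte consulting" is rare even though "deloitte" is everywhere. The helpers `contains_phrase` and `phrase_idf` and the sample documents are hypothetical, and this is not how any particular engine implements it.

```python
import math

# Toy corpus (hypothetical): every document mentions the company name.
DOCS = [
    "deloitte consulting services overview",
    "about deloitte and our history",
    "deloitte travel policy",
    "consulting engagement checklist for deloitte teams",
]


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def phrase_idf(phrase, docs):
    """Unsmoothed IDF of an exact phrase: log(N / number of docs containing it)."""
    phrase_tokens = phrase.lower().split()
    df = sum(1 for doc in docs if contains_phrase(doc.lower().split(), phrase_tokens))
    return math.log(len(docs) / df) if df else 0.0


print(phrase_idf("deloitte", DOCS))                       # 0.0
print(round(phrase_idf("deloitte consulting", DOCS), 3))  # 1.386
```

The last document contains both words but not adjacent to each other, so it does not count toward the phrase's document frequency; that selectivity is what gives the phrase a useful weight.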