Re: Methodology for assessing results from multiple search engines?
- May 5, 2008
Out-of-the-box relevancy is difficult to judge, and it is only one of many relevancy factors to consider. I'm guessing that no search engine has good out-of-the-box relevancy unless you are crawling a medium-sized set of high-quality, web-only content - and then would that test data realistically represent your company's content? So many relevancy issues are enterprise-specific and can only be ferreted out over time.
- When you initially crawl web content you're probably going to get all sorts of anomalies, such as infinite calendars, duplicate pages, looping pages, error pages (that don't return HTTP errors) and others.
- Users will expect home pages to rank highly yet most home pages don't have a lot of meaningful content or metadata.
- Users are going to expect Google-like results from your enterprise content - so right from the start your survey results are going to be skewed by unrealistic expectations.
- You may find that, even with all of the cool-sounding Bayesian pattern matching and other whiz-bang algorithms, keyword density still overtakes relevancy and returns documents that seem completely irrelevant. For example, we have a departmental web site called "Engineering Test Equipment." Those three words are used so heavily in so many documents that the sheer number of times they appear across tens of thousands of documents outweighs the exact combination of those three words appearing once in the web site title.
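To make the keyword density problem concrete, here is a minimal sketch in Python. It is not how any particular vendor's engine scores documents; the function names, the toy documents and the `title_phrase_boost` value are all assumptions made up for illustration. It compares a raw term-count score with the same score plus a bonus for an exact phrase match in the title.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def density_score(query, doc):
    """Count every occurrence of every query term in the title and body."""
    terms = set(tokenize(query))
    words = tokenize(doc["title"]) + tokenize(doc["body"])
    return sum(1 for w in words if w in terms)

def phrase_boosted_score(query, doc, title_phrase_boost=100):
    """Density score plus a bonus when the title contains the exact phrase.

    The boost value is a hypothetical tuning knob, not a recommended setting.
    """
    score = density_score(query, doc)
    q = tokenize(query)
    t = tokenize(doc["title"])
    # Slide a window over the title looking for the query as a contiguous phrase.
    if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
        score += title_phrase_boost
    return score

# Toy data: a sparse home page versus a document that repeats the words often.
docs = [
    {"title": "Engineering Test Equipment", "body": "Welcome to our department."},
    {"title": "Calibration log", "body": "test equipment test equipment engineering test " * 5},
]
query = "engineering test equipment"

by_density = max(docs, key=lambda d: density_score(query, d))
by_phrase = max(docs, key=lambda d: phrase_boosted_score(query, d))
print(by_density["title"])
print(by_phrase["title"])
```

With pure term counting the repetitive document wins; with the title phrase bonus the departmental home page comes out on top. Whether a given engine exposes a knob like this is exactly the kind of question to put to the vendor.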
Other (possibly better) factors to consider when selecting a search engine vendor are:
- What tools are available to create edited content (Best Bets)?
- What tuning tools are available to remove duplicates based upon similar URLs and/or the content?
- What tools are available for weighting documents based upon content or URL?
- Will the search engine allow you to adjust relevancy based on distance from the root? (method for ranking homepages higher - works well on some web sites but not others)
- Will the search engine allow you to exclude/include pages based upon parts of the URL, the hostname or content (i.e., a list of stop words)?
- What tools/parameters allow you to adjust the relevancy during the query (to broaden or narrow the focus of the query)? For example, we found out that searching for employee data works better with the "relevancy" parameter set to 50. Web content works okay at 70. Product/parts data works better by logically ANDing the search terms together.
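A rough sketch of what a few of these tuning knobs amount to, again in Python. Everything here is hypothetical: the function names, the `penalty_per_level` formula and the example URLs are assumptions for illustration, not any vendor's API. It shows a distance-from-root adjustment, a URL-based exclusion rule, and the difference between ANDing and ORing query terms.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Number of non-empty path segments; 0 means the site root."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])

def depth_adjusted(base_score, url, penalty_per_level=0.1):
    """Scale a score down for each level away from the root.

    The formula is an assumption; real engines use their own weighting.
    """
    return base_score / (1.0 + penalty_per_level * url_depth(url))

def is_excluded(url, blocked_fragments):
    """True if any blocked fragment appears anywhere in the URL."""
    return any(fragment in url for fragment in blocked_fragments)

def matches(query_terms, doc_terms, require_all):
    """AND semantics when require_all is True, OR semantics otherwise."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if require_all else any(hits)

home = "http://intranet.example.com/"
deep = "http://intranet.example.com/eng/test/index.html"
print(url_depth(home), url_depth(deep))
print(depth_adjusted(10.0, home) > depth_adjusted(10.0, deep))
print(is_excluded("http://intranet.example.com/calendar?month=13", ["/calendar"]))
print(matches(["pump", "valve"], {"pump", "seal"}, require_all=True))
print(matches(["pump", "valve"], {"pump", "seal"}, require_all=False))
```

The depth adjustment favors home pages, which is why it works well on some sites and badly on others where the useful content sits deep. The AND variant is the narrower query that worked better for our product/parts data.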
Lastly, there is a lot of value you can add in your user interface design, and that will likely have the biggest impact on relevancy. This is where you can focus the user's activity to provide better relevancy based upon, and tailored to, YOUR business. I heard someone at the ESS talk about guiding your user into an "Advanced Search" without them explicitly using advanced search. That is a great suggestion!
Here are some ways that you're going to improve search relevancy based upon your user interface design:
- Pick your search paradigm well, whether it be direct, navigational, faceted, contextual, relational or universal (after 2.5 years I don't think we've gotten this figured out yet)
- Provide a single, high-level navigation based upon content type; in my opinion, this is going to yield better relevancy than a single search box for everything and trying to federate relevancy.
- Group content via a taxonomy or categorization
- Allow site/content owners to contribute Best Bets
- Apply search tactically to search-related problems
- Take a "perpetual beta" approach by releasing frequent iterations of the GUI to determine what works and what doesn't work.
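As one example from the list above, Best Bets can be as simple as a lookup table that site owners maintain and that the UI consults before showing engine results. The sketch below is a bare-bones illustration in Python; the function names, the dictionary layout and the URLs are assumptions, and a real deployment would use whatever Best Bets facility the vendor provides.

```python
def normalize(query):
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(query.lower().split())

def with_best_bets(query, organic_results, best_bets):
    """Put editor-chosen URLs first, then engine results without repeats."""
    pinned = best_bets.get(normalize(query), [])
    rest = [url for url in organic_results if url not in pinned]
    return pinned + rest

# Hypothetical table contributed by site/content owners.
best_bets = {
    "engineering test equipment": ["http://intranet.example.com/ete/"],
}

# Hypothetical engine output where the home page ranks below a noisy document.
organic = [
    "http://intranet.example.com/docs/cal-log.html",
    "http://intranet.example.com/ete/",
]

results = with_best_bets("  Engineering Test  Equipment ", organic, best_bets)
print(results)
```

The pinned home page moves to the top regardless of how the engine scored it, and queries with no Best Bet entry pass through unchanged.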
I completely agree with Avi's approach so don't misunderstand me. You've got to evaluate relevancy and performance relative to each vendor. I'm just suggesting that good relevancy will best be achieved by the hard work you do after you deploy the platform. Therefore the tools and resources provided by the vendor will be key to tuning the engine to your specific needs.
I think Rennie Walker from Wells Fargo was one of the people declaring the concept of relevancy dead. Certainly to the end user relevancy is king, but Rennie is right on. The concept of search engine relevancy within the enterprise is tainted by the success of consumer search technologies.
One Relevancy To Rule Them All, One Relevancy To Find Them, One Relevancy To Bring Them All And In The Darkness Bind Them!