Re: Methodology for assessing results from multiple search engines?
Aug 18, 2008

Avi and all,
I know it's been quite a while since you posted this, but I've been
looking through old posts and trying to determine how to compare the
relevancy of different vendors for in-house testing.
I was wondering if you could expand on "Relevance Ranking - compare
with current clicks." Do you have a suggested way of doing this?
We're in the process of putting together an RFP for new search
software. Since we're a state government (Oregon), we have to go
through a public bid process, which requires assigning scores. We're
planning to do in-house testing of the top vendors and have several
criteria to score them against. But we're having a difficult time
coming up with a scoring methodology for relevance.
Thank you in advance for any advice,
--- In SearchCoP@yahoogroups.com, "Avi Rappoport" <analyst@...> wrote:
> I think it's much easier to compare results than to evaluate just
> one search engine. Here's my standard process:
> - Create a test suite (use existing search logs if possible; a rough
> sketch follows below)
> -- Simple and complex queries
> -- Spelling, typing and vocabulary errors
> -- Force matching edge-case issues - many matches, few matches, no matches
> -- Save results pages as HTML for later checking
> - Analyze the differences among them
> -- Variations in indexing
> -- Retrieval & response time
> -- Relevance Ranking - compare with current clicks
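> For building the test suite itself, something like this works (a
> rough Python sketch, assuming a plain-text log with one query per
> line; the file name and sample sizes are just illustrative):
>
>     import random
>     from collections import Counter
>
>     # Hypothetical log file: one query string per line.
>     with open("search_queries.log") as f:
>         queries = [line.strip() for line in f if line.strip()]
>
>     counts = Counter(queries)
>
>     # Head: the most frequent queries - these matter most to users.
>     head = [q for q, _ in counts.most_common(50)]
>
>     # Tail: a random sample of one-off queries, where misspellings
>     # and odd vocabulary tend to live.
>     singletons = [q for q, c in counts.items() if c == 1]
>     tail = random.sample(singletons, min(50, len(singletons)))
>
>     test_suite = head + tail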
> I find using the current search reports for popular queries and
> looking at the popular results from those queries to be extremely
> important, as it takes out most of my personal biases and expectations.
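> To make "compare with current clicks" concrete, here is a minimal
> sketch of the kind of check I mean (Python; the data structures and
> names are assumptions - you'd fill them from your search reports and
> your saved results pages):
>
>     from typing import Dict, List
>
>     def click_agreement(engine_top10: List[str],
>                         clicked_urls: List[str]) -> float:
>         """Fraction of the currently popular (clicked) URLs for a
>         query that show up in an engine's top-10 results."""
>         top = set(engine_top10)
>         hits = sum(1 for url in clicked_urls if url in top)
>         return hits / len(clicked_urls) if clicked_urls else 0.0
>
>     # popular_clicks: query -> URLs users actually click today
>     # engine_results: query -> top-10 URLs from the engine under test
>     def score_engine(engine_results: Dict[str, List[str]],
>                      popular_clicks: Dict[str, List[str]]) -> float:
>         scores = [click_agreement(engine_results.get(q, []), urls)
>                   for q, urls in popular_clicks.items()]
>         return sum(scores) / len(scores) if scores else 0.0
>
> An engine that reliably surfaces the documents people already choose
> is at least not fighting your users' expectations.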
> However, I wouldn't give a single number for "relevance" as there's
> no way to measure that properly. But you may be able to say that one
> search is relatively better than another within some categories. For
> example, when I did the article for Network World, I discovered one
> search engine (no longer sold) that was significantly worse than the
> others in most kinds of search.
> I hope that helps,
> --- In SearchCoP@yahoogroups.com, "Lee Romero" <pekadad@> wrote:
> > Hi all - I'm familiar with the general process of evaluating software
> > packages against requirements. I'm currently looking at that problem
> > with search engines (as hinted at in my previous posts about the
> > Google Search Appliance).
> > Question for you all - Obviously, search result relevance is one of
> > those requirements that you would use in evaluating a search engine.
> > Do you have any methodology for assessing a search engine in terms of
> > how it ranks results for particular keywords? It's obviously
> > impossible to answer the question, "Does it always get relevance
> > exactly right for all possible searches?" because A) you never know
> > what particular searches users might use and B) each user's
> > expectations of relevance are different (I'm sure there are other
> > things as well that make it impossible to do this :-) ).
> > Anyway - I'm trying to figure out a way to generally compare the
> > results from one engine against another.
> > One thought would be to identify the top searches already used by
> > users (say the top 20 or 50 or 100 or however many you want to deal
> > with). Then ask some type of random sampling of users (how to do that
> > is another question) to try each of those searches for each search
> > engine and provide an assessment of how good the results returned
> > were. (Maybe ask them to score on a scale of 0 = nothing relevant, 1 =
> > a few relevant results, 5 = more relevant results, 10 = all relevant
> > results - forcing them to use that to ensure a spread of the numbers?)
> > Then you can average the scores across users for each search term to
> > score that term. Then average across all of the search terms to get
> > a general "score" for relevance for the engine?
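> > (To make the two-level averaging concrete, a quick Python sketch -
> > the shape of 'ratings' is just an assumption for illustration:
> >
> >     import statistics
> >
> >     # ratings[term][engine] = list of scores from the sampled users
> >     def engine_scores(ratings):
> >         per_engine = {}
> >         for term, by_engine in ratings.items():
> >             for engine, user_scores in by_engine.items():
> >                 # first average: across users, per search term
> >                 per_engine.setdefault(engine, []).append(
> >                     statistics.mean(user_scores))
> >         # second average: across search terms, one number per engine
> >         return {engine: statistics.mean(means)
> >                 for engine, means in per_engine.items()}
> >
> > Medians might be safer than means here, since a few extreme raters
> > can skew a small sample.)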
> > Seems like a lot of holes in that idea - it's relatively easy to
> > identify the searches, but is it fair to constrain to those? How to
> > identify a test population? Does averaging scores (either one of the
> > two averages above) make sense?
> > Thanks for your insights!
> > Lee Romero