
Re: Methodology for assessing results from multiple search engines?

  crystalkoregon
  Aug 18, 2008
      Avi and all,

      I know it's been quite a while since you posted this, but I've been
      looking through old posts and trying to determine how to compare the
      relevancy of different vendors for in-house testing.

      I was wondering if you could expand on "Relevance Ranking - compare
      with current clicks." Do you have a suggested way of doing this?

      We're in the process of putting together an RFP for new search
      software. Since we're a state government (Oregon), we have to go
      through a public bid process, which requires assigning scores. We're
      planning to do in-house testing of the top vendors and have several
      criteria we're planning to use to score vendors. But we're having a
      difficult time coming up with a scoring methodology for relevance.

      Thank you in advance for any advice,
      Crystal Knapp

      --- In SearchCoP@yahoogroups.com, "Avi Rappoport" <analyst@...> wrote:
      > I think it's much easier to compare results than to evaluate just
      > one search engine. Here's my standard process:
      > - Create a test suite (use existing search logs if possible)
      > -- Simple and complex queries
      > -- Spelling, typing and vocabulary errors
      > -- Force matching edge-case issues - many matches, few matches, no matches
      > -- Save results pages as HTML for later checking
      > - Analyze the differences among them
      > -- Variations in indexing
      > -- Retrieval & response time
      > -- Relevance Ranking - compare with current clicks
      > I find using the current search reports for popular queries and
      > looking at the popular results from those queries to be extremely
      > important, as it takes out most of my personal biases and
      > expectations.
      > However, I wouldn't give a single number for "relevance" as there's
      > no way to measure that properly. But you may be able to say that one
      > search is relatively better than another within some categories. For
      > example, when I did the article for Network World, I discovered one
      > search engine (no longer sold) that was significantly worse than the
      > others in most kinds of search.
      > I hope that helps,
      > Avi
      > --- In SearchCoP@yahoogroups.com, "Lee Romero" <pekadad@> wrote:
      > >
      > > Hi all - I'm familiar with the general process of evaluating software
      > > packages against requirements. I'm currently looking at that problem
      > > with search engines (as hinted at in my previous posts about the
      > > Google Search Appliance).
      > >
      > > Question for you all - Obviously, search result relevance is one of
      > > those requirements that you would use in evaluating a search engine.
      > > Do you have any methodology for assessing a search engine in terms of
      > > how it ranks results for particular keywords? It's obviously
      > > impossible to answer the question, "Does it always get relevance
      > > exactly right for all possible searches?" because A) you never know
      > > what particular searches users might use and B) each user's
      > > expectations of relevance are different (I'm sure there are other
      > > things as well that make it impossible to do this :-) ).
      > >
      > > Anyway - I'm trying to figure out a way to generally compare the
      > > results from one engine against another.
      > >
      > > One thought would be to identify the top searches already used by
      > > users (say the top 20 or 50 or 100 or however many you want to deal
      > > with). Then ask some type of random sampling of users (how to do that
      > > is another question) to try each of those searches for each search
      > > engine and provide an assessment of how good the results returned
      > > were. (Maybe ask them to score on a scale of 0 = nothing relevant, 1 =
      > > a few relevant results, 5 = more relevant results, 10 = all relevant
      > > results - forcing them to use that scale to ensure a spread in the numbers?)
      > >
      > > Then you can average the users' scores for each search term to
      > > score that search term, and then average across all of the search
      > > terms to get a general "score" for relevance for the engine?
      > >
      > > Seems like a lot of holes in that idea - it's relatively easy to
      > > identify the searches, but is it fair to constrain to those? How to
      > > identify a test population? Does averaging scores (either one of the
      > > two averages above) make sense?
      > >
      > > Thanks for your insights!
      > > Lee Romero
      > >
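To make Avi's "compare with current clicks" step concrete: one rough
approach is to pull the popular queries from your existing search
reports, note which URLs users currently click for each, and then check
where those URLs land in each candidate engine's results. Below is a
minimal Python sketch that scores an engine by the mean reciprocal rank
of the currently clicked URLs; the sample data and the run_query stub
are hypothetical placeholders, not anything specified in the thread.

    from statistics import mean

    # Popular queries mapped to the URLs users most often click today,
    # taken from existing search reports (hypothetical sample data).
    CLICK_LOG = {
        "dmv renewal": ["https://example.gov/dmv/renew"],
        "fishing license": ["https://example.gov/odfw/license"],
    }

    def run_query(engine, query, top_n=10):
        """Stub: return the engine's top-N result URLs for a query.
        Replace with a real call to each vendor's search API."""
        raise NotImplementedError

    def reciprocal_rank(results, clicked_urls):
        """1/rank of the first currently clicked URL, 0 if none appear."""
        for rank, url in enumerate(results, start=1):
            if url in clicked_urls:
                return 1.0 / rank
        return 0.0

    def score_engine(engine, click_log=CLICK_LOG, top_n=10):
        """Mean reciprocal rank of the click favorites across all queries."""
        return mean(
            reciprocal_rank(run_query(engine, query, top_n), urls)
            for query, urls in click_log.items()
        )

As Avi notes, this takes most personal bias out of the judging: the
"right answers" come from what users already click, not from the
evaluator's expectations.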
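Lee's averaging idea is just as easy to sketch. Assuming each tester
rates each query's results on his 0/1/5/10 scale, the per-query means
and the overall mean per engine fall out directly. The nested-dict
shape and the sample scores here are illustrative only:

    from statistics import mean

    # ratings[engine][query] -> scores from individual testers (made up)
    ratings = {
        "engine_a": {"dmv renewal": [5, 10, 5], "fishing license": [1, 5, 0]},
        "engine_b": {"dmv renewal": [10, 10, 5], "fishing license": [5, 5, 1]},
    }

    for engine, per_query in ratings.items():
        query_means = {q: mean(scores) for q, scores in per_query.items()}
        overall = mean(query_means.values())
        print(engine, round(overall, 2), query_means)

Reporting the per-query means alongside the overall average also goes
some way toward Avi's caveat about collapsing "relevance" into a single
number: the breakdown shows where one engine beats another even when
the overall averages are close.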