Recommendation for multiple word search using ysearch
- What is the recommended way to query for news articles using multiple words? I tried using plus signs and double quotes to restrict results to those that include the sub-phrase, but Yahoo seems to be doing some kind of keyword search, returning any article that contains any of the words I provide in quotes.

Search Query Term = "Purdue University Shooting"
http://yboss.yahooapis.com/ysearch/news?q=%22Purdue%20University%20Shooting%22&format=xml

Search Query Term = Purdue+University+Shooting
http://yboss.yahooapis.com/ysearch/news?q=Purdue%2BUniversity%2BShooting&format=xml

Although I do not NEED all of those words, I would want the articles that contain more of them to appear higher up in the search results. Should I be trying to use AND or OR? I have read on the forum that those are not strictly followed either.

Thanks for any suggestions.
- On Wed, Jan 29, 2014 at 1:20 PM, LynnA <lynn_bassler@...> wrote:
> What is the recommended way to query for news articles using multiple words? I tried using plus signs and double quotes to restrict results to include the sub-phrase but yahoo is really doing some kind of keyword search - returning any article with any of the words I provide in quotes.

This is a question that keeps being asked, and as far as I can tell, nobody has come up with a definitive answer.

Lately, what I have been doing is to:
  - Put every word between parens
  - AND all of that
  - Then do post-processing on the Yahoo results, based on the content of the abstract. Basically, prioritise hits whose abstract contains the exact expression searched
For example, the query for the exact expression "hello world" would be

q=((hello)AND(world))

Of course, you have to follow the escaping rules outlined in the Yahoo Boss documentation. My experience with that is that Yahoo will naturally tend to focus on pages that contain "hello" and "world" in close proximity, and that it will provide abstracts where those two words appear consecutively. You can then inspect the abstract of each hit and discard those for which the abstract does not contain "hello world" (or put those hits at the end of the list).
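A minimal Python sketch of the query construction and the abstract-based reordering described above. The function names and the "abstract" key on each hit are hypothetical, chosen only for illustration, and the percent-encoding via urllib is an assumption; the escaping rules in the Yahoo Boss documentation take precedence.

```python
from urllib.parse import quote


def build_and_query(expression):
    """Wrap each word in parens and AND them together, e.g. ((hello)AND(world))."""
    words = expression.split()
    return "(" + "AND".join("(" + word + ")" for word in words) + ")"


def rank_by_exact_match(hits, expression):
    """Put hits whose abstract contains the exact expression first.

    `hits` is assumed to be a list of dicts with an "abstract" key
    (a hypothetical shape, not the actual API response format).
    sorted() is stable, so the original order is kept within each group.
    """
    needle = expression.lower()
    return sorted(hits, key=lambda hit: needle not in hit.get("abstract", "").lower())


query = build_and_query("hello world")
# Assumption: plain percent-encoding of the whole query string.
encoded_query = quote(query, safe="")
```

To discard non-matching hits instead of pushing them to the end of the list, filter on the same condition rather than sorting.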
Oh, and a piece of advice. Whatever you do, write a couple of automated unit tests that query for, say, a dozen different examples of the kinds of queries you are interested in, and assert that the results returned by Yahoo are mostly what you expect.
For example, one of my apps needs to search for exact expressions that appear on a specific web site and in a specific language. So I have unit tests that search for different exact expressions, on different web sites, and in different languages, and make sure that at least, say, 80% of the hits returned:
a) Are on the requested site
b) AND are in the right language
c) AND contain the exact expression in the abstract

These unit tests have saved us on several occasions where we found out that something had changed in the way that YB behaves, at which point we had to find a slightly different way of formulating our queries.
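A rough Python sketch of that kind of check. Everything here is an assumption made for illustration: the hit fields ("url", "language", "abstract"), the helper names, and `fake_search`, which returns canned data in place of a real call to the search API.

```python
from urllib.parse import urlparse


def hit_matches(hit, site, language, expression):
    """True if the hit is on the site, in the language, and has the expression."""
    host = urlparse(hit["url"]).hostname or ""
    on_site = host == site or host.endswith("." + site)
    right_language = hit["language"] == language
    has_expression = expression.lower() in hit["abstract"].lower()
    return on_site and right_language and has_expression


def matching_ratio(hits, site, language, expression):
    """Fraction of hits that satisfy all three conditions."""
    if not hits:
        return 0.0
    matches = sum(hit_matches(hit, site, language, expression) for hit in hits)
    return matches / len(hits)


def fake_search(expression, site, language):
    """Hypothetical stand-in for the real query; returns canned hits."""
    good = {"url": "http://news.example.org/a", "language": language,
            "abstract": "They said " + expression + " again."}
    bad = {"url": "http://other.example.com/b", "language": language,
           "abstract": "Unrelated text."}
    return [good, good, good, good, bad]


def test_exact_expression_on_site():
    hits = fake_search("hello world", "example.org", "en")
    # Tolerate some noise: require 80% of hits to match, not 100%.
    assert matching_ratio(hits, "example.org", "en", "hello world") >= 0.8
```

In a real suite, `fake_search` would be replaced by the live query, with one test per expression, site, and language combination of interest.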
Sounds as if you have had to build your own result parsing/analyzing rules since the behavior of the API is either not clearly defined, or it is dynamic in its implementation. Keeps us on our toes.
Thank you for the response. I know it is not 'just me' then, and I haven't missed something obvious.
- On Thu, Jan 30, 2014 at 1:08 PM, <lynn_bassler@...> wrote:
> Sounds as if you have had to build your own result parsing/analyzing rules since the behavior of the API is either not clearly defined, or it is dynamic in its implementation.

Yes, but those rules are pretty simple to write. The hardest one was to write a way to check if the content of the abstract contains the exact expression, with, say, give or take 5 characters between each word. But that's just a simple regexp.

Alain
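One possible reading of that regexp, sketched in Python. The function name and the interpretation of the gap (at least 1 and at most 5 characters of any kind between consecutive words) are assumptions, not necessarily the rule Alain used.

```python
import re


def proximity_regex(expression, max_gap=5):
    """Compile a pattern matching the words in order, close together.

    Consecutive words may be separated by 1 to `max_gap` characters of
    any kind (spaces, punctuation, and so on). Matching is case-insensitive.
    """
    words = [re.escape(word) for word in expression.split()]
    gap = ".{1,%d}?" % max_gap
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE | re.DOTALL)


def abstract_contains(abstract, expression, max_gap=5):
    """True if the abstract contains the expression, allowing small gaps."""
    return proximity_regex(expression, max_gap).search(abstract) is not None
```

With this reading, "Hello, world!" counts as containing "hello world", while "hello to the whole world" does not, because the gap between the two words is longer than 5 characters.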