Recommendation for multiple word search using ysearch

  • LynnA
    Message 1 of 4 , Jan 29, 2014
      What is the recommended way to query for news articles using multiple words? I tried using plus signs and double quotes to restrict results to those that include the phrase, but Yahoo appears to be doing some kind of keyword search, returning any article that contains any of the words I provide in quotes.

      Search Query Term = "Purdue University Shooting"
      http://yboss.yahooapis.com/ysearch/news?q=%22Purdue%20University%20Shooting%22&format=xml

      Search Query Term = Purdue+University+Shooting
      http://yboss.yahooapis.com/ysearch/news?q=Purdue%2BUniversity%2BShooting&format=xml
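For reference, here is a minimal sketch of how the two URLs above come out of percent-encoding. It only uses Python's standard `urllib.parse.quote`; the helper name `news_url` is made up for illustration, and the sketch says nothing about how the service interprets the operators. It does show that `%2B` is an encoded literal plus sign, so the second query sends the `+` characters themselves rather than spaces.

```python
from urllib.parse import quote

# Endpoint copied from the URLs in this post.
BASE_URL = "http://yboss.yahooapis.com/ysearch/news"


def news_url(query: str) -> str:
    """Build a request URL with the query fully percent-encoded.

    news_url is a hypothetical helper, not part of any Yahoo library.
    safe='' makes quote() encode every reserved character.
    """
    return f"{BASE_URL}?q={quote(query, safe='')}&format=xml"


# Double quotes become %22 and spaces become %20.
print(news_url('"Purdue University Shooting"'))

# A literal plus sign becomes %2B.
print(news_url("Purdue+University+Shooting"))
```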

      Although I do not NEED all of those words, I would want the articles that contain more of them to rank higher in the search results. Should I be trying to use AND or OR? I have read on the forum that those are not strictly followed either.

      Thanks for any suggestions.
    • Alain Désilets
      Message 2 of 4 , Jan 30, 2014
        On Wed, Jan 29, 2014 at 1:20 PM, LynnA <lynn_bassler@...> wrote:
         

        What is the recommended way to query for news articles using multiple words?   I tried using plus signs and double quotes to restrict results to include the sub-phrase but yahoo is really doing some kind of keyword search - returning any article with any of the words I provide in quotes.

        This is a question that keeps being asked, and as far as I can tell, nobody has come up with a definitive answer.

        Lately, what I have been doing is to
        - Put every word between parens
        - AND all of that
        - Then do post-processing on the Yahoo results, based on the content of the abstract. Basically, prioritise hits whose abstract contains the exact expression searched

        For example, the query for the exact expression "hello world" would be

           q=((hello)AND(world))

        Of course, you have to follow the escaping rules outlined in the Yahoo Boss documentation. My experience with that is that Yahoo will naturally tend to focus on pages that contain "hello" and "world" in close proximity, and that it will provide abstracts where those two words appear consecutively. You can then inspect the abstract of each hit and discard those for which the abstract does not contain "hello world" (or put those hits at the end of the list).
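A minimal sketch of the approach described above, for illustration only. The function names are hypothetical, and the hits are assumed to be plain dicts with an "abstract" key, which may not match the shape of the real response. Escaping of the query for the URL is left out.

```python
def build_and_query(phrase):
    """Wrap every word in parentheses and AND them together."""
    words = phrase.split()
    return "(" + "AND".join(f"({word})" for word in words) + ")"


def prioritise_exact(hits, phrase):
    """Move hits whose abstract contains the exact phrase to the front.

    Assumes each hit is a dict with an 'abstract' string (an assumption,
    not a documented format). sorted() is stable, so the original order
    is kept within each of the two groups.
    """
    needle = phrase.lower()
    return sorted(hits, key=lambda hit: needle not in hit.get("abstract", "").lower())


print(build_and_query("hello world"))  # ((hello)AND(world))

sample_hits = [
    {"abstract": "The world says hello back"},
    {"abstract": "A classic Hello World program"},
]
for hit in prioritise_exact(sample_hits, "hello world"):
    print(hit["abstract"])
```

To discard the non-matching hits instead of demoting them, filter on the same condition rather than sorting.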

        Oh, and a piece of advice. Whatever you do, write a couple of automated unit tests that query for, say, a dozen different examples of the kinds of queries you are interested in, and assert that the results returned by Yahoo are mostly what you expect.

        For example, one of my apps needs to search for exact expressions that appear on a specific web site and in a specific language. So I have unit tests that search for different exact expressions, on different web sites, and in different languages, and make sure that at least, say, 80% of the hits returned are:

        a) On the requested site
        b) AND are in the right language
        c) AND contain the exact expression in the abstract

        These unit tests have saved us on several occasions where we found out that something had changed in the way that YB behaves, at which point we had to find a slightly different way of formulating our queries.
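The 80% check described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the hit fields ("url", "language", "abstract"), the function names, and the canned hits that stand in for a live query. In a real test the hits would come from an actual request.

```python
from urllib.parse import urlparse


def hit_is_acceptable(hit, site, language, phrase):
    """Check the three conditions on one hit (field names are hypothetical)."""
    host = urlparse(hit["url"]).hostname or ""
    on_site = host == site or host.endswith("." + site)
    right_language = hit["language"] == language
    has_phrase = phrase.lower() in hit["abstract"].lower()
    return on_site and right_language and has_phrase


def pass_rate(hits, site, language, phrase):
    """Fraction of hits that satisfy all three conditions."""
    if not hits:
        return 0.0
    good = sum(hit_is_acceptable(h, site, language, phrase) for h in hits)
    return good / len(hits)


def check_query(hits, site, language, phrase, threshold=0.8):
    """Raise AssertionError when too few hits look right."""
    rate = pass_rate(hits, site, language, phrase)
    assert rate >= threshold, f"only {rate:.0%} of hits matched expectations"


# Canned hits standing in for a live response: 4 of 5 are acceptable.
canned = [
    {"url": "http://www.example.com/a", "language": "en", "abstract": "Hello world!"},
    {"url": "http://example.com/b", "language": "en", "abstract": "hello world again"},
    {"url": "http://example.com/c", "language": "en", "abstract": "Say hello world"},
    {"url": "http://example.com/d", "language": "en", "abstract": "A hello world demo"},
    {"url": "http://other.org/e", "language": "fr", "abstract": "bonjour"},
]
check_query(canned, "example.com", "en", "hello world")
```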

        Good luck.

         
        Alain Désilets
        Owner, Alpaca Technologies
        alpacatechnologies.com
      • lynn_bassler
        Message 3 of 4 , Jan 30, 2014

          Alain -

          Sounds as if you have had to build your own result parsing/analyzing rules since the behavior of the API is either not clearly defined, or it is dynamic in its implementation.  Keeps us on our toes.

          Thank you for the response. I know it is not 'just me' then, and I haven't missed something obvious.

          Lynn

        • Alain Désilets
          Message 4 of 4 , Jan 30, 2014
            On Thu, Jan 30, 2014 at 1:08 PM, <lynn_bassler@...> wrote:
             

            Alain -

            Sounds as if you have had to build your own result parsing/analyzing rules since the behavior of the API is either not clearly defined, or it is dynamic in its implementation.  


            Yes, but those rules are pretty simple to write. The hardest one was to write a way to check if the content of the abstract contains the exact expression, with say, give or take 5 characters between each word. But that's just a simple regexp.
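One way such a regexp might be written is sketched below. The function names and the exact gap rule (between 1 and 5 arbitrary characters between consecutive words, case-insensitive, whole words only) are assumptions for illustration; the actual pattern used was not posted.

```python
import re


def near_phrase_pattern(phrase, max_gap=5):
    """Compile a pattern matching the words in order with small gaps.

    Consecutive words may be separated by 1 to max_gap characters of any
    kind (spaces, punctuation, short markup). This gap rule is an
    assumption, not the pattern from the original post.
    """
    words = [re.escape(word) for word in phrase.split()]
    gap = ".{1,%d}" % max_gap
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE | re.DOTALL)


def abstract_has_phrase(abstract, phrase, max_gap=5):
    """True when the abstract contains the phrase, allowing small gaps."""
    return near_phrase_pattern(phrase, max_gap).search(abstract) is not None


print(abstract_has_phrase("Say hello, world!", "hello world"))
print(abstract_has_phrase("hello to the whole wide world", "hello world"))
```

The first call matches because only two characters separate the words; the second does not because the gap is longer than five characters.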
             
            Alain