Re: [hackers-il] More Google Fun

  • Nadav Har'El
    Message 1 of 7, Nov 27, 2002
      On Tue, Nov 26, 2002, Orna Agmon wrote about "Re: [hackers-il] More Google Fun":
      > You are right about the 's in that page, but I believe that it is the
      > 's "Richard M. Stallman's article". You can see that if
      > you go to the cached version google gives:
      > It highlights the searched words.

      Orna, the words on that page, and whether "'s" appears in it or not,
      are quite irrelevant. It's the words in the *link text* pointing to that
      page that matter, as people have already said.

      For a longer explanation, here's a part from a message I sent this list
      in May 2001:
      --------------------

      If you're interested in finding out how Google works, take a look at
      http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

      It's an article titled "The Anatomy of a Large-Scale Hypertextual Web
      Search Engine", which explains how Google works. It was presented at the 7th
      WWW conference (in 1998) by Sergey Brin and Lawrence Page, then two
      CS graduate students at Stanford and now Google executives.

      The idea, to put it simply, is that Google doesn't care only about a web page
      and the words in it: it puts a lot of emphasis on how many (and which)
      other pages link to that page, and on what text is used in those links.
      So, for example, even a very popular condom page (assuming such a page
      exists ;)) is not likely to be linked as <A HREF="...">the latex page</A>.
      When searching for latex, then, you are likely to get a page which mentions
      latex (preferably in the title), but more importantly, one that many other
      pages (preferably themselves "quality pages"; see the full article for more
      about that) link to with the word "latex" in the link text.
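
      A minimal Python sketch of the link-text idea, assuming a made-up toy web:
      the page names, link texts and page contents below are all hypothetical.
      It indexes each page by the words in the anchor text of links pointing to
      it, and contrasts that with a naive ranking by the words on the page
      itself. It ignores the "quality" weighting of the linking pages, which the
      article handles with PageRank.

```python
from collections import defaultdict

# Hypothetical toy web: (source page, target page, anchor text) triples.
LINKS = [
    ("a.example", "latex-project.example", "the LaTeX page"),
    ("b.example", "latex-project.example", "LaTeX documentation"),
    ("c.example", "latex-project.example", "typesetting with latex"),
    ("d.example", "condoms.example", "condom shop"),
]

# Hypothetical page texts: the condom page uses "latex" more often in its own body.
PAGE_TEXT = {
    "latex-project.example": "LaTeX a document preparation system",
    "condoms.example": "latex condoms latex free shipping",
}

def build_anchor_index(links):
    """Map each lowercase anchor word to {target page: number of inbound links using it}."""
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor in links:
        for word in set(anchor.lower().split()):
            index[word][target] += 1
    return index

def rank_by_anchor_text(index, query):
    """Rank pages by how many inbound links mention the query word in their anchor text."""
    hits = index.get(query.lower(), {})
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))

def rank_by_page_text(page_text, query):
    """Naive baseline: rank pages by how often the query word appears on the page itself."""
    q = query.lower()
    scores = {url: text.lower().split().count(q) for url, text in page_text.items()}
    return sorted(((u, s) for u, s in scores.items() if s > 0), key=lambda kv: (-kv[1], kv[0]))

index = build_anchor_index(LINKS)
print(rank_by_anchor_text(index, "latex"))    # [('latex-project.example', 3)]
print(rank_by_page_text(PAGE_TEXT, "latex"))  # [('condoms.example', 2), ('latex-project.example', 1)]
```

      Here the link-text ranking puts latex-project.example first, since three
      inbound links mention "latex", while the naive page-text ranking prefers
      condoms.example, which simply uses the word more often.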

      The Google article has a variant of this "latex" test: the "Bill Clinton"
      test. It describes how a search for "Bill Clinton" on a contemporary
      search engine returned, as the first result, a page containing only 3 words:
      "Bill Clinton Suck". After all, 2/3 of all the text on that page was the
      search terms, so it must be highly relevant. Right? Wrong :) Google returns
      whitehouse.gov as its first result, because it looks for the
      Clinton-related site that most people would care enough about to link to,
      and that is of course not the 3-word page.
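
      The "Bill Clinton" test can be sketched the same way. The following Python
      example, on a hypothetical toy link graph and hypothetical page texts,
      compares a naive term-density score with a normalized variant of the
      article's PageRank formula, PR(A) = (1-d) + d (PR(T1)/C(T1) + ... +
      PR(Tn)/C(Tn)), where C(T) is the number of links going out of page T,
      using the damping factor d = 0.85 that the article suggests. Dividing the
      (1-d) term by the number of pages (so the ranks sum to 1) and spreading the
      rank of pages with no outgoing links evenly are choices of this sketch. It
      only ranks pages already known to match the query, whereas Google combines
      text matching and PageRank.

```python
# Hypothetical toy link graph: page -> list of pages it links to.
GRAPH = {
    "whitehouse.gov": ["news1.example"],
    "news1.example": ["whitehouse.gov", "news2.example"],
    "news2.example": ["whitehouse.gov"],
    "blog.example": ["whitehouse.gov", "three-words.example"],
    "three-words.example": [],
}

# Hypothetical page texts for the two pages that match the query.
TEXT = {
    "whitehouse.gov": "welcome to the white house president bill clinton news and speeches",
    "three-words.example": "bill clinton suck",
}

def term_density(text, query_terms):
    """Fraction of the words on the page that are query terms."""
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms) / len(words)

def pagerank(graph, d=0.85, iterations=50):
    """Power iteration for a normalized PageRank; the ranks sum to 1."""
    n = len(graph)
    pr = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(pr[page] for page, outs in graph.items() if not outs)
        new = {page: (1.0 - d) / n + d * dangling / n for page in graph}
        for page, outs in graph.items():
            for target in outs:
                # Each page splits its rank evenly among the pages it links to.
                new[target] += d * pr[page] / len(outs)
        pr = new
    return pr

query = ["bill", "clinton"]
candidates = list(TEXT)
ranks = pagerank(GRAPH)
print("by term density:", max(candidates, key=lambda p: term_density(TEXT[p], query)))
# prints "by term density: three-words.example"
print("by PageRank:", max(candidates, key=ranks.get))
# prints "by PageRank: whitehouse.gov"
```

      The density score picks the 3-word page (2/3 versus 2/11), while PageRank
      picks whitehouse.gov, which more pages link to.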



      --
      Nadav Har'El | Wednesday, Nov 27 2002, 22 Kislev 5763
      nyh@... |-----------------------------------------
      Phone: +972-53-245868, ICQ 13349191 |Bureaucracy, n: A method for transforming
      http://nadav.harel.org.il |energy into solid waste.