Re: [hackers-il] More Google Fun
On Tue, Nov 26, 2002, Orna Agmon wrote about "Re: [hackers-il] More Google Fun":
> You are right about the 's in that page, but I believe that it is the
> 's "Richard M. Stallman's article ". You can see that if
> you go to the cached version google gives:
> It highlights the searched words.

Orna, the words on that page, and whether "'s" appears in it or not,
are quite irrelevant. It's the words in the *link text* to that page that
matter, as people already said.
For a longer explanation, here's a part from a message I sent this list
in May 2001:
If you're interested to find out how Google works, take a look at
It's an article titled "The Anatomy of a Large-Scale Hypertextual Web
Search Engine" which explains how Google works. It was presented at the 7th
WWW conference (in 1998) by Sergey Brin and Lawrence Page - then two
CS graduate students at Stanford, and now Google executives.
The idea is, to put it simply, that Google doesn't care only about a web page
and the words in it - it puts a lot of emphasis on how many (and which)
other pages link to that page, and what text is used to link to this page.
So for example, even a very popular condom page (assuming such a page
exists ;)) is not likely to be linked as <A HREF="...">the latex page</A>.
So searching for latex, you are likely to get a page which mentions latex
(preferably in the title), but more importantly - that many other pages
(preferably themselves "quality pages" - see the full article for more about
that) link to this page with the word "latex" in the link text.
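
To make the link-text idea concrete, here is a toy sketch in Python. It is
not Google's actual algorithm: the pages, the example URLs, and the two
weights are all made up for illustration, and it ignores the "quality" of
the linking pages entirely. It only shows how counting query words in
inbound link text can outweigh how often the word appears on the page itself.

```python
from collections import defaultdict

# Hypothetical mini-web: each link is (source page, target page, link text).
LINKS = [
    ("a.example", "latex-project.example", "the latex page"),
    ("b.example", "latex-project.example", "LaTeX typesetting"),
    ("c.example", "latex-project.example", "latex"),
    ("d.example", "condoms.example", "a popular shop"),
]

# Hypothetical page texts; the condom page mentions latex far more often.
PAGES = {
    "latex-project.example": "latex document preparation system",
    "condoms.example": "latex latex latex condoms made of latex",
}

def rank(query, pages, links, anchor_weight=10, text_weight=1):
    """Order pages by query hits in inbound link text plus hits in the page text.

    The weights are arbitrary assumptions, not values from the article.
    """
    word = query.lower()
    scores = defaultdict(int)
    for url, text in pages.items():
        scores[url] += text_weight * text.lower().split().count(word)
    for _source, target, anchor in links:
        scores[target] += anchor_weight * anchor.lower().split().count(word)
    return sorted(scores, key=lambda url: scores[url], reverse=True)

print(rank("latex", PAGES, LINKS))
```

With anchor_weight set to 0 the condom page comes first, because it simply
repeats the word more often; with link text counted, the page that others
actually call "latex" wins.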
The Google article has a variant of this "latex" test - the "Bill Clinton"
test. They describe how a person searching for "Bill Clinton" on a contemporary
search engine got, as the first result, a page containing only the 3 words:
"Bill Clinton Sucks". After all, 2/3 of all the text of that page was the
search terms, so it must be highly relevant. Right? Wrong :) Google finds,
as a first result, whitehouse.gov, because it works by finding the
Clinton-related site that most people would care enough about to link to,
and that is of course not the 3-word page.
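
The "2/3" arithmetic can be sketched the same way. This is only an
illustration under invented data: the page texts, the three-words.example
URL, and the inbound-link counts are assumptions, and "pick the page with
the most inbound links" is a crude stand-in for what the article describes.

```python
def term_density(query, text):
    """Fraction of the words on a page that are query words."""
    terms = set(query.lower().split())
    words = text.lower().split()
    return sum(w in terms for w in words) / len(words) if words else 0.0

# Hypothetical page texts and inbound-link counts, invented for illustration.
CLINTON_PAGES = {
    "three-words.example": "Bill Clinton Sucks",
    "whitehouse.gov": "Welcome to the White House, home of President Bill Clinton",
}
INBOUND_LINKS = {"three-words.example": 2, "whitehouse.gov": 5000}

query = "Bill Clinton"

# A density-based engine picks the page where the query fills most of the text.
by_density = max(CLINTON_PAGES, key=lambda url: term_density(query, CLINTON_PAGES[url]))

# A link-based engine picks the page most other pages bothered to link to.
by_links = max(CLINTON_PAGES, key=lambda url: INBOUND_LINKS[url])

print("density winner:", by_density)
print("link winner:", by_links)
```

The 3-word page scores 2/3 on density and wins the first contest; the
link count picks whitehouse.gov instead.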
Nadav Har'El | Wednesday, Nov 27 2002, 22 Kislev 5763
Phone: +972-53-245868, ICQ 13349191 |Bureaucracy, n: A method for transforming
http://nadav.harel.org.il |energy into solid waste.