Sphider - a lightweight search engine in PHP
Sphider is a lightweight web spider and search engine written in PHP, using
MySQL as its back end database. It is suitable for adding search
functionality to small or medium sites (up to around 100,000 pages). It also
works well as a tool for site analysis, such as finding broken links and
gathering statistics about a site.
Spidering and indexing
Full text indexing.
Can index both static and dynamic pages.
Finds links in <a href=...>, <frame ...>, <area ...> and <meta ...> tags,
Respects robots.txt protocol.
Follows server side redirections.
Allows spidering to be limited by depth (i.e. maximum number of clicks from
the starting page), by (sub)domain or by directory.
Supports indexing of PDF and DOC files (using external binaries for file
conversion).
Allows resuming paused spidering.
Possibility to exclude common words from being indexed.
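
The depth and (sub)domain/directory limits described above can be pictured
with a short sketch. This is an illustration in Python, not Sphider's own PHP
code; the function name should_follow and the scope values "domain",
"directory" and "any" are hypothetical names chosen for this example.

```python
from urllib.parse import urlparse

def should_follow(url, start_url, depth, max_depth, scope="domain"):
    """Decide whether a link found `depth` clicks from the start page is in scope.

    Hypothetical helper: `scope` is "domain", "directory" or "any".
    """
    # Depth limit: never go further than max_depth clicks from the start page.
    if depth > max_depth:
        return False
    target, start = urlparse(url), urlparse(start_url)
    # Only web links are candidates for spidering.
    if target.scheme not in ("http", "https"):
        return False
    if scope == "domain":
        # Same (sub)domain as the starting page.
        return target.hostname == start.hostname
    if scope == "directory":
        # Directory of the start page: everything up to the last '/'.
        base_dir = start.path.rsplit("/", 1)[0] + "/"
        return (target.hostname == start.hostname
                and target.path.startswith(base_dir))
    return True  # scope == "any": only the depth limit applies

start = "http://example.com/docs/index.html"
print(should_follow("http://example.com/docs/a.html", start, 1, 2, "directory"))
print(should_follow("http://example.com/blog/a.html", start, 1, 2, "directory"))
```

A spider built this way would call the check for every extracted link before
adding it to the queue, so out-of-scope pages are never fetched.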
Sophisticated administrator interface
Supports AND, OR and phrase searches.
Supports excluding words (by putting a '-' in front of a word, any page
including the word will be omitted from the results).
Option to add and group sites into categories.
Possibility to limit searching to a given category and its subcategories.
"Did you mean" search suggestion on mistyped queries.
Context-sensitive auto-completion on search terms (à la Google Suggest).
Word stemming for English (searching for "run" finds "running", "runs" etc.).
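
As a rough illustration of the AND, OR, phrase and '-' exclusion searches
listed above, the following Python sketch matches a query against a single
page held in memory. It is not how Sphider works internally (Sphider queries
a MySQL index); the names parse_query and matches, and the mode argument, are
assumptions made for this example.

```python
import re

def parse_query(query):
    """Split a query into quoted phrases, included words and excluded words."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    # Remove the quoted parts so only single words remain.
    rest = re.sub(r'"[^"]*"', " ", query)
    include, exclude = [], []
    for token in rest.lower().split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])   # '-word' means: omit pages with word
        elif token != "-":
            include.append(token)
    return phrases, include, exclude

def matches(text, query, mode="and"):
    """Return True if `text` satisfies `query` in "and" or "or" mode."""
    phrases, include, exclude = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    # Any excluded word rules the page out, whatever else matches.
    if any(word in words for word in exclude):
        return False
    hits = [word in words for word in include]
    hits += [phrase in lowered for phrase in phrases]
    if not hits:
        return False
    return all(hits) if mode == "and" else any(hits)

page = "The quick brown fox"
print(matches(page, "quick -fox"))        # excluded word present
print(matches(page, '"brown fox" quick')) # phrase and word both present
```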
Size and speed
Sphider uses regular expressions to extract links from web pages, so indexing
is not particularly fast. Searching is quite fast, provided the database size
is reasonable. The code base is very small, probably making it the smallest
search engine with such functionality out there.
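
To give an idea of what regular-expression link extraction looks like, here is
a minimal Python sketch. The pattern is a simplified assumption written for
this example, not Sphider's actual expression: it covers href in <a> and
<area> tags and src in <frame> tags, and leaves out <meta> tags. The names
LINK_RE and extract_links are hypothetical.

```python
import re
from urllib.parse import urljoin

# Simplified pattern: href in <a>/<area> tags, src in <frame> tags.
LINK_RE = re.compile(
    r"""<(?:a|area)\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)"""
    r"""|<frame\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)""",
    re.IGNORECASE,
)

def extract_links(html, base_url):
    """Return absolute URLs found in `html`, in order, without duplicates."""
    links = []
    for match in LINK_RE.finditer(html):
        raw = match.group(1) or match.group(2)
        url = urljoin(base_url, raw)  # resolve relative links against the page
        if url not in links:
            links.append(url)
    return links

html = ('<a href="/about.html">About</a> <FRAME SRC=menu.html> '
        "<area href='map/a.html'> <a href=\"/about.html\">again</a>")
for link in extract_links(html, "http://example.com/docs/index.html"):
    print(link)
```

A pattern like this is compact, which fits a small code base, but it has to
scan each page's raw text and does not understand malformed or unusual markup
the way a real HTML parser would.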