
Re: Need help to develop Search Engine

  • Vinu Thomas
    Message 1 of 7, Sep 19, 2008
      How much data are you looking to search? If the data set is large,
      PHP with a database back end may not be the right tool for the job.
      As Jegan mentioned, you could use one of Apache's Lucene variants to
      do the search for you. Here's an article from IBM showing how to do
      this using PHP and Apache Solr:
      http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html

      Regards,
      Vinu
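A minimal Python sketch of the Solr approach described above (the linked article uses PHP; Python is one of the languages listed in the original question). It is an illustration under stated assumptions, not code from the article: it assumes a Solr server at `http://localhost:8983/solr/` with a hypothetical core named `sites`, and a hypothetical indexed field `site` recording which client website each page came from. It uses Solr's standard `select` handler with `wt=json`, plus a filter query (`fq`) on `site`, so one shared index can serve each client's own search or several sites at once.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr location and core name; adjust for a real installation.
SOLR_SELECT = "http://localhost:8983/solr/sites/select"


def build_search_url(query, sites, rows=10, start=0):
    """Build a Solr select URL, optionally restricted to some websites.

    `site` is a hypothetical indexed field holding each page's website.
    """
    params = [("q", query), ("wt", "json"), ("rows", rows), ("start", start)]
    if sites:
        # One filter query that matches pages from any of the requested sites.
        site_filter = " OR ".join('site:"%s"' % s for s in sites)
        params.append(("fq", site_filter))
    return SOLR_SELECT + "?" + urlencode(params)


def parse_results(body):
    """Return (total hit count, list of documents) from a Solr JSON response."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


def search(query, sites, rows=10, start=0):
    """Run the query against a live Solr server (needs network access)."""
    with urlopen(build_search_url(query, sites, rows, start)) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Calling `search("php crawler", ["a.example", "b.example"])` would return the hit count and matching documents for those two sites only; URL building and response parsing are kept separate so they can be checked without a running server.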

      --- In bang-phpug@yahoogroups.com, "Prakash RR" <iddrrp@...> wrote:
      >
      > Dear Jegan, that's built on Java. I need it in PHP, Perl or Python
      > with MySQL or Postgres as the back end...
      >
      > Prakash
      >
      > On Fri, Sep 19, 2008 at 10:27 AM, jegan <a.jegan@...> wrote:
      >
      > > Look at http://lucene.apache.org/nutch/about.html, an Apache
      > > open-source search engine that you can integrate with your
      > > application.
      > >
      > > Jegan
      > >
      > >
      > > On Thu, Sep 18, 2008 at 1:24 PM, Prakash RR <iddrrp@...>
      > > wrote:
      > >
      > > > Dear Reader,
      > > > My MD told me to develop a complete search engine for our
      > > > clients' websites. I have to build this application in PHP, Perl
      > > > or Python, and the database should be MySQL or PostgreSQL. I know
      > > > PHP and MySQL...
      > > >
      > > > 1. Data has to be stored in one single location (around 35
      > > >    websites)
      > > > 2. Custom search on each client's website, which has to get its
      > > >    data from that single location
      > > > 3. Performance should be high
      > > > 4. No data redundancy
      > > > 5. The crawler should check the websites at least twice a month
      > > >    to find new or updated content, in all websites simultaneously
      > > >    (threading)
      > > > 6. Ranking the crawled pages
      > > > 7. Provision to get results from more than one website at a time
      > > >
      > > > I am good at PHP and MySQL.
      > > > I have developed a crawler, a URL normalizer and an HTML parser in
      > > > PHP, but none of them is multi-threaded. I will also work out some
      > > > logic to remove duplicate content and to detect updates to web
      > > > pages.
      > > >
      > > > I need guidance on the following:
      > > >
      > > > - Organizing the web pages (caching, if possible)
      > > > - Ranking web pages
      > > > - Multi-threading all the operations
      > > > - Performance checking
      > > > - Do I need to learn Perl or Python for some operations?
      > > >
      > > > Please guide me...
      > > > Thanks in advance to all.
      > > >
      > > > Thanks and Regards,
      > > > Prakash
      > > >
      > >
      >
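For the crawler points raised in the original question (URL normalization, removing duplicate content, and crawling several sites at once with threads), here is a minimal Python sketch of one common approach; it is not Prakash's PHP code. It canonicalizes URLs with the standard `urllib.parse` module, treats two pages as duplicates when the SHA-1 hashes of their whitespace-collapsed HTML match (exact duplicates only), and fetches pages concurrently with `concurrent.futures.ThreadPoolExecutor`. The `fetch` argument is a hypothetical caller-supplied function (URL in, HTML out), so the logic can be tried without network access.

```python
import hashlib
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Canonicalize a URL so trivially different spellings compare equal.

    Simplified sketch: user:password@ parts are dropped.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the scheme's default.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = "%s:%d" % (host, parts.port)
    path = parts.path or "/"
    # Drop the fragment: it never changes what the server returns.
    return urlunsplit((scheme, host, path, parts.query, ""))


def fingerprint(html):
    """Hash page HTML with whitespace collapsed, to spot exact duplicates."""
    text = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crawl(urls, fetch, max_workers=8):
    """Fetch URLs concurrently and keep one page per distinct content hash.

    `fetch` is a hypothetical caller-supplied function: url -> html.
    Returns {normalized_url: html} for the first URL seen with each content.
    """
    # Remove duplicate URLs after normalization, preserving input order.
    unique_urls = list(dict.fromkeys(normalize_url(u) for u in urls))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so "first seen" is stable.
        pages = list(pool.map(fetch, unique_urls))
    seen = set()
    kept = {}
    for url, html in zip(unique_urls, pages):
        digest = fingerprint(html)
        if digest not in seen:
            seen.add(digest)
            kept[url] = html
    return kept
```

Hashing the whole page only catches exact copies; pages that differ only in a timestamp or an ad block would need a near-duplicate technique such as shingling or SimHash.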