Need help to develop Search Engine

  • Prakash RR
    Message 1 of 7, Sep 18, 2008
      Dear Reader,
      My MD told me to develop a complete search engine for our client websites.
      I have to create this application in PHP, Perl, or Python, and the database
      should be MySQL or PostgreSQL. I know PHP and MySQL.


      1. Data has to be stored in one single location (around 35 websites)
      2. Custom search on each client's website, which gets its data from that
      single location
      3. Performance should be high
      4. No data redundancy
      5. The crawler should check each website at least twice a month for
      new or updated content, crawling all websites simultaneously (threading)
      6. Ranking of the crawled pages
      7. Provision to get results from more than one website at a time


      I am good at PHP and MySQL.
      I have developed a crawler, a URL normalizer, and an HTML parser in PHP, but
      none of them is multithreaded. I have also worked out some logic to remove
      duplicate content and to detect updates to web pages.
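
One possible way to handle duplicate removal and update detection is to key each page by a normalized URL and compare a hash of its text between crawls. The sketch below is only an illustration in Python, not the poster's PHP code; the function names (`normalize_url`, `fingerprint`, `classify`) and the normalization rules are assumptions made for the example.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any default port.

    Simplification for this sketch: user:password@ info in the URL is dropped.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def fingerprint(text):
    """Hash the page text after collapsing whitespace and lowercasing."""
    canonical = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(url, text, seen_by_url, seen_hashes):
    """Return 'new', 'updated', 'unchanged' or 'duplicate' for a crawled page.

    seen_by_url maps normalized URL -> last stored hash;
    seen_hashes is the set of all stored hashes.
    """
    key = normalize_url(url)
    digest = fingerprint(text)
    previous = seen_by_url.get(key)
    if previous == digest:
        return "unchanged"
    if digest in seen_hashes:
        # Same content is already stored under another URL: do not store it again.
        return "duplicate"
    seen_by_url[key] = digest
    seen_hashes.add(digest)
    return "new" if previous is None else "updated"
```

An exact hash only catches identical text. Pages that differ by a date stamp or a banner would need a fuzzier comparison, which this sketch does not attempt.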

      I need guidance on the following:

      - Organizing the web pages (cache, if possible)
      - Ranking web pages
      - Multithreading all the operations
      - Performance checking
      - Do I need to learn Perl or Python to perform some operations?
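
For the multithreading point, a thread pool is one common way to fetch many pages at once, since crawling spends most of its time waiting on the network. The following is a minimal sketch assuming Python 3 and only the standard library; `crawl_all` and its `fetcher` parameter are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_all(urls, fetcher=fetch, max_workers=8):
    """Fetch every URL concurrently.

    Returns (pages, errors): pages maps URL -> body text,
    errors maps URL -> error message for fetches that failed.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:
                # One failing site should not stop the rest of the crawl.
                errors[url] = str(exc)
    return pages, errors
```

Passing the fetch function in as a parameter makes it easy to test the scheduling logic without touching the network. A real crawler would also need per-site rate limiting and robots.txt handling, which are left out here.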

      Please guide me...
      Thanks in advance to all.

      Thanks and Regards,
      Prakash


    • Shiva Kumar Mallikarjun
      Message 2 of 7, Sep 18, 2008
        Hi,

        Look into the Zend_Search_Lucene library.
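
To give a rough idea of what a Lucene-style library does internally, here is a tiny inverted index with a simple TF-IDF score, written in Python. This is not the Zend_Search_Lucene API and not how that library scores results; the class `TinyIndex`, its methods, and the weighting formula are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index. Assumes each doc_id is added only once."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq
        self.doc_count += 1

    def search(self, query):
        """Return doc ids ordered by descending TF-IDF score."""
        scores = Counter()
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Terms that appear in fewer documents get a higher weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, freq in docs.items():
                scores[doc_id] += freq * idf
        return [doc_id for doc_id, _ in scores.most_common()]
```

Using the site name as part of the document id (for example `site1/a`) is one simple way to keep results from several websites in a single index.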


        --- On Thu, 18/9/08, Prakash RR <iddrrp@...> wrote:
        From: Prakash RR <iddrrp@...>
        Subject: [bang-phpug] Need help to develop Search Engine
        To: yws-search-general@yahoogroups.com, bang-phpug@yahoogroups.com
        Date: Thursday, 18 September, 2008, 7:54 AM
      • Viswanath Somanchi
        Message 3 of 7, Sep 18, 2008
          Using Perl would be a better idea for developing a search engine. It is
          more efficient than PHP for writing crawlers and other search-engine
          components. You can get a lot of reusable code from CPAN.

        • jegan
          Message 4 of 7, Sep 18, 2008
            Look at http://lucene.apache.org/nutch/about.html
            It is an Apache open source search engine which you can integrate with
            your application.

            Jegan

          • Prakash RR
            Message 5 of 7, Sep 19, 2008
              Dear Jegan, that is built on Java. I need it in PHP, Perl, or Python,
              with MySQL or Postgres as the back end.

              Prakash

            • jegan
              Message 6 of 7, Sep 19, 2008
                Hi,
                I did a project like this, in which the Java part (the Nutch web
                crawler) ran in the background as a regular cron job and updated the
                database, and PHP with MySQL read the data from it. Do some research
                on this. Try to use Nutch, a well-known open source web search engine.
                Also think about other possibilities.

                Jegan a
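
The split described above (a scheduled crawl job writes into one central database, the website front ends only read from it) could look roughly like this. It is a sketch in Python using the standard `sqlite3` module as a stand-in for MySQL, not the Nutch setup from the project; the `pages` table, its columns, and the function names are all hypothetical, and the `LIKE` match is only a placeholder for a real full-text index.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the central store and create the (hypothetical) pages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url TEXT PRIMARY KEY,
               site TEXT NOT NULL,
               title TEXT,
               body TEXT,
               crawled_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_page(conn, url, site, title, body):
    """Insert or refresh one crawled page; called by the scheduled crawl job."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, site, title, body) VALUES (?, ?, ?, ?)",
        (url, site, title, body),
    )
    conn.commit()


def search(conn, term, sites=None):
    """Return (url, title) rows whose body contains term.

    sites optionally limits the search to one or more websites.
    """
    sql = "SELECT url, title FROM pages WHERE body LIKE ?"
    params = ["%" + term + "%"]
    if sites:
        placeholders = ", ".join("?" for _ in sites)
        sql += f" AND site IN ({placeholders})"
        params.extend(sites)
    sql += " ORDER BY url"
    return conn.execute(sql, params).fetchall()
```

Because the URL is the primary key, re-crawling a page replaces its row instead of adding a second copy, and the optional `sites` filter covers both the per-client search and the search across several websites.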

              • Vinu Thomas
                Message 7 of 7, Sep 19, 2008
                  How much data are you looking at searching? If the data is large, PHP
                  and a database backend may not be the right tool for the job. As
                  Jegan mentioned you could look at one of Apache's Lucene variants to
                  do the search for you. Here's an article from IBM showing you how to
                  do this using PHP and Apache Solr.
                  http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html

                  Regards,
                  Vinu
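
Solr is queried over plain HTTP, so any language can talk to it. As a rough sketch of what such a request looks like, the Python helper below builds a URL for Solr's `select` handler using the standard `q`, `fq`, `rows` and `wt` parameters. The base URL and the `site` field are assumptions for this example: the field only exists if the Solr schema defines it, and the helper name `solr_select_url` is invented here.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, text, sites=None, rows=10):
    """Build a Solr select URL for a text query.

    sites optionally restricts results to the given websites through a
    filter query on a hypothetical 'site' field.
    """
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if sites:
        site_filter = " OR ".join('"%s"' % s for s in sites)
        params.append(("fq", "site:(%s)" % site_filter))
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, `solr_select_url("http://localhost:8983/solr", "search engine")` yields a URL that a PHP page could fetch and decode as JSON. The user's text is passed through unescaped here, so characters with special meaning in Solr query syntax would need extra handling in real use.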
