Extracting data from a Web-Page

  • S Stephen
    Message 1 of 2, Apr 1, 2004
      Hi All

      I would like to programmatically extract data from a web-page. (The
      intention is to, say, populate a DB.)

      Can anyone advise me on what technologies can be used to do this
      reliably, and how, exactly?

      TIA

      Steven
    • Chad Martin
      Message 2 of 2, Apr 1, 2004
        S Stephen wrote:
        > I would like to programmatically extract data from a web-page. (The
        > intention is to, say, populate a DB.)

        The big issue is the format of the data on the page, and how easy it is
        to get the pages you want. For instance, if the URL contains all the
        CGI parameters you need, and you can decipher the pattern, you can
        usually write (or use a program to write) a shell script that
        downloads all the pages using wget. Run "man wget" for the details.
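        As a minimal sketch of the "use a program to write" route, the Python
        below prints an sh script with one wget call per page. It assumes a
        hypothetical CGI script at http://www.example.com/cgi-bin/report.cgi
        whose pages differ only in an id parameter; the URL, the parameter
        names and the id range are placeholders for whatever pattern you
        decipher from the real site.

```python
# Sketch: write a shell script of wget commands for a CGI URL whose
# parameters follow a predictable pattern. BASE_URL and the "id" and
# "format" parameters are hypothetical placeholders, not a real site.
import shlex
from urllib.parse import urlencode

BASE_URL = "http://www.example.com/cgi-bin/report.cgi"  # hypothetical


def wget_script(ids):
    """Return the text of an sh script that fetches one page per id."""
    lines = ["#!/bin/sh"]
    for record_id in ids:
        url = BASE_URL + "?" + urlencode({"id": record_id, "format": "html"})
        filename = f"page_{record_id}.html"
        # -O saves each page under a predictable name for the parsing step;
        # shlex.quote protects the "?" and "&" in the URL from the shell.
        lines.append(f"wget -O {shlex.quote(filename)} {shlex.quote(url)}")
        lines.append("sleep 1")  # pause between requests to go easy on the server
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Usage: python make_fetch.py > fetch_pages.sh && sh fetch_pages.sh
    print(wget_script(range(1, 11)), end="")  # hypothetical id range
```

        Generating the script, rather than fetching directly, lets you read it
        over before running it and rerun individual lines if a download fails.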

        After that, I usually write a Perl script to parse the downloaded HTML
        files and output tab-delimited text. Perl is great for this kind of
        task because of its rich support for regular expressions.
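        The parsing step might look something like the sketch below. It is in
        Python rather than Perl so that the examples here share one language,
        but it uses the same regular-expression approach. It assumes,
        hypothetically, that each record is one <tr> row of <td> or <th>
        cells; real pages will need patterns tuned to their own markup.

```python
# Sketch of the regex-based parsing step (Python standing in for Perl).
# Hypothetical assumption: each record is a <tr> row of <td>/<th> cells.
import html
import re
import sys

ROW_RE = re.compile(r"<tr(?:\s[^>]*)?>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean(cell):
    """Strip nested tags, decode entities such as &amp;, collapse whitespace."""
    text = html.unescape(TAG_RE.sub("", cell))
    # Tabs and newlines inside a cell would break the tab-delimited output.
    return " ".join(text.split())


def rows_to_tsv(page):
    """Return one tab-delimited line per table row that has at least one cell."""
    lines = []
    for row in ROW_RE.findall(page):
        cells = [clean(c) for c in CELL_RE.findall(row)]
        if cells:
            lines.append("\t".join(cells))
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Usage: python parse_pages.py page_*.html > data.tsv
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            sys.stdout.write(rows_to_tsv(fh.read()))
```

        Patterns like these work well on regular, machine-generated pages but
        can break on nested tables or unusual markup; an HTML parser (such as
        Python's html.parser module) is the sturdier choice in that case.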

        After you have all the data in tab-delimited text, it's just a matter of
        importing it into your DB of choice.
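        As one concrete example of the import step, this sketch loads
        tab-delimited lines into SQLite through Python's built-in sqlite3
        module. SQLite, the table name items and its two columns are
        assumptions for illustration; server databases have their own bulk
        loaders for delimited text (MySQL's LOAD DATA INFILE, PostgreSQL's
        COPY).

```python
# Sketch of the import step, with SQLite standing in for "your DB of
# choice". The table and column names are hypothetical constants.
import csv
import io
import sqlite3


def load_tsv(conn, tsv_file, table="items", columns=("name", "price")):
    """Insert each well-formed tab-delimited line into table; return the count."""
    # table/columns are trusted constants here, so formatting them into SQL is safe.
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if len(row) == len(columns)]  # skip malformed lines
    with conn:  # commits everything as one transaction
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
        conn.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = io.StringIO("Tea & Cake\t3.50\nCoffee\t2.00\n")
    print(load_tsv(conn, sample))
    print(conn.execute("SELECT name, price FROM items ORDER BY name").fetchall())
```

        Note that a header line such as "Name<TAB>Price" would be loaded as
        an ordinary row, so drop it before importing if your parser emits one.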

        Chad Martin