Re: scraping javascript sites, colorado example

  • John Labovitz
    Message 1 of 5, Mar 5 9:21 PM
      On Mar 3, 2005, at 10:09 PM, Neal McBurnett wrote:

      > Here are some spidering/scraping resources I've stumbled upon via
      > google.

      There are starting to be some good Ruby libraries for screen-scraping,
      too.

      There's a simple but good version of WWW::Mechanize (you can find it
      via the 'gems' Ruby library if you have that installed). And REXML is
      a fantastic XML parsing library, with XPath built in, so you don't have
      to write as much procedural code as you do with some of the Perl modules.
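
      To illustrate the XPath point, here is a minimal sketch, assuming
      the page has already been fetched and is well-formed XML. The
      markup and paths below are made up for the example; real-world
      HTML often needs tidying before REXML will parse it.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for a fetched, well-formed page.
xml = <<~XML
  <page>
    <links>
      <a href="/reports/2004.html">2004 report</a>
      <a href="/reports/2005.html">2005 report</a>
    </links>
  </page>
XML

doc = REXML::Document.new(xml)

# A single XPath expression selects every <a> element,
# so there is no hand-written walk over the tree.
anchors = REXML::XPath.match(doc, '//a')

hrefs  = anchors.map { |a| a.attributes['href'] }
titles = anchors.map { |a| a.text }

hrefs.zip(titles).each { |href, title| puts "#{title}: #{href}" }
```

      The same '//a' expression keeps working if the links move
      elsewhere in the document, which is most of the appeal over
      step-by-step traversal.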

      This won't help much with the JavaScript mess, though. (And yes, I've
      found similar awful cruft when scraping financial services sites. I
      think it must be the output of some middleware app that folks use to
      build web sites. I had to deal with one recently that had *no* way of
      navigating via regular HTML; only JavaScript links! Truly annoying.)
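
      For the simplest case of JavaScript-only links, where the target
      URL appears literally inside the call, a regular expression can
      sometimes recover it. This is only a sketch: the openPage function
      name and the paths are hypothetical, and it does nothing for sites
      that compute their URLs in script.

```ruby
# Hypothetical markup: every link is a javascript: pseudo-URL.
html = <<~HTML
  <a href="javascript:openPage('/accounts/summary.jsp')">Summary</a>
  <a href="javascript:openPage('/accounts/history.jsp')">History</a>
HTML

# Capture the single-quoted argument of any one-argument function call
# used as a javascript: href.
JS_LINK = /href="javascript:\w+\('([^']+)'\)"/

# String#scan returns one array per match when the pattern has a
# capture group, so flatten to get a plain list of paths.
js_targets = html.scan(JS_LINK).flatten

js_targets.each { |path| puts path }
```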

      --
      John Labovitz
      Macintosh support, research, and software development
      John Labovitz Consulting, LLC
      johnl@... | +1 503.949.3492 |
      www.johnlabovitz.com/consulting