
"FEAR-less Site Scraping", in Perl, at O'Reilly

  • Neal McBurnett
    Message 1 of 2, Jun 10, 2006
      Since scraping information is a common need in the govtrack space,
      when I saw this I thought it might be of interest:

      http://www.perl.com/pub/a/2006/06/01/fear-api.html
      by Yung-chung Lin
      June 01, 2006

      Thanks for all the wonderful work out there to increase the
      transparency of the government, etc.

      Neal McBurnett http://mcburnett.org/neal/
      Signed and/or sealed mail encouraged. GPG/PGP Keyid: 2C9EBA60
    • Ryan Rarick
      Message 2 of 2, Jun 14, 2006

        Hmm. Interesting article. Still, from my perspective, FEAR::API probably should not be used for this project. For example, the author states in the second paragraph of the documentation: "However, this module violates probably every single rule of any Perl coding standards. Please stop here if you don't want to see the yucky code."

        Reducing code size does seem like a noble pursuit on the author's part, but here it comes at the cost of complexity.

        That said, this package does have some cool features, such as tabbed content.

        For those of us using Perl for screen scraping, I think we should stick to code that is easy to read and maintain.

        What I like to do is use WWW::Mechanize and HTML::TokeParser, and keep all the common code in a package that I then use through objects. This separates the logical flow of the code from its physical flow, so I can read the script at a high level and stay focused on what it is supposed to do. Basically, I put the skeleton in the script and fill in the fleshy details in the package. That's just my preference, though.
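A minimal sketch of the layout described above, assuming WWW::Mechanize and HTML::TokeParser are installed from CPAN. The package name My::Scraper, its fetch and links methods, and the task of listing links on a page are hypothetical illustrations, not Ryan's actual code: the package holds the "fleshy details" (fetching and parsing), and the script part at the bottom is only the skeleton.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package holding the reusable details (fetching, parsing).
package My::Scraper;

use WWW::Mechanize;
use HTML::TokeParser;

sub new {
    my ($class) = @_;
    # autocheck => 1 makes get() die on HTTP errors instead of failing silently.
    my $self = { mech => WWW::Mechanize->new(autocheck => 1) };
    return bless $self, $class;
}

# Fetch a page and return its HTML as a string.
sub fetch {
    my ($self, $url) = @_;
    $self->{mech}->get($url);
    return $self->{mech}->content;
}

# Return a list of [href, link text] pairs for every <a href="..."> in $html.
sub links {
    my ($self, $html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my @links;
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};    # $tag->[1] is the attribute hashref
        next unless defined $href;     # skip named anchors without href
        my $text = $p->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

package main;

# The skeleton: only the high-level flow lives in the script.
# Run as: perl scrape.pl http://www.example.com/
if (my $url = shift @ARGV) {
    my $scraper = My::Scraper->new;
    my $html    = $scraper->fetch($url);
    for my $link ($scraper->links($html)) {
        printf "%s => %s\n", $link->[1], $link->[0];
    }
}
```

With no URL argument the script does nothing, so the package can also be exercised on its own, for example by passing an HTML string straight to links. Keeping fetching and parsing in the package means the script section reads as three steps: create the scraper, fetch the page, list the links.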

        Once I have Internet access at home again (maybe by the end of the month; we're still in the middle of switching phone companies), I'll be able to finish the database part of my code.

        -Ryan


        From: Neal McBurnett <neal@...>
        Reply-To: govtrack@yahoogroups.com
        To: govtrack@yahoogroups.com
        Subject: [govtrack] "FEAR-less Site Scraping", in Perl, at O'Reilly
        Date: Sat, 10 Jun 2006 18:07:23 -0600 (MDT)


