RE: [govtrack] "FEAR-less Site Scraping", in Perl, at O'Reilly
Hmm. Interesting article, though I feel FEAR::API should probably not be used for this project. For example, the author states in the second paragraph of the documentation: "However, this module violates probably every single rule of any Perl coding standards. Please stop here if you don't want to see the yucky code."
Reducing code size is a noble pursuit on the author's part, but here it comes at the cost of added complexity.
That said, this particular package does have some cool features, such as tabbed content.
For those of us using Perl for screen scraping, I think we should stick to what's easily readable and maintainable.
What I like to do is use WWW::Mechanize and HTML::TokeParser and store all the common code in a package, which I then use through objects. That keeps the logical flow of the code separate from the physical flow, so I can read the script at a high level and stay focused on what it is meant to do. Basically, I put the skeleton in the script and fill in the fleshy details in the package. That's just my preference, though.
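A minimal sketch of that kind of split might look like the following. The package name Scraper::Common, its method names, and the sample HTML are all hypothetical, made up for illustration rather than taken from the code described above. WWW::Mechanize is loaded only inside fetch(), so the offline demo at the bottom needs just HTML::TokeParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --- The "package" part: the fleshy details (hypothetical name) ---
package Scraper::Common;

use HTML::TokeParser;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Fetch a page over HTTP and return its content.
# WWW::Mechanize is loaded lazily so the offline demo below runs without it.
sub fetch {
    my ($self, $url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url);
    return $mech->content;
}

# Return a list of [href, link text] pairs found in an HTML string.
sub extract_links {
    my ($self, $html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};
        next unless defined $href;    # skip anchors without an href
        my $text = $parser->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# --- The "script" part: the skeleton, high-level flow only ---
package main;

my $scraper = Scraper::Common->new;

# Inline sample HTML so the sketch runs offline; a real script would
# call $scraper->fetch($url) here instead.
my $html = <<'HTML';
<html><body>
<a href="/bills/1">First bill</a>
<a href="/bills/2">Second bill</a>
</body></html>
HTML

my @links = $scraper->extract_links($html);
print "$_->[0]\t$_->[1]\n" for @links;
```

The script section at the bottom reads as a short list of steps (get the page, pull out the links, print them), while the parsing loop and HTTP handling sit in the package where they can be reused by other scrapers.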
And when I get my Internet access back at home (maybe by the end of the month; we're still in the middle of switching phone companies), I'll be able to finish the DB part of my code.
From: Neal McBurnett <neal@...>
Subject: [govtrack] "FEAR-less Site Scraping", in Perl, at O'Reilly
Date: Sat, 10 Jun 2006 18:07:23 -0600 (MDT)
Since scraping information is a common need in the govtrack space,
when I saw this I thought it might be of interest:
http://www.perl.com/pub/a/2006/06/01/fear-api.html
by Yung-chung Lin
June 01, 2006
Thanks for all the wonderful work out there to increase the
transparency of the government, etc.
Neal McBurnett http://mcburnett.org/neal/
Signed and/or sealed mail encouraged. GPG/PGP Keyid: 2C9EBA60