Re: GovTrack Goes Open Source
- Very exciting news!
Currently I'm running my own scrapers, but love the idea of putting my efforts into an open source scraper.
- I've just begun the learning curve to start building some screen
scrapers of my own.
How many people here are using or building scrapers? And how many
might be interested in sharing experience/work/code?
At 12:20 PM 4/3/2008, you wrote:
>Very exciting news!
>Currently I'm running my own scrappers, but love the idea of putting
>my efforts into an open
Anyhow here's a proof for the philosophy of Solipsism. I stopped
reading the newspaper, and there stopped being news. Even on CNN for
these past two weeks they are doing only reruns! The Democrats'
so-called debates certainly aren't "news." The White House's
stonewalling in the last two weeks is like their stonewalling for the
last two years. Starlets drive drunk and are sent to rehab and then
they drive drunk again.
So, here's my advice, if you don't like what's happening in the news
just twist your lips in a disdainful smile and pay no attention. Now,
at last, you can turn to Spinoza.
-- Thomas Disch
- I've built a few web crawlers/scrapers, in the past in Perl but
recently in Java. I've found the Web-Harvest project to be a very
useful tool for standalone use or from Java.
On Thu, Apr 3, 2008 at 4:32 PM, Neil Rest <NeilRest@...> wrote:
> I've just begun the learning curve to start building some screen
> scrapers of my own.
> How many people here are using or building scrapers? And how many
> might be interested in sharing experience/work/code?
> At 12:20 PM 4/3/2008, you wrote:
> >Very exciting news!
> >Currently I'm running my own scrappers, but love the idea of putting
> >my efforts into an open
> >source scrapper.
- I would second, or third, or fourth that. Putting govt. scrapers in the wild would be something I'd be very interested in contributing to. As for tools, Perl/Mechanize is pretty formidable, but I've really come to love Ruby/Hpricot of late. DOM-based parsing is so slick when it works.
On Apr 3, 2008, at 8:05 PM, Gabe Hamilton wrote: