GovTrack Goes Open Source
- GovTrack Wants You! (Picture me instead of Uncle Sam pointing at you.)
Today I am making GovTrack officially totally open source. Now, the
benefits of open source only come if other people help me develop the
site, so I hope my time spent opening up the site isn't for nothing!
Basically there are three components to GovTrack:

1. The website front-end, i.e. the system that generates the HTML pages
of the site. This is newly open source.

2. The legislative database, i.e. the XML files. This has been and
continues to be public domain.

3. The website back-end, the collection of screen-scraping Perl scripts
that create and update the legislative database. I previously posted
some of these files publicly, but now I am licensing them under an open
source license. I will also eventually post all of the scripts publicly;
that takes more time because of Perl module hell, dependencies on some
external files, and API keys embedded in the files.
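To make the third component concrete, here is a minimal sketch of what one of the back-end scraper scripts does, written in Python with only the standard library rather than the actual Perl. The page layout, field names, and XML shape here are made up for illustration; the real scripts and the real GovTrack XML differ.

```python
# Hypothetical sketch of a legislative screen scraper: pull a couple of
# fields out of a fetched status page and emit an XML record for the
# database. The "Bill:"/"Title:" markup is invented for this example.
import re
import xml.etree.ElementTree as ET

def scrape_bill(html):
    """Extract a bill number and title from a (hypothetical) status
    page and return them as a small XML record."""
    number = re.search(r'Bill:\s*(H\.R\.\s*\d+)', html).group(1)
    title = re.search(r'Title:\s*([^<]+)', html).group(1).strip()
    bill = ET.Element('bill', number=number)
    ET.SubElement(bill, 'title').text = title
    return ET.tostring(bill, encoding='unicode')
```

In practice each script would fetch the page over HTTP first and merge the record into the existing database files; this sketch only shows the parse-and-emit step.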
The front-end and back-end are licensed under the new GNU AGPL license,
which roughly means that anyone who runs a modified version of the code
as a public service must make the modified source publicly available.
This is intended to prevent commercial services from gaining any
advantage from the source files that they couldn't already get from
using the legislative database directly.
More details are here, including steps for you to get set up to run an
instance of GovTrack on your own computer so you can make modifications
on your own:
The steps have not been tested much, so your mileage may vary. But
comments are of course welcome.
Again, I really hope I can get a hand improving the site!
- Josh Tauberer
"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
- Very exciting news!
Currently I'm running my own scrapers, but love the idea of putting my
efforts into an open source scraper.
- I've just begun the learning curve to start building some screen
scrapers of my own.
How many people here are using or building scrapers? And how many
might be interested in sharing experience/work/code?
At 12:20 PM 4/3/2008, you wrote:
>Very exciting news!
>Currently I'm running my own scrappers, but love the idea of putting
>my efforts into an open source scrapper.
>Neil
Anyhow here's a proof for the philosophy of Solipsism. I stopped
reading the newspaper, and there stopped being news. Even on CNN for
these past two weeks they are doing only reruns! The Democrats'
so-called debates certainly aren't "news." The White House's
stonewalling in the last two weeks is like their stonewalling for the
last two years. Starlets drive drunk and are sent to rehab and then
they drive drunk again.
So, here's my advice, if you don't like what's happening in the news
just twist your lips in a disdainful smile and pay no attention. Now,
at last, you can turn to Spinoza.
-- Thomas Disch
- I've built a few web crawlers/scrapers, in the past in Perl but
recently in Java. I've found the webharvest project to be a very
useful tool, either standalone or from Java.
On Thu, Apr 3, 2008 at 4:32 PM, Neil Rest <NeilRest@...> wrote:
> I've just begun the learning curve to start building some screen
> scrapers of my own.
> How many people here are using or building scrapers? And how many
> might be interested in sharing experience/work/code?
> At 12:20 PM 4/3/2008, you wrote:
> >Very exciting news!
> >Currently I'm running my own scrappers, but love the idea of putting
> >my efforts into an open
> >source scrapper.
- I would second, or third, or fourth that. Putting govt. scrapers in
the wild would be something I'd be very interested in contributing to.
As for tools, Perl/Mechanize is pretty formidable, but I've really come
to love Ruby/Hpricot of late. DOM-based parsing is so slick when it
works.

On Apr 3, 2008, at 8:05 PM, Gabe Hamilton wrote:
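The DOM-based style mentioned above can be sketched in a few lines. This uses Python's standard-library xml.etree rather than Hpricot or Mechanize, and the markup is invented for illustration; the point is just walking a parsed tree instead of regex-matching raw text.

```python
# DOM-style scraping sketch: parse the (well-formed) page into a tree,
# then select elements by tag instead of pattern-matching the source.
import xml.etree.ElementTree as ET

def titles(doc):
    """Return the text of every <li> in a parsed document."""
    root = ET.fromstring(doc)
    return [li.text for li in root.iter('li')]
```

Real pages are rarely well-formed XML, which is why tools like Hpricot (and, later, lenient HTML parsers generally) exist; this is the "when it works" caveat.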