FYI - The project below is one day going to replace the current
GovTrack raw data. This is just an early heads-up. I don't plan to
discontinue any of the existing data, but perhaps in six months it
will be considered deprecated (=still operational but not
recommended) in favor of the new data model.
-------- Original Message --------
I've been working for the last month or two with Josh Tauberer (of GovTrack.us)
and Derek Willis on a project to produce a public domain scraper
and dataset from THOMAS.gov, the official source for
legislative information for the US Congress.
It's a reasonably well documented set of Python scripts, which
you can find here:
We just hit a great milestone - it gets everything important
that THOMAS has on bills, back to the year THOMAS starts (1973).
all of this data in bulk, and I've worked it
into Sunlight's pipeline, so that searches for
bills in Scout
use data collected directly from this
The data and code are all hosted on GitHub in a "unitedstates"
organization, which is right now co-owned by me, Josh, and Derek
- the intent is to have this all exist in a common space. To the
extent that the code needs a license at all, I'm using a public
that should at least be sufficient for the US (other suggestions
There's other great stuff in this organization, too - Josh
made an amazing donation of his legislator dataset,
and converted it to YAML for easy reuse. I've
worked that dataset into Sunlight's products already as well.
I've also moved my legal citation
into this organization -- and my colleague
Thom Neale has an in-progress parser for the
, to convert it from binary typesetting codes
GitHub's organization structure actually makes possible a
very neat commons. I'm hoping this model proves useful, both
for us and for the public.
You received this message because you are subscribed to the Google
Groups "sunlightlabs" group.