Re: [govtrack] Greetings
Hey, Ryan. Thanks for introducing yourself on the list.
Michigan seems particularly well organized. Three people working on it
might be overkill. But, it's all good (unless you all want that bounty
for the same state...).
If you need a hand with anything, let me know. Otherwise it sounds like
you'll have it under control.
My general advice has been (and I'll post this on the wiki):
- Put together a list of the legislators, with the starting and
ending dates of their roles in the legislature, and give everyone an ID
- Screen-scrape bill status into XML files (not RDF since we don't
know what the final schemas will be)
- For going from legislator names to IDs, what I've found is:
  - You'll always get a last name that you can search in a database
  - Hyphenated last names don't always show up the same way, so
    sometimes last names have to be mangled a bit.
  - Last names will be ambiguous, and the way first and middle names
    show up is very variable, so you need to narrow down the possible
    matches based on whatever name information is given (initial versus
    whole name, middle name, Jr./Sr. suffixes, etc.)
  - There may still be ambiguity: two people with identical names who
    served at different times in the legislature. You'll need to eliminate
    some of the possible matches based on date information in the document,
    e.g. bill sponsors come from just the set of legislators serving in
    the legislature at the time the bill was introduced. (Well, unless bill
    sponsorship can be amended.)
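
To make the first two bullets concrete, here is a minimal sketch in Python
(picked only for illustration). Everything in it is an assumption: the Role
fields, the XML element and attribute names, and the sample bill are made up,
and none of it is a settled schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date


@dataclass
class Role:
    """One stint in the legislature for one person (hypothetical layout)."""
    legislator_id: int  # any stable ID you assign yourself
    name: str
    start: date         # first day of the role
    end: date           # last day of the role


def bill_status_xml(number, introduced, status, sponsor_id):
    """Serialize scraped bill status as a small XML fragment (names made up)."""
    bill = ET.Element("bill", number=number, introduced=introduced.isoformat())
    ET.SubElement(bill, "status").text = status
    ET.SubElement(bill, "sponsor", id=str(sponsor_id))
    return ET.tostring(bill, encoding="unicode")


# Invented sample data, only to show the shape of the records.
roster = [Role(17, "Jane Doe", date(2003, 1, 1), date(2006, 12, 31))]
xml_text = bill_status_xml("HB 1234", date(2005, 2, 1), "Introduced", 17)
print(xml_text)
```

Keeping the sponsor as an ID rather than a name means the XML doesn't have
to change if the name matching gets better later.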
At some point I'll try to package up my Perl routine that does the
name-to-ID lookup, but it's not in a form now that would be useful to share.
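
For reference, the narrowing steps described above could be sketched roughly
like this. It is in Python and is not the Perl routine; the Role fields, the
function names, and the sample roster are all hypothetical, and real data will
need more mangling rules than this.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Role:
    legislator_id: int
    first: str
    middle: str
    last: str
    suffix: str
    start: date
    end: date


def normalize_last(name):
    """Mangle a last name so hyphenated and spaced variants compare equal."""
    return re.sub(r"[^a-z]", "", name.lower())


def given_matches(given, full):
    """Accept an initial ("J" or "J.") or a whole name; empty means no info."""
    given = given.rstrip(".").lower()
    full = full.lower()
    if not given:
        return True
    if len(given) == 1:
        return full.startswith(given)
    return given == full


def lookup(roster, last, first="", middle="", suffix="", on=None):
    """Return the single matching ID, or None if zero or several remain."""
    key = normalize_last(last)
    matches = [r for r in roster if normalize_last(r.last) == key]
    matches = [r for r in matches if given_matches(first, r.first)]
    matches = [r for r in matches if given_matches(middle, r.middle)]
    if suffix:
        want = suffix.rstrip(".").lower()
        matches = [r for r in matches if r.suffix.rstrip(".").lower() == want]
    if on is not None:
        # Keep only people serving on the document's date.
        matches = [r for r in matches if r.start <= on <= r.end]
    ids = {r.legislator_id for r in matches}
    return ids.pop() if len(ids) == 1 else None


# Invented roster: two identically named people serving at different times.
roster = [
    Role(1, "John", "A", "Smith", "", date(1990, 1, 1), date(1994, 12, 31)),
    Role(2, "John", "A", "Smith", "", date(2003, 1, 1), date(2006, 12, 31)),
    Role(3, "Mary", "", "Jones-Lee", "", date(2003, 1, 1), date(2006, 12, 31)),
]
print(lookup(roster, "Smith", first="J.", on=date(2005, 2, 1)))
print(lookup(roster, "Jones Lee"))
print(lookup(roster, "Smith", first="John"))
```

Returning None when more than one ID survives is deliberate: it is safer to
flag an ambiguous name for a manual look than to guess.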
- Joshua Tauberer
** Nothing Unreal Exists **
Ryan Rarick wrote:
> Hello,
> My name is Ryan. I'm new to the group. I saw the blog post syndicated on
> MonoLogue and decided to give a go with helping out. I'm currently in Grand
> Rapids, Michigan and am the QA lead where I work - which happens to
> be a dotNet shop. As a part of my daily work, I write web crawling
> scripts in Perl.
> If you'd like a hand with the Michigan legislature site, let me know. It's
> one of the most straightforward government sites I've seen in a while. They
> put everything together at: http://legislature.michigan.gov
> I took a look around last night for about a half hour to see how difficult
> things would be and Virginia looks like it'd be easily done also.
> Otherwise, I could start work on all the other states. :o) - I plan to be
> around for a while.
> My time is somewhat limited as I have a 2 month old who likes to keep
> me busy. I should have at least a few hours this weekend to start.
> Oh, and I hope to use a fusion of C# and Perl in developing my
> scraping apps.