Re: [govtrack] Greetings

  • Joshua Tauberer / GovTrack.us
    Message 1 of 7 , Jun 11, 2005
      Hey, Ryan. Thanks for introducing yourself on the list.

      Michigan seems particularly well organized. Three people working on it
      might be overkill. But, it's all good (unless you all want that bounty
      for the same state...).

      If you need a hand with anything, let me know. Otherwise it sounds like
      you'll have it under control.

      My general advice has been (and I'll post this on the wiki)
      - Put together a list of the legislators, with the starting and
      ending dates of their roles in the legislature, and give everyone an ID
      - Screen-scrape bill status into XML files (not RDF since we don't
      know what the final schemas will be)
      - For going from legislator names to IDs, what I've found is:
        - You'll always get a last name that you can search for in a
        database of legislators.
        - Hyphenated last names don't always show up the same way, so
        sometimes last names have to be mangled a bit.
        - Last names will be ambiguous, and the way first and middle names
        show up is very variable, so you need to narrow down the possible
        matches based on whatever name information is given (initial versus
        whole name, middle name, Jr./Sr. suffixes, etc.).
        - There may still be ambiguity: two people with identical names who
        served at different times in the legislature. You'll need to
        eliminate some of the possible matches based on date information in
        the document, e.g. bill sponsors come from just the set of
        legislators serving in the legislature at the time the bill was
        introduced. (Well, unless bill sponsorship can be amended.)
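
A minimal sketch of the narrowing steps above, written in Python rather than the Perl routine mentioned in this message (which is not shown here). The record layout, the roster entries, and the function names are all hypothetical, made up only to illustrate the idea: match on a mangled last name, narrow by whatever given-name information is available, then narrow by date of service.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Legislator:
    # Hypothetical record: one role per legislator, with start and end dates.
    id: int
    first: str
    middle: str
    last: str
    suffix: str
    start: date
    end: date


def mangle(name: str) -> str:
    """Lowercase and drop everything but letters, so 'Lee-Park', 'Lee Park'
    and 'LeePark' compare equal, as do 'Jr.' and 'Jr'."""
    return re.sub(r"[^a-z]", "", name.lower())


def given_matches(given: str, full: str) -> bool:
    """No information matches anything; a single letter is treated as an
    initial; a whole name must match the whole name on record."""
    given, full = mangle(given), mangle(full)
    if not given:
        return True
    if len(given) == 1:
        return full.startswith(given)
    return given == full


def lookup(roster: List[Legislator], last: str, first: str = "",
           middle: str = "", suffix: str = "",
           when: Optional[date] = None) -> List[int]:
    """Return the IDs of every legislator still compatible with the input."""
    # Step 1: the last name is always present, so search on it first.
    found = [p for p in roster if mangle(p.last) == mangle(last)]
    # Step 2: narrow by first/middle name, which may be initials or missing.
    found = [p for p in found
             if given_matches(first, p.first) and given_matches(middle, p.middle)]
    # Step 3: narrow by Jr/Sr suffix, but only when the input gives one.
    if suffix:
        found = [p for p in found if mangle(p.suffix) == mangle(suffix)]
    # Step 4: narrow by date, e.g. the date the bill was introduced.
    if when is not None:
        found = [p for p in found if p.start <= when <= p.end]
    return [p.id for p in found]


# Invented roster: two people with identical names who served at different times.
ROSTER = [
    Legislator(1, "John", "A", "Smith", "", date(1991, 1, 1), date(1998, 12, 31)),
    Legislator(2, "John", "A", "Smith", "", date(2003, 1, 1), date(2006, 12, 31)),
    Legislator(3, "Jane", "", "Smith", "", date(2003, 1, 1), date(2006, 12, 31)),
    Legislator(4, "Ann", "", "Lee-Park", "", date(2001, 1, 1), date(2004, 12, 31)),
]

print(lookup(ROSTER, "Smith", first="J."))
print(lookup(ROSTER, "Smith", first="John", when=date(2005, 6, 11)))
print(lookup(ROSTER, "Lee Park", first="A"))
```

The function returns a list rather than a single ID on purpose: an empty list or more than one ID means the name could not be resolved from the information given, and the caller has to decide what to do with it.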

      At some point I'll try to package up my Perl routine that does the
      name-to-ID lookup, but it's not in a form now that would be useful to share.

      --
      - Joshua Tauberer

      http://taubz.for.net

      ** Nothing Unreal Exists **


      Ryan Rarick wrote:
      > Hello,
      >
      > My name is Ryan. I'm new to the group. I saw the blog post syndicated
      > on MonoLogue and decided to give a go with helping out. I'm currently
      > in Grand Rapids, Michigan and am the QA lead where I work - which
      > happens to be a dotNet shop. As a part of my daily work, I write web
      > crawling scripts in Perl.
      >
      > If you'd like a hand with the Michigan legislature site, let me know.
      > It's one of the most straightforward government sites I've seen in a
      > while. They put everything together at: http://legislature.michigan.gov
      >
      > I took a look around last night for about a half hour to see how
      > difficult things would be and Virginia looks like it'd be easily done
      > also.
      >
      > Otherwise, I could start work on all the other states. :o) - I plan to
      > be around for a while.
      >
      > My time is somewhat limited as I have a 2 month old who likes to keep
      > me busy. I should have at least a few hours this weekend to start
      > parsing.
      >
      > Oh, and I hope to use a fusion of C# and Perl in developing my
      > scraping apps.
      >
      > Ryan