Re: [govtrack] Normalizing vote type fields

  • Eric Mill
    Message 1 of 4, Feb 17, 2011
      That sounds like a good approach. When I was doing this myself, I also
      contemplated normalizing it into two fields: vote_type, and vote_on.
      So a vote on an amendment would have a vote_type of "passage" and a
      vote_on of "amendment". And a vote to start debate on a Supreme Court
      nomination would be a vote_type of "cloture" and a vote_on of
      "nomination".

      I gave up when I realized I didn't have enough information available
      to me in GovTrack's data (or in House/Senate feeds) to establish that
      a vote was a cloture vote on a nomination. Maybe you have more
      information, or a better handle on the problem though?
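
A minimal sketch of the two-field split described above, assuming only the question text is available. The keyword rules and the `classify` helper are illustrative assumptions, not code from Real Time Congress or GovTrack. Where the text does not say what the vote is on, `vote_on` is left as `None`, which is the gap described above for cloture votes on nominations.

```python
from typing import NamedTuple, Optional


class Classified(NamedTuple):
    vote_type: Optional[str]  # e.g. "passage", "cloture"
    vote_on: Optional[str]    # e.g. "amendment", "nomination"


# Hypothetical prefixes treated as "passage"-style votes.
PASSAGE_PREFIXES = ("on passage", "on the amendment", "on agreeing to")


def classify(question: str) -> Classified:
    q = question.strip().lower()

    if "cloture" in q:
        vote_type = "cloture"
    elif q.startswith(PASSAGE_PREFIXES):
        vote_type = "passage"
    else:
        vote_type = None

    if "amendment" in q:
        vote_on = "amendment"
    elif "nomination" in q:
        vote_on = "nomination"
    else:
        vote_on = None  # the text alone does not say what the vote is on

    return Classified(vote_type, vote_on)
```

With these rules, a question such as "On the Cloture Motion" gets a `vote_type` of "cloture" but no `vote_on`, so a second data source would be needed to fill it in.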

      -- Eric

      On Wed, Feb 16, 2011 at 6:48 PM, Josh Tauberer <tauberer@...> wrote:
      > The type is also repeated in the <question> element.
      >
      > For whatever reason, the scrapers make the <question> be TYPE + " :" +
      > BILL/AMENDMENT/OTHER in the House or TYPE + " (" + MATTER + ")" in the
      > Senate. At some point the arbitrary colon-versus-parens difference should be
      > removed as well.
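
As a rough illustration of the two question formats described above, the sketch below splits a question string back into its type and matter. The regexes, the `split_question` name, and the `chamber` values are assumptions made for the example; whitespace around the House colon is treated as optional because the exact spacing may vary.

```python
import re

# Assumed shapes, following the description above:
#   House:  TYPE, a colon, then BILL/AMENDMENT/OTHER
#   Senate: TYPE + " (" + MATTER + ")"
HOUSE_RE = re.compile(r"^(?P<type>[^:]+?)\s*:\s*(?P<matter>.+)$")
SENATE_RE = re.compile(r"^(?P<type>[^(]+?)\s*\((?P<matter>.+)\)\s*$")


def split_question(question, chamber):
    """Return (type, matter); matter is None when the pattern does not match."""
    text = question.strip()
    pattern = HOUSE_RE if chamber == "house" else SENATE_RE
    match = pattern.match(text)
    if match is None:
        return text, None
    return match.group("type"), match.group("matter")
```

Parsing both shapes into the same pair is one way to make the colon-versus-parens difference invisible to consumers before the upstream format is unified.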
      >
      > Since the <type> is also present in the <question>, I think it makes sense
      > to normalize one and leave the other. What I'm intending right now is:
      >
      > Normalize the <type> element by:
      >
      > 1) Removing differences in capitalization and phrasing which would be
      > irrelevant to anyone but a parliamentarian. So "On the Amendment" and "On
      > agreeing to the amendment" both become "On the Amendment".
      >
      > 2) Removing part specifications in divided questions and the like. "On
      > Agreeing to Article X of the Concurrent Resolution" becomes "On Agreeing to
      > the Concurrent Resolution (Part)" and "On Adoption of the fifth portion of
      > the divided question" becomes "On Part of the Divided Question".
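
The two normalization steps above could be sketched as an ordered list of regex rules. The patterns below cover only the examples quoted in this message; the rule list and the `normalize_type` name are hypothetical, not GovTrack's actual scraper code.

```python
import re

# (pattern, replacement) pairs, tried in order; the first match wins.
RULES = [
    # Step 1: phrasing and capitalization differences.
    (re.compile(r"^on (agreeing to )?the amendment$", re.IGNORECASE),
     "On the Amendment"),
    # Step 2: part specifications in divided questions and the like.
    (re.compile(r"^on agreeing to article \w+ of the concurrent resolution$", re.IGNORECASE),
     "On Agreeing to the Concurrent Resolution (Part)"),
    (re.compile(r"^on adoption of the \w+ portion of the divided question$", re.IGNORECASE),
     "On Part of the Divided Question"),
]


def normalize_type(raw):
    """Map a raw vote type to its normalized form, or return it unchanged."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    for pattern, replacement in RULES:
        if pattern.match(text):
            return replacement
    return text
```

Types that match no rule pass through unchanged, which matches the expectation that most values will not change.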
      >
      > Second, I will add a "category" attribute that collapses similar types.
      > Right now I have: passage, passage-suspension, passage-part, amendment,
      > nomination, procedural, cloture, and other. "passage-part" is for "On
      > Agreeing to Article X of the Concurrent Resolution". "other" is so far only
      > for divided questions.
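
One way the "category" attribute could be derived from a normalized type is sketched below. Only the "passage-part" and "other" assignments come from this message; the remaining type strings and keyword rules are assumptions for illustration, and "procedural" is left out because the message does not say which types belong to it.

```python
# Exact matches on normalized types.
EXACT = {
    "On Passage": "passage",
    "On the Amendment": "amendment",
    "On Agreeing to the Concurrent Resolution (Part)": "passage-part",
    "On Part of the Divided Question": "other",
}

# Hypothetical keyword fallbacks, checked in order.
KEYWORDS = [
    ("suspend", "passage-suspension"),
    ("cloture", "cloture"),
    ("nomination", "nomination"),
]


def categorize(normalized_type):
    """Return a category string, or None when no rule applies."""
    if normalized_type in EXACT:
        return EXACT[normalized_type]
    lowered = normalized_type.lower()
    for keyword, category in KEYWORDS:
        if keyword in lowered:
            return category
    return None  # unclassified; a real mapping would need a rule here
```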
      >
      > With these categories, when the category is passage, passage-suspension,
      > amendment, or cloture, there's nothing informative in the <question>
      > element that's not indicated by this category (except for the bill or
      > amendment number, which is also indicated in the bill/amendment elements),
      > so you can get away with displaying a fixed, user-friendly string for each
      > of these categories rather than the original type.
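
A consumer could act on this by keeping a small table of display strings for the four categories named above and falling back to the original type otherwise. The label wording and the `display_label` helper are placeholders invented for this sketch.

```python
# Placeholder display strings; the wording is an assumption.
FRIENDLY = {
    "passage": "On Passage",
    "passage-suspension": "On Passage (Under Suspension of the Rules)",
    "amendment": "On the Amendment",
    "cloture": "On Cloture",
}


def display_label(category, original_type):
    """Use a fixed label for the four covered categories, else the original type."""
    return FRIENDLY.get(category, original_type)
```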
      >
      > But it's still open for debate.
      >
      > - Josh Tauberer
      > - CivicImpulse / GovTrack.us
      >
      > http://razor.occams.info | www.govtrack.us | civicimpulse.com
      >
      > "Members of both sides are reminded not to use guests of the
      > House as props."
      >
      > On 02/16/2011 10:29 AM, Eric Mill wrote:
      >>
      >> I've actually been doing some normalization using that field as well,
      >> for our Real Time Congress API. It only works for a subset of them, of
      >> course, but my regexes are pretty straightforward:
      >>
      >> https://github.com/sunlightlabs/realtimecongress/blob/master/tasks/utils.rb#L69
      >>
      >> You might want to consider retaining the original textual vote type,
      >> and adding a new field with the standard vote type, so that a) it's
      >> backwards compatible, and b) it lets people see the original text and
      >> come up with their own normalization strategy if they want.
      >>
      >> In RTC, I leave the original text as the "roll_type" field, and then
      >> the normalized field as "vote_type". The feed of votes contains voice
      >> votes, as pulled from GovTrack's <vote> tags on bills, whose
      >> "vote_type" is always set to "passage", and which do not have a
      >> "roll_type".
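
The record layout described above might look like the following sketch, which keeps the original text next to the normalized value. The field names "roll_type" and "vote_type" come from the message; the `build_vote_record` function and its `normalize` parameter are hypothetical.

```python
def build_vote_record(roll_type, normalize):
    """Build a vote record; roll_type is None for voice votes.

    normalize is any callable mapping the original text to a standard
    vote type (assumed to be supplied by the caller).
    """
    if roll_type is None:
        # Voice votes: always "passage", and no "roll_type" field.
        return {"vote_type": "passage"}
    return {
        "roll_type": roll_type,             # original text, kept as-is
        "vote_type": normalize(roll_type),  # normalized value
    }
```

Keeping both fields lets existing consumers keep reading the original text while new ones use the normalized value or apply their own rules.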
      >>
      >> -- Eric
      >>
      >> On Wed, Feb 16, 2011 at 10:20 AM, Josh Tauberer <tauberer@...> wrote:
      >>>
      >>> Hey, all.
      >>>
      >>> I am thinking of normalizing the "type" field in roll call vote files.
      >>> Most won't change but I'll correct typos in the upstream data sources
      >>> and I may revise some (like "Call of the House" to "Quorum Call" where
      >>> appropriate). I'll also add a category attribute to the element to group
      >>> similar vote types together (on passage, suspend the rules and pass, on
      >>> agreeing to the resolution, on the amendment, etc. all labeled e.g.
      >>> "passage").
      >>>
      >>> This is fair warning that I'll be changing that field on existing data
      >>> real soon now. If that'll cause any problems for you (it really
      >>> shouldn't), speak now!
      >>>
      >>> --
      >>> - Josh Tauberer
      >>> - CivicImpulse / GovTrack.us
      >>>
      >>> http://razor.occams.info | www.govtrack.us | civicimpulse.com
      >>>
      >>> "Members of both sides are reminded not to use guests of the
      >>> House as props."