Re: [govtrack] Normalizing vote type fields

  • Eric Mill
    Message 1 of 4 , Feb 16, 2011
      I've actually been doing some normalization using that field as well,
      for our Real Time Congress API. It only works for a subset of them, of
      course, but my regexes are pretty straightforward:
      https://github.com/sunlightlabs/realtimecongress/blob/master/tasks/utils.rb#L69

      You might want to consider retaining the original textual vote type,
      and adding a new field with the standard vote type, so that a) it's
      backwards compatible, and b) it lets people see the original text and
      come up with their own normalization strategy if they want.

      In RTC, I keep the original text in the "roll_type" field and put the
      normalized value in "vote_type". The feed of votes also contains voice
      votes, pulled from GovTrack's <vote> tags on bills; their "vote_type"
      is always set to "passage", and they have no "roll_type".
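
A minimal sketch of the "keep the original, add a normalized field" idea described above. The patterns and the "other" fallback are illustrative assumptions; they are not the actual regexes from the Real Time Congress utils.rb.

```python
import re

# Hypothetical patterns for illustration only; first match wins.
VOTE_TYPE_PATTERNS = [
    (re.compile(r"cloture", re.I), "cloture"),
    (re.compile(r"nomination", re.I), "nomination"),
    (re.compile(r"^on (the )?(passage|agreeing to the resolution)"
                r"|suspend the rules and pass", re.I), "passage"),
]


def normalize_vote(roll_type):
    """Return a record that keeps the original text next to a normalized type."""
    vote_type = "other"  # assumed fallback label for unmatched text
    for pattern, label in VOTE_TYPE_PATTERNS:
        if pattern.search(roll_type):
            vote_type = label
            break
    # The original text is preserved so consumers can re-normalize it themselves.
    return {"roll_type": roll_type, "vote_type": vote_type}
```

Because "roll_type" is kept untouched, anyone who disagrees with the patterns can apply their own to the same records.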

      -- Eric

      On Wed, Feb 16, 2011 at 10:20 AM, Josh Tauberer <tauberer@...> wrote:
      > Hey, all.
      >
      > I am thinking of normalizing the "type" field in roll call vote files.
      > Most won't change but I'll correct typos in the upstream data sources
      > and I may revise some (like "Call of the House" to "Quorum Call" where
      > appropriate). I'll also add a category attribute to the element to group
      > similar vote types together (on passage, suspend the rules and pass, on
      > agreeing to the resolution, on the amendment, etc. all labeled e.g.
      > "passage").
      >
      > This is fair warning that I'll be changing that field on existing data
      > real soon now. If that'll cause any problems for you (it really
      > shouldn't), speak now!
      >
      > --
      > - Josh Tauberer
      > - CivicImpulse / GovTrack.us
      >
      > http://razor.occams.info | www.govtrack.us | civicimpulse.com
      >
      > "Members of both sides are reminded not to use guests of the
      > House as props."
    • Josh Tauberer
      Message 2 of 4 , Feb 16, 2011
        The type is also repeated in the <question> element.

        For whatever reason, the scrapers make the <question> be TYPE + " :" +
        BILL/AMENDMENT/OTHER in the House or TYPE + " (" + MATTER + ")" in the
        Senate. At some point the arbitrary colon-versus-parens difference
        should be removed as well.
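
A rough sketch of splitting a <question> string back into its type and its subject, assuming the two layouts described above (colon-separated in the House, parenthesized in the Senate). The function name, the chamber labels, and the sample strings in the usage note are hypothetical.

```python
import re

# House: TYPE, a colon, then the bill/amendment/other text.
HOUSE_RE = re.compile(r"^(?P<type>.+?)\s*:\s*(?P<matter>.+)$")
# Senate: TYPE followed by the matter in parentheses.
SENATE_RE = re.compile(r"^(?P<type>.+?)\s*\((?P<matter>.+)\)\s*$")


def split_question(question, chamber):
    """Return (type, matter); matter is None when the layout is not recognized."""
    pattern = HOUSE_RE if chamber == "house" else SENATE_RE
    text = question.strip()
    match = pattern.match(text)
    if match is None:
        return text, None
    return match.group("type"), match.group("matter").strip()
```

For example, split_question("On Passage : H R 1", "house") yields the pair ("On Passage", "H R 1"), and a string with no separator comes back unchanged with None as the matter.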

        Since the <type> is also present in the <question>, I think it makes
        sense to normalize one and leave the other. What I'm intending right now is:

        First, normalize the <type> element by:

        1) Removing differences in capitalization and phrasing which would be
        irrelevant to anyone but a parliamentarian. So "On the Amendment" and
        "On agreeing to the amendment" both become "On the Amendment".

        2) Removing part specifications in divided questions and the like. "On
        Agreeing to Article X of the Concurrent Resolution" becomes "On Agreeing
        to the Concurrent Resolution (Part)" and "On Adoption of the fifth
        portion of the divided question" becomes "On Part of the Divided Question".
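
The two rules above could be sketched roughly as follows. The regexes are assumptions that cover only the example strings quoted in this message; a real rule table would need many more entries.

```python
import re

# (pattern, canonical form) pairs; hypothetical and deliberately incomplete.
RULES = [
    # Rule 1: phrasing and capitalization variants collapse to one form.
    (re.compile(r"^on (agreeing to )?the amendment$", re.I),
     "On the Amendment"),
    # Rule 2: drop the part specification.
    (re.compile(r"^on agreeing to article \S+ of the concurrent resolution$", re.I),
     "On Agreeing to the Concurrent Resolution (Part)"),
    (re.compile(r"^on adoption of the \w+ portion of the divided question$", re.I),
     "On Part of the Divided Question"),
]


def normalize_type(raw):
    """Map a raw vote type to its canonical form, or return it unchanged."""
    text = " ".join(raw.split())  # collapse stray whitespace first
    for pattern, canonical in RULES:
        if pattern.match(text):
            return canonical
    return text
```

Types that match no rule pass through untouched, which fits the expectation that most values won't change.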

        Second, I will add a "category" attribute that collapses similar types.
        Right now I have: passage, passage-suspension, passage-part, amendment,
        nomination, procedural, cloture, and other. "passage-part" is for "On
        Agreeing to Article X of the Concurrent Resolution". "other" is so far
        only for divided questions.

        With these categories, when the category is passage, passage-suspension,
        amendment, or cloture, there's nothing informative in the <question>
        element that's not already indicated by the category (except for the
        bill or amendment number, which is also given in the bill/amendment
        elements), so you can get away with displaying a fixed, user-friendly
        string for each of these categories rather than the original type.
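
As a sketch of that display idea: a consumer could keep a small table of fixed labels for the four categories named above and fall back to the original type for everything else. The label wording here is made up for illustration.

```python
# Hypothetical user-facing labels for the categories that need no extra detail.
FIXED_LABELS = {
    "passage": "Vote on passage",
    "passage-suspension": "Vote on passage under suspension of the rules",
    "amendment": "Vote on an amendment",
    "cloture": "Vote to end debate (cloture)",
}


def display_label(category, original_type):
    """Use the fixed label when one exists, otherwise show the original type."""
    return FIXED_LABELS.get(category, original_type)
```

Categories such as nomination, procedural, passage-part, and other still show the original type, since the category alone does not say enough about them.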

        But it's still open for debate.

        - Josh Tauberer
        - CivicImpulse / GovTrack.us

        http://razor.occams.info | www.govtrack.us | civicimpulse.com

        "Members of both sides are reminded not to use guests of the
        House as props."

      • Eric Mill
        Message 3 of 4 , Feb 17, 2011
          That sounds like a good approach. When I was doing this myself, I also
          contemplated normalizing it into two fields: vote_type, and vote_on.
          So a vote on an amendment would have a vote_type of "passage" and a
          vote_on of "amendment". And a vote to start debate on a Supreme Court
          nomination would be a vote_type of "cloture" and a vote_on of
          "nomination".
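
One possible shape for that two-field split, as a sketch only. The class name is hypothetical, and making vote_on optional is an assumption meant to cover votes whose subject cannot be determined from the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoteKind:
    vote_type: str          # the kind of decision, e.g. "passage" or "cloture"
    vote_on: Optional[str]  # what is being decided; None when it is unknown


# The two examples from the paragraph above.
amendment_vote = VoteKind(vote_type="passage", vote_on="amendment")
nomination_cloture = VoteKind(vote_type="cloture", vote_on="nomination")

# A cloture vote whose subject could not be established from the source data.
unknown_cloture = VoteKind(vote_type="cloture", vote_on=None)
```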

          I gave up when I realized I didn't have enough information available
          to me in GovTrack's data (or in House/Senate feeds) to establish that
          a vote was a cloture vote on a nomination. Maybe you have more
          information, or a better handle on the problem though?

          -- Eric
